AWK facts for kids

This page is about the programming language. For other uses, see AWK (disambiguation).

Quick facts for kids: AWK

Paradigm	Scripting, procedural, data-driven
Designed by	Alfred Aho, Peter Weinberger, and Brian Kernighan
First appeared	1977
Stable release	IEEE Std 1003.1-2008 (POSIX) / 1985
Typing discipline	none; can handle strings, integers and floating-point numbers; regular expressions
OS	Cross-platform
Major implementations	awk, GNU Awk, mawk, nawk, MKS AWK, Thompson AWK (compiler), Awka (compiler)
Dialects	old awk oawk 1977, new awk nawk 1985, GNU Awk gawk
Influenced by	C, sed, SNOBOL
Influenced	Tcl, AMPL, Perl, Korn Shell (ksh93, dtksh, tksh), Lua

AWK (pronounced "awk") is a special computer language made for working with text, like words and sentences. People often use it to pull out specific information from text or to make reports. Think of it like a smart filter for text. AWK is a standard tool on most Unix-like operating systems, such as Linux and macOS.

The AWK language is a scripting language that works with text data. It can read text from files or from a "pipeline" (where one program sends its output to another program). AWK helps you find, change, or organize text, like making a neat report from a messy list. It uses strings (groups of characters), associative arrays (lists where you can use names instead of numbers to find things), and regular expressions (special patterns to find text). Even though AWK was designed for quick, short tasks, people have used it to write big, complex programs too!

AWK was created in the 1970s at Bell Labs, a famous research company. Its name comes from the first letters of its creators' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan. The name sounds like the bird called an auk, which is shown on the cover of The AWK Programming Language book. When you see awk written in all lowercase letters, it usually means the computer program that runs AWK scripts.

How AWK Started

AWK was first made in 1977 by Alfred Aho, Peter J. Weinberger, and Brian Kernighan. They wanted a tool that could easily work with both numbers and text. AWK was also inspired by a programming language written by Marc Rochkind that searched for patterns in input data.

AWK was one of the first tools in Version 7 Unix that could do calculations within a pipeline (a way to connect programs), and it became a very important part of Unix systems. Between 1985 and 1988, AWK was revised and expanded. During this time a popular version called GNU AWK, or gawk, was written; it is often found on Linux distributions. The source code of Brian Kernighan's own updated version, nawk (New AWK), was released in the 1990s.

AWK came after a similar tool called sed (from 1974). Both were made for text processing and could do quick, one-line programs. AWK's ability to handle text patterns and its simple style inspired the creation of the Perl language in 1987. In the 1990s, Perl became very popular for text processing, often used alongside or instead of AWK.

How AWK Programs Work

An AWK program is like a set of instructions. It looks at your input text line by line. For each line, it checks if certain conditions are true. If a condition is true, AWK performs an action.

Here's what an AWK program generally looks like:

condition { action }
condition { action }
...

The condition is usually an expression (a test that is either true or false), and the action is a series of commands. You can leave out either the condition or the action. If there's no condition, the action runs for every line. If there's no action, AWK simply prints every line that matches the condition.
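Here is a small sketch of these three forms that you can try in a terminal. The sample data and the function names `sample`, `only_condition`, `only_action`, and `both` are made up for this example.

```shell
# Sample input: three lines, each with a name and a number (made-up data).
sample() {
    printf 'apple 3\nbanana 12\ncherry 7\n'
}

# Condition only: AWK prints every line where the second field is over 5.
only_condition() { awk '$2 > 5'; }

# Action only: the action runs for every line and prints the first field.
only_action() { awk '{ print $1 }'; }

# Condition and action: print the first field of lines that contain "an".
both() { awk '/an/ { print $1 }'; }

sample | only_condition   # prints the banana and cherry lines
sample | only_action      # prints apple, banana, cherry
sample | both             # prints banana
```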

You can also use special conditions like `BEGIN` or `END`.

`BEGIN`: The action here runs before AWK starts reading any lines.
`END`: The action here runs after AWK has read all the lines.

AWK also uses regular expressions, which are like special codes to find patterns in text. For example, `/word/` would find any line containing "word".
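As a rough sketch, the commands below put `BEGIN`, a regular expression, and `END` together in one program. The sample sentences and the function name `count_cats` are invented for this example.

```shell
# Count the lines that contain "cat" (made-up sample text).
count_cats() {
    awk '
        BEGIN { print "Looking for cat..." }          # runs once, before any input
        /cat/ { found++ }                             # runs for each matching line
        END   { print found + 0, "matching lines" }   # runs once, after all input
    '
}

printf 'my cat naps\na dog barks\nconcatenate files\n' | count_cats
# Looking for cat...
# 2 matching lines
```

The third line counts too, because `/cat/` matches the letters "cat" anywhere in a line, even inside a longer word like "concatenate".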

Common AWK Commands

AWK commands are the instructions you put in the action part of your program. They can include calling functions, setting variables, or doing math. AWK has many built-in commands, and different versions of AWK might have even more.

The print Command

The `print` command is used to show text. After printing, it usually adds a new line, like pressing Enter.

`print`: This command shows the entire current line of text.
`print $1`: This shows only the first "field" (or part) of the current line.
`print $1, $3`: This shows the first and third fields, separated by the output field separator (`OFS`), which is a space unless you change it.

In AWK, `$0` means the entire line. So, `print` and `print $0` do the same thing. You can also print the results of math or functions:

/example_pattern/ {
    print 3+2         # This will print 5
    print sin(1)      # This will print the sine of 1
}
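To see the different forms of `print` side by side, here is a short sketch you can run in a terminal. The sample line stored in the shell variable `line` is made up.

```shell
# Made-up sample line with three fields separated by spaces.
line='Ada 36 London'

echo "$line" | awk '{ print }'           # whole line: Ada 36 London
echo "$line" | awk '{ print $1 }'        # first field: Ada
echo "$line" | awk '{ print $1, $3 }'    # first and third: Ada London
echo "$line" | awk '{ print $2 + 1 }'    # math on a field: 37
```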

You can even send the output to a file or another program:

/pattern/ {
    print "Hello" > "output.txt"  # Sends "Hello" to a file named output.txt
    print "Date" | "date"         # Sends "Date" to the 'date' command
}
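The sketch below tries both ideas with something easier to check: it saves lines to a temporary file, and it pipes lines into the `sort` command. The function names `save_lines` and `sorted_lines` and the fruit names are made up, and the example assumes the common `mktemp` tool is available.

```shell
# Write each input line to the file named by the AWK variable "file".
save_lines() { awk -v file="$1" '{ print > file }'; }

# Send each input line to the sort command, which prints them in order.
sorted_lines() { awk '{ print | "sort" }'; }

tmp=$(mktemp)                                 # temporary file for the demo
printf 'pear\nfig\n' | save_lines "$tmp"
cat "$tmp"                                    # pear, fig
rm -f "$tmp"

printf 'pear\nfig\napple\n' | sorted_lines    # apple, fig, pear
```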

Special Built-in Variables

AWK has special variables that hold useful information:

`NR`: Stands for "Number of Records." This counts how many lines AWK has read so far from all files.
`FNR`: Stands for "File Number of Records." This counts how many lines AWK has read from the current file. It resets for each new file.
`NF`: Stands for "Number of Fields." This tells you how many fields (parts) are in the current line. So, `$NF` means the very last field on the line.
`FILENAME`: This variable holds the name of the file AWK is currently reading from.
`FS`: Stands for "Field Separator." This is the character or pattern AWK uses to split a line into fields (like spaces or commas). By default, fields are separated by runs of spaces or tabs.
`RS`: Stands for "Record Separator." This is the character that separates lines (usually a new line).
`OFS`: Stands for "Output Field Separator." This is the character AWK puts between fields when it prints them (usually a space).
`ORS`: Stands for "Output Record Separator." This is the character AWK puts after each printed line (usually a new line).
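Here is a small sketch that uses several of these variables at once. The comma-separated sample data and the function names `people` and `describe` are made up for this example.

```shell
# Made-up comma-separated data: name,city,age
people() {
    printf 'Ada,London,36\nLinus,Helsinki,21\n'
}

# FS is set to a comma for reading; OFS is set to " | " for printing.
describe() {
    awk '
        BEGIN { FS = ","; OFS = " | " }
        { print NR, NF, $1, $NF }   # line number, field count, first and last field
    '
}

people | describe
# 1 | 3 | Ada | 36
# 2 | 3 | Linus | 21
```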

Variables and Rules

You can create your own variables in AWK. Their names can use letters, digits, and underscores, but cannot start with a digit.

Math: `+`, `-`, `*`, `/` work as usual.
Joining text: To join two pieces of text, just put them next to each other. For example, `print "Hello" "World"` prints "HelloWorld".
Comments: You can add notes to your code using `#`. Anything after `#` on that line is ignored.
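The short sketch below shows a variable, some math, joining text, and comments working together. The variable names `name` and `total` and the function name `greet` are made up.

```shell
greet() {
    awk 'BEGIN {
        name = "World"              # a variable holding text
        total = 2 + 3 * 4           # math: multiplication happens first, so 14
        print "Hello, " name "!"    # joining text by placing it side by side
        print "total = " total      # a number is turned into text when joined
    }'
}

greet
# Hello, World!
# total = 14
```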

Your Own Functions

You can also write your own functions in AWK, just like in other programming languages.

function add_three (number) {
    return number + 3
}

You can then use this function in your program:

(pattern) {
   print add_three(10)     # This will print 13
}

Functions can also have their own temporary variables that only exist inside that function. In AWK, you make one by listing it as an extra parameter that callers leave out.
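As a sketch of how temporary variables work, the function below lists `i` and `total` as extra parameters, which is the usual AWK way to keep them inside the function. The names `sum_to` and `local_demo` are made up for this example.

```shell
local_demo() {
    awk '
        # "i" and "total" are extra parameters that act as local variables;
        # the extra spaces before them are only a convention.
        function sum_to(n,    i, total) {
            for (i = 1; i <= n; i++)
                total += i
            return total
        }
        BEGIN {
            i = 99                 # a global variable with the same name
            print sum_to(4)        # 1 + 2 + 3 + 4 = 10
            print i                # still 99, the function did not touch it
        }
    '
}

local_demo
# 10
# 99
```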

Simple AWK Examples

Hello World

This is a classic first program that just prints "Hello, world!":

BEGIN {
        print "Hello, world!"
        exit
}

The `BEGIN` block runs before any input is read, and `exit` stops the program.

Find Long Lines

This program prints any line that has more than 80 characters:

length($0) > 80

Remember, if there's no action, AWK prints the whole line by default.
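To try this without typing an 80-character line, here is a sketch where the limit is passed in with AWK's `-v` option. The variable name `max` and the function name `longer_than` are made up.

```shell
# Print lines longer than the number given as the first argument.
longer_than() { awk -v max="$1" 'length($0) > max'; }

printf 'short\na much longer line of text\n' | longer_than 10
# a much longer line of text
```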

Count Words

This program counts the number of lines, words, and characters in your input, similar to the `wc` command in Unix:

{
    words += NF
    chars += length + 1 # add 1 for the newline character
}
END { print NR, words, chars }

The first part runs for every line. `words += NF` adds the number of fields (words) in the current line to the `words` total. `chars += length + 1` adds the length of the line plus one for the hidden newline character. The `END` block then prints the total lines (`NR`), words, and characters.
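Here is a sketch of running this program on two made-up lines of text. The function name `count` is invented for the example.

```shell
count() {
    awk '
        {
            words += NF
            chars += length + 1   # add 1 for the newline character
        }
        END { print NR, words, chars }
    '
}

printf 'hello world\nawk is fun\n' | count
# 2 5 23
```

The first line has 11 characters plus a newline, and the second has 10 plus a newline, which adds up to 23.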

Sum the Last Number on Each Line

If you have a list of numbers, this program adds up the last number on each line:

{ s += $NF }
END { print s + 0 }

`$NF` means the value of the last field on the current line. So, `s += $NF` adds that last number to a running total `s`. The `END` block prints the final total. Adding `+ 0` makes sure that if there was no input at all, it prints `0` instead of an empty line.
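The sketch below runs this program twice: once on a made-up shopping list and once on no input at all. The function name `sum_last` is invented for the example.

```shell
sum_last() { awk '{ s += $NF } END { print s + 0 }'; }

printf 'apples 3\npears 4 5\nplums 10\n' | sum_last   # 3 + 5 + 10 = 18
printf '' | sum_last                                   # no input: 0
```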

Print Specific Lines

This example prints lines 1, 2, 3, then skips line 4, then prints lines 5, 6, 7, and so on:

NR % 4 == 1, NR % 4 == 3 { printf "%6d  %s\n", NR, $0 }

`NR` is the current line number.
`%` is the modulo operator, which gives you the remainder after division.
`NR % 4 == 1` is true for lines 1, 5, 9, etc.
`NR % 4 == 3` is true for lines 3, 7, 11, etc.

The program starts printing at a line where the first condition (`NR % 4 == 1`) is true and keeps printing up to and including the next line where the second condition (`NR % 4 == 3`) is true. The `printf` function prints the line number, right-aligned in a space 6 characters wide, and then the line itself.

For example, if the input is:

Rome
Florence
Milan
Naples
Turin
Venice

The program prints:

     1  Rome
     2  Florence
     3  Milan
     5  Turin
     6  Venice

Printing Parts of a File

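You can use the same idea to print from the start of the input or all the way to the end. If the first condition is always true (for example `1`), printing starts at the first line. If the second condition is always false (for example `0`), printing never stops. For example, to print everything from a line that says "--cut here--" until the end of the file:

/^--cut here--$/, 0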
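Here is a sketch of this program running on a few made-up lines. The function name `after_cut` is invented for the example.

```shell
after_cut() { awk '/^--cut here--$/, 0'; }

printf 'notes\n--cut here--\nkeep 1\nkeep 2\n' | after_cut
# --cut here--
# keep 1
# keep 2
```

The marker line itself is printed too, because the range starts on the line that matches.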

Count Word Frequencies

This program counts how many times each word appears in your input:

BEGIN {
    FS="[^a-zA-Z]+"
}
{
    for (i=1; i<=NF; i++)
        if ($i != "")           # skip empty fields at the edges of a line
            words[tolower($i)]++
}
END {
    for (i in words)
        print i, words[i]
}

The `BEGIN` block sets the `FS` (Field Separator) to mean "any sequence of non-alphabetic characters." This helps AWK split the text into words.
The main part loops through each field (`$i`) on every line, converts it to lowercase (`tolower($i)`), and adds 1 to its count in an associative array called `words`.
The `END` block then goes through the `words` array and prints each word along with how many times it appeared. The words can come out in any order.
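The sketch below is a variation you can run in a terminal. It makes two small changes that are assumptions of this example: it skips empty fields (which appear when a line starts or ends with punctuation), and it pipes the result through `sort` so the words always come out in the same order. The function name `word_freq` and the sample sentences are made up.

```shell
word_freq() {
    awk '
        BEGIN { FS = "[^a-zA-Z]+" }
        {
            for (i = 1; i <= NF; i++)
                if ($i != "")              # skip empty fields at line edges
                    words[tolower($i)]++
        }
        END {
            for (w in words)
                print w, words[w]
        }
    ' | sort
}

printf 'The cat saw the dog.\nThe dog ran!\n' | word_freq
# cat 1
# dog 2
# ran 1
# saw 1
# the 3
```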

AWK Versions and Implementations

AWK was first released in 1977. Later, in 1985, its creators added new features, like the ability to define your own functions. This updated version was sometimes called "new awk" or nawk.

Here are some important versions of AWK:

BWK awk (also known as nawk): This is the version created by Brian Kernighan, one of the original authors. Many systems like Android, FreeBSD, and macOS use this version.
gawk (GNU awk): This is a very popular free version of AWK. It's often included with Linux systems. gawk has extra features, like tools for debugging and for working with international languages.
mawk: This is a very fast version of AWK.
awka: This tool can turn AWK scripts into C code, which makes them run even faster.