TrumanWong

awk

Programming language for text and data processing

Supplementary instructions

awk is a programming language for text and data processing on Linux/Unix. Input can come from standard input (stdin), one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful programming tool on Linux/Unix. It can be used directly on the command line, but is more often run as a script. awk offers many language features, such as arrays and functions, much like C. Flexibility is its biggest advantage.

awk command format and options

Syntax

awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)

Common command options

  • -F fs Specifies the input field separator fs, which can be a string or a regular expression, such as -F:. The default separator is any run of spaces or tabs.
  • -v var=value Assigns a user-defined variable, passing an external value into awk.
  • -f scriptfile Reads the awk commands from a script file.
  • -m[fr] val Sets an internal limit to val. The -mf option limits the maximum number of fields; the -mr option limits the maximum record size. Both are extensions from the Bell Labs version of awk and are not available in standard awk.
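
As a small illustration of combining these options, the sketch below splits colon-separated input with -F and passes a threshold in with -v. The sample records and the variable name min are made up for this example.

```shell
# Sample colon-separated records (hypothetical data): name:uid
records='root:0
daemon:1
alice:1000
bob:1001'

# -F: sets the field separator; -v min=1000 defines an awk variable
# before the program starts. Print the names whose uid is at least min.
result=$(printf '%s\n' "$records" | awk -F: -v min=1000 '$2 >= min { print $1 }')
echo "$result"
```

The same program could be stored in a file and run with -f instead of being quoted on the command line.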

awk patterns and actions

Awk scripts are composed of patterns and actions.

Pattern

A pattern can be any of the following:

  • /regular expression/: uses an extended set of metacharacters.
  • Relational expression: uses comparison operators to test strings or numbers.
  • Pattern matching expression: uses the operators ~ (match) and !~ (does not match).
  • BEGIN statement block, pattern statement block, END statement block: see "How awk works".

Action

An action consists of one or more commands, functions, and expressions, separated by newlines or semicolons and enclosed in curly braces. The main parts are:

  • Variable or array assignment
  • Output command
  • Built-in functions
  • Control flow statements
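
The pattern kinds and actions listed above can be sketched together in one small program. The input data and its three columns (item, quantity, status) are invented for this illustration.

```shell
# Hypothetical input: item, quantity, status
data='apple 3 ok
pear 12 fail
plum 7 ok
fig 20 ok'

result=$(printf '%s\n' "$data" | awk '
/^p/          { print "regex:", $1 }       # regular expression pattern
$2 > 10       { print "relational:", $1 }  # relational expression
$3 !~ /^ok$/  { print "no-match:", $1 }    # pattern matching with !~
')
echo "$result"
```

Every rule is tested against every input line, so a single line such as "pear 12 fail" can trigger several actions.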

Basic structure of awk script

awk 'BEGIN{ print "start" } pattern{ commands } END{ print "end" }' file

An awk script usually consists of three parts: a BEGIN statement block, a general statement block that can use pattern matching, and an END statement block. All three parts are optional, and any of them may be omitted. Scripts are usually enclosed in single quotes, for example:

awk 'BEGIN{ i=0 } { i++ } END{ print i }' filename

How awk works

awk 'BEGIN{ commands } pattern{ commands } END{ commands }'
  • Step 1: Execute the statements in the BEGIN{ commands } statement block.

  • Step 2: Read a line from the file or standard input (stdin) and execute the pattern{ commands } statement block. awk scans the input line by line, repeating this step from the first line to the last until the input is completely read.

  • Step 3: When the end of the input stream is reached, execute the END{ commands } statement block.

    The BEGIN statement block is executed before awk starts reading lines from the input stream. It is optional. Statements such as variable initialization and printing the header of an output table are usually written in the BEGIN statement block.

    The END statement block is executed after awk has read all the lines from the input stream. Summary work, such as printing the analysis results for all lines, is done in the END statement block. It is also optional.

    The commands in the pattern statement block are the most important part, and they are also optional. If a pattern is given without a statement block, { print } is executed by default, that is, each matching line is printed. The pattern statement block is executed for each line awk reads.

    Example

echo -e "A line 1\nA line 2" | awk 'BEGIN{ print "Start" } { print } END{ print "End" }'
Start
A line 1
A line 2
End

When print is used without arguments, it prints the current line. When the arguments to print are separated by commas, they are printed separated by a space (the value of OFS). Inside a print statement, placing strings and variables next to each other concatenates them, with double-quoted strings serving as the joining text, for example:

echo | awk '{ var1="v1"; var2="v2"; var3="v3"; print var1,var2,var3; }'
v1 v2 v3

Concatenate using double-quoted strings:

echo | awk '{ var1="v1"; var2="v2"; var3="v3"; print var1"="var2"="var3; }'
v1=v2=v3

{ } works like a loop body that iterates over each line in the file. Variable initialization statements (such as i=0) and statements that print a file header are usually placed in the BEGIN statement block, and statements that print results are placed in the END statement block.

awk built-in variables (predefined variables)

Note: [A], [N], [P], and [G] indicate the first tool to support each variable: [A]=awk, [N]=nawk, [P]=POSIX awk, [G]=gawk

  **$n** The nth field of the current record. For example, if n is 1, it means the first field, and if n is 2, it means the second field.
  **$0** This variable contains the text content of the current line during execution.
[N] **ARGC** Number of command line arguments.
[G] **ARGIND** The position of the current file in the command line (counting from 0).
[N] **ARGV** Array containing command line arguments.
[G] **CONVFMT** Number conversion format (default is %.6g).
[P] **ENVIRON** Associative array of environment variables.
[G] **ERRNO** Description of the last system error.
[G] **FIELDWIDTHS** List of field widths (space separated).
[A] **FILENAME** The name of the current input file.
[P] **FNR** Same as NR, but relative to the current file.
[A] **FS** Field separator (default is whitespace: any run of spaces or tabs).
[G] **IGNORECASE** If true, perform case-insensitive matching.
[A] **NF** Number of fields in the current record.
[A] **NR** Number of records read so far, which corresponds to the current line number during execution.
[A] **OFMT** Number output format (default value is %.6g).
[A] **OFS** Output field separator (default is a space).
[A] **ORS** Output record delimiter (default is a newline).
[A] **RS** Record separator (default is a newline character).
[N] **RSTART** The first position of the string matched by the match function.
[N] **RLENGTH** The length of the string matched by the match function.
[N] **SUBSEP** Array subscript separator (default value is \034).
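
The difference between NR and FNR in the list above only shows up with more than one input file. The following sketch assumes two throwaway files created with mktemp; the file names are arbitrary.

```shell
# Two small temporary input files (created only for this demo).
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
# NF is 1 here because every line holds a single field.
result=$(awk '{ print NR, FNR, NF, $0 }' "$tmpdir/one.txt" "$tmpdir/two.txt")
echo "$result"
rm -rf "$tmpdir"
```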

escape sequence

\\ the backslash itself
\$ a literal $
\t tab character
\b backspace character
\r carriage return character
\n newline character
\c cancel line breaks

Example

echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | awk '{print "Line No:"NR", No of fields:"NF, "$0="$0, "$1="$1, "$2="$2, "$3="$3}'
Line No:1, No of fields:3 $0=line1 f2 f3 $1=line1 $2=f2 $3=f3
Line No:2, No of fields:3 $0=line2 f4 f5 $1=line2 $2=f4 $3=f5
Line No:3, No of fields:3 $0=line3 f6 f7 $1=line3 $2=f6 $3=f7

Use print $NF to print the last field in a line, use $(NF-1) to print the second to last field, and so on:

echo -e "line1 f2 f3\n line2 f4 f5" | awk '{print $NF}'
f3
f5
echo -e "line1 f2 f3\n line2 f4 f5" | awk '{print $(NF-1)}'
f2
f4

Print the second and third fields of each line:

awk '{ print $2,$3 }' filename

Count the number of lines in the file:

awk 'END{ print NR }' filename

The above command uses only the END statement block. As awk reads each line, it updates NR to the corresponding line number. When it reaches the last line, NR holds that line's number, so NR in the END statement block is the number of lines in the file.

An example of accumulating the first field value in each row:

seq 5 | awk 'BEGIN{ sum=0; print "Sum:" } { print $1"+"; sum+=$1 } END{ print "equal to"; print sum }'
Sum:
1+
2+
3+
4+
5+
equal to
15

Pass the external variable value to awk

With the -v option it is possible to pass external values (not from stdin) to awk:

VAR=10000
echo | awk -v VARIABLE=$VAR '{ print VARIABLE }'

Another way to pass external variables:

var1="aaa"
var2="bbb"
echo | awk '{ print v1,v2 }' v1=$var1 v2=$var2

Use when input comes from a file:

awk '{ print v1,v2 }' v1=$var1 v2=$var2 filename

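With this method, the variable assignments are given as command-line arguments to awk, separated by spaces, and placed after the script containing the BEGIN, {} and END statement blocks. Variables assigned this way are not yet set when the BEGIN block runs.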
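
One practical difference between the two ways of passing values is when the assignment takes effect. The sketch below runs the same program both ways; the variable name x and the bracketed output format are chosen only for this illustration.

```shell
# The same program is used for both runs.
prog='BEGIN { print "BEGIN:[" x "]" } { print "main:[" x "]" }'

# -v assignments are made before the BEGIN block runs.
with_v=$(echo | awk -v x=1 "$prog")

# Assignments placed after the script are made later, when awk reaches
# them in its argument list, so x is still empty inside BEGIN.
with_arg=$(echo | awk "$prog" x=1)

echo "$with_v"
echo "$with_arg"
```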

Find process pid

netstat -antup | grep 7770 | awk '{ print $NF NR}' | awk '{ print $1}'

awk operation and judgment

As a programming language, awk supports a variety of operations, which are largely the same as those provided by C. awk also provides a set of built-in arithmetic functions (such as log, sqrt, cos, and sin) and functions for operating on strings (such as length and substr). These functions greatly extend what awk can compute. Relational tests, used in conditional branching, are a feature of every programming language, and awk is no exception. awk allows a variety of tests; for pattern matching it provides the operators ~ (match) and !~ (does not match). As an extension to testing, awk also supports logical operators.

Arithmetic operators

Operator Description
+ - Addition, subtraction
* / % Multiplication, division, and remainder
+ - ! Unary plus, unary minus, and logical negation
^ ** Exponentiation
++ -- Increment or decrement, as a prefix or suffix

example:

awk 'BEGIN{a="b";print a++,++a;}'
0 2

Note: Operands of arithmetic operators are automatically converted to numbers, and a string that does not begin with a number converts to 0.
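
The automatic conversion can be checked with a few string operands. This is a minimal sketch; the sample strings are arbitrary.

```shell
result=$(awk 'BEGIN {
    print "3abc" + 1    # the leading numeric prefix is used: 3 + 1
    print "abc" + 1     # no numeric prefix, so the string counts as 0
    print "10" + "5"    # both strings look like numbers
}')
echo "$result"
```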

Assignment operator

Operator Description
= += -= *= /= %= ^= **= Assignment statement

example:

a+=5 is equivalent to a=a+5; the other assignment operators work the same way.

Logical Operators

Operator Description
|| Logical OR
&& Logical AND

example:

awk 'BEGIN{a=1;b=2;print (a>5 && b<=2),(a>5 || b<=2);}'
0 1

Regular operators

Operator Description
~ !~ Matches a regular expression, does not match a regular expression
^ Beginning of line
$ End of line
. Any single character except a newline character
* Zero or more occurrences of the preceding character
.* Any sequence of characters
[] Any character in the character group
[^] Any character not in the character group
^[^] Lines starting with a character not in the character group
[a-z] Lowercase letters
[A-Z] Uppercase letters
[a-zA-Z] Lowercase and uppercase letters
[0-9] Digits
\< Beginning of a word. Words are generally delimited by spaces or special characters, and a continuous string is regarded as one word.
\> End of a word

Regular expressions must be enclosed in slashes: /regular expression/

example:

awk 'BEGIN{a="100testa";if(a ~ /^100*/){print "ok";}}'
ok

Relational operators

Operator Description
< <= > >= != == Relational operators

example:

awk 'BEGIN{a=11;if(a >= 9){print "ok";}}'
ok

Note: > and < can perform either a string comparison or a numeric comparison. If either operand is a string, the operands are compared as strings; only when both are numbers are they compared numerically. String comparison follows ASCII code order.
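
The comparison rules in the note can be observed directly. In this sketch the constants are arbitrary; the last command also shows that input fields which look like numbers are compared numerically.

```shell
result=$(awk 'BEGIN {
    print (10 < 9)          # numeric comparison: false
    print ("10" < "9")      # string comparison: "1" sorts before "9"
    print ("abc" < "abd")   # strings are compared character by character
}')
echo "$result"

# Fields read from input that look numeric are compared as numbers.
fields=$(echo "10 9" | awk '{ print ($1 < $2) }')
echo "$fields"
```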

Other operators

Operator Description
$ Field reference
Spaces String concatenation
?: C conditional expression
in Whether a key value exists in the array

example:

awk 'BEGIN{a="b";print a=="b"?"ok":"err";}'
ok
awk 'BEGIN{a="b";arr[0]="b";arr[1]="c";print (a in arr);}'
0
awk 'BEGIN{a="b";arr[0]="b";arr["b"]="c";print (a in arr);}'
1

Operation level priority table

The higher the level, the higher the priority.

awk advanced input and output

Read the next record

The next statement in awk: awk matches input line by line in its main loop. When next is encountered, the rest of the statements for the current line are skipped and awk moves on to match the next line. The next statement is often used when merging multiple lines:

cat text.txt
a
b
c
d
e

awk 'NR%2==1{next}{print NR,$0;}' text.txt
2 b
4 d

When the record number divided by 2 leaves a remainder of 1, the current line is skipped, and the print NR,$0 that follows is not executed either. Processing then starts on the next line, where NR%2 is evaluated again. When the record number is 2, the following statement block is executed: 'print NR,$0'

In the following file, the lines starting with "web" need to be skipped and their content then merged into one line with each of the lines that follow:

cat text.txt
web01[192.168.2.100]
httpd ok
tomcat ok
sendmail ok
web02[192.168.2.101]
httpd ok
postfix ok
web03[192.168.2.102]
mysqld ok
httpd ok

awk '/^web/{T=$0;next;}{print T":",$0;}' text.txt
web01[192.168.2.100]: httpd ok
web01[192.168.2.100]: tomcat ok
web01[192.168.2.100]: sendmail ok
web02[192.168.2.101]: httpd ok
web02[192.168.2.101]: postfix ok
web03[192.168.2.102]: mysqld ok
web03[192.168.2.102]: httpd ok

Simply read a record

awk getline usage: reading input from somewhere other than the main input requires the getline function. getline gets input from standard input, a pipe, or a file other than the one currently being processed. It reads the next line from the input and, depending on the form used, sets built-in variables such as NF, NR and FNR. The getline function returns 1 if a record is obtained, 0 if the end of the file is reached, and -1 if an error occurs, such as a failure to open the file.

getline syntax: getline var, where the variable var receives the content of the line read.

Overall, the usage of awk getline is as follows:

  • **When there is no redirection character | or < around it:** getline acts on the current file. It reads the next line of the current file into the following variable var, or into $0 if no variable is given. Note that because awk has already read a line before getline runs, the lines returned by getline are alternate lines.
  • **When there is a redirection character | or < beside it:** getline acts on the redirected input. Because that input has only just been opened and awk has not yet read a line from it, getline returns its first line rather than alternate lines.

Example:

Execute the Linux date command and pipe the output to getline, then assign the output to the custom variable out and print it:

awk 'BEGIN{ "date" | getline out; print out }' test

Execute the shell's date command and output it to getline through the pipe. Then getline reads from the pipe and assigns the input to out. The split function converts the variable out into the array mon, and then prints the second element of the array mon:

awk 'BEGIN{ "date" | getline out; split(out,mon); print mon[2] }' test

The output of the command ls is passed to getline as input, and the loop causes getline to read a line from the output of ls and print it to the screen. There is no input file here, because the BEGIN block is executed before opening the input file, so the input file can be ignored.

awk 'BEGIN{ while(("ls" | getline) > 0) print }'
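
The "alternate lines" behaviour of a getline without redirection can be seen by pairing up input lines. This is a minimal sketch with six numbered lines as made-up input; the variable name nxt is arbitrary.

```shell
# awk reads line 1 into $0, then getline reads line 2 into nxt;
# on the next cycle awk reads line 3, getline reads line 4, and so on.
result=$(printf '%s\n' 1 2 3 4 5 6 | awk '{
    if ((getline nxt) > 0)
        print $0, nxt
    else
        print $0, "(no following line)"
}')
echo "$result"
```

Checking the return value against 0 keeps the program from reusing a stale value of nxt when the input has an odd number of lines.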

Close file

awk allows you to close an input or output file from within a program by using the close statement.

close("filename")

filename can be a file opened by getline, given either as a literal name or as a variable containing the file name, or the exact command string used with getline. It can also be an output file, or the exact command string that output was piped to.
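
To show why the exact command string matters for close, the sketch below reads from the same command twice. The command "echo hello" and the variable names are placeholders for this example.

```shell
result=$(awk 'BEGIN {
    cmd = "echo hello"
    cmd | getline first            # starts the command and reads its only line
    r1 = (cmd | getline again)     # same stream, now at end of output: returns 0
    close(cmd)                     # close the pipe using the exact same string
    r2 = (cmd | getline second)    # the command is started afresh: returns 1
    close(cmd)
    print first, r1, r2, second
}')
echo "$result"
```

Without the first close, the second read would keep returning 0 because awk treats the string as a stream that is still open.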

Output to a file

Awk allows you to output results to a file in the following ways:

echo | awk '{printf("hello world!\n") > "datafile"}'
# or
echo | awk '{printf("hello world!\n") >> "datafile"}'

Set field delimiter

The default field delimiter is whitespace. Use -F "delimiter" to specify a delimiter explicitly:

awk -F: '{ print $NF }' /etc/passwd
# or
awk 'BEGIN{ FS=":" } { print $NF }' /etc/passwd

In the BEGIN statement block, you can use OFS="delimiter" to set the delimiter of the output field.
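
A short sketch of setting OFS follows; the input line is a made-up passwd-style record. Note that OFS only affects $0 after a field has been assigned, which is why the first command uses the $1 = $1 idiom.

```shell
# Assigning to a field forces awk to rebuild $0 using OFS.
rebuilt=$(echo "root:x:0:0" | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }')
echo "$rebuilt"

# OFS is also used between comma-separated print arguments.
picked=$(echo "root:x:0:0" | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }')
echo "$picked"
```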

Flow control statement

The while, do-while and for statements of awk allow break and continue to control the flow of a loop, and exit can be used to leave the program. break interrupts the currently executing loop and jumps to the first statement after it. if is used for conditional selection. awk's flow control statements and syntax are similar to those of C. With these statements, many shell programs can be handed over to awk, which is also very fast. The usage of each statement is described below.

Conditional judgment statement

if(expression)
   Statement 1
else
   Statement 2

Statement 1 in this format can be several statements. For readability, it is best to enclose multiple statements in {}. The awk branch structure allows nesting, in the following form:

if(expression)
   {statement 1}
else if(expression)
   {Statement 2}
else
   {Statement 3}

Example:

awk 'BEGIN{
test=100;
if(test>90){
   print "very good";
   }
   else if(test>60){
     print "good";
   }
   else{
     print "no pass";
   }
}'

very good

Each statement can end with a semicolon (;).

loop statement

# while statement

while(expression)
   {statement}

Example:

awk 'BEGIN{
test=100;
total=0;
while(i<=test){
   total+=i;
   i++;
}
print total;
}'
5050

# for loop

There are two formats for for loops:

Format 1:

for(variable in array)
   {statement}

Example:

awk 'BEGIN{
for(k in ENVIRON){
   print k"="ENVIRON[k];
}

}'
TERM=linux
G_BROKEN_FILENAMES=1
SHLVL=1
PWD=/root/text
...
LOGNAME=root
HOME=/root
SSH_CLIENT=192.168.1.21 53087 22

Note: ENVIRON is a built-in awk variable and a typical associative array.

Format 2:

for (variable; condition; expression)
   {statement}

Example:

awk 'BEGIN{
total=0;
for(i=0;i<=100;i++){
   total+=i;
}
print total;
}'
5050

# do loop

do
{statement} while(condition)

example:

awk 'BEGIN{
total=0;
i=0;
do {total+=i;i++;} while(i<=100)
   print total;
}'
5050

Other statements

  • break Causes the program loop to exit when the break statement is used in a while or for statement.
  • continue When the continue statement is used in a while or for statement, causes the program loop to move to the next iteration.
  • next causes the next input line to be read and returns to the top of the script. This avoids performing additional operations on the current input line.
  • The exit statement causes the main input loop to exit and transfers control to END, if END exists. If no END rule is defined, or an exit statement is applied in END, the execution of the script is terminated.
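
The statements above can be tried in a small sketch. The loop bounds and the input numbers are arbitrary values chosen for illustration.

```shell
# continue skips the rest of one iteration; break leaves the loop.
loop=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 7) break           # stop at the first odd number above 7
        out = out i " "
    }
    print out "done"
}')
echo "$loop"

# exit stops reading input but still runs the END block,
# so only the first two lines are summed here.
total=$(printf '%s\n' 1 2 3 4 5 | awk 'NR == 3 { exit } { s += $1 } END { print s }')
echo "$total"
```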

Array application

Arrays are the soul of awk, and array processing is an indispensable part of text processing. Because array indexes (subscripts) can be numbers or strings, arrays in awk are called associative arrays. Arrays in awk do not have to be declared in advance, nor does their size. Array elements are initialized to 0 or an empty string, depending on the context.

Definition of array

Numbers are used as array indexes (subscripts):

Array[1]="sun"
Array[2]="kai"

String as array index (subscript):

Array["first"]="www"
Array["last"]="name"
Array["birth"]="1987"

print Array[1] prints sun; print Array[2] prints kai; print Array["birth"] prints 1987.

Read the value of the array

{ for(item in array) {print array[item]}; } # the output order is unspecified
{ for(i=1;i<=len;i++) {print array[i]}; } # len is the length of the array
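
A common use of string subscripts is counting occurrences. The following sketch counts made-up one-word input lines; because the for...in order is unspecified, the output is piped through sort to make it stable.

```shell
# count[$1]++ creates the element on first use (starting from 0)
# and increments it on every later occurrence.
result=$(printf '%s\n' a b a c a b |
    awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' |
    sort)
echo "$result"
```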

Array related functions

Get the array length:

awk 'BEGIN{info="it is a test";lens=split(info,tA," ");print length(tA),lens;}'
4 4

length returns the length of a string or the number of elements in an array. split splits a string into an array and returns the number of elements produced.

awk 'BEGIN{info="it is a test";split(info,tA," ");print asort(tA);}'
4

asort sorts the array and returns the number of elements. It is a gawk extension.

**Output array content (unordered and ordered output):**

awk 'BEGIN{info="it is a test";split(info,tA," ");for(k in tA){print k,tA[k];}}'
4 test
1 it
2 is
3 a

Because awk arrays are associative, for...in visits the elements in no particular order. To process the elements in order, iterate over the subscripts instead.

awk 'BEGIN{info="it is a test";tlen=split(info,tA," ");for(k=1;k<=tlen;k++){print k,tA[k];}}'
1 it
2 is
3 a
4 test

Note: The subscripts of arrays created by split start from 1, which is different from C arrays.

Determine the existence of key value and delete key value:

# Wrong judgment method:
awk 'BEGIN{tB["a"]="a1";tB["b"]="b1";if(tB["c"]!="1"){print "no found";};for (k in tB){print k,tB[k];}}'
no found
a a1
b b1
c

A surprising thing happens here: tB["c"] was never defined, yet the loop shows that the key exists with an empty value. awk arrays are associative, and merely referencing a key creates the corresponding element automatically.

# Correct judgment method:
awk 'BEGIN{tB["a"]="a1";tB["b"]="b1";if( "c" in tB){print "ok";};for(k in tB){print k,tB[k];}}'
a a1
b b1

Use if(key in array) to test whether the array contains the key.

#Delete key value:
awk 'BEGIN{tB["a"]="a1";tB["b"]="b1";delete tB["a"];for(k in tB){print k,tB[k];} }'
b b1

delete array[key] removes the element stored under that key.

Use of two-dimensional and multi-dimensional arrays

Awk's multi-dimensional arrays are essentially one-dimensional arrays. More precisely, awk does not support multi-dimensional arrays in storage, but it provides an access method that logically simulates them. For example, an access like array[2,4]=1 is allowed. awk joins the subscripts with the special string SUBSEP (\034), so in this example the key actually stored in the associative array is 2\0344.

Similar to the member testing of one-dimensional arrays, multi-dimensional arrays can use the syntax if ((i,j) in array), but the subscripts must be placed in parentheses. Similar to the iteration of one-dimensional arrays, multi-dimensional arrays use syntax such as for (item in array) to traverse the array. Unlike one-dimensional arrays, multi-dimensional arrays must use the split() function to access the individual subscript components.

awk 'BEGIN{
for(i=1;i<=9;i++){
   for(j=1;j<=9;j++){
     tarr[i,j]=i*j; print i,"*",j,"=",tarr[i,j];
   }
}
}'
1 * 1 = 1
1 * 2 = 2
1 * 3 = 3
1 * 4 = 4
1 * 5 = 5
1 * 6 = 6
...
9 * 6 = 54
9 * 7 = 63
9 * 8 = 72
9 * 9 = 81

The contents of the array can be obtained through the array[k,k2] reference.

Another way:

awk 'BEGIN{
for(i=1;i<=9;i++){
   for(j=1;j<=9;j++){
     tarr[i,j]=i*j;
   }
}
for(m in tarr){
   split(m,tarr2,SUBSEP); print tarr2[1],"*",tarr2[2],"=",tarr[m];
}
}'
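
The membership test with parenthesized subscripts, mentioned earlier in this section, can be sketched as follows. The array name grid and its single element are made up for the example.

```shell
result=$(awk 'BEGIN {
    grid[1,2] = "x"
    # The subscripts must be parenthesized when testing membership.
    if ((1,2) in grid)    print "1,2 present"
    if (!((2,1) in grid)) print "2,1 absent"
    # Each stored key is one string; split on SUBSEP recovers the parts.
    for (key in grid) {
        n = split(key, idx, SUBSEP)
        print n, idx[1], idx[2]
    }
}')
echo "$result"
```

Unlike a plain reference such as grid[2,1], the in test does not create the element it asks about.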

Built-in functions

Awk's built-in functions fall into four main groups: arithmetic functions, string functions, other general functions, and time functions.

Arithmetic functions

Format Description
atan2( y, x ) Returns the arctangent of y/x.
cos( x ) Returns the cosine of x; x is in radians.
sin( x ) Returns the sine of x; x is in radians.
exp( x ) Returns e raised to the power x.
log( x ) Returns the natural logarithm of x.
sqrt( x ) Returns the square root of x.
int( x ) Returns the value of x truncated to an integer.
rand( ) Returns any number n where 0 <= n < 1.
srand( [expr] ) Sets the rand function's seed value to the value of the Expr argument, or a time of day if the Expr argument is omitted. Returns the previous seed value.

for example:

awk 'BEGIN{OFMT="%.3f";fs=sin(1);fe=exp(10);fl=log(10);fi=int(3.1415);print fs,fe,fl,fi;} '
0.841 22026.466 2.303 3

OFMT sets the output data format to retain 3 decimal places.

Get random numbers:

awk 'BEGIN{srand();fr=int(100*rand());print fr;}'
78
awk 'BEGIN{srand();fr=int(100*rand());print fr;}'
31
awk 'BEGIN{srand();fr=int(100*rand());print fr;}'
41

String functions

Format Description
gsub(Ere, Repl, [In]) Behaves exactly like the sub function, except that all occurrences of the regular expression are replaced.
sub(Ere, Repl, [In]) Replaces the first occurrence of the extended regular expression specified by the Ere parameter in the string specified by the In parameter with the string specified by the Repl parameter. The sub function returns the number of substitutions. Each & (ampersand) in the string specified by the Repl parameter is replaced by the text that matched the extended regular expression specified by the Ere parameter. If the In parameter is not specified, the default is the entire record ($0 record variable).
index(String1, String2) Returns the position, numbered starting from 1, at which the string specified by the String2 parameter first occurs in the string specified by the String1 parameter. If the String2 parameter does not occur within the String1 parameter, 0 (zero) is returned.
length [(String)] Returns the length (in characters) of the string specified by the String parameter. If no String argument is given, the length of the entire record ($0 record variable) is returned.
blength [(String)] Returns the length, in bytes, of the string specified by the String parameter. If no String argument is given, the length of the entire record ($0 record variable) is returned.
substr(String, M, [ N ] ) Returns a substring with the number of characters specified by the N argument. The substring is obtained from the string specified by the String parameter, whose characters begin at the position specified by the M parameter. The M parameter is specified with the first character in the String parameter as number 1. If the N parameter is not specified, the length of the substring will be the length from the position specified by the M parameter to the end of the String parameter.
match(String, Ere) Returns the position (in characters), numbered starting from 1, at which the extended regular expression specified by the Ere parameter occurs in the string specified by the String parameter, or 0 (zero) if it does not occur. The RSTART special variable is set to the return value. The RLENGTH special variable is set to the length of the matched string, or -1 (minus one) if no match is found.
split(String, A, [Ere]) Split the parameter specified by the String parameter into array elements A[1], A[2], . . ., A[n], and return the value of the n variable. This separation can be done by an extended regular expression specified by the Ere parameter, or by the current field separator (FS special variable) if no Ere parameter is given. Elements in the A array are created with string values unless the context indicates that a particular element should also have a numeric value.
tolower(String) Returns the string specified by the String parameter, with each uppercase character in the string changed to lowercase. The mapping between uppercase and lowercase is defined by the LC_CTYPE category of the current locale.
toupper( String ) Returns the string specified by the String parameter, with each lowercase character in the string changed to uppercase. The mapping between uppercase and lowercase is defined by the LC_CTYPE category of the current locale.
sprintf(Format, Expr, Expr, . . . ) Format the expression specified by the Expr parameter according to the printf subroutine format string specified by the Format parameter and return the resulting string.

Note: Ere can be a regular expression.
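
Several of the functions in the table work well together. The sketch below, which uses arbitrary sample strings, extracts a match with RSTART and RLENGTH and uses & in a replacement.

```shell
result=$(awk 'BEGIN {
    info = "this is a test2010test!"
    # match sets RSTART and RLENGTH, which substr can then use.
    if (match(info, /[0-9]+/))
        print RSTART, RLENGTH, substr(info, RSTART, RLENGTH)
    # In the replacement text, & stands for the matched text.
    s = "hello"
    n = sub(/l+/, "[&]", s)
    print n, s
    print index(info, "test"), length(info), toupper(substr(info, 1, 4))
}')
echo "$result"
```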

Replace text (using gsub and sub)

awk 'BEGIN{info="this is a test2010test!";gsub(/[0-9]+/,"!",info);print info}'
this is a test!test!

This finds the parts of info that match /[0-9]+/, replaces each with "!", and assigns the result back to info. If the third argument is not given, $0 is used by default.

Find a string (using index)

awk 'BEGIN{info="this is a test2010test!";print index(info,"test")?"ok":"no found";}'
ok

index returns 0 if the string is not found.

Regular expression matching (using match)

awk 'BEGIN{info="this is a test2010test!";print match(info,/[0-9]+/)?"ok":"no found";}'
ok

Extract a substring (using substr)

awk 'BEGIN{info="this is a test2010test!";print substr(info,4,10);}'
s is a tes

Starting from the 4th character, extract a substring of length 10.

Split a string (using split)

awk 'BEGIN{info="this is a test";split(info,tA," ");print length(tA);for(k in tA){print k,tA[k];}}'
4
4 test
1 this
2 is
3 a

split divides info and dynamically creates the array tA. Note that the awk for...in loop is unordered: it does not walk the subscripts from 1 to n, so take care when relying on the order.

Formatted string output (using sprintf)

Format string syntax:

The format string consists of two kinds of content: ordinary characters, which are output as they are, and format specifiers, which start with "%" followed by one or more characters that determine the format of the output.

Format Description
%d Signed decimal integer
%u Unsigned decimal integer
%f Floating point number
%s String
%c Single character
%p Pointer value
%e Floating point number in exponential form
%x %X Unsigned hexadecimal integer
%o Unsigned octal integer
%g Automatically selects the appropriate representation

awk 'BEGIN{n1=124.113;n2=-1.224;n3=1.2345; printf("%.2f,%.2u,%.2g,%X,%o\n",n1,n2,n3,n1,n1); }'
124.11,18446744073709551615,1.2,7C,174
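
Where printf writes the formatted text directly, sprintf returns it as a string that can be stored or processed further. A minimal sketch with arbitrary values:

```shell
result=$(awk 'BEGIN {
    # %05d pads with zeros, %-6s left-aligns in 6 columns,
    # %.2f keeps two decimals, %x prints hexadecimal.
    line = sprintf("%05d|%-6s|%.2f|%x", 42, "ab", 3.14159, 255)
    print line
}')
echo "$result"
```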

General functions

Format Description
close(Expression) Use the same Expression parameter with a string value to close a file or pipe opened by a print or printf statement or by calling the getline function. Returns 0 if the file or pipe was closed successfully; otherwise returns non-zero. The close statement is necessary if you intend to write to a file and later read it in the same program.
system(command) Execute the command specified by the Command parameter and return the exit status. Equivalent to system subroutine.
Expression | getline [ Variable ] Reads an input record from the piped stream of output from the command specified by the Expression parameter and assigns the record's value to the variable specified by the Variable parameter. Creates a stream that has the value of the Expression parameter as its command name if it is not currently open. The stream created is equivalent to calling the popen subroutine, in which case the Command parameter takes the value of the Expression parameter and the Mode parameter is set to a value of r. As long as the stream remains open and the Expression parameter evaluates to the same string, each subsequent call to the getline function reads another record. If the Variable parameter is not specified, the $0 record variable and the NF special variable are set to the records read from the stream.
getline [ Variable ] < Expression Reads the next record of the input from the file specified by the Expression parameter and sets the variable specified by the Variable parameter to the value of that record. As long as the stream remains open and the Expression parameter evaluates to the same string, each subsequent call to the getline function reads another record. If the Variable parameter is not specified, the $0 record variable and the NF special variable are set to the records read from the stream.
getline [ Variable ] Sets the variable specified by the Variable parameter to the next input record read from the current input file. If the Variable parameter is not specified, the $0 record variable is set to the value of that record, and the NF, NR, and FNR special variables are also set.

Open external file (close usage)

awk 'BEGIN{while(("cat /etc/passwd"|getline) > 0){print $0;};close("cat /etc/passwd");}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

Read external files line by line (getline usage method)

awk 'BEGIN{while((getline < "/etc/passwd") > 0){print $0;};close("/etc/passwd");}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
awk 'BEGIN{print "Enter your name:";getline name;print name;}'
Enter your name:
chengmo
chengmo

Calling external applications (system usage method)

awk 'BEGIN{b=system("ls -al");print b;}'
total 42092
drwxr-xr-x 14 chengmo chengmo 4096 09-30 17:47 .
drwxr-xr-x 95 root root 4096 10-08 14:01 ..

The return value b is the exit status of the command.
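
The exit status returned by system can be used like any other number. The sketch below uses the standard true and false utilities as stand-ins for real commands.

```shell
result=$(awk 'BEGIN {
    ok  = system("true")    # a command that succeeds
    bad = system("false")   # a command that fails
    print ok, bad
}')
echo "$result"
```

A zero status means success, so a test such as if (system(cmd) == 0) is the usual way to branch on the outcome.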

Time function

Function Description
mktime("YYYY MM DD HH MM SS [DST]") Converts a date specification string into a timestamp.
strftime([format [, timestamp]]) Formats a timestamp as a time string; for the format specifiers, see the table below.
systime() Returns the current timestamp: the number of whole seconds from January 1, 1970 to the current time (not counting leap seconds).

Create a specified time (used by mktime)

awk 'BEGIN{tstamp=mktime("2001 01 01 12 12 12");print strftime("%c",tstamp);}'
Monday, January 1, 2001 12:12:12
awk 'BEGIN{tstamp1=mktime("2001 01 01 12 12 12");tstamp2=mktime("2001 02 01 0 0 0");print tstamp2-tstamp1;}'
2634468

Find the difference between a given time and the current time (using systime)

awk 'BEGIN{tstamp1=mktime("2001 01 01 12 12 12");tstamp2=systime();print tstamp2-tstamp1;}'
308201392

strftime date and time format specifier

Format Description
%a Abbreviation of day of the week (Sun)
%A The complete writing of the day of the week (Sunday)
%b Abbreviation of month name (Oct)
%B The full spelling of the month name (October)
%c local date and time
%d Day of the month in decimal
%D Date 08/20/99
%e Day of the month; a single digit is padded with a space
%H Hour in decimal 24-hour format
%I Hour in decimal 12-hour format
%j Day of the year from January 1st
%m month in decimal
%M Minutes in decimal
%p 12-hour notation (AM/PM)
%S Seconds in decimal
%U The number of the week in the year in decimal notation (Sunday is the beginning of the week)
%w Day of the week in decimal (Sunday is 0)
%W The number of the week in the year in decimal notation (Monday is the beginning of the week)
%x Local date representation (08/20/99)
%X Local time representation (12:00:00)
%y Two-digit year (99)
%Y Four-digit year
%% Percent sign (%)