TrumanWong

sort

Sort all lines in a text file.

Summary

sort [OPTION]... [FILE]...
sort [OPTION]... --files0-from=F

The main purpose

  • Sort the contents of all input files and output them.
  • When there is no file or the file is -, read standard input.

Options

Sorting options:

-b, --ignore-leading-blanks Ignore leading blanks.
-d, --dictionary-order Consider only blanks, letters, and digits.
-f, --ignore-case Treat lowercase letters as uppercase letters.
-g, --general-numeric-sort Compare by general numeric value (also understands forms such as 1e3).
-i, --ignore-nonprinting Consider only printable characters.
-M, --month-sort Compare as months: (unknown) < JAN < ... < DEC.
-h, --human-numeric-sort Compare human-readable numbers with unit suffixes (note the uppercase letters, for example: 2K 1G).
-n, --numeric-sort Compare by the numeric value at the start of the string.
-R, --random-sort Shuffle, but group identical keys.
--random-source=FILE Get random bytes from FILE.
-r, --reverse Sort results in reverse order.
--sort=WORD Sort according to WORD: general-numeric is equivalent to -g, human-numeric to -h, month to -M, numeric to -n, random to -R, and version to -V.
-V, --version-sort Natural sorting of (version) numbers in text.
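
The difference between the numeric-style options is easiest to see side by side. The following is a minimal sketch, assuming GNU sort and the C locale; the sample values and variable names are made up for illustration.

```shell
# Illustrative sample data: sizes with unit suffixes and one plain number.
sizes=$(printf '%s\n' 2K 1G 512 10M)

# -n reads only the leading digits, so the unit suffix is ignored.
numeric=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)

# -h takes the unit suffix into account.
human=$(printf '%s\n' "$sizes" | LC_ALL=C sort -h)

# -g understands exponent notation, which -n does not.
general=$(printf '%s\n' 1e3 20 5 | LC_ALL=C sort -g)

# -V orders the numbers embedded in version strings naturally.
versions=$(printf '%s\n' v1.10 v1.9 v1.2 | LC_ALL=C sort -V)

printf '%s\n' "-n:" "$numeric" "-h:" "$human" "-g:" "$general" "-V:" "$versions"
```

With -n the line 512 sorts last because the suffixes are ignored; with -h it sorts first because it has no suffix at all.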

Other options:

--batch-size=NMERGE Merge up to NMERGE inputs at a time; use temporary files for any excess.
-c, --check, --check=diagnose-first Check whether the input is sorted; do not sort.
-C, --check=quiet, --check=silent Like -c, but do not report the first unsorted line.
--compress-program=PROG Use PROG to compress temporary files; use PROG -d to decompress.
--debug Annotate the part of each line used for sorting, and send warnings about questionable usage to stderr.
--files0-from=F Read all NUL-terminated file names from file F; if F is - , then read the names from standard input.
-k, --key=KEYDEF Sort by key; KEYDEF gives position and type.
-m, --merge Merge files that are already sorted; do not sort.
-o, --output=FILE Write results to FILE instead of standard output.
-s, --stable Make the sort stable by disabling the last-resort comparison.
-S, --buffer-size=SIZE Use SIZE as the memory buffer size.
-t, --field-separator=SEP Use SEP as the column separator.
-T, --temporary-directory=DIR Use DIR as the temporary directory instead of $TMPDIR or /tmp; use this option multiple times to specify multiple temporary directories.
--parallel=N Change the number of sorts run concurrently to N.
-u, --unique With -c, check for strict ordering; without -c, output only the first of each run of equal lines.
-z, --zero-terminated Use NUL, not newline, as the line terminator.
--help Display help information and exit.
--version Display version information and exit.
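
Several of these options are typically combined. The following is a rough sketch, assuming GNU sort and GNU find; the scratch directory and file names are hypothetical.

```shell
tmp=$(mktemp -d)                       # scratch directory for the demo
printf '%s\n' b a c > "$tmp/one.txt"
printf '%s\n' d b e > "$tmp/two.txt"

# -o may name one of the input files: sort reads all input before writing.
LC_ALL=C sort -o "$tmp/one.txt" "$tmp/one.txt"
LC_ALL=C sort -o "$tmp/two.txt" "$tmp/two.txt"

# -m merges inputs that are already sorted; -u drops the duplicate "b".
merged=$(LC_ALL=C sort -m -u "$tmp/one.txt" "$tmp/two.txt")

# --files0-from=- reads NUL-terminated file names from standard input.
from_list=$(find "$tmp" -name '*.txt' -print0 | LC_ALL=C sort -u --files0-from=-)

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Both variables should hold the same five lines, a through e.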


The format of KEYDEF is F[.C][OPTS][,F[.C][OPTS]], giving the start and stop positions of the key.
F is a field (column) number and C is a character position within that field; both count from 1.
If the stop position is omitted, the key extends to the end of the line.
OPTS is one or more of the characters [bdfgiMhnRrV], which override the global ordering options for that key.
Use the --debug option to diagnose incorrect usage.


SIZE may be followed by one of these multiplicative suffixes:
% 1% of memory;
b 1;
K 1024 (default);
and so on for M, G, T, P, E, Z, Y.
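
As a small illustration of the SIZE syntax, the sketch below passes two different buffer sizes to -S. It assumes GNU sort and seq; the tiny input only stands in for a file large enough for the buffer size to matter.

```shell
# A fixed 1 MiB buffer.
fixed=$(seq 5 -1 1 | LC_ALL=C sort -n -S 1M)

# A buffer of 10% of physical memory.
percent=$(seq 5 -1 1 | LC_ALL=C sort -n -S 10%)

printf '%s\n' "$fixed"
```

The buffer size changes only how much memory sort uses before spilling to temporary files, not the result.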

Parameters

FILE (optional): The files to process; any number of files may be given.

Return value

An exit status of 0 indicates success; a non-zero status indicates failure. With -c or -C, status 1 means the input is not sorted; status 2 means an error occurred.
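
The exit status is what makes -c and -C useful in scripts. The following is a minimal sketch, assuming GNU sort; the path /nonexistent/input.txt is a made-up name chosen so that opening it fails.

```shell
# Each status starts at 0 and is overwritten only if sort fails.
sorted_status=0
printf '%s\n' a b c | LC_ALL=C sort -c || sorted_status=$?

# -C stays quiet about the first out-of-order line.
unsorted_status=0
printf '%s\n' b a c | LC_ALL=C sort -C || unsorted_status=$?

# A file that cannot be opened is an error, not an "unsorted" result.
error_status=0
LC_ALL=C sort /nonexistent/input.txt 2>/dev/null || error_status=$?

echo "$sorted_status $unsorted_status $error_status"
```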

Examples

sort treats each line of the file/text as one unit. Lines are compared character by character, starting from the first character, and are output in ascending order. In the C locale the comparison uses ASCII code values; in other locales it follows the locale's collation rules.

[root@mail text]# cat sort.txt
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.5

[root@mail text]# sort sort.txt
aaa:10:1.1
bbb:20:2.2
ccc:30:3.3
ddd:40:4.4
eee:50:5.5
eee:50:5.5

To drop duplicate lines, use the -u option or uniq. Note that uniq removes only adjacent duplicates and does not sort:

[root@mail text]# cat sort.txt
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.5

[root@mail text]# sort -u sort.txt
aaa:10:1.1
bbb:20:2.2
ccc:30:3.3
ddd:40:4.4
eee:50:5.5

[root@mail text]# uniq sort.txt
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
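
Because uniq compares only neighboring lines, it is usually placed after sort. A short sketch with made-up input in which the duplicates are not adjacent:

```shell
lines=$(printf '%s\n' eee aaa eee bbb)

# uniq alone changes nothing here: the two "eee" lines are not adjacent.
uniq_only=$(printf '%s\n' "$lines" | uniq)

# Sorting first makes the duplicates adjacent, so uniq can drop one.
sort_uniq=$(printf '%s\n' "$lines" | LC_ALL=C sort | uniq)

# sort -u gives the same result in one step.
sort_u=$(printf '%s\n' "$lines" | LC_ALL=C sort -u)

printf '%s\n' "$sort_u"
```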

Using the -n, -r, -k, and -t options of sort:

[root@mail text]# cat sort.txt
AAA:BB:CC
aaa:30:1.6
ccc:50:3.3
ddd:20:4.2
bbb:10:2.5
eee:40:5.4
eee:60:5.1

# Sort by column BB numerically, in ascending order (the header line comes first because its non-numeric BB field counts as 0):
[root@mail text]# sort -nk 2 -t: sort.txt
AAA:BB:CC
bbb:10:2.5
ddd:20:4.2
aaa:30:1.6
eee:40:5.4
ccc:50:3.3
eee:60:5.1

# Sort by column CC numerically, in descending order:
# -n sorts by numeric value, -r reverses the order, -k selects the column to sort on, and -t sets the column separator to a colon
[root@mail text]# sort -nrk 3 -t: sort.txt
eee:40:5.4
eee:60:5.1
ddd:20:4.2
ccc:50:3.3
bbb:10:2.5
aaa:30:1.6
AAA:BB:CC
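
In both outputs above, the header line AAA:BB:CC is sorted together with the data. One common way to keep it in place is to print the first line separately and sort only the rest. The sketch below assumes the header is exactly one line; the temporary file is created only for the demonstration.

```shell
data=$(mktemp)                         # scratch copy of the sample data
printf '%s\n' 'AAA:BB:CC' 'aaa:30:1.6' 'ccc:50:3.3' 'bbb:10:2.5' > "$data"

# Print line 1 unchanged, then sort lines 2 onward by column CC, descending.
with_header=$(
  head -n 1 "$data"
  tail -n +2 "$data" | LC_ALL=C sort -t: -k3,3nr
)

printf '%s\n' "$with_header"
rm -f "$data"
```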

In-depth explanation and examples of the -k option:

FStart.CStart Modifier , FEnd.CEnd Modifier
-------- Start ------- , ------- End ------

The comma divides this syntax into two parts, the Start part and the End part. Each part has three components. Modifier is the set of option letters described earlier, such as n and r. In the Start part, FStart is the field to use and CStart is the position within that field of the first character to sort on. .CStart can be omitted, in which case the key starts at the beginning of the field. Similarly, the End part can set FEnd.CEnd. If .CEnd is omitted or set to 0, the key ends at the last character of that field.
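
The effect of the character positions can be sketched with a small made-up data set, assuming GNU sort and the C locale. The second field holds a letter followed by a digit, so sorting on the whole field and sorting on its second character alone give different orders.

```shell
rows=$(printf '%s\n' 'x:b2:9' 'y:a3:1' 'z:c1:5')

# Key is the whole second field: a3 < b2 < c1.
by_field=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k2,2)

# Key is only character 2 of field 2 (the digit), compared numerically.
by_digit=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k2.2,2.2n)

printf '%s\n' "$by_field" "--" "$by_digit"
```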

Example: Sort from the second letter of the company’s English name:

$ sort -t ' ' -k 1.2 facebook.txt
Baidu 100 5000
Sohu 100 4500
google 110 5000
Guge 50 3000

Interpretation: -k 1.2 starts the key at the second character of the first field. Because no End part is given, the key runs to the end of the line. Baidu is at the top of the list because its second letter is a. The second letters of Sohu and google are both o, but the h of Sohu comes before the second o of google, so they rank second and third respectively. Guge can only rank fourth.

Example: Sort by only the second letter of the company's English name. If the letters are the same, sort by employee salary in descending order:

$ sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt
Baidu 100 5000
google 110 5000
Sohu 100 4500
Guge 50 3000

Interpretation: To compare only the second letter, we use -k 1.2,1.2. Plain -k 1.2 would not work, because omitting the End part extends the key from the second letter to the end of the line. For the salary we use -k 3,3, which restricts the key to the third field alone; if the trailing 3 were omitted, the key would run from the start of the third field to the end of the line. The n and r modifiers are attached to the key itself (3,3nr). Written as separate options (-nrk 3,3), -n and -r become global options and are also inherited by the first key, so the second letters would no longer be compared alphabetically.
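
Where the n and r modifiers are placed changes the result. The following sketch, assuming GNU sort and the C locale, runs both spellings on a small made-up data set. Modifiers attached to a key apply to that key only, while -n and -r given as separate options are global and are inherited by every key that has no modifiers of its own.

```shell
rows=$(printf '%s\n' 'xa 10' 'yo 30' 'zo 20' 'wu 40')

# Modifiers attached to the second key: the first key stays alphabetical.
per_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 2,2nr)

# Global -n and -r: the first key is also compared numerically, so every
# letter counts as 0 and only the second field decides the order.
global=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 1.2,1.2 -nrk 2,2)

printf '%s\n' "$per_key" "--" "$global"
```

Running either command with --debug shows which part of each line is used for each key.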

Notice

  1. About the difference between -g and -n options: stackoverflow

  2. This is a complex command. To learn it well, read the info documentation and consult blogs, Q&A websites, and similar resources.

  3. This command is part of the GNU coreutils package. For related help information, see man -s 1 sort and info coreutils 'sort invocation'.