TrumanWong

csplit

Split a large file into smaller fragments

Description

The csplit command splits a file into fragments and saves each fragment in its own file, named xx00, xx01, and so on by default. It is a variant of split: split can only divide a file by size or by number of lines, whereas csplit divides it by context, that is, at given line numbers or at lines matching a pattern in the file itself.
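The difference between the two commands can be sketched as follows. This is a minimal illustration, assuming GNU coreutils (the {*} repeat count is a GNU extension); the file name notes.txt and the prefixes bylines_ and bycontent_ are made up for the example.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file with two "== X ==" section headers.
printf '%s\n' 'intro' '== A ==' 'a1' 'a2' 'a3' '== B ==' 'b1' > notes.txt

# split cuts purely by count: a new file every 3 lines, whatever the content.
split -l 3 notes.txt bylines_

# csplit cuts by content: a new file starts at each line beginning with "==".
csplit -s -f bycontent_ notes.txt '/^==/' '{*}'

ls
```

With split, a section header can land in the middle of a piece; with csplit, every piece after the first begins at a header line.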

Syntax

csplit [options] file pattern...

Options

-b <output format> or --suffix-format=<output format>: Use a printf-style <output format> for the suffix instead of the default %02d, which yields names such as xx00, xx01, etc.;
-f <output prefix string> or --prefix=<output prefix string>: Set the prefix of the output file names. The default prefix is xx, giving xx00, xx01, etc. If the prefix is "hello", the output files are named hello00, hello01, etc.;
-k or --keep-files: Keep the output files that have already been written even if an error occurs or execution is interrupted;
-n <number of digits> or --digits=<number of digits>: Set the number of digits in the numeric suffix. The default is 2, giving xx00, xx01, etc. If the number is "3", the output files are named xx000, xx001, etc.;
-q or -s or --quiet or --silent: Do not print the size of each output file;
-z or --elide-empty-files: Do not create output files with a length of 0 bytes.
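The naming options can be tried out as below. This is a sketch assuming GNU coreutils (the -b option is not available in every csplit implementation); sample.txt and the prefixes part and chunk- are hypothetical names chosen for the illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A 6-line sample file.
printf '%s\n' one two three four five six > sample.txt

# Default naming: split before line 4, giving xx00 (lines 1-3) and xx01 (lines 4-6).
csplit -s sample.txt 4

# -f changes the prefix and -n the number of digits: part000, part001.
csplit -s -f part -n 3 sample.txt 4

# -b supplies a printf-style suffix: chunk-0.txt, chunk-1.txt.
csplit -s -f chunk- -b '%d.txt' sample.txt 4

ls
```

Without -s, each run would also print the size in bytes of every file it writes.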

Parameters

  • File: Specify the original file to be split;
  • Pattern: Specify one or more patterns (a line number or a /regular expression/) that mark where the file is split.
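The common pattern forms are sketched below: a line number, /regexp/, /regexp/ with an offset, %regexp% (skip without writing a file), and a {integer} repeat count. The file data.txt and all prefixes are hypothetical; the behaviour described in the comments is that of GNU csplit.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Seven lines: a header followed by three BEGIN-delimited records.
printf '%s\n' header BEGIN a BEGIN b BEGIN c > data.txt

# Line number: num_00 gets lines 1-2, num_01 gets the rest.
csplit -s -f num_ data.txt 3

# /regexp/: re_00 gets everything before the first BEGIN line.
csplit -s -f re_ data.txt '/BEGIN/'

# /regexp/+1: cut one line after the match, so off_00 gets lines 1-2.
csplit -s -f off_ data.txt '/BEGIN/+1'

# %regexp%: skip up to the match without writing a file for the skipped part.
csplit -s -f skip_ data.txt '%BEGIN%'

# {1}: apply the preceding pattern one additional time (two cuts in total).
csplit -s -f rep_ data.txt '/BEGIN/' '{1}'

ls
```

Quoting the patterns keeps the shell from interpreting characters such as * or % before csplit sees them.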

Example

Sample test file server.log

cat server.log
SERVER-1
[con] 10.10.10.1 suc
[con] 10.10.10.2 fai
[dis] 10.10.10.3 pen
[con] 10.10.10.4 suc
SERVER-2
[con] 10.10.10.5 suc
[con] 10.10.10.6 fai
[dis] 10.10.10.7 pen
[con] 10.10.10.8 suc
SERVER-3
[con] 10.10.10.9 suc
[con] 10.10.10.10 fai
[dis] 10.10.10.11 pen
[con] 10.10.10.12 suc

The file server.log needs to be split into server01.log, server02.log, and server03.log, each containing one SERVER section of the original file:

[root@localhost split]# csplit server.log /SERVER/ -n2 -s '{*}' -f server -b "%02d.log"; rm server00.log
[root@localhost split]# ls
server01.log server02.log server03.log server.log

Command details:

/[Regular expression]/ #Matches a text pattern, such as /SERVER/. Each piece runs from the current line up to, but not including, the next line that matches SERVER.
{*} #Repeats the preceding pattern as many times as possible, until the end of the file. Use the form {integer} to repeat it a fixed number of additional times instead.
-s #Silent mode; the sizes of the output files are not printed.
-n #Specifies the number of digits in the output file name suffix, for example 01, 02, 03. It is redundant here, because -n is ignored when -b is given.
-f #Specifies the output file name prefix.
-b #Specifies the suffix format. For example, %02d.log follows the printf format syntax of the C language.
rm server00.log #Deletes the first output file, which is empty because the first line of the file already matches the pattern.
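For reference, the example can be reproduced end to end as sketched below, together with a variant that uses -z instead of rm. This assumes GNU coreutils; the log is shortened, and the prefix srv is a hypothetical name used only to keep the two runs apart. With -z the empty first piece is never created, but the numbering still starts at 00, so the three sections land in srv00.log to srv02.log rather than 01 to 03.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened copy of the sample server.log.
cat > server.log <<'EOF'
SERVER-1
[con] 10.10.10.1 suc
[con] 10.10.10.2 fai
SERVER-2
[con] 10.10.10.5 suc
SERVER-3
[con] 10.10.10.9 suc
EOF

# Variant A: as in the example above; the empty server00.log is removed afterwards.
csplit -s -f server -b '%02d.log' server.log /SERVER/ '{*}'
rm server00.log

# Variant B: -z suppresses the empty piece; output is srv00.log .. srv02.log.
csplit -s -z -f srv -b '%02d.log' server.log /SERVER/ '{*}'

ls
```

Variant A keeps the file numbers aligned with the server numbers; variant B avoids the extra rm at the cost of numbering that starts at 00.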