Bash

Splitting Files

Introduction

Sometimes it’s useful to split a file into multiple separate files. If you have a large file, it might be a good idea to break it into smaller chunks.

Split a file

Running the split command without any options will split a file into 1 or more separate files containing up to 1000 lines each.

split file

This will create files named xaa, xab, xac, etc., each containing up to 1000 lines. As you can see, all of them are prefixed with the letter x by default. If the initial file has 1000 lines or fewer, only one such file is created.
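
To see the default behaviour on a file of known size, something like the following can be run in a scratch directory. It is only a sketch: it assumes seq and mktemp are available, and numbers.txt is a made-up file name.

```shell
# Work in a throwaway directory so the output files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-line input file (numbers.txt is a hypothetical name).
seq 1 2500 > numbers.txt

# No options: chunks of up to 1000 lines, named xaa, xab, xac, ...
split numbers.txt

# Show how many lines ended up in each chunk.
wc -l x??
```

With 2500 input lines this should produce three chunks holding 1000, 1000 and 500 lines.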

To change the prefix, add your desired prefix as the last argument on the command line:

split file customprefix

Now files named customprefixaa, customprefixab, customprefixac, etc. will be created.
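
Because the suffix is appended directly to the prefix, ending the prefix with a separator such as a dot tends to give more readable names. A small sketch, again with a made-up input file and prefix:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 2500-line input file.
seq 1 2500 > numbers.txt

# The prefix is the last argument; the two-letter suffix is appended to it as-is.
split numbers.txt part.

# Lists part.aa, part.ab and part.ac.
ls part.*
```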

To specify the number of lines to output per file, use the -l option. The following will split a file into pieces of at most 5000 lines each:

split -l5000 file

OR

split --lines=5000 file
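
One way to check where the chunk boundaries fall is to split a file of consecutive numbers and look at the first line of each piece. This is a sketch with a hypothetical input file; it uses the short -l form, which is also accepted by non-GNU versions of split.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 12000 numbered lines, so each line's content is its own line number.
seq 1 12000 > numbers.txt

# At most 5000 lines per output file: 5000 + 5000 + 2000.
split -l 5000 numbers.txt

# Print the line count and the first line of every chunk.
for chunk in x??; do
    printf '%s: %s lines, starts at line %s\n' \
        "$chunk" "$(wc -l < "$chunk")" "$(head -n 1 "$chunk")"
done
```

The chunks should start at input lines 1, 5001 and 10001.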

Alternatively, you can specify a maximum number of bytes per file instead of lines. This is done by using the -b or --bytes option. For example, to allow a maximum of 1MB per file (with GNU split, MB means 1000*1000 bytes, while M means 1024*1024 bytes):

split --bytes=1MB file
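
The following sketch splits by size and then checks that the pieces can be put back together. It assumes GNU split (the --bytes long option and the MB suffix are GNU extensions) and a head that supports -c; blob.bin and the chunk. prefix are made-up names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 2,500,000 zero bytes as a stand-in for a large binary file.
head -c 2500000 /dev/zero > blob.bin

# GNU split: MB = 1000*1000 bytes, so this yields 1000000 + 1000000 + 500000 bytes.
split --bytes=1MB blob.bin chunk.

# Show the size of each chunk in bytes.
wc -c chunk.*

# The glob expands in alphabetical order, which matches the order of the chunks,
# so concatenating them restores the original file.
cat chunk.* | cmp - blob.bin && echo "chunks reassemble to the original"
```

Splitting by bytes can cut a line in half, so this mode suits binary data or transfer-size limits better than line-oriented text.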

We can also use sed with its w command to split a file into multiple files. The lines to write can be selected by line address or by pattern.

Suppose we have this source file that we would like to split:

cat -n sourcefile

1 On the Ning Nang Nong
2 Where the Cows go Bong!
3 and the monkeys all say BOO!
4 There’s a Nong Nang Ning
5 Where the trees go Ping!
6 And the tea pots jibber jabber joo.
7 On the Nong Ning Nang

Command to split the file by line number:

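sed '1,3w f1
4,7w f2' sourcefile

This writes lines 1 to 3 of sourcefile into file f1 and lines 4 to 7 into file f2. The line break inside the quotes is required, because the file name given to w extends to the end of the line. sed also prints every input line to standard output unless it is run with -n.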

cat -n f1

1 On the Ning Nang Nong
2 Where the Cows go Bong!
3 and the monkeys all say BOO!

cat -n f2 

1 There’s a Nong Nang Ning
2 Where the trees go Ping!
3 And the tea pots jibber jabber joo.
4 On the Nong Ning Nang
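
The same split can be written on one line by passing each w command in its own -e expression. The sketch below recreates the sample file (with a plain ASCII apostrophe) in a scratch directory, uses -n to suppress sed's normal output, and uses the $ address so the second range runs to the last line regardless of the file's length.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample input.
cat > sourcefile <<'EOF'
On the Ning Nang Nong
Where the Cows go Bong!
and the monkeys all say BOO!
There's a Nong Nang Ning
Where the trees go Ping!
And the tea pots jibber jabber joo.
On the Nong Ning Nang
EOF

# -n stops sed from echoing every line to stdout.
# Each -e carries one w command, so no embedded newline is needed.
# $ is the address of the last input line.
sed -n -e '1,3w f1' -e '4,$w f2' sourcefile

# f1 should hold 3 lines and f2 the remaining 4.
wc -l f1 f2
```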

Command to split the file by context/pattern:

sed '/Ning/w file1
/Ping/w file2' sourcefile

This splits the sourcefile into file1 and file2. file1 contains all lines that match Ning, file2 contains lines that match Ping.

cat file1

On the Ning Nang Nong
There’s a Nong Nang Ning
On the Nong Ning Nang

cat file2

Where the trees go Ping!
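
Note that each pattern acts as an independent filter: lines that match neither pattern are written nowhere, and a line matching both would be written to both files. If the unmatched lines are needed as well, one possible approach is to add a negated block, as in this sketch (the output name rest is made up for the illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample input.
cat > sourcefile <<'EOF'
On the Ning Nang Nong
Where the Cows go Bong!
and the monkeys all say BOO!
There's a Nong Nang Ning
Where the trees go Ping!
And the tea pots jibber jabber joo.
On the Nong Ning Nang
EOF

# The first two expressions do the pattern split.
# The last three form a block: lines that match neither Ning nor Ping go to rest.
# The block is spread over separate -e expressions because the file name
# given to w runs to the end of its expression.
sed -n -e '/Ning/w file1' \
       -e '/Ping/w file2' \
       -e '/Ning/!{' -e '/Ping/!w rest' -e '}' sourcefile

# file1 gets 3 lines, file2 gets 1, rest gets the other 3.
wc -l file1 file2 rest
```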


This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0. This website is not affiliated with Stack Overflow.