Monday, November 29, 2010

The Linux split command

Why use split? Sometimes you'll want to split a file simply because a command or script runs faster against smaller pieces. You may also need to move a file from one system to another, and breaking a 20 GB file into forty 500 MB pieces can make the data much easier to move around. You can also break a file into chunks, write them to CDs, and restore the original file when needed.
One option for dealing with huge files is to break them into more manageable chunks and move or process these chunks independently. The command for this is split, and it works with text or binary files. It can divide a file into chunks of a given number of lines or a given number of bytes.
The following will create a series of 200,000-line files, giving them the default names for split files – xaa, xab, xac and so on:
$ split -l 200000 largelogfile
The following example gives more meaningful names to each chunk and results in a series of files called log_aa, log_ab, log_ac and so on:
$ split -l 200000 largelogfile log_
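To see how the line count and the prefix interact, here is a minimal sketch on a small generated file. The names sample.log and part_ and the scratch directory are made up for illustration; they are not from the example above.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 1,000-line sample file (hypothetical name).
seq 1 1000 > sample.log

# Split into 300-line chunks named part_aa, part_ab, ...
split -l 300 sample.log part_

# Count the lines in each chunk; the last chunk holds the remainder.
wc -l part_*
```

With 1,000 lines and 300 lines per chunk, this should leave four files, the last one holding the remaining 100 lines.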
Splitting a binary file, such as a movie or an MP3, is just as easy as splitting a text file. The following command splits the WMV file movie1.wmv into a series of 10-kilobyte (10,240-byte) chunks and names them wmv_aa, wmv_ab, wmv_ac and so on:
$ split -b 10k movie1.wmv wmv_
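A quick way to check what the k suffix means is to split a file of known size and look at the byte counts. This is only a sketch: sample.bin and the bin_ prefix are hypothetical names, and a run of zero bytes stands in for a real binary file.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 25,000 zero bytes stand in for a binary file (hypothetical name).
head -c 25000 /dev/zero > sample.bin

# Split into 10k chunks; with split, k means 1,024 bytes.
split -b 10k sample.bin bin_

# Show the size of each chunk in bytes.
wc -c bin_*
```

The first two chunks should come out at 10,240 bytes each, with the remaining 4,520 bytes in the third. To get chunks of exactly 10,000 bytes, give the size as a plain number, for example -b 10000.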
Once you've split a file into smaller pieces, you can use the cat command to join the chunks back into the original text or binary file. Here is a quick example of breaking a file into smaller pieces and restoring the original. We start with this file, as listed by ls -l:
-rw-r--r--  1 root root 47704 May  8 15:11 design.jpg
Split it into 10,000-byte chunks:
$ split -b 10000 design.jpg logo_
Listing the directory now shows five chunks:
-rw-rw-r--  1 root root 10000 May  8 15:38 logo_aa
-rw-rw-r--  1 root root 10000 May  8 15:38 logo_ab
-rw-rw-r--  1 root root 10000 May  8 15:38 logo_ac
-rw-rw-r--  1 root root 10000 May  8 15:38 logo_ad
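-rw-rw-r--  1 root root  7704 May  8 15:38 logo_ae
To piece the file back together, use the cat command and a wildcard. The shell expands the wildcard in alphabetical order, so the chunks are joined in the right sequence.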
$ cat logo* > design_new.jpg
We now have two JPEG files, the original and the reconstituted file:
-rw-r--r--  1 root root 47704 May  8 15:11 design.jpg
-rw-rw-r--  1 root root 47704 May  8 15:43 design_new.jpg
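Matching sizes are a good sign, but they don't prove the contents are identical. One way to check is a byte-by-byte comparison with cmp. The sketch below runs the whole round trip on a generated file; original.bin, rebuilt.bin and the piece_ prefix are hypothetical names, and random bytes stand in for the JPEG.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 47,704 random bytes stand in for the image (hypothetical name).
head -c 47704 /dev/urandom > original.bin

# Split into 10,000-byte chunks, then join them back together.
split -b 10000 original.bin piece_
cat piece_* > rebuilt.bin

# cmp -s prints nothing and exits 0 only if the files match byte for byte.
if cmp -s original.bin rebuilt.bin; then
    echo "files are identical"
else
    echo "files differ"
fi
```

The same check on the files above would be cmp design.jpg design_new.jpg, which prints nothing when the two files match.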
Have fun.
