Practical File Splitting for Large Files in Linux

On Linux, large files often need to be transferred in smaller pieces. The split command divides a file into a specified number of chunks, and the cat command merges them back together. The process is simple and reliable. The specific steps are as follows:

1. Splitting the File into 10 Chunks

Use the split command with the -n option, which sets the number of chunks (10 here). The syntax is as follows:

split -n 10 source_file prefix_of_split_files

Example:

To split archive.tar.gz into chunks whose names begin with package_, run:

split -n 10 archive.tar.gz package_

Note:

  • split -n 10: Divides the file into 10 chunks of nearly equal size. If the file size is not a multiple of 10, the chunk sizes differ by a few bytes at most; how the remainder is distributed depends on the split version, and it does not affect merging.
  • After splitting, 10 files are generated, named package_aa, package_ab, package_ac, …, package_aj. The two-letter suffixes sort alphabetically in the order the chunks were written, which is what makes the merge step work.
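
To see the chunk names and sizes in practice, the following is a minimal sketch that splits a generated test file rather than a real archive. The names sample.bin and part_ and the 1,000,005-byte size are assumptions chosen for illustration; it also assumes a split that supports -n, such as the one in GNU coreutils.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 1,000,005-byte sample file (hypothetical name; the size is
# deliberately not a multiple of 10)
head -c 1000005 /dev/urandom > sample.bin

# Split it into 10 chunks using the hypothetical prefix part_
split -n 10 sample.bin part_

# Print each chunk's size in bytes, followed by the total
wc -c part_*
```

The total reported by wc should equal the size of sample.bin, and the per-chunk sizes show how the split version in use distributes the remainder.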

2. Merging the 10 Chunks Back into the Original File

Use the cat command to concatenate the files in the order they were split. The syntax is as follows:

cat prefix_of_split_files* > merged_file_name

Example:

To merge the files from package_aa to package_aj into merged_archive.tar.gz, execute:

cat package_* > merged_archive.tar.gz

Note:

  • package_*: The shell expands the wildcard to every file whose name starts with package_, in alphabetical order (aa→ab→…→aj), so the chunks are concatenated in their original sequence. Make sure no other files in the directory start with package_, since the wildcard would include them too.
  • The merged file is byte-for-byte identical to the original. Step 3 shows how to confirm this with md5sum.
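
The merge step can be tried end to end with the sketch below, which uses cmp for a direct byte-by-byte comparison. The names sample.bin, part_ and restored.bin are assumptions made for illustration, and the output name is chosen so that it does not match the part_* pattern.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a sample file and split it into 10 chunks (hypothetical names)
head -c 1000005 /dev/urandom > sample.bin
split -n 10 sample.bin part_

# Merge the chunks; the shell expands part_* in alphabetical order
cat part_* > restored.bin

# cmp -s prints nothing and exits 0 only if both files are byte-identical
if cmp -s sample.bin restored.bin; then
    echo "identical"
else
    echo "different"
fi
```

cmp is convenient when both files are on the same machine; when the original stays on another machine, comparing checksums is the practical alternative.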

3. Verifying the Merge Result

Use md5sum to compare the hash values of the original and merged files to confirm integrity:

# Calculate the MD5 of the original file
md5sum archive.tar.gz

# Calculate the MD5 of the merged file
md5sum merged_archive.tar.gz

If the two MD5 values match, no data was lost or altered during splitting and merging.
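
When the chunks are copied to another machine, the comparison can be automated by shipping a checksum file alongside them and checking it with md5sum -c. The sketch below simulates this with two local directories; the directory names sender and receiver and the file names sample.bin and part_ are assumptions for illustration, and cp stands in for the real transfer.

```shell
# Work in a throwaway directory; sender/ and receiver/ stand in for two machines
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sender receiver

# Sender: create a sample file, record its checksum, then split it
head -c 1000005 /dev/urandom > sender/sample.bin
(cd sender && md5sum sample.bin > sample.bin.md5 && split -n 10 sample.bin part_)

# "Transfer": copy the chunks and the checksum file (placeholder for scp, rsync, etc.)
cp sender/part_* sender/sample.bin.md5 receiver/

# Receiver: rebuild the file under its original name, then verify it.
# md5sum -c re-computes the hash and reports OK or FAILED for each listed file.
(cd receiver && cat part_* > sample.bin && md5sum -c sample.bin.md5)
```

Because the checksum file records the file name, the merged file has to be rebuilt under the original name for md5sum -c to find it.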

Conclusion

  • Splitting: split -n 10 source_file prefix
  • Merging: cat prefix* > merged_file

This method works for any type of file (compressed archives, documents, images, and so on), because split and cat operate on raw bytes. No additional tools are needed: both commands are part of GNU coreutils, which ships with virtually every Linux distribution.
