Introduction
In this tutorial, you'll discover the power of the Linux split command for efficiently dividing large files into smaller, easily managed segments. The split command is an essential tool for any system administrator, allowing you to split files by size, line count, or other criteria. This is extremely helpful when handling substantial log files, database backups, or any large dataset that needs to be processed or transferred in smaller, more digestible portions. We'll begin with a clear understanding of the split command's core function and then progress to customizing its options to precisely control how a file is divided. This lab provides real-world examples, empowering you to become adept at using the split command for your text processing and file management tasks.
Understand the Purpose of the split Command
In this section, we'll explore the core purpose and practical applications of the split command within a Linux environment. The split command serves as a robust tool for breaking large files down into smaller, more manageable pieces.
The primary benefit of the split command is the ability to handle files that are too large to work with directly or to transfer efficiently. This is particularly useful for system administrators dealing with extensive log files that consume too much memory, large database backups that need to be archived, or other massive datasets that must be processed or distributed in smaller, easily handled segments.
To illustrate the split command, let's first create a large file for demonstration purposes:
head -n 10000 /dev/urandom > large_file.txt
This command takes the first 10,000 newline-delimited lines from /dev/urandom and writes them to a file named large_file.txt. Because /dev/urandom produces random bytes, the lines vary in length, but the result is a file large enough to demonstrate splitting.
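Optionally, you can check the line count and approximate size of the new file before splitting it:
## Optional: inspect the file before splitting
wc -l large_file.txt
ls -lh large_file.txt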
Now, let's apply the split command to divide this file into smaller, more workable chunks:
split -b 1M large_file.txt split_
This command divides large_file.txt into multiple files, each capped at a maximum size of 1 MiB (the 1M argument). The resulting filenames are prefixed with "split_".
Example output:
split_aa
split_ab
split_ac
split_ad
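Because split preserves the order of the data, the pieces can be joined back together with cat. Comparing the reassembled file against the original is a quick sanity check (the name reassembled.txt below is just an example):
## Reassemble the pieces and confirm they match the original
cat split_* > reassembled.txt
cmp large_file.txt reassembled.txt && echo "Files match"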
In the next section, we'll delve into customizing the split command's options, giving you granular control over the file-splitting process.
Split a File into Multiple Parts
This section focuses on the practical application of the split command to divide a file into multiple distinct parts.
Continuing with the large_file.txt file created earlier, we'll split it into smaller segments using the split command with several different options:
## Split the file into 5 equal-sized parts
split -n 5 large_file.txt split_part_
## Split the file into parts with a maximum size of 500 KiB
split -b 500K large_file.txt split_part_
## Split the file into parts of at most 1,000 lines each
split -l 1000 large_file.txt split_part_
Example output:
split_part_aa
split_part_ab
split_part_ac
split_part_ad
split_part_ae
The first command divides the file into exactly 5 chunks of roughly equal size. The second splits the file into segments of at most 500 KiB each. The third splits by line count, so each resulting file contains no more than 1,000 lines. Note that all three commands use the same split_part_ prefix, so each run overwrites the files produced by the previous one.
You can verify that the files were created using the ls command:
ls -l split_*
Example output:
-rw-r--r-- 1 labex labex 2000000 Apr 12 12:34 split_part_aa
-rw-r--r-- 1 labex labex 2000000 Apr 12 12:34 split_part_ab
-rw-r--r-- 1 labex labex 2000000 Apr 12 12:34 split_part_ac
-rw-r--r-- 1 labex labex 2000000 Apr 12 12:34 split_part_ad
-rw-r--r-- 1 labex labex 2000000 Apr 12 12:34 split_part_ae
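Your byte sizes and file count may differ, since the input data is random. If the line-based split was the last command you ran, you can also confirm that no piece exceeds 1,000 lines:
## Check the line count of each piece
wc -l split_part_*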
In the next step, we'll explore how to further customize the split command's options to meet specific requirements.
Customize the Split Command Options
This section focuses on customizing the split command's options to match particular use cases.
The split command offers a range of options that give you precise control over the splitting process. Here are some illustrative examples:
## Split the file into parts with a prefix of "custom_"
split -d -a 2 -b 1M large_file.txt custom_
## Split the file into parts with a prefix of "custom_" and a ".txt" suffix
split -d -a 2 -b 1M --additional-suffix=.txt large_file.txt custom_
## Split the file into parts with a different prefix of "part_"
split -d -a 2 -b 1M large_file.txt part_
## Split the file by line count, with numeric suffixes and a "pattern_" prefix
split -d -a 2 -l 1000 large_file.txt pattern_
Let's examine these examples in detail:
- The -d option instructs split to use numeric suffixes (00, 01, 02, ...) instead of the default alphabetic suffixes.
- The -a 2 option sets the suffix length to 2 characters.
- The -b 1M option specifies that each resulting file should be at most 1 MiB in size.
- The --additional-suffix=.txt option in the second example appends the .txt extension after the numeric suffix, producing files such as custom_00.txt and custom_01.txt.
- The part_ prefix in the third example means the output files are named part_00, part_01, and so on.
- The pattern_ prefix in the fourth example, combined with -l 1000, produces files named pattern_00, pattern_01, and so on, each containing at most 1,000 lines.
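After running these commands, you can list the generated pieces to confirm the naming scheme. The exact number of files depends on the size of your random input, but you should see names such as custom_00, custom_00.txt, part_00, and pattern_00:
ls custom_* part_* pattern_*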
Experiment with these options to determine the best way to split files for your specific requirements. This is especially useful for system administrators who need the resulting pieces to follow a predictable naming scheme, for example when archiving logs or staging backups for transfer.
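When you've finished experimenting, you may want to remove the demonstration files. Assuming you used only the prefixes from this tutorial (plus the optional reassembled.txt from the earlier check), the following cleans everything up:
## Remove the demonstration file and all generated pieces
rm -f large_file.txt reassembled.txt split_* custom_* part_* pattern_*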
Summary
This tutorial provided a comprehensive understanding of the purpose and application of the split command in Linux. The command is a valuable tool for system administrators, empowering you to break large files into smaller, more manageable units, which is particularly beneficial when working with substantial log files, database backups, or any large dataset that must be processed or transferred in smaller chunks. You learned how to use the split command with various options to divide a file into multiple parts: splitting into a fixed number of roughly equal-sized chunks, splitting by maximum size, and splitting by line count. Mastering the split command enhances your ability to efficiently manage and manipulate large datasets in a Linux environment, making you a more effective system administrator.