uniq Command in Linux

Introduction to the uniq Command

This lab provides a practical guide to using the uniq command in Linux for efficient text processing. You will learn how to eliminate duplicate lines from files and count the frequency of unique lines. Mastering the uniq command is essential for system administrators and anyone working with text data in a Linux environment, because it streamlines data cleaning and analysis. By the end of this lab, you'll have a solid understanding of the uniq command's purpose, syntax, and application in real-world scenarios.

This tutorial covers these key areas:

  • Understanding the Purpose and Syntax of the uniq Command
  • Removing Duplicate Lines from a File Using uniq
  • Counting the Occurrences of Unique Lines with uniq

Understanding the Purpose and Syntax of the uniq Command

This section focuses on the core functionality and syntax of the uniq command in Linux. The primary role of uniq is to filter out repeated adjacent lines from a file or standard input.

The fundamental syntax of the uniq command is as follows:

uniq [OPTION]... [INPUT_FILE [OUTPUT_FILE]]

Here's a breakdown of the most common options:

  • -c: Prepend each line with the number of times it occurs in the input.
  • -d: Display only the lines that are duplicated.
  • -u: Show only the lines that appear uniquely (no duplicates).
  • -i: Perform case-insensitive comparisons.
  • -f N: Skip the first N fields of each line during comparison.
  • -s N: Ignore the first N characters of each line when comparing.
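A quick, illustrative sketch of a few of these options on throwaway files (the file names and contents below are invented for this example):

```shell
# Build a small file with adjacent duplicates (sample data for illustration)
printf 'Apple\napple\napple\nbanana\n' > fruits.txt

uniq -d fruits.txt   # only the repeated line:       apple
uniq -u fruits.txt   # only the non-repeated lines:  Apple, banana
uniq -i fruits.txt   # case-insensitive comparison:  Apple, banana

# -f N skips the first N whitespace-separated fields before comparing
printf 'u1 apple\nu2 apple\nu3 banana\n' > log.txt
uniq -f 1 log.txt    # compares only the second field: u1 apple, u3 banana
```

Note that with -f 1, the lines "u1 apple" and "u2 apple" compare equal, so only the first of the pair is kept.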

To illustrate, let's create a simple text file containing duplicate lines:

echo -e "apple\napple\norange\norange\nbanana" > sample.txt

Expected file content:

apple
apple
orange
orange
banana

Now, use the uniq command to remove consecutive duplicate lines:

uniq sample.txt

Resultant output:

apple
orange
banana

In the example above, uniq eliminated the consecutive duplicate "apple" and "orange" lines.
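As the syntax line above shows, uniq can also write its result directly to an output file instead of standard output. A minimal sketch (the file names are invented for this example):

```shell
# Create a file with an adjacent duplicate
printf 'red\nred\nblue\n' > colors.txt

# The second positional argument is the output file
uniq colors.txt colors_dedup.txt

cat colors_dedup.txt   # contains: red, blue
```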

Removing Duplicate Lines from a File Using uniq

In this section, you'll learn how to use uniq to clean a file by removing duplicate entries. Keep in mind uniq only compares adjacent lines.

Start by creating a file with duplicated lines:

echo -e "apple\norange\napple\nbanana\norange\napple" > sample.txt

The file 'sample.txt' will contain:

apple
orange
apple
banana
orange
apple

Running the uniq command directly on this file:

uniq sample.txt

Gives the output:

apple
orange
apple
banana
orange
apple

As you can see, uniq only removes *adjacent* duplicates. To remove all duplicate lines, regardless of their position, you must first sort the file.

Use the sort command in combination with uniq to remove all duplicates:

sort sample.txt | uniq

This will output:

apple
banana
orange

The sort command arranges the lines alphabetically, placing duplicates next to each other, which then allows uniq to remove them.
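As an aside, sort itself has a -u flag that deduplicates while sorting, producing the same result as the sort | uniq pipeline in a single command (though only uniq offers options such as -c and -d):

```shell
printf 'apple\norange\napple\nbanana\norange\napple\n' > sample.txt

sort sample.txt | uniq   # apple, banana, orange
sort -u sample.txt       # identical output in one command
```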

Counting the Occurrences of Unique Lines with uniq

This section demonstrates how to determine the frequency of each unique line in a file using uniq.

Begin with the same example file:

echo -e "apple\norange\napple\nbanana\norange\napple" > sample.txt

Contents of 'sample.txt':

apple
orange
apple
banana
orange
apple

To count the occurrences, first sort the file so that identical lines are adjacent, then apply the -c option:

sort sample.txt | uniq -c

This will output:

   3 apple
   1 banana
   2 orange

The number preceding each line indicates the count of that particular unique line.

To sort the output by count, pipe the result to the sort command with the -n (numeric sort) option:

sort sample.txt | uniq -c | sort -n

The sorted output will be:

   1 banana
   2 orange
   3 apple

The -n flag in sort ensures the output is sorted numerically from the lowest to highest count.
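For frequency reports you often want the most common lines first; adding -r to sort reverses the numeric order. A short sketch using the same sample file:

```shell
printf 'apple\norange\napple\nbanana\norange\napple\n' > sample.txt

# Sort so duplicates are adjacent, count them, then sort counts descending
sort sample.txt | uniq -c | sort -rn
#   3 apple
#   2 orange
#   1 banana
```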

Summary

This lab provided a comprehensive overview of the uniq command in Linux. You learned how to use uniq to remove duplicate lines, and how options such as -c (count occurrences), -d (show only duplicates), and -i (case-insensitive comparison) extend its behavior. A key takeaway is that uniq only compares adjacent lines: to remove all duplicates, you must first sort the input with the sort command so that identical lines sit next to each other. This makes sort | uniq a powerful combination for cleaning and analyzing text data on Linux systems, from basic scripting to advanced system administration tasks.
