uniq Command in Linux

Introduction to the uniq Command

This lab provides a practical guide to using the uniq command in Linux for efficient text processing. You will learn how to eliminate duplicate lines from files and count the frequency of unique lines. Mastering the uniq command is essential for system administrators and anyone working with text data in a Linux environment, because it streamlines data cleaning and analysis. By the end of this lab, you'll have a solid understanding of the uniq command's purpose, syntax, and application in real-world scenarios.

This tutorial covers these key areas:

  • Understanding the Purpose and Syntax of the uniq Command
  • Removing Duplicate Lines from a File Using uniq
  • Counting the Occurrences of Unique Lines with uniq

Understanding the Purpose and Syntax of the uniq Command

This section focuses on the core functionality and syntax of the uniq command in Linux. The primary role of uniq is to filter out repeated adjacent lines from a file or standard input.

The fundamental syntax of the uniq command is as follows:

uniq [OPTION]... [INPUT_FILE [OUTPUT_FILE]]

Here's a breakdown of the most common options:

  • -c: Prepend each line with the number of times it occurs in the input.
  • -d: Display only the lines that are duplicated.
  • -u: Show only the lines that appear uniquely (no duplicates).
  • -i: Perform case-insensitive comparisons.
  • -f N: Skip the first N fields of each line during comparison.
  • -s N: Ignore the first N characters of each line when comparing.
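A quick, illustrative sketch of a few of these options on throwaway files (the file names and contents below are invented for this example):

```shell
# Build a small file with adjacent duplicates (sample data for illustration)
printf 'Apple\napple\napple\nbanana\n' > fruits.txt

uniq -d fruits.txt   # only the repeated line:       apple
uniq -u fruits.txt   # only the non-repeated lines:  Apple, banana
uniq -i fruits.txt   # case-insensitive comparison:  Apple, banana

# -f N skips the first N whitespace-separated fields before comparing
printf 'u1 apple\nu2 apple\nu3 banana\n' > log.txt
uniq -f 1 log.txt    # compares only the second field: u1 apple, u3 banana
```

Note that with -f 1, the lines "u1 apple" and "u2 apple" compare equal, so only the first of the pair is kept.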

To illustrate, let's create a simple text file containing duplicate lines:

echo -e "apple\napple\norange\norange\nbanana" > sample.txt

Expected file content:

apple
apple
orange
orange
banana

Now, use the uniq command to remove consecutive duplicate lines:

uniq sample.txt

Resultant output:

apple
orange
banana

In the example above, uniq eliminated the consecutive duplicate "apple" and "orange" lines.
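As the syntax line above shows, uniq can also write its result directly to an output file instead of standard output. A minimal sketch (the file names are invented for this example):

```shell
# Create a file with an adjacent duplicate
printf 'red\nred\nblue\n' > colors.txt

# The second positional argument is the output file
uniq colors.txt colors_dedup.txt

cat colors_dedup.txt   # contains: red, blue
```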

Removing Duplicate Lines from a File Using uniq

In this section, you'll learn how to use uniq to clean a file by removing duplicate entries. Keep in mind uniq only compares adjacent lines.

Start by creating a file with duplicated lines:

echo -e "apple\norange\napple\nbanana\norange\napple" > sample.txt

The file 'sample.txt' will contain:

apple
orange
apple
banana
orange
apple

Running the uniq command directly on this file:

uniq sample.txt

Gives the output:

apple
orange
apple
banana
orange
apple

As you can see, uniq only removes *adjacent* duplicates. To remove all duplicate lines, regardless of their position, you must first sort the file.

Use the sort command in combination with uniq to remove all duplicates:

sort sample.txt | uniq

This will output:

apple
banana
orange

The sort command arranges the lines alphabetically, placing duplicates next to each other, which then allows uniq to remove them.
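As an aside, sort itself has a -u flag that deduplicates while sorting, producing the same result as the sort | uniq pipeline in a single command (though only uniq offers options such as -c and -d):

```shell
printf 'apple\norange\napple\nbanana\norange\napple\n' > sample.txt

sort sample.txt | uniq   # apple, banana, orange
sort -u sample.txt       # identical output in one command
```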

Counting the Occurrences of Unique Lines with uniq

This section demonstrates how to determine the frequency of each unique line in a file using uniq.

Begin with the same example file:

echo -e "apple\norange\napple\nbanana\norange\napple" > sample.txt

Contents of 'sample.txt':

apple
orange
apple
banana
orange
apple

To count the occurrences, first sort the file so that identical lines are adjacent, then apply the -c option:

sort sample.txt | uniq -c

This will output:

   3 apple
   1 banana
   2 orange

The number preceding each line indicates the count of that particular unique line.

To sort the output by count, pipe the result to the sort command with the -n (numeric sort) option:

sort sample.txt | uniq -c | sort -n

The sorted output will be:

   1 banana
   2 orange
   3 apple

The -n flag in sort ensures the output is sorted numerically from the lowest to highest count.
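For frequency reports you often want the most common lines first; adding -r to sort reverses the numeric order. A short sketch using the same sample file:

```shell
printf 'apple\norange\napple\nbanana\norange\napple\n' > sample.txt

# Sort so duplicates are adjacent, count them, then sort counts descending
sort sample.txt | uniq -c | sort -rn
#   3 apple
#   2 orange
#   1 banana
```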

Summary

This lab provided a comprehensive overview of the uniq command in Linux. You learned how to use uniq to remove duplicate lines, and how options such as -c (count occurrences), -d (show only duplicates), and -i (case-insensitive comparison) extend its behavior. A key takeaway is that uniq only compares adjacent lines: to remove all duplicates, you must first sort the input with the sort command so that identical lines sit next to each other. This makes sort | uniq a powerful combination for cleaning and analyzing text data on Linux systems, from basic scripting to advanced system administration tasks.
