Finding uniqueness in a list! Remove duplicates, print duplicated lines, show only duplicates… the possibilities are endless! Well, actually there is a finite number of possibilities, but working that out involves combinatorics, and I'm too lazy for that right now. Anyway… uniq is pretty slick. Check it out!

Overview

Before we get on to uniq's usage and examples, here's the short text file the examples use.

# distros.txt

Fedora
Ubuntu
Red Hat
Centos
Gentoo
Arch
openSUSE
Ubuntu
PureOS
Android
Gentoo
Ubuntu
Debian
Alpine
Clear
Clear
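To follow along, the heredoc below is one way to recreate distros.txt in the current directory. It includes the Red Hat entry that shows up in every example output; quoting the 'EOF' delimiter stops the shell from expanding anything inside the list.

```shell
# Recreate distros.txt in the current directory, one distro per line.
cat > distros.txt <<'EOF'
Fedora
Ubuntu
Red Hat
Centos
Gentoo
Arch
openSUSE
Ubuntu
PureOS
Android
Gentoo
Ubuntu
Debian
Alpine
Clear
Clear
EOF
```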

Usage

…reads the specified input_file comparing adjacent lines, and writes a copy of each unique input line to the output_file. -- uniq(1)

The key part of that description is "comparing adjacent lines". In our example file, the only duplicate lines that sit next to each other are the two Clear lines at the end. As the file stands, uniq will only treat Clear as a duplicate, even though Ubuntu and Gentoo are repeated too. This is why you'll usually see uniq used together with sort, typically as sort | uniq. There are a few differences in the options between the GNU version and the BSD version on my system, so I'll do my best to explain and differentiate.
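A tiny input makes the adjacency rule easy to see. The letters below are arbitrary placeholders: uniq only collapses the repeated a once the copies sit next to each other, which is exactly what sorting first guarantees.

```shell
# The two "a" lines are not adjacent, so uniq keeps both.
printf 'a\nb\na\n' | uniq
# a
# b
# a

# After sort the duplicates are adjacent, so uniq collapses them.
printf 'a\nb\na\n' | sort | uniq
# a
# b
```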

Option  Long form          Version  Description
-c                         both     count the number of times each line appeared
-d                         both     only print repeated lines, one copy of each
-D      --all-repeated     GNU      print every copy of each repeated line
-i      --ignore-case      both     ignore case when comparing lines (--ignore-case only works on GNU uniq)
-u      --unique           both     only print lines that are not repeated
-z      --zero-terminated  GNU      lines are delimited by NUL, not newline
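Here is a quick check of -i using some made-up mixed-case input. It sticks to the short -i flag, since the long form is GNU-only; uniq keeps the first line of each group, so the surviving spelling is whichever came first.

```shell
# Without -i, the three spellings are three different lines.
printf 'Ubuntu\nubuntu\nUBUNTU\n' | uniq

# With -i they compare equal and only the first line of the group is printed.
printf 'Ubuntu\nubuntu\nUBUNTU\n' | uniq -i
# → Ubuntu
```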

There are more options for the uniq command and I encourage you to read the man pages and explore some more. The examples that follow will give you a general idea of the uniq command and how to use it in most situations.

Examples

Here’s the basic usage for the uniq command. Notice how there are still duplicate lines in the output.

uniq distros.txt
Fedora
Ubuntu
Red Hat
Centos
Gentoo
Arch
openSUSE
Ubuntu
PureOS
Android
Gentoo
Ubuntu
Debian
Alpine
Clear

Sorting first delivers the result we actually want. This sort | uniq pipeline will be the base for most of the commands that follow.

sort distros.txt | uniq
Alpine
Android
Arch
Centos
Clear
Debian
Fedora
Gentoo
PureOS
Red Hat
Ubuntu
openSUSE
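For the plain remove-duplicates case, sort also has a -u flag that folds the uniq step into the sort. The sketch below feeds the same made-up list to both pipelines; for ordinary input like this they print the same thing, but counting, -d, and -u still need uniq.

```shell
# A small made-up list with repeats (printf %b expands the \n escapes).
list='Ubuntu\nGentoo\nUbuntu\nArch\nGentoo\n'

# sort -u sorts and drops duplicate lines in one step...
printf '%b' "$list" | sort -u

# ...which here matches the two-step pipeline.
printf '%b' "$list" | sort | uniq
```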

Counting the number of occurrences of each item with -c. Piping the result through another sort -n would order the output by number of occurrences, ascending.

sort distros.txt | uniq -c
   1 Alpine
   1 Android
   1 Arch
   1 Centos
   2 Clear
   1 Debian
   1 Fedora
   2 Gentoo
   1 PureOS
   1 Red Hat
   3 Ubuntu
   1 openSUSE
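Here is what that extra sort -n looks like, using a small made-up list so the result is easy to check. Because uniq -c right-aligns the counts, a numeric sort on the whole line orders it by count; adding -r puts the most frequent line first.

```shell
# Made-up input: Ubuntu x3, Gentoo x2, Arch x1.
input='Ubuntu\nGentoo\nUbuntu\nArch\nGentoo\nUbuntu\n'

# Count each line, then order by count, ascending.
printf '%b' "$input" | sort | uniq -c | sort -n

# Same counts, most frequent first.
printf '%b' "$input" | sort | uniq -c | sort -rn
```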

Viewing only the lines that are not unique.

sort distros.txt | uniq -d
Clear
Gentoo
Ubuntu
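-d can also be combined with -c to count only the repeated lines. A small sketch with made-up input:

```shell
# Only repeated lines, each prefixed with how many times it occurred.
printf '%b' 'Ubuntu\nGentoo\nUbuntu\nArch\nGentoo\nUbuntu\n' | sort | uniq -c -d
```

Arch appears once, so it is left out, and only the Gentoo and Ubuntu counts are printed.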

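On GNU versions of uniq, -D prints every copy of each repeated line, not just one. Here GNU uniq is invoked as guniq, the g-prefixed name it gets when installed alongside the BSD tools (for example through Homebrew's coreutils package).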

sort distros.txt | guniq -D
Clear
Clear
Gentoo
Gentoo
Ubuntu
Ubuntu
Ubuntu

To list only the items that appear exactly once, use the -u flag. This omits Ubuntu, Clear, and Gentoo.

sort distros.txt | uniq -u
Alpine
Android
Arch
Centos
Debian
Fedora
PureOS
Red Hat
openSUSE

With the help of a few extra utilities we can find out the most repeated item in a list.

sort distros.txt | uniq -c | sort -rn | head -n 1 | awk '{print $2}'
Ubuntu
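One caveat: awk '{print $2}' only prints the first word after the count, so an entry like Red Hat would come out as just Red. A variant that strips the leading count with sed instead keeps the whole line; changing head -n 1 to head -n 3 would give the top three. The input below is made up so that the multi-word entry wins.

```shell
# Most repeated line, keeping multi-word entries intact.
# sed removes the leading spaces, the count, and the single space after it.
printf '%b' 'Red Hat\nUbuntu\nRed Hat\nArch\n' | sort | uniq -c | sort -rn | head -n 1 | sed -E 's/^ *[0-9]+ //'
# → Red Hat
```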

References