Command of the Day: uniq
Finding uniqueness in a list! Remove duplicates, print duplicates, show only duplicates… the possibilities are endless! Well, actually there is a finite number of possibilities, but that involves combinatorics. And I’m too lazy for that right now. Anyway… uniq is pretty slick. Check it out!
Overview
Before we get on to uniq’s usage and examples, here’s the short text file that’ll be used.
# distros.txt
Fedora
Ubuntu
Red Hat
Centos
Gentoo
Arch
openSUSE
Ubuntu
PureOS
Android
Gentoo
Ubuntu
Debian
Alpine
Clear
Clear
Usage
…reads the specified input_file comparing adjacent lines, and writes a copy of each unique input line to the output_file. -- uniq(1)
The key part of that description is comparing adjacent lines.
Within our example file, the only identical lines that sit next to each other are the two Clear lines at the end of the file.
As the file stands, uniq will only find Clear as a duplicate, even though Ubuntu and Gentoo are also duplicated.
This is why you’ll usually see the uniq command used in conjunction with the sort command, usually in the form sort | uniq.
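To see the adjacency rule in isolation, here’s a small sketch that uses a throwaway three-line input instead of distros.txt. The variable names are made up for illustration.

```shell
# Hypothetical three-line input: the two "a" lines are NOT adjacent.
input=$(printf 'a\nb\na\n')

# uniq alone only compares neighbouring lines, so both "a" lines survive.
unsorted=$(printf '%s\n' "$input" | uniq)

# Sorting first makes the duplicates adjacent, so uniq can collapse them.
sorted=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq only:\n%s\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$sorted"
```

The first listing still contains a twice, the second contains it once.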
There are a few differences in the options between the GNU and BSD versions on my system.
I’ll do my best to explain and differentiate.
option | long-form | version | description |
---|---|---|---|
-c | --count | both | count the number of times each line appeared. |
-d | --repeated | both | only output repeated lines, one per group. |
-D | --all-repeated | GNU | print all repeated lines. |
-i | --ignore-case | both | compare lines case-insensitively. Note that --ignore-case only works on GNU uniq. |
-u | --unique | both | only print unique lines. |
-z | --zero-terminated | GNU | line delimiter is NUL, not newline. |
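
As a quick illustration of the -i row, the sketch below feeds three spellings of the same name to uniq. The input is made up, and the short -i flag is used because the table lists it as available on both versions.

```shell
# Made-up input: the same name in three different cases, already adjacent.
names=$(printf 'ubuntu\nUbuntu\nUBUNTU\n')

# Without -i the comparison is case-sensitive, so all three lines are kept.
case_sensitive=$(printf '%s\n' "$names" | uniq | wc -l | tr -d ' ')

# With -i the three lines are treated as one group.
case_insensitive=$(printf '%s\n' "$names" | uniq -i | wc -l | tr -d ' ')

echo "without -i: $case_sensitive line(s)"
echo "with -i: $case_insensitive line(s)"
```

The tr -d ' ' is only there because some wc implementations pad the count with spaces.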
There are more options for the uniq
command and I encourage you to read the man pages and explore some more.
The examples that follow will give you a general idea of the uniq
command and how to use it in most situations.
Examples
Here’s the basic usage for the uniq
command.
Notice how there are still duplicate lines in the output.
uniq distros.txt
Fedora
Ubuntu
Red Hat
Centos
Gentoo
Arch
openSUSE
Ubuntu
PureOS
Android
Gentoo
Ubuntu
Debian
Alpine
Clear
Running the file through the sort command first delivers a more desirable result.
This will be the base for most of the commands that follow.
sort distros.txt | uniq
Alpine
Android
Arch
Centos
Clear
Debian
Fedora
Gentoo
PureOS
Red Hat
Ubuntu
openSUSE
Counting the number of occurrences of each item.
Piping the result into another sort -n
would list the items in ascending order of occurrences.
sort distros.txt | uniq -c
1 Alpine
1 Android
1 Arch
1 Centos
2 Clear
1 Debian
1 Fedora
2 Gentoo
1 PureOS
1 Red Hat
3 Ubuntu
1 openSUSE
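The follow-up sort -n could look something like this. It’s a sketch that uses a short made-up list rather than distros.txt, and it sets LC_ALL=C (my assumption, not something the original commands do) so the ordering doesn’t depend on your locale.

```shell
# Made-up list; LC_ALL=C keeps sort's ordering locale-independent.
export LC_ALL=C
counts=$(printf 'Gentoo\nUbuntu\nArch\nUbuntu\nGentoo\nUbuntu\n' \
  | sort | uniq -c | sort -n)

# The least frequent item comes first, the most frequent last.
printf '%s\n' "$counts"

# Keep just the names, still in ascending order of frequency.
names_by_count=$(printf '%s\n' "$counts" | awk '{print $2}')
printf '%s\n' "$names_by_count"
```

Swap sort -n for sort -rn to put the most frequent item at the top instead.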
Viewing only the lines that are not unique.
sort distros.txt | uniq -d
Clear
Gentoo
Ubuntu
On GNU versions of uniq (invoked as guniq in the command below), you can view every occurrence of the duplicated lines, not just one copy of each.
sort distros.txt | guniq -D
Clear
Clear
Gentoo
Gentoo
Ubuntu
Ubuntu
Ubuntu
If you wanted to list only the items that are unique, use the -u
flag.
This way Ubuntu, Clear, and Gentoo will be omitted.
sort distros.txt | uniq -u
Alpine
Android
Arch
Centos
Debian
Fedora
PureOS
Red Hat
openSUSE
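The -d and -u flags split a sorted list into two halves that never overlap. Here’s a minimal sketch of that, using a made-up four-line list and LC_ALL=C (an assumption on my part) to keep the sort order predictable.

```shell
# Made-up list: "Clear" appears twice, the others once.
export LC_ALL=C
list=$(printf 'Clear\nArch\nClear\nDebian\n')

# -d keeps one copy of each line that occurs more than once.
repeated=$(printf '%s\n' "$list" | sort | uniq -d)

# -u keeps only the lines that occur exactly once.
singles=$(printf '%s\n' "$list" | sort | uniq -u)

printf 'repeated:\n%s\n' "$repeated"
printf 'singles:\n%s\n' "$singles"
```

Put the two results together and you get the same set of names that a plain sort | uniq would print.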
With the help of a few extra utilities we can find out the most repeated item in a list.
sort distros.txt | uniq -c | sort -rn | head -n 1 | awk '{print $2}'
Ubuntu
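One caveat with that pipeline: awk '{print $2}' prints only the second whitespace-separated field, so a multi-word winner such as Red Hat would be cut down to Red. A possible variant, sketched here with a made-up list, strips the leading count with sed instead.

```shell
# Made-up list where the most frequent entry contains a space.
export LC_ALL=C
list=$(printf 'Red Hat\nArch\nRed Hat\nUbuntu\n')

# awk's $2 is only the second field, so the name gets truncated.
truncated=$(printf '%s\n' "$list" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# Removing the leading blanks, the count, and one separator keeps the whole name.
full=$(printf '%s\n' "$list" | sort | uniq -c | sort -rn | head -n 1 | sed 's/^ *[0-9][0-9]* //')

echo "$truncated"
echo "$full"
```

For single-word lists like the one in this post, the awk version works just as well.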
References
- [1] uniq | Wikipedia
- [2] uniq(1) - Linux man page | die.net