I probably should have done this in the opposite order, as a precursor to the uniq command. But who cares. If you’ve followed yesterday’s post, then you’ve had some exposure to the sort command. Let’s dive into it and get some more use out of it!

Overview

Another old command: the GNU version carries a 1988 copyright in its source code, and the first version was released by Bell Labs in 1971! The purpose of the sort command is to sort or merge records (lines) of text and binary files. By default, fields within a line are separated by blanks, but that can be overridden with the -t option. Sort is made to be fast, so the main algorithm used is merge sort. It would be a sad day if it used bubble sort by default T_T.
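Since the field separator can be overridden, here is a minimal sketch of sorting on one field. It assumes the -t (separator) and -k (key) options, which are not covered in the tables further down, and the colon-separated sample data is made up for illustration.

```shell
# Hypothetical sample data: name:id pairs separated by a colon.
users=$(printf 'root:0\nalice:1000\ndaemon:1\n')

# -t ':' sets the field separator; -k2,2n sorts numerically on field 2 only.
by_id=$(printf '%s\n' "$users" | sort -t ':' -k2,2n)
printf '%s\n' "$by_id"
# root:0
# daemon:1
# alice:1000
```

Without the n modifier the ids would be compared as text, and 1000 would land before 1 only by luck of the data.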

As before, I’ll be using the distros.txt file from the uniq examples. There is one change, however: I’ve added some spaces in front of a couple of entries.

Fedora
Ubuntu
Centos
Gentoo
Arch
openSUSE
Ubuntu
PureOS
Android
Gentoo
 Ubuntu
Debian
Alpine
  Clear
Clear

Fun fact: The sort command is one of the few utilities that has multi-threaded components.

Usage

I won’t cover every aspect of the sort command as there are too many options to cover (24 short / 30 long). In the tables below, I’ve added some of the more common use cases that you’ll probably use sort for. Perhaps not as comprehensive as one would expect, but I haven’t found myself using flags other than these, YMMV. As always, I encourage you to check out the man page, man 1 sort, and explore the remaining options. The tables below all show command line flags that can be passed, but I’ve separated them into two categories.

Command Line Options

option  long form  description
-o      --output   write the result to a file rather than printing to STDOUT
-u      --unique   print only the first of each group of lines that compare equal. Similar to sort | uniq; however, this was not a base feature and was added after uniq was created.

Sorting Options

option  long form                description
-b      --ignore-leading-blanks  ignore leading blanks when comparing
-f      --ignore-case            fold lower case letters to upper case when comparing
-g      --general-numeric-sort   compare as general floating point numbers; unlike -n, understands scientific notation such as 1e3
-n      --numeric-sort           sort numerically
-r      --reverse                sort in descending order
-R      --random-sort            shuffle by sorting on a random hash of each line, so identical lines stay next to each other
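To make the difference between -n and -g concrete, here is a small sketch. The three values are made up, and -g is assumed to be available (it is in GNU sort, but it is not part of POSIX).

```shell
# Made-up values; 1e3 is scientific notation for 1000.
values=$(printf '1e3\n5\n20\n')

# -n only reads the leading "1" of 1e3, so that line sorts first.
numeric=$(printf '%s\n' "$values" | LC_ALL=C sort -n)

# -g parses the whole floating point value, so 1e3 (1000) sorts last.
general=$(printf '%s\n' "$values" | LC_ALL=C sort -g)

printf 'sort -n: %s\n' "$(printf '%s' "$numeric" | tr '\n' ' ')"
printf 'sort -g: %s\n' "$(printf '%s' "$general" | tr '\n' ' ')"
# sort -n: 1e3 5 20
# sort -g: 5 20 1e3
```

For plain integers and decimals, -n is usually the better pick since -g has to do more work per line.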

Examples

In a basic example, the result might not be what you’d expect. Notice how the indented Clear is first and openSUSE is at the end. Remember that a space has a lower ASCII value than any letter, which is why the indented entries are at the beginning of the list. In ASCII, capital letters come before lower case ones (numeric value), which is why openSUSE is at the end. This byte-by-byte ordering is what you get in the C locale; other locales may order the same lines differently.

sort distros.txt
  Clear
 Ubuntu
Alpine
Android
Arch
Centos
Clear
Debian
Fedora
Gentoo
Gentoo
PureOS
Ubuntu
Ubuntu
openSUSE
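If your own output differs from the listing above, the locale is the likely reason. A minimal sketch, assuming you want the byte-wise ordering shown here regardless of the system settings, is to force the C locale for that one command:

```shell
# Three lines taken from distros.txt, one with a leading space.
sample=$(printf 'openSUSE\nArch\n Ubuntu\n')

# LC_ALL=C makes sort compare raw byte values: space < upper case < lower case.
bytewise=$(printf '%s\n' "$sample" | LC_ALL=C sort)
printf '%s\n' "$bytewise"
#  Ubuntu
# Arch
# openSUSE
```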

We can rectify the space issue with the -b flag. The entries with a leading space are now in order with the rest of the data, although each one is still printed ahead of its identical entry without the space. openSUSE is still at the bottom of the list. Makes sense, but I would like it sorted with the rest of the entries.

sort -b distros.txt
Alpine
Android
Arch
Centos
  Clear
Clear
Debian
Fedora
Gentoo
Gentoo
PureOS
 Ubuntu
Ubuntu
Ubuntu
openSUSE

This is probably the most common way I use sort.

sort -bf distros.txt
Alpine
Android
Arch
Centos
  Clear
Clear
Debian
Fedora
Gentoo
Gentoo
openSUSE
PureOS
 Ubuntu
Ubuntu
Ubuntu

…unless I’m trying to sort numerically. Let’s take an example from the uniq page and count the instances. We can then pass the -n option and sort will order the lines by their leading count in ascending order. It’s not clear from this example, but I’ll put a table below showing how sort treats numbers by default compared to -n.

sort -bf distros.txt | uniq -c | sort -n
   1   Clear
   1  Ubuntu
   1 Alpine
   1 Android
   1 Arch
   1 Centos
   1 Clear
   1 Debian
   1 Fedora
   1 PureOS
   1 openSUSE
   2 Gentoo
   2 Ubuntu

sort  sort -n
1     1
10    2
2     10
200   200
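The table can be reproduced with a quick sketch like the one below. The four numbers are the ones from the table, and LC_ALL=C is only there to keep the text comparison predictable.

```shell
numbers=$(printf '1\n10\n2\n200\n')

# Default: lines are compared as text, character by character.
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n: lines are compared by numeric value.
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

printf 'sort:    %s\n' "$(printf '%s' "$lexical" | tr '\n' ' ')"
printf 'sort -n: %s\n' "$(printf '%s' "$numeric" | tr '\n' ' ')"
# sort:    1 10 2 200
# sort -n: 1 2 10 200
```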

Using the previous example, let’s make it show the most common lines first by getting that list into descending order.

sort -bf distros.txt | uniq -c | sort -n -r
   2 Ubuntu
   2 Gentoo
   1 openSUSE
   1 PureOS
   1 Fedora
   1 Debian
   1 Clear
   1 Centos
   1 Arch
   1 Android
   1 Alpine
   1  Ubuntu
   1   Clear
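Notice that -r flips the names as well as the counts. If you want the counts descending but the names still ascending, one option is to sort on keys. This is a sketch that assumes the -k option (not covered in the tables above) and uses a trimmed-down, made-up list without the leading spaces.

```shell
# Count duplicates in a small made-up list.
counts=$(printf 'Gentoo\nUbuntu\nArch\nGentoo\nUbuntu\nDebian\n' | LC_ALL=C sort | uniq -c)

# -k1,1nr: first field (the count), numeric, reversed.
# -k2,2:   second field (the name), plain ascending, used to break ties.
ranked=$(printf '%s\n' "$counts" | LC_ALL=C sort -k1,1nr -k2,2)
printf '%s\n' "$ranked"
```

The two Gentoo and Ubuntu rows should come first, in that order, followed by Arch and Debian.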

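We can show only unique values by passing the -u option. On its own, sort -u distros.txt gives the same result as sort distros.txt | uniq. I find myself using the latter rather than the former. There might be a version of sort where -u is not an option, but chances are uniq is a command on the system. Might not always be the case, but I want to believe it’s more compatible. Notice how the indented Clear is displayed in the output but the indented Ubuntu is not. I haven’t dived into the code to confirm it, but the likely explanation is that -u keeps only the first of each group of lines that compare equal, and with -b and -f the indented and plain entries compare equal. In distros.txt the indented Clear comes before the plain one, while the plain Ubuntu comes before the indented one, so those are the ones that survive.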

sort -bfu distros.txt
Alpine
Android
Arch
Centos
  Clear
Debian
Fedora
Gentoo
openSUSE
PureOS
Ubuntu
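A quick way to poke at which duplicate -u keeps is to feed it the same entries in a different order. This is a sketch based on how GNU sort behaves (it outputs the first of a run of equal lines); other implementations may pick differently.

```shell
# Same relative order as distros.txt: plain Ubuntu first, indented Clear first.
kept=$(printf 'Ubuntu\n Ubuntu\n  Clear\nClear\n' | LC_ALL=C sort -bfu)
printf '%s\n' "$kept"
#   Clear
# Ubuntu

# Swap the order of each pair and compare which variants are printed.
swapped=$(printf ' Ubuntu\nUbuntu\nClear\n  Clear\n' | LC_ALL=C sort -bfu)
printf '%s\n' "$swapped"
```

If the second run prints the indented Ubuntu and the plain Clear, the input order is what decides the survivor.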

If you’re looking to shuffle stuff up, use the random option!

sort -R distros.txt
Clear
openSUSE
Alpine
Android
Centos
Debian
Arch
Ubuntu
Ubuntu
Fedora
 Ubuntu
Gentoo
Gentoo
  Clear
PureOS
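One thing worth noticing in the output above is that the two Ubuntu lines and the two Gentoo lines stayed next to each other. Because -R sorts on a hash of each line, identical lines always end up adjacent. The sketch below checks that on a made-up list, and shows shuf (also part of GNU coreutils, assumed to be installed) as an alternative when duplicates should be scattered too.

```shell
dupes=$(printf 'a\na\nb\nb\nc\n')

# After sort -R, identical lines are adjacent, so uniq collapses them
# down to the three distinct values no matter how the groups are ordered.
groups=$(printf '%s\n' "$dupes" | sort -R | uniq | wc -l)
echo "distinct runs after sort -R: $groups"

# shuf permutes every line independently, so duplicates may be split up.
printf '%s\n' "$dupes" | shuf
```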

Finally, let’s save our sorted output to a file. You could always use the > if you’re feeling spicy.

sort -bfu distros.txt --output=distros-sorted.txt
ls
distros-sorted.txt distros.txt
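One place where -o beats > is sorting a file in place. With GNU sort, all input is read before the output file is opened, so the input and output can be the same file. The sketch below uses a temporary file created with mktemp so nothing real gets overwritten.

```shell
# Throwaway file with three unsorted entries.
tmp=$(mktemp)
printf 'Fedora\nArch\nDebian\n' > "$tmp"

# Safe: sort reads the whole file before writing the result back to it.
sort -o "$tmp" "$tmp"
result=$(cat "$tmp")
printf '%s\n' "$result"
# Arch
# Debian
# Fedora

rm -f "$tmp"
# By contrast, `sort file > file` lets the shell truncate the file
# before sort ever gets to read it, leaving you with an empty file.
```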

References