Overview

The wc command (short for word count) is another one of those gems that I find myself using in scripts more often than I expect. The initial release was in November of ‘71 (48 years ago!), and a release has been part of the Free Software Foundation since 1985. Perhaps even more amazing is that it continues to be developed today! At the time of writing, wc has had three commits directly related to it within the past year. I know, I know, not extremely active development. But impressive nonetheless.

Usage

wc reads the specified file(s), or STDIN if no file is given, and displays the number of lines, words, and bytes, along with the filename(s) if any were provided. If more than one file is passed to wc, it prints a grand total at the end. You can limit the result to just the number of lines, words, characters, or bytes. A quick note, here is what wc considers to be a word:

…the wc utility was documented to define a word as a “maximal string of characters delimited by <space>, <tab> or <newline> characters”. – wc(1)

That means that wc is great for ASCII text and languages that separate words with spaces. However, texts in other writing systems (lookin’ at you Japanese >.>) would need a proper morphological analyzer. A quick example of what I mean:

“This beer tastes great” comes to 4 words, 23 bytes (including the trailing newline).

“このビールがおいしい” comes to 1 word, 31 bytes, when really it’s 3 words and one particle.
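
The two counts above can be reproduced straight from the shell. This is a minimal sketch, assuming the script is saved as UTF-8; the variable names are my own, and the tr call only strips the padding that some wc implementations add. The Japanese word count can vary with locale and wc version, so treat that number as something to check rather than a guarantee.

```shell
# printf adds the trailing newline that is included in the byte counts.
en_words=$(printf '%s\n' 'This beer tastes great' | wc -w | tr -d '[:space:]')
en_bytes=$(printf '%s\n' 'This beer tastes great' | wc -c | tr -d '[:space:]')
ja_words=$(printf '%s\n' 'このビールがおいしい' | wc -w | tr -d '[:space:]')
ja_bytes=$(printf '%s\n' 'このビールがおいしい' | wc -c | tr -d '[:space:]')

echo "English:  $en_words words, $en_bytes bytes"
echo "Japanese: $ja_words words, $ja_bytes bytes"
```

The ten Japanese characters take three bytes each in UTF-8, which is where the 31 bytes (30 plus the newline) come from.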

Examples

Throughout the examples, I plan on using the rockyou password data dump. If you’d like to follow along with these examples, feel free to grab a copy. Mind you, it’s a fairly large file (133MB), so if you’d rather, just write up a simple .txt file.

A basic example with all its elements.

wc rockyou.txt
 14344391 14445388 139921497 rockyou.txt
 ^ lines  ^ words  ^ bytes   ^ filename
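
For anyone following along without rockyou.txt, here is a small sketch of the single-count flags and the grand total described under Usage. The two throwaway files and their contents are made up for illustration, and the tr call just strips the padding some wc implementations add.

```shell
# Build two tiny throwaway files (names and contents are hypothetical).
tmpdir=$(mktemp -d)
printf 'hunter2\npass word\n' > "$tmpdir/a.txt"
printf 'letmein\n' > "$tmpdir/b.txt"

# Limit the report to a single count; reading from stdin drops the filename.
lines=$(wc -l < "$tmpdir/a.txt" | tr -d '[:space:]')   # newline count
words=$(wc -w < "$tmpdir/a.txt" | tr -d '[:space:]')   # whitespace-delimited words
bytes=$(wc -c < "$tmpdir/a.txt" | tr -d '[:space:]')   # byte count
echo "a.txt: $lines lines, $words words, $bytes bytes"

# With more than one file, the last row of output is the grand total.
total_lines=$(wc -l "$tmpdir/a.txt" "$tmpdir/b.txt" | tail -n 1 | awk '{ print $1 }')
echo "total lines: $total_lines"

rm -r "$tmpdir"
```

There is also -m, which counts characters instead of bytes; its result depends on the locale, so I left it out of the sketch.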

Let’s check how many passwords are actually in the list. Notice the use of the < redirect, which makes the text from rockyou.txt act as standard input. This way, the file name is omitted and we are left with only a number.

echo "there are $(wc -l < rockyou.txt) passwords in this text file."
there are  14344391 passwords in this text file.
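
Notice the extra spaces before the number in that output: some versions of wc pad their counts. A small sketch of one way to get a clean integer, assuming a three-line stand-in file instead of rockyou.txt (the contents are made up):

```shell
# A three-line stand-in for rockyou.txt (hypothetical contents).
sample=$(mktemp)
printf 'password\n123456\niloveyou\n' > "$sample"

with_name=$(wc -l "$sample")   # count followed by the file name
echo "$with_name"

# Redirecting the file drops the name; arithmetic expansion drops any padding.
count=$(( $(wc -l < "$sample") ))
echo "there are $count passwords in this text file."
# prints: there are 3 passwords in this text file.

rm "$sample"
```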

Turns out I wanted to know how many passwords contained spaces.

grep " " rockyou.txt | wc -l
   70620
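
As a side note, grep can count matching lines on its own with -c, which should agree with piping into wc -l. A quick sketch on a made-up four-line file:

```shell
# Four sample "passwords", two of which contain a space (hypothetical data).
sample=$(mktemp)
printf 'password\nmy password\n123456\ni love you\n' > "$sample"

piped=$(grep " " "$sample" | wc -l | tr -d '[:space:]')   # count via wc
direct=$(grep -c " " "$sample")                           # count via grep itself

echo "grep | wc -l: $piped"
echo "grep -c:      $direct"

rm "$sample"
```

The wc -l form still earns its keep whenever the thing being counted comes out of a longer pipeline rather than a single grep.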

In case you ever need to know the size of a file (in bytes), ls -l and wc -c report the same size.

ls -l rockyou.txt
-rw-r--r-- 1 user user 139921497 Feb 24 18:38 rockyou.txt
wc -c rockyou.txt
 139921497 rockyou.txt
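
If the size is needed inside a script, here is a sketch of pulling just the number out of both commands. It assumes a small made-up file, and it uses ls -ln (numeric owner and group) so that the size reliably sits in the fifth column.

```shell
# A 16-byte sample file (hypothetical contents).
sample=$(mktemp)
printf 'password\n123456\n' > "$sample"

# Column 5 of the long listing is the size in bytes.
ls_size=$(ls -ln "$sample" | awk '{ print $5 }')
# Reading from stdin leaves only the count; tr strips any padding.
wc_size=$(wc -c < "$sample" | tr -d '[:space:]')

echo "ls: $ls_size bytes, wc: $wc_size bytes"

rm "$sample"
```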

If you run the GNU version of wc, there’s an extension, -L, which displays the length of the longest line in the file.

wc -L rockyou.txt
285 rockyou.txt

Note that there isn’t actually a password in here that’s 285 characters long. I looked it up, and it’s some long <div>. Perhaps some XSS attempt back in the day, I’m not sure. But for those who care… here it is:

<div align=\\\\\\'center\\\\\\' style=\\\\\\'font:bold 11px Verdana; width:310px\\\\\\'><a style=\\\\\\'background-color:#eeeeee;display:block;width:310px;border:solid 2px black; padding:5px\\\\\\' href=\\\\\\'http://www.musik-live.net\\\\\\' target=\\\\\\'_blank\\\\\\'>Playing/Tangga
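
wc -L only reports the length, not the line itself. One way to dig out the longest line is a short awk one-liner; this is a sketch on a made-up file, and not necessarily how the line above was found. Keep in mind that awk's length counts characters (or bytes in some awks) while wc -L counts display columns, so the two numbers can differ on lines with tabs or wide characters.

```shell
# Three sample lines of different lengths (hypothetical contents).
sample=$(mktemp)
printf 'abc\nthe longest line here\nshort\n' > "$sample"

# Remember the longest line seen so far and print it once the input ends.
longest=$(awk 'length($0) > max { max = length($0); line = $0 } END { print line }' "$sample")

echo "$longest"

rm "$sample"
```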

Conclusion

I hope you’ll give wc a chance next time you need to write up some bash script, or use it in a convenient one-liner. I usually reach for wc whenever I need to count the output of something. Whether it’s how many files I have in my Downloads folder (ls ~/Downloads | wc -l), or how many web dynos I have running on Heroku (heroku ps:exec --status -a not-a-real-app 2>/dev/null | grep -i "web\." | wc -l), I can always count on wc.
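
To try the file-counting one-liner without touching a real Downloads folder, here is a sketch against a temporary directory. The directory and file names are made up, and the count assumes no file name contains a newline, since wc -l counts lines rather than files.

```shell
# A scratch directory standing in for ~/Downloads (hypothetical file names).
dir=$(mktemp -d)
touch "$dir/a.pdf" "$dir/b.zip" "$dir/c.iso"

# When piped, ls prints one entry per line, so wc -l counts the entries.
count=$(ls "$dir" | wc -l | tr -d '[:space:]')
echo "$count files"

rm -r "$dir"
```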

References