The wc command (short for word count) is another one of those gems that I find myself using more often in scripts than I expect.
The initial release was in November of ’71 (48 years ago!), and a version has been part of the Free Software Foundation since 1985.
Perhaps even more amazing is that it continues to be developed today! At the time of writing, wc has had three commits directly related to it within the past year.
I know, I know, not extremely active development. But impressive nonetheless.
wc reads the specified file(s), or it can read from STDIN.
It will then display the number of lines, words, and bytes, along with the filename(s) if they were provided.
If more than one file is passed into wc, it will give a grand total at the end.
You have the option to limit the result to just the number of lines (-l), words (-w), characters (-m), or bytes (-c) in the file.
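Here is a minimal sketch of the individual counts and the grand total. The two sample files (a.txt and b.txt) and the temporary directory are made up for illustration; only the standard -l, -w, and -c options are used.

```shell
# Work in a throwaway directory so nothing real is touched.
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/a.txt"   # 2 lines, 3 words, 14 bytes
printf 'four five six\n'  > "$tmpdir/b.txt"   # 1 line,  3 words, 14 bytes

# Reading from STDIN drops the filename; tr strips the padding some wc builds add.
lines=$(wc -l < "$tmpdir/a.txt" | tr -d ' ')
words=$(wc -w < "$tmpdir/a.txt" | tr -d ' ')
bytes=$(wc -c < "$tmpdir/a.txt" | tr -d ' ')
echo "a.txt: $lines lines, $words words, $bytes bytes"

# With more than one file, the last row of the output is the grand total.
total_lines=$(wc -l "$tmpdir/a.txt" "$tmpdir/b.txt" | tail -n 1 | awk '{print $1}')
echo "total lines: $total_lines"

rm -r "$tmpdir"
```

The tr step is only there because some wc implementations left-pad their numbers with spaces; on others it is a harmless no-op.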
A quick note on what wc considers to be a word:
…the wc utility was documented to define a word as a “maximal string of characters delimited by <space>, <tab> or <newline> characters”. – wc(1)
That means wc works great for ASCII text and for languages that separate words with spaces.
However, texts that use other writing styles (lookin’ at you Japanese >.>) would need a proper morphological analyzer.
A quick example of what I mean (both byte counts include the trailing newline):
“This beer tastes great” comes to 4 words, 23 bytes.
“このビールがおいしい” comes to 1 word, 31 bytes, when really it’s 3 words and one particle.
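If you’d like to reproduce those numbers, something like the following should do it. This is a sketch that assumes the script is saved as UTF-8 and run in a UTF-8 locale; the variable names are mine, and the Japanese word count in particular can differ under other locales or wc versions.

```shell
english='This beer tastes great'
japanese='このビールがおいしい'

# printf adds the trailing newline that is included in the byte counts.
en_words=$(printf '%s\n' "$english" | wc -w | tr -d ' ')
en_bytes=$(printf '%s\n' "$english" | wc -c | tr -d ' ')
ja_words=$(printf '%s\n' "$japanese" | wc -w | tr -d ' ')
ja_bytes=$(printf '%s\n' "$japanese" | wc -c | tr -d ' ')

echo "English:  $en_words words, $en_bytes bytes"
echo "Japanese: $ja_words words, $ja_bytes bytes"
```

The Japanese sentence is ten characters at three bytes each in UTF-8, plus the newline, which is where the 31 comes from.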
Throughout the examples, I plan on using the rockyou password data dump.
If you’d like to follow along w/ these examples, feel free to grab a copy.
Mind you, it’s a bit of a large file (133MB), so if you’d like, just write up a simple
A basic example with all its elements.
    wc rockyou.txt
    14344391 14445388 139921497 rockyou.txt
    ^ lines  ^ words  ^ bytes   ^ filename
Let’s check how many passwords are actually in the list.
Notice the use of the < redirection, which makes the text from rockyou.txt act as standard input.
This way, the file name is omitted and we are left with only a number.
    echo "there are $(wc -l < rockyou.txt) passwords in this text file."
    there are 14344391 passwords in this text file.
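To see the difference the redirection makes without downloading the real dump, here is a small sketch. The sample file and its three made-up passwords are hypothetical stand-ins for rockyou.txt.

```shell
# Hypothetical stand-in for rockyou.txt: three made-up passwords, one per line.
sample=$(mktemp)
printf 'hunter2\nletmein\ncorrect horse\n' > "$sample"

# With a filename argument, wc prints the count followed by the name.
with_name=$(wc -l "$sample")

# With the file redirected to STDIN, wc has no name to print.
count=$(wc -l < "$sample" | tr -d ' ')

echo "with filename: $with_name"
echo "there are $count passwords in this text file."

rm "$sample"
```

Capturing the bare number in a variable like this is handy in scripts, since there is no filename to cut off afterwards.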
Turns out I wanted to know how many passwords contained spaces.
    grep " " rockyou.txt | wc -l
    70620
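For what it’s worth, grep can also report the number of matching lines itself via its -c option. A quick sketch on a made-up sample file (not the real dump) showing that both approaches agree:

```shell
# Hypothetical sample: four made-up passwords, two of which contain a space.
sample=$(mktemp)
printf 'hunter2\ncorrect horse\nopen sesame\nletmein\n' > "$sample"

# Count the lines that contain at least one space by piping into wc.
piped=$(grep ' ' "$sample" | wc -l | tr -d ' ')

# grep -c reports the number of matching lines directly.
direct=$(grep -c ' ' "$sample")

echo "grep | wc -l: $piped"
echo "grep -c:      $direct"

rm "$sample"
```

The wc -l version still earns its keep whenever the thing being counted is the output of a longer pipeline rather than a single grep.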
In case you ever need to know the size of a file (in bytes), ls -l and wc -c report the same size.
    ls -l rockyou.txt
    -rw-r--r-- 1 user user 139921497 Feb 24 18:38 rockyou.txt
    wc -c rockyou.txt
    139921497 rockyou.txt
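In a script you usually want just the number. Here is a rough sketch on a made-up sample file; pulling the fifth column out of ls -l with awk assumes the usual long-listing layout, so treat that half as illustrative only.

```shell
# Hypothetical sample file: 'password\n' is 9 bytes, '123456\n' is 7 bytes.
sample=$(mktemp)
printf 'password\n123456\n' > "$sample"

# wc -c on STDIN gives the byte count with no filename attached.
size_wc=$(wc -c < "$sample" | tr -d ' ')

# The size is normally the fifth column of ls -l output.
size_ls=$(ls -l "$sample" | awk '{print $5}')

echo "wc -c: $size_wc bytes"
echo "ls -l: $size_ls bytes"

rm "$sample"
```

Of the two, the wc -c form is the one I would reach for, since it does not depend on how ls formats its columns.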
If you run the GNU version of wc, there’s an extension, -L, which displays the length of the longest line in the file.
    wc -L rockyou.txt
    285 rockyou.txt
Note that there isn’t really a 285-character password in here. I looked it up and it’s some long <div>. Perhaps some XSS attempt back in the day, I’m not sure. But for those who care… here it is:
<div align=\\\\\\'center\\\\\\' style=\\\\\\'font:bold 11px Verdana; width:310px\\\\\\'><a style=\\\\\\'background-color:#eeeeee;display:block;width:310px;border:solid 2px black; padding:5px\\\\\\' href=\\\\\\'http://www.musik-live.net\\\\\\' target=\\\\\\'_blank\\\\\\'>Playing/Tangga
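Since -L is a GNU extension, here is a rough awk-based sketch for finding the longest line (and the line itself) where it isn’t available. The sample file is made up, and awk’s length may count bytes or characters depending on the implementation and locale, so the result can differ from wc -L on non-ASCII text or lines with tabs.

```shell
# Hypothetical sample file with lines of 3, 8, and 5 characters.
sample=$(mktemp)
printf 'abc\nabcdefgh\nabcde\n' > "$sample"

# Track the largest line length seen and print it at the end.
longest=$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$sample")
echo "longest line: $longest characters"

# Remember the longest line itself, which wc -L does not print.
longest_line=$(awk 'length($0) > max { max = length($0); line = $0 } END { print line }' "$sample")
echo "it is: $longest_line"

rm "$sample"
```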
I hope you’ll give wc a chance next time you need to write up some bash script or a convenient one-liner.
I usually find myself using wc whenever I need to count the output of something.
Whether it’s how many files I have in my Downloads folder (ls ~/Downloads | wc -l), or how many web dynos I have running on Heroku (heroku ps:exec --status -a not-a-real-app 2>/dev/null | grep -i "web\." | wc -l), I can always count on wc.
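The counting pattern is easy to try out safely. A minimal sketch, using a temporary directory with three made-up files as a stand-in for ~/Downloads:

```shell
# Hypothetical stand-in for ~/Downloads: a temporary directory with three files.
dir=$(mktemp -d)
touch "$dir/a.pdf" "$dir/b.zip" "$dir/c.png"

# ls prints one name per line when its output goes to a pipe, so wc -l counts entries.
file_count=$(ls "$dir" | wc -l | tr -d ' ')
echo "files: $file_count"

rm -r "$dir"
```

One caveat: a filename that contains a newline would be counted more than once, so this is fine for a quick look but not something to build on for odd filenames.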
- wc (Unix) | Wikipedia
- wc Commit History | GitHub
- Source Code: wc | GNU.org
- MeCab: Yet Another Part-of-Speech and Morphological Analyzer