Overview

Boy oh boy, where to begin with grep. This one took me down a rabbit hole. Not the examples, or the how-to-use, but the history, which I’ll get to in a moment. grep is one of those tools that I feel has always been there, like ls and cd. It is a command I’ve come to rely on for almost anything I do on the command line. If you haven’t used grep before, let me say this: grep is powerful, like SUPER powerful. It’s so vast and complex that people have written multiple books about it. So before we dive in, remember that this is an overview. It won’t cover everything, because I can’t. But I hope to give you some building blocks that you can use to build greater things.

Now for a Bit of History

The more I dove into the history of grep the more I kept digging, as I found it extremely captivating. You might not be interested in it at all >.> so before we dive into the history, I’m giving you a chance to click on a link to a new section. Jump to usage, or maybe you want to jump to grep alternatives and variants, or you can just skip right to the examples.

You still here? Last chance! Okay…here we go.


The original utility was created by Ken Thompson at Bell Labs in 1973. The man page for it is still available online. grep has a reputation for having been created overnight. That isn’t quite true, but we’ll circle back to it in a moment.

Part of the core functionality of grep came from the ed text editor (also developed by Ken Thompson). The issue with searching for regular expressions in ed is that you couldn’t do it with large files, because ed held the data in memory. Thompson had the idea to strip that functionality out into its own separate program. He kept it to himself and called it s (for search) or something similar (classic Unix command names). When Unix was being heavily developed, Thompson didn’t want to be the benevolent dictator for life. He wanted to make sure that there was a consensus among the other developers about what would be put in the /bin directory. So he kept it to himself, in his own directory.

His boss, Doug McIlroy, came to him and said it would be wonderful if there was a program to search through files. Ken said he would think about it overnight, and would try to come up with something. >.> He took his code and fixed up the bugs he had meant to fix. It was about an hour’s worth of work, and the following day he presented it to Doug McIlroy. Doug was ecstatic, and it was put in the /bin directory.

Afterwards, an actual name for the program needed to be created. This comes back to the ed text editor. In ed, there’s a command g for global. So you can type g/stuff/d and it will search globally, find stuff, and the d is for delete. If you can see where this is going, you can use p for print, and stuff can be any regular expression. So it comes to g/re/p: global / regular expression / print, and thus the name stuck.
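To make the g/re/p idea a bit more concrete, here is a small sketch, assuming a POSIX shell with sed and grep available. It uses sed -n '/re/p' as a stand-in for ed’s g/re/p (sed borrows its command syntax from ed). The file and its contents are made up purely for illustration.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'first line\nsome stuff here\nlast line\n' > "$sample"

# In ed you would open the file and type: g/stuff/p
# sed uses the same /re/p form; -n turns off its default printing.
sed_out=$(sed -n '/stuff/p' "$sample")

# grep does the same job as a standalone command.
grep_out=$(grep 'stuff' "$sample")

echo "$sed_out"
echo "$grep_out"
rm -f "$sample"
```

Both commands print the one line that contains stuff, which is all the name grep really promises.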

Remember, this is for the Unix operating system. This is proprietary software. Now along comes the GNU Project. They want to make a free (as in freedom) and open source replacement for Unix software without copying its proprietary source code. I don’t know much of the history of GNU grep, but I do know that Mike Haertel created the GNU version sometime in 1988. However, there is a mention of grep in the January 1987 issue of GNU’s Bulletin.

Then comes the BSD version of grep. I know, I know. If you’re curious about which version you’re using you can type grep -V (capital V; lowercase -v inverts the match) and it will tell you if you’re using a BSD version or the GNU version. The earliest I can find this version is ~2001 from the OpenBSD repo. The version of grep in FreeBSD comes from the OpenBSD version and from a project called freegrep, created by James Howard. As of this writing, grep on FreeBSD has almost exactly the same functionality as GNU grep, except it’s under a BSD licence instead of the GPL.
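If you’d rather check which flavour you have from a script, something like the following might do. It is only a sketch: it relies on -V printing version information, and the substring checks are my assumption about what typical GNU and BSD version strings look like, so treat the result as a hint rather than gospel.

```shell
# -V prints version information; keep only the first line of it.
version_line=$(grep -V 2>&1 | head -n 1)

# Assumed substrings: BSD builds usually mention "BSD", GNU builds "GNU".
# BSD is checked first because some BSD builds also say "GNU compatible".
case "$version_line" in
  *BSD*) flavour="BSD" ;;
  *GNU*) flavour="GNU" ;;
  *)     flavour="unknown" ;;
esac

echo "grep flavour: $flavour"
echo "reported as: $version_line"
```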

If you want to know more, I recommend you check out this article by Benjamin Rualthanzauva, as he had some correspondence with Ken and a few others. There’s also a wonderful interview with Ken Thompson. I’ve queued it up to the part about grep but the whole interview is fascinating. There’s also a great segment from Computerphile on grep with Brian Kernighan.

Alternatives and Variants

Because of its huge success ੭•̀ω•́)੭̸*✩⁺˚ There are bound to be similar programs trying to solve the same problem in different ways. I’ve put a list of a few different programs that you might want to look into after you’ve tried your hand at grep. Who knows, maybe one of them satisfies a use-case that grep cannot.

  • egrep: stands for Extended grep. This version had a better mechanism when it came to searching for regular expressions. While you can still find egrep on your system, its functionality has been implemented into the standard tool using grep -E.
  • fgrep: stands for Fixed grep. This was a version of grep that focused solely on matching strings, as it did not recognize regular expressions. Similar to egrep the functionality of fgrep has also been incorporated and can be invoked with grep -F.
  • git grep: a git command that is very similar to grep. The key difference is that it will only search through files that are tracked by the git repo. It has other git only features that can be incorporated into the search.
  • ugrep: universal grep. It considers itself a high-performance file system search utility. It also has the ability to search through compressed archives, eliminating the need for tools like zgrep, bzgrep, or xzgrep (I’ll cover these in a different article).
  • ack: a search tool designed for developers. It is best used over large heterogeneous trees of source code.
  • The Silver Searcher: also known as ag. A code searching tool similar to ack, with a focus on speed.
  • ripgrep: or the rg command. It is often the fastest of these tools. Check out the benchmarks on their GitHub page.
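Since egrep and fgrep now live on as grep -E and grep -F, here is a quick sketch of how the two behave, assuming a POSIX shell. The sample file and its contents are invented for the demonstration.

```shell
# Hypothetical sample data.
sample=$(mktemp)
printf 'void\nvoiiid\nvod\na*b\naab\n' > "$sample"

# grep -E (the old egrep): + means "one or more" with no backslash needed.
ere_out=$(grep -E 'voi+d' "$sample")

# grep -F (the old fgrep): the pattern is a literal string, so * is just a star.
fixed_out=$(grep -F 'a*b' "$sample")

# Plain grep treats a*b as "zero or more a, then b", so it matches more lines.
regex_count=$(grep -c 'a*b' "$sample")

echo "$ere_out"
echo "$fixed_out"
echo "$regex_count"
rm -f "$sample"
```

The -E search keeps void and voiiid, the -F search keeps only the literal a*b line, and the plain regex version matches both a*b and aab.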

There are a lot of reasons as to why you would want to choose one of these other tools over grep. Especially if you’re a heavy user of regexps and the command line. Just remember, that if you’re using a default system, grep will always be there while these other tools probably will not. Feel free to use these, but it’s always good to have a foundation, so I encourage you to keep on reading, and try out grep before you jump ship.

GNU grep vs BSD grep

Just like when I covered yes, the GNU version seems to be faster than the version of grep that’s on my macOS machine. According to one user, GNU grep can be up to ten times faster than its macOS counterpart. Wondering why that is? Fortunately, Mike Haertel (author of GNU grep) commented on the FreeBSD mailing list to shed some light on the topic. I recommend reading the entire post, but I’ll quickly sum up why it’s so fast. First, GNU grep doesn’t read every byte! Crazy, right? Second, for every byte that it does read, the machine code executed is very minimal. I’m not sure how this is done, but I’m guessing there are some compilation optimizations in place. Third, it uses the Boyer–Moore string-search algorithm, and unrolls its inner loop. Finally, it uses raw Unix input system calls rather than buffered I/O, and avoids copying data after it has been read. Put that all together, and you have a very quick program!

Usage

Like most *nix tools of those days, grep is able to read data from STDIN as well as file(s), which means it’s useful on its own and within a pipe chain. The basic usage is like this:

grep expression files
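Here is a minimal sketch of both ways of feeding grep, assuming a POSIX shell. The sample file and the words in it are made up for illustration.

```shell
# Hypothetical sample data.
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$sample"

# Reading from a file named on the command line.
from_file=$(grep 'et' "$sample")

# Reading from STDIN at the end of a pipe.
from_pipe=$(printf 'alpha\nbeta\ngamma\n' | grep 'et')

# In the middle of a pipe chain: lines containing "a", then count those with "m".
chained=$(printf 'alpha\nbeta\ngamma\n' | grep 'a' | grep -c 'm')

echo "$from_file"
echo "$from_pipe"
echo "$chained"
rm -f "$sample"
```

When no file is given, grep simply reads whatever arrives on STDIN, which is what makes it so handy inside pipes.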

grep also comes with a multitude of options and there’s a lot to cover.

grep
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
	[-e pattern] [-f file] [--binary-files=value] [--color=when]
	[--context[=num]] [--directories=action] [--label] [--line-buffered]
	[--null] [pattern] [file ...]

To not make this post any longer than it already is, I’ll go over the options in the cover photo above. Here’s a quick refresher in case you didn’t pay attention.

Flag | Description | Explanation/Example
-i | search case insensitive | void == VOID == vOId
-r | recursive search | searches through all files and directories
-l | display only names of files | even if there are multiple matches per file, -l will only return the filename once
-E | interpret pattern as an extended regular expression | if you’re used to regular expressions and prefer using something like -E 'voi+' instead of 'voi\+'. Otherwise the regex chars need to be prepended with a \
-v | selected lines are those not matching | grep -v RUN Dockerfile prints the Dockerfile without any RUN commands
-F | interpret pattern as a fixed string | a*b will be treated exactly how it is; no regex, only instances where those chars appear together
-C | combination of -A and -B | using -C will show lines before and after a match, by default shows two lines, but has an optional number argument
-A | show context after match | using -A 2 would show the line containing the match, and then show the next two lines in the file
-B | show context before match | using -B 2 would show the two lines before the match, and then the line containing the match
-o | prints only the matching part of a line | normally grep will print out the entire line that matches. While not so useful for specific words, it’s very helpful with regexps where the matches will be piped into another utility
-a | treat all files as ASCII text | if binary files match, grep will normally print Binary file ... matches, instead this will force grep to show the match
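The context flags are the ones I find easiest to mix up, so here is a tiny sketch of -A, -B and -C side by side, assuming a POSIX shell. The sample file is invented, and I pass an explicit number to -C because not every grep treats the number as optional.

```shell
# Hypothetical sample data with a single matching line in the middle.
sample=$(mktemp)
printf 'one\ntwo\nthree\nMATCH\nfive\nsix\n' > "$sample"

# -A 1: the match plus one line After it.
after=$(grep -A 1 MATCH "$sample")

# -B 1: one line Before the match, then the match.
before=$(grep -B 1 MATCH "$sample")

# -C 1: one line of Context on both sides.
around=$(grep -C 1 MATCH "$sample")

printf '%s\n---\n%s\n---\n%s\n' "$after" "$before" "$around"
rm -f "$sample"
```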

Examples

If you want to follow along with these examples, you can clone the pkg/rock repo from GitHub.

git clone git@github.com:pkg/rock && cd rock

Let’s start with a basic search for apache in the LICENSE file.

grep apache LICENSE
                        http://www.apache.org/licenses/
       http://www.apache.org/licenses/LICENSE-2.0

That doesn’t look right. We found a few instances with apache in them, but I know this project is under the Apache licence. There should be more instances within the file…like the title. Using the -i flag we can search for apache in any form.

grep -i apache LICENSE
                                 Apache License
                        http://www.apache.org/licenses/
   APPENDIX: How to apply the Apache License to your work.
      To apply the Apache License to your work, attach the following
   Licensed under the Apache License, Version 2.0 (the "License");
       http://www.apache.org/licenses/LICENSE-2.0

Much better! Usually within projects like these, the licence is printed as a header in the files. Searching recursively, we can check all the files to see if it’s displayed. To make sure the match we find is at the top of the file, we can use the -n flag to display line numbers. One more thing, let’s use the --include flag to say we want grep to only search in .go files.

NOTE: Using the -n flag can slow grep down. To report line numbers, grep has to find the newlines in the input, which (at least in GNU grep) disables some of the tricks it uses to skip over data. This is a very small repo, so it’s nothing to worry about, but keep it in mind if you’re searching through gigs of data.

grep -rin apache --include="*.go" *
cmd/search.go:2:// Licensed under the Apache License, Version 2.0 (the "License");
cmd/search.go:6://     http://www.apache.org/licenses/LICENSE-2.0
cmd/tag.go:2:// Licensed under the Apache License, Version 2.0 (the "License");
cmd/tag.go:6://     http://www.apache.org/licenses/LICENSE-2.0
cmd/root.go:2:// Licensed under the Apache License, Version 2.0 (the "License");
cmd/root.go:6://     http://www.apache.org/licenses/LICENSE-2.0
main.go:2:// Licensed under the Apache License, Version 2.0 (the "License");
main.go:6://     http://www.apache.org/licenses/LICENSE-2.0

A bit rough and kind of daunting if we only wanted to see the files that actually contained apache. We can limit the output to just the filenames using the -l option.

grep -ril apache *
LICENSE
cmd/search.go
cmd/tag.go
cmd/root.go
main.go

Okay okay, that’s some of the easy stuff, let’s get into something a little more fun. I want to say right now, that I am not great at regex. So don’t flame me if my basic knowledge isn’t the best, it’ll work for demonstration purposes.

For example, let’s say we wanted to find all the function definitions within this source code. A quick grep -r func * could do it, but that will find inline functions as well. To narrow the search down further, I want to find all the private functions throughout the code. Since Go uses lowercase letters at the beginning of function names to make them private, we can use a simple regex to help us out. Let’s use the -E option as well. For this one, I’m going to also use the --colour=auto flag so you can see what grep is matching.

grep --colour=auto -r -E 'func [a-z][A-Za-z0-9_]*' *
cmd/search.go:func doSearch(p PackageSearchQuery) error {
cmd/search.go:func init() {
cmd/tag.go:func init() {
cmd/tag.go:func tag(repo *git.Repository, major, minor, patch uint64) error {
cmd/tag.go:func deleteTag(repo *git.Repository, tag string) error {
cmd/tag.go:func tagString(major, minor, patch uint64) string {
cmd/root.go:func init() {
cmd/root.go:func initConfig() {
main.go:func main() {

Let’s clean up our results a little bit further, because we only care about the pattern that matches, not the parameters. We can do that by using the -o flag.

grep --colour=auto -ro -E 'func [a-z][A-Za-z0-9_]*' *
cmd/search.go:func doSearch
cmd/search.go:func init
cmd/tag.go:func init
cmd/tag.go:func tag
cmd/tag.go:func deleteTag
cmd/tag.go:func tagString
cmd/root.go:func init
cmd/root.go:func initConfig
main.go:func main

I know that init functions and main functions are often used (well, main has to be…) so I don’t want them in my results. We could do some fancy cut and then do a sort -u, but instead we can use grep -v to remove them. I’m also using grep -e, which allows multiple expressions to be matched. Notice that this drops initConfig too, because that line also contains init.

grep -ro -E 'func [a-z][A-Za-z0-9_]*' * | grep -v -e main -e init
cmd/search.go:func doSearch
cmd/tag.go:func tag
cmd/tag.go:func deleteTag
cmd/tag.go:func tagString
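In the output above, initConfig vanished along with init, since -v -e init throws away any line that contains init anywhere. If you only wanted to drop the exact names, one option is the -w flag, which matches whole words only. This is a sketch rather than what I ran against the repo: the input lines below are a hand-typed subset of the earlier results, fed in with printf.

```shell
# A hand-typed subset of the earlier output, standing in for the real search.
results='cmd/search.go:func doSearch
cmd/search.go:func init
cmd/root.go:func initConfig
main.go:func main'

# Without -w: anything containing "main" or "init" is removed.
loose=$(printf '%s\n' "$results" | grep -v -e main -e init)

# With -w: only whole-word matches count, so initConfig survives.
strict=$(printf '%s\n' "$results" | grep -v -w -e main -e init)

echo "$loose"
echo "---"
echo "$strict"
```

The first filter leaves only doSearch, while the -w version keeps both doSearch and initConfig.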

Exercises for the Reader

Because that sounds more fun than typing out a few more grep commands.

  • The examples above found all the private functions within the code base. See if you can rework the regex so it shows only public-facing functions. Hint: There should only be one.

    Answer
    grep --colour=auto --include='*.go' -roE 'func [A-Z][a-zA-Z0-9_]+' .
    

  • How would you rework grep -ri client. to return only cmd/search.go: resp, err := client.Do(req)? Why is it returning two results?

    Answer
    grep -riF client. *
    

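    With regular expressions the . is considered a wildcard character matching any single character, so client. matches client followed by anything at all. The -F flag tells grep to treat the pattern as a literal string instead. If you get into file globbing, note that the single-character wildcard there is ?, while . is just a literal dot.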

  • Usually, there is a comment above a function to give a brief description of what it does or why it’s there. How would you go about showing code above what is matched? How would you show only comments?

    Answer
    # Above (or before)
    grep -r -B2 -E '^func' .
    
    # Only comments
    grep -rE "^(//|/\*(.*)\*/)" *
    

    Sadly, grep matches one line at a time, so it can’t find comments that span multiple lines. The answer above will find lines starting with // and single-line comments like /* this is a comment */ that begin a line.
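Going back to the second exercise for a moment, here is a small sketch of why the . matters, assuming a POSIX shell. The two input lines are made up; the second one is just my guess at the kind of line that could sneak into the results.

```shell
# Hypothetical input: one real method call and one line that merely contains "clients".
lines='resp, err := client.Do(req)
// clients are cached'

# As a regex, "client." means "client" followed by any single character.
regex_count=$(printf '%s\n' "$lines" | grep -c 'client.')

# -F takes the dot literally.
fixed_count=$(printf '%s\n' "$lines" | grep -cF 'client.')

# Escaping the dot has the same effect without giving up regexes.
escaped_count=$(printf '%s\n' "$lines" | grep -c 'client\.')

echo "$regex_count $fixed_count $escaped_count"
```

The unescaped pattern counts both lines, while the -F and escaped versions count only the one with the actual dot.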

Few Extra Tidbits

Sometimes people use grep in mysterious ways. During my research, I found someone using grep to recover deleted files on their system. This is certainly the first time I’ve seen something like this, but apparently not the first time something like this has been done. Either way, to me it was extraordinary.

Brian Kernighan teaches a course at Princeton and one of the assignments is to take the source code of ed.c and turn it into grep. For those of you brave enough, or in search of a fun challenge, here is a link to the assignment and a link to the ed source code. According to Brian, his students had “a couple of advantages…”

First, they knew what the target was. Somebody had already done grep, so they knew what it was supposed to look like. They only had to replicate that behaviour. And the other thing was, it was written in C. The original grep was written in PDP-11 assembly language. Of course they all had one grave disadvantage… none of them were Ken Thompson.

  Brian Kernighan

Conclusion

I find where grep really shines is in scripts and when parsing data. This is a command I use daily in my job as a software developer, whether it be searching through source code, automating some tasks, or trying to find a random file I forgot about where I can only remember a few key words and I have to search through the entire /home directory…phew! Let me just say this: if you take away anything from this article, it’s that you should just download that image from above. I was unsure of how many examples to provide, and what flags to cover. When I saw that image, I felt like that was the perfect amount. They are all flags that I find myself using, especially when I’m grep-ing through some source code. There will be some corner cases where you’ll need to dive deeper into the usage of grep and regexps! I’ve only scratched the surface of what this wonderful program can do, but I’m hoping this will get you going and you find yourself using it more often.

References