# Searching for Content from the Terminal

Finding content is probably the most useful CLI skill to acquire. It comes in handy constantly as a developer: perhaps you're chasing a bug, or exploring an unfamiliar codebase to figure out how a feature works.

Searches generally fall into two camps: content-based, where you're looking for specific text inside files, and attribute-based, where you're looking for files whose name, size, type, or modification time meets some criteria.

# Universal Approach vs Modern Tooling

The universal approaches for content-based and attribute-based searches are grep and find, respectively. They'll do in a pinch, since they're already installed on almost every Linux system.

However, two newer tools are vying to usurp those classics: ripgrep and fd. Compared to the classic tools, they are both:

  1. Written in Rust, and both projects publish statically linked Linux binaries built against musl rather than glibc. You can copy the same binary onto Ubuntu, CentOS, or Arch and it should just work out of the box, with no installation step or dependencies. That's handy when you lack root access.
  2. Highly concurrent, spreading the work across your CPU cores by default rather than using just one, which makes them faster.
  3. Aware of git repos: by default they skip hidden files and directories such as .git/, as well as anything listed in your .gitignore.
    • No more find . -type f | egrep -v "\.git" | xargs grep -l "search_pattern"
  4. Regex-based by default (as opposed to fixed strings or globs).
  5. Colorized by default.
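To see item 3 in practice, here is a minimal sketch that runs the classic pipeline and ripgrep side by side in a throwaway directory. It assumes rg is on your PATH, and the file names are made up. It uses plain grep -v in place of egrep, which recent GNU grep releases flag as obsolescent. Passing an explicit . as the path keeps rg searching the directory even when its standard input is redirected, as it often is in scripts.

```sh
#!/bin/sh
set -eu
# Throwaway sandbox with hypothetical files, so the demo is self-contained
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
mkdir -p .git src
echo 'search_pattern' > .git/COMMIT_EDITMSG   # lives under .git/, should be skipped
echo 'search_pattern()' > src/main.c          # ordinary source file, should be found

# Classic: list every file, drop anything under .git/, then grep what's left
classic=$(find . -type f | grep -v '/\.git/' | xargs grep -l 'search_pattern')

# ripgrep: hidden directories such as .git/ are skipped without being told
modern=$(rg -l 'search_pattern' .)

echo "classic: $classic"
echo "modern:  $modern"
```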

Although fd does not support the arbitrary boolean expressions that find does, it handles the most common use cases faster and with less typing. Besides, if you really need nested AND, OR, and NOT logic, you're probably writing a script rather than searching ad hoc, and at that point native Python or Bash scripting will likely serve you better than find expressions anyway.

# Attribute-based

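The tool for attribute-based searches is fd. The syntax is:

```sh
fd [pattern] [path] [-flags and options]
```

Does the filename contain this string?

The search pattern is the first positional argument. Strictly speaking it is optional (a bare fd lists every file and directory that isn't hidden or ignored), but it's the argument you'll pass most often. By default it is a regular expression matched against the file name rather than the full path, so you can do searches like '^starts_with' or 'ends_with$', or use . as a wildcard.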

What if I'm only interested in Python files?

You can filter by file extension with the -e option, so if you only want Python files, use -e py. You could instead add the extension to the search pattern with '(...)\.py$', but it's cleaner and more intuitive to keep it separate.

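What files were modified in the last week?

Use the rather verbose --changed-within 1w option. You can specify the duration in minutes (m or min), hours (h), days (d), weeks (w), months (month), or years (y). You can also pass an absolute timestamp in the form 'YYYY-MM-DD HH:MM:SS', quoted because of the space. Both forms are compared against each file's last modification time.

How do I find the old files?

Use the --changed-before 1d option, which matches files last modified more than one day ago. It accepts the same durations and timestamps as --changed-within.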

How do I avoid binaries? Related: how do I find files smaller or larger than X?

Search by size with the -S (--size) option. Text files tend to be small, so a common, if rough, trick for skipping binaries is to match only files of at most 20 kB with -S -20k. Notice the minus sign? It means "at most". To search for files of at least a given size, use a plus sign instead, as in -S +20k. Units are b (bytes), k (kilobytes), m (megabytes), g (gigabytes), and t (terabytes), with ki, mi, gi, and ti for the 1024-based variants.

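How do I search only for files? Directories? Pipes?

Use -t (--type) with a one-letter type: -t f for regular files, -t d for directories, -t x for executables, -t e for empty files or directories, -t l for symlinks, -t s for sockets, and -t p for named pipes. You can repeat the option; since e acts as a filter on the other types, -t e -t f narrows the results to empty regular files.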

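How do I exclude files that match a glob pattern?

Use -E (--exclude) followed by the glob, as in -E '*.min.js'. Quote the glob so your shell doesn't expand it first, and repeat -E to exclude several patterns.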

# Content-based

The tool for content-based searches is ripgrep, invoked as rg on the command line. The syntax is:

```sh
rg [-flags and options] pattern [path]
```

How do I find files that contain a function name?

Just write the function name as the search pattern. By default, the pattern is a regular expression. If you need a literal string, for example so you don't have to escape the parentheses and brackets in a pasted code snippet, use -F (--fixed-strings).
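A sketch showing why -F helps with code snippets: unescaped, the parentheses form a regex group, so the pattern no longer matches the text you pasted. The file and function names are hypothetical, and rg is assumed to be installed:

```sh
#!/bin/sh
set -eu
# Throwaway sandbox with a hypothetical file, so the demo is self-contained
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
echo 'total = print_total(items)' > report.py

# As a regex, "(items)" is a group, so this looks for "print_totalitems"
if rg -q 'print_total(items)' .; then echo "regex: match"; else echo "regex: no match"; fi

escaped=$(rg -l 'print_total\(items\)' .)   # escape the parentheses...
literal=$(rg -F -l 'print_total(items)' .)  # ...or let -F take the pattern literally
printf '%s\n%s\n' "$escaped" "$literal"
```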

How do I include or exclude files from the search?

Use one or more -g (--glob) options, each giving a glob of files to include, or to exclude when prefixed with !. By default, ripgrep also skips hidden files and directories (including .git/) and anything matched by your .gitignore. To search hidden files too, use --hidden; to also search files that your ignore files exclude, use --no-ignore. To ignore additional files, pass a gitignore-formatted file with --ignore-file PATH.
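A sketch of those options in a throwaway directory with hypothetical files, assuming rg is installed. Because the directory isn't a git repository, it uses --ignore-file for the extra rules; by default ripgrep only applies .gitignore rules inside a git repository.

```sh
#!/bin/sh
set -eu
# Throwaway sandbox with hypothetical files, so the demo is self-contained
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
mkdir -p src build
echo '# TODO: refactor' > src/app.py
echo '# TODO: add cases' > src/app_test.py
echo 'TODO=1' > .env                  # hidden file
echo '// TODO: generated' > build/out.js
echo 'build/' > extra-ignore          # gitignore-style rules

py_only=$(rg -l -g '*.py' -g '!*_test.py' TODO .)   # include *.py, then exclude tests
with_hidden=$(rg -l --hidden TODO .)                # also search dotfiles
skip_build=$(rg -l --ignore-file extra-ignore TODO .)
printf '%s\n---\n' "$py_only" "$with_hidden" "$skip_build"
```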

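How can I view more of the surrounding code?

Use the context option -C NUM to print NUM lines before and after each match. ripgrep shows no context by default; use -B NUM or -A NUM for context only before or only after the match.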
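A sketch with a hypothetical five-line file, assuming rg is installed. With -n, rg marks match lines with a colon after the line number and context lines with a dash:

```sh
#!/bin/sh
set -eu
# Throwaway sandbox with a hypothetical file, so the demo is self-contained
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf 'one\ntwo\nneedle\nfour\nfive\n' > notes.txt

around=$(rg -n -C 1 needle notes.txt)   # one line before and one after
after=$(rg -n -A 1 needle notes.txt)    # only the line after
printf '%s\n---\n%s\n' "$around" "$after"
```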

# Combined

When searching through a very large number of files, it is sometimes helpful to filter on both content and attributes. The catch is that fd does not read file paths from standard input, so piping rg's results into fd narrows nothing: fd ignores the pipe and searches the current directory from scratch. Run them the other way around instead. Let fd select the files by attribute (use -t f so directories aren't handed over and searched recursively), then pass them to rg with fd's batch-execute option -X, described below. Once you have perfected your search pattern, add the lowercase L flag -l so rg drops the matching lines and only prints the paths of the matching files. For example:

```sh
fd -t f --changed-within 2d -X rg -l function_name
```

# I have a file list. Now what?

Once you have a list of files, you'll probably want to run some command on them. Maybe you want to open them in vim? Maybe you're trying to delete them from your directory? Or copy them?

Commands differ in how many files they can take per invocation. Copying a file to a new name with cp, or renaming it with mv, takes exactly two arguments: the original path and the new path. For commands like those, use the lowercase option -x COMMAND (--exec). If you have 1000 results, the command runs 1000 times, each time fed a different file from the list. fd appends the path to the end of the command, or substitutes it wherever you write the {} placeholder, as in -x cp {} {}.bak. The runs happen in parallel, so don't count on them finishing in order.

Commands like rm or vim can take the entire file list at once. Rather than launching 1000 times, they launch once and work through all 1000 files, which is more efficient when the command supports it. To pass the entire list at once, use the uppercase option -X COMMAND (--exec-batch).

So, with results A through Z, lowercase -x is equivalent to:

```sh
fd . -x rm
# same as:
rm A
rm B
...
rm Z
```

Uppercase -X, on the other hand, is equivalent to:

```sh
fd . -X rm
# same as:
rm A B ... Z
```

# fd Cheatsheet

  • --changed-before [duration or timestamp] - files older than
  • --changed-within [duration or timestamp] - files newer than
  • -e extension - filter by file extension
  • -E glob_pattern - exclude those files
  • -S [-/+][n][unit] - files at most/at least n units
  • -t [f|d|x|e|l|s|p] - find only files, directories, executables, empty files or directories, symlinks, sockets, or pipes
  • -x COMMAND - execute once per result
  • -X COMMAND - execute once with all results

# rg Cheatsheet

  • -l (lowercase L) - only show matching files, no context
  • -v - invert match
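  • -C n - lines of context to show before and after each match
  • -F LITERAL_STRING - match on literal string
  • -g GLOB_PATTERN - include/exclude (using !) files in the glob pattern
  • --hidden - also search hidden files and directories
  • --no-ignore - don't respect .gitignore and other ignore files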