Searching For and Extracting Data


grep

The grep command searches files for a specified string and prints each line that contains it, prefixed with the filename when more than one file is searched. For a non-text (binary) file, it reports only that the file matches.

  • can also be used to search a single specified file for a specified string
  • search patterns are regular expressions
  • if a filename is not specified, it reads standard input
  • the shell uses some characters for its own purposes, so you may need to enclose the regex in quotes
    • e.g., | or *
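
A minimal sketch of the points above. The file names and contents are made up for illustration, and the files live in a temporary directory that is removed when the shell exits.

```shell
# Set up two small sample files (hypothetical names and contents).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'bird\ncat\ndog\nfish\n' > "$workdir/pets.txt"
printf 'my cat sleeps\nnothing here\n' > "$workdir/notes.txt"

# Search one file for a string; grep prints each matching line.
grep 'cat' "$workdir/pets.txt"

# Search several files; each match is prefixed with its filename.
grep 'cat' "$workdir"/*.txt

# Quote the regex so the shell does not treat | as a pipe.
# -E selects extended regular expressions, where | means "or".
grep -E 'cat|dog' "$workdir/pets.txt"

# With no filename, grep reads standard input.
printf 'one\ntwo\n' | grep 'tw'
```

Single quotes are the safest choice for a regex, because the shell passes everything inside them to grep unchanged.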

find

The find command locates files by searching through a specified directory tree and matching on criteria such as filename and date stamps.

  • tends to be slow because of its brute-force approach: it examines every file in the tree
  • can search multiple directory paths in one command
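
A short sketch of searching by name and by date stamp. The directory layout is hypothetical and is created in a temporary directory for the example.

```shell
# Build a small directory tree to search (hypothetical names).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/docs" "$workdir/logs"
touch "$workdir/docs/notes.txt" "$workdir/logs/app.log"

# Find by filename pattern; quote the pattern so the shell
# does not expand the * before find sees it.
find "$workdir" -name '*.txt'

# Search two directory paths at once for regular files
# modified less than one day ago.
find "$workdir/docs" "$workdir/logs" -type f -mtime -1
```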

wc

The wc command reports basic statistics on text files: the number of lines, words, and bytes.

  • e.g., wc newfile.txt
    • outputs: 37 59 1990 newfile.txt
    • 37 lines
    • 59 words
    • 1,990 bytes
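
To see where the three numbers come from, here is a small sketch on a file whose contents are known. The file below is a made-up sample, not the newfile.txt from the example above.

```shell
# Create a two-line sample file (hypothetical contents).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'one two\nthree\n' > "$workdir/sample.txt"

# Default output: lines, words, bytes, then the filename.
wc "$workdir/sample.txt"

# Individual counts; reading from standard input omits the filename.
wc -l < "$workdir/sample.txt"   # lines
wc -w < "$workdir/sample.txt"   # words
wc -c < "$workdir/sample.txt"   # bytes
```

The sample holds 2 lines, 3 words, and 14 bytes, since each newline counts as one byte.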

cut

The cut command extracts selected fields, or character positions, from each line (record) of a file.

  • frequently used to extract variable information from a file whose contents are highly patterned
  • to use:
    • pass to it one or more options that specify what information you want
    • followed by one or more filenames
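
A minimal sketch of the usage pattern above, with options first and the filename last. The colon-separated file is a made-up sample.

```shell
# Create a highly patterned sample file (hypothetical contents).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'alice:x:1000:/home/alice\nbob:x:1001:/home/bob\n' > "$workdir/users.txt"

# -d sets the field delimiter; -f selects which fields to print.
cut -d ':' -f 1,4 "$workdir/users.txt"

# -c selects character positions instead of fields.
cut -c 1-3 "$workdir/users.txt"
```

The first command prints the name and home directory from each line, still joined by the delimiter; the second prints the first three characters of each line.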

sort

The sort command sorts information in a file.

  • sorts alphabetically by default with no options
  • e.g.,
      $ sort pets.txt
      bird
      cat
      dog
      fish
  • no changes are made to the file's data; only the output is sorted
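
The sketch below checks that the file is left untouched. It assumes a pets.txt whose lines start out unsorted; the contents and the pets_sorted.txt name are made up for illustration.

```shell
# Create an unsorted sample file (hypothetical contents).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'dog\nbird\nfish\ncat\n' > "$workdir/pets.txt"

# Sorted lines go to standard output.
sort "$workdir/pets.txt"

# The file itself still holds the original, unsorted order.
cat "$workdir/pets.txt"

# To keep the sorted result, redirect it to a different file.
sort "$workdir/pets.txt" > "$workdir/pets_sorted.txt"
```

Redirect to a different file rather than back onto the input: the shell empties the output file before sort has read it.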

cat

The cat command displays text files on screen and can concatenate files together.

  • the files themselves are not modified; only the output is combined
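
A short sketch of both uses, display and concatenation. The file names and contents are hypothetical.

```shell
# Create two small sample files (hypothetical names and contents).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'first\n' > "$workdir/a.txt"
printf 'second\n' > "$workdir/b.txt"

# Display a single file on screen.
cat "$workdir/a.txt"

# Concatenate two files; the combined text goes to standard output.
cat "$workdir/a.txt" "$workdir/b.txt"

# Redirect the combined output to save it as a new file.
cat "$workdir/a.txt" "$workdir/b.txt" > "$workdir/combined.txt"
```

After these commands, a.txt and b.txt still hold their original contents; only combined.txt is new.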