Understanding the grep Command in Linux

In Linux, the grep command searches text for lines that match a pattern. It is well suited to processing logs, filtering files, and finding specific strings in a code repository.

1. Basic Syntax

The basic format of the grep command is <span>grep [options] 'search pattern' [file]</span>. For example, to search for the word “linux” in the <span>run.log</span> file, run <span>grep 'linux' run.log</span>. The search is case-sensitive by default, and every line that contains the pattern is printed.
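As a minimal sketch of the basic form, the snippet below builds a small sample log and searches it. The file contents and the `workdir` variable are assumptions made up for illustration, not part of any real log.

```shell
# Create a small sample log in a temporary directory (hypothetical contents).
workdir=$(mktemp -d)
printf '%s\n' \
  'boot: linux kernel loaded' \
  'net: interface up' \
  'user: Linux session started' > "$workdir/run.log"

# Basic form: grep [options] 'search pattern' [file]
# Only the first line is printed, because the default search is case-sensitive.
grep 'linux' "$workdir/run.log"
```

The third line contains “Linux” with a capital letter, so it is not matched here.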

2. Common Options

  1. -i: Ignore case: Matches the pattern regardless of letter case, so uppercase, lowercase, and mixed-case occurrences are all found.

  2. -v: Invert match: Outputs the lines that do not match the search pattern, often used to filter out lines that contain unwanted content.

  3. -n: Show line numbers: Prefixes each matching line with its line number in the file, making the match easy to locate.

  4. -r: Recursive search: Searches the files in the specified directory and all of its subdirectories.

  5. -l: Show only filenames: Outputs only the names of the files that contain a match, without the matching lines, which is useful for quickly seeing which files contain the target string.

  6. -w: Match whole words: Matches the pattern only when it forms a complete word, not part of a longer word, which avoids false matches.

  7. -c: Count matching lines: Outputs the number of matching lines instead of the lines themselves. The count is of lines, not occurrences, so a line that contains the pattern twice is counted once. Running <span>grep -c 'error' run.log</span> prints the number of lines in <span>run.log</span> that contain “error”, such as “25”.

  8. -o: Show only matching parts: Outputs only the matched text, one match per line, rather than the entire line, making it easy to extract specific information. If the file contains “ID: 12345”, running <span>grep -o '[0-9]' run.log</span> outputs each digit on its own line: “1”, “2”, “3”, “4”, “5”. To output the whole number “12345”, use <span>grep -o '[0-9][0-9]*' run.log</span>.

  9. -A num: Displays each matching line and the num lines after it (-A means after), which helps to view the context that follows a match.

  10. -B num: Displays each matching line and the num lines before it (-B means before), the counterpart of <span>-A</span>.

  11. -C num: Displays each matching line and num lines both before and after it (-C means context), combining the effects of <span>-A</span> and <span>-B</span>.
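The options above can be tried on a small sample, as in the sketch below. It assumes GNU grep; the log contents, the `sub/old.log` file, and the `workdir` variable are hypothetical and exist only for illustration.

```shell
# Hypothetical sample data in a temporary directory.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf '%s\n' \
  'INFO start' \
  'ERROR disk full' \
  'info retry' \
  'error: timeout' \
  'WARN errors ignored' \
  'INFO ID: 12345' > "$workdir/run.log"
printf '%s\n' 'error: stale lock' > "$workdir/sub/old.log"

grep -c 'error' "$workdir/run.log"         # count of matching lines: 2
grep -ic 'error' "$workdir/run.log"        # same count, ignoring case: 3
grep -w 'error' "$workdir/run.log"         # whole word only: error: timeout
grep -n 'ID' "$workdir/run.log"            # with line number: 6:INFO ID: 12345
grep -v 'INFO' "$workdir/run.log"          # the four lines without "INFO"
grep -o '[0-9][0-9]*' "$workdir/run.log"   # matched text only: 12345
grep -A 1 'disk' "$workdir/run.log"        # the match plus 1 line after it
grep -B 1 'disk' "$workdir/run.log"        # the match plus 1 line before it
grep -C 1 'timeout' "$workdir/run.log"     # the match plus 1 line on each side
grep -rl 'error' "$workdir"                # names of matching files, searched recursively
```

Options can be combined, as <span>-ic</span> and <span>-rl</span> show. The order in which <span>-r</span> visits files is not guaranteed, so pipe the output through sort if a stable order matters.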

3. Enhanced by Regular Expressions

3.1 Basic Regular Expressions

  • ^: Matches the start of a line

    • Explanation: Anchors the match to the beginning of the line; the pattern matches only when it appears at the start of the line.

    • Example: Running <span>grep '^begin' run.log</span> finds the lines in <span>run.log</span> that start with “begin”.

  • $: Matches the end of a line

    • Explanation: Anchors the match to the end of the line; the pattern matches only when it appears at the end of the line.

    • Example: Running <span>grep 'end$' run.log</span> finds the lines that end with “end”.

  • .: Matches any single character

    • Explanation: Matches any single character, including letters, digits, and symbols.

    • Example: Running <span>grep 'data_in._vld' run.log</span> matches lines containing “data_in1_vld”, “data_int_vld”, “data_inp_vld”, etc.

  • *: Matches the previous character zero or more times

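    • Explanation: Allows the previous character to appear any number of times, including zero.

    • Example: Running <span>grep 'go*gle' run.log</span> matches “ggle”, “gogle”, “goooogle”, etc. The “o” may be absent, but both “g” characters are still required, so “gle” on its own does not match.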

  • [ ]: Matches a set of characters

    • Explanation: Matches any one character within the brackets.

    • Example: Running <span>grep 'data_[abc]' run.log</span> matches the signals “data_a”, “data_b”, and “data_c”, but not “data_d”.

  • [^]: Matches characters not in the set

    • Explanation: The opposite of <span>[]</span>; matches any one character that is not in the set within the brackets.

    • Example: Running <span>grep 'data_[^a-c]' run.log</span> finds “data_” signals other than “data_a”, “data_b”, and “data_c”, such as “data_d”.
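The basic patterns above can be checked against a small sample file. This is a minimal sketch with made-up log contents and a hypothetical `workdir` variable; <span>-o</span> is used in one command so that each individual match is visible.

```shell
# Hypothetical sample data in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'begin simulation' \
  'data_in1_vld asserted' \
  'data_a ready' \
  'data_d ready' \
  'ggle gogle google' \
  'simulation end' > "$workdir/run.log"

grep '^begin' "$workdir/run.log"        # begin simulation
grep 'end$' "$workdir/run.log"          # simulation end
grep 'data_in._vld' "$workdir/run.log"  # data_in1_vld asserted
grep -o 'go*gle' "$workdir/run.log"     # ggle, gogle, google (one per line)
grep 'data_[abc]' "$workdir/run.log"    # data_a ready
grep 'data_[^a-c]' "$workdir/run.log"   # data_in1_vld asserted, data_d ready
```

The last command also prints the “data_in1_vld” line, because the character after “data_” there is “i”, which is outside the a-c range.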

3.2 Extended Regular Expressions

The <span>-E</span> option tells grep to interpret the pattern as an extended regular expression.

  • +: Matches the previous character one or more times

    • Explanation: Requires the previous character to appear at least once; similar to <span>*</span>, but zero occurrences are not allowed.

    • Example: Running <span>grep -E 'go+gle' run.log</span> (the <span>-E</span> option enables extended regex) matches “gogle”, “gooogle”, etc., but not “ggle”.

  • ?: Matches the previous character zero or one time

    • Explanation: The previous character is optional: it appears either once or not at all.

    • Example: Running <span>grep -E 'colou?r' run.log</span> matches both “color” and “colour”.

  • (): Grouping

    • Explanation: Treats the content within the parentheses as a single unit, so that an operator or an alternation with <span>|</span> applies to the whole group.

    • Example: Running <span>grep -E '(red|blue) car' run.log</span> matches lines containing “red car” or “blue car”.

  • {n,m}: Specify the range of occurrences

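    • Explanation: Requires the previous character to appear between n and m times, inclusive.

    • Example: Running <span>grep -E 'a{2,4}' run.log</span> matches lines containing a run of two to four “a” characters, such as “aa”, “aaa”, or “aaaa”. Because grep looks for the pattern anywhere in the line, a line with a longer run of “a” also matches.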
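The extended operators can be tried the same way. The sketch below assumes GNU grep and uses made-up file contents and a hypothetical `workdir` variable; <span>-o</span> is combined with <span>-E</span> to show exactly which text each pattern matches.

```shell
# Hypothetical sample data in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'ggle gogle gooogle' \
  'color colour' \
  'red car' \
  'green car' \
  'blue car' \
  'a aa aaaaa' > "$workdir/run.log"

grep -oE 'go+gle' "$workdir/run.log"          # gogle, gooogle (not ggle)
grep -oE 'colou?r' "$workdir/run.log"         # color, colour
grep -E '(red|blue) car' "$workdir/run.log"   # red car, blue car
grep -oE 'a{2,4}' "$workdir/run.log"          # aa, aaaa
```

In the last command the run of five “a” characters yields a single match of four, the maximum that <span>{2,4}</span> allows; the one remaining “a” is too short to match again.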
