Understanding the grep Command in Linux

In Linux, the grep command searches text line by line and prints every line that matches a given pattern. It is useful for processing logs, filtering files, and finding specific strings in a code repository.

1. Basic Syntax

The basic format of the grep command is: grep [options] 'search pattern' [file]. For example, to search for the word “linux” in the run.log file, run: grep 'linux' run.log.
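
The basic form can be tried out with a small script. This is a minimal sketch: the log contents and the scratch directory are made up for illustration, and only the grep call itself comes from the text above.

```shell
# Hypothetical sample data, written to a scratch directory so no real log is touched.
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'booting linux kernel' 'mounting disks' 'linux ready' > "$log"

# Print every line of the file that contains the string "linux".
matches=$(grep 'linux' "$log")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Only the first and third sample lines contain “linux”, so those are the two lines printed.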

2. Common Options

-i: Ignore case: Matches without distinguishing between uppercase and lowercase letters, so the target string is found whether it is written in uppercase, lowercase, or mixed case.
-v: Invert match: Outputs the lines that do not match the search pattern, often used to filter out lines that contain unwanted content.
-n: Show line numbers: Prefixes each matching line with its line number in the file, making it easy to locate the target content.
-r: Recursive search: Searches the files in the specified directory and all of its subdirectories, traversing the entire directory tree to find the target string.
-l: Show only filenames: Outputs only the names of the files that contain a match, without the matching lines themselves, which is a quick way to see which files contain the target information.
-w: Match whole words: Matches the pattern only when it forms a complete word, so that 'error' matches “error” but not “errors”.
-c: Count matching lines: Outputs the number of matching lines instead of the lines themselves. It counts lines, not occurrences, so a line that contains the pattern twice is counted once. For example, grep -c 'error' run.log prints a number such as “25”, meaning that 25 lines contain “error”.
-o: Show only matching parts: Outputs only the matched text, one match per line, rather than the entire line, which makes it easy to extract specific information. If the file contains “ID: 12345”, grep -o '[0-9]' run.log prints the digits “1”, “2”, “3”, “4”, “5” on separate lines.
-A num: Displays each matching line and the num lines after it (-A means after), which helps to view what follows the match.
-B num: Displays each matching line and the num lines before it (-B means before), the counterpart of -A.
-C num: Displays each matching line and num lines both before and after it (-C means context), combining -A and -B.
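
The options above can be compared side by side on one small file. The sketch below assumes a made-up five-line log; the variable names and file contents are illustrative only.

```shell
# Hypothetical sample log in a scratch directory (not a real run.log).
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' \
  'INFO start' \
  'error: disk full' \
  'Error: retry failed' \
  'INFO no errors found' \
  'INFO done ID: 12345' > "$log"

count=$(grep -c 'error' "$log")          # number of lines containing lowercase "error"
count_i=$(grep -ci 'error' "$log")       # same count, ignoring case
numbered=$(grep -n 'error' "$log")       # matching lines prefixed with line numbers
whole=$(grep -w 'error' "$log")          # "error" as a whole word, so "errors" is skipped
not_info=$(grep -v 'INFO' "$log")        # lines that do NOT contain "INFO"
digits=$(grep -o '[0-9]' "$log")         # every digit on its own line
number=$(grep -o '[0-9][0-9]*' "$log")   # each run of digits as a single match
after=$(grep -A 1 'disk full' "$log")    # the match plus 1 line after it
before=$(grep -B 1 'disk full' "$log")   # the match plus 1 line before it
files=$(grep -rl 'error' "$tmpdir")      # names of files under tmpdir that contain a match

printf '%s\n' "$count" "$count_i" "$whole" "$number" "$files"
rm -rf "$tmpdir"
```

With this sample, -c reports 2 and -ci reports 3, because “Error” only matches when case is ignored. The pattern '[0-9][0-9]*' is one way to get the whole number “12345” from -o rather than five separate digits.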

3. Enhanced by Regular Expressions

3.1 Basic Regular Expressions

^: Matches the start of a line

Explanation: Matches the beginning position of the line; only when the target string appears at the start of the line will it be matched.
Example: Running grep '^begin' run.log finds the lines in run.log that start with “begin”.

$: Matches the end of a line

Explanation: Matches the end position of the line; only when the target string appears at the end of the line will it be matched.
Example: Running grep 'end$' run.log finds the lines that end with “end”.
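
Both anchors can be checked with a short script. This is a sketch on invented sample lines, where the words “begin” and “end” appear at different positions.

```shell
# Hypothetical sample lines; "begin" and "end" each appear at the start of one line
# and elsewhere in another.
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'begin transfer' 'transfer will begin' 'transfer end' 'end of log' > "$log"

starts=$(grep '^begin' "$log")   # only lines whose first characters are "begin"
ends=$(grep 'end$' "$log")       # only lines whose last characters are "end"

printf '%s\n' "$starts" "$ends"
rm -rf "$tmpdir"
```

The line “transfer will begin” contains “begin” but not at the start, and “end of log” contains “end” but not at the end, so neither is printed.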

.: Matches any single character

Explanation: Can match any single character, including letters, numbers, symbols, etc.
Example: Running grep 'data_in._vld' run.log matches lines containing strings such as “data_in1_vld”, “data_int_vld”, or “data_inp_vld”.
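
The following sketch shows the dot matching exactly one character, and how a backslash turns it into a literal dot. The signal names are hypothetical sample data.

```shell
# Hypothetical signal names: zero, one, or two characters between "data_in" and "_vld".
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'data_in1_vld' 'data_int_vld' 'data_in._vld' 'data_in_vld' 'data_in12_vld' > "$log"

any_char=$(grep 'data_in._vld' "$log")    # "." stands for exactly one arbitrary character
literal=$(grep 'data_in\._vld' "$log")    # "\." matches only a real dot

printf '%s\n' "$any_char"
printf '%s\n' "$literal"
rm -rf "$tmpdir"
```

The unescaped pattern matches the first three sample lines. It skips “data_in_vld” and “data_in12_vld” because they have zero and two characters in that position.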

*: Matches the previous character zero or more times

Explanation: Allows the previous character to appear any number of times, including zero.
Example: Running grep 'go*gle' run.log matches “ggle”, “gogle”, “goooogle”, etc. The “g” before and after the “o*” must still be present, so “gle” on its own does not match.
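
A quick way to see which strings 'go*gle' accepts is to run it over a few candidates. The candidate list below is invented for this sketch.

```shell
# Hypothetical candidate strings, one per line.
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'ggle' 'gogle' 'goooogle' 'gle' 'gxgle' > "$log"

# "o*" allows zero or more "o" characters between the two "g"s.
star=$(grep 'go*gle' "$log")

printf '%s\n' "$star"
rm -rf "$tmpdir"
```

Here “gle” is rejected because the pattern requires two “g” characters. “gxgle” is rejected because only “o” may appear between them.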

[ ]: Matches a set of characters

Explanation: Matches any one character within the brackets.
Example: Running grep 'data_[abc]' run.log matches the “data_a”, “data_b”, and “data_c” signals, but not “data_d”.

[^]: Matches characters not in the set

Explanation: The opposite of [ ]: matches any one character that is not in the set within the brackets.
Example: Running grep 'data_[^a-c]' run.log finds “data_” signals other than “data_a”, “data_b”, and “data_c”, such as “data_d”.
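
The two kinds of bracket expression can be compared on the same input. This is a sketch with made-up signal names, including a bare “data_” line to show that a negated set still needs a character to match.

```shell
# Hypothetical signal names, plus a bare "data_" with nothing after the underscore.
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'data_a' 'data_b' 'data_c' 'data_d' 'data_' > "$log"

in_set=$(grep 'data_[abc]' "$log")       # next character must be a, b, or c
not_in_set=$(grep 'data_[^a-c]' "$log")  # next character must exist and must not be a, b, or c

printf '%s\n' "$in_set"
printf '%s\n' "$not_in_set"
rm -rf "$tmpdir"
```

The bare “data_” line matches neither pattern, since [^a-c] consumes one character rather than matching “nothing”.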

3.2 Extended Regular Expressions

The -E option tells grep to interpret the pattern as an extended regular expression, in which +, ?, |, ( ), and { } act as operators without needing a backslash.

+: Matches the previous character one or more times

Explanation: Requires the previous character to appear at least once; similar to * but without the zero-occurrence case.
Example: Running grep -E 'go+gle' run.log (note the -E option, which enables extended regex) matches “gogle”, “gooogle”, etc., but not “ggle”.
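
The difference from * shows up on a string with no “o” at all. The sketch below uses invented sample strings and also shows 'goo*gle', an equivalent pattern that works without -E.

```shell
# Hypothetical candidate strings with zero, one, and three "o" characters.
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'ggle' 'gogle' 'gooogle' > "$log"

plus=$(grep -E 'go+gle' "$log")   # extended syntax: one or more "o"
basic=$(grep 'goo*gle' "$log")    # basic syntax: one "o" followed by zero or more "o"

printf '%s\n' "$plus"
rm -rf "$tmpdir"
```

Both commands print “gogle” and “gooogle” and skip “ggle”.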

?: Matches the previous character zero or one time

Explanation: The previous character either appears once or does not appear, i.e., the occurrence count is 0 or 1.
Example: Running grep -E 'colou?r' run.log matches both “color” and “colour”.
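
A small sketch of the optional character, using invented spellings to show what is and is not accepted:

```shell
# Hypothetical spellings: zero, one, and two "u" characters, plus one missing the "o".
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'color' 'colour' 'colouur' 'colr' > "$log"

# "u?" allows the "u" to be absent or to appear exactly once.
optional=$(grep -E 'colou?r' "$log")

printf '%s\n' "$optional"
rm -rf "$tmpdir"
```

“colouur” is rejected because ? permits at most one “u”, and “colr” is rejected because the “o” before the optional “u” is still required.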

() : Grouping

Explanation: Treats the content within the parentheses as a single unit, so that an operator such as +, ?, or | applies to the whole group. Inside the group, | separates alternatives.
Example: Running grep -E '(red|blue) car' run.log matches lines containing “red car” or “blue car”.
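
Running the pattern with and without the parentheses shows why the grouping matters. The sample lines are hypothetical.

```shell
# Hypothetical sample lines.
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'a red car' 'a blue car' 'a green car' 'red paint' > "$log"

grouped=$(grep -E '(red|blue) car' "$log")   # "red car" or "blue car"
ungrouped=$(grep -E 'red|blue car' "$log")   # "red" or "blue car"

printf '%s\n' "$grouped"
printf '%s\n' "$ungrouped"
rm -rf "$tmpdir"
```

Without the parentheses, the alternation splits the whole pattern into “red” and “blue car”, so the “red paint” line is matched as well.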

{n,m}: Specify the range of occurrences

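Explanation: Indicates that the previous character appears between n and m times (inclusive).
Example: Running grep -E 'a{2,4}' run.log matches lines containing “aa”, “aaa”, or “aaaa”. Because grep looks for the pattern anywhere in the line, a line with a longer run such as “aaaaa” also matches.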
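
The sketch below illustrates the range on invented sample lines. It also uses the -x option, which is not covered above and makes grep require the pattern to match the whole line.

```shell
# Hypothetical sample lines with runs of 1, 2, 4, and 5 "a" characters,
# plus "banana", whose "a"s are never adjacent.
tmpdir=$(mktemp -d)
log="$tmpdir/run.log"
printf '%s\n' 'a' 'aa' 'aaaa' 'aaaaa' 'banana' > "$log"

anywhere=$(grep -E 'a{2,4}' "$log")       # a run of 2 to 4 "a"s anywhere in the line
whole_line=$(grep -E -x 'a{2,4}' "$log")  # the entire line must be 2 to 4 "a"s

printf '%s\n' "$anywhere"
printf '%s\n' "$whole_line"
rm -rf "$tmpdir"
```

The first command also prints “aaaaa”, since that line contains a run of four. The second prints only “aa” and “aaaa”.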