Detailed Explanation of the Linux grep Command


Source: ggjucheng

Link: http://www.cnblogs.com/ggjucheng/archive/2013/01/13/2856896.html

Introduction

grep (global search regular expression (RE) and print out the line) is a powerful text search tool that can search text using regular expressions and print out the matching lines.

The grep family in Unix includes grep, egrep, and fgrep, and the latter two differ only slightly from grep. egrep is an extended version of grep that supports more RE metacharacters. fgrep stands for fixed grep or fast grep: it treats every character as a literal, so regular expression metacharacters lose their special meaning. Linux uses the GNU version of grep, which is more powerful and selects the behavior through command-line options: -G for basic regular expressions (the default), -E for the egrep behavior, and -F for the fgrep behavior.

Common Usage of grep

[root@www ~]# grep [-acinv] [--color=auto] 'search string' filename

Options and parameters:

-a: Search binary files as text files

-c: Count the lines that contain 'search string' (matching lines, not individual occurrences)

-i: Ignore case differences, treating upper and lower case as the same

-n: Output line numbers as well

-v: Invert the selection, displaying lines that do not contain 'search string'

--color=auto: Highlight the matched keywords
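The options above can be tried safely on a throwaway file. The following is only a sketch: the file contents and the `sample` variable are invented for this illustration and do not come from the original article.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'root root' 'Root user' 'guest' > "$sample"

grep -c 'root' "$sample"      # prints 1: -c counts matching lines, not occurrences
grep -ci 'root' "$sample"     # prints 2: -i also matches "Root"
grep -n 'guest' "$sample"     # prints 3:guest
grep -v -i 'root' "$sample"   # prints guest: the only line without root in any case
```

Note that the first line contains root twice, yet -c still reports 1.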

Extract lines containing root from /etc/passwd

# grep root /etc/passwd

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

or

# cat /etc/passwd | grep root

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

Extract lines containing root from /etc/passwd and display their line numbers

# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
30:operator:x:11:0:operator:/root:/sbin/nologin

For keyword highlighting, grep can use --color=auto to highlight the matched part. This is a very nice feature! However, typing --color=auto every time you use grep is cumbersome, and this is where an alias helps. Add the line alias grep='grep --color=auto' to ~/.bashrc, then run source ~/.bashrc to make it take effect immediately. From then on, every grep you run will highlight its matches automatically!

Extract lines from /etc/passwd that do not contain root

# grep -v root /etc/passwd
# Prints every line of /etc/passwd except those containing root, such as the root and operator lines shown above.

Extract lines from /etc/passwd that contain neither root nor nologin

# grep -v root /etc/passwd | grep -v nologin
# The first grep drops every line containing root; the second then drops every remaining line containing nologin.
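As a minimal sketch of the exclusion above, the two chained filters can also be written as a single grep with repeated -e options. The passwd-style lines below (including the user alice) are invented for this illustration and are not the real /etc/passwd.

```shell
# Hypothetical passwd-style data, made up for this demonstration.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'alice:x:1000:1000::/home/alice:/bin/bash' > "$sample"

# Two chained filters: drop lines with root, then drop lines with nologin.
grep -v 'root' "$sample" | grep -v 'nologin'

# One invocation: -e may be repeated, and -v rejects a line that matches ANY pattern.
grep -v -e 'root' -e 'nologin' "$sample"
```

Both commands print only the alice line, because the operator line contains root (in /root) and the daemon line contains nologin.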

Use dmesg to list kernel information, then use grep to find lines containing eth, highlighting the captured keywords and adding line numbers:

[root@www ~]# dmesg | grep -n --color=auto 'eth'
247:eth0: RealTek RTL8139 at 0xee846000, 00:90:cc:a6:34:84, IRQ 10
248:eth0: Identified 8139 chip type 'RTL-8139C'
294:eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
305:eth0: no IPv6 routers present
# You will find that besides eth being highlighted, the line numbers are also displayed!


Use dmesg to list kernel information, then use grep to find lines containing eth, also capturing the two lines before and the three lines after each match:

[root@www ~]# dmesg | grep -n -A3 -B2 --color=auto 'eth'
245-PCI: setting IRQ 10 as level-triggered
246-ACPI: PCI Interrupt 0000:00:0e.0[A] -> Link [LNKB] ...
247:eth0: RealTek RTL8139 at 0xee846000, 00:90:cc:a6:34:84, IRQ 10
248:eth0: Identified 8139 chip type 'RTL-8139C'
249-input: PC Speaker as /class/input/input2
250-ACPI: PCI Interrupt 0000:00:01.4[B] -> Link [LNKB] ...
251-hdb: ATAPI 48X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache, UDMA(66)
# As shown, the two lines before and the three lines after the match are also displayed, so you can capture the surrounding data for analysis! Matching lines are marked with ':' after the line number and context lines with '-'.
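The context options can be reproduced without dmesg. This sketch uses an invented six-line log (the `log` variable and its contents are assumptions for illustration) so the effect of -B and -A is easy to see.

```shell
# Hypothetical log file, created only for this demonstration.
log=$(mktemp)
printf '%s\n' 'line one' 'line two' 'eth0: link up' \
  'line four' 'line five' 'line six' > "$log"

# One line of context before the match, two lines after it.
grep -n -B1 -A2 'eth' "$log"
# 2-line two
# 3:eth0: link up
# 4-line four
# 5-line five
```

When the same amount of context is wanted on both sides, -C N is shorthand for -A N -B N.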

Recursively search directories based on file content

# grep 'energywise' *           # Search for files containing 'energywise' in the current directory

# grep -r 'energywise' *        # Search for files containing 'energywise' in the current directory and its subdirectories

# grep -l -r 'energywise' *     # Search for files containing 'energywise' in the current directory and its subdirectories, but do not display matching lines, only show matching files

These commands are very useful and are powerful tools for file searching.
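The difference between the plain and the recursive search can be sketched in a scratch directory. The directory layout and file names below are invented for this illustration.

```shell
# Hypothetical directory tree, created only for this demonstration.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'energywise enabled\n' > "$dir/a.conf"
printf 'nothing here\n'       > "$dir/b.conf"
printf 'energywise level 5\n' > "$dir/sub/c.conf"

# Without -r, only the files named on the command line are examined.
grep -l 'energywise' "$dir"/*.conf

# With -r, grep descends into sub/ as well; -l prints file names only.
grep -rl 'energywise' "$dir"
```

Passing a directory to -r, rather than *, also covers hidden files, which the shell does not expand from *.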

grep and Regular Expressions

Character Classes

Character class search: If I want to search for the words test or taste, I can find that they share a common pattern ‘t?st’. In this case, I can search like this:

[root@www ~]# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.

Actually, no matter how many characters are listed inside [], the brackets stand for exactly one character, so the example above matches only the strings 'tast' or 'test'!

Character class negation [^]: If I want to search for lines containing oo but do not want g before oo, as shown below:

[root@www ~]# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

Lines 2 and 3 are fine because foo and Foo are both acceptable! However, line 18 contains google’s goo, but don’t forget that it also has the word tool’s too! Therefore, this line is also listed. This means that even though line 18 contains the unwanted item (goo), it also contains the needed item (too), so it meets the string search criteria!

As for line 19, similarly, because goooooogle contains oo, which may be preceded by o, for example: go(ooo)oogle, this line also meets the requirement!
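To see which part of a line actually satisfied the pattern, the -o option prints only the matched text. This sketch assumes a grep that supports -o (GNU grep does); the `line` variable simply holds line 18 from the listing above.

```shell
# Line 18 of the example file, stored in a variable for the demonstration.
line='google is the best tools for search keyword.'

# -o prints only the matched portion instead of the whole line.
printf '%s\n' "$line" | grep -o '[^g]oo'
# too
```

The output confirms that the line is selected because of "too" in tools, not because of "goo" in google.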

Character class continuity: Now, suppose I do not want lowercase letters before oo, I can write [^abcd….z]oo, but this seems inconvenient. Since lowercase letters are sequential in ASCII encoding, we can simplify it as follows:

[root@www ~]# grep -n '[^a-z]oo' regular_express.txt
3:Football game is not use feet only.

This means that when we have a set of characters that are continuous, such as uppercase letters/lowercase letters/numbers, we can use [a-z], [A-Z], [0-9], etc. If our required string includes both numbers and letters, we can write them all together as [a-zA-Z0-9].
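Ranges such as [a-z] rely on the character ordering of the current locale, so their result can differ outside the ASCII ordering described above. A hedged sketch of two ways to make the intent explicit follows: forcing the C locale, or using the POSIX class [:lower:]. The sample lines are invented for this illustration.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'Football game' 'apple is my favorite food.' '3oo units' > "$sample"

# Force ASCII ordering so that a-z means exactly the 26 lowercase letters.
LC_ALL=C grep -n '[^a-z]oo' "$sample"

# The POSIX class expresses "not a lowercase letter" without relying on a range.
grep -n '[^[:lower:]]oo' "$sample"
```

Both commands select lines 1 and 3: the oo is preceded by an uppercase F and by the digit 3, while food is rejected because f is lowercase.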

To obtain lines containing numbers, we can do this:

[root@www ~]# grep -n '[0-9]' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

Start and End of Line Characters ^ $

Start of line character: If I want to list the word the only at the start of the line, I need to use positioning characters! We can do this:

[root@www ~]# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.

At this point, only line 12 remains because only line 12 starts with the word the. Additionally, if I want to list lines that start with lowercase letters, I can do this:

[root@www ~]# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

If I do not want lines that start with letters, I can do this:

[root@www ~]# grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

The ^ symbol has different meanings inside and outside of character class symbols ([])! Inside [] it represents negation, while outside [] it represents positioning at the start of the line!
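The two roles of ^ can be compared side by side. This is a small sketch on an invented three-line file, not the article's regular_express.txt.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'the start' 'In the middle' '# comment' > "$sample"

grep -n '^the' "$sample"          # 1:the start     (^ outside []: start of line)
grep -n '^[^a-zA-Z]' "$sample"    # 3:# comment     (first character is not a letter)
grep -n '[^a-zA-Z]the' "$sample"  # 2:In the middle (^ inside []: negation only)
```

The last pattern does not match line 1, because a negated class still has to consume one character before the, and there is none at the start of the line.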

If I want to find lines that end with a period (.):

[root@www ~]# grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.

Note that because the period has other meanings (which will be introduced later), we must use the escape character (\) to remove its special meaning!

Find blank lines:

[root@www ~]# grep -n '^$' regular_express.txt
22:

Because there is only the start and end of the line (^$), this can find blank lines!
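A common use of the ^$ pattern is stripping blank lines and comments from a configuration file. The sketch below shows that idea on an invented file; the `conf` variable and the PORT and USER settings are assumptions for illustration.

```shell
# Hypothetical configuration file, created only for this demonstration.
conf=$(mktemp)
printf '%s\n' '# a comment' '' 'PORT=22' '' 'USER=admin' > "$conf"

grep -c '^$' "$conf"                  # prints 2: the number of blank lines
grep -v '^$' "$conf" | grep -v '^#'   # prints PORT=22 and USER=admin
```

Lines that contain only spaces or tabs are not matched by ^$, since they are not truly empty.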

Any Character . and Repeated Character *

These two symbols have the following meanings in regular expressions:

. (dot): Represents 'exactly one arbitrary character'

* (asterisk): Represents 'repeat the preceding character 0 to infinitely many times'; it always works in combination with the character before it

Suppose I need to find the string g??d, which has four characters, starting with g and ending with d. I can do this:

[root@www ~]# grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.
16:The world <Happy> is the same with "glad".

The pattern requires exactly two characters between g and d, so lines 13 and 14 (god and gd) are not listed!

If I want to list data with oo, ooo, oooo, etc., meaning at least two o's, how should I do it? Since * means 'zero or more of the preceding RE character', 'o*' matches an empty string as well as one or more o's, so grep -n 'o*' regular_express.txt prints every line to the terminal!

When we need ‘at least two o’s’, we need ooo*, which means:

[root@www ~]# grep -n 'ooo*' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

If I want a string that starts and ends with g and has only o's in between, at least one of them, such as gog, goog, gooog, etc., how should I do it?

[root@www ~]# grep -n 'goo*g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!

If I want to find lines containing a string that starts with g and ends with g, with any characters in between:

[root@www ~]# grep -n 'g.*g' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

The pattern means a g at the start, a g at the end, and any characters (or none at all) in between, so lines 1, 14, and 20 are acceptable too!

If I want to find lines containing a number of any length, meaning one or more consecutive digits, it becomes:

[root@www ~]# grep -n '[0-9][0-9]*' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.
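The behavior of . and * can be checked on a handful of short words. This is only a sketch; the word list is invented so that every case (gd, god, good, glad) appears on its own line.

```shell
# Hypothetical word list, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'gd' 'god' 'good' 'glad' 'goooood' > "$sample"

grep -n 'g..d' "$sample"    # 3:good 4:glad  (exactly two characters between g and d)
grep -n 'goo*d' "$sample"   # 2:god 3:good 5:goooood  (one o, then zero or more o's)
grep -c 'o*' "$sample"      # prints 5: zero o's is allowed, so every line matches
```

The last command illustrates why a pattern consisting only of a starred character is rarely useful on its own.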

Limiting the Range of RE Character Repetitions {}

We can use . together with * to allow anywhere from 0 to infinitely many repetitions, but what if I want to limit the number of repetitions to a range?

For example, how do I find strings with two to five consecutive o's? In this case, I need the range-limiting characters {}. However, in the basic regular expressions that grep uses by default, an unescaped { or } is just an ordinary character, so they must be written as \{ and \} to act as a repetition count. For example, to find two consecutive o's:

[root@www ~]# grep -n 'o\{2\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

If we want to find g followed by 2 to 5 o’s, then followed by a g, it would be like this:

[root@www ~]# grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.

If I want at least 2 o’s in goooo….g, besides using gooo*g, I can also do:

[root@www ~]# grep -n 'go\{2,\}g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
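The repetition ranges can be verified on words with a known number of o's. The following sketch uses an invented list containing one, two, five, and six o's, and also shows the equivalent unescaped form under -E.

```shell
# Hypothetical word list: 1, 2, 5, and 6 o's between the two g's.
sample=$(mktemp)
printf '%s\n' 'gog' 'goog' 'gooooog' 'goooooog' > "$sample"

grep -n 'go\{2,5\}g' "$sample"   # 2:goog 3:gooooog  (basic RE: escaped braces)
grep -nE 'go{2,5}g' "$sample"    # same result       (extended RE: plain braces)
grep -n 'go\{2,\}g' "$sample"    # 2:goog 3:gooooog 4:goooooog  (no upper limit)
```

The word with six o's is rejected by {2,5} because both surrounding g's are part of the pattern, so the run of o's cannot be shortened to fit.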

Extended grep (grep -E or egrep):

The main benefit of using extended grep is the addition of extra regular expression metacharacters.

Print all lines containing NW or EA. If not using egrep, but grep, there will be no results.

# egrep 'NW|EA' testfile     
northwest       NW      Charles Main        3.0     .98     3       34
eastern         EA      TB Savage           4.4     .84     5       20

Standard grep (without -E) treats |, +, and ? as ordinary characters. GNU grep, however, gives them their extended meaning in basic mode when they are preceded by \, so the following also works:

# grep 'NW\|EA' testfile
northwest       NW      Charles Main        3.0     .98     3       34
eastern         EA      TB Savage           4.4     .84     5       20
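The difference between the basic and extended treatment of | can be sketched with a file that also contains the literal text NW|EA. The three sample lines are invented for this illustration and are not the article's testfile.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'northwest NW' 'eastern EA' 'literal NW|EA' > "$sample"

grep -c 'NW|EA' "$sample"     # prints 1: in basic mode | is an ordinary character
grep -cE 'NW|EA' "$sample"    # prints 3: with -E, | means "NW or EA"

# GNU extension: \| acts as alternation in basic mode (not guaranteed by POSIX).
grep -c 'NW\|EA' "$sample"
```

For scripts that must run on other systems, grep -E is the more portable spelling.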

Search for all lines containing one or more 3’s.

# egrep '3+' testfile
# grep -E '3+' testfile
# grep '3\+' testfile        
# These 3 commands will return
northwest       NW      Charles Main          3.0     .98     3       34
western         WE      Sharon Gray           5.3     .97     5       23
northeast       NE      AM Main Jr.           5.1     .94     3       13
central         CT      Ann Stephens          5.7     .94     5       13

Search for all lines containing a 2, followed by zero or one decimal point, followed by a digit.

# egrep '2\.?[0-9]' testfile
# grep -E '2\.?[0-9]' testfile
# grep '2\.\?[0-9]' testfile
# Each pattern matches the character 2, followed by 0 or 1 periods, and then a digit between 0 and 9. In basic mode the ? must be written as \? (a GNU extension).
western         WE       Sharon Gray          5.3     .97     5       23
southwest       SW      Lewis Dalsass         2.7     .8      2       18
eastern         EA       TB Savage             4.4     .84     5       20

Search for one or more consecutive no’s.

# egrep '(no)+' testfile
# grep -E '(no)+' testfile
# grep '\(no\)\+' testfile   # These 3 commands return the same results:
northwest       NW      Charles Main        3.0     .98     3       34
northeast       NE       AM Main Jr.        5.1     .94     3       13
north           NO      Margot Weber        4.5     .89     5       9

Without Using Regular Expressions

fgrep queries are faster than grep commands but are less flexible: it can only find fixed text, not regular expressions.

If you want to find lines containing the asterisk character in a file or output:

fgrep  '*' /etc/profile
for i in /etc/profile.d/*.sh ; do

or
grep -F '*' /etc/profile
for i in /etc/profile.d/*.sh ; do
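The effect of -F on metacharacters can be sketched on an invented file, so the result does not depend on the contents of /etc/profile. The lines a.b and axb are assumptions chosen to show how the dot is treated.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'for i in /etc/profile.d/*.sh ; do' 'a.b' 'axb' > "$sample"

grep -F '*' "$sample"       # prints the first line: * is a literal asterisk
grep -c 'a.b' "$sample"     # prints 2: as a regex, . matches any character
grep -cF 'a.b' "$sample"    # prints 1: with -F, only a literal dot matches
```

This makes -F a safe choice when the search text comes from user input and may contain regex metacharacters.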
