Recently, I discovered some interesting tools and couldn’t wait to share them with everyone.
How do you usually check <span>Linux</span> logs? Personally, I often reach for the classic commands <span>tail</span>, <span>head</span>, <span>cat</span>, <span>sed</span>, <span>more</span>, and <span>less</span>, or tools like <span>awk</span> for data filtering; together they get the job done efficiently. What bothers me, though, is the sheer number of command options and rules that are hard to remember.
Is there a universal way to check logs, such as using SQL queries? After all, this is an expression that programmers are quite familiar with.
The tool I’m sharing today, q, allows you to query and analyze text content using SQL syntax. Let’s take a look at what makes this tool so amazing.
Setting Up the Environment
q is a command-line tool that lets us run SQL queries on any file or on the output of another command. For example, you can execute SQL statements directly on the result set of <span>ps -ef</span>, the command that lists processes.
The principle is that text acts as a database table. Well, this is my own interpretation, haha.
It treats ordinary files or result sets as database tables and supports almost all SQL constructs, such as <span>WHERE</span>, <span>GROUP BY</span>, and <span>JOIN</span>s. It also supports automatic detection of column names and types, as well as cross-file join queries (both detailed later), and it handles multiple encodings.
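The "text as a table" principle can be sketched in a few lines of Python: load whitespace-delimited lines into an in-memory SQLite table with generated column names (c1, c2, ...), then run ordinary SQL against it. This is only an illustration of the idea, not q's actual implementation, and the sample log lines are made up.

```python
import sqlite3

# Sample whitespace-delimited "log file" content, as q would see it.
lines = [
    "alice 200 OK",
    "bob   404 NOT_FOUND",
    "alice 500 ERROR",
]

conn = sqlite3.connect(":memory:")
# Name the columns c1, c2, c3 the way q does when no header is present;
# SQLite's type affinity stores '404' in an INTEGER column as a number.
conn.execute("CREATE TABLE log (c1 TEXT, c2 INTEGER, c3 TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)",
                 (line.split() for line in lines))

# Ordinary SQL now works on the "file".
rows = conn.execute(
    "SELECT c1, COUNT(*) FROM log WHERE c2 >= 400 "
    "GROUP BY c1 ORDER BY c1"
).fetchall()
print(rows)  # [('alice', 1), ('bob', 1)]
```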
Installation is quite simple. On <span>Linux CentOS</span> you can complete it in just three steps, and on <span>Windows</span> you only need to install an <span>exe</span> file.
wget https://github.com/harelba/q/releases/download/1.7.1/q-text-as-data-1.7.1-1.noarch.rpm # Download version
sudo rpm -ivh q-text-as-data-1.7.1-1.noarch.rpm # Install
q --version # Check installation version
Official documentation: https://harelba.github.io/q
Syntax
q supports all <span>SQLite</span> SQL syntax. The standard command-line format is <span>q + parameters + "SQL"</span>:
q <command> "<SQL>"
To query the contents of the <span>myfile.log</span> file, simply run <span>q "SELECT * FROM myfile.log"</span>:
q "SELECT * FROM myfile.log"
Using q without additional parameters works perfectly well, but the parameters can make the output more readable. Briefly, they fall into two categories.
<span>input</span> parameters: affect how the file or result set being queried is read. For example, the <span>-H</span> flag indicates that the input data contains a header row.
q -H "SELECT * FROM myfile.log"
In this case, column names will be automatically detected and can be used in the query statement. If this option is not provided, columns will be automatically named as cX, starting from c1 and so on.
q "select c1, c2 from ..."
<span>output</span> parameters: affect how the query results are displayed. For example, <span>-O</span> makes the output include the column names.
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H "select count(UID) from - where UID='root'"
104
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H -O "select count(UID) from - where UID='root'"
count(UID)
104
There are many more parameters, but I won’t list them all here. Interested readers can check the official website. Next, we will focus on demonstrating how to use SQL to handle various log query scenarios.
Many Ways to Play
Let’s look at several common scenarios for querying logs and how to write SQL for them.
1. Keyword Search
Keyword retrieval is probably the most frequent operation in daily development. However, I personally think <span>q</span> does not have much of an advantage here, since it requires you to specify a particular column in the query.
[root@iZ2zebfzaequ90bdlz820sZ software]# q "select * from douyin.log where c9 like '%待解析%'"
2021-06-11 14:46:49.323 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9g9uJ6%2F
2021-06-11 14:57:31.938 INFO 22790 --- [nio-8888-exec-5] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pdhGP%2F
2021-06-11 15:23:48.004 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pQjBR%2F
2021-06-11 2
The <span>grep</span> command, by contrast, searches the full text:
[root@iZ2zebfzaequ90bdlz820sZ software]# cat douyin.log | grep '待解析URL'
2021-06-11 14:46:49.323 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9g9uJ6%2F
2021-06-11 14:57:31.938 INFO 22790 --- [nio-8888-exec-5] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pdhGP%2F
2. Fuzzy Search
<span>like</span> fuzzy search: if the text column has a name, you can search by that name directly; if not, you can search by column number c1, c2, ..., cN.
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test.log
abc
2
3
4
5
23
24
25
[root@iZ2zebfzaequ90bdlz820sZ software]# q -H -t "select * from test.log where abc like '%2%'"
Warning: column count is one - did you provide the correct delimiter?
2
23
24
25
3. Intersection and Union
q supports the <span>UNION</span> and <span>UNION ALL</span> operators for combining rows from multiple files. For example, I created <span>test.log</span> and <span>test1.log</span> with overlapping content and used <span>union</span> to remove the duplicates.
q -H -t "select * from test.log union select * from test1.log"
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test.log
abc
2
3
4
5
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test1.log
abc
3
4
5
6
[root@iZ2zebfzaequ90bdlz820sZ software]# q -H -t "select * from test.log union select * from test1.log"
Warning: column count is one - did you provide the correct delimiter?
Warning: column count is one - did you provide the correct delimiter?
2
3
4
5
6
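Since q runs on SQLite, the same semantics can be reproduced in SQLite directly: UNION deduplicates, while UNION ALL keeps every row. A minimal sketch, with one-column in-memory tables standing in for test.log and test1.log:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (abc INTEGER)")
conn.execute("CREATE TABLE t2 (abc INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(2,), (3,), (4,), (5,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(3,), (4,), (5,), (6,)])

# UNION removes duplicates, matching q's deduplicated result above.
union = [r[0] for r in conn.execute(
    "SELECT abc FROM t1 UNION SELECT abc FROM t2 ORDER BY abc")]
print(union)  # [2, 3, 4, 5, 6]

# UNION ALL keeps every row from both tables.
union_all = [r[0] for r in conn.execute(
    "SELECT abc FROM t1 UNION ALL SELECT abc FROM t2")]
print(len(union_all))  # 8
```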
4. Content Deduplication
For example, count the number of unique values of the <span>uuid</span> field in the <span>./clicks.csv</span> file:
q -H -t "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"
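To make the deduplication concrete, here is a sketch of the same COUNT(DISTINCT ...) query in Python with sqlite3. The tiny clicks dataset is invented for illustration; the real ./clicks.csv is not shown in the article.

```python
import csv
import io
import sqlite3

# A tiny stand-in for clicks.csv: 4 rows, but only 3 distinct uuids.
clicks_csv = """uuid,page
u1,/home
u2,/home
u1,/cart
u3,/home
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (uuid TEXT, page TEXT)")
reader = csv.DictReader(io.StringIO(clicks_csv))
conn.executemany("INSERT INTO clicks VALUES (:uuid, :page)", reader)

# COUNT(DISTINCT uuid) counts unique visitors, not total rows.
(unique,) = conn.execute(
    "SELECT COUNT(DISTINCT uuid) FROM clicks").fetchone()
print(unique)  # 3
```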
5. Automatic Column Type Detection
Note: q works out whether each column holds numbers or strings, and filters by numeric comparison or string comparison accordingly. The example below uses the <span>-t</span> flag.
q -H -t "SELECT request_id, score FROM ./clicks.csv WHERE score > 0.7 ORDER BY score DESC LIMIT 5"
6. Field Calculations
Fields can also be used in calculations. For example, read the output of a system command and compute the total size per user and group under the <span>/tmp</span> directory:
sudo find /tmp -ls | q "SELECT c5, c6, sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5, c6 ORDER BY total DESC"
[root@iZ2zebfzaequ90bdlz820sZ software]# sudo find /tmp -ls | q "SELECT c5, c6, sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5, c6 ORDER BY total DESC"
www www 8.86311340332
root root 0.207922935486
mysql mysql 4.76837158203e-06
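The aggregation above can be sketched in Python: split each `find -ls` line on whitespace, load the user (column 5), group (column 6), and byte size (column 7) into SQLite, then GROUP BY and SUM. The sample lines below are invented to mirror the structure of real `find -ls` output.

```python
import sqlite3

# Simulated `find -ls` lines: field 5 = user, 6 = group, 7 = size in bytes.
find_ls = [
    "123 4 -rw-r--r-- 1 www  www  1048576 Jun 11 tmp/a.log",
    "124 4 -rw-r--r-- 1 www  www  2097152 Jun 11 tmp/b.log",
    "125 4 -rw-r--r-- 1 root root  524288 Jun 11 tmp/c.tmp",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f (c5 TEXT, c6 TEXT, c7 INTEGER)")
for line in find_ls:
    parts = line.split()
    # parts[4], parts[5], parts[6] correspond to q's c5, c6, c7.
    conn.execute("INSERT INTO f VALUES (?, ?, ?)",
                 (parts[4], parts[5], int(parts[6])))

# Same aggregation as the q command: total megabytes per user/group.
rows = conn.execute(
    "SELECT c5, c6, SUM(c7)/1024.0/1024 AS total "
    "FROM f GROUP BY c5, c6 ORDER BY total DESC").fetchall()
print(rows)  # [('www', 'www', 3.0), ('root', 'root', 0.5)]
```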
7. Data Statistics
Count the top 3 user IDs with the most processes, sorted in descending order. This requires first listing all processes with a system command and then filtering with SQL; here q plays the filtering role that <span>grep</span> or <span>awk</span> would normally play.
ps -ef | q -H "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
root 104
www 16
rabbitmq 4
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H -O "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
UID cnt
root 110
www 16
rabbitmq 4
As we can see, the difference between using <span>-O</span> and not using it is whether the query result displays the header row.
8. Cross-file Queries
In general, our log files are split by day into many fixed-size sub-files. Without a unified log collection server, searching for a keyword with no time range to narrow it down is like looking for a needle in a haystack. Being able to merge all the files and then search would save a lot of trouble, and q supports exactly that: querying across files as if they were database tables.
q -H "select * from douyin.log a join douyin-2021-06-18.0.log b on (a.c2=b.c3) where b.c1='root'"
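The join above can be sketched in SQLite terms: two "log files" become two tables, joined on a shared column and filtered. The column layout and row contents below are invented purely to show the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two "log files" as tables; a.c2 is joined against b.c3, as in the q command.
conn.execute("CREATE TABLE a (c1 TEXT, c2 TEXT)")
conn.execute("CREATE TABLE b (c1 TEXT, c2 TEXT, c3 TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [("INFO", "req-1"), ("WARN", "req-2")])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 [("root", "ok", "req-1"), ("www", "ok", "req-2")])

# Cross-"file" join with a filter on the second file's first column.
rows = conn.execute(
    "SELECT a.c1, b.c1 FROM a JOIN b ON a.c2 = b.c3 "
    "WHERE b.c1 = 'root'").fetchall()
print(rows)  # [('INFO', 'root')]
```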
Conclusion
After reading this far, some may argue: <span>q</span> requires writing so much code; wouldn't it be better to use <span>awk</span>? Well, the point of introducing this tool is not to replace any existing tool, but to offer one more convenient way to check logs.
I use <span>awk</span> too, and it is indeed powerful, but it comes with a learning cost: with its many commands and matching rules, mastering it takes real effort. For programmers with some database experience, writing SQL is no big deal, and getting started with <span>q</span> is much easier.