Recently, I discovered some interesting tools and couldn’t wait to share them with everyone.
How do you usually check <span>Linux</span> logs? Personally, I often reach for the classic commands <span>tail</span>, <span>head</span>, <span>cat</span>, <span>sed</span>, <span>more</span>, and <span>less</span>, or tools like <span>awk</span> for data filtering; together they get the job done efficiently. What bothers me, though, is the sheer number of command options and rules that are hard to remember.
Is there a universal way to check logs, such as using SQL queries? After all, this is an expression that programmers are quite familiar with.
The tool I’m sharing today, q, allows you to query and analyze text content using SQL syntax. Let’s take a look at what makes this tool so amazing.
Setting Up the Environment
q is a command-line tool that lets us run SQL queries on any file or on the output of another command. For example, you can execute SQL statements directly on the result set of <span>ps -ef</span>, the command that lists processes.
The principle is that text acts as a database table. Well, this is my own interpretation, haha.
It treats ordinary files or result sets as database tables and supports almost all SQL constructs, such as <span>WHERE</span>, <span>GROUP BY</span>, and <span>JOIN</span>s. It also supports automatic detection of column names and types, as well as cross-file join queries (both detailed later), and it handles multiple encodings.
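The "text as a table" principle can be sketched in a few lines of Python: load whitespace-delimited lines into an in-memory SQLite table with generated column names (c1, c2, ...), then run ordinary SQL against it. This is only an illustration of the idea, not q's actual implementation, and the sample log lines are made up.

```python
import sqlite3

# Sample whitespace-delimited "log file" content, as q would see it.
lines = [
    "alice 200 OK",
    "bob   404 NOT_FOUND",
    "alice 500 ERROR",
]

conn = sqlite3.connect(":memory:")
# Name the columns c1, c2, c3 the way q does when no header is present;
# SQLite's type affinity stores '404' in an INTEGER column as a number.
conn.execute("CREATE TABLE log (c1 TEXT, c2 INTEGER, c3 TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)",
                 (line.split() for line in lines))

# Ordinary SQL now works on the "file".
rows = conn.execute(
    "SELECT c1, COUNT(*) FROM log WHERE c2 >= 400 "
    "GROUP BY c1 ORDER BY c1"
).fetchall()
print(rows)  # [('alice', 1), ('bob', 1)]
```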
Installation is quite simple. On <span>Linux CentOS</span> you can complete it in just three steps, and on <span>Windows</span> you only need to install an <span>exe</span> file.
wget https://github.com/harelba/q/releases/download/1.7.1/q-text-as-data-1.7.1-1.noarch.rpm # Download version
sudo rpm -ivh q-text-as-data-1.7.1-1.noarch.rpm # Install
q --version # Check installation version
Official documentation: https://harelba.github.io/q
Syntax
q supports all <span>SQLite</span> SQL syntax. The standard command-line format is <span>q + parameters + "SQL"</span>:
q <command> "<SQL>"
To query the contents of the <span>myfile.log</span> file, simply run <span>q "SELECT * FROM myfile.log"</span>:
q "SELECT * FROM myfile.log"
Using q without additional parameters works perfectly well, but the parameters can make the output more readable. Briefly, they fall into two categories.
<span>input</span> parameters: affect how the file or result set being queried is read. For example, the <span>-H</span> flag indicates that the input data contains a header row.
q -H "SELECT * FROM myfile.log"
In this case, column names will be automatically detected and can be used in the query statement. If this option is not provided, columns will be automatically named as cX, starting from c1 and so on.
q "select c1, c2 from ..."
<span>output</span> parameters: affect how the query results are displayed. For example, <span>-O</span> makes the output include the column names.
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H "select count(UID) from - where UID='root'"
104
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H -O "select count(UID) from - where UID='root'"
count(UID)
104
There are many more parameters, but I won’t list them all here. Interested readers can check the official website. Next, we will focus on demonstrating how to use SQL to handle various log query scenarios.
Many Ways to Play
Let’s look at several common scenarios for querying logs and how to write SQL for them.
1. Keyword Search
Keyword retrieval is probably the most frequent operation in daily development. However, I personally think <span>q</span> does not have much of an advantage here, since it requires you to specify a particular column in the query.
[root@iZ2zebfzaequ90bdlz820sZ software]# q "select * from douyin.log where c9 like '%待解析%'"
2021-06-11 14:46:49.323 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9g9uJ6%2F
2021-06-11 14:57:31.938 INFO 22790 --- [nio-8888-exec-5] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pdhGP%2F
2021-06-11 15:23:48.004 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pQjBR%2F
2021-06-11 2
The <span>grep</span> command, by contrast, searches the full text:
[root@iZ2zebfzaequ90bdlz820sZ software]# cat douyin.log | grep '待解析URL'
2021-06-11 14:46:49.323 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9g9uJ6%2F
2021-06-11 14:57:31.938 INFO 22790 --- [nio-8888-exec-5] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pdhGP%2F
2. Fuzzy Search
<span>like</span> fuzzy search: if the text column has a name, you can search by that name directly; if not, you can search by column number c1, c2, ..., cN.
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test.log
abc
2
3
4
5
23
24
25
[root@iZ2zebfzaequ90bdlz820sZ software]# q -H -t "select * from test.log where abc like '%2%'"
Warning: column count is one - did you provide the correct delimiter?
2
23
24
25
3. Intersection and Union
q supports the <span>UNION</span> and <span>UNION ALL</span> operators for combining rows from multiple files. For example, I created <span>test.log</span> and <span>test1.log</span> with overlapping content and used <span>union</span> to remove the duplicates.
q -H -t "select * from test.log union select * from test1.log"
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test.log
abc
2
3
4
5
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test1.log
abc
3
4
5
6
[root@iZ2zebfzaequ90bdlz820sZ software]# q -H -t "select * from test.log union select * from test1.log"
Warning: column count is one - did you provide the correct delimiter?
Warning: column count is one - did you provide the correct delimiter?
2
3
4
5
6
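Since q runs on SQLite, the same semantics can be reproduced in SQLite directly: UNION deduplicates, while UNION ALL keeps every row. A minimal sketch, with one-column in-memory tables standing in for test.log and test1.log:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (abc INTEGER)")
conn.execute("CREATE TABLE t2 (abc INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(2,), (3,), (4,), (5,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(3,), (4,), (5,), (6,)])

# UNION removes duplicates, matching q's deduplicated result above.
union = [r[0] for r in conn.execute(
    "SELECT abc FROM t1 UNION SELECT abc FROM t2 ORDER BY abc")]
print(union)  # [2, 3, 4, 5, 6]

# UNION ALL keeps every row from both tables.
union_all = [r[0] for r in conn.execute(
    "SELECT abc FROM t1 UNION ALL SELECT abc FROM t2")]
print(len(union_all))  # 8
```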
4. Content Deduplication
For example, count the number of unique values of the <span>uuid</span> field in the <span>./clicks.csv</span> file:
q -H -t "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"
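To make the deduplication concrete, here is a sketch of the same COUNT(DISTINCT ...) query in Python with sqlite3. The tiny clicks dataset is invented for illustration; the real ./clicks.csv is not shown in the article.

```python
import csv
import io
import sqlite3

# A tiny stand-in for clicks.csv: 4 rows, but only 3 distinct uuids.
clicks_csv = """uuid,page
u1,/home
u2,/home
u1,/cart
u3,/home
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (uuid TEXT, page TEXT)")
reader = csv.DictReader(io.StringIO(clicks_csv))
conn.executemany("INSERT INTO clicks VALUES (:uuid, :page)", reader)

# COUNT(DISTINCT uuid) counts unique visitors, not total rows.
(unique,) = conn.execute(
    "SELECT COUNT(DISTINCT uuid) FROM clicks").fetchone()
print(unique)  # 3
```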
5. Automatic Column Type Detection
Note: q works out whether each column holds numbers or strings, and filters by numeric comparison or string comparison accordingly. The example below uses the <span>-t</span> flag.
q -H -t "SELECT request_id, score FROM ./clicks.csv WHERE score > 0.7 ORDER BY score DESC LIMIT 5"
6. Field Calculations
Fields can also be used in calculations. For example, read the output of a system command and compute the total size per user and group under the <span>/tmp</span> directory:
sudo find /tmp -ls | q "SELECT c5, c6, sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5, c6 ORDER BY total DESC"
[root@iZ2zebfzaequ90bdlz820sZ software]# sudo find /tmp -ls | q "SELECT c5, c6, sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5, c6 ORDER BY total DESC"
www www 8.86311340332
root root 0.207922935486
mysql mysql 4.76837158203e-06
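The aggregation above can be sketched in Python: split each `find -ls` line on whitespace, load the user (column 5), group (column 6), and byte size (column 7) into SQLite, then GROUP BY and SUM. The sample lines below are invented to mirror the structure of real `find -ls` output.

```python
import sqlite3

# Simulated `find -ls` lines: field 5 = user, 6 = group, 7 = size in bytes.
find_ls = [
    "123 4 -rw-r--r-- 1 www  www  1048576 Jun 11 tmp/a.log",
    "124 4 -rw-r--r-- 1 www  www  2097152 Jun 11 tmp/b.log",
    "125 4 -rw-r--r-- 1 root root  524288 Jun 11 tmp/c.tmp",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f (c5 TEXT, c6 TEXT, c7 INTEGER)")
for line in find_ls:
    parts = line.split()
    # parts[4], parts[5], parts[6] correspond to q's c5, c6, c7.
    conn.execute("INSERT INTO f VALUES (?, ?, ?)",
                 (parts[4], parts[5], int(parts[6])))

# Same aggregation as the q command: total megabytes per user/group.
rows = conn.execute(
    "SELECT c5, c6, SUM(c7)/1024.0/1024 AS total "
    "FROM f GROUP BY c5, c6 ORDER BY total DESC").fetchall()
print(rows)  # [('www', 'www', 3.0), ('root', 'root', 0.5)]
```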
7. Data Statistics
Count the top 3 user IDs with the most processes, sorted in descending order. This requires first listing all processes with a system command and then filtering with SQL; here q plays the filtering role that <span>grep</span> or <span>awk</span> would normally play.
ps -ef | q -H "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
root 104
www 16
rabbitmq 4
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H -O "SELECT UID, COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
UID cnt
root 110
www 16
rabbitmq 4
As we can see, the difference between using <span>-O</span> and not using it is whether the query result displays the header row.
8. Cross-file Queries
In general, our log files are split by day into many fixed-size sub-files. Without a unified log collection server, searching for a keyword with no time range to narrow it down is like looking for a needle in a haystack. Being able to merge all the files and then search would save a lot of trouble, and q supports exactly that: querying across files as if they were database tables.
q -H "select * from douyin.log a join douyin-2021-06-18.0.log b on (a.c2=b.c3) where b.c1='root'"
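The join above can be sketched in SQLite terms: two "log files" become two tables, joined on a shared column and filtered. The column layout and row contents below are invented purely to show the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two "log files" as tables; a.c2 is joined against b.c3, as in the q command.
conn.execute("CREATE TABLE a (c1 TEXT, c2 TEXT)")
conn.execute("CREATE TABLE b (c1 TEXT, c2 TEXT, c3 TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [("INFO", "req-1"), ("WARN", "req-2")])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 [("root", "ok", "req-1"), ("www", "ok", "req-2")])

# Cross-"file" join with a filter on the second file's first column.
rows = conn.execute(
    "SELECT a.c1, b.c1 FROM a JOIN b ON a.c2 = b.c3 "
    "WHERE b.c1 = 'root'").fetchall()
print(rows)  # [('INFO', 'root')]
```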
Conclusion
After reading this far, some may argue: <span>q</span> requires writing so much code; wouldn't it be better to use <span>awk</span>? Well, the point of introducing this tool is not to replace any existing tool, but to offer one more convenient way to check logs.
I use <span>awk</span> too, and it is indeed powerful, but it comes with a learning cost: with its many commands and matching rules, mastering it takes real effort. For programmers with some database experience, writing SQL is no big deal, and getting started with <span>q</span> is much easier.