A Practical Guide to JSON Processing in Linux Using the jq Command Line Tool

JSON processing is routine in operations and maintenance: API responses, configuration files, and log analysis all involve it, and in K8s/Docker environments it is a daily task. This article introduces jq, a command-line JSON processor for Linux that handles structured data far better than line-oriented tools such as grep and awk. Without further ado, let's get started!

1. Basic Operations

1.1 Formatting Output

# The most commonly used feature, formatting compressed JSON into a human-readable form:
# Beautifying API return results
curl -s https://api.opsnot.com/users | jq '.'

# Processing files
cat opsnot.json | jq '.'

1.2 Extracting Fields

# Extracting a single field
echo '{"name":"opsnot","age":30}' | jq '.name'
# Output: "opsnot"

# Extracting nested fields
echo '{"user":{"name":"opsnot"}}' | jq '.user.name'

# Array indexing
echo '["a","b","c"]' | jq '.[1]'
# Output: "b"

1.3 Removing Quotes

# By default, strings are printed with quotes. Add the -r option to remove them:
echo '{"domain":"opsnot.com"}' | jq -r '.domain'
# Output: opsnot.com (without quotes)

2. Array Processing

2.1 Iterating Over Arrays

# User list, extracting all usernames
echo '[{"name":"tom"},{"name":"jerry"}]' | jq '.[].name'

# Using -r directly
kubectl get pods -o json | jq -r '.items[].metadata.name'

2.2 Filtering

# Filtering records where status is active
jq '.users[] | select(.status == "active")' users.json

# Filtering services with ports greater than 8000 (from opsnot monitoring system)
# Note: If port is a string, it needs to be converted to a number
jq '.services[] | select((.port | tonumber) > 8000)' config.json

# Multiple conditions
jq '.[] | select(.age > 25 and .city == "beijing")' data.json
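The filters above can be tried without any files on disk. The following is a minimal sketch that assumes a small, made-up service list (the `services` variable and its fields are hypothetical) in which `port` is deliberately stored as a string:

```shell
# Hypothetical sample data: port is a string on purpose.
services='[{"name":"web","port":"8080","env":"prod"},
           {"name":"db","port":"5432","env":"prod"},
           {"name":"cache","port":"9000","env":"test"}]'

# Convert port to a number before comparing, and combine two conditions with "and".
high_prod=$(printf '%s' "$services" |
  jq -r '.[] | select((.port | tonumber) > 8000 and .env == "prod") | .name')

echo "$high_prod"   # prints: web
```

Only `web` passes both conditions: `cache` has a high port but is not in `prod`, and `db` is in `prod` but its port is below 8000.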

2.3 Array Slicing

# Taking the first 3 elements
echo '[1,2,3,4,5]' | jq '.[:3]'

# Taking the last 2 elements
echo '[1,2,3,4,5]' | jq '.[-2:]'

3. Practical Scenarios

3.1 Analyzing Docker Container Status

# Finding all running container names (removing leading slashes)
docker ps -q | xargs docker inspect | jq -r '.[].Name | ltrimstr("/")'

# Note: with a very large number of containers, inspecting one container at a time in a loop is a safer alternative
docker ps --format "{{.ID}}" | while read -r id; do
    docker inspect "$id" | jq -r '.[].Name | ltrimstr("/")'
done

# Finding containers occupying port 80
docker ps -q | xargs docker inspect | \
  jq -r '.[] | select(.NetworkSettings.Ports."80/tcp" != null) |
         .Name | ltrimstr("/")'

3.2 Handling Kubernetes Resources

# Listing all Pod names and IPs
kubectl get pods -o json | \
  jq -r '.items[] | "\(.metadata.name) \(.status.podIP)"'

# Finding Pods that are not in Running status
kubectl get pods -o json | \
  jq -r '.items[] | select(.status.phase != "Running") | .metadata.name'

# Counting the number of Pods in each namespace (opsnot cluster inspection script)
kubectl get pods --all-namespaces -o json | \
  jq -r '.items | group_by(.metadata.namespace) |
         .[] | "\(.[0].metadata.namespace): \(length)"'
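To see what the namespace count produces without a live cluster, here is a small sketch that assumes a hand-written Pod list (the `pods` variable and the Pod names are made up). It relies on two jq behaviours: string interpolation is written `\(...)`, and `group_by` returns its groups sorted by the grouping key.

```shell
# Hypothetical, trimmed-down stand-in for "kubectl get pods --all-namespaces -o json".
pods='{"items":[
  {"metadata":{"name":"api-1","namespace":"prod"}},
  {"metadata":{"name":"api-2","namespace":"prod"}},
  {"metadata":{"name":"job-1","namespace":"batch"}}]}'

# Each group is an array of Pods sharing a namespace; length counts them.
counts=$(printf '%s' "$pods" |
  jq -r '.items | group_by(.metadata.namespace) | .[] | "\(.[0].metadata.namespace): \(length)"')

echo "$counts"
# prints:
# batch: 1
# prod: 2
```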

3.3 Log Analysis

# Extracting error messages from JSON formatted logs
cat app.log | jq -r 'select(.level=="ERROR") | .message'

# Listing requests with response times exceeding 1 second (response_time in milliseconds)
cat access.log | jq -r 'select(.response_time > 1000) |
  "\(.timestamp) \(.path) \(.response_time)ms"'

# Grouping and counting by status code (opsnot.com access logs; this requires nginx to write its access log in JSON format)
cat nginx.log | jq -s 'group_by(.status) |
  .[] | {status: .[0].status, count: length}'
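The role of `-s` here is easy to miss: a log file is a stream of separate JSON objects, and `group_by` needs a single array. A minimal sketch, assuming three made-up log lines held in a hypothetical `log` variable:

```shell
# Hypothetical JSON-formatted access log, one object per line.
log='{"status":200,"path":"/"}
{"status":404,"path":"/missing"}
{"status":200,"path":"/about"}'

# -s slurps all lines into one array so that group_by can see every record.
counts=$(printf '%s\n' "$log" |
  jq -sc 'group_by(.status) | map({status: .[0].status, count: length})')

echo "$counts"   # prints: [{"status":200,"count":2},{"status":404,"count":1}]
```

Because `-s` holds every record in memory, this pattern suits log files of moderate size rather than very large ones.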

3.4 API Data Extraction

# GitHub API - Listing repository star counts
curl -s https://api.github.com/users/opsnot/repos | \
  jq -r '.[] | "\(.name): \(.stargazers_count) stars"'

# Extracting all SSH clone URLs
curl -s https://api.github.com/users/opsnot/repos | \
  jq -r '.[].ssh_url'

3.5 Constructing New JSON

# Reorganizing fields
jq '{username: .name, email: .contact.email}' users.json

# Batch generating configuration files
cat servers.json | jq -r '.[] |
  "Host \(.name)\n  HostName \(.ip)\n  User opsnot\n"' > ~/.ssh/config.d/auto

4. Advanced Techniques

4.1 Pipelining Operations

# Filtering first then extracting
jq '.users[] | select(.age > 30) | .name' data.json

# Multi-level processing
jq '.data.items[] | select(.price < 100) | {name, price}' products.json

4.2 Array Operations

# map - batch conversion
echo '[1,2,3]' | jq 'map(. * 2)'
# Output: [2,4,6]

# Array length
jq '.users | length' data.json

# Removing duplicates
jq '[.[] | .city] | unique' users.json

# Sorting
jq 'sort_by(.price)' products.json

4.3 Conditional Judgments

# if-then-else
jq '.[] | if .age >= 18 then "adult" else "minor" end' users.json

# Handling null values (opsnot data cleaning script)
jq '.email // "[email protected]"' users.json

4.4 String Processing

# Concatenating strings
jq '.name + "@opsnot.com"' users.json

# Splitting
echo '{"path":"/var/log/app.log"}' | jq -r '.path | split("/") | .[-1]'

# Regular expression matching
jq 'select(.email | test(".*@opsnot\\.com$"))' users.json

5. Performance Optimization

5.1 Streaming Large Files

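# Processing GB-level logs: do not use jq -s, which reads every record into memory.
# Note: .[] on a single top-level array still makes jq parse the whole array first; for that case consider --stream.
cat huge.json | jq -c '.[] | select(.error != null)'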
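When the input is one very large top-level array rather than one object per line, jq's `--stream` mode can emit the elements one at a time instead of building the whole array in memory. The sketch below uses the `fromstream`/`truncate_stream` idiom from the jq manual on a tiny, made-up file; the temporary file and its contents are assumptions for the demo only.

```shell
# Hypothetical input file: one top-level array, kept tiny for the demo.
big=$(mktemp)
printf '%s\n' '[{"id":1,"error":null},{"id":2,"error":"timeout"},{"id":3,"error":"disk full"}]' > "$big"

# --stream turns the document into path/value events. truncate_stream drops the
# outer array index from each path, and fromstream rebuilds each element as soon
# as it is complete, so elements are filtered one by one.
errors=$(jq -cn --stream \
  'fromstream(1 | truncate_stream(inputs)) | select(.error != null)' "$big")

echo "$errors"
rm -f "$big"
```

This is slower per record than plain `.[]`, so it is mainly worth using when the file does not fit comfortably in memory.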

5.2 Reducing Pipeline Counts

# Bad practice
cat data.json | jq '.[]' | jq 'select(.age > 30)' | jq '.name'

# Good practice
jq '.[] | select(.age > 30) | .name' data.json

6. Common Pitfalls

6.1 Quoting Issues

# Shell variables are not expanded inside single quotes
user="opsnot"
jq ".name == \"$user\"" users.json  # Works with double quotes, but is not safe: the value becomes part of the jq program

# Recommended: Use --arg to avoid injection risks
jq --arg u "$user" '.name == $u' users.json

# Passing multiple variables
jq --arg name "$name" --arg domain "$domain" \
   '.user = $name | .email = $name + "@" + $domain' config.json

# Passing numeric variables
jq --argjson port 8000 '.port == $port' config.json
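A short sketch of why `--arg` is the safer choice, assuming a made-up user list (the `users` variable is hypothetical). A value that contains a double quote would break a filter assembled by shell interpolation, whereas `--arg` hands it to jq as plain string data:

```shell
# Hypothetical sample data.
users='[{"name":"opsnot","role":"admin"},{"name":"guest","role":"viewer"}]'

# A value with an embedded quote is compared as data, never parsed as jq code.
user='ops"not'
matches=$(printf '%s' "$users" | jq -c --arg u "$user" '[.[] | select(.name == $u)]')
echo "$matches"   # prints: []

# The same filter with an ordinary value.
user='opsnot'
role=$(printf '%s' "$users" | jq -r --arg u "$user" '.[] | select(.name == $u) | .role')
echo "$role"      # prints: admin
```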

6.2 Handling Empty Arrays

# Avoiding errors when .items is missing or null (iterating null is an error; an empty array is not)
jq '.items[]? | .name' data.json  # Add ?

# Or provide default values
jq '.items // []' data.json
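The difference between the plain iterator, `?`, and `//` can be checked with a document in which the array is simply absent. This is a minimal sketch; the `doc` variable is a made-up example.

```shell
# .items is missing here, so .items evaluates to null.
doc='{"total":0}'

# Without "?", iterating null is a runtime error and jq exits non-zero.
if printf '%s' "$doc" | jq '.items[]' >/dev/null 2>&1; then plain=ok; else plain=error; fi

# With "?", the error is suppressed and nothing is emitted.
safe=$(printf '%s' "$doc" | jq -c '[.items[]?]')

# With "//", null is replaced by a default before further processing.
defaulted=$(printf '%s' "$doc" | jq -c '.items // [] | length')

echo "$plain $safe $defaulted"   # prints: error [] 0
```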

6.3 Numbers and Strings

# When port number is a string, it needs to be converted
jq '.[] | select((.port | tonumber) > 8000)' config.json

# String to number
echo '{"count":"42"}' | jq '.count | tonumber'

# Number to string
echo '{"count":42}' | jq '.count | tostring'

6.4 Error Handling

# Validating that JSON is well-formed ("jq empty" only parses; "jq -e ." would also reject valid null or false)
json='{"data": "test"}'
if echo "$json" | jq empty >/dev/null 2>&1; then
    echo "Valid JSON"
else
    echo "Invalid JSON from opsnot.com API" >&2
    exit 1
fi
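The validation check can be wrapped in a small function for reuse in scripts. In this sketch, `is_valid_json` is a hypothetical helper name, not something jq provides. It uses `jq empty`, which parses the input and produces no output, because `jq -e .` also returns a non-zero status for valid documents whose value is `null` or `false`.

```shell
# is_valid_json is a hypothetical helper name, not part of jq.
is_valid_json() {
    # "empty" emits nothing, so the exit status reflects parsing only.
    printf '%s' "$1" | jq empty >/dev/null 2>&1
}

is_valid_json '{"data": "test"}' && r1=valid || r1=invalid
is_valid_json 'null'             && r2=valid || r2=invalid
is_valid_json '{"data": test}'   && r3=valid || r3=invalid

echo "$r1 $r2 $r3"   # prints: valid valid invalid
```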

# Handling potentially missing fields
jq '.user.email // "[email protected]"' data.json

# Safely accessing arrays so that a missing or null array does not raise an error
jq '.items[]?' data.json

7. Advanced Features

7.1 Custom Functions

# Defining functions to reuse logic
echo '{"users":[{"email":"alice"},{"email":"bob"}]}' | jq '
  def add_domain: . + "@opsnot.com";
  .users[].email | add_domain
'

# Functions with parameters
echo '{"prices":[10,20,30]}' | jq '
  def multiply($n): . * $n;
  .prices[] | multiply(1.1)
'

7.2 Recursive Processing

# Recursively searching for all name fields
jq '.. | .name? // empty' complex.json

# Recursively traversing tree structures
jq 'recurse(.children[]?) | .id' tree.json
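Both recursive forms can be tried on small inline documents. The sketch below assumes two made-up inputs (the `tree` and `nested` variables are hypothetical) and collects the results into arrays so that the traversal order is visible:

```shell
# Hypothetical tree: node 1 has children 2 and 3, and node 2 has child 4.
tree='{"id":1,"children":[{"id":2,"children":[{"id":4}]},{"id":3}]}'

# recurse visits the node itself first, then each child in turn (depth-first).
ids=$(printf '%s' "$tree" | jq -c '[recurse(.children[]?) | .id]')
echo "$ids"     # prints: [1,2,4,3]

# ".." walks every value; ".name?" ignores values that are not objects,
# and "// empty" drops objects that have no name.
nested='{"name":"a","x":[{"name":"b"},{"y":{"name":"c"}}]}'
names=$(printf '%s' "$nested" | jq -c '[.. | .name? // empty]')
echo "$names"   # prints: ["a","b","c"]
```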

7.3 Readability of Complex Queries

# Splitting a complex query across several lines for readability
jq -n --slurpfile config config.json '
  $config[0] |
  .services[] |
  select(.enabled == true) |
  select(.env == "prod") |
  "\(.name):\(.port) # opsnot.com"
'

# Or as a one-liner, passing the file as an argument:
jq '.services[] | select(.enabled == true) | select(.env == "prod") | "\(.name):\(.port)"' config.json

7.4 Combination Techniques

In operations and maintenance practice, it is often necessary to combine usage:

# Finding the top 5 containers with the highest CPU usage (opsnot monitoring alarm)
docker stats --no-stream --format "{{json .}}" | \
  jq -R 'fromjson? | select(.CPUPerc != null)' | \
  jq -s 'sort_by(.CPUPerc | rtrimstr("%") | tonumber) |
         reverse | .[:5] |
         .[] | "\(.Name): \(.CPUPerc)"'

# Counting slow request URL distribution from ELB logs
cat elb.log | \
  jq -r 'select(.response_time > 1) | .request_url' | \
  sort | uniq -c | sort -rn | head -20

# Generating Prometheus batch monitoring configuration
cat servers.json | \
  jq -r '.[] |
    "  - job_name: \"\(.name)\"",
    "    static_configs:",
    "      - targets: [\"\(.ip):9100\"]",
    "        labels:",
    "          env: \"\(.env)\"",
    "          # by opsnot.com"'

8. Debugging Techniques

# Debugging complex expressions, viewing intermediate results
jq 'debug | .users[] | select(.age > 30)' data.json

# Gradually building complex queries
jq '.' data.json           # First look at the overall structure
jq '.users' data.json      # Then look at specific fields
jq '.users[]' data.json    # Finally iterate through the array
jq '.users[].name' data.json  # Ultimately extract

# Checking types
echo '{"port":"8080"}' | jq '.port | type'  # Output: "string"

9. Performance and Limitations

9.1 Performance Optimization

# Using -c for compact one-line output, which reduces output size (it does not change memory usage)
jq -c '.' large-file.json

# Avoiding repeated parsing of the same file: extract several fields in a single jq call
jq -r '.db.host, .app.port' config.json

# Streaming processing to avoid memory explosion
cat opsnot.log | jq -c 'select(.error)' > errors.log

9.2 Limitations of jq

By default jq parses each JSON document fully into memory, so a single extremely large document (GB level) is slow and memory-hungry; use --stream or split the file.
It does not support in-place modification of files; write the output to a new file and then replace the original.
Complex logic is harder to express in jq than in a general-purpose programming language such as Python.
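The lack of in-place editing is usually worked around with a temporary file. Here is a minimal sketch of that pattern; the working directory, file name, and the `.app.port` field are made up for the demo:

```shell
# Hypothetical config file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '{"app":{"port":8080}}' > "$workdir/config.json"

# Write to a temporary file first; replace the original only if jq succeeded,
# so a failed filter never truncates the real file.
tmp=$(mktemp "$workdir/config.XXXXXX")
if jq '.app.port = 9090' "$workdir/config.json" > "$tmp"; then
    mv "$tmp" "$workdir/config.json"
else
    rm -f "$tmp"
fi

new_port=$(jq '.app.port' "$workdir/config.json")
echo "$new_port"   # prints: 9090
```

Redirecting straight back to the input file (`jq ... f.json > f.json`) does not work, because the shell truncates the file before jq reads it.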

9.3 Others

# Python's built-in json.tool can pretty-print JSON when jq is not available; use a Python script for complex logic
python3 -m json.tool < data.json

# Handling JSON Lines format
cat data.jsonl | jq -c '.'

# More robust: jq stops at the first line that is not valid JSON, so read lines as raw text and parse each with fromjson?
jq -cR 'fromjson?' data.jsonl
# This outputs only the lines that are valid JSON
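To make the skipping behaviour concrete, here is a small sketch with three made-up input lines, one of which is not JSON. With `-R`, each line arrives as a raw string, and `fromjson?` parses it while silently discarding lines that fail to parse.

```shell
# Three hypothetical log lines; the middle one is not JSON.
valid=$(printf '%s\n' '{"id":1}' 'not json' '{"id":2}' | jq -cR 'fromjson?')

echo "$valid"
# prints:
# {"id":1}
# {"id":2}
```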

Conclusion:

jq is built around pipelines and filters: think of the data as flowing from left to right. Use it frequently in daily tasks, and break complex queries into smaller steps. Remember to pass external values with --arg, validate JSON before processing it, handle null values, and convert types where needed.

Best Practices:

  • First validate the JSON (jq empty checks syntax only; jq -e . also fails on null or false)
  • Use --arg to pass external variables to avoid injection
  • For large inputs, process records one at a time or use --stream, and avoid -s
  • When iterating arrays that may be missing or null, add ? to suppress the error
  • Convert numeric strings with tonumber before comparing them with numbers

jq Parameter Quick Reference

# Common parameters
-r, --raw-output          # Output raw strings, removing quotes
-c, --compact-output      # Compact output, one JSON value per line
-e, --exit-status         # Set exit status from the last output (1 if it is false or null)
-s, --slurp               # Read the entire input stream as an array
-n, --null-input          # Do not read input, start from null
-j, --join-output         # Like -r, but without a newline after each output
-S, --sort-keys           # Sort object keys when outputting
--stream                  # Stream parsing of large JSON files
--seq                     # Use the application/json-seq scheme (RS separator) for input and output

# Passing parameters (recommended by opsnot.com)
--arg name value          # Pass string variable: $name
--argjson name json       # Pass JSON variable: $name  
--slurpfile name file     # Read file as array: $name

# Output control
-C, --color-output        # Force color output, even when writing to a pipe or file
-M, --monochrome-output   # Monochrome output
--tab                     # Use tab for indentation instead of spaces
--indent n                # Set number of spaces for indentation

# Program files
-f file, --from-file file # Read jq program from file
-L directory              # Add module search path

# Examples
jq -r '.name' data.json                    # Output raw string
jq -c '.' data.json                        # Compress to one line
jq -e '.error' data.json && echo "There is an error" # Check that a field is present and not null or false
cat file1.json file2.json | jq -s '.'      # Collect multiple files into one array
jq --arg env prod '.[$env]' config.json    # Use variable to access property

This article is organized by opsnot.com; please indicate the source when reprinting.