Ansible Firefighting Hotline | Is OS Performance Analysis Too Complicated? One-Click PCP Automated Inspection Turns You into a Performance Expert!
Are you still struggling with system performance analysis? Manually running a bunch of commands like top, iostat, vmstat, free… leads to scattered information that is hard to integrate, and you might miss key metrics. Today, we bring you an enterprise-level automated performance inspection solution for RHEL systems, allowing you to say goodbye to the nightmare of manual performance analysis!
Pain Points Addressed
The daily routine of an operations engineer: the system slows down → manually run top to check CPU → iostat to check disk I/O → vmstat to check system load → free to check memory → numastat to check NUMA → view historical data… After a series of commands, several hours have passed, and the root cause of the performance issue remains elusive.
Worse still, traditional performance analysis tools scatter their information across separate outputs, offer no historical trend analysis, and cannot form a complete performance picture. Integrating the data by hand is time-consuming and labor-intensive, which makes it hard to support enterprise-level performance management. What if an automated performance analysis solution could take care of all of this?
Solution Preview
Today, we share an automated performance analysis solution for RHEL systems using Ansible, based on the Performance Co-Pilot (PCP) enterprise-level monitoring tool, which includes 8 core analysis modules, allowing your system performance analysis to be standardized, automated, and intelligent!
Results Preview
PCP Performance Inspection Report Excerpt (Results Only)
# RHEL Performance Inspection Report (Automatically Generated by PCP and Ansible)
**Report Generation Time:** 2025-09-27T15:30:00+08:00
---
## Host: rhel-server-01
### 1. System Overview (PCP)
This section displays the configuration of PCP, hardware summary, and running agents, which is the first step to understanding the monitoring environment.
```text
Performance Co-Pilot (PCP) Archive Logger
Copyright (c) 2012-2024 Red Hat.
Hostname: rhel-server-01
Archive: /var/log/pcp/pmlogger/rhel-server-01/20250927
Start: Fri Sep 27 15:00:00.000 2025
End: Fri Sep 27 15:30:00.000 2025
Commencing PCP Archive Logger (pmlogger)...
```
### 2. System Load and Uptime
Displays the current time, system uptime, number of logged-in users, and average load over the past 1, 5, and 15 minutes.
15:30:00 up 30 days, 4:15, 2 users, load average: 1.25, 1.10, 0.95
### 3. Memory Usage
Provides detailed information on total, used, and available physical memory (Mem) and swap space (Swap), in MB.
total used free shared buff/cache available
Mem: 32168 18964 3208 1252 9996 11632
Swap: 8192 512 7680
### 4. NUMA Architecture Statistics
Displays Non-Uniform Memory Access statistics, which are crucial for performance tuning on servers with multiple physical CPUs.
node0 1048576 123456 56789 1234 567 123 45 6 0
node1 1048576 234567 67890 2345 678 234 56 7 1
### 5. Disk I/O Statistics
Displays key I/O metrics such as read/write rates, queue lengths, and average wait times for each block device.
Linux 5.14.0-427.13.1.el9_4.x86_64 (rhel-server-01) 09/27/2025 _x86_64_ (4 CPU)
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 5.2 12.3 256.8 512.4 0.1 0.2 1.9 1.6 2.1 1.8 0.05 49.4 41.7 0.8 1.4
sdb 2.1 8.7 128.4 256.8 0.0 0.1 0.0 1.1 1.5 1.2 0.02 61.1 29.5 1.2 1.3
### 6. Top 5 CPU Consuming Processes
Lists the processes currently consuming the most CPU resources, helping to quickly identify performance bottlenecks.
Linux 5.14.0-427.13.1.el9_4.x86_64 (rhel-server-01) 09/27/2025 _x86_64_ (4 CPU)
15:30:00 UID PID %usr %system %guest %CPU CPU Command
15:30:00 0 1234 12.5 2.1 0.0 14.6 0 java
15:30:00 0 5678 8.3 1.2 0.0 9.5 1 mysql
15:30:00 1000 9012 5.1 0.8 0.0 5.9 2 nginx
15:30:00 0 12345 3.2 1.5 0.0 4.7 3 httpd
15:30:00 0 16789 2.1 0.9 0.0 3.0 0 kworker
### 7. Metric Description
Using the `pminfo` tool, we can gain insights into the meaning of any PCP metric. For example, the system load:
kernel.all.load
Data Type: float (D)
InDom: PM_INDOM_NULL 0xffffffff
Semantics: instant
Units: none
Help: 1, 5, and 15 minute load averages
### 8. Historical Performance Summary
`pmlogger` continuously archives performance data. `pmlogsummary` can calculate the average of key metrics from the latest archive, reflecting overall trends over time.
kernel.all.cpu.user: 5.8
disk.all.total: 25.3
metric: kernel.all.cpu.user
inst [0 or ""] value 5.8
metric: disk.all.total
inst [0 or ""] value 25.3
Traditional vs PCP Performance Analysis Comparison
| Analysis Dimension | Traditional Tool Analysis | PCP Automated Analysis |
|---|---|---|
| Data Integrity | Information is scattered, requires multiple tools | Unified platform, comprehensive coverage |
| Historical Analysis | Lacks historical data, difficult to analyze trends | Automatic archiving, supports historical backtracking |
| Analysis Efficiency | Manually executing multiple commands takes 2-3 hours | One-click execution, completed in 5 minutes |
| Standardization Level | Relies on personal experience, lacks standards | Standardized reports, enterprise-level specifications |
| Scalability | Difficult to integrate between tools | Unified architecture, easy to expand |
Design Philosophy: Why is PCP + Ansible Best Practice?
A professional performance analysis solution is not just a simple stack of commands. Our design philosophy incorporates the core practices of Red Hat enterprise-level monitoring:
1. Enterprise-level monitoring architecture, unified performance data platform: We use Performance Co-Pilot (PCP) as the core monitoring platform. It is the enterprise-level monitoring solution officially recommended by Red Hat and provides a unified framework for collecting, storing, and analyzing performance data, which avoids the scattered output of traditional tools.
2. Multi-dimensional performance coverage, 360-degree analysis without blind spots: PCP collects performance data from system overview to process level, from real-time status to historical trends, and from hardware monitoring to application performance, truly achieving “one-click access to all performance metrics”.
3. Automated report generation, professional-level performance insights: Ansible integrates the data collected by PCP into a structured Markdown report that includes an explanation for each section, so ordinary operations personnel can quickly understand the meaning behind the performance data.
4. Historical data archiving, supports trend analysis: The pmlogger service of PCP continuously archives performance data, which supports historical backtracking and trend analysis and provides the data needed for capacity planning and problem prevention.
Automated Scenario Scoring
| Scoring Dimension | Score | Description |
|---|---|---|
| Ease of Use | 4/5 | PCP is powerful, but the learning curve is slightly steep |
| Reusability | 5/5 | Enterprise-level architecture, supports large-scale deployment |
| Stability | 5/5 | Official Red Hat components, validated in production environments |
| Scalability | 5/5 | Open architecture, supports custom metrics |
| Best Practice Compliance | 5/5 | Officially recommended enterprise-level solution by Red Hat |
Project Directory Structure
13_OS Performance Automation Analysis/
└── OsPcpAnalysis.yml # Main analysis Playbook
Core File Content Overview
Main Analysis Playbook (OsPcpAnalysis.yml)
---
# ===========================================================================
# Integrated Ansible Playbook: RHEL Host Performance Inspection and Reporting
# Description: This is a standalone Ansible Playbook that contains all necessary logic and report templates.
#              You only need one file to accomplish the following tasks:
#              1. Install and configure Performance Co-Pilot (PCP) on the target host.
#              2. Use various PCP commands to collect comprehensive real-time and historical performance data.
#              3. Generate a detailed Markdown format report on the Ansible control node.
# ===========================================================================
# -----------------------------------------------------------------------------
# Play 1: Install and configure PCP on the target host
# Goal: Ensure that the PCP-related packages are installed and core services are running.
# -----------------------------------------------------------------------------
- name: "Play 1: Install and configure PCP on the target host"
  hosts: pcp_servers
  become: true
  tasks:
    - name: "Ensure pcp and pcp-system-tools are installed"
      ansible.builtin.package:
        name:
          - pcp
          - pcp-system-tools
        state: present
    - name: "Ensure pmcd and pmlogger services are started and set to start on boot"
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - pmcd
        - pmlogger
# -----------------------------------------------------------------------------
# Play 2: Collect comprehensive performance data from the target host
# Goal: Run a series of PCP commands to capture performance snapshots and historical summaries across various dimensions.
# -----------------------------------------------------------------------------
- name: "Play 2: Collect comprehensive performance data from the target host"
  hosts: pcp_servers
  become: true
  tasks:
    - name: "1. Get PCP system overview"
      ansible.builtin.command: pcp
      register: pcp_overview_result
      changed_when: false
      ignore_errors: true
    - name: "2. Get system load and uptime"
      ansible.builtin.command: pcp uptime
      register: pcp_uptime_result
      changed_when: false
      ignore_errors: true
    - name: "3. Get memory usage (in MB)"
      ansible.builtin.command: pcp free -m
      register: pcp_free_result
      changed_when: false
      ignore_errors: true
    - name: "4. Get NUMA architecture statistics"
      ansible.builtin.command: pcp numastat
      register: pcp_numastat_result
      changed_when: false
      ignore_errors: true
    - name: "5. Get disk I/O statistics (similar to iostat)"
      ansible.builtin.command: pcp iostat -x
      register: pcp_iostat_result
      changed_when: false
      ignore_errors: true
    - name: "6. Get top 5 CPU consuming processes (similar to pidstat)"
      ansible.builtin.shell: "pcp pidstat -u | head -n 8"
      register: pcp_pidstat_result
      changed_when: false
      ignore_errors: true
    - name: "7. Get detailed description of 'kernel.all.load' metric (pminfo)"
      ansible.builtin.command: pminfo -T kernel.all.load
      register: pminfo_load_desc_result
      changed_when: false
      ignore_errors: true
    - name: "8. Extract summary information from historical archives (pmlogsummary)"
      # pmlogsummary expects the archive (or archive directory) as its first positional argument.
      # The path below assumes the default pmlogger layout; adjust it if your archives live elsewhere.
      ansible.builtin.shell: "pmlogsummary -l /var/log/pcp/pmlogger/$(hostname) kernel.all.cpu.user disk.all.total"
      register: pmlogsummary_result
      changed_when: false
      ignore_errors: true
# -----------------------------------------------------------------------------
# Play 3: Generate and display performance report on the Ansible control node
# Goal: Integrate all data collected in Play 2 into a report file using embedded templates.
# -----------------------------------------------------------------------------
- name: "Play 3: Generate and display performance report on the Ansible control node"
  hosts: localhost
  connection: local
  gather_facts: true
  vars:
    report_filename: "integrated_pcp_report_{{ ansible_date_time.date }}.md"
    report_template_content: |
      # RHEL Performance Inspection Report (Automatically Generated by PCP and Ansible)
      **Report Generation Time:** {{ ansible_date_time.iso8601 }}
      ---
      {% for host in groups['pcp_servers'] %}
      ## Host: {{ host }}
      ### 1. System Overview (PCP)
      This section displays the configuration of PCP, hardware summary, and running agents, which is the first step to understanding the monitoring environment.
      ```text
      {% if not hostvars[host].pcp_overview_result.failed %}{{ hostvars[host].pcp_overview_result.stdout }}{% else %}Error: {{ hostvars[host].pcp_overview_result.stderr | default('Failed to execute pcp command') }}{% endif %}
      ```
      ### 2. System Load and Uptime
      Displays the current time, system uptime, number of logged-in users, and average load over the past 1, 5, and 15 minutes.
      ```text
      {% if not hostvars[host].pcp_uptime_result.failed %}{{ hostvars[host].pcp_uptime_result.stdout }}{% else %}Error: {{ hostvars[host].pcp_uptime_result.stderr | default('Failed to execute pcp uptime command') }}{% endif %}
      ```
      ### 3. Memory Usage
      Provides detailed information on total, used, and available physical memory (Mem) and swap space (Swap), in MB.
      ```text
      {% if not hostvars[host].pcp_free_result.failed %}{{ hostvars[host].pcp_free_result.stdout }}{% else %}Error: {{ hostvars[host].pcp_free_result.stderr | default('Failed to execute pcp free -m command') }}{% endif %}
      ```
      ### 4. NUMA Architecture Statistics
      Displays Non-Uniform Memory Access statistics, which are crucial for performance tuning on servers with multiple physical CPUs.
      ```text
      {% if not hostvars[host].pcp_numastat_result.failed %}{{ hostvars[host].pcp_numastat_result.stdout }}{% else %}Error: {{ hostvars[host].pcp_numastat_result.stderr | default('Failed to execute pcp numastat command') }}{% endif %}
      ```
      ### 5. Disk I/O Statistics
      Displays key I/O metrics such as read/write rates, queue lengths, and average wait times for each block device.
      ```text
      {% if not hostvars[host].pcp_iostat_result.failed %}{{ hostvars[host].pcp_iostat_result.stdout }}{% else %}Error: {{ hostvars[host].pcp_iostat_result.stderr | default('Failed to execute pcp iostat -x command') }}{% endif %}
      ```
      ### 6. Top 5 CPU Consuming Processes
      Lists the processes currently consuming the most CPU resources, helping to quickly identify performance bottlenecks.
      ```text
      {% if not hostvars[host].pcp_pidstat_result.failed %}{{ hostvars[host].pcp_pidstat_result.stdout }}{% else %}Error: {{ hostvars[host].pcp_pidstat_result.stderr | default('Failed to execute pcp pidstat command') }}{% endif %}
      ```
      ### 7. Metric Description
      Using the `pminfo` tool, we can gain insights into the meaning of any PCP metric. For example, the system load:
      ```text
      {% if not hostvars[host].pminfo_load_desc_result.failed %}{{ hostvars[host].pminfo_load_desc_result.stdout }}{% else %}Error: {{ hostvars[host].pminfo_load_desc_result.stderr | default('Failed to execute pminfo command') }}{% endif %}
      ```
      ### 8. Historical Performance Summary
      `pmlogger` continuously archives performance data. `pmlogsummary` can calculate the average of key metrics from the latest archive, reflecting overall trends over time.
      ```text
      {% if not hostvars[host].pmlogsummary_result.failed %}{{ hostvars[host].pmlogsummary_result.stdout }}{% else %}Error: {{ hostvars[host].pmlogsummary_result.stderr | default('Failed to execute pmlogsummary command') }}{% endif %}
      ```
      ---
      {% endfor %}
  tasks:
    - name: "Generate Markdown format report using embedded template variables"
      ansible.builtin.copy:
        content: "{{ report_template_content }}"
        dest: "./{{ report_filename }}"
        mode: '0644'
    - name: "Output report path and content summary to console"
      ansible.builtin.debug:
        msg: |
          =================================================================
          Performance report has been successfully generated!
          Report saved to: {{ report_filename }}
          Please open this Markdown file to view the detailed report.
          =================================================================
Foolproof Deployment Guide
Seeing it a thousand times in theory is not as good as doing it once!
Prerequisites
1. One Ansible control node.
2. The target RHEL server is configured with SSH trust, and the user executing Ansible has `sudo` permissions.
3. The control node has Ansible installed.
4. The target server can access the Red Hat software repository (for installing PCP).
Project Directory Structure
This is a very simple project. You only need a few files!
13_OS Performance Automation Analysis/
├── OsPcpAnalysis.yml # Main analysis Playbook
└── inventory # Host inventory (needs to be created)
How to Use?
1. Create Host Inventory: Create an `inventory` file and fill in your server hostnames or IP addresses.
[pcp_servers]
rhel-server-01.example.com
rhel-server-02.example.com
# or
# 192.168.1.100
# 192.168.1.101
2. Execute Automation: Run the following command, then you can go make a cup of coffee!
ansible-playbook -i inventory OsPcpAnalysis.yml
3. View Report: After execution, a performance report file similar to `integrated_pcp_report_2025-09-27.md` will be generated in the current directory.
Analysis Coverage
PCP System Overview Analysis
- System Architecture: PCP service configuration, hardware information summary
- Agent Status: Running PCP agents and service status
- Archiving Information: Log archiving configuration and time range
System Load and Uptime Analysis
- System Load: 1-minute, 5-minute, and 15-minute average load
- Uptime: System uptime and user login information
- Timestamp: Accurate data collection time records
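Load averages are easier to judge relative to the number of CPUs. As a minimal sketch (the helper name `load_per_cpu` is hypothetical and not part of PCP or the playbook), the 1-minute figure can be normalized like this, assuming the `load average: a, b, c` layout seen in the report excerpt:

```shell
#!/bin/sh
# Hypothetical helper, not part of PCP: divide the 1-minute load average
# by the CPU count. Reads an uptime-style line on stdin; $1 = number of CPUs.
load_per_cpu() {
    awk -v cpus="$1" '/load average:/ {
        sub(/.*load average: */, "")   # keep only "1.25, 1.10, 0.95"
        split($0, avg, /, */)          # avg[1] is the 1-minute value
        printf "%.2f\n", avg[1] / cpus
    }'
}

# Example with the sample line and the 4-CPU host from the report excerpt.
echo "15:30:00 up 30 days, 4:15, 2 users, load average: 1.25, 1.10, 0.95" | load_per_cpu 4
```

As a rough rule of thumb, a value that stays well below 1.0 suggests the CPUs are not saturated, while values persistently above 1.0 are worth a closer look.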
Memory Usage Analysis
- Physical Memory: Total, used, available, cache, buffer, and other detailed metrics
- Swap Space: Swap usage and available space
- Memory Allocation: Shared memory, available memory, and other key information
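The raw megabyte columns are often easier to compare across hosts as a percentage. The following is a small sketch under the assumption that the `Mem:` row has the column order shown in the report excerpt (total, used, free, shared, buff/cache, available); `mem_available_pct` is a hypothetical name introduced here for illustration:

```shell
#!/bin/sh
# Hypothetical helper: percentage of physical memory still available,
# computed from free-style output on stdin. Assumes the "Mem:" row lists
# total, used, free, shared, buff/cache, available in that order.
mem_available_pct() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%.1f\n", $7 / $2 * 100 }'
}

# Example with the Mem: row from the report excerpt.
echo "Mem: 32168 18964 3208 1252 9996 11632" | mem_available_pct
```

On the target host, the same function could read the output of `pcp free -m` directly instead of a captured line.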
NUMA Architecture Statistics Analysis
- Node Statistics: Memory access statistics for each NUMA node
- Local Access: numa_hit (efficient local memory access)
- Remote Access: numa_miss (inefficient cross-node memory access)
- Performance Optimization: Provides data support for performance tuning on multi-CPU servers
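One way to read these counters is as a per-node ratio of `numa_hit` to `numa_hit + numa_miss`. The sketch below assumes numastat-style input with one row per counter and one column per node; the function name `numa_hit_pct` and the sample numbers are made up for illustration and do not come from the report:

```shell
#!/bin/sh
# Hypothetical helper: for each NUMA node, print numa_hit as a percentage of
# numa_hit + numa_miss. Assumes numastat-style rows on stdin, for example
# "numa_hit <node0> <node1> ..." and "numa_miss <node0> <node1> ...".
numa_hit_pct() {
    awk '
        $1 == "numa_hit"  { nodes = NF - 1; for (i = 2; i <= NF; i++) hit[i - 2] = $i }
        $1 == "numa_miss" { for (i = 2; i <= NF; i++) miss[i - 2] = $i }
        END {
            for (n = 0; n < nodes; n++) {
                total = hit[n] + miss[n]
                if (total > 0) printf "node%d %.1f\n", n, hit[n] / total * 100
            }
        }'
}

# Illustrative, made-up counters for a two-node host.
printf '%s\n' "numa_hit 900 500" "numa_miss 100 500" | numa_hit_pct
```

A node whose percentage drops noticeably over time may be serving more allocations that were intended for another node, which is the situation NUMA tuning tries to avoid.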
Disk I/O Performance Analysis
- IOPS Metrics: Read and write counts per second (r/s, w/s)
- Throughput: Amount of data read and written (rkB/s, wkB/s)
- Wait Time: Average I/O wait time (await)
- Queue Length: Device queue depth (aqu-sz)
- Device Utilization: Disk busy level (%util)
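To pick out busy disks from the table, the `%util` column can be filtered against a threshold. This is a sketch only: `busy_devices` is a hypothetical helper, and it assumes a header row that starts with `Device` and contains a `%util` column, as in the report excerpt:

```shell
#!/bin/sh
# Hypothetical helper: list devices whose %util exceeds a threshold.
# Reads iostat-style output on stdin; $1 = threshold in percent. Assumes a
# header row starting with "Device" that contains a "%util" column.
busy_devices() {
    awk -v limit="$1" '
        $1 == "Device" { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        col && ($col + 0) > (limit + 0) { print $1, $col }
    '
}

# Example with the rows from the report excerpt. The threshold is chosen only
# to fit this lightly loaded sample; a real check would use a much higher value.
printf '%s\n' \
    "Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util" \
    "sda 5.2 12.3 256.8 512.4 0.1 0.2 1.9 1.6 2.1 1.8 0.05 49.4 41.7 0.8 1.4" \
    "sdb 2.1 8.7 128.4 256.8 0.0 0.1 0.0 1.1 1.5 1.2 0.02 61.1 29.5 1.2 1.3" | busy_devices 1.35
```

Looking the column up by name rather than by position keeps the filter working if the set of columns differs between tool versions.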
Process-Level CPU Analysis
- Top Processes: List of processes with the highest CPU usage
- User Mode/Kernel Mode: Distinguishing between user mode and kernel mode CPU usage
- Process Information: Detailed information such as PID, UID, process name, etc.
- Performance Bottlenecks: Quickly identify major CPU resource consumers
Metric Description
- Metric Metadata: Data type, units, semantic description
- Help Information: Professional explanations of PCP metrics
- Metric Relationships: Understanding the relationships between metrics
- Expert Guidance: Provides professional guidance for performance analysis
Historical Performance Trend Analysis
- Average Calculation: Statistical averages of historical data
- Trend Analysis: Time variation trends of performance metrics
- Capacity Planning: Provides data support for system expansion
- Problem Prevention: Prevent potential performance issues through trend analysis
Enterprise-Level Features
- Unified Platform: All performance data is collected through a unified PCP
- Historical Archiving: Automatically saves historical data, supports trend analysis
- Standard Reports: Generates professional-level Markdown format reports
- Multi-Host Support: Batch analysis of performance across multiple servers
- Error Tolerance: Comprehensive error handling and default value mechanisms
Tips for Use
Batch Performance Analysis
# Add multiple servers in the inventory
[pcp_servers]
server1 ansible_host=192.168.1.100
server2 ansible_host=192.168.1.101
server3 ansible_host=192.168.1.102
# Execute in parallel, doubling efficiency
ansible-playbook OsPcpAnalysis.yml -i inventory --forks 10
Custom Analysis Scope
Modify the PCP commands in the Playbook as needed:
- Change `pcp iostat -x` to `pcp iostat -x 1 5` to get data for 5 seconds continuously
- Add more PCP commands like `pcp netstat`, `pcp df`, etc.
- Adjust the number of lines displayed for `pcp pidstat`.
Troubleshooting
If you encounter issues related to PCP:
1. Check PCP Services: `systemctl status pmcd pmlogger`
2. Validate PCP Commands: Manually execute the `pcp` command on the target server
3. Check Logs: `journalctl -u pmcd -u pmlogger`
4. Check Permissions: Ensure the executing user has sufficient permissions
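Before working through the checks above, it can help to confirm that the PCP commands used by the playbook are on the `PATH` at all. The helper below is a hypothetical sketch (the name `missing_commands` is not part of PCP) that prints each command it cannot find:

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list that is
# not available on PATH. No output means all commands were found.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Check the PCP commands that the playbook relies on.
missing_commands pcp pminfo pmlogsummary
```

If anything is printed on the target server, the package installation in Play 1 is the first place to look.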
Advanced Usage
Regular Inspections
# Combine with cron for regular performance inspections
# Automatically execute performance analysis every day at 9 AM
0 9 * * * cd /path/to/pcp-analysis && ansible-playbook -i inventory OsPcpAnalysis.yml
Establishing Performance Baselines
# Establish performance baselines under normal system conditions
ansible-playbook -i inventory OsPcpAnalysis.yml
# Save the report as a baseline for future comparative analysis
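A saved baseline is most useful when it can be compared with a later report. As a sketch, assuming both files were produced by the playbook above and therefore contain a `**Report Generation Time:**` line that always differs, a hypothetical `report_diff` helper could ignore that line and diff the rest:

```shell
#!/bin/sh
# Hypothetical helper: compare two generated reports while ignoring the
# "Report Generation Time" line, which changes on every run.
# $1 = baseline report, $2 = current report.
# Returns diff's exit status: 0 if nothing else differs, 1 if it does.
report_diff() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    grep -v '^\*\*Report Generation Time:' "$1" > "$tmp_a"
    grep -v '^\*\*Report Generation Time:' "$2" > "$tmp_b"
    diff -u "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Usage (file names are placeholders):
#   report_diff baseline_report.md integrated_pcp_report_2025-09-27.md
```

Because the reports contain point-in-time values, most sections will differ between runs; the diff is a starting point for spotting large shifts rather than a pass/fail check.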
Locating Performance Issues
# Quickly generate a current status report when performance issues arise
ansible-playbook -i inventory OsPcpAnalysis.yml --limit problematic-server
Summary
This automated performance analysis solution for RHEL systems truly achieves:
- Comprehensive Coverage: From system overview to process level, from real-time status to historical trends, comprehensive performance analysis
- One-Click Execution: Automates the complete process of PCP installation, configuration, data collection, and report generation
- Professional Reports: Generates structured Markdown reports containing expert-level analysis explanations
- Enterprise-Level Architecture: Based on the PCP monitoring platform recommended by Red Hat, validated in production environments
- Historical Analysis: Supports performance trend analysis and capacity planning
- High Reliability: Comprehensive error handling and fault tolerance mechanisms
- Standardization: Standard processes for enterprise-level performance management

Now the operations team can:
- Quickly Diagnose system performance issues without manually executing multiple commands
- Establish Baselines, analyze performance trends through historical data
- Batch Manage performance monitoring across multiple servers
- Improve Efficiency, reducing performance analysis time from hours to minutes
What are you waiting for? Download this enterprise-level performance analysis solution and elevate your system performance management capabilities to new heights!
Advanced Application Scenarios
Enterprise-Level Monitoring Deployment
- Monitoring Center: Establish a unified PCP monitoring center
- Alert Integration: Integrate with existing alert systems
- Visualization: Combine with Grafana for performance visualization
- SLA Monitoring: Establish a service level agreement monitoring system

Performance Capacity Planning
- Trend Analysis: Capacity planning based on historical data
- Predictive Models: Establish performance predictive models
- Expansion Recommendations: Provide data-driven expansion recommendations
- Cost Optimization: Optimize resource allocation and reduce costs

Fault Prevention System
- Baseline Comparison: Establish performance baselines to detect anomalies in a timely manner
- Trend Alerts: Preventive alerts based on trend analysis
- Automated Recovery: Achieve self-healing of faults with automation tools
- Knowledge Base: Establish a knowledge base for performance issues and handling processes

DevOps Integration
- CI/CD Integration: Include performance checks in the deployment process
- A/B Testing: Support performance testing for version comparisons
- Monitoring as Code: Include monitoring configurations in version control
- Automated Operations: Build a complete automated operations system
Tags: #Ansible #PCP #Performance Monitoring #System Analysis #RHEL #Enterprise-Level #Automated Operations #Performance Tuning #Monitoring Solutions #Red Hat Certified