⏰ Ansible Firefighting Hotline | Struggling with Time Synchronization Failures? One-Click Automated Diagnosis Turns You into a Time Expert!
Are you still troubleshooting chronyd time synchronization failures by hand, one command at a time? Today we present an automated analysis solution for chronyd time synchronization failures on RHEL 8/9 and CentOS 8/9, so you can say goodbye to the nightmare of typing commands manually!
🎯 Pain Points Addressed
The daily routine of an operations engineer: time synchronization alerts → manually checking chronyd status → reviewing time source configurations → checking network connectivity → analyzing system logs → troubleshooting firewalls → verifying time deviations… After this series of actions, several hours have passed, and the problem may still be elusive.
Worse still, time synchronization issues often affect the entire system cluster, and manual troubleshooting easily overlooks key information. Without systematic analysis, the root cause is hard to locate quickly. What if an automated chronyd diagnostic solution could take over this routine?
✨ Solution Preview
Today we share an Ansible-based automated analysis solution for chronyd time synchronization failures on RHEL 8/9 and CentOS 8/9. It includes six core diagnostic modules and makes your time synchronization troubleshooting standardized, automated, and intelligent!
Results Preview
🧾 Sample of Original Diagnostic Report (results only)
=== Chrony Diagnose Report (10.66.208.231) ===
OS: RedHat 9.6
--- Service Status ---
● chronyd.service - NTP client/server
     Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-09-19 16:25:23 CST; 3 days ago
--- timedatectl ---
Local time: Mon 2025-09-22 20:13:01 CST
Universal time: Mon 2025-09-22 12:13:01 UTC
System clock synchronized: yes
NTP service: active
--- chronyc activity ---
200 OK
3 sources online
0 sources offline
--- chronyc -n sources -v ---
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 10.11.160.238                 2  10   347   606  -2787us[-2787us] +/-  159ms
^+ 10.2.32.37                    2  10   377  1066  +1412us[+1412us] +/-  157ms
^* 10.2.32.38                    2  10   377   23m  +1533us[+1603us] +/-  144ms
Reach Check: ✅ Reach OK (377)
--- chronyc -n tracking ---
Reference ID : 0A022026 (10.2.32.38)
Stratum : 3
Ref time (UTC) : Mon Sep 22 11:49:55 2025
System time : 0.000233934 seconds fast of NTP time
Leap status : Normal
Sync Check: ✅ NTP synchronized
Design Philosophy: Why Is Our Playbook a Best Practice?
A professional automation solution is not just a simple pile of commands. Our design philosophy incorporates the core practices advocated by Red Hat, elevating your automation solution from “just works” to “professional and reliable”!
1. Intelligent Anomaly Detection, Problems at a Glance ✨ We not only collect data but also analyze key metrics automatically. For example, the Playbook checks whether a Reach value of 377 appears (octal 377 means the last eight polls of a source were all answered), judges the NTP synchronization status, and flags the result with prominent ✅ / 🔴 markers so problems stand out immediately!
2. Variable-Driven, Flexible Adaptation 💻 We centralize all configurable parameters (such as the NTP test server and the report output path) in the `vars` section at the top of the Playbook. When you need to adjust the diagnostic scope, you only modify these variables without touching any core automation task logic.
3. Idempotency Assurance, Safe and Worry-Free: The Playbook follows Ansible's core principle of idempotency. You can run it repeatedly with confidence: the diagnostic commands only read state, and the only things written are the report directory and the report file.
4. Closed-Loop Verification, Results Visible 🎯 The last step of the Playbook generates a complete diagnostic report, closing the check-analyze-report loop. You not only run the automation but also see the diagnostic results immediately, so problems stay under control!
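On the idempotency point, one detail is worth knowing: `ansible.builtin.command` reports `changed` on every run unless told otherwise, so read-only diagnostic tasks usually carry `changed_when: false`. The following is a minimal sketch, not part of the original Playbook, of how one of its tasks could be written that way. The `check_mode: false` line is an optional extra, assuming you also want the diagnosis to run under `--check`.

```yaml
---
- name: "Read-only command that reports 'ok' instead of 'changed' (sketch)"
  hosts: rhel9
  gather_facts: false
  tasks:
    - name: "Run chronyc -n tracking"
      ansible.builtin.command: chronyc -n tracking
      register: chronyc_tracking
      changed_when: false   # the command only reads state, so never report a change
      failed_when: false    # same fault tolerance as the main Playbook
      check_mode: false     # optional: still run under --check, since nothing is modified
```

With this in place, repeated runs show the task as `ok`, which makes a genuine change elsewhere in a play easier to spot.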
Automation Scenario Rating
| Rating Dimension | Rating | Description |
|---|---|---|
| Ease of Use | ★ | One-click execution, detailed comments, beginner-friendly |
| Reusability | ★★★★★ | Variable configuration, supports multi-host parallel execution |
| Stability | ★★★★★ | Idempotent design, comprehensive error handling |
| Scalability | ★★★★ | Modular design, easy to extend functionality |
| Best Practice Compliance | ★★★★★ | Follows Ansible best practices, code standards |
Project Directory Structure
08_chrony_service_automated_diagnosis/
└── troubleshooting01_chrony_diagnose.yml
Core File Content Overview
🎯 Main Diagnostic Playbook (troubleshooting01_chrony_diagnose.yml)
---
- name: "Chrony Troubleshooting & Diagnostic"
  hosts: rhel9
  gather_facts: true
  become: true
  vars:
    ntp_test_server: "ntp2.ntp-001.prod.iad2.dc.redhat.com" # Change to your NTP IP
    report_dir: "/tmp/chrony_reports"
    report_file: "{{ report_dir }}/chrony_report_{{ inventory_hostname }}.txt"
  pre_tasks:
    - name: "Assert OS is RHEL/CentOS 7/8/9"
      ansible.builtin.assert:
        that:
          - ansible_facts['os_family'] == "RedHat"
          - ansible_facts['distribution_major_version'] | int in [7, 8, 9]
        fail_msg: "This playbook only supports RHEL/CentOS 7/8/9"
        success_msg: "✅ OS version check passed"
    - name: "Ensure report directory exists"
      ansible.builtin.file:
        path: "{{ report_dir }}"
        state: directory
        mode: '0755'
  tasks:
    # -------------------
    # Basic Checks
    # -------------------
    - name: "Get chronyd service status"
      ansible.builtin.command: systemctl status chronyd
      register: chronyd_service
      failed_when: false
    - name: "Get timedatectl status"
      ansible.builtin.command: timedatectl
      register: timedatectl_status
      failed_when: false
    # -------------------
    # Comprehensive Chrony Diagnostic Commands
    # -------------------
    - name: "Run chronyc activity"
      ansible.builtin.command: chronyc activity
      register: chronyc_activity
      failed_when: false
    - name: "Run chronyc ntpdata"
      ansible.builtin.command: chronyc ntpdata
      register: chronyc_ntpdata
      failed_when: false
    - name: "Run chronyc -n sources -v"
      ansible.builtin.command: chronyc -n sources -v
      register: chronyc_sources
      failed_when: false
    - name: "Run chronyc -n sourcestats -v"
      ansible.builtin.command: chronyc -n sourcestats -v
      register: chronyc_sourcestats
      failed_when: false
    - name: "Run chronyc -n tracking"
      ansible.builtin.command: chronyc -n tracking
      register: chronyc_tracking
      failed_when: false
    - name: "Run chronyd -Q test NTP server"
      ansible.builtin.command: "chronyd -Q 'server {{ ntp_test_server }} iburst'"
      register: chronyd_Q_test
      failed_when: false
    # -------------------
    # Logic Judgments & Anomaly Marking
    # -------------------
    - name: "Mark abnormal Reach"
      set_fact:
        reach_status: |
          {% if '377' not in chronyc_sources.stdout %}
          🔴 Reach abnormal (not 377)
          {% else %}
          ✅ Reach OK (377)
          {% endif %}
    # chronyc pads the "Leap status" label with spaces, so the match tolerates variable whitespace
    - name: "Mark abnormal Sync"
      set_fact:
        sync_status: |
          {% if chronyc_tracking.stdout is not search('Leap status +: +Normal') %}
          🔴 NTP not synchronized
          {% else %}
          ✅ NTP synchronized
          {% endif %}
    # -------------------
    # Report Generation
    # -------------------
    - name: "Assemble chrony diagnostic report"
      ansible.builtin.copy:
        dest: "{{ report_file }}"
        mode: '0644'
        content: |
          === Chrony Diagnose Report ({{ inventory_hostname }}) ===
          OS: {{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}
          --- Service Status ---
          {{ chronyd_service.stdout }}
          --- timedatectl ---
          {{ timedatectl_status.stdout }}
          --- chronyc activity ---
          {{ chronyc_activity.stdout }}
          --- chronyc ntpdata ---
          {{ chronyc_ntpdata.stdout }}
          --- chronyc -n sources -v ---
          {{ chronyc_sources.stdout }}
          Reach Check: {{ reach_status }}
          --- chronyc -n sourcestats -v ---
          {{ chronyc_sourcestats.stdout }}
          --- chronyc -n tracking ---
          {{ chronyc_tracking.stdout }}
          Sync Check: {{ sync_status }}
          --- chronyd -Q test server ---
          (server: {{ ntp_test_server }})
          {{ chronyd_Q_test.stdout }}
    - name: "Display report summary"
      ansible.builtin.debug:
        msg:
          - "Report generated: {{ report_file }}"
          - "Reach Status: {{ reach_status }}"
          - "Sync Status: {{ sync_status }}"
🛠️ Foolproof Deployment Guide
Reading about it a thousand times is no substitute for doing it once!
Prerequisites
1. One Ansible control node.
2. The target servers are configured with SSH trust, and the user executing Ansible has `sudo` privileges.
3. The control node has Ansible installed.
Project Directory Structure
This is a very simple project; you only need two files!
08_chrony_service_automated_diagnosis/
├── troubleshooting01_chrony_diagnose.yml   # Main diagnostic Playbook
└── inventory                               # Host inventory (needs to be created)
How to Use?
1. Create Host Inventory: Create an `inventory` file and fill in the hostnames or IP addresses of the servers whose time synchronization you want to diagnose.
[rhel9]
10.66.208.231
# or
# server1.example.com
# server2.example.com
2. Modify Variables: Open the `troubleshooting01_chrony_diagnose.yml` file and adjust the variable section to your needs, such as the NTP test server and the report output path.
3. Execute Automation: Run the following command, then go make a cup of coffee!
ansible-playbook -i inventory troubleshooting01_chrony_diagnose.yml
Diagnostic Coverage
Service Status Check
• systemctl status chronyd: Service running status
• timedatectl: System time status
• Service start time and running duration
Time Synchronization Status Analysis
• chronyc activity: Active connection status
• chronyc ntpdata: Detailed NTP data
• chronyc sources: Time source status and connection quality
• chronyc sourcestats: Time source statistics
• chronyc tracking: Time tracking and synchronization status
Intelligent Anomaly Detection
• Reach Value Detection: Automatically determine if it is 377 (normal value)
• Synchronization Status Detection: Automatically determine if NTP is synchronizing correctly
• Anomaly Marking: Use ✅ / 🔴 markers to make problems clear at a glance
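The Reach check in the Playbook is a substring test, so it passes as soon as "377" appears anywhere in the `chronyc sources` output. In the sample report, one source shows a Reach of 347 and the check still prints "Reach OK (377)". A stricter per-source check could look like the sketch below. It assumes the standard `chronyc -n sources` layout, where Reach is the fifth whitespace-separated field, and the `split` filter from ansible-core 2.11 or later. The names `reach_values` and `reach_degraded` are hypothetical.

```yaml
---
- name: "Per-source Reach check (sketch)"
  hosts: rhel9
  gather_facts: false
  tasks:
    - name: "Run chronyc -n sources"
      ansible.builtin.command: chronyc -n sources
      register: chronyc_sources
      changed_when: false
      failed_when: false

    # Source lines start with a mode character (^, = or #) followed by a state character.
    - name: "Extract the Reach column (fifth field) from each source line"
      ansible.builtin.set_fact:
        reach_values: >-
          {{ chronyc_sources.stdout_lines | default([])
             | select('match', '^[#=^][*+?x~-] ')
             | map('split')
             | map(attribute=4)
             | list }}

    - name: "List the Reach values that are not 377"
      ansible.builtin.set_fact:
        reach_degraded: "{{ reach_values | reject('equalto', '377') | list }}"

    - name: "Show the result"
      ansible.builtin.debug:
        msg: "Reach values: {{ reach_values }}; not 377: {{ reach_degraded }}"
```

Reach is an octal bitmask of the last eight polls, so a recently added source or a freshly restarted chronyd shows lower values for a while. A value other than 377 is a hint to look closer, not proof of a fault.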
NTP Server Connectivity Testing
• chronyd -Q: Test connectivity to the specified NTP server
• Configurable test server address
Complete Diagnostic Report
• Structured report format
• Includes all key information
• Automatically generated to the specified path
Comprehensive Error Handling
• Idempotent design
• Fault tolerance for failed tasks (failed_when: false)
• Operating system version check
• Automatic creation of report directory
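One consequence of `failed_when: false` is that a command which fails still lets the play continue, but the report only prints `stdout`, so a failed command can show up as an empty section. The sketch below shows one possible way to surface the failure instead. It is an assumption about how you might extend the Playbook, and `tracking_section` is a hypothetical variable name.

```yaml
---
- name: "Surface failed diagnostic commands in the report text (sketch)"
  hosts: rhel9
  gather_facts: false
  tasks:
    - name: "Run chronyc -n tracking"
      ansible.builtin.command: chronyc -n tracking
      register: chronyc_tracking
      changed_when: false
      failed_when: false

    # Use stdout on success; otherwise fall back to stderr, or to the module's msg
    # when stderr is empty (for example when the binary is not installed).
    - name: "Build the report section with a fallback to the error text"
      ansible.builtin.set_fact:
        tracking_section: >-
          {{ chronyc_tracking.stdout
             if (chronyc_tracking.rc | default(1)) == 0
             else 'COMMAND FAILED (rc=' ~ (chronyc_tracking.rc | default('n/a')) ~ '): '
                  ~ (chronyc_tracking.stderr | default('', true)
                     | default(chronyc_tracking.msg | default(''), true)) }}

    - name: "Show the section"
      ansible.builtin.debug:
        var: tracking_section
```

The same pattern could be applied to the other registered results before they are written into the report.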
💡 Tips for Use
🎯 Batch Diagnosis
# Add multiple servers in the inventory
[rhel9]
server1 ansible_host=192.168.1.100
server2 ansible_host=192.168.1.101
server3 ansible_host=192.168.1.102
# Execute in parallel, doubling efficiency
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory --forks 10
🔧 Custom Configuration
Edit the `troubleshooting01_chrony_diagnose.yml` file to adjust according to your environment:
• Modify NTP test server configuration
• Adjust report output path
• Customize report directory
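If you would rather not edit the Playbook for each environment, extra vars are an alternative, because they take precedence over the `vars` section of a play. The file below is a hypothetical example: the file name `overrides.yml` and both values are placeholders.

```yaml
---
# overrides.yml (hypothetical file name)
# Pass it with:
#   ansible-playbook -i inventory troubleshooting01_chrony_diagnose.yml -e @overrides.yml
ntp_test_server: "192.0.2.10"           # placeholder address; use your own NTP server
report_dir: "/var/tmp/chrony_reports"   # report_file in the Playbook is derived from this value
```

Since `report_file` is built from `report_dir` when it is used, overriding `report_dir` alone is enough to move the report.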
Troubleshooting
If you encounter issues, check the generated diagnostic report:
• Report location: `/tmp/chrony_reports/chrony_report_[hostname].txt`
• Contains complete fault clues
• Intelligent anomaly marking makes problems clear at a glance
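The report is written on each target host. To read all reports in one place, they could be pulled back to the control node with `ansible.builtin.fetch`. This is a sketch under assumptions: `local_report_dir` is a hypothetical destination, and `report_file` repeats the default path from the Playbook.

```yaml
---
- name: "Collect chrony reports on the control node (sketch)"
  hosts: rhel9
  gather_facts: false
  vars:
    report_file: "/tmp/chrony_reports/chrony_report_{{ inventory_hostname }}.txt"
    local_report_dir: "./collected_reports"   # hypothetical destination on the control node
  tasks:
    - name: "Fetch the report written by the diagnostic Playbook"
      ansible.builtin.fetch:
        src: "{{ report_file }}"
        dest: "{{ local_report_dir }}/"   # trailing slash: keep the source file name
        flat: true                        # no per-host subdirectories
```

Because the file name already contains the inventory hostname, the flat layout does not cause collisions between hosts.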
⚠️ Reminder on the Importance of Time Synchronization
Time synchronization issues often affect the entire system cluster; it is recommended to:
• Regularly check time synchronization status
• Set time deviation alerts
• Establish a time synchronization monitoring mechanism
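A time deviation alert can start as a simple threshold on the "System time" line of `chronyc tracking`. The sketch below is one possible shape, not part of the original Playbook. The variable `max_offset_seconds` and its value of 0.1 are hypothetical, and the `split` filter assumes ansible-core 2.11 or later.

```yaml
---
- name: "Offset threshold check (sketch)"
  hosts: rhel9
  gather_facts: false
  vars:
    max_offset_seconds: 0.1   # hypothetical threshold; pick one that fits your environment
  tasks:
    - name: "Run chronyc -n tracking"
      ansible.builtin.command: chronyc -n tracking
      register: chronyc_tracking
      changed_when: false

    # The line looks like: "System time : 0.000233934 seconds fast of NTP time",
    # so the offset is the fourth whitespace-separated token (index 3).
    - name: "Extract the offset from the 'System time' line"
      ansible.builtin.set_fact:
        system_offset_seconds: >-
          {{ chronyc_tracking.stdout_lines
             | select('match', '^System time')
             | map('split')
             | map(attribute=3)
             | first }}

    - name: "Fail when the offset exceeds the threshold"
      ansible.builtin.assert:
        that:
          - system_offset_seconds | float <= max_offset_seconds | float
        fail_msg: "Offset {{ system_offset_seconds }}s exceeds {{ max_offset_seconds }}s"
        success_msg: "Offset {{ system_offset_seconds }}s is within {{ max_offset_seconds }}s"
```

The value on that line is a magnitude (the words "fast" or "slow" carry the direction), so no absolute-value step is needed.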
🎯 Advanced Usage
Custom Diagnostic Scope
# Check only specific modules (tasks must carry matching tags; the Playbook listed above defines none)
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory --tags "service_check"
# Skip certain checks
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory --skip-tags "network_check"
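For `--tags` and `--skip-tags` to select anything, the tasks need `tags:` entries. The sketch below shows how `service_check` and `network_check` could be attached. Which task belongs to which tag is an assumption, and the final task uses `always` plus `default` so that it still works when the other tasks were skipped.

```yaml
---
- name: "Tagged chrony checks (sketch)"
  hosts: rhel9
  gather_facts: false
  become: true
  vars:
    ntp_test_server: "ntp2.ntp-001.prod.iad2.dc.redhat.com"   # value from the main Playbook; change to your NTP server
  tasks:
    - name: "Get chronyd service status"
      ansible.builtin.command: systemctl status chronyd
      register: chronyd_service
      changed_when: false
      failed_when: false
      tags: [service_check]

    - name: "Run chronyd -Q test NTP server"
      ansible.builtin.command: "chronyd -Q 'server {{ ntp_test_server }} iburst'"
      register: chronyd_q_test
      changed_when: false
      failed_when: false
      tags: [network_check]

    # A task skipped by tag selection registers nothing, hence the defaults.
    - name: "Show whichever results were collected"
      ansible.builtin.debug:
        msg:
          - "service_check rc: {{ (chronyd_service | default({})).rc | default('skipped') }}"
          - "network_check rc: {{ (chronyd_q_test | default({})).rc | default('skipped') }}"
      tags: [always]
```

In the full Playbook, the report tasks reference every registered result, so they would need the same kind of defaults before tag-based selection is safe to use.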
Output Format Customization
# Detailed output mode
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory -v
# Super detailed output mode
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory -vvv
Summary
This automated analysis solution for chronyd time synchronization failures on RHEL 8/9 and CentOS 8/9 achieves:
• Comprehensive Diagnosis: From service status to time synchronization, from network connectivity to anomaly detection, all-around analysis
• One-Click Execution: Automation completes all diagnostic steps without manual intervention
• Intelligent Reporting: Generates structured diagnostic reports, with intelligent anomaly marking making problems clear at a glance
• Highly Customizable: Variable configuration adapts to different time synchronization environments
• Batch Processing: Supports multi-host parallel diagnosis
• Safe and Reliable: Read-only analysis, does not modify system configurations
Advanced Application Scenarios
Enterprise-Level Deployment
• Multi-Environment Support: Unified time synchronization diagnosis for development, testing, and production environments
• Compliance Checks: Meet enterprise time synchronization audit requirements
• Monitoring Integration: Integrate with existing monitoring systems
• Cluster Management: Unified management of time synchronization status across the entire cluster
Fault Prevention
• Regular Checks: Set scheduled tasks to proactively discover potential time synchronization issues
• Trend Analysis: Analyze time synchronization health trends through historical reports
• Alert Mechanism: Set time deviation alert thresholds based on diagnostic results
• Automatic Repair: Combine with other tools to achieve automatic time synchronization repair
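For the regular checks mentioned above, one option is a cron entry on the control node, which Ansible itself can manage. This is only a sketch: `project_dir` is a hypothetical path, and the schedule of 06:00 daily is an arbitrary choice.

```yaml
---
- name: "Schedule the chrony diagnosis on the control node (sketch)"
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    project_dir: "/opt/ansible/08_chrony_service_automated_diagnosis"   # hypothetical path
  tasks:
    - name: "Run the diagnostic Playbook every day at 06:00"
      ansible.builtin.cron:
        name: "chrony diagnosis"
        minute: "0"
        hour: "6"
        job: "cd {{ project_dir }} && ansible-playbook -i inventory troubleshooting01_chrony_diagnose.yml"
```

The entry is installed for the user who runs this play, so that user needs the SSH access and privileges described in the prerequisites.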
Team Collaboration
• Standardized Processes: Unified team time synchronization troubleshooting standards
• Knowledge Accumulation: Solidify expert experience into automation scripts
• New Employee Training: Quickly raise the team's overall time synchronization skill level
• Documentation Management: Establish a time synchronization fault handling knowledge base
Tags: #Ansible #Automation Operations #chronyd Diagnosis #RHEL8 #CentOS8 #Time Synchronization #Operational Efficiency