Ansible Firefighting Hotline Series (21): Automated Analysis of Time Synchronization Failures

⏰ Ansible Firefighting Hotline | Struggling with Time Synchronization Failures? One-Click Automated Diagnosis Turns You into a Time Expert!

Are you still struggling with chaotic troubleshooting of chronyd time synchronization failures? Today, we bring you a comprehensive automated analysis solution for chronyd time synchronization failures on RHEL8/9 & CentOS8/9, allowing you to say goodbye to the nightmare of manually typing commands!

🎯 Pain Points Addressed

The daily routine of an operations engineer: time synchronization alert → manually check chronyd status → review time source configuration → check network connectivity → analyze system logs → troubleshoot the firewall → verify time deviation… Several hours later, the root cause may still be elusive.

Worse still, time synchronization issues often affect an entire cluster, and manual troubleshooting easily misses key information: without a systematic approach, the root cause is hard to pin down quickly. What if an automated chronyd diagnostic took care of all this for you?

✨ Solution Preview

Today, we share an Ansible-based automated analysis solution for chronyd time synchronization failures on RHEL8/9 & CentOS8/9. It bundles six core diagnostic modules and makes your time synchronization troubleshooting standardized, automated, and intelligent!

Results Preview

🧾 Sample of Original Diagnostic Report (results only)

=== Chrony Diagnose Report (10.66.208.231) ===
OS: RedHat 9.6

--- Service Status ---
● chronyd.service - NTP client/server
     Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-09-19 16:25:23 CST; 3 days ago

--- timedatectl ---
               Local time: Mon 2025-09-22 20:13:01 CST
           Universal time: Mon 2025-09-22 12:13:01 UTC
System clock synchronized: yes
              NTP service: active

--- chronyc activity ---
200 OK
3 sources online
0 sources offline

--- chronyc -n sources -v ---
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
=============================================================================
^+ 10.11.160.238                 2  10   347   606  -2787us[-2787us] +/-  159ms
^+ 10.2.32.37                    2  10   377  1066  +1412us[+1412us] +/-  157ms
^* 10.2.32.38                    2  10   377   23m  +1533us[+1603us] +/-  144ms

Reach Check: ✅ Reach OK (377)

--- chronyc -n tracking ---
Reference ID    : 0A022026 (10.2.32.38)
Stratum         : 3
Ref time (UTC)  : Mon Sep 22 11:49:55 2025
System time     : 0.000233934 seconds fast of NTP time
Leap status     : Normal

Sync Check: ✅ NTP synchronized
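The Sync Check verdict above comes from the Leap status field of chronyc tracking. According to the chronyc documentation, that field can read Normal, Insert second, Delete second, or Not synchronised, so the Playbook's exact match on Normal would also flag a synchronized clock that is merely announcing a leap second. The shell sketch below (runs in bash or dash; leap_check is a hypothetical helper, not part of the Playbook) treats only Not synchronised, or a missing field, as a failure:

```shell
# leap_check (hypothetical helper): read `chronyc tracking` output on stdin and
# classify the Leap status field. "Insert second"/"Delete second" only announce
# a pending leap second, so they count as synchronized here.
leap_check() {
  local leap
  # Split "Leap status     : Normal" on the colon and its surrounding spaces,
  # strip trailing blanks, and keep only the first matching line.
  leap=$(awk -F' *: *' '/^Leap status/ { sub(/ +$/, "", $2); print $2; exit }')
  case "$leap" in
    "Normal"|"Insert second"|"Delete second")
      echo "OK: NTP synchronized (Leap status: $leap)" ;;
    *)
      echo "ALERT: NTP not synchronized (Leap status: ${leap:-missing})"
      return 1 ;;
  esac
}
```

On a target host you would pipe the live output into it, e.g. chronyc -n tracking | leap_check; the non-zero return status makes it easy to use in scripts or alerts.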

🤔 Design Philosophy: Why Is Our Playbook a Best Practice?

A professional automation solution is not just a simple pile of commands. Our design philosophy incorporates the core practices advocated by Red Hat, elevating your automation solution from “just works” to “professional and reliable”!

1. Intelligent Anomaly Detection, Problems at a Glance ✨

We don't just collect data; we also analyze key metrics automatically. For example, the Playbook checks whether the Reach value is 377 (the normal value), judges the NTP synchronization status, and uses prominent ✅/🔴 markers so problems stand out instantly!

2. Variable-Driven, Flexible Adaptation 💻

We centralize all configurable parameters (such as the NTP test server and the report output path) in the vars section at the top of the Playbook. When you need to adjust the diagnostic scope, you only change these variables, without touching any core task logic.

3. Idempotency Assurance, Safe and Worry-Free ✅

The Playbook follows Ansible's core principle of idempotency: every diagnostic task only reads system state, and the only things it writes are the report directory and the report file, which is overwritten on each run. You can therefore execute it repeatedly without side effects.

4. Closed-Loop Verification, Results Visible 🎯

The Playbook ends by writing a complete diagnostic report and printing a summary, forming a check → analyze → report closed loop. You don't just run automation; you immediately see the diagnostic results and keep problems under control!
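Point 1 treats 377 as the normal Reach value. chronyc prints Reach as an octal number for an 8-bit register that is updated on every transmission to the source, received or missed, so 377 (binary 11111111) means the last eight polls were all answered. The sketch below uses a hypothetical reach_bits helper to decode the value; it shows, for instance, that the 347 reported for 10.11.160.238 in the sample report means two of the last eight polls went unanswered.

```shell
# reach_bits (hypothetical helper): print chrony's 8-bit reachability register
# as binary, newest poll on the right (1 = reply received, 0 = poll missed).
# $1 is the octal Reach column from `chronyc sources`, e.g. 377 or 347.
reach_bits() {
  local value bits i
  value=$(( 0$1 ))   # a leading 0 makes shell arithmetic read the value as octal
  bits=""
  i=7
  while [ "$i" -ge 0 ]; do
    bits="$bits$(( (value >> i) & 1 ))"
    i=$(( i - 1 ))
  done
  printf '%s\n' "$bits"
}

reach_bits 377   # 11111111: all of the last 8 polls were answered
reach_bits 347   # 11100111: the 4th and 5th most recent polls were missed
```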

⭐ Automation Scenario Rating

Rating Dimension | Rating | Description
Ease of Use | ⭐ | One-click execution, detailed comments, beginner-friendly
Reusability | ⭐⭐⭐⭐⭐ | Variable configuration, supports multi-host parallel execution
Stability | ⭐⭐⭐⭐⭐ | Idempotent design, comprehensive error handling
Scalability | ⭐⭐⭐⭐ | Modular design, easy to extend functionality
Best Practice Compliance | ⭐⭐⭐⭐⭐ | Follows Ansible best practices and coding standards

🗂️ Project Directory Structure

08_chrony_service_automated_diagnosis/
├── troubleshooting01_chrony_diagnose.yml

📄 Core File Content Overview

🎯 Main Diagnostic Playbook (troubleshooting01_chrony_diagnose.yml)

---
- name: "Chrony Troubleshooting & Diagnostic"
  hosts: rhel9
  gather_facts: true
  become: true

  vars:
    ntp_test_server: "ntp2.ntp-001.prod.iad2.dc.redhat.com"   # Change to your NTP IP
    report_dir: "/tmp/chrony_reports"
    report_file: "{{ report_dir }}/chrony_report_{{ inventory_hostname }}.txt"

  pre_tasks:
    - name: "Assert OS is RHEL/CentOS 7/8/9"
      ansible.builtin.assert:
        that:
          - ansible_facts['os_family'] == "RedHat"
          - ansible_facts['distribution_major_version'] | int in [7, 8, 9]
        fail_msg: "❌ This playbook only supports RHEL/CentOS 7/8/9"
        success_msg: "✅ OS version check passed"

    - name: "Ensure report directory exists"
      ansible.builtin.file:
        path: "{{ report_dir }}"
        state: directory
        mode: '0755'

  tasks:
    # -------------------
    # Basic Checks
    # All checks below only read state: changed_when: false stops them from
    # reporting "changed", and failed_when: false lets the diagnosis continue
    # even when a command exits non-zero (e.g. chronyd is stopped).
    # -------------------
    - name: "Get chronyd service status"
      ansible.builtin.command: systemctl status chronyd
      register: chronyd_service
      changed_when: false
      failed_when: false

    - name: "Get timedatectl status"
      ansible.builtin.command: timedatectl
      register: timedatectl_status
      changed_when: false
      failed_when: false

    # -------------------
    # Comprehensive Chrony Diagnostic Commands
    # -------------------
    - name: "Run chronyc activity"
      ansible.builtin.command: chronyc activity
      register: chronyc_activity
      changed_when: false
      failed_when: false

    - name: "Run chronyc ntpdata"
      ansible.builtin.command: chronyc ntpdata
      register: chronyc_ntpdata
      changed_when: false
      failed_when: false

    - name: "Run chronyc -n sources -v"
      ansible.builtin.command: chronyc -n sources -v
      register: chronyc_sources
      changed_when: false
      failed_when: false

    - name: "Run chronyc -n sourcestats -v"
      ansible.builtin.command: chronyc -n sourcestats -v
      register: chronyc_sourcestats
      changed_when: false
      failed_when: false

    - name: "Run chronyc -n tracking"
      ansible.builtin.command: chronyc -n tracking
      register: chronyc_tracking
      changed_when: false
      failed_when: false

    # chronyd -Q queries the test server and prints the measured offset
    # without setting the system clock.
    - name: "Run chronyd -Q test NTP server"
      ansible.builtin.command: "chronyd -Q 'server {{ ntp_test_server }} iburst'"
      register: chronyd_Q_test
      changed_when: false
      failed_when: false

    # -------------------
    # Logic Judgments & Anomaly Marking
    # -------------------
    # Note: a coarse substring test that passes if "377" appears anywhere in
    # the sources output, so one healthy source can mask a degraded one.
    - name: "Mark abnormal Reach"
      ansible.builtin.set_fact:
        reach_status: |
          {% if '377' not in chronyc_sources.stdout %}
          🔴 Reach abnormal (not 377)
          {% else %}
          ✅ Reach OK (377)
          {% endif %}

    - name: "Mark abnormal Sync"
      ansible.builtin.set_fact:
        sync_status: |
          {% if 'Leap status     : Normal' not in chronyc_tracking.stdout %}
          🔴 NTP not synchronized
          {% else %}
          ✅ NTP synchronized
          {% endif %}

    # -------------------
    # Report Generation
    # -------------------
    - name: "Assemble chrony diagnostic report"
      ansible.builtin.copy:
        dest: "{{ report_file }}"
        mode: '0644'
        content: |
          === Chrony Diagnose Report ({{ inventory_hostname }}) ===
          OS: {{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}

          --- Service Status ---
          {{ chronyd_service.stdout }}

          --- timedatectl ---
          {{ timedatectl_status.stdout }}

          --- chronyc activity ---
          {{ chronyc_activity.stdout }}

          --- chronyc ntpdata ---
          {{ chronyc_ntpdata.stdout }}

          --- chronyc -n sources -v ---
          {{ chronyc_sources.stdout }}

          Reach Check: {{ reach_status }}

          --- chronyc -n sourcestats -v ---
          {{ chronyc_sourcestats.stdout }}

          --- chronyc -n tracking ---
          {{ chronyc_tracking.stdout }}

          Sync Check: {{ sync_status }}

          --- chronyd -Q test server ---
          (server: {{ ntp_test_server }})
          {{ chronyd_Q_test.stdout }}{{ chronyd_Q_test.stderr }}

    - name: "Display report summary"
      ansible.builtin.debug:
        msg:
          - "📋 Report generated: {{ report_file }}"
          - "Reach Status: {{ reach_status }}"
          - "Sync Status: {{ sync_status }}"
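The "Mark abnormal Reach" task is a substring test: it reports OK as soon as "377" appears anywhere in the sources output. In the sample report, 10.11.160.238 shows Reach 347, yet the Reach Check still says OK because the other two sources are at 377. A stricter per-source check could look like the following sketch; check_reach is a hypothetical helper, and it assumes the default chronyc sources column layout (MS, Name/IP address, Stratum, Poll, Reach, ...):

```shell
# check_reach (hypothetical helper): read `chronyc -n sources [-v]` output on
# stdin, print every source whose Reach column is not 377, and return 1 if any
# source is degraded.
check_reach() {
  awk '
    # Source lines begin with a mode character (^ server, = peer, # reference
    # clock) followed by a state character; this skips the -v legend, the
    # column header, and the ==== ruler.
    /^[=#^][*+?x~-] / {
      # Columns: MS, Name/IP address, Stratum, Poll, Reach, ...
      if ($5 != "377") {
        printf "degraded: %s reach=%s\n", $2, $5
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

On a host, chronyc -n sources | check_reach lists each degraded source. The same awk program could also be run from an ansible.builtin.shell task, with its output feeding reach_status instead of the substring test.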

🛠️ Foolproof Deployment Guide

Reading about it a thousand times is no match for doing it once!

Prerequisites

1. One Ansible control node.
2. SSH trust (key-based login) from the control node to the target servers, and sudo privileges for the user executing Ansible.
3. Ansible installed on the control node.

Project Directory Structure

This is a very simple project; you only need a few files!

08_chrony_service_automated_diagnosis/
├── troubleshooting01_chrony_diagnose.yml    # Main diagnostic Playbook
└── inventory                        # Host inventory (needs to be created)

How to Use?

1. Create Host Inventory 📝: Create an inventory file and fill in the hostnames or IP addresses of the servers whose time synchronization you want to diagnose.

[rhel9]
10.66.208.231
# or
# server1.example.com
# server2.example.com

2. Modify Variables ✏️: Open troubleshooting01_chrony_diagnose.yml and adjust the vars section to your needs, such as the NTP test server and the report output path.

3. Execute Automation ▶️: Run the following command, then go make yourself a cup of coffee ☕️!

ansible-playbook -i inventory troubleshooting01_chrony_diagnose.yml

🔍 Diagnostic Coverage

✅ Service Status Check

• systemctl status chronyd: service running status
• timedatectl: system time status
• Service start time and running duration

✅ Time Synchronization Status Analysis

• chronyc activity: number of time sources online and offline
• chronyc ntpdata: detailed NTP data for each source
• chronyc sources: time source status and connection quality
• chronyc sourcestats: time source statistics
• chronyc tracking: reference source, system clock offset, and synchronization status

✅ Intelligent Anomaly Detection

• Reach value detection: automatically checks for 377 (the normal value)
• Synchronization status detection: automatically checks whether NTP is synchronized
• Anomaly marking: ✅/🔴 markers make problems clear at a glance

✅ NTP Server Connectivity Testing

• chronyd -Q: tests connectivity to the specified NTP server
• Configurable test server address

✅ Complete Diagnostic Report

• Structured report format
• Includes all key information
• Automatically written to the specified path

✅ Comprehensive Error Handling

• Idempotent design
• Fault tolerance for failed commands (failed_when: false)
• Operating system version check
• Automatic creation of the report directory

💡 Tips for Use

🎯 Batch Diagnosis

# Add multiple servers in the inventory
[rhel9]
server1 ansible_host=192.168.1.100
server2 ansible_host=192.168.1.101
server3 ansible_host=192.168.1.102

# Run on up to 10 hosts in parallel (Ansible's default is 5 forks)
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory --forks 10

🔧 Custom Configuration

Edit troubleshooting01_chrony_diagnose.yml to adjust it to your environment:

• Modify the NTP test server
• Adjust the report output path
• Customize the report directory

🐛 Troubleshooting

If you encounter issues, check the diagnostic report generated on each target host:

• Report location: /tmp/chrony_reports/chrony_report_[inventory_hostname].txt
• Contains complete fault clues
• Intelligent anomaly marking makes problems clear at a glance

⚠️ Reminder on the Importance of Time Synchronization

Time synchronization issues often affect the entire system cluster; we recommend that you:

• Regularly check time synchronization status
• Set time deviation alerts
• Establish a time synchronization monitoring mechanism

🎯 Advanced Usage

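Custom Diagnostic Scope

The Playbook as shown defines no tags, so --tags and --skip-tags only take effect after you tag the relevant tasks yourself (for example, tags: service_check on the Basic Checks tasks). With --tags, tasks without a matching tag (including the report tasks) do not run unless they are tagged always; with --skip-tags, the report template needs a | default('') fallback for every skipped result.

# List the tags the Playbook currently defines
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory --list-tags

# Check only specific modules (requires tasks tagged service_check)
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory --tags "service_check"

# Skip certain checks (requires tasks tagged network_check)
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory --skip-tags "network_check"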

Output Format Customization

# Detailed output mode
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory -v

# Super detailed output mode
ansible-playbook troubleshooting01_chrony_diagnose.yml -i inventory -vvv

🎁 Bonus Time! Get the Complete Annotated Version!

Do you find the above Playbook not detailed enough? Want to delve into the logic behind every line of code and the best practices recommended by the official documentation?

It lets you not only use the Playbook but also adapt it to any scenario, making you the standout Ansible automation expert on your team!

👉 Click the link below to download the complete annotated, syntax-highlighted Playbook project package! 👈

🎁 Summary

This automated analysis solution for chronyd time synchronization failures on RHEL8/9 & CentOS8/9 truly achieves:

• 🔍 Comprehensive Diagnosis: all-around analysis, from service status to time synchronization and from NTP server connectivity to anomaly detection
• 🚀 One-Click Execution: automation completes all diagnostic steps without manual intervention
• 📊 Intelligent Reporting: structured diagnostic reports whose anomaly markers make problems clear at a glance
• 🔧 Highly Customizable: variable configuration adapts to different time synchronization environments
• 📈 Batch Processing: multi-host parallel diagnosis, doubling efficiency
• 🛡️ Safe and Reliable: read-only analysis that does not modify system configuration (it only writes the report file)

What are you waiting for? Download this automated diagnostic solution now and improve your time synchronization troubleshooting efficiency by 10 times!

🚀 Advanced Application Scenarios

Enterprise-Level Deployment

• Multi-Environment Support: unified time synchronization diagnosis for development, testing, and production environments
• Compliance Checks: meet enterprise time synchronization audit requirements
• Monitoring Integration: seamlessly integrate with existing monitoring systems
• Cluster Management: unified management of time synchronization status across the entire cluster

Fault Prevention

• Regular Checks: schedule runs to proactively discover potential time synchronization issues
• Trend Analysis: analyze time synchronization health trends through historical reports
• Alert Mechanism: set time deviation alert thresholds based on diagnostic results
• Automatic Repair: combine with other tools to repair time synchronization automatically
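For the Alert Mechanism item, the System time line of chronyc tracking is a natural input: it reports how far the system clock is from NTP time (0.000233934 seconds fast in the sample report), with the direction given in words. A minimal shell sketch, assuming the standard tracking output format; system_offset and offset_exceeds are hypothetical helpers, and the threshold is yours to choose:

```shell
# system_offset (hypothetical helper): print the System time offset in seconds
# from `chronyc tracking` output on stdin. The direction is given in words
# ("fast"/"slow"), so the number itself is always non-negative.
system_offset() {
  awk '/^System time/ { print $4; found = 1; exit } END { exit !found }'
}

# offset_exceeds LIMIT (hypothetical helper): succeed (return 0) when the offset
# read from `chronyc tracking` on stdin is greater than LIMIT seconds; return 2
# if no System time line was found.
offset_exceeds() {
  local off
  off=$(system_offset) || return 2
  awk -v off="$off" -v limit="$1" 'BEGIN { exit !(off + 0 > limit + 0) }'
}

# Example: warn when the clock is more than 100 ms away from NTP time
# chronyc -n tracking | offset_exceeds 0.1 && echo "ALERT: time offset above 0.1 s"
```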

Team Collaboration

• Standardized Processes: unify the team's time synchronization troubleshooting standards
• Knowledge Accumulation: capture expert experience in automation scripts
• New Employee Training: quickly raise the team's overall time synchronization expertise
• Documentation Management: build a knowledge base for handling time synchronization faults

Tags:#Ansible #Automation Operations #chronyd Diagnosis #RHEL8 #CentOS8 #Time Synchronization #Operational Efficiency
