
In a modern distributed system, accurate time underpins almost every computation, yet it is often the most overlooked foundational capability. Log alignment, monitoring alerts, transaction consistency, container orchestration, certificate validation, and message delay calculations all depend on it. A deviation of just a few seconds in system time can trigger a series of hard-to-locate production issues.

This article explains Linux system time synchronization end to end: principles, tools, production implementation, architectural design, and troubleshooting methods. It is suitable for technical sharing or as internal training material.

1. Why is Time Synchronization So Important?

In distributed systems, time consistency across all machines matters more than the absolute correctness of any single machine’s clock.

Problems Caused by Time Desynchronization

1. Log Misalignment

When troubleshooting, you may find that Service A’s log records a call to Service B at 10:01, while Service B’s log records receiving it at 09:59. This leads to:

Broken call chains
Trace spans that cannot be ordered correctly
Misaligned monitoring graphs
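
One way to spot this kind of skew is to look for calls whose receive timestamp precedes the send timestamp. The sketch below assumes a hypothetical input format (one send/receive pair per line, in epoch seconds); real tracing or log data would first need to be exported into that shape.

```shell
# Sketch: flag caller/callee timestamp pairs that imply negative latency,
# a common symptom of clock skew between two hosts.
# Input format is an assumption: "<send_epoch> <recv_epoch>" per line, in seconds.
find_negative_latency() {
    awk '$2 < $1 { printf "line %d: recv is %.3fs before send\n", NR, $1 - $2 }'
}

# Example: Service A sends at 10:01:00 (36060 s after midnight),
# Service B logs receipt at 09:59:00 (35940 s after midnight).
printf '36060 35940\n36000 36000.2\n' | find_negative_latency
```

Only the first pair is reported; the second pair has a small positive latency and is treated as normal.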

2. Distributed System Consistency Failures

For example:

Redis key expiration (EXPIRE) triggering earlier or later than intended
Timeout- and timestamp-dependent logic in ZooKeeper/Kafka behaving unexpectedly
Distributed locks expiring early, so that two clients believe they hold the same lock
Database transaction timeouts being misjudged
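
The early-expiry case can be illustrated with plain epoch arithmetic. This is a sketch with made-up numbers, not a description of how any particular lock service works: the lock holder stores an absolute expiry time, and a second client whose clock runs 40 seconds fast compares that expiry against its own reading.

```shell
# is_expired <now_epoch> <expires_at_epoch>
# Succeeds (exit status 0) when the reader's clock says the deadline has passed.
is_expired() { [ "$1" -ge "$2" ]; }

true_now=1005                 # actual time, 5 s after the lock was taken at t=1000
expires_at=$((1000 + 30))     # lock TTL of 30 s gives an absolute expiry of 1030
skew=40                       # hypothetical: the second client's clock runs 40 s fast

if is_expired "$true_now" "$expires_at"; then
    echo "accurate clock: expired"
else
    echo "accurate clock: still held"
fi

if is_expired "$((true_now + skew))" "$expires_at"; then
    echo "skewed clock: expired"
else
    echo "skewed clock: still held"
fi
```

With these numbers the accurate client sees the lock as still held, while the skewed client already considers it expired and may try to take it.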

3. Impact on Security Mechanisms

JWT tokens rejected as “not yet valid” or “expired”
HTTPS certificate validation failures (a common cause of browser certificate errors)
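
Token validity checks are simple comparisons against the verifier's own clock, which is why a small skew is enough to break them. The sketch below is a hypothetical helper, not code from any JWT library; it compares the current time with the token's nbf (not before) and exp (expiry) values and accepts an optional leeway in seconds, a tolerance many verifiers allow for clock skew.

```shell
# token_window <now> <nbf> <exp> [leeway_seconds]
# Prints "not yet valid", "expired", or "valid". All values are epoch seconds.
# Hypothetical helper for illustration only.
token_window() {
    now=$1 nbf=$2 exp=$3 leeway=${4:-0}
    if [ "$((now + leeway))" -lt "$nbf" ]; then
        echo "not yet valid"
    elif [ "$((now - leeway))" -ge "$exp" ]; then
        echo "expired"
    else
        echo "valid"
    fi
}

# The issuer stamped nbf=1000, exp=1300; the verifier's clock is 10 s behind.
token_window 990 1000 1300       # prints "not yet valid"
token_window 990 1000 1300 30    # prints "valid" once 30 s of leeway is allowed
```

Leeway only hides small offsets; it does not replace keeping the clocks synchronized.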

4. Monitoring and Alert Anomalies

Prometheus/Grafana graphs show gaps, and alert rules may even fire “phantom alerts”.

In short: Time synchronization is the foundation of reliability in production-level systems.

2. Linux Time Architecture

Linux maintains two clocks:

Name	Type	Power Dependency	Usage
RTC (Real-Time Clock)	Battery-backed hardware clock on the motherboard	Keeps running while the machine is powered off	Initializes the system clock at startup
System Clock	Maintained in memory by the kernel	Lost on shutdown	Time actually used by applications

At startup:

RTC → System Clock (synchronized once at boot)

Afterwards:

System Clock = Kernel Tick + NTP/Chrony calibration

Note:

Time in containers should remain consistent with the host machine
System Clock in virtual machines is more prone to drift
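
To see why ongoing calibration is needed, it helps to convert a frequency error into accumulated drift. The helper below is a small arithmetic sketch; the 50 ppm figure is an illustrative assumption, not a measurement of any particular machine.

```shell
# drift_per_day <ppm>: seconds of accumulated error per day for a clock
# whose frequency is off by <ppm> parts per million and is never corrected.
drift_per_day() {
    awk -v ppm="$1" 'BEGIN { printf "%.3f\n", ppm * 86400 / 1000000 }'
}

drift_per_day 50    # illustrative value; prints 4.320
```

An uncorrected error of 50 ppm therefore adds up to more than four seconds per day, which is already enough to cause the problems described in section 1.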

3. Comparison of Mainstream Time Synchronization Tools

Tool	Type	Advantages	Recommended Scenarios
chronyd (Recommended)	NTP Client/Server	High precision, fast convergence, works well in virtual machines, copes with intermittent network connectivity	Enterprise-level production environments
ntpd	Traditional NTP daemon	Long history	Not recommended for new projects
systemd-timesyncd	Lightweight SNTP client	Simple, lightweight	Lightweight systems (minimal hosts, IoT devices)
hwclock	RTC utility	Reads and sets the hardware clock (RTC)	Synchronizing the RTC and the system clock around startup and shutdown

Best choice for production environments: chrony (compatible, stable, high precision)

4. Chrony: The Preferred Solution for Enterprise-Level Time Synchronization

1. Installation

CentOS / Rocky Linux

yum install chrony -y

Ubuntu / Debian

apt install chrony -y

2. Configuration (/etc/chrony.conf; /etc/chrony/chrony.conf on Ubuntu/Debian)

Below is a typical configuration suitable for enterprises:

# Upstream NTP servers; configure several so that a faulty source can be outvoted
server ntp.aliyun.com iburst
server time1.cloud.tencent.com iburst
server cn.pool.ntp.org iburst
# Allow clients on these internal networks to synchronize from this host (adjust per data center)
allow 192.168.0.0/16
allow 10.0.0.0/8
# Have the kernel periodically copy the system time to the hardware clock (RTC)
rtcsync
# File in which chronyd records the measured drift rate of the system clock
driftfile /var/lib/chrony/drift
# Keep serving time to clients (advertised as stratum 10) even if all upstream sources are lost
local stratum 10
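
Before rolling a configuration like this out, a quick sanity check can catch obvious omissions. The helpers below are a sketch: the function names are made up for this article, and the rule of thumb of three or more sources is a common convention rather than a chrony requirement.

```shell
# count_sources: read a chrony configuration on stdin and print how many
# "server" or "pool" directives it contains (comment lines are not counted).
count_sources() {
    grep -cE '^[[:space:]]*(server|pool)[[:space:]]'
}

# has_directive <name>: succeed if the named directive appears on stdin.
has_directive() {
    grep -qE "^[[:space:]]*$1([[:space:]]|\$)"
}

# Usage on a real host (path differs on Ubuntu/Debian):
#   [ "$(count_sources < /etc/chrony.conf)" -ge 3 ] || echo "fewer than 3 time sources"
#   has_directive driftfile < /etc/chrony.conf || echo "no driftfile configured"
```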

3. Start the service

systemctl enable --now chronyd

4. Check Synchronization Status

Check overall quality:

chronyc tracking

Check synchronization sources:

chronyc sources -v

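Meaning of key fields:

Stratum: Distance from a reference clock. Stratum 1 servers are attached directly to a reference clock; values between 2 and 4 are typical for ordinary hosts
System time / Last offset: Offset between the local clock and the time source (the smaller the better). chronyc tracking reports it in seconds, while chronyc sources uses scaled units such as us and ms
Ref time: Time (UTC) at which the last measurement from the reference source was processed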
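
For scripted health checks, the offset can be pulled out of the chronyc tracking output and compared with a threshold. The functions below are a sketch that assumes the usual "Field name : value" layout of that output; the function names and the 50 ms threshold in the usage comment are illustrative choices.

```shell
# tracking_offset: read `chronyc tracking` output on stdin and print the
# magnitude of the "System time" offset in seconds.
tracking_offset() {
    awk -F' *: *' '$1 == "System time" { split($2, f, " "); print f[1] }'
}

# offset_within <max_seconds>: succeed if that offset is at most <max_seconds>.
# Fails if the "System time" line is missing from the input.
offset_within() {
    awk -F' *: *' -v max="$1" '
        $1 == "System time" { split($2, f, " "); found = 1; ok = (f[1] + 0 <= max + 0) }
        END { if (found && ok) exit 0; exit 1 }'
}

# Usage on a host running chronyd:
#   chronyc tracking | offset_within 0.05 || echo "clock offset above 50 ms"
```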

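5. Force Immediate Calibration (by default, large offsets are not corrected in one step)

By default, chronyd corrects the clock gradually by slewing (slightly speeding it up or slowing it down) rather than jumping it, so a large offset is only slowly “pulled back”. A makestep directive in the configuration file (for example makestep 1.0 3, which many distribution default configurations include) allows a step only during the first few clock updates after startup. ntpd behaves differently: if the offset exceeds its 1000-second panic threshold, it exits instead of adjusting the clock.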

Force immediate correction:

chronyc makestep
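
To get a feel for how slow a gradual correction can be, the time needed to remove an offset at a constant slew rate can be estimated. This is a rough sketch: the 83333.333 ppm rate used below is assumed to be the maximum slew rate (see the maxslewrate entry in the chrony.conf documentation for the value on your version), and real corrections may run slower than this lower bound.

```shell
# slew_seconds <offset_seconds> <slew_rate_ppm>
# Time needed to remove an offset by slewing at a constant rate.
slew_seconds() {
    awk -v off="$1" -v ppm="$2" 'BEGIN { printf "%.0f\n", off / (ppm / 1000000) }'
}

# 83333.333 ppm is an assumed maximum slew rate, used here for illustration.
slew_seconds 1 83333.333       # prints 12
slew_seconds 1000 83333.333    # prints 12000 (roughly 3.3 hours)
```

Under that assumption, a 1000-second offset takes hours to disappear by slewing alone, which is why a manual step is sometimes preferable.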

5. Building an Internal NTP Server for Enterprises (Recommended Architecture)

For large enterprises or multiple IDC data centers, the following architecture can be adopted:

       National Time Center / Aliyun NTP / Tencent Cloud NTP
                                 │
                   Company-Level NTP (Stratum 2)
                      10.10.1.10 / 10.10.1.11
                                 │
                ┌────────────────┴────────────────┐
                │                                 │
   Data Center A Secondary NTP       Data Center B Secondary NTP
           (Stratum 3)                       (Stratum 3)
                │                                 │
    All business servers, load balancers, databases, K8s nodes

Example configuration for enterprise NTP Server:

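server ntp.aliyun.com iburst
# time.google.com smears leap seconds; avoid mixing smeared and non-smeared sources
server time.google.com iburst
# Fallback only: keep serving clients if every upstream source is lost.
# While synchronized, the advertised stratum is the upstream stratum plus one.
local stratum 10
allow 10.0.0.0/8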

This means:

Secondary servers can continue to synchronize downwards
All machines in the production environment rely only on internal NTP, not directly requesting the public network

Advantages:

Safe and stable, much less exposed to public network fluctuations
High time consistency within the same data center (sub-millisecond deviation is typically achievable on a LAN)
Reduces pressure on public NTP services
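
On the business servers, the matching client configuration only needs to list the internal servers. The generator below is a minimal sketch: the function name is made up, the addresses are the placeholders from the diagram (in practice each host would point at the NTP tier closest to it), and the directive set is an assumption to adapt to local policy.

```shell
# render_client_conf <server>...: print a minimal chrony client configuration
# that uses only the given internal NTP servers.
# Sketch only; the chosen directives are an assumption, not a standard template.
render_client_conf() {
    for s in "$@"; do
        printf 'server %s iburst\n' "$s"
    done
    printf '%s\n' 'driftfile /var/lib/chrony/drift' 'makestep 1.0 3' 'rtcsync'
}

# Placeholder addresses taken from the architecture diagram.
render_client_conf 10.10.1.10 10.10.1.11
```

Generating the file from a single function (or a configuration management template) keeps every host pointed at the same sources, which is the point of the tiered design.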

6. systemd-timesyncd (Commonly Used in Lightweight Systems)

Used on lightweight installations where chronyd is not installed (e.g., minimal hosts, IoT devices).

Check status:

timedatectl

Enable synchronization:

timedatectl set-ntp true

Note:
Do not use it as a replacement for chrony in production environments: it is a simple SNTP client and cannot act as a time server for other machines.
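
For scripts, the synchronization state is easier to check from the machine-readable output of timedatectl show than from the human-readable summary. The helper below is a sketch that assumes a systemd version providing the show subcommand and its NTPSynchronized property; the function name is made up.

```shell
# ntp_synchronized: read `timedatectl show` output on stdin and succeed
# if the system clock is reported as synchronized.
ntp_synchronized() {
    grep -qx 'NTPSynchronized=yes'
}

# Usage on a systemd host:
#   timedatectl show | ntp_synchronized && echo "clock is synchronized"
```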

7. Common Time Synchronization Failures and Troubleshooting Methods

1. NTP Server Unreachable

Troubleshooting:

chronyc sources -v

If a source is marked ^? and its Reach value stays at 0:

^? unreachable

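Possible causes:

UDP port 123 is blocked by a firewall or security group
DNS resolution issues
Rate limiting or access restrictions on the public NTP server

Solution (on a host that must accept NTP requests, open inbound UDP 123; pure clients only need outbound UDP 123 to be allowed):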

firewall-cmd --add-port=123/udp --permanent
firewall-cmd --reload
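
When many hosts need checking, the unreachable sources can be listed automatically. The function below is a sketch that assumes the standard column layout of chronyc sources (state, name, stratum, poll, reach); Reach is an octal register of recent polls, so 377 means the last eight all succeeded and 0 means none did.

```shell
# unreachable_sources: read `chronyc sources` output on stdin and print the
# names of sources whose Reach register is 0 (no recent valid replies).
# Only two-character state fields such as "^?" are considered, which skips
# the header, the "====" separator, and the legend printed by -v.
unreachable_sources() {
    awk 'NF >= 5 && $1 ~ /^[=#^].$/ && $5 == "0" { print $2 }'
}

# Usage on a host running chronyd:
#   chronyc sources | unreachable_sources
```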

2. Severe Time Drift in Virtual Machines

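Virtual machines can see unstable clock ticks because their vCPUs are paused and rescheduled by the hypervisor.

Solution:

Kernel Parameter Adjustment (only when the hypervisor is known to provide a stable TSC; tsc=reliable disables the kernel’s runtime checks of the TSC clocksource)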

grubby --update-kernel=ALL --args="tsc=reliable"

Use chrony (better than ntpd)

Chrony has many optimizations for virtualization.
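
Before changing kernel parameters, it is worth checking which clocksource the guest is actually using. The helper below is a sketch that reads the usual Linux sysfs location; the function name is made up, and the optional directory argument exists only so the function can be pointed elsewhere for testing.

```shell
# show_clocksource [sysfs_dir]: print the clocksource the kernel is using
# and the ones it could switch to.
show_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    if [ -r "$dir/current_clocksource" ]; then
        printf 'current:   %s\n' "$(cat "$dir/current_clocksource")"
        printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
    else
        echo "clocksource information not available" >&2
        return 1
    fi
}

# Usage on a Linux host or guest:
#   show_clocksource
```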

3. Time Inconsistency in Containers (Docker/K8s)

Containers share the host kernel’s clock and do not maintain their own time, so the time inside a container is determined by the host machine.

Recommendations:

Configure chrony on the host machine
Do not run chronyd inside containers
All K8s nodes must connect to the same time source
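
Consistency across nodes can be summarized as the spread between the largest and smallest measured offset. The function below is a sketch over a hypothetical input format (node name and offset in seconds per line); how the offsets are collected, for example by running chronyc tracking on each node, is left to existing tooling.

```shell
# offset_spread: read "<node> <offset_seconds>" lines on stdin and print the
# difference between the largest and smallest offset, in seconds.
# The input format is an assumption made for this example.
offset_spread() {
    awk 'NR == 1 { min = max = $2 + 0 }
         { v = $2 + 0; if (v < min) min = v; if (v > max) max = v }
         END { if (NR > 0) printf "%.6f\n", max - min }'
}

# Example with made-up measurements:
printf 'node-a 0.0002\nnode-b -0.0011\nnode-c 0.0004\n' | offset_spread
```

For the sample values the spread is 1.5 ms, driven by node-b, which would be the node to investigate first.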

4. Time is Wrong Again After Restart

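Reason: The hardware RTC is inaccurate, and the system clock is initialized from it at boot.

Write the system time to the RTC (once the system clock has been synchronized):

hwclock --systohc

Set the system clock from the RTC:

hwclock --hctosys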

8. Summary of Best Practices for Production

✅ 1. Use chrony uniformly

Stable, fast, high precision, suitable for large-scale virtual machine scenarios.

✅ 2. Unified NTP source across multiple data centers

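Keep the time deviation between servers small and bounded. A deviation of <1ms is a realistic target within a data center; across data centers it depends on network latency.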

✅ 3. Deploy enterprise-level NTP Server in core data centers

Reduce external network dependencies and improve security.

✅ 4. Focus on time synchronization in container clusters and virtualized environments

Avoid drift causing distributed issues.

✅ 5. Check NTP configuration after system upgrades

Some images and automation tools may overwrite configurations.
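
A simple way to detect an overwritten configuration is to compare its effective directives with a known-good copy. The function below is a sketch: it strips blank lines and lines starting with one of chrony's comment characters so that cosmetic changes do not show up as differences. The baseline file name in the usage comment is hypothetical.

```shell
# effective_directives: print the non-comment, non-blank lines of a chrony
# configuration read from stdin, so two versions can be compared with diff.
effective_directives() {
    grep -vE '^[[:space:]]*([#!;%]|$)'
}

# Usage after an upgrade (baseline.conf is a hypothetical known-good copy):
#   effective_directives < /etc/chrony.conf > /tmp/current.conf
#   effective_directives < baseline.conf | diff - /tmp/current.conf
```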

✅ 6. Use makestep for large deviations to force calibration

Avoid long periods of inconsistency due to “slow pull back”.

Time synchronization is one of the most critical infrastructures in distributed systems. It may not be as visible as CPU or memory, but it determines the reliability baseline of the system.

