Cloud Computing Solutions for Urban Rail CBTC Signal Systems

Source: Shanghai Electric Thales TST

Current Status of Safety Computer Platforms

The safety computer platform is the foundation of functional safety in the urban rail CBTC signal system and one of its core technologies. Because of limitations in domestic chip and operating system technology, the industry still relies predominantly on imported equipment for safety computer platforms. Viewed across the full operation and maintenance lifecycle, existing technologies also have several application limitations.

Dedicated Hardware

In the urban rail CBTC signal system, the safety computer platform is applied primarily in the ATP and interlocking subsystems, where it performs safety functions such as business logic computation, sensor data collection, and external device state monitoring. From a hardware perspective, existing systems mainly adopt two architectural solutions:
1. Embedded computer solutions
2. General-purpose industrial control computers with dedicated safety boards
Most existing CBTC onboard controllers adopt the embedded solution. Embedded boards are small, consume little power, tolerate power fluctuations, mechanical shock and vibration, and extreme temperatures, and offer better electromagnetic compatibility. To meet the requirements for computing power and operating environment, embedded safety computers generally use SoC processors that integrate many external interfaces, such as network interfaces, serial ports, IO interfaces, and CAN buses. The advantages are high integration and proximity of the computing layer to the drive and control layer, which benefits space, energy consumption, and real-time performance. The disadvantages are relatively high cost and, owing to market conditions, geopolitical factors, and production volumes, no guarantee of long-term supply: parts must be stockpiled or continually upgraded and replaced to keep service to metro operators stable throughout the signal system lifecycle. Other subsystems, such as trackside area controllers, drive control units, and axle counting systems, also mostly adopt embedded solutions.
The general-purpose industrial control computer solution is more commonly used for computer interlocking safety calculations. Its hardware is more generic, easier to procure, and offers relatively high computing power, but traditional safety platform design requires dedicated safety boards to achieve safe shutdown and switchover. It is inferior to the embedded solution in space, energy consumption, and adaptability to harsh environments.

Software-Hardware Binding

In embedded computer solutions, SoC chips are not interchangeable, so software must be adapted and developed for each chip. The safety platform software has to be tightly integrated with the operating system and hardware to provide hardware drivers and monitoring services for the business application layer, ensuring that business logic executes correctly and on time. This strong coupling between software and hardware means that changing chips entails a significant amount of software adaptation work, and the software lifecycle is limited by the hardware lifecycle.

Limitations of Multi-core Processors

Meanwhile, as chip technology has developed, multi-core CPUs and SoCs have been widely adopted. In a multi-core processor, resources such as cache and memory are shared among cores, which may lead to contention. In general applications, general-purpose operating systems allocate resources to applications dynamically, based on usage, to keep operation efficient. In safety applications, however, existing safety computer platforms restrict safety tasks to fixed cores with fixed memory mapping in order to guarantee deterministic execution time and continuous hardware monitoring. This wastes resources and limits how far an application's computing capacity can be upgraded.

Figure 1 Multi-core General Processor Architecture

Cloud Computing Solutions

The safety computer platform is one of the key directions in the further development of domestic urban rail signal systems toward full independence and control of the technology. There are two technical paths: one is to replace the existing embedded safety computer platform with domestic chips and operating systems; the other is to replace embedded safety computers with general-purpose computers, whose supply is comparatively independent. Implementing secure cloud computing on clusters of general-purpose off-the-shelf servers belongs to the second path, and its inherent computing advantages provide the technical conditions for the sustainable development of the signal system.

Virtualization

Virtualization is the foundation of cloud computing. The cloud computing platform uses general-purpose off-the-shelf servers whose multi-core processors offer powerful computing capabilities. Compared with distributed embedded computing, virtualization allows multiple applications to be centralized on the same physical machine, making full use of the hardware. The aviation field faced a similar problem: under a distributed architecture, scattered controllers and the many communication cables between them caused problems of space, weight, and energy consumption, and the number of controllers kept growing as functions were added. In response, the aviation field proposed Integrated Modular Avionics (IMA) in the early 1990s. The first version of the ARINC 653 operating system specification was released in 1996 to guide the application of virtualization technology in aviation, initially targeting single-core processors so that multiple applications could run on one processor. The ARINC 653 P1-4 specification added support for multi-core processors in 2015, and in 2018 Wind River released a multi-core operating system conforming to the ARINC specification. The automotive industry's AUTOSAR architecture also defines technical specifications for multi-core processors.

Signal system applications are relatively simple and highly homogeneous. Because safety applications must run continuously and periodically from system startup, TST's self-developed safety cloud platform assigns each safety application's virtual machine to one fixed CPU core. A multi-core CPU can therefore run multiple safety applications. Taking the trackside area controller as an example, the hardware of a single general-purpose server can control multiple control areas. Non-safety tasks such as local ATS can also be allocated to their own cores, so that safety and non-safety applications coexist.
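The fixed core assignment described above can be illustrated with a minimal sketch. It is not TST's implementation: the `VirtualMachine` type, the `build_core_plan` function, and the VM names are hypothetical, and the only rule taken from the text is that every virtual machine gets one core of its own that never changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMachine:
    name: str
    safety: bool  # True for safety applications such as an area controller


def build_core_plan(vms, core_count):
    """Assign each VM its own CPU core, fixed at configuration time.

    Safety VMs take the lowest-numbered cores, non-safety VMs the rest.
    Raises ValueError instead of letting two VMs share a core.
    """
    if len(vms) > core_count:
        raise ValueError(
            f"{len(vms)} VMs need dedicated cores, only {core_count} available"
        )
    names = [vm.name for vm in vms]
    if len(set(names)) != len(names):
        raise ValueError("VM names must be unique")
    # sorted() is stable, so safety VMs keep their relative order and come first.
    ordered = sorted(vms, key=lambda vm: not vm.safety)
    return {vm.name: core for core, vm in enumerate(ordered)}


# Hypothetical example: two area-controller VMs and a local ATS VM on a 4-core CPU.
plan = build_core_plan(
    [
        VirtualMachine("zc_area_1", safety=True),
        VirtualMachine("local_ats", safety=False),
        VirtualMachine("zc_area_2", safety=True),
    ],
    core_count=4,
)
print(plan)
```

A real platform would hand such a plan to the hypervisor's core-pinning mechanism; the sketch only shows the bookkeeping that keeps the mapping one-to-one.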

Figure 2 Multi-core Processors and Virtual Machines

For safety applications, isolating runtime resources after virtualization is one of the key technologies. To allow multiple applications to coexist on single-core or multi-core processors, the 2010 edition of IEC 61508-3 added a key safety requirement in Annex F: independence of execution between applications. This primarily means temporal and spatial isolation of CPU and memory, so that errors in one application do not interfere with another and common mode failures in hardware are prevented; it is also essential for information security. The TST safety cloud platform achieves CPU isolation by assigning each virtual machine a fixed core, and achieves memory isolation through partition mapping, which statically maps different memory partitions to different virtual machines.
The operation of virtual machines is scheduled and supervised through a supervisory module. Through real-time scheduling algorithms, performance loss caused by virtual machines competing for shared resources (mainly caches and external interfaces) is avoided, and processing delays are limited.
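The static memory partition mapping can be checked at configuration time. The following is a minimal sketch of such a check, assuming each virtual machine is given one contiguous physical range; `MemoryPartition`, `check_spatial_isolation`, and the addresses are hypothetical and not taken from the TST platform.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class MemoryPartition:
    vm: str
    base: int  # first physical address of the partition
    size: int  # length in bytes

    @property
    def end(self):
        return self.base + self.size  # exclusive upper bound


def check_spatial_isolation(partitions):
    """Return the pairs of VM names whose partitions overlap (empty if isolated)."""
    if any(p.size <= 0 for p in partitions):
        raise ValueError("every partition needs a positive size")
    conflicts = []
    ordered = sorted(partitions, key=lambda p: p.base)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.base >= first.end:
                break  # sorted by base: nothing later can overlap `first`
            conflicts.append((first.vm, second.vm))
    return conflicts


# Hypothetical layout: three VMs with back-to-back, non-overlapping partitions.
isolated = [
    MemoryPartition("zc_area_1", 0, 256 * MIB),
    MemoryPartition("zc_area_2", 256 * MIB, 256 * MIB),
    MemoryPartition("local_ats", 512 * MIB, 512 * MIB),
]
# Same layout with the ATS partition misconfigured to start inside zc_area_2.
broken = isolated[:2] + [MemoryPartition("local_ats", 500 * MIB, 512 * MIB)]

print(check_spatial_isolation(isolated))
print(check_spatial_isolation(broken))
```

Rejecting a configuration with any reported conflict before the virtual machines start is one way to make the spatial isolation a property of the static configuration rather than of run-time behavior.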

Secure Cloud

Although a single server system after virtualization is often referred to as cloud computing technology, it is not a true secure cloud technology without server clusters and the migratable deployment of safety applications.

Using server clusters, multiple instances of safety applications can be deployed on different physical servers. For example, to avoid erroneous outputs caused by common mode failures, SIL4 safety applications typically adopt 2oo2 or 2oo3 voting mechanisms. Under traditional architectures, dual-redundant or triple-redundant safety application instances run on different processor boards or hosts, with key hardware such as CPUs, storage, power supply, and clocks completely independent. By supporting server clusters, we can achieve the same level of hardware independence as traditional solutions. For example, in a 2oo2 application, one instance runs on server 1, while the other instance runs on server 2. These two instances communicate over the network for synchronization and voting, ensuring safe output.
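The voting step can be sketched as follows. This is a simplified illustration only: it compares the final outputs of the instances and omits cycle synchronization, safety coding of messages, and timeouts, all of which a real SIL4 voter needs. `SAFE_STATE` and the command strings are hypothetical.

```python
from collections import Counter

# Hypothetical marker: the caller must drive outputs to the restrictive state.
SAFE_STATE = None


def vote_2oo2(a, b):
    """Two-out-of-two: output only if both instances agree, otherwise fail safe."""
    return a if a == b else SAFE_STATE


def vote_2oo3(a, b, c):
    """Two-out-of-three: output the value that at least two instances agree on."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    return value if count >= 2 else SAFE_STATE


# Instance results would arrive over the network from different servers.
print(vote_2oo2("PROCEED", "PROCEED"))
print(vote_2oo2("PROCEED", "STOP"))
print(vote_2oo3("PROCEED", "STOP", "PROCEED"))
```

The difference between the two schemes shows up on a single disagreement: 2oo2 falls back to the safe state, while 2oo3 masks one faulty instance and keeps the output available.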

Figure 3 Secure Cloud

With server clusters, safety applications can achieve higher redundancy. Simply adding servers makes it possible to build an N×2oo2 system architecture. In addition, fault isolation and virtual machine migration allow safety applications to be transferred to other servers when a virtual machine fails or a server is shut down, which facilitates server maintenance and upgrades.
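One way to picture an N×2oo2 deployment with migration is as a placement problem with a single hard rule: the two instances of a 2oo2 pair never share a physical server. The sketch below is an assumption about how such a rule could be enforced, not a description of the TST platform; `place_pairs`, `migrate_from`, the server names, and the slot limit are all hypothetical.

```python
def place_pairs(pairs, servers, slots_per_server):
    """Place both instances of each 2oo2 pair on two different servers.

    Returns {instance_name: server}; instances are named "<pair>/A" and "<pair>/B".
    """
    load = {s: 0 for s in servers}
    placement = {}
    for pair in pairs:
        # The two least-loaded servers are distinct entries of `servers`.
        candidates = sorted(servers, key=lambda s: load[s])[:2]
        if len(candidates) < 2 or any(load[s] >= slots_per_server for s in candidates):
            raise RuntimeError(f"no two independent servers with free slots for {pair}")
        for channel, server in zip("AB", candidates):
            placement[f"{pair}/{channel}"] = server
            load[server] += 1
    return placement


def migrate_from(failed, placement, servers, slots_per_server):
    """Move instances off `failed` while keeping each pair on two different servers."""
    result = dict(placement)
    load = {s: 0 for s in servers if s != failed}
    for server in placement.values():
        if server != failed:
            load[server] += 1
    for inst, server in placement.items():
        if server != failed:
            continue
        pair, channel = inst.rsplit("/", 1)
        partner = result[f"{pair}/{'B' if channel == 'A' else 'A'}"]
        options = [s for s in load if s != partner and load[s] < slots_per_server]
        if not options:
            raise RuntimeError(f"cannot re-host {inst} apart from its partner")
        target = min(options, key=lambda s: load[s])
        result[inst] = target
        load[target] += 1
    return result


pairs = ["zc_1", "zc_2", "zc_3"]
servers = ["srv1", "srv2", "srv3"]
placement = place_pairs(pairs, servers, slots_per_server=4)
after_maintenance = migrate_from("srv2", placement, servers, slots_per_server=4)
print(after_maintenance)
```

With only two servers the migration step would raise an error, since the displaced instance could only land next to its partner; this is why the sketch needs at least three servers to take one out of service.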

Signal Systems Based on Secure Cloud

Based on secure cloud technology and real-time communication technologies, compact edge processors are deployed at the trackside or in stations, while most safety computing is concentrated on the secure cloud platform. This significantly reduces the number of dedicated safety processing platforms and avoids dependence on “bottleneck” technologies whose supply cannot be controlled. The platform also supports deploying safety and non-safety applications together, producing a “cloud for train control systems” that supports both new general-purpose software architectures and existing ones, which eases the migration of existing systems. The centralized, general-purpose IT architecture allows quick updates and iterations, and numerous tools are available for deployment, management, and maintenance. Server cluster costs are controllable: as a direct replacement for trackside embedded safety computers, a single server cluster can serve one line or several lines. Because software and hardware are sufficiently decoupled and server performance improves steadily, the total lifecycle cost will be lower than that of existing systems.

Figure 4 Secure Cloud Signal System

Conclusion

Urban rail has moved from a peak period of planning and construction to a stage of large-scale construction and refined improvement. Long-term investment has placed a burden on urban finances, and the industry now emphasizes the sustainability of urban rail and its economical, efficient operation and maintenance.

A CBTC signal system based on cloud computing offers miniaturized station and trackside equipment (onboard functions could in future also adopt cloud computing to miniaturize onboard equipment), readily available general-purpose servers, decoupling of software and hardware, support for different CBTC system software architectures (traditional CBTC, interoperable CBTC, FAO, TACS, etc.), and high-performance computing resources. These characteristics give it significant advantages over existing signal systems in equipment carbon footprint, energy-saving train operation, and system upgrades across the whole lifecycle, providing key technical conditions for the high-quality, sustainable development of urban rail.

Additional Notes

Differences Between Secure Cloud and General Cloud

1. Time-Determined Computation

In a secure cloud, safety logic computations must be completed within fixed time limits, so the secure cloud must provide real-time services that guarantee safety tasks are processed within fixed time frames. The tasks executed in the secure cloud are known in advance, and the maximum number of tasks it supports is also fixed, so the execution time of a task is not affected by how many other tasks are running at the same moment.
In general clouds, resources are allocated by priority, but because there are no functional safety requirements, worst-case task completion times are not fixed and depend on the other tasks loaded, so fluctuations in efficiency are inevitable.
2. Resource Allocation

For safety applications, running resources such as CPU and memory are fixedly allocated, and resources will only be reallocated to pre-set resource groups upon detection of hardware faults, such as CPU core faults or memory data faults. This allocation method facilitates online hardware supervision and also enables hardware isolation between safety task instances. It is also one of the conditions for achieving time-determined computations.
In general clouds, resource allocation is demand-driven and dynamic. Each application instance may execute on different physical CPUs and memory areas after startup, and due to load balancing and other reasons, may run on different resources at different times. Generally, application layers do not need to monitor the health status of CPUs and memory, and resource reallocation after faults is entirely left to the cloud platform. Delays caused by resource reallocation when tasks compete for resources are normal.
3. Functional Safety Services

Secure clouds uniquely provide functional safety services, including voting, online hardware supervision, and safety communication protocols. These services are not provided in general clouds.
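The first two differences can be illustrated together in a short sketch, under the simplifying assumption that the safety tasks on one core run in a fixed order once per cycle. The class `FixedCorePlan`, the task names, the timing figures, and the core numbers are hypothetical. The task set is frozen at configuration time and admitted only if its worst-case load fits in the cycle, and a core fault triggers a switch to a spare core chosen in advance rather than a dynamic search for free resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyTask:
    name: str
    wcet_ms: float  # assumed worst-case execution time per cycle


class FixedCorePlan:
    """Fixed task set on one core, with a pre-set spare core used only after a fault."""

    def __init__(self, core, spare_core, cycle_ms, tasks):
        total = sum(t.wcet_ms for t in tasks)
        if total > cycle_ms:
            raise ValueError(f"worst-case load {total} ms exceeds the {cycle_ms} ms cycle")
        self.core, self.spare_core = core, spare_core
        self.cycle_ms = cycle_ms
        self.tasks = tuple(tasks)  # frozen: nothing is added at run time

    def worst_case_finish_ms(self, name):
        """Latest completion time of `name` within a cycle, independent of other load."""
        elapsed = 0.0
        for task in self.tasks:
            elapsed += task.wcet_ms
            if task.name == name:
                return elapsed
        raise KeyError(name)

    def on_core_fault(self):
        """Switch to the pre-set spare core; if none is left, the caller must fail safe."""
        if self.spare_core is None:
            raise RuntimeError("no spare core configured: enter the safe state")
        self.core, self.spare_core = self.spare_core, None
        return self.core


plan = FixedCorePlan(
    core=2,
    spare_core=6,
    cycle_ms=100,
    tasks=[
        SafetyTask("read_inputs", 10),
        SafetyTask("area_logic", 55),
        SafetyTask("vote_and_output", 20),
    ],
)
print(plan.worst_case_finish_ms("area_logic"))
```

Because the bound is computed from the configured task set alone, it does not change with whatever else the cluster is running, which is the property a general cloud scheduler does not offer.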

Gao Xiang, Director of the Design Center, Shanghai Electric Thales Transportation Automation System Co., Ltd.
