This article is reprinted with authorization from the WeChat public account Arm Selected. It provides an architectural walkthrough of the Cortex-A53 caches.
1 A53 uses the classic big.LITTLE architecture
Below is an early, classic big.LITTLE architecture diagram.
Figure 1
Figure 2
2 A53’s cache configuration
L1 I-Cache
● Configurable: 8KB, 16KB, 32KB, or 64KB
● Cache line: 64 bytes
● 2-way set associative
● 128-bit read L2 memory interface
L1 D-Cache
● Configurable: 8KB, 16KB, 32KB, or 64KB
● Cache line: 64 bytes
● 4-way set associative
● 256-bit write L2 memory interface
● 128-bit read L2 memory interface
● 64-bit read from L1 to datapath
● 128-bit write from datapath to L1
L2 memory system
● Integrates the Snoop Control Unit (SCU), which connects up to 4 cores
● The SCU keeps duplicate copies of the L1 data cache tags
● The external interface of the L2 memory system can be ACE or CHI, with a 128-bit width
L2 cache
● Configurable: 128KB, 256KB, 512KB, 1MB, and 2MB.
● Cache line: 64 bytes
● Physically indexed, physically tagged (PIPT)
● 16-way set associative (the set counts implied by these configurations are worked out in the sketch below)
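For any of the configurations above, the number of sets follows from size = ways × sets × line size. Below is a minimal sketch of that arithmetic; the 32KB L1 and 512KB L2 sizes are just example choices from the configurable ranges, not fixed values.

```c
/* Illustrative only: derive the number of sets for example A53 cache
 * configurations, assuming size = ways * sets * line_size. */
#include <stdio.h>

static unsigned sets(unsigned size_bytes, unsigned ways, unsigned line_bytes)
{
    return size_bytes / (ways * line_bytes);
}

int main(void)
{
    printf("L1 I (32KB, 2-way,  64B line): %u sets\n", sets(32 * 1024, 2, 64));   /* 256 */
    printf("L1 D (32KB, 4-way,  64B line): %u sets\n", sets(32 * 1024, 4, 64));   /* 128 */
    printf("L2  (512KB, 16-way, 64B line): %u sets\n", sets(512 * 1024, 16, 64)); /* 512 */
    return 0;
}
```

With 64-byte lines, the low 6 address bits select a byte within the line, the next log2(sets) bits select the set, and the remaining upper bits form the tag.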
L1 data cache TAG
The A53 L1 data cache uses the MOESI protocol. As shown below, the tag in the L1 data cache contains the MOESI state bits.
Figure 3
MOESI state
Figure 4
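For reference, the five MOESI states carry the usual meanings summarized in the sketch below. This is illustrative only; the actual bit encoding in the A53 tag RAM is implementation specific (see the figure above).

```c
/* Illustrative only: the five MOESI states tracked per L1 D-cache line.
 * The enum ordering here is arbitrary, not the hardware encoding. */
typedef enum {
    MOESI_INVALID,   /* I: the line holds no valid data                        */
    MOESI_SHARED,    /* S: clean copy; other caches may also hold the line     */
    MOESI_EXCLUSIVE, /* E: clean copy; no other cache holds the line           */
    MOESI_OWNED,     /* O: dirty copy, shared; this cache must supply the data */
    MOESI_MODIFIED,  /* M: dirty copy, exclusive; memory is stale              */
} moesi_state_t;
```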
L1 Instruction cache TAG
The L1 instruction cache is read-only, so hardware does not need to maintain coherency between the instruction caches of different cores, and it therefore does not use the MOESI protocol. Below is the L1 instruction cache tag; its flags are minimal, with no MOESI state bits.
Figure 5
3 Cache hierarchy
● L1 cache is private to the core.
● L2 cache is shared within the cluster.
Figure 6
4 L2 memory system introduction
In the big.LITTLE architecture, each cluster contains an SCU (Snoop Control Unit), whose main job is to maintain coherency between the cores' L1 data caches (using the MESI protocol or a variant such as MOESI).
Figure 7
In addition to the L2 cache, the L2 memory system also contains the L1 duplicate tag RAM (a copy of the L1 data cache tags).
Figure 8
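A simplified model of how the duplicate tags are used is sketched below. This is illustrative only, not the actual SCU design: the idea is that the SCU can look up a physical address in its copies of each core's L1 data cache tags and snoop only the cores that might hold the line. The 4-way, 128-set geometry assumes a 32KB L1 D-cache with 64-byte lines.

```c
/* Toy model (not the real SCU): duplicate L1 D-cache tags per core let the
 * SCU decide which cores need to be snooped without disturbing the cores. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CORES 4
#define WAYS  4
#define SETS  128                       /* 32KB / (4 ways * 64B line) */

struct dup_tag {
    bool     valid;
    uint64_t tag;                       /* address bits above the set index */
};

static struct dup_tag dup_tags[CORES][SETS][WAYS];

/* Return a bitmask of cores whose duplicate tags hit for this address. */
static unsigned cores_to_snoop(uint64_t paddr)
{
    uint64_t line = paddr >> 6;         /* drop the 64B line offset */
    unsigned set  = line % SETS;        /* set index bits           */
    uint64_t tag  = line / SETS;        /* remaining tag bits       */
    unsigned mask = 0;

    for (unsigned c = 0; c < CORES; c++)
        for (unsigned w = 0; w < WAYS; w++)
            if (dup_tags[c][set][w].valid && dup_tags[c][set][w].tag == tag)
                mask |= 1u << c;
    return mask;
}

int main(void)
{
    /* Pretend core 2 holds physical address 0x80001040 in its L1 D-cache. */
    uint64_t paddr = 0x80001040ull;
    uint64_t line  = paddr >> 6;
    dup_tags[2][line % SETS][0] = (struct dup_tag){ .valid = true, .tag = line / SETS };

    printf("snoop mask for 0x%llx: 0x%x\n",
           (unsigned long long)paddr, cores_to_snoop(paddr));   /* prints 0x4 */
    return 0;
}
```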
5 Cache coherency between multiple clusters
The interface between a cluster and the rest of the system can be ACE or CHI (ACE is the common choice today, though the trend may shift toward CHI).
Figure 9
● If ACE is used, coherency between clusters is maintained by a CCI interconnect over ACE.
● If CHI is used, coherency between clusters is maintained by a CMN interconnect over CHI.
Figure 10
6 Introduction to CCI (Taking CCI-550 as an example)
CCI-550 includes an inclusive snoop filter that records which cache lines are held in the ACE masters' caches.
The snoop filter can respond directly to coherent transactions when they miss, and snoops only the appropriate master when they hit. Snoop filter entries are maintained by observing transactions from the ACE masters to determine when entries must be allocated and deallocated.
The snoop filter can therefore service many coherency requests without broadcasting to all ACE interfaces. For example, if an address is not in any cache, the snoop filter responds with a miss and directs the request to memory. If the address is in a processor cache, the request is treated as a hit and directed to the ACE port whose cache contains that address.
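The behavior described above can be modeled with a toy lookup structure, sketched below. This is only a conceptual model, not the CCI-550 microarchitecture: allocation and deallocation by transaction observation is omitted, and the direct-mapped filter size is an arbitrary choice for the example.

```c
/* Toy model of an inclusive snoop filter: per cache line, record which ACE
 * masters may hold it. A miss goes straight to memory; a hit snoops only
 * the recorded sharers instead of broadcasting to every ACE port. */
#include <stdint.h>
#include <stdio.h>

#define FILTER_ENTRIES 1024             /* arbitrary size for the example */

struct sf_entry {
    uint64_t line_addr;                 /* cache-line address (paddr >> 6) */
    unsigned sharers;                   /* bitmask of ACE master ports     */
    int      valid;
};

static struct sf_entry filter[FILTER_ENTRIES];

static void handle_coherent_read(uint64_t paddr)
{
    uint64_t line = paddr >> 6;
    unsigned idx  = line % FILTER_ENTRIES;

    if (filter[idx].valid && filter[idx].line_addr == line) {
        /* Hit: snoop only the ports that may hold the line. */
        printf("snoop masters 0x%x for line 0x%llx\n",
               filter[idx].sharers, (unsigned long long)line);
    } else {
        /* Miss: no cache holds the line; fetch it from memory, no broadcast. */
        printf("fetch line 0x%llx from memory\n", (unsigned long long)line);
    }
}

int main(void)
{
    /* Record that masters 0 and 1 may hold the line for address 0x40000000. */
    uint64_t cached = 0x40000000ull;
    uint64_t line   = cached >> 6;
    filter[line % FILTER_ENTRIES] = (struct sf_entry){ line, 0x3, 1 };

    handle_coherent_read(cached);        /* hit: snoop masters 0x3 */
    handle_coherent_read(0x50000000ull); /* miss: go to memory     */
    return 0;
}
```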
Figure 11
Figure 12
7 Classic example framework
Figure 13