Complete Decoding of iPhone 6 A8 Processor! Perfect Apple Style

On the surface, Apple is a phone maker, but in recent years it has quietly become a serious player in mobile processors, especially on the CPU side: the A6 was the first to use Apple's self-designed Swift architecture, and the A7 was the first to use Cyclone, a 64-bit ARMv8 architecture. What surprises does the A8 bring?

Unfortunately, if you are expecting another revolution, you may be disappointed. Both the CPU and the GPU of the A8 are steady, conservative improvements.

A8

Photo of A8 core and basic layout

[New Process, Memory]

The biggest highlight of the A8 is actually its manufacturing process, which marks several firsts:

It is one of the first mobile processors built on a 20nm process, and the most significant of them.

It is the first time Apple has used the most advanced semiconductor process available (it adopted 28nm only after others had been using it for over a year).

It is also the first time Apple has manufactured outside Samsung's fabs, and therefore its first time at TSMC, where it quickly took up most of the production capacity.

TSMC claims that, depending on the chip design, the 20nm process can raise chip speed by 30%, raise integration density by 90%, or cut power consumption by 25% compared with 28nm. The A8 has twice as many transistors as the A7 on a die that is 13% smaller; as for performance, more on that later.
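
The transistor and area figures above imply a density gain that can be set against TSMC's claimed 90% increase. The following is a minimal sketch in Python that uses only the ratios quoted in the text; the function name is illustrative, and because "doubled" is a rounded figure the result is only approximate.

```python
def density_gain(transistor_ratio: float, area_ratio: float) -> float:
    """Return relative transistor density (new chip / old chip)."""
    return transistor_ratio / area_ratio


# Figures from the text: transistor count doubled, die area down 13%.
a8_vs_a7 = density_gain(transistor_ratio=2.0, area_ratio=1.0 - 0.13)

# TSMC's claimed integration increase for 20nm over 28nm (90%).
tsmc_claim = 1.90

print(f"A8 density vs A7: {a8_vs_a7:.2f}x (TSMC claim: {tsmc_claim:.2f}x)")
```

Under these rounded inputs the implied density gain comes out somewhat above TSMC's headline number, which is plausible given that different block types (logic, SRAM, analog) scale differently.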

The A8's memory subsystem is essentially unchanged, with only minor adjustments. The SRAM cache is still present and still serves both the CPU and the GPU (effectively a level 3 cache), with a capacity of 4MB, and the memory controller still supports LPDDR3-1600.

Testing shows that the A8's memory bandwidth is about 2-9% higher than the A7's, a small gain that points to further optimization.

Interestingly, memory latency has decreased by about 20ns in the 1-4MB SRAM cache area and the 6+MB memory area, but how this was achieved is still under investigation.

Besides the CPU, GPU, cache, and memory, the A8 contains many fixed-function blocks (or, at most, DSPs with limited flexibility), including the audio controller, USB controller, video codecs, flash memory controller, and camera ISP. Where these blocks sit on the die and what exactly they can do is still unclear, but we do know that the A8 supports H.265 video encoding, although it is currently used only for FaceTime video calls.

Comparison of A8 and A7

History of Apple’s self-designed processors

[CPU Architecture: No Revolution, Still Surprising]

More than two years on, we still know very little about the A6 and A7 CPU architectures; Apple has never disclosed any technical details. The same goes for the A8, and this time we do not even know the architecture's codename (the previous two generations were Swift and Cyclone).

From what we know so far, the A8 CPU brings no revolutionary change of the kind seen in the A6 and A7; it is an enhanced Cyclone, which is not necessarily a bad thing. The A7 architecture is already excellent: a wide, high-IPC design with very low latency that delivers high performance at very low clock speeds. (Intel's Core follows a similar philosophy.)

The A7 runs at only 1.3GHz and the A8 at just 1.4GHz. Because the design gets its performance from IPC rather than clock speed, Apple does not need to push frequency to reach its performance target, and power consumption stays low, which is obviously ideal for mobile devices.

In other words, Apple achieved last year what other chip makers may only manage next year.

Although we no longer need to dig into architectural details, we still want to know what has changed with the A8.

Estimates put the CPU portion of the A8 at about 12.2 square millimeters, 29% smaller than the A7's 17.1 square millimeters, so even after adding transistors Apple was able to shrink the CPU thanks to the new process.

In the test data, the A8 and A7 CPUs do behave like the same chip in many low-level tests, with only slight differences in floating-point addition and integer multiplication latency, and possibly slight changes in buffering and branch prediction.

The A7 has a single integer multiplication unit with a latency of 4 cycles, while the A8 needs only 3 cycles. More surprisingly, integer multiplication throughput has more than doubled, which suggests there are now 2 integer multiplication units.

Floating-point addition has also improved, though less dramatically: latency drops from 5 cycles to 4, and the design still appears to have 3 floating-point ALUs.
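
The inference from "throughput more than doubled" to "two multiply units" can be illustrated with a simple cycle-count model. This is a sketch under stated assumptions, not a description of Apple's actual pipeline: it assumes fully pipelined units that each accept one new operation per cycle, and the function names are hypothetical.

```python
import math


def cycles_dependent(n_ops: int, latency: int) -> int:
    """Cycles for a chain where each op needs the previous result.

    Only latency matters here; extra units cannot help.
    """
    return n_ops * latency


def cycles_independent(n_ops: int, latency: int, units: int) -> int:
    """Cycles for independent ops on fully pipelined units (assumption).

    Each unit starts one op per cycle; the final op still needs
    `latency` cycles to finish.
    """
    issue_cycles = math.ceil(n_ops / units)
    return issue_cycles + latency - 1


N = 1000
a7_like = cycles_independent(N, latency=4, units=1)      # one unit, 4 cycles
faster_only = cycles_independent(N, latency=3, units=1)  # latency cut alone
a8_like = cycles_independent(N, latency=3, units=2)      # latency cut + 2 units

print(a7_like, faster_only, a8_like)
print(cycles_dependent(N, 4), cycles_dependent(N, 3))
```

In this model, cutting latency from 4 to 3 cycles barely changes the time for a long run of independent multiplies, while a second unit roughly halves it. That is why a doubling in measured throughput points to an extra unit rather than to the latency improvement.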

Overall, the changes between A8 and A7 resemble Intel’s Tick-Tock model, where the process is upgraded while the architecture is only slightly adjusted and enhanced.

Next, let's look at some higher-level tests. Here we use SPEC CPU2000, a benchmark suite designed years ago for PC systems that is now a good fit for mobile processors.

The A8's frequency is 100MHz higher, or about 7.7%. Even with this factor excluded, the A8 is still significantly faster than the A7, with improvements in all 12 sub-tests, especially MCF, GCC, PerlBMK, and GAP, and a peak gain of 55%.
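
Excluding the frequency factor amounts to dividing the raw score ratio by the clock ratio. The sketch below shows that normalization in Python, using the 1.3GHz and 1.4GHz clocks from the text; the benchmark scores in the example are hypothetical placeholders, not measured results.

```python
A7_MHZ = 1300  # A7 CPU clock, from the text
A8_MHZ = 1400  # A8 CPU clock, from the text


def per_clock_gain(a7_score: float, a8_score: float,
                   a7_mhz: float = A7_MHZ, a8_mhz: float = A8_MHZ) -> float:
    """Fractional gain left after removing the clock-speed increase."""
    raw_ratio = a8_score / a7_score
    clock_ratio = a8_mhz / a7_mhz
    return raw_ratio / clock_ratio - 1.0


clock_uplift = A8_MHZ / A7_MHZ - 1.0
print(f"Clock uplift: {clock_uplift:.1%}")

# Hypothetical scores for illustration only: a 20% raw improvement.
example = per_clock_gain(a7_score=1000.0, a8_score=1200.0)
print(f"Per-clock gain for a 20% raw gain: {example:.1%}")
```

A sub-test whose raw gain is below the clock uplift would show a negative per-clock gain, which is how a result can "regress" once frequency is factored out.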

Now let’s look at Geekbench 3, which includes both integer and floating-point sections.

Improvements are again widespread and respectable, peaking at 37%, but the encryption and decryption tests lag behind and occasionally even regress once the frequency increase is excluded.

The floating-point section performed better, with relatively uniform improvements.

Overall, the A8 CPU does not deliver another revolution, but with architectural optimizations, lower memory latency, and a higher clock it still makes significant progress and fulfills its basic task of being an "upgraded A7".

Next year a wave of 64-bit ARM processors will arrive, including ARM's stock Cortex-A57 design and vendors' own custom cores. Apple will face serious competition, but the A8 should still hold its own.

Quad-core, hexa-core, and octa-core processors are becoming mainstream while Apple remains at two cores. Still, Apple's single-threaded IPC has always been excellent, and many workloads do not scale well across multiple cores, so the A8 has little to worry about.

Of course, what surprises the next generation A9 can bring is even more worth looking forward to.

[GPU Architecture: Six Cores? Not Necessary]

Cook stated that the A8's GPU performance has increased by up to 50%, which naturally led us to believe it had moved from four cores to six, using Imagination's top-end PowerVR GX6650.

However, inspection of the die found only four GPU cores, ruling out a six-core design.

Combined with other information, especially Apple's Metal programming guide, this confirms that the A8 uses a four-core PowerVR GX6450, an upgraded version of the A7's G6430 based on the latest PowerVR Series6XT architecture, which was announced at CES earlier this year.

Series6XT is an enhanced version of the Series6 architecture introduced in 2012 (to which the G6430 belongs). It focuses on architectural adjustments and optimizations that raise performance and add features, such as support for the next-generation texture compression technology ASTC (Adaptive Scalable Texture Compression).

ASTC comes from the Khronos Group, the body behind standards such as OpenGL and OpenGL ES. It offers better texture compression and finer quality control, and it is intended as a universal format that any vendor's GPU can support.

Apple has always used PowerVR GPUs and has consistently supported PVRTC and PVRTC2; adding ASTC can further improve game graphics and performance.

The Series6XT architecture also has new power management technologies that can reduce power consumption during standby and light-load conditions. For example, the “PowerGearing G6XT” gating technology can individually switch each GPU core (shader cluster/USC) on and off, which naturally extends the standby time of the phone.

Series6XT improves overall performance through a series of low-level optimizations. Imagination officially claims gains of up to 50%, which matches Apple's marketing figure, but it has not disclosed how this is achieved.

We only know that the cores (shader clusters) themselves have been improved: they keep a 16-wide SIMD structure, but each pipeline gains an extra set of medium/half-precision FP16 ALUs, going from 2×3 to 2×4, a theoretical 33% increase in FP16 performance.

FP16 operations save bandwidth and power compared to FP32, but require careful programming to utilize effectively; otherwise, performance improvements will be limited.

The FP32 side is unchanged: each pipeline still has two FP32 ALUs, for a maximum of four FP32 floating-point operations per pipeline per clock cycle. Across four clusters of 16 pipelines each, that comes to 128 MAC (multiply-add) operations per clock.
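
The ALU arithmetic in the preceding paragraphs can be reproduced in a few lines. This Python sketch uses only the counts given in the text (four clusters, 16-wide SIMD, two FP32 ALUs per pipeline, FP16 going from 2×3 to 2×4). Counting one multiply-add as two floating-point operations is the usual convention and is assumed here. The GPU clock is unknown, so `peak_fp32_gflops` is a hypothetical helper that takes the clock as a parameter rather than asserting a value.

```python
# Counts taken from the text.
CLUSTERS = 4                 # GPU cores (shader clusters) in the GX6450
PIPELINES_PER_CLUSTER = 16   # 16-wide SIMD
FP32_ALUS_PER_PIPELINE = 2   # unchanged from Series6

# One multiply-add (MAC) per FP32 ALU per clock.
fp32_macs_per_clock = CLUSTERS * PIPELINES_PER_CLUSTER * FP32_ALUS_PER_PIPELINE

# Assumed convention: one multiply-add counts as two floating-point operations.
fp32_flops_per_clock = fp32_macs_per_clock * 2

# FP16 operations per pipeline per clock: 2x3 (Series6) vs 2x4 (Series6XT).
fp16_series6 = 2 * 3
fp16_series6xt = 2 * 4
fp16_gain = fp16_series6xt / fp16_series6 - 1.0


def peak_fp32_gflops(clock_mhz: float) -> float:
    """Theoretical peak FP32 GFLOPS at a given clock (A8 clock is unknown)."""
    return fp32_flops_per_clock * clock_mhz / 1000.0


print(fp32_macs_per_clock, fp32_flops_per_clock, f"{fp16_gain:.0%}")
```

Because the real GPU frequency has not been measured, any absolute GFLOPS figure derived this way is only as good as the clock value supplied.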

Front-end and back-end geometry processing and texture fill have also improved, but the details are still unclear.

Finally, regarding GPU computing performance, iOS still does not provide good support in this area, and there is no corresponding version of OpenCL. iOS 8 has added the Metal API, which can be used for both graphics and computing, but its effectiveness remains to be observed.

The GPU frequency is completely unknown and not easy to test.

Overall, the GX6450 can deliver a 50% performance increase in workloads that are optimized for it, while others may see only 15-20%; a typical gain is about 30-35%.

Why Still Four Cores?

In fact, Imagination already offers the more powerful six-core G6630 and GX6650, which would better serve larger screens like those of the iPhone 6 and iPhone 6 Plus, and the new process would help keep transistor count and die area under control. So why stick with four cores?

This again reflects Apple's design principle: never chase new things blindly; if it is sufficient, it is good enough.

The A8 CPU is only an evolution, and the GPU has no need to rush into a revolution either, especially since the G6430 is already very powerful: it handles the iPad Air's 2048×1536 screen without trouble, so the iPhone 6 Plus's 1080p is hardly a challenge. The GX6450's own improvements, together with a frequency increase (if there is one), are just right.

The A7's GPU occupied 22.1 square millimeters, while the A8's has shrunk to 19.1 square millimeters. That looks good, but remember that on the 20nm process the same design could in theory have shrunk to 11.1 square millimeters. The A8 GPU has therefore been enhanced considerably, notably in the two shared texture units clearly visible on the die, which will have a major effect on texture performance.
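
The area figures quoted for the CPU and GPU can be compared directly. The following Python sketch uses only the numbers from the text (17.1 to 12.2 square millimeters for the CPU; 22.1 to 19.1 for the GPU, against a theoretical 11.1); the helper name is illustrative.

```python
def shrink(old_mm2: float, new_mm2: float) -> float:
    """Fractional area reduction going from old to new."""
    return 1.0 - new_mm2 / old_mm2


# Area figures from the text, in square millimeters.
A7_CPU, A8_CPU = 17.1, 12.2
A7_GPU, A8_GPU = 22.1, 19.1
A8_GPU_IDEAL = 11.1  # theoretical size of the A7 GPU design on 20nm

cpu_shrink = shrink(A7_CPU, A8_CPU)
gpu_shrink = shrink(A7_GPU, A8_GPU)
gpu_ideal_shrink = shrink(A7_GPU, A8_GPU_IDEAL)

# How much larger the real A8 GPU is than a pure die shrink would be.
gpu_growth_vs_ideal = A8_GPU / A8_GPU_IDEAL

print(f"CPU shrink: {cpu_shrink:.1%}")
print(f"GPU shrink: {gpu_shrink:.1%} (ideal: {gpu_ideal_shrink:.1%})")
print(f"A8 GPU vs ideal shrink: {gpu_growth_vs_ideal:.2f}x")
```

The GPU shrank far less than the CPU did, and it is well above the size of a pure die shrink, which is consistent with the article's reading that extra hardware was added to it.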

In conclusion, we are looking forward to next year’s A9: 16nm process, new CPU architecture, six-core GPU…