Author | Jiang Baoshang
Editor | Qing Mu

This Double Eleven, Apple released a new line of Macs built around its first self-developed ARM-architecture chip, the M1. Apple claims this 5nm SoC (System on Chip) delivers a major jump in performance.

For example, the new MacBook Air is said to offer up to 3.5x faster CPU performance, up to 5x faster GPU performance, up to 9x faster machine learning, and twice the SSD performance of the previous generation. The new MacBook Pro claims up to 2.8x faster CPU performance, 5x faster GPU performance, and 11x faster machine learning. The new Mac Mini claims a 60% improvement in energy efficiency, 3x faster CPU performance, and 6x faster graphics.

How does it hold up in real-world use? Many content creators have published reviews covering gaming, audio quality, video editing, and so on, but reviews focused on machine learning remain scarce. Can machine learning really speed up severalfold, as Apple claimed at the launch event?

On Medium, blogger Daniel Bourke published a post evaluating the Apple M1 chip from the perspective of machine learning training. Bourke ran two main sets of experiments: CreateML, and TensorFlow code on macOS. Both indicate that M1-equipped machines outpace "older" Intel-based Macs in training speed. Another blogger went further and showed that, on one small benchmark, the M1 trained 14% faster than an RTX 2080 Ti GPU.
Experiment 1: CreateML (Air Better than Pro)
Before running the CreateML training, the author noted that he had never used CreateML before; part of the goal was to see how capable Apple's own machine learning platform really is.

[Figure: CreateML training interface]

The settings used on each Mac in this test were as follows:
Problem: Multi-class image classification
Model: CreateML Image Classification (Apple does not disclose the underlying network architecture; the author guesses it is a ResNet)
Data Size: 7500 training images, 2500 testing images
Maximum Iterations: 25
Data Augmentation Methods: Flipping, Rotation
The evaluation results are shown above: the fanless MacBook Air performed best, with its 7-GPU-core M1 outpacing the 8-GPU-core M1 in the MacBook Pro. Meanwhile, the 16-inch Intel MacBook Pro failed before training finished.

It appears that Apple has optimized CreateML for the M1 chip: despite having a dedicated GPU, the Intel-powered MacBook Pro could not complete the experiment.
Experiment 2: TensorFlow macOS
At the M1 launch event in November, Apple claimed the new self-developed chip is faster than previous generations, specifically mentioning deep learning frameworks such as TensorFlow. The author then found training results for machine learning models on M1 and Intel Macs in blog posts published by the TensorFlow team and Apple's machine learning team.

It turns out Apple recently released tensorflow_macos, a fork that lets developers run TensorFlow natively on Macs. The author managed to install this fork into a Python 3.8 environment (without the 8-10 hours of troubleshooting he had braced for) and set up the following three small experiments.

The first experiment was a basic convolutional neural network (CNN). Specifically, the author replicated the TinyVGG architecture from the CNN Explainer website and used a dataset similar to the one in the CreateML test; a sketch of the model configuration follows the settings below.
Problem: Multi-class image classification
Model: TinyVGG
Data: 7500 training images, 2500 testing images
Number of Classes: 10
Number of Epochs: 5
Batch Size: 32
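The original post shows the model configuration only as a screenshot. A minimal sketch of a TinyVGG-style model in tf.keras, assuming 224x224 RGB inputs, 10 classes, and hypothetical train_ds/test_ds data pipelines, might look like this:

```python
import tensorflow as tf

# TinyVGG-style architecture as described on the CNN Explainer site:
# two (Conv-Conv-MaxPool) blocks with 10 filters each, then a dense classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(10, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 food classes
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

# train_ds / test_ds are hypothetical tf.data pipelines built from the image
# folders and already batched at 32:
# model.fit(train_ds, epochs=5, validation_data=test_ds)
```

The exact layer count and hyperparameters in the author's script may differ; the point is that the same tf.keras code runs unchanged on tensorflow_macos and on stock TensorFlow.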
Note: CNN Explainer website: https://poloclub.github.io/cnn-explainer/

The second experiment used EfficientNetB0 for transfer learning. Since building a model from scratch is cumbersome, transfer learning instead takes an existing architecture and trains it on your own data, or, as here, takes a pre-trained architecture such as EfficientNet and fine-tunes it on the specific dataset (a sketch of this setup follows the settings below).
Problem: Multi-class image classification
Model: Headless EfficientNetB0
Data: 750 training images, 625 testing images (2,500 × 0.25, via the validation_steps parameter)
Number of Classes: 10 (from Food101 dataset)
Number of Epochs: 5
Batch Size: 4 (the M1's memory could not handle larger batch sizes; the author tried 32, 16, and 8, all of which failed)
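The post does not reproduce the code for this experiment either. A minimal sketch of a headless-EfficientNetB0 transfer-learning setup in tf.keras, again assuming 224x224 RGB inputs and hypothetical train_ds/test_ds pipelines, could look like this:

```python
import tensorflow as tf

# Headless (include_top=False) EfficientNetB0 backbone pre-trained on ImageNet.
base = tf.keras.applications.EfficientNetB0(include_top=False,
                                            weights="imagenet",
                                            input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone; it can be unfrozen later for fine-tuning

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 Food101 classes
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

# train_ds / test_ds are hypothetical tf.data pipelines batched at 4,
# mirroring the batch size the author reports for the M1 machines:
# model.fit(train_ds, epochs=5, validation_data=test_ds)
```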
The third experiment is one the author found while browsing the issues on Apple's tensorflow_macos GitHub repository. The thread included benchmarks run on different machines, so he decided to add it to his tests (a LeNet/MNIST sketch follows the settings below).
Problem: Multi-class image classification
Model: LeNet
Data: 60,000 training images, 10,000 testing images (MNIST)
Number of Classes: 10
Number of Epochs: 5
Batch Size: 32
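The benchmark script from the GitHub issue is not reproduced in the article. A minimal LeNet-5-style sketch on MNIST in tf.keras, matching the settings above (the exact layers in the original benchmark may differ), could look like this:

```python
import tensorflow as tf

# MNIST ships with tf.keras: 60,000 training and 10,000 test images (28x28 grayscale).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# Classic LeNet-5 layer stack.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation="tanh", padding="same",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.AveragePooling2D(),
    tf.keras.layers.Conv2D(16, 5, activation="tanh"),
    tf.keras.layers.AveragePooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="tanh"),
    tf.keras.layers.Dense(84, activation="tanh"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, batch_size=32,
          validation_data=(x_test, y_test))
```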
TensorFlow Code Evaluation Results
In addition to running the three experiments above on the Macs, the author also ran them on Google Colab with GPU support (his usual strategy: experiment on Google Colab, then scale up to larger cloud servers when needed).

As the figure above shows, Google Colab performed best, but the M1 MacBooks did not lag far behind, and they allow experiments to run locally without a constant connection to Colab, which is convenient. Note that because data loading makes the first epoch the slowest, the comparison is based on training times from the second epoch onward. Also, the Google Colab GPU instance runs plain TensorFlow rather than tensorflow_macos.

It is worth noting that in the basic CNN and transfer learning experiments, the M1 machines clearly outperformed the Intel machine, while in the tensorflow_macos benchmark (LeNet on MNIST), Intel regained some ground. The author believes this is clearly a result of the GPU being used during that training run.
M1 Mac Mini Machine Learning Evaluation
The results above cover the MacBook Air and MacBook Pro; how does the new M1 Mac Mini fare? In another Medium post, Andrew A. Borkowski found that on one task the M1 Mac Mini trained even faster than an RTX 2080 Ti.

Specifically, he installed tensorflow_macos on the Mac Mini following the instructions on Apple's GitHub repository and ran a classification task on the Fashion-MNIST dataset. Training and testing took 6.70 seconds, 14% faster than the RTX 2080 Ti GPU.

Note: the RTX 2080 Ti machine ran Linux with an Intel Core i7-9700K CPU, 32GB of RAM, and a 1TB SSD. The Apple machine ran macOS Big Sur with a 512GB SSD and 8GB of unified memory; its M1 chip has 8 CPU cores, 8 GPU cores, and a 16-core Neural Engine.

However, on a larger model and dataset, the M1 Mac Mini took 2,286.16 seconds, more than five times longer than the Linux machine with the Nvidia RTX 2080 Ti. According to the Mac's Activity Monitor, CPU usage was minimal and the GPU was not used at all.

To summarize: since TensorFlow for the M1 is still in alpha, there is hope that future versions will use the chip's GPU and Neural Engine cores to accelerate machine learning training.

Reference: https://towardsdatascience.com/apples-new-m1-chip-is-a-machine-learning-beast-70ca8bfa6203