In the previous article (Audio on Smartwatches (Part 4): Voice Calls), we discussed voice calls. This article focuses on recording. The recording function saves the captured audio as a file. Two file formats are supported: WAV, which holds PCM data sampled at 16 kHz, and AMR, which holds AMR-NB data sampled at 8 kHz. The WAV format is simple: a 44-byte file header followed by PCM data, as illustrated in the following figure. Many explanations of it are available online, so we won't go into detail here.
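As a quick illustration of the 44-byte WAV header mentioned above, here is a minimal Python sketch that builds one with the standard `struct` module. The function name `build_wav_header` and the defaults (mono, 16-bit samples) are assumptions made for the example; the article only states the 16 kHz sampling rate.

```python
import struct

def build_wav_header(num_samples, sample_rate=16000, channels=1, bits_per_sample=16):
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM.

    Mono and 16-bit are assumptions for this sketch, not taken from the article.
    """
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    data_size = num_samples * block_align           # size of the PCM payload
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",           # RIFF chunk
        b"fmt ", 16, 1, channels, sample_rate,      # fmt chunk, format tag 1 = PCM
        byte_rate, block_align, bits_per_sample,
        b"data", data_size,                         # data chunk header
    )

# One second of 16 kHz mono audio: 44-byte header, 32000-byte payload
header = build_wav_header(num_samples=16000)
```

The PCM samples are then appended directly after these 44 bytes.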
The AMR file format is more complex: it starts with a file header, which is followed by the audio data stored frame by frame, as illustrated in the following figure:

The file header occupies 6 bytes and contains the string "#!AMR\n", which is "23 21 41 4D 52 0A" in hexadecimal. The following image shows an AMR file opened in text format, where the first 6 bytes of the file header can be seen.
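Checking for this header is a simple byte comparison. The sketch below is only an illustration; the names `AMR_NB_MAGIC` and `is_amr_nb_file` are made up for the example.

```python
# The 6-byte AMR-NB file header: "#!AMR" followed by a line feed (0x0A)
AMR_NB_MAGIC = b"#!AMR\n"

def is_amr_nb_file(data: bytes) -> bool:
    """Return True if the buffer starts with the AMR-NB file header."""
    return data[:len(AMR_NB_MAGIC)] == AMR_NB_MAGIC

# 0x3C is the frame header byte of a good 12.2 kbps frame
sample = bytes.fromhex("2321414D520A") + b"\x3c"
print(is_amr_nb_file(sample))   # True
```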

From the figure, we can see that each audio frame is also divided into two parts: the frame header and the frame content. The frame header occupies one byte (8 bits), and the meaning of each bit is indicated in the figure. P is a padding bit that is always 0. FT (frame type) occupies 4 bits and ranges from binary 0000 to 0111, corresponding to the eight bit rates of AMR-NB. Q indicates quality: 1 marks a good frame and 0 marks a bad frame. The frame content contains 20 milliseconds of AMR-NB encoded bitstream data, so its size is determined by the AMR-NB bit rate. For example, at a bit rate of 12.2 kbps (12200 bits per second), each frame's bitstream is 12200/50 = 244 bits, since 20 ms frames give 50 frames per second. 244/8 = 30.5 bytes, which is rounded up to 31 bytes. Adding 1 byte for the frame header, the audio frame size is 32 bytes. The table below lists the frame header content and frame sizes for the various bit rates.
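The same arithmetic can be applied to all eight AMR-NB bit rates. The following Python sketch is for illustration only. It assumes the bit layout of the AMR storage format in RFC 4867 (FT in bits 6 to 3, Q in bit 2, padding bits elsewhere), which should match the figure, and the function names are hypothetical.

```python
# AMR-NB bit rates in bits per second, indexed by FT (0..7)
AMR_NB_BITRATES = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
FRAMES_PER_SECOND = 50  # one frame every 20 ms

def frame_size_bytes(ft: int) -> int:
    """Size of one stored frame: 1 header byte + payload rounded up to whole bytes."""
    payload_bits = AMR_NB_BITRATES[ft] // FRAMES_PER_SECOND
    return 1 + (payload_bits + 7) // 8

def make_frame_header(ft: int, q: int = 1) -> int:
    """Pack FT and Q into the header byte; all padding bits stay 0."""
    return (ft << 3) | (q << 2)

def parse_frame_header(header: int):
    """Return (FT, Q) extracted from a frame header byte."""
    return (header >> 3) & 0x0F, (header >> 2) & 0x01

for ft, rate in enumerate(AMR_NB_BITRATES):
    print(f"{rate / 1000:5.2f} kbps  header=0x{make_frame_header(ft):02X}  "
          f"frame={frame_size_bytes(ft)} bytes")
```

For 12.2 kbps this gives a header byte of 0x3C (with Q = 1) and a 32-byte frame, which agrees with the worked example above.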

Recording is mainly divided into ordinary recording (speaking into the MIC and saving the captured audio as a file) and phone recording (saving the audio during a call, which can be further divided into recording only the other party’s voice or recording both parties’ voices together). There are significant differences in processing between ordinary recording and phone recording. First, let’s look at ordinary recording, as illustrated in the following figure:

From the above figure, we can see that 48 kHz PCM data is obtained from the driver at fixed intervals. If it is to be saved in WAV format, it is resampled to 16 kHz. If it is to be saved in AMR format, it must not only be resampled to 8 kHz but also be AMR-NB encoded to obtain the bitstream. Finally, the audio data is sent from the ADSP to the AP via IPC, and the AP saves it as the corresponding file.
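The per-block decision described above can be sketched as follows. This is a rough illustration and not the actual ADSP code: `decimate` uses simple block averaging as a stand-in for a real resampler (a production design would use a proper low-pass filter), and `amr_encode` is a hypothetical callback standing for whatever AMR-NB encoder the platform provides.

```python
def decimate(samples, factor):
    """Average each group of `factor` samples into one output sample.

    A crude stand-in for a resampler, used here only to show the rate change.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) // factor for i in range(0, usable, factor)]

def process_capture_block(pcm_48k, fmt, amr_encode=None):
    """Turn one block of 48 kHz PCM into the data that goes to the AP."""
    if fmt == "wav":
        return decimate(pcm_48k, 3)       # 48 kHz -> 16 kHz, stored as PCM
    if fmt == "amr":
        pcm_8k = decimate(pcm_48k, 6)     # 48 kHz -> 8 kHz
        return amr_encode(pcm_8k)         # hypothetical AMR-NB encoder callback
    raise ValueError(f"unsupported format: {fmt}")

# A 20 ms block at 48 kHz holds 960 samples
block = [0] * 960
print(len(process_capture_block(block, "wav")))   # 320 samples at 16 kHz
```

With a 20 ms block, the 8 kHz path yields 160 samples, which is exactly one AMR-NB frame's worth of input.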
Now let’s look at phone recording, as illustrated in the following figure:

From the above figure, we can see that the audio data for recording is taken from the shared memory between the ADSP and the CP, which holds both the uplink and the downlink audio data. After the audio data is extracted, it may need to undergo mixing (when both parties' audio is recorded), resampling, and AMR-NB encoding (when saving as an AMR file). Finally, the audio data is sent from the ADSP to the AP via IPC, and the AP saves it as the corresponding file.
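The mixing step can be illustrated with a small sketch. The article does not specify the mixing method, so this example assumes the common approach of adding the two 16-bit streams sample by sample and saturating the result. The function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def mix_int16(uplink, downlink):
    """Sum two 16-bit PCM streams sample by sample, clamping to the int16 range."""
    return [max(INT16_MIN, min(INT16_MAX, u + d)) for u, d in zip(uplink, downlink)]

def select_call_audio(uplink, downlink, both_parties):
    """Pick the data to record: both parties mixed, or only the other party (downlink)."""
    return mix_int16(uplink, downlink) if both_parties else list(downlink)

uplink = [1000, 30000, -30000]
downlink = [2000, 10000, -10000]
print(select_call_audio(uplink, downlink, both_parties=True))   # [3000, 32767, -32768]
```

Saturating instead of letting the sum wrap around avoids loud clicks when both parties speak at high volume at the same time.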
This concludes the entire series on audio on smartwatches, including architecture, drivers, and basic functions (playing audio files, making calls, and recording).
Author Introduction:
Graduated with a master's degree from a well-known 985 engineering university. Engaged in audio software development for nearly 20 years, having worked for two Fortune 500 companies (one a leading chip design company and the other a leading communication equipment company). Currently employed at a chip design company listed on the Science and Technology Innovation Board. I have experience in many aspects of audio software development, and the articles I write are based on my work experience and summaries. My technical blog has nearly 3,000 followers. I hope you find the valuable information you need here.