Implementing a Frequency Domain Equalizer on Android

Author Profile: leilei, Engineer at TianTian Pictorial

This article is mainly divided into three parts:

1. Creation and rendering process of the existing audio control sticker

2. Implementation of the FFT algorithm to convert time domain information into frequency domain information

3. Attaching the generated equalizer to the lenses of 3D glasses

1. Creation and rendering process of the existing audio control sticker

The existing audio control sticker works in three steps:

1) Configure audio control switch, audio control amplitude, and other parameters in the configuration file

2) Create the corresponding filter based on the configuration and add it to the filter chain

3) During frame rendering, calculate the audio control factor in real time to adjust the sticker size

Creating filters in the filter chain is relatively complex: each filter decides whether it needs to be created based on multiple parameters in the StickerItem.

The filters commonly used in the application generally inherit from BaseFilter, such as the commonly used CopyFilter, or from VideoBaseFilter, such as FaceCopyFilter. The audio control material inherits from NormalVideoFilter, which contains a TriggerCtrlItem that controls parameter updates based on multiple parameters.

NormalVideoFilter also includes a video player and an audio player (which will not be discussed in this article).

NormalVideoFilter updates parameters before each rendering, and the code related to audio control ultimately calls the TriggerCtrlItem’s isAudioTriggered() function to obtain the decibel level:

AudioDataManager is a singleton class that manages sound, providing microphone recorded audio data by default or receiving and saving audio data from other sources.

The DecibelDetector class asynchronously processes microphone audio data once every 80 ms and is implemented with the Android system’s AudioRecord class.

2. Implementation of the FFT algorithm to convert time domain information into frequency domain information

From the first section, we can see that the original audio control sound decibel data db comes from the AudioDataManager class, with the default microphone data source being the DecibelDetector class. Let’s take a look at the implementation:

Here, BUFFER_SIZE is the length of the time domain data obtained from each sampling.

The audio is sampled at 32 kHz, mono, with 16-bit PCM encoding, so each read yields a short array of length BUFFER_SIZE, which is the time domain data obtained from one sampling.

The original DB data is derived as follows:

In simple terms, it involves taking the absolute value of each item in the short array, summing, averaging, taking the logarithm, and multiplying by a coefficient.
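
The steps just described can be sketched in Java as follows. This is only a minimal sketch, not the application's actual code: the class name `DecibelSketch`, the method `toDecibel`, and the coefficient value 20 are assumptions made for illustration, since the text only says "a coefficient".

```java
// Minimal sketch of the decibel calculation described above.
// DecibelSketch, toDecibel and COEFFICIENT are hypothetical names; the
// real coefficient used by the application is not given in the article.
class DecibelSketch {
    // Assumed value: 20 is the usual factor for amplitude ratios in dB.
    static final double COEFFICIENT = 20.0;

    static double toDecibel(short[] samples, int length) {
        if (length <= 0) return 0.0;
        long sum = 0;
        for (int i = 0; i < length; i++) {
            sum += Math.abs((int) samples[i]); // absolute value of each sample
        }
        double mean = (double) sum / length;   // average amplitude
        if (mean < 1.0) return 0.0;            // avoid log10(0) on silence
        return COEFFICIENT * Math.log10(mean); // logarithm times coefficient
    }

    public static void main(String[] args) {
        short[] buffer = {1000, -1000, 1000, -1000};
        System.out.println(toDecibel(buffer, buffer.length)); // prints 60.0
    }
}
```

Clamping silence to 0 is a design choice of this sketch; it keeps the result finite when every sample is zero.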

Based on the existing pathway, the function to obtain FFT data is as follows:

Next, let’s detail the implementation of FFT:

FFT stands for Fast Fourier Transform. To understand FFT, we need to first introduce DFT, which stands for Discrete Fourier Transform.

Here is a diagram showing the conversion between time domain and frequency domain for DFT:

On the left is the time domain waveform, and on the right is the frequency domain data.

There are many introductions to DFT or FFT online; here I will share my personal understanding for reference:

1) DFT

The DFT algorithm is the core of converting time domain to frequency domain. The FFT algorithm is its optimization, so I will introduce DFT first, then FFT.

The formula for DFT is as follows:

Let me simplify the input and output:

Where x(n) is the input short array, X(k) is the output frequency domain array of DFT, with n ranging from [0, N) and k ranging from [0, N).
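
Written with the symbols above, the textbook form of the DFT is the following (j denotes the imaginary unit):

```latex
X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1
```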

Thus, the DFT algorithm converts an N-length array x into a new N-length array X. The difference between the two is:

The index of the x array represents fixed time intervals, while the index of the X array represents fixed frequency intervals. For example, with illustrative spacings of 40 ms and 20 Hz: x[0] represents the amplitude at time 0, and x[1] represents the amplitude at 40 ms; X[0] represents the total amplitude of the wave at frequency 0, and X[1] represents the total amplitude of the wave at frequency 20 Hz. In practice the time step is one over the sampling rate, and the frequency step is the sampling rate divided by N.

Note:

1) The meaning of each array index follows from the properties of the sound itself, such as the sampling rate and the number of channels. With N held constant, a higher sampling frequency gives a larger frequency interval per array index.

2) The choice of FFT algorithm can only affect the accuracy of the frequency data; it cannot change the maximum frequency range.

3) Because of frequency aliasing, the maximum frequency obtained from the FFT is half of the sampling rate. For real-valued input, only the first N/2 outputs of the FFT carry independent information; the second half mirrors the first half as complex conjugates.
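
To make the index-to-frequency relationship concrete, here is a small Java sketch. The class and method names are hypothetical, and N = 1024 is an assumed buffer length (the article does not state the value of BUFFER_SIZE); only the 32 kHz sampling rate comes from the text.

```java
// Sketch: mapping an FFT output index to a frequency in Hz.
// BinFrequencySketch and its methods are illustrative names only.
class BinFrequencySketch {
    // Frequency represented by output index k of an n-point transform.
    static double binFrequency(int k, int sampleRate, int n) {
        return (double) k * sampleRate / n;
    }

    // For real-valued input only the first n/2 outputs are independent.
    static int usefulBins(int n) {
        return n / 2;
    }

    public static void main(String[] args) {
        int sampleRate = 32000; // from the article
        int n = 1024;           // assumed buffer length
        System.out.println(binFrequency(1, sampleRate, n));             // 31.25 Hz per index
        System.out.println(binFrequency(usefulBins(n), sampleRate, n)); // 16000.0, half the sampling rate
    }
}
```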

Once we clarify the input and output results, let’s take a look at the expanded DFT formula:

Here k is in [0, N). Computing X(0) requires every input value from x[0] to x[N-1], and the same is true for each other output: every X value requires a full pass over the input, so the time complexity is O(N^2). The derivation of the DFT formula and its matrix form are quite complex and will be discussed in a future article.
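
The O(N^2) cost is easy to see in a direct implementation. The following Java sketch evaluates the textbook DFT sum with two nested loops; the class name `NaiveDft` and the choice to return separate real and imaginary arrays are assumptions made for illustration, not the article's code.

```java
// Sketch: direct evaluation of the DFT, one full pass over x per output.
class NaiveDft {
    // Returns {re, im}, the real and imaginary parts of X(0..N-1).
    static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {         // N outputs ...
            for (int t = 0; t < n; t++) {     // ... each needs all N inputs
                double angle = -2.0 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(angle);
                im[k] += x[t] * Math.sin(angle);
            }
        }
        return new double[][] {re, im};
    }

    public static void main(String[] args) {
        double[][] X = dft(new double[] {1, 1, 1, 1});
        // A constant signal has all of its energy at frequency 0.
        System.out.println(X[0][0]);
    }
}
```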

2) FFT

The FFT algorithm reduces the cost of the DFT to O(N log N). Here, I will introduce the radix-2 FFT algorithm. The classic way to optimize the DFT is divide and conquer, and the radix-2 FFT is one such algorithm:

It splits an input of length N into two subsets of length N/2 and processes them separately, repeating the split until N/2 subsets of length 2 remain, each of which is computed as a 2-point DFT.

Here, I will not detail the formula calculations; let me briefly mention the key points when using it: dividing subsets. The division of FFT is not a simple halving; it requires odd-even partitioning:

X(k) is built from the inputs with indices 0 to N-1 and is split into G(k) and H(k);

G(k) is computed from the even indices 0, 2, 4, ……, N-4, N-2, while H(k) is computed from the odd indices 1, 3, 5, ……, N-3, N-1.

Here, the range of k has changed: k = [0, N/2).

Because the DFT computation factors are shared between the two halves, each round of calculations needs only N/2 butterfly computations, and the number of rounds is log2(N).

1) The period of X(k) is N

2) The periods of G(k) and H(k) are N/2, and the indices of k are all [0, N/2)
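
In the usual textbook notation, the odd-even split and the two periodicity properties lead to the relations below. The twiddle-factor symbol W_N is introduced here for illustration and assumes the standard DFT sign convention:

```latex
W_N = e^{-j 2\pi / N}

G(k) = \sum_{m=0}^{N/2-1} x(2m)\, W_{N/2}^{mk}, \qquad
H(k) = \sum_{m=0}^{N/2-1} x(2m+1)\, W_{N/2}^{mk}

X(k) = G(k) + W_N^{k} H(k), \qquad
X(k + N/2) = G(k) - W_N^{k} H(k), \qquad k = 0, 1, \dots, N/2 - 1
```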

The above formulas can be represented using butterfly diagrams:

This decomposes the N-period X set into N/2-period G and H sets. If N=2, we get the result directly. Now let’s look at the butterfly diagram for N=8:

We can see that the order of x(k) on the left has changed because of the odd-even partitioning. To make the bottom-up computation convenient, the FFT input is first reordered by bit reversal of its indices; the reordering algorithm is as follows:

The indices of x(k) on the left of the butterfly diagram are the result of this reordering. xin[k] is the reordered complex array x(k). The butterfly calculation diagram is as follows:
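
One common way to do this reordering is a bit-reversal permutation: each index is swapped with the index obtained by reversing its binary digits. The Java sketch below is an assumption about the general technique, not the article's own routine; the class and method names are hypothetical, and the input length must be a power of two.

```java
// Sketch: bit-reversal reordering of a complex array stored as re/im.
class BitReversalSketch {
    // Reverses the lowest `bits` bits of index i.
    static int reverseBits(int i, int bits) {
        int r = 0;
        for (int b = 0; b < bits; b++) {
            r = (r << 1) | (i & 1);
            i >>= 1;
        }
        return r;
    }

    // Reorders re and im in place; their length must be a power of two.
    static void permute(double[] re, double[] im) {
        int n = re.length;
        int bits = Integer.numberOfTrailingZeros(n); // log2(n) for a power of two
        for (int i = 0; i < n; i++) {
            int j = reverseBits(i, bits);
            if (j > i) { // swap each pair only once
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
    }

    public static void main(String[] args) {
        double[] re = {0, 1, 2, 3, 4, 5, 6, 7};
        double[] im = new double[8];
        permute(re, im);
        System.out.println(java.util.Arrays.toString(re));
        // prints [0.0, 4.0, 2.0, 6.0, 1.0, 5.0, 3.0, 7.0]
    }
}
```

For N=8 this yields the input order 0, 4, 2, 6, 1, 5, 3, 7, which is the order produced by repeated odd-even partitioning.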

cc is complex multiplication, cut is complex subtraction, and sum is complex addition. Each round’s intermediate results are saved in the corresponding position of xin. Finally, we obtain the FFT result X(k). For more implementation details, please refer to: https://www.cnblogs.com/Free-Thinker/p/4759949.html
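
Putting the pieces together, an in-place radix-2 FFT can be sketched in Java as below. This is a generic textbook-style sketch, not the code from the linked post: the class name `Radix2FftSketch` and the use of separate `re`/`im` arrays instead of a complex type are assumptions, and the comments only indicate where operations corresponding to cc, cut, and sum would occur.

```java
// Sketch: in-place iterative radix-2 FFT on separate real/imaginary arrays.
class Radix2FftSketch {
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n != im.length || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Step 1: bit-reversal reordering of the input.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Step 2: log2(n) rounds of butterflies, bottom-up.
        for (int len = 2; len <= n; len <<= 1) {
            int half = len / 2;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < half; k++) {
                    double angle = -2.0 * Math.PI * k / len;
                    double wr = Math.cos(angle), wi = Math.sin(angle);
                    int a = start + k, b = a + half;
                    // complex multiplication (the role of "cc")
                    double tr = wr * re[b] - wi * im[b];
                    double ti = wr * im[b] + wi * re[b];
                    // complex subtraction (the role of "cut")
                    re[b] = re[a] - tr;
                    im[b] = im[a] - ti;
                    // complex addition (the role of "sum")
                    re[a] += tr;
                    im[a] += ti;
                }
            }
        }
    }

    public static void main(String[] args) {
        double[] re = new double[8], im = new double[8];
        for (int i = 0; i < 8; i++) re[i] = Math.cos(2 * Math.PI * i / 8);
        fft(re, im);
        // Energy of a one-cycle cosine lands in bins 1 and 7.
        System.out.println(Math.hypot(re[1], im[1]));
    }
}
```

For an equalizer display, the magnitude of each of the first N/2 outputs, for example `Math.hypot(re[k], im[k])`, would be the natural value to draw per bar.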

3. Attaching the generated equalizer to the lenses of 3D glasses

This part is implemented based on existing 3D sticker materials. To attach the equalizer to the 3D lenses, it is necessary to obtain the current material of the 3D glasses’ lenses, and then attach the equalizer on top. The implementation of 3D stickers uses the gameplay engine. I won’t go into detail about gameplay (mainly because I don’t understand it very well), but simply put:

The material of the 3D sticker is based on nodes, and the visitScene(Node *node) function parses each node (including the material images inside the lenses). The configuration adds an “__audio__” tag to mark the material that should be used, and the corresponding texture is saved during parsing.

In this way, the equalizer texture rendered in the previous part is bound to the 3D model, and the final effect is as follows:

4. Conclusion

This article mainly introduces how to convert recorded audio from time domain data to frequency domain data; all the code and implementation details are based on Android. The FFT code originates from the internet, and the explanation of FFT mainly comes from K.R. Rao’s “Fast Fourier Transform: Algorithms and Applications”. The FFT algorithm is a deep and broad topic: this article covers only the radix-2 FFT, but there are also radix-3, radix-4, and even 2D and 3D FFT algorithms. Interested readers can refer to that book. Attaching the equalizer to the 3D model actually involves complex use of OpenGL, but thanks to the excellent code encapsulation in the existing application, even a beginner like me can make slight modifications to achieve a fairly cool 3D audio control effect.

References:

[1] https://www.cnblogs.com/luoqingyu/p/5930181.html

[2] K.R. Rao et al., translated by Wan Shuai, “Fast Fourier Transform: Algorithms and Applications”

[3] https://www.cnblogs.com/Free-Thinker/p/4759949.html

Postscript: TianTian Pictorial is an industry-leading image processing and camera beautification app developed by Tencent. You are welcome to scan or search for our WeChat public account: “TianTian Pictorial Engineer”, where we will continuously share our technical practices and look forward to exchanging and learning together!

Join Us: The TianTian Pictorial technical team is hiring:

(1) Android / iOS Development Engineer (2) Image Processing Algorithm Engineer

We look forward to having interested or recommended tech talents join us (based in Shanghai)! Contact: [email protected]
