Fundamentals of C Language: Reducing For Loops to Improve Execution Efficiency

In automotive embedded development, the bar for programming-language skill can seem low: as long as the code works and the program runs, it can be patched along for another three years. As developers, though, we should still aim to improve. Automotive-grade MCUs have limited resources, and optimizing code to improve product performance is part of the value we can deliver. This article discusses one way to improve program efficiency: reducing the number of loop iterations (loop unrolling) so that the code executes faster.

1. Example of For Loop Issues

Problem description: compute an AES-CMAC over the data in the address range 0x200000 to 0x2FFFFF. Only 16 bytes (one block) can be processed at a time, and the result of each calculation feeds into the next one. Readers familiar with CMAC will recognize this as a typical CMAC calculation. A conventional implementation looks like this:

for (; CmacData.leftDataLen > 16; )
{
    AesCmac_Cal128BitLen(addressIn);
    addressIn += 16U;
    CmacData.leftDataLen -= 16U;
}

If a for loop without initialization and increment clauses looks awkward, the same logic can be written as a while loop:

while (CmacData.leftDataLen > 16)
{
    AesCmac_Cal128BitLen(addressIn);
    addressIn += 16U;
    CmacData.leftDataLen -= 16U;
}

The initial length is CmacData.leftDataLen = 0x100000, i.e. 1 MiB. How long does this approach take? We measure it with a timer, as shown below:

TimeArr[0] = OsTimer0_GetTicks();
for (; CmacData.leftDataLen > 16; )
{
    AesCmac_Cal128BitLen(addressIn);
    addressIn += 16U;
    CmacData.leftDataLen -= 16U;
}
TimeArr[1] = OsTimer0_GetTicks() - TimeArr[0];
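The subtraction in the measurement stays correct even if the counter overflows between the two readings, provided the counter is a free-running 32-bit up-counter and the measured interval is shorter than one full counter period. The source does not state this about OsTimer0_GetTicks, so it is an assumption here. A minimal sketch, with ElapsedTicks as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original code.
 * Assumption: the tick source is a free-running 32-bit up-counter.
 * Unsigned subtraction is performed modulo 2^32, so the result is the
 * true elapsed count even when the counter wrapped once in between. */
static uint32_t ElapsedTicks(uint32_t startTicks, uint32_t endTicks)
{
    return (uint32_t)(endTicks - startTicks);
}
```

For example, a start reading of 0xFFFFFFF0 and an end reading of 0x00000010 yield 0x20 elapsed ticks.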

On average, the for loop takes 8567044 ticks and the while loop takes 8569820 ticks. The timer runs at 60 MHz, so these correspond to execution times of 142.784 ms and 142.830 ms respectively. Can the code above be optimized? Yes.
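The tick-to-time conversion is simply ticks divided by the timer frequency. A small sketch of that arithmetic using integers only, which can be convenient on MCUs without an FPU. TIMER_HZ and TicksToMicroseconds are hypothetical names introduced for illustration; the 60 MHz value is the one given in the text.

```c
#include <stdint.h>

/* Timer frequency from the article: 60 MHz. */
#define TIMER_HZ 60000000UL

/* Hypothetical helper: convert a tick count to whole microseconds.
 * The 64-bit intermediate keeps ticks * 1000000 from overflowing.
 * The result is truncated, not rounded. */
static uint32_t TicksToMicroseconds(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000ULL) / TIMER_HZ);
}
```

With the measurements above, TicksToMicroseconds(8567044) gives 142784 us (142.784 ms) and TicksToMicroseconds(8569820) gives 142830 us (142.830 ms).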

2. Reducing For Loops to Improve Efficiency

Optimization idea: the loop above performs one condition check for every 16 bytes, and every block also costs one address increment and one length decrement. If each iteration handles 64 bytes instead, the check, the address increment, and the length decrement run once per four blocks, i.e. only a quarter as often. In other words, we unroll the loop and make the granularity of the check coarser. The modified while code is as follows:

while (CmacData.leftDataLen > 64)
{
    AesCmac_Cal128BitLen(addressIn);
    AesCmac_Cal128BitLen(addressIn+16);
    AesCmac_Cal128BitLen(addressIn+32);
    AesCmac_Cal128BitLen(addressIn+48);
    addressIn += 64U;
    CmacData.leftDataLen -= 64U;
}

Time taken: 7586131 Ticks, which is 126.436ms.

The for version is modified in the same way:

for (; CmacData.leftDataLen > 64; )
{
    AesCmac_Cal128BitLen(addressIn);
    AesCmac_Cal128BitLen(addressIn+16);
    AesCmac_Cal128BitLen(addressIn+32);
    AesCmac_Cal128BitLen(addressIn+48);
    addressIn += 64U;
    CmacData.leftDataLen -= 64U;
}

Time taken: 7585583 Ticks, which is 126.426ms.
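One detail the unrolled loops leave open is the tail: with the condition > 64, the loop exits while up to 64 bytes (four blocks) are still unprocessed, whereas the original loop exits with at most 16 bytes left. The sketch below shows one possible way to keep both variants equivalent by following the unrolled loop with the original 16-byte loop. ProcessBlock is a hypothetical stand-in for AesCmac_Cal128BitLen that only counts calls; the function names and the return value are assumptions made for illustration, not the author's code.

```c
#include <stdint.h>

#define BLOCK_SIZE 16U

/* Hypothetical stand-in for AesCmac_Cal128BitLen: it only records how
 * many blocks were processed and which block came last. */
static uint32_t g_blockCalls;
static const uint8_t *g_lastBlock;

static void ProcessBlock(const uint8_t *block)
{
    g_blockCalls++;
    g_lastBlock = block;
}

/* Baseline: one loop check per 16-byte block.
 * Returns the number of bytes left for the final-block handling. */
static uint32_t ProcessSimple(const uint8_t *addr, uint32_t len)
{
    while (len > BLOCK_SIZE) {
        ProcessBlock(addr);
        addr += BLOCK_SIZE;
        len  -= BLOCK_SIZE;
    }
    return len;
}

/* Unrolled by four: one loop check per 64 bytes, followed by a
 * remainder loop that handles the up to three blocks still pending. */
static uint32_t ProcessUnrolled(const uint8_t *addr, uint32_t len)
{
    while (len > 4U * BLOCK_SIZE) {
        ProcessBlock(addr);
        ProcessBlock(addr + 1U * BLOCK_SIZE);
        ProcessBlock(addr + 2U * BLOCK_SIZE);
        ProcessBlock(addr + 3U * BLOCK_SIZE);
        addr += 4U * BLOCK_SIZE;
        len  -= 4U * BLOCK_SIZE;
    }
    /* Remainder: same condition as the baseline loop. */
    while (len > BLOCK_SIZE) {
        ProcessBlock(addr);
        addr += BLOCK_SIZE;
        len  -= BLOCK_SIZE;
    }
    return len;
}
```

For the 0x100000-byte input in this article, the baseline loop body runs 65535 times, while the unrolled main loop runs 16383 times and the remainder loop 3 times, covering the same 65535 blocks. The remainder loop adds only a few iterations, so it should not noticeably affect the measured gain.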

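The measurements are summarized below (timer frequency: 60 MHz):

Loop     Bytes per check    Ticks      Time (ms)
for      16                 8567044    142.784
while    16                 8569820    142.830
for      64                 7585583    126.426
while    64                 7586131    126.436

Checking once per 64 bytes instead of once per 16 bytes reduces the execution time by about 11.5% for both loop forms.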

Therefore, when a loop does little work per iteration, we can increase the granularity of the check so that each iteration does more work and the loop overhead is paid less often.
