When binary resources such as fonts, images, and sounds are needed in C/C++ source programs, the most common method is to use various tools to convert the required resources into hexadecimal arrays, like this:
const uint8_t xxx[]={ 0x00,0x11,0x22,0x33,…};
The advantage is that it is intuitive and needs no extra build steps; the downside is that the storage density is very low. Counting the comma and the space, each byte of data takes 6 bytes of source text on average, so 10k of data becomes 60k. It also makes the source file very long: by the old rule a line should not exceed 80 columns, and with 2 or 4 spaces of indentation only 12 values fit on one line, which means over 800 lines for 10k of data. That is very inconvenient.
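To put numbers on that overhead, here is a minimal Python sketch that renders a buffer as a hex array body and measures the result. The helper name `hex_array_text` and the layout (2-space indent, 12 values per line, one space after each comma) are assumptions for illustration, not the output of any particular converter tool.

```python
import os

def hex_array_text(data: bytes, per_line: int = 12, indent: str = "  ") -> str:
    """Render data as the body of a C hex array (hypothetical helper)."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        # each value is written as "0xNN," and values are separated by a space
        lines.append(indent + " ".join("0x%02x," % b for b in chunk))
    return "\n".join(lines)

data = os.urandom(10 * 1024)          # 10k of random test data
text = hex_array_text(data)
print("source bytes per data byte: %.2f" % (len(text) / len(data)))
print("lines:", text.count("\n") + 1)
print("longest line:", max(len(l) for l in text.split("\n")))
```

With this layout the ratio comes out slightly above 6 source bytes per data byte, and 10k of data needs 854 lines of 73 columns each.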
Other common methods include adding an assembly source file that pulls in the binary resource with .incbin and exports a global symbol for the C/C++ program to reference, or using objcopy to convert the binary resource into a .o file that is linked in with the rest; both are quite cumbersome. The new #embed preprocessor directive introduced in C23 is quite good, but it is not part of C++23 and compiler support is not yet widespread enough to rely on.
For larger files, they can be directly downloaded to a specified address in the MCU, and multiple files can be packaged into a tar or romfs image. If computational power and RAM space are sufficient, they can also be packaged into zip or lzma formats and decompressed when needed; these methods will not be discussed here. Let’s return to the initial method and see if we can improve efficiency a bit.
Once, I came across the source code of u8g2, where the font data is as follows:
const uint8_t u8x8_font_amstrad_cpc_extended_f[1796] U8X8_FONT_SECTION("u8x8_font_amstrad_cpc_extended_f") = " \377\1\1\0\0\0\0\0\0\0\0\0\0\0__\0\0\0\0\7\7\0\7\7\0\0\24\177\177\34" "\177\177\24\0\0$*\177\177*\22\0Ff\60\30\14fb\0\60zO]\67zH\0\0\0\0\7" "\7\0\0\0\0\0\34>cA\0\0\0\0Ac>\34\0\0\10*>\34\34>*\10\0\10\10>" ">\10\10\0\0\0\200\340`\0\0\0\0\10\10\10\10\10\10\0\0\0\0``\0\0\0`\60\30\14" ...
Does it look messy? In fact, it stores the binary resource as a string literal: visible characters appear as they are, and all other bytes are written as octal escape sequences. Each byte thus saves the 2 bytes for the comma and space, and an octal escape takes at least 2 bytes (\0 ~ \7) and at most 4 bytes (\177 ~ \377), so on average roughly 3 bytes represent each byte of data, which is much more efficient. The characters \ and " must themselves be escaped as \\ and \". One more catch: an octal escape consumes up to three octal digits, so a literal digit 0-7 that directly follows an escape must be escaped too.
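The "about 3 bytes" estimate can be checked with a few lines of Python. This is a rough sketch under stated assumptions: the hypothetical helper `escaped_len` counts the characters one byte value needs inside a string literal, treats everything outside 33..126 as non-visible (so a space is escaped as well), and ignores quotes, newlines, and the extra cost of a digit that follows an octal escape.

```python
def escaped_len(b: int) -> int:
    """Characters needed for one byte value inside a C string literal."""
    if b in (ord('"'), ord("\\")):
        return 2                      # written as \" and \\
    if 33 <= b <= 126:
        return 1                      # visible character, emitted as-is
    return 1 + len("%o" % b)          # backslash plus 1 to 3 octal digits

costs = [escaped_len(b) for b in range(256)]
print("total for all 256 values:", sum(costs))
print("average per byte: %.2f" % (sum(costs) / 256))
```

For uniformly random data this gives an average a little under 3 characters per byte; a real output file will be somewhat larger once quotes, newlines, and escaped digits are included.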
Let’s write a program to verify this:
#include <cstdio>
#include <cstdint>
#include <cstdlib>
#include <ctime>

int main(void)
{
    const int size = 1024;
    srand((unsigned)time(0));
    uint8_t arr[size];
    for (int i = 0; i < size; i++) {
        arr[i] = rand() & 0xff;
    }
    const int chars_per_line = 20;
    printf("unsigned char s[] = ");
    uint32_t sum = 0;
    bool escaped = false; // true if the previous output was an octal escape
    for (int i = 0; i < size; i++) {
        const uint8_t c = arr[i];
        sum += c;
        if (i % chars_per_line == 0) { // line start
            printf("\"");
            escaped = false;
        }
        if (c == '\\') { // \ and " need to be escaped
            printf("\\\\");
            escaped = false;
        } else if (c == '"') {
            printf("\\\"");
            escaped = false;
        } else if (c >= 33 && c <= 126) {
            // an octal escape takes up to three octal digits, so a literal
            // 0-7 right after one must be escaped as well
            if (escaped && c >= '0' && c <= '7') {
                printf("\\%o", c);
            } else {
                printf("%c", c);
                escaped = false;
            }
        } else {
            printf("\\%o", c);
            escaped = true;
        }
        if (i + 1 == size) printf("\";\n"); // last byte: close the literal and the statement
        else if ((i + 1) % chars_per_line == 0) printf("\"\n"); // line end
    }
    // the checksum goes to stderr so that the redirected output stays valid C
    fprintf(stderr, "sum=%lu\n", (unsigned long)sum);
    return 0;
}
The program generates a 1024-byte random array and prints it as a string literal using the method above. Redirecting the output to another C file gives a file that compiles successfully. The output is roughly 3200 bytes, meaning each byte of data takes about 3.1 bytes on average, which is much more efficient. With 20 values per line, over multiple trials the longest line in the output file was about 70 columns, within the 80-column limit. As for the number of lines, the hex array would need over 80 lines, while 52 are enough now.
Adding the same checksum calculation to the generated file, then compiling and running it, produces a checksum that consistently matches the one printed by the generator. The experiment is successful!
The previous program can actually be implemented more succinctly in Python. Also, it is unnecessary to fix the number of data points per line; it is more convenient to limit the total length of a line. The improved program is as follows:
import sys

if __name__ == "__main__":
    try:
        with open(sys.argv[1], "rb") as f:
            data = f.read()
    except (IndexError, OSError):
        print("Failed to open file %s" % " ".join(sys.argv[1:]), file=sys.stderr)
        sys.exit(1)
    line = "\""
    escaped = False
    for c in data:
        # \ and " need to be escaped
        if c == ord('\\'):
            line += "\\\\"
            escaped = False
        elif c == ord('"'):
            line += "\\\""
            escaped = False
        # if it is a visible ASCII character
        elif c >= 33 and c <= 126:
            # if the previous character was an octal escape, a following 0-7 also needs to be escaped
            if escaped and c >= ord('0') and c <= ord('7'):
                line += "\\%o" % c
            # otherwise can output directly
            else:
                line += "%c" % c
                escaped = False
        # everything else is converted to an octal escape
        else:
            line += "\\%o" % c
            escaped = True
        # output a line, controlling the length within 80 columns
        if len(line) > 75:
            print(line + "\"")
            line = "\""
            escaped = False
    print(line + "\"")
Execute py convert_oct.py data_file > data.c to convert the required resource file data_file into octal escaped data.c.
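Before involving a compiler, the converter's output can be sanity-checked by decoding it again in Python. The sketch below is only an illustration: `encode` restates the escaping rules of the script above as a function, and `decode` is a hypothetical helper that understands only the escapes `encode` emits (`\\`, `\"`, and octal), not the full C string grammar.

```python
import os

def encode(data: bytes, width: int = 75) -> list:
    """Encode bytes as a list of C string literal lines."""
    lines, line, escaped = [], '"', False
    for c in data:
        if c == ord("\\"):
            line += "\\\\"
            escaped = False
        elif c == ord('"'):
            line += '\\"'
            escaped = False
        elif 33 <= c <= 126 and not (escaped and ord("0") <= c <= ord("7")):
            line += chr(c)            # visible character, emitted as-is
            escaped = False
        else:
            line += "\\%o" % c        # octal escape
            escaped = True
        if len(line) > width:         # flush a full line
            lines.append(line + '"')
            line, escaped = '"', False
    lines.append(line + '"')
    return lines

def decode(lines: list) -> bytes:
    """Decode the lines again (handles only what encode() emits)."""
    out = bytearray()
    for line in lines:
        body = line[1:-1]             # strip the surrounding quotes
        i = 0
        while i < len(body):
            ch = body[i]
            if ch != "\\":
                out.append(ord(ch))
                i += 1
            elif body[i + 1] in '\\"':
                out.append(ord(body[i + 1]))
                i += 2
            else:
                # an octal escape consumes at most three octal digits
                j = i + 1
                while j < len(body) and j < i + 4 and body[j] in "01234567":
                    j += 1
                out.append(int(body[i + 1:j], 8))
                i = j
    return bytes(out)

data = bytes(range(256)) + os.urandom(4096)
lines = encode(data)
assert decode(lines) == data
print("round trip ok,", len(lines), "lines, longest", max(map(len, lines)))
```

Because a line is flushed as soon as it grows past 75 characters and a single byte adds at most 4, no line ends up longer than 80 columns including the closing quote.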
To verify the effect, write a test.c:
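#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <io.h>

const uint8_t arr[] =
#include "data.c"
;

int main(void)
{
    // the string literal appends a '\0' that is not part of the data
    size_t size = sizeof(arr) - 1;
    // Windows: binary mode, so that "\n" is not expanded to "\r\n"
    _setmode(_fileno(stdout), _O_BINARY);
    fwrite(arr, 1, size, stdout);
    return 0;
}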
Then compile test.c to get a.exe, and run a > data_file2 to restore the resource file. The crc32 checksums of data_file and data_file2 are identical. Success!
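If no crc32 command-line tool is at hand, the comparison can also be done with Python's standard `zlib` module. This is a small sketch; the function name `file_crc32` and the script's argument handling are assumptions for illustration.

```python
import sys
import zlib

def file_crc32(path: str) -> int:
    """CRC32 of a file, read in chunks so large resources are fine."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

# usage: python crc_compare.py data_file data_file2
if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = file_crc32(sys.argv[1]), file_crc32(sys.argv[2])
    print("%08x %08x %s" % (a, b, "match" if a == b else "MISMATCH"))
```

Reading in chunks keeps memory use flat, which matters little for a font but helps if the same check is reused for larger images or sounds.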