What is Base64?
As the name suggests, Base64 is built on a set of 64 characters: the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, and the symbols “+” and “/”. One more character, “=”, is used for padding (the reason for it is explained later), so 65 distinct characters can actually appear in the output. Any binary data can be converted into a string drawn from this character set, and this conversion process is called Base64 encoding.
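The order of these characters matters, because encoding looks each character up by its index. Below is a minimal Python sketch of the standard index table, in the order defined by RFC 4648: A-Z, then a-z, then 0-9, then “+” and “/”. The name `BASE64_ALPHABET` is only an illustrative choice. Note that “=” is not part of the table, since it never encodes data.

```python
import string

# Illustrative name: the standard Base64 index table, where position i holds
# the character for value i (0 -> "A", 26 -> "a", 52 -> "0", 63 -> "/").
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(BASE64_ALPHABET))  # 64
print(BASE64_ALPHABET[0], BASE64_ALPHABET[26], BASE64_ALPHABET[52], BASE64_ALPHABET[63])  # A a 0 /
```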
How to Convert to Base64
First, convert the data (a text string, an image, or any other file) into a sequence of bits, 8 per byte. Then split the bits into groups of 6; if the last group has fewer than 6 bits, pad it with zeros on the right. Prefix each 6-bit group with 00 to form a new byte whose value ranges from 0 to 63, and finally look up the character at that index in the Base64 index table.
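The steps above can be sketched in Python as follows. This is an educational encoder, not a replacement for the standard `base64` module; the function name `b64encode_sketch` is hypothetical. Instead of literally prefixing each 6-bit group with 00, it reads the 6 bits directly as a number from 0 to 63, which gives the same value. It also applies the “=” padding rule that the following paragraphs explain.

```python
import base64
import string

# Standard Base64 index table: value 0 -> "A", ..., value 63 -> "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_sketch(data: bytes) -> str:
    """Hypothetical educational Base64 encoder following the steps above."""
    # 1. Turn the input into one long bit string, 8 bits per byte.
    bits = "".join(f"{byte:08b}" for byte in data)
    # 2. Pad with zero bits on the right so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # 3. Read each 6-bit group as a number from 0 to 63 (the same value you get
    #    by prefixing the group with 00) and look it up in the index table.
    encoded = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    # 4. Append "=" until the output length is a multiple of 4.
    return encoded + "=" * (-len(encoded) % 4)


# Compare against the standard library on the examples from this article.
for sample in (b"abc", b"ab", b"a"):
    assert b64encode_sketch(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", b64encode_sketch(sample))  # YWJj, YWI=, YQ==
```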
Let’s take an example. Suppose we have the string “abc”; what will the base64 encoding result be?
The string abc is 3 bytes (0x61, 0x62, 0x63), or 24 bits. Splitting them into 4 groups of 6 bits and prefixing each group with 00 gives the values 24, 22, 9, and 35, which map to Y, W, J, and j, so the Base64 encoding of abc is YWJj. Every 3 input bytes become 4 output characters, so the Base64 representation is about 4/3 the size of the original data.
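To make the bit-level arithmetic for “abc” visible, here is a small Python sketch. The helper `six_bit_groups` is a hypothetical name, and it handles only inputs whose bit length is already a multiple of 6, as in this example.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def six_bit_groups(data: bytes) -> list[str]:
    """Hypothetical helper: split the bits of `data` into 6-bit groups (no padding)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]


groups = six_bit_groups(b"abc")
print(groups)                                      # ['011000', '010110', '001001', '100011']
print(["00" + g for g in groups])                  # each group with 00 prefixed to form a byte
print([int(g, 2) for g in groups])                 # [24, 22, 9, 35]
print("".join(ALPHABET[int(g, 2)] for g in groups))  # YWJj
```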
Now the question arises: what if the length of the original data is not a multiple of 3, so that the last chunk has only 1 or 2 bytes?
Take the 2-byte string “ab” as an example. Its 16 bits split into two full 6-bit groups plus a third group of only 4 bits. That third group is padded with two zero bits at the end to make 6 bits and, like every group, gets 00 at the front to form a byte. This yields three characters, YWI. To fill the fourth position of the 4-character block, an “=” is appended, giving the Base64 encoding “YWI=”.
If the original data is only 1 byte, such as “a”, the principle is the same. Its 8 bits give one full 6-bit group and a second group of only 2 bits, which is padded with four zero bits at the end (besides the usual 00 at the front). This yields two characters, YQ, and the remaining two positions are filled with “=”. Therefore, the Base64 encoding of a is “YQ==”.
In summary, whenever the length of the original data is not divisible by 3, the last group of bits is padded with zeros, and the output is padded with one or two “=” characters so that its length is a multiple of 4.
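The padding rule can be written down as two small formulas and checked against Python's standard `base64` module. The helper names `expected_padding` and `encoded_length` are hypothetical; this is a sketch of the rule described above, not an API of any library.

```python
import base64


def expected_padding(n: int) -> int:
    """Hypothetical helper: number of '=' for an n-byte input (0, 2, 1 when n % 3 is 0, 1, 2)."""
    return (3 - n % 3) % 3


def encoded_length(n: int) -> int:
    """Hypothetical helper: every started group of 3 input bytes becomes 4 output characters."""
    return 4 * ((n + 2) // 3)


# Check both rules against the standard library for a few inputs.
for data in (b"abc", b"ab", b"a", b"abcd"):
    encoded = base64.b64encode(data)
    assert len(encoded) == encoded_length(len(data))
    assert encoded.count(b"=") == expected_padding(len(data))
    print(data, "->", encoded)
```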
Where is Base64 Used?
1. Base64-encoded images in HTML and CSS
When you open the Google homepage, you may notice that some images referenced in its styles are not resource URLs but Base64-encoded strings embedded directly in the page. The benefit is that each such image saves one HTTP request. However, not every image is suitable for this: Base64 output is about a third larger than the original bytes, so the larger the image, the longer the encoded string and the more data the page has to transfer.
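Embedded images like these are written as `data:` URIs (the format specified in RFC 2397), which can be used in an HTML `src` attribute or a CSS `url(...)`. Below is a minimal Python sketch; `to_data_uri` is a hypothetical helper, and the input is only the 8-byte PNG file signature standing in for a real image file.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Hypothetical helper: embed `data` as Base64 in a data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes: just the 8-byte PNG signature, not a complete image.
fake_image = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(fake_image, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
print(len(fake_image), "bytes ->", len(uri.split(",", 1)[1]), "Base64 characters")  # 8 bytes -> 12
```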
2. Email transmission
Early email systems could only transmit ASCII text, which made it impossible to send non-ASCII characters, images, or other binary files. MIME (Multipurpose Internet Mail Extensions) extended email by specifying how message content is encoded for transmission, and Base64 is one of the encodings it allows. This is what makes it possible to send images and other attachments by email.
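As an illustration, Python's standard `email` package applies Base64 automatically when binary data is attached to a message. In this sketch the subject, the filename `demo.png`, and the payload bytes are all placeholders; the payload is not a real PNG.

```python
from email.message import EmailMessage

# Placeholder binary payload standing in for an image file.
image_bytes = bytes(range(256))

msg = EmailMessage()
msg["Subject"] = "Base64 attachment demo"  # placeholder subject
msg.set_content("See the attached image.")
# For bytes content, the default content manager uses Base64 as the
# Content-Transfer-Encoding of the attachment part.
msg.add_attachment(image_bytes, maintype="image", subtype="png", filename="demo.png")

# Find the attachment part and inspect how it was encoded.
attachment = next(part for part in msg.walk() if part.get_filename() == "demo.png")
print(attachment["Content-Transfer-Encoding"])  # base64

# Decoding the payload recovers the original bytes, and the serialized
# message is pure ASCII, which is what early mail systems required.
assert attachment.get_payload(decode=True) == image_bytes
assert msg.as_string().isascii()
```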
3. URLs
Base64 content can also be transmitted in URLs, typically using the URL-safe variant discussed below.
Most mainstream programming languages have built-in base64 modules that can be called directly, eliminating the need to reinvent the wheel.
Python Example
>>> import base64
# Encoding
>>> base64.b64encode(b'abc')
b'YWJj'
# Decoding
>>> base64.b64decode(b'YWJj')
b'abc'
In addition to standard Base64, there is also a URL-safe variant that replaces “+” and “/” with “-” and “_”. Standard Base64 is not suitable for direct use in URLs, because URL encoders convert the “+” and “/” characters into percent-escapes such as “%2B” and “%2F”, and these “%” sequences must be decoded again, for example before the value is stored in a database.
# Standard alphabet
>>> base64.b64encode(b'i\xcf\xbf')
b'ac+/'
# Using the URL-safe alphabet
>>> base64.urlsafe_b64encode(b'i\xcf\xbf')
b'ac-_'
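To see the percent-escaping problem described above, the sketch below runs both encodings of the same bytes through `urllib.parse.quote` with `safe=""`, which stands in for a URL encoder escaping a query-string value. It is an illustration of the behavior, not a full URL-handling recipe.

```python
import base64
from urllib.parse import quote

data = b"i\xcf\xbf"
standard = base64.b64encode(data).decode("ascii")          # 'ac+/'
url_safe = base64.urlsafe_b64encode(data).decode("ascii")  # 'ac-_'

# Percent-encode both as a URL component; safe="" means "/" is escaped too.
print(quote(standard, safe=""))  # ac%2B%2F
print(quote(url_safe, safe=""))  # ac-_ (unchanged)

# Each variant decodes back to the same bytes with its matching decoder.
assert base64.b64decode(standard) == base64.urlsafe_b64decode(url_safe) == data
```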