Research on Data Compliance Business Encryption Algorithms (Part 1)

This article contains 10,414 words; the recommended reading time is 18 minutes.

Introduction

Data compliance business is receiving increasing attention. However, unlike other compliance businesses, data compliance is closely related to information technology and involves a large number of technical concepts. Among these technical concepts, the most difficult to understand is the issue of “encryption algorithms”. Indeed, encryption algorithms are not a new phenomenon, and there is already a considerable amount of literature on the principles and implementations of encryption algorithms. Unfortunately, the audience of this literature is not usually legal professionals. Moreover, even for information technology professionals, fully understanding encryption algorithms is not an easy task. Therefore, the purpose of this article is to help legal colleagues without a technical background understand the basic concepts and principles of encryption algorithms and to understand the issues regarding the use of encryption algorithms in common scenarios of data compliance business.

It should be particularly noted that the essence of encryption algorithms is a typical mathematical problem. However, understanding algorithms through mathematical formulas and applying them to compliance scenarios requires a long learning process and a high learning cost. Therefore, this article will try to avoid mathematical formulas and use textual examples and visual charts to explain algorithm issues, providing authoritative English explanations for the concepts and terms involved, to facilitate readers’ use in compliance business.

1. The Importance of Encryption Algorithms

Before discussing the importance of encryption algorithms, let’s first review the definition of the three principles of information security (CIA), which are:

1. Confidentiality, ensuring that data is not subject to unauthorized access.

2. Integrity, ensuring that data is not subject to unauthorized alteration or destruction.

3. Availability, ensuring that data can be accessed in a timely and uninterrupted manner.

Among the three principles of information security, encryption algorithms are closely related to data confidentiality and data integrity, and indirectly related to data availability. In short, without using various encryption algorithms to encrypt, verify, and validate data, there can be no discussion of the integrity and confidentiality of data; if the integrity and confidentiality of data do not exist, then the availability of data is meaningless.

For these reasons, encryption algorithms are used in almost all scenarios that require information confidentiality and transaction security. In the fields of finance, personal consumption, education and training, and healthcare, all situations involving data transmission, verification, storage, utilization, destruction, identity authentication, access control, and behavior auditing require varying degrees of encryption measures.

Therefore, encryption algorithms are the most important and core area in data compliance business.

2. Basic Concepts of Encryption Algorithms that Must be Understood in Data Compliance

Before introducing the basic concepts of encryption algorithms, we must first clarify the meanings of some basic terms related to encryption algorithms. These terms are easily confused, and using them incorrectly in the compliance field can lead to adverse consequences.

1. Cryptology

According to Article 2 of the “Cryptography Law of the People’s Republic of China”: “The cryptography referred to in this law refers to the technology, products, and services that use specific transformation methods to encrypt and protect information, as well as for security authentication.”

From the above provisions, it is not difficult to see that the “cryptography” in the “Cryptography Law” has a completely different meaning from the “password” we usually refer to. The “cryptography” in the law is a rather large concept, which includes all technologies, products, and services related to encryption and decryption.

In fact, the “cryptography” in the law refers to “cryptology” or “cryptographic technology”. What we usually call a “password”, by contrast, is the string of characters entered when logging into a computer or a website. Because most websites and apps mask the characters the user types with asterisks or similar symbols for security reasons, people tend to assume that this hidden string is “cryptography” in the legal sense, and then further confuse it with the concepts of “key” and “ciphertext”. This is a misconception that must be avoided in compliance business. Strictly speaking, the masked string we enter when logging in should be referred to as a “passphrase”.

Figure 1: Passphrase entered when logging into a website

2. Plaintext and Ciphertext

(Plaintext/Ciphertext)

Plaintext is readable text, while ciphertext is unreadable text. Plaintext is converted into ciphertext through encryption, and ciphertext is converted back to plaintext through decryption. Legal professionals should be careful not to confuse “ciphertext” with concepts like “password” and “key”.

3. Encryption and Decryption

(Encryption/Decryption)

Confidentiality is a fundamental need of humanity. Many times, we want confidential information to be visible only to specific individuals, such as ourselves or friends. Therefore, people may lock diaries in safes. However, as society develops, the flow of information is inevitable. We need to write letters, make phone calls, and send emails. Therefore, a mechanism is needed that allows confidential information to be seen by others without revealing its true meaning, which creates the need for encryption and decryption.

Encryption is the process of converting plaintext (readable text) into ciphertext (unreadable text). Decryption is the reverse process of converting ciphertext back into plaintext. The two concepts are inseparable: before information reaches its intended recipient it must be encrypted into an unintelligible state, and once it arrives it must be decrypted back into understandable content.

4. Encryption Algorithm

(Cryptographic Algorithm)

An encryption algorithm is a mathematical method that specifies how encryption and decryption are performed. Encryption algorithms are generally public, thoroughly researched, and widely used, so the algorithm alone cannot keep information secret; encrypting and decrypting a specific piece of information also requires a key.

5. Key

A key is the critical parameter (usually a string) that a specific algorithm uses to encrypt or decrypt information.

6. Key Space

(Key Space)

Key space refers to the set of all possible values a key can take. Because the principles of modern encryption algorithms are largely stable, their security is mainly determined by the length and randomness of the key. The longer and more random the key, the larger the key space, and the harder the encryption is to break.
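To make the idea of key space concrete, the short Python sketch below counts how many different keys exist for a given key length. The function names are hypothetical and chosen only for illustration; the 56-bit and 128-bit lengths are simply common reference points, not figures taken from this article.

```python
# Key space = the number of different keys an attacker would have to try
# in the worst case. Function names here are illustrative only.

def key_space_decimal(digits: int) -> int:
    """Number of distinct keys made of `digits` decimal digits."""
    return 10 ** digits


def key_space_bits(bits: int) -> int:
    """Number of distinct keys that are `bits` binary digits long."""
    return 2 ** bits


print(key_space_decimal(3))   # 1000 possible three-digit keys
print(key_space_bits(56))     # 72057594037927936
print(key_space_bits(128))    # a 39-digit number
```

Each additional bit doubles the key space, which is why a modest increase in key length makes exhaustive guessing dramatically harder.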

Next, we will explain the basic principles of encryption, decryption, algorithms, and keys through a vivid example:

Lawyer Zou wants to meet client A, and he sets the meeting place at KFC at 4:30 PM on May 3, 2020, i.e., “2020-0503-1630-KFC”. Due to the significance of the case, client A does not want anyone to know he is meeting with the lawyer, and he does not trust any communication tools for security. Therefore, Lawyer Zou needs to find a way to hide the meeting time and place so that anyone other than client A cannot understand it.

Lawyer Zou uses a simple algorithm to encrypt the string “2020-0503-1630-KFC”. The encryption and decryption consist of three steps:

1. First, within each hyphen-separated group of the plaintext, swap the positions of every two adjacent characters (a leftover odd character stays where it is). After the swap, “2020-0503-1630-KFC” becomes “0202-5030-6103-FKC”.

2. Then, swap the first and last characters of the string obtained after the first step. After the swap, “0202-5030-6103-FKC” becomes “C202-5030-6103-FK0”.

3. When client A receives “C202-5030-6103-FK0”, he starts to decrypt it. First, he swaps the first and last characters, obtaining “0202-5030-6103-FKC”, and then swaps every two adjacent characters again to finally obtain “2020-0503-1630-KFC”, which is the true meeting time and place.

Thus, Lawyer Zou has converted the real meeting time and place “2020-0503-1630-KFC” into “C202-5030-6103-FK0” using a simple displacement method. This method is extremely simple yet very effective. When you see the string “C202-5030-6103-FK0”, you certainly cannot guess that Lawyer Zou is meeting client A at KFC at 4:30 PM on May 3, 2020.

Table 1: Lawyer Zou’s encryption process and client’s decryption steps
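The three steps above can be sketched in a few lines of Python. This is only an illustration of the story, not the author's published source code; the function names are hypothetical, and the sketch assumes that the adjacent-character swap is applied separately inside each hyphen-separated group, which is what the worked example implies.

```python
def swap_pairs(group: str) -> str:
    """Swap each pair of adjacent characters; a trailing odd character stays put."""
    chars = list(group)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def swap_ends(text: str) -> str:
    """Swap the first and last characters of the whole string."""
    if len(text) < 2:
        return text
    return text[-1] + text[1:-1] + text[0]


def encrypt(plaintext: str) -> str:
    # Step 1: swap adjacent characters inside every hyphen-separated group.
    step1 = "-".join(swap_pairs(group) for group in plaintext.split("-"))
    # Step 2: swap the first and last characters of the result.
    return swap_ends(step1)


def decrypt(ciphertext: str) -> str:
    # Step 3: undo the two steps in reverse order.
    step1 = swap_ends(ciphertext)
    return "-".join(swap_pairs(group) for group in step1.split("-"))


print(encrypt("2020-0503-1630-KFC"))   # C202-5030-6103-FK0
print(decrypt("C202-5030-6103-FK0"))   # 2020-0503-1630-KFC
```

Both steps are their own inverse, so decryption is simply the same two operations applied in the opposite order.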

At this point, do you think Lawyer Zou’s problem has been perfectly solved? The answer is: No. Although the above algorithm can hide the true information of Lawyer Zou’s meeting with the client, Lawyer Zou soon discovers that this algorithm has two problems:

1. The content of the ciphertext seems easy to guess. For example, fixed positions in the encrypted information will always contain characters like “C, K, F” and “D, M, C”. If someone sees these characters frequently, they may inevitably associate them with KFC, McDonald’s, Starbucks, etc.

2. Lawyer Zou has many contacts; besides client A, there are colleague B, classmate C, and so on. If he wants his messages to each contact to stay confidential from the others, Lawyer Zou must design a separate encryption algorithm for each contact. The burden of designing and managing so many algorithms is too great.

After some thought, Lawyer Zou comes up with a perfect solution to the above problems, which is to use a key in the encryption algorithm!

As mentioned earlier, Lawyer Zou has converted the real meeting time and place “2020-0503-1630-KFC” into “C202-5030-6103-FK0” using the displacement method. On this basis, he uses the key “213” to perform addition on each corresponding character in the string “C202-5030-6103-FK0”. For example:

The first character of the string “C202-5030-6103-FK0” is “C”, and the first digit of the key “213” is “2”, so “C+2=E” (the letter “C” moves two positions forward in the alphabet to become “E”);

The second character of the string “C202-5030-6103-FK0” is “2”, and the second digit of the key “213” is “1”, so “2+1=3”;

The third character of the string “C202-5030-6103-FK0” is “0”, and the third digit of the key “213” is “3”, so “0+3=3”;

Continuing this way, after the encryption operation, “C202-5030-6103-FK0” becomes “E334-6351-9316-HL3”.

Table 2: Encryption using key “213”
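The keyed step can also be expressed as a small Python sketch. The function names are hypothetical. Two details are assumptions that the story does not spell out: hyphens are skipped and do not consume a key digit (this is what reproduces the result above), and digits wrap around after 9 while letters wrap around after “Z”.

```python
import string


def shift_char(ch: str, amount: int) -> str:
    """Shift a digit (wrapping after 9) or an uppercase letter (wrapping after Z)."""
    if ch in string.digits:
        return str((int(ch) + amount) % 10)
    if ch in string.ascii_uppercase:
        return chr((ord(ch) - ord("A") + amount) % 26 + ord("A"))
    return ch  # any other character is passed through unchanged


def apply_key(text: str, key: str, decrypt: bool = False) -> str:
    """Add (or, when decrypting, subtract) the repeating key digits character by character."""
    sign = -1 if decrypt else 1
    result = []
    position = 0  # index into the repeating key; hyphens do not advance it
    for ch in text:
        if ch == "-":
            result.append(ch)
            continue
        digit = int(key[position % len(key)])
        result.append(shift_char(ch, sign * digit))
        position += 1
    return "".join(result)


print(apply_key("C202-5030-6103-FK0", "213"))                 # E334-6351-9316-HL3
print(apply_key("E334-6351-9316-HL3", "213", decrypt=True))   # C202-5030-6103-FK0
```

Decrypting with a different key, such as colleague B's “112”, yields a different string, which is why the algorithm itself can be shared while each contact keeps a separate key.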

Thus, the easily understood “2020-0503-1630-KFC” is finally encrypted into “E334-6351-9316-HL3”. Seeing this string of characters, no one can guess that Lawyer Zou is going to meet with the client, and Lawyer Zou no longer has to worry about having too many contacts. He can assign a different key to each contact while keeping the encryption algorithm unchanged. For example, client A’s key is “213”, colleague B’s key is “112”, and classmate C’s key is “311”. Even if all of Lawyer Zou’s contacts know his encryption algorithm, they still cannot decrypt ciphertext that is not intended for them, because their keys are different.

The story of Lawyer Zou is complete. The above example is just to help everyone better understand the basic principles of encryption algorithms, but real-world encryption algorithms are much more complex. According to the “Cryptography Law”, cryptography is divided into core cryptography, ordinary cryptography, and commercial cryptography. Core and ordinary cryptography are used to protect state secret information, while commercial cryptography is used to protect general information. However, whether it is core, ordinary, or commercial cryptography, the principles are basically the same, and the differences mainly lie in the complexity of the keys, that is, the length of the keys. In fact, “Lawyer Zou’s story” already basically conforms to the prototype of a “symmetric encryption algorithm”.

To facilitate everyone’s understanding of the above encryption principles, we will later implement the above encryption algorithm in Python and open-source the source code in the “TianTong Electronic Evidence Laboratory” WeChat mini-program. The source code is not only for demonstration purposes; it also has some practical value. In daily life, it is difficult to remember a separate login password for every website and app, yet using the same login password everywhere creates security risks. Readers can use this source code to encrypt their login passwords for various websites. As long as the key is kept safe, there is less need to worry about passwords being exposed or forgotten.

Figure 2: “TianTong Electronic Evidence Laboratory” WeChat mini-program

3. Common Types of Encryption Algorithms in Data Compliance Business

Encryption algorithms can be classified into several types: symmetric encryption algorithms, asymmetric encryption algorithms, hybrid encryption algorithms, and message digest algorithms.

1. Symmetric Encryption Algorithm

(Symmetric Cryptographic Algorithm)

A symmetric encryption algorithm uses the same key for both encryption and decryption. The encryption algorithm used by Lawyer Zou earlier is a typical symmetric encryption algorithm: the key “213” is used by Lawyer Zou for encryption and by client A for decryption.

Figure 3: Symmetric encryption algorithm operation process

Common symmetric encryption algorithms include the following:

(1) DES Algorithm (Data Encryption Standard):

The Data Encryption Standard algorithm was published by the U.S. government in 1977. Due to insufficient encryption strength, it has now been replaced by AES.

(2) 3DES Algorithm (Triple Data Encryption Algorithm):

The Triple Data Encryption Algorithm strengthens DES by applying the unchanged DES algorithm three times in succession, using up to three different keys.

(3) AES Algorithm (Advanced Encryption Standard):

The Advanced Encryption Standard algorithm was selected by the U.S. government in 2000 and formally published as a standard in 2001 to replace the insecure DES algorithm.

(4) SM1 Algorithm (Chinese national commercial cryptography standard):

A symmetric encryption algorithm published by the National Cryptography Administration of China. This algorithm has encryption strength comparable to AES, but its algorithm is integrated into hardware and is not disclosed. It is now widely used in finance, transactions, government affairs, and various other occasions.

2. Asymmetric Encryption Algorithm

(Asymmetric Cryptographic Algorithm)

In simple terms, an asymmetric encryption algorithm uses different keys for encryption and decryption; that is, data encrypted with one key of a pair must be decrypted with the other key of the same pair, and vice versa.

Therefore, the asymmetric encryption algorithm has two keys, namely the “public key” and the “private key”. Data encrypted with the “public key” must be decrypted with the “private key”, and cannot be decrypted with the “public key”. The “public key” is completely public and can be freely published, while the “private key” is kept completely confidential and must be strictly safeguarded by its owner. Since asymmetric encryption algorithms appear in many data compliance scenarios and are relatively difficult to understand, this article will focus on them.

Figure 4: Asymmetric encryption algorithm process

Figure 5: Public key and private key of the asymmetric encryption algorithm

The functions and significance of asymmetric encryption algorithms are:

(1) Realizing non-repudiation of identity

For example, if Lawyer Zou wants to treat all colleagues to KFC, he wants to send the following message to all colleagues: “2020-0503-1630-KFC”. How can everyone know that this message was sent by Lawyer Zou and not a prank by Lawyer Wang?

If an asymmetric encryption algorithm is used, this problem can be easily solved. Lawyer Zou uses his private key, which he has kept confidential and never disclosed, to encrypt the message “2020-0503-1630-KFC” and send it to his colleagues. A message encrypted with Lawyer Zou’s “private key” can only be decrypted with Lawyer Zou’s “public key”, which has been disclosed to all colleagues, so everyone just needs to use Lawyer Zou’s “public key” to decrypt the ciphertext. If the decryption succeeds, it can be confirmed that the message comes from Lawyer Zou. This use of the private key is what is commonly called a digital signature.
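The following Python sketch illustrates the idea with a toy version of RSA. It is a teaching aid under loud assumptions: the two primes are tiny and hard-coded, the message is processed one byte at a time without padding, and all names are hypothetical. Real systems use keys of 2048 bits or more and sign a hash of the message through a vetted library rather than code like this.

```python
from typing import List

# Toy RSA parameters (tiny, insecure, for illustration only).
P, Q = 61, 53
N = P * Q                              # 3233, published as part of the public key
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def private_encrypt(message: str) -> List[int]:
    """Lawyer Zou transforms every byte of the message with his private key."""
    return [pow(byte, D, N) for byte in message.encode("utf-8")]


def public_decrypt(blocks: List[int]) -> str:
    """Anyone holding the public key (N, E) can reverse the transformation."""
    return bytes(pow(block, E, N) for block in blocks).decode("utf-8")


def verify(blocks: List[int], expected: str) -> bool:
    """Return True only if the blocks decrypt to the expected message."""
    try:
        return public_decrypt(blocks) == expected
    except ValueError:
        # Tampered blocks usually decrypt to values that are not valid bytes or text.
        return False


message = "2020-0503-1630-KFC"
signed = private_encrypt(message)
print(verify(signed, message))          # True: only the private key could produce this

tampered = list(signed)
tampered[0] = (tampered[0] + 1) % N     # a prankster alters one block
print(verify(tampered, message))        # False
```

Because only the holder of the private exponent can produce blocks that decrypt correctly under the public key, a successful check ties the message to Lawyer Zou.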

(2) Facilitating key distribution

The symmetric encryption algorithm has a natural flaw: the security of the key distribution process cannot be guaranteed. Taking Lawyer Zou as an example, he must first tell client A that their mutual key is “213” before starting encrypted communication. However, if the key “213” leaks during the process of informing, confidentiality is no longer possible.

In fact, before the invention of asymmetric encryption algorithms, people widely used symmetric encryption algorithms. Because the same key both encrypts and decrypts, it must be kept strictly confidential by both parties and sometimes even delivered in person, which is very inefficient.

(3) Facilitating key management

Let’s return to Lawyer Zou’s story. Lawyer Zou assigned a key to each contact based on a simple symmetric encryption algorithm, such as client A’s key being “213”, colleague B’s key being “112”, and classmate C’s key being “311”. But let’s imagine that if one day, client A’s case is concluded, can key “213” be assigned to the next client? Obviously not. If Lawyer Zou frequently signs new clients, he would have to maintain a large key list to ensure that each person’s key is unique, which would inevitably cause him a headache.

Lawyer Zou is not the only one with this concern. If we want to ensure confidentiality in all communications with the outside world (who wouldn’t want that?), and we are using symmetric encryption algorithms, we must ensure that the key used in every “point-to-point” communication is unique. For example, the keys between Lawyer Zou and client A, Lawyer Zou and colleague B, colleague B and client A, colleague B and classmate C, and client A and classmate C must all be different. As the number of communication partners increases, the number of keys grows quadratically: n participants need n(n-1)/2 keys.

Table 3: Relationship between the number of keys and the number of participants

However, if asymmetric encryption algorithms are used, the above problems do not exist. With asymmetric encryption, each participant needs only one pair of keys (a public key and a private key) to communicate confidentially with everyone else.
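The difference in key-management burden is easy to quantify. The sketch below (function names are hypothetical) counts the keys needed when every pair of participants shares its own symmetric key, and compares this with one public/private key pair per participant.

```python
def symmetric_key_count(participants: int) -> int:
    """One unique shared key for every pair of participants: n(n-1)/2."""
    return participants * (participants - 1) // 2


def asymmetric_key_count(participants: int) -> int:
    """One public key plus one private key per participant: 2n."""
    return 2 * participants


for n in (2, 5, 10, 100, 1000):
    print(n, symmetric_key_count(n), asymmetric_key_count(n))
# 2 1 4
# 5 10 10
# 10 45 20
# 100 4950 200
# 1000 499500 2000
```

For a handful of contacts the symmetric approach is manageable, but at a thousand participants it already requires nearly half a million distinct keys.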

At this point, you might think that asymmetric encryption algorithms have only advantages and that symmetric encryption algorithms are no longer necessary. The reality is different. Although asymmetric encryption algorithms have many advantages, they have a fatal flaw: the encryption and decryption processes require a large amount of mathematical computation, which greatly affects the system’s operating speed. Asymmetric encryption is often cited as being on the order of 1,000 to 10,000 times slower than symmetric encryption. Therefore, asymmetric encryption algorithms are only suitable for limited scenarios, such as identity authentication, session key transmission, or “non-repudiation” (of identity or actions).

The following table analyzes the advantages and disadvantages of symmetric and asymmetric encryption algorithms:

Table 4: Differences between symmetric and asymmetric encryption algorithms

The most successful and well-known algorithm product in asymmetric encryption algorithms is the famous RSA. RSA is currently the most mainstream asymmetric encryption algorithm. The RSA algorithm was invented in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman from MIT. RSA is composed of the initials of their last names. To this day, almost all internet application scenarios involve RSA encryption algorithms. RSA has, to some extent, changed our way of life, so let us pay tribute to these three outstanding scientists!

Figure 6: The three scientists who invented the RSA algorithm

The mathematical principle of RSA is: multiplying two large prime numbers is very easy, but factoring their product is extremely difficult. Therefore, the product (together with a public exponent) can be made public as the encryption key, i.e., the public key, while the two large prime numbers (and the private exponent derived from them) form the private key. The public key is completely open to the outside world, while the private key is kept strictly confidential. Data encrypted with the public key can only be decrypted with the private key, and vice versa.
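A toy Python sketch can make this principle tangible. It assumes two deliberately tiny primes (61 and 53) and a public exponent of 17, none of which come from this article, and all function names are hypothetical. With numbers this small, an attacker can factor the public product instantly and rebuild the private key, which is exactly what becomes infeasible when the primes are hundreds of digits long.

```python
from math import isqrt


def make_keys(p: int, q: int, e: int = 17):
    """Build a toy RSA key pair from two primes (Python 3.8+ for pow(e, -1, m))."""
    n = p * q                               # multiplying the primes is easy
    d = pow(e, -1, (p - 1) * (q - 1))       # private exponent, derived from p and q
    return (n, e), (n, d)


def factor(n: int):
    """Naive trial division: fine for tiny n, hopeless for real key sizes."""
    for candidate in range(2, isqrt(n) + 1):
        if n % candidate == 0:
            return candidate, n // candidate
    raise ValueError("no non-trivial factor found")


public_key, private_key = make_keys(61, 53)
n, e = public_key

message = 1630                               # any number smaller than n
ciphertext = pow(message, e, n)              # encrypt with the public key
recovered = pow(ciphertext, private_key[1], n)   # decrypt with the private key
print(recovered == message)                  # True

# An attacker who can factor n recovers the private exponent.
p, q = factor(n)
attacker_d = pow(e, -1, (p - 1) * (q - 1))
print(attacker_d == private_key[1])          # True for this tiny example
```

The security of a real RSA key rests on the `factor` step being computationally out of reach, not on keeping the algorithm secret.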

The RSA encryption algorithm can be found everywhere in internet applications, and many of the digital certificates in our apps are signed using the RSA algorithm.

Figure 7: Digital certificates in apps

When we browse the web on the internet, the HTTPS protocol that ensures encrypted communication has also long relied on the RSA algorithm, among others, for identity authentication and key exchange.

Figure 8: HTTPS protocol

In addition to RSA, we may also encounter elliptic curve algorithms (Elliptic Curve Cryptography, ECC) and SM2 (the Chinese national standard elliptic curve algorithm) in compliance application scenarios. These two newer elliptic curve algorithms are more efficient than RSA: they reach comparable security with much shorter keys, so at the same key length their ciphertext is far more difficult to crack. Nevertheless, RSA is still the most widely used asymmetric encryption algorithm.

Figure 9: Announcements related to elliptic curve algorithms

It should be particularly noted that we will also mention another ECC check algorithm (Error Checking and Correction, ECC) later in the text. These two meanings of “ECC” are completely different; the ECC elliptic curve algorithm is aimed at data confidentiality, while the ECC check algorithm is aimed at data integrity, so legal professionals should not confuse them.

3. Hybrid Encryption Algorithm

(Hybrid Cipher Algorithm)

As mentioned earlier, the advantage of symmetric encryption algorithms is their fast operation speed. Their disadvantage is that they cannot verify the identities of the sender and receiver, and there are difficulties in key distribution and management.

The advantage of asymmetric encryption algorithms is that they can verify the identities of the sender and receiver, achieving non-repudiation. At the same time, key distribution and management are very convenient. Their disadvantage is that they require a large amount of mathematical computation, making them slower.

From the above characteristics, it is not difficult to see that symmetric encryption algorithms and asymmetric encryption algorithms have their strengths and weaknesses, and the advantages of one complement the disadvantages of the other. Therefore, people have attempted to use both algorithms simultaneously in business scenarios, giving rise to hybrid encryption.

The basic principle of hybrid encryption is: in scenarios where the amount of data to be encrypted is small but of high importance, such as identifying the identities of the sender and receiver, or transmitting the “session key” (we will explain the session key issue later), the computationally intensive but less efficient asymmetric encryption algorithm is used; while in scenarios where the amount of data to be encrypted is large but of lower importance (note that here, lower importance does not mean unimportant), the efficient symmetric encryption algorithm is used.

Let’s continue with an example:

Client A is Lawyer Zou’s foreign client. In a technical secret case, client A intends to send a 100 GB electronic evidence file from abroad to Lawyer Zou. The evidence involves the client’s core secrets and must not be seen by unrelated personnel, so the evidence file needs to be encrypted for transmission. Lawyer Zou encounters difficulties in choosing an encryption algorithm.

If he uses a symmetric encryption algorithm, Lawyer Zou must inform client A of their shared key by email or chat software. During this process the key may leak, and once the key leaks, the consequences can be disastrous.

If he uses an asymmetric encryption algorithm, since both Lawyer Zou and client A know each other’s public keys, client A can first encrypt the electronic evidence using Lawyer Zou’s public key, and Lawyer Zou can decrypt it with his private key after receiving the evidence. However, since the data volume is 100 GB, encrypting and decrypting it with the asymmetric encryption algorithm would take a long time, making it inefficient.

Hybrid encryption solves both problems. The process is as follows:

(1) Both Lawyer Zou and client A have a set of public and private keys, where:

Lawyer Zou’s public key is “555”, and his private key is “666”;

Client A’s public key is “777”, and his private key is “888”;

Lawyer Zou and client A each know the other’s public key but do not know each other’s private key.

(2) Lawyer Zou decides that the evidence file will be protected with a symmetric encryption algorithm, and chooses “213” as the symmetric key for encrypting and decrypting the 100 GB evidence file.

(3) Before Lawyer Zou transmits the symmetric key “213” to client A, he first encrypts “213” using client A’s public key “777”, so client A can decrypt it with his private key “888” to obtain the key “213”.

(4) To ensure that client A can verify the message containing the key comes from Lawyer Zou, Lawyer Zou encrypts the result of the previous step (the encryption of “213” using client A’s public key) again with his private key “666”. This way, client A can use Lawyer Zou’s public key “555” to decrypt it and ensure that the message transmitting the symmetric key “213” comes from Lawyer Zou.

(5) Client A uses the symmetric encryption algorithm and the symmetric key “213” to encrypt the 100 GB of data and transmit it to Lawyer Zou. After receiving the encrypted data, Lawyer Zou uses the symmetric key “213” to decrypt the data and obtain the true evidence file.

Figure 10: Steps for encrypting and decrypting evidence and keys using a hybrid encryption algorithm
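The five steps can be walked through in a Python sketch. Everything here is a stand-in chosen for illustration: the key pairs are toy RSA keys built from tiny primes instead of the placeholder values “555”, “666”, “777”, and “888”; the `toy_symmetric` function is a simple XOR construction standing in for a real symmetric cipher such as AES; and a short byte string stands in for the 100 GB file. None of this is secure or taken from the author's own code.

```python
import hashlib


def make_keypair(p: int, q: int, e: int = 17):
    """Toy RSA key pair from two small primes (Python 3.8+)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)            # (public key, private key)


def rsa_apply(value: int, key) -> int:
    """Apply one RSA key (public or private) to a number smaller than its modulus."""
    n, exponent = key
    return pow(value, exponent, n)


def toy_symmetric(data: bytes, key: int) -> bytes:
    """XOR the data with a key-derived stream; the same call encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(f"{key}:{counter}".encode()).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Step 1: each party has a key pair. Zou's modulus is the larger one so that
# the value he signs in step 4 always fits under it.
zou_public, zou_private = make_keypair(89, 97)
a_public, a_private = make_keypair(61, 53)

# Step 2: Lawyer Zou picks the symmetric (session) key.
session_key = 213

# Step 3: he encrypts it with client A's public key.
wrapped = rsa_apply(session_key, a_public)

# Step 4: he encrypts the result again with his own private key.
signed = rsa_apply(wrapped, zou_private)

# Client A reverses steps 4 and 3 with Zou's public key and his own private key.
unwrapped = rsa_apply(signed, zou_public)
recovered_key = rsa_apply(unwrapped, a_private)

# Step 5: client A encrypts the evidence with the recovered key; Zou decrypts it.
evidence = b"technical secret evidence (stand-in for the 100 GB file)"
ciphertext = toy_symmetric(evidence, recovered_key)
decrypted = toy_symmetric(ciphertext, session_key)

print(recovered_key)            # 213
print(decrypted == evidence)    # True
```

The slow asymmetric operations touch only the short session key, while the bulk data passes through the fast symmetric step, which is the whole point of the hybrid design.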

Please note that the public and private keys in the above example are only for illustrative purposes. The public and private keys in real encryption algorithms are far longer than those in the example and are generated randomly. RSA keys today are typically 2048 bits or longer, and 512-bit keys are no longer considered secure. The following images show a pair of real public and private keys generated using RSA tools.

Figure 11: Real public key

Figure 12: Real private key

PKI (Public Key Infrastructure) and CA (Certificate Authority) are two typical application scenarios of asymmetric and hybrid encryption algorithms. We will detail these in later sections.

4. Message Digest Algorithm

(Message Digest Algorithm)

Strictly speaking, the message digest algorithm is not a typical “encryption algorithm”; at least it was not originally designed for encryption. The message digest algorithm is also known as a hash algorithm or hash function. “Hashing” literally means chopping up and mixing together. Therefore, when legal professionals encounter terms like “hashing algorithm”, “digest algorithm”, or “message digest algorithm”, they should understand that these concepts refer to the same type of algorithm.

Through the message digest algorithm, a longer piece of data can be mapped to a shorter piece of data of fixed length, and this shorter data is the hash value of the longer data. The hash value is similar to the “DNA” of a file. It is, for practical purposes, unique; once the data changes, even if the change is minor, its hash value will change significantly. For example, in a novel file of one million words, if even one comma is modified, the hash value of the file will change significantly.
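This sensitivity to small changes is easy to observe with Python's standard `hashlib` module. The two sentences below are made-up sample text that differ by a single comma; SHA-256 is used simply as a representative digest algorithm.

```python
import hashlib

original = "The contract amount is 10,000 yuan, payable in full."
modified = "The contract amount is 10,000 yuan payable in full."   # one comma removed

hash_original = hashlib.sha256(original.encode("utf-8")).hexdigest()
hash_modified = hashlib.sha256(modified.encode("utf-8")).hexdigest()

print(hash_original)
print(hash_modified)

# Both digests have the same fixed length regardless of the input length.
print(len(hash_original), len(hash_modified))    # 64 64 (hex characters = 256 bits)

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(hash_original, hash_modified))
print(differing)
```

Running it shows two digests of identical length that bear no visible resemblance to each other, even though the inputs are almost the same.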

The message digest algorithm has the characteristic of “irreversibility”. Because the algorithm performs many complex “discarding” and “shifting” operations on the target file during computation, it is impossible to deduce the original content of the file from its hash value. Therefore, anyone who has only the hash value of a file cannot deduce the content corresponding to that hash value. This is also why the message digest algorithm is not a dedicated “encryption algorithm”: an encryption algorithm must be reversible so that the plaintext can be recovered, whereas a message digest algorithm is “irreversible” and cannot be used to recover plaintext.

Figure 13: The “irreversible” message digest algorithm

Although the message digest algorithm is not a dedicated “encryption algorithm”, its “irreversibility” feature makes it frequently used for “field encryption” and “file signing”, or in conjunction with other encryption methods to protect highly confidential data. For example, the login passwords of websites and apps must not be stored in plaintext form in databases; they must be saved in the form of message digests.

Regarding the application scenario of “field encryption”, you may recall that we often forget the login password for a certain website. However, when we try to retrieve the password, the website usually requires us to reset the password rather than directly tell us the forgotten user password. This is because user passwords are not stored in plaintext but in the form of hashes on the website, and the website does not know our passwords, so it can only reset them. The significance of this practice is that if hackers attack and obtain the target website’s database, they will not know our user passwords.

However, it should be particularly noted that in the scenario of encrypting key fields in a database, the message digest algorithm is not absolutely secure. Although the message digest algorithm is “irreversible”, hackers can use a precomputed “rainbow table” to look up the password that corresponds to a given hash value. “Salting” (adding a random value to each password before hashing it) can largely resolve this issue (due to space limitations, we will discuss the principle of rainbow tables and the use of salting later).
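As a brief preview, a minimal salted-hash sketch using only Python's standard library might look like the following. The function names, the 16-byte salt, and the iteration count are illustrative assumptions rather than recommendations from this article; production systems should follow current guidance and use a vetted password-hashing library.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # illustrative work factor, an assumption for this sketch


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users who choose the same password end up with different stored values,
# so a single precomputed table cannot be reused against both of them.
salt_a, stored_a = hash_password("Passw0rd!")
salt_b, stored_b = hash_password("Passw0rd!")
print(stored_a != stored_b)                               # True

print(verify_password("Passw0rd!", salt_a, stored_a))     # True
print(verify_password("wrong-guess", salt_a, stored_a))   # False
```

The website stores only the salt and the digest; the original password never needs to be kept, which is consistent with the reset-only behavior described above.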

Additionally, hash values theoretically have a certain probability of collision (i.e., two different files having the same hash value). There have been news reports about Professor Wang Xiaoyun from Shandong University “cracking” the MD5 and SHA1 algorithms. However, this “cracking” does not mean deducing the original content from a hash value. It means using powerful computing devices to construct two different pieces of content that have the same hash value. It is important to clarify that the colliding content constructed in this way is generally not the ordinary PDF or WORD files we commonly use.

In signature scenarios for PDF, WORD, and similar files, what matters is detecting a local modification to an otherwise identical file: for example, maliciously changing the amount in a contract from ten thousand to one hundred thousand while ensuring that the hash value of the modified contract remains the same as that of the original. Currently, it is still technically impossible to achieve such targeted modifications of an existing file (known as a second-preimage attack). Therefore, we can consider that hashing algorithms remain sufficiently secure for signing PDF, WORD, and similar files. However, caution should be exercised when using the SHA1 algorithm in signature scenarios such as digital certificates.

Currently, the industry widely regards the security of the SHA-1 algorithm as insufficient, and research teams abroad have constructed two different PDF files with the same SHA-1 value. However, the computational cost and practical effect of this technique are still open to discussion. Due to space constraints, we will not elaborate further on the security of the SHA-1 algorithm here. If you are interested, we can discuss it separately.
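The contract example can be tried directly. The following minimal sketch uses a made-up contract sentence and a hypothetical helper named `digests`; it only shows that an ordinary edit changes the digest, and does not demonstrate any collision attack.

```python
import hashlib

def digests(text: str) -> dict:
    """Return the SHA-1 and SHA-256 digests of a piece of text."""
    data = text.encode("utf-8")
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

# Hypothetical contract clause, before and after tampering.
original = "The buyer shall pay the seller a total of ten thousand yuan."
tampered = original.replace("ten thousand", "one hundred thousand")

for name in ("sha1", "sha256"):
    print(name, "original:", digests(original)[name])
    print(name, "tampered:", digests(tampered)[name])
    print(name, "same digest?", digests(original)[name] == digests(tampered)[name])
```

A forger would need the tampered text to produce exactly the original digest, which is the targeted modification described above rather than the free construction of two colliding contents.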

Therefore, if you see a client using message digest algorithms such as MD5, SHA-1, or SHA-256 in a business scenario (in data compliance work, this mainly arises when renovating old systems), do not conclude from the above news alone that these algorithms have been cracked and are unusable. The security of the MD5 and SHA-1 algorithms must be judged in light of the specific scenario and cannot be generalized.

In data compliance scenarios, common message digest algorithms include the following:

(1) MD Series Hash Algorithms (Message Digest)

MD is short for message digest. The published MD series algorithms include MD2, MD4, and, of course, the famous MD5.

(2) SHA Series Hash Algorithms (Secure Hash Algorithm)

SHA is short for secure hash algorithm. The SHA series algorithms include SHA-1 and the SHA-2 family (SHA-224, SHA-256, SHA-384, and SHA-512).

(3) SM3 National Commercial Cryptography (“Guomi”) Hash Algorithm

SM3 is a Chinese domestic hashing algorithm that produces a 256-bit message digest for messages of any length, with security and efficiency comparable to SHA-256.


Figure 14: Announcements related to the SM3 algorithm

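The number in a hashing algorithm’s name does not always mean the same thing. In SHA-256, SHA-384, and SHA-512 it is the length of the message digest in bits, whereas the “5” in MD5 is only a version number (MD5 produces a 128-bit digest). Hashing algorithms use no key, so what matters is the size of the output space: the longer the digest, the harder it is to find a collision, and the higher the security. Therefore, in compliance scenarios, it is recommended that clients prioritize the SM3, SHA-256, SHA-384, and SHA-512 algorithms; MD5 and SHA-1 are best confined to legacy systems and assessed scenario by scenario, as discussed above.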
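The digest length of each algorithm mentioned in this section can be checked with Python’s standard `hashlib` module. This is a minimal sketch: the helper name `digest_bits` is hypothetical, and whether `md5` or `sm3` is available depends on the cryptographic library behind the local Python build, so unavailable algorithms are reported rather than assumed.

```python
import hashlib

def digest_bits(name: str):
    """Return the digest length in bits, or None if the algorithm is unavailable."""
    try:
        return hashlib.new(name).digest_size * 8
    except ValueError:
        # Raised when this Python build does not support the algorithm.
        return None

for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sm3"):
    bits = digest_bits(name)
    if bits is None:
        print(f"{name:7} not available in this Python build")
    else:
        print(f"{name:7} {bits}-bit digest")
```

Where they are available, MD5 reports 128 bits and SHA-1 reports 160 bits, while SM3 and SHA-256 both report 256 bits.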

Conclusion

This article has introduced the basic concepts of encryption algorithms. I hope the above content can help everyone fully understand the relevant terminology and principles of encryption algorithms. Since algorithms are not a simple issue, and the audience of this article is legal professionals, it is necessary to convert between mathematical formulas and everyday concepts when explaining viewpoints. There may inevitably be some omissions in the related discussions, and I ask for your understanding! If you have any questions or suggestions regarding this article, please feel free to contact the author for discussion!

In the next article, we will provide detailed introductions to the specific applications of encryption algorithms in compliance scenarios. Thank you all!

Disclaimer

This article and its content are for communication purposes only and do not represent legal opinions, suggestions, or decision-making basis issued by TianTong Law Firm or its lawyers. If you need legal advice or other professional analysis, please contact the host of this article. No text, images, audio, video, or other content of this article may be reproduced without authorization. If you need to reprint or quote, please contact the official account backend for authorization and clearly indicate the source, column, and author information when reprinting.


The “TianTong Internet Affairs” column is hosted and written by Lawyer Zou Xiaocheng. The column studies in depth the technical and legal issues arising in internet industry competition, intellectual property, and information security, aiming to create a new “marginal discipline”. Through this column, we hope to share information, exchange ideas, and spread knowledge and experience with legal colleagues and internet industry peers. If you have any thoughts, opinions, or suggestions, please feel free to leave a message at the end of this article.

