Many friends have asked me about what happened yesterday at our company, and I was also very shocked by the incident. Before the truth is clarified, I hope everyone can view the situation rationally. Of course, if the truth indeed reveals what has been said, it must not be tolerated.
Cryptography is one of the areas Java developers run into regularly. The underlying techniques are complex and rest on a good deal of mathematics, which makes them hard to understand in depth. Using the encryption tools that ship with Java, however, is simple: we only need to understand the principles and usage scenarios of each technique, not the details of its implementation.
The commonly used encryption algorithms include symmetric encryption algorithms, asymmetric encryption algorithms, hash algorithms, digital signatures, and others.
As the name suggests, symmetric encryption means that encryption and decryption are symmetric: the data is encrypted with a key and decrypted with the same key, which the sender and receiver agree on in advance. The weakness is that everything depends on this one shared key. It has to reach both parties somehow, and once it is stolen, every message protected by it is exposed. Common symmetric algorithms include DES, 3DES, and AES, all of which are available in the JDK. DES and 3DES are now considered legacy algorithms, so AES is the usual choice for new code.
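To make the "same key on both sides" idea concrete, here is a minimal sketch using the JDK's javax.crypto classes. The class name AesDemo and its method names are hypothetical, and the choice of AES in GCM mode with a 128-bit key is an assumption made for illustration, not something taken from the project.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical demo class; names are illustrative only.
final class AesDemo {
    private static final int IV_BYTES = 12;  // IV size commonly used with GCM
    private static final int TAG_BITS = 128; // authentication tag length

    // Generates the single shared key used for both directions.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns IV followed by ciphertext so the receiver can recover the IV.
    static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] out = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts with the same key that was used for encryption.
    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
            return cipher.doFinal(ivAndCiphertext, IV_BYTES,
                    ivAndCiphertext.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] sealed = encrypt(key, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, sealed), StandardCharsets.UTF_8));
    }
}
```

Anyone holding the key returned by newKey() can both encrypt and decrypt, which is exactly why distributing that key safely is the hard part.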
Asymmetric encryption, as the name suggests, means that encryption and decryption are not symmetric and do not use the same key. It is built on a public-private key pair: two keys, a public key and a private key, generated together by a fixed procedure. The public key is used for encryption, and the receiver uses the corresponding private key for decryption. The receiver generates the key pair and sends only the public key to the encrypting party, so the private key never travels over the network and cannot be stolen in transit. For example, SSH access to GitHub relies on a public-private key pair. In practice the public key can be derived from the private key, but not the other way around: deriving the private key from the public key is computationally infeasible. Common algorithms include RSA and ECC, with ECC being widely used in Bitcoin's underlying technology. Compared with symmetric encryption, asymmetric encryption solves the security problem of key transmission.
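The public-key flow can be sketched with the JDK's own classes. The class name RsaDemo and its methods are hypothetical, and the 2048-bit key size and OAEP padding are assumptions chosen for the example.

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;

// Hypothetical demo class; names are illustrative only.
final class RsaDemo {
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // The receiver generates the pair and publishes only the public half.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The sender encrypts with the receiver's public key.
    static byte[] encrypt(PublicKey publicKey, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the holder of the private key can decrypt.
    static byte[] decrypt(PrivateKey privateKey, byte[] ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return cipher.doFinal(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] sealed = encrypt(pair.getPublic(), "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(pair.getPrivate(), sealed), StandardCharsets.UTF_8));
    }
}
```

RSA can only encrypt short messages directly, so real systems typically use it to protect a symmetric key rather than the data itself.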
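The hash algorithm, simply put, converts data of any size into a fixed-length value, usually shown as a string. It is practically impossible to recover the original data from the hash. Two different inputs can in principle produce the same hash (a collision), since there are more possible inputs than outputs, but for a good algorithm finding such a pair is computationally infeasible. Common algorithms include MD5 and SHA256. MD5 has commonly been used to store passwords in databases, although it is now considered too weak for that purpose, while SHA256 is used in mining.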
Asymmetric encryption alone leaves a problem: because the public key is public, anyone who obtains it can encrypt a message of their own. An attacker can therefore intercept the ciphertext and replace it with different content, and the receiver cannot tell who actually produced it.
The solution is the digital signature. A digital signature uses a public-private key pair in the opposite direction: the sender signs with their own private key, and the receiver verifies with the sender's public key. For example, the sender sends not only the ciphertext encrypted with the receiver's public key but also a signature, which is usually a hash of the ciphertext signed with the sender's private key. The receiver first verifies the signature; if it is valid, the ciphertext was not tampered with in transit and can be decrypted to obtain the intended content. Note that signing and encryption use two different key pairs: the sender's pair for signing and the receiver's pair for encryption.
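The sign-then-verify flow described above can be sketched with java.security.Signature. The class name SignatureDemo and its methods are hypothetical, and SHA256withRSA is an assumed algorithm choice for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical demo class; names are illustrative only.
final class SignatureDemo {
    private static final String ALGORITHM = "SHA256withRSA"; // hash, then sign the hash

    // The sender's own key pair, separate from any encryption key pair.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The sender signs with the private key.
    static byte[] sign(PrivateKey privateKey, byte[] message) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(privateKey);
            signer.update(message);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The receiver verifies with the sender's public key.
    static boolean verify(PublicKey publicKey, byte[] message, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(publicKey);
            verifier.update(message);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] message = "ciphertext bytes".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(pair.getPrivate(), message);
        System.out.println(verify(pair.getPublic(), message, signature));
        byte[] tampered = "tampered bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(pair.getPublic(), tampered, signature));
    }
}
```

The second verification fails because the signature no longer matches the content, which is how the receiver detects tampering.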
The above is a brief explanation of several algorithms. Beyond these there are encodings such as Base58, which is not encryption but is widely used in Bitcoin, for example in addresses. In the following sections, I will introduce each algorithm and its usage; only the usage is covered, not the underlying details.
MD5
MD5 stands for Message-Digest Algorithm 5 and is used to check the integrity of transmitted information. It is one of the most widely used hash algorithms (also called digest algorithms), and mainstream programming languages generally ship an MD5 implementation. It maps input data of any kind (Chinese text, for instance) to a fixed-length value, which is the basic principle of all hash algorithms. MD5's predecessors include MD2, MD3, and MD4.
MD5 has the following characteristics:
1. Compression: The MD5 value calculated from any length of data is of fixed length.
2. Easy to compute: It is easy to compute the MD5 value from the original data.
3. Tamper resistance: Any modification to the original data, even changing just one byte, results in a significantly different MD5 value.
4. Strong collision resistance: Given the original data and its MD5 value, it should be very difficult to find another piece of data that has the same MD5 value (i.e., to forge data). Note that practical collision attacks on MD5 have been known since 2004, so it should no longer be relied on where resistance to forgery matters.
The role of MD5 is to "compress" a large amount of information into a compact digest before it is signed with a private key by digital signature software (i.e., to transform a byte string of arbitrary length into a fixed-length hexadecimal string). Besides MD5, other well-known algorithms include SHA-1, RIPEMD, and Haval.
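The JDK ships with an MD5 implementation, so it can be called directly. Strictly speaking MD5 is a hash algorithm rather than an encryption algorithm, since its output cannot be decrypted. You only need to import one class: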
import java.security.MessageDigest;
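As a minimal sketch of how MessageDigest might be called for MD5: the class name Md5Demo and the method md5Hex are hypothetical, and the choice of UTF-8 for converting text to bytes is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; names are illustrative only.
final class Md5Demo {
    // Returns the MD5 digest of the text as a 32-character lowercase hex string.
    static String md5Hex(String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

The digest is always 16 bytes (32 hex characters) regardless of input length, which is the "compression" property listed above.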
First, create a Spring Boot project and add a web dependency with the following content:
SHA256
Anyone learning Java has already met hash algorithms, since every class has a hashCode method, although hashCode is designed for data structures rather than for security.
A hash algorithm is a method for creating a small digital "fingerprint" from any file. Like a fingerprint, the hash identifies the file, depends on every byte of it, and is hard to trace back to the original. When the original file changes, its hash value changes too, alerting the user that the current file is not the expected one.
An excellent hash algorithm should achieve:
Forward efficiency: Given plaintext and a hash algorithm, it can compute the hash value in a limited time and with limited resources.
Reverse difficulty: Given (some) hash values, it is very difficult (almost impossible) to reverse-engineer the plaintext in a limited time.
Input sensitivity: A slight modification to the original input should produce a significantly different hash value.
Collision avoidance: It should be difficult to find two different plaintexts that yield the same hash value (collision). The likelihood of two different data blocks having the same hash value is extremely low; for a given data block, finding another block with the same hash value is very challenging.
However, in different usage scenarios, such as data structures and security fields, some characteristics may be emphasized more than others.
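The difference in emphasis can be seen by comparing String.hashCode, which is built for speed in hash tables, with SHA-256, which is built for collision avoidance. This is a small illustrative sketch; the class name HashContrastDemo and its helper method are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
final class HashContrastDemo {
    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Aa" and "BB" collide under String.hashCode: both are 2112.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        // Their SHA-256 digests are different.
        System.out.println(Arrays.equals(sha256("Aa"), sha256("BB"))); // false
    }
}
```

A 32-bit hashCode collides easily and that is acceptable for a HashMap, whereas a security-oriented hash has to make such pairs infeasible to find.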
The Secure Hash Algorithm (SHA) is a family of cryptographic hash functions published as a FIPS standard. It computes a fixed-length string (also known as a message digest) for a digital message. If two input messages differ, their digests will, with very high probability, differ as well.
The SHA family described here includes five algorithms: SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512. They are mainly used with the Digital Signature Algorithm (DSA) defined in the Digital Signature Standard (DSS). Bitcoin uses SHA-256.
Simply put, several pieces of key information are combined into one input, and the algorithm turns that input into a fixed-length digest string. The result is a digest, not encrypted data, so it cannot be decrypted.
Continuing from the previous project, create a utility class in the corresponding package:
package btcdemo.btcdemo.security;
public class Sha256Utils
Import the encryption class, just like with MD5:
import java.security.MessageDigest;
Below is the specific content of the algorithm:
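The project's own listing is in the repository linked below. As a rough stand-in, a minimal sketch of such a utility might look like the following; the class name Sha256Sketch and the method sha256Hex are hypothetical, the package declaration is omitted so the file runs on its own, and UTF-8 is an assumed text encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for a SHA-256 utility; not the project's actual code.
final class Sha256Sketch {
    // Returns the SHA-256 digest of the text as a 64-character lowercase hex string.
    static String sha256Hex(String text) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(sha256Hex("abc"));
    }
}
```

The only change from an MD5 version is the algorithm name passed to MessageDigest.getInstance; the digest grows from 16 to 32 bytes.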
Below is the test class content of the algorithm:
Running the program yields the following result:
The current directory structure is as follows:
Project code: https://github.com/guoyb1990/btc-demo.git
BASE64&BASE58
Base64 is one of the most common encoding methods for transmitting 8-bit byte codes over the internet. Base64 is a method of representing binary data using 64 printable characters. You can refer to RFC2045 to RFC2049 for detailed specifications on MIME.
Base64 encoding converts binary data into characters and can be used to transmit longer identifiers in an HTTP environment. For example, in the Java persistence framework Hibernate, Base64 is used to encode a long unique identifier (usually a 128-bit UUID) into a string for use as a parameter in HTTP forms and HTTP GET URLs. In other applications, binary data often needs to be encoded into a form suitable for URLs (including hidden form fields). A side effect is that Base64-encoded data is not directly readable; it has to be decoded first.
Base64 encoding splits every 3 8-bit bytes (3*8=24 bits) into 4 6-bit groups (4*6=24 bits). Each 6-bit group is prefixed with two 0 bits to form an 8-bit value between 0 and 63, which selects one of the 64 output characters. If the input does not divide evenly into groups of 3 bytes, the missing bits are filled with 0s and the missing output characters are written as '=', so the encoded output may end with 1 or 2 '=' characters.
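The regrouping of 3 bytes into 4 six-bit values can be worked through by hand in a few lines. This is an illustrative sketch only; the class name Base64BitsDemo and the method encodeTriple are hypothetical, and the alphabet is the standard one from the MIME RFCs.

```java
// Hypothetical demo class; names are illustrative only.
final class Base64BitsDemo {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly three bytes into four Base64 characters.
    static String encodeTriple(int b0, int b1, int b2) {
        // Pack the three bytes into one 24-bit block.
        int block = ((b0 & 0xff) << 16) | ((b1 & 0xff) << 8) | (b2 & 0xff);
        char[] out = new char[4];
        for (int i = 0; i < 4; i++) {
            // Take 6 bits at a time, from the most significant end.
            int index = (block >> (18 - 6 * i)) & 0x3f;
            out[i] = ALPHABET.charAt(index);
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // 'M' 'a' 'n' = 01001101 01100001 01101110
        // regrouped   = 010011 010110 000101 101110 = 19, 22, 5, 46
        System.out.println(encodeTriple('M', 'a', 'n')); // TWFu
    }
}
```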
The JDK toolkit comes with a Base64 utility class, and using Base64 is also very simple; first, create a utility class:
Base64 is reversible, and the encoding and decoding content is as follows:
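As a minimal sketch of encoding and decoding with the JDK's java.util.Base64 class (available since Java 8): the class name Base64Sketch and its method names are hypothetical, and UTF-8 is an assumed text encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-in for a Base64 utility; not the project's actual code.
final class Base64Sketch {
    // Encodes text to a Base64 string.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a Base64 string back to the original text.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("Man")); // TWFu (3 bytes, no padding)
        System.out.println(encode("Ma"));  // TWE= (2 bytes, one '=')
        System.out.println(encode("M"));   // TQ== (1 byte, two '=')
        System.out.println(decode("TWFu")); // Man
    }
}
```

The three sample inputs show the padding rule: the number of trailing '=' characters depends on how many bytes are missing from the last group of three.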
As you can see, it is very simple to use. Below is the test class content:
Executing the test method yields the following result:
As you can see, Base64 is very simple.
Base58, like Base64, is an algorithm for converting binary data into a printable string, mainly used for large integer values. The difference is that the resulting string omits characters that are easily confused with each other, namely 0 (zero), O (uppercase letter O), I (uppercase letter I), and l (lowercase letter L), as well as characters that interfere with double-click selection, such as / and +. The resulting character set contains exactly 58 characters (9 digits, 24 uppercase letters, and 25 lowercase letters). Different applications may order the Base58 alphabet differently, so there is no single standard.
It can be seen that Base58 is a more user-friendly version of Base64, considering the user’s perspective.
Create a Base58 utility class, with the directory structure as follows:
The general structure of the utility class is as follows:
Due to the excessive content, you can download the source code for review.
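To give a sense of what such a utility involves, here is a minimal sketch of Base58 encoding and decoding built on BigInteger. The class name Base58Sketch and its methods are hypothetical, the alphabet ordering shown is the one used by Bitcoin (an assumption, since other orderings exist), and the project's own implementation may differ.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical stand-in for a Base58 utility; not the project's actual code.
final class Base58Sketch {
    // Bitcoin-style alphabet: no 0, O, I, or l.
    static final String ALPHABET =
            "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    private static final BigInteger BASE = BigInteger.valueOf(58);

    static String encode(byte[] input) {
        // Leading zero bytes carry no numeric value, so they are kept as '1's.
        int zeros = 0;
        while (zeros < input.length && input[zeros] == 0) zeros++;
        BigInteger value = new BigInteger(1, input); // treat bytes as unsigned
        StringBuilder sb = new StringBuilder();
        while (value.signum() > 0) {
            BigInteger[] qr = value.divideAndRemainder(BASE);
            sb.append(ALPHABET.charAt(qr[1].intValue())); // least significant digit first
            value = qr[0];
        }
        for (int i = 0; i < zeros; i++) sb.append('1');
        return sb.reverse().toString();
    }

    static byte[] decode(String text) {
        int zeros = 0;
        while (zeros < text.length() && text.charAt(zeros) == '1') zeros++;
        BigInteger value = BigInteger.ZERO;
        for (int i = 0; i < text.length(); i++) {
            int digit = ALPHABET.indexOf(text.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Not a Base58 character: " + text.charAt(i));
            }
            value = value.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        byte[] body = value.signum() == 0 ? new byte[0] : value.toByteArray();
        if (body.length > 1 && body[0] == 0) {
            body = Arrays.copyOfRange(body, 1, body.length); // drop BigInteger's sign byte
        }
        byte[] out = new byte[zeros + body.length]; // leading zero bytes restored
        System.arraycopy(body, 0, out, zeros, body.length);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(encode(new byte[] {0, 0, 1})); // 112
    }
}
```

Unlike Base64, Base58 has no fixed bit grouping, because 58 is not a power of two; the input is treated as one large number and repeatedly divided by 58.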
Below is the test code:
The running result is as follows:
As you can see, Base58 is used in the same way as Base64.
Project code: https://github.com/guoyb1990/btc-demo.git