Exploring Algorithms Beyond FPGA and ASIC: Revitalizing Global GPU Computing Power

CPU: When Bitcoin launched in 2009, Satoshi Nakamoto mined with an ordinary desktop CPU. The entry barrier was extremely low, so any household could take part, which was Satoshi’s original intention for Bitcoin mining: by relying on affordable hardware that was easy to mine with, computing power would stay decentralized.

GPU: A graphics card is a standard part of most computers, so every household could still take part in mining. In 2011, GPUs took over from CPUs, mining dozens of times faster. Throughout the GPU phase, Bitcoin mining remained fair and open to the public, and GPUs played a significant role in decentralization. Large miners could stockpile graphics cards, but they were still constrained by the high power consumption and maintenance burden of GPU rigs. At that time, Bitcoin was not widely known, and only industry geeks were involved.

FPGA: A Field-Programmable Gate Array is notable for its programmability, which makes design implementation much easier and lowers design cost, although it runs slower than an ASIC built on the same process. FPGAs emerged as semi-custom circuits within the field of Application-Specific Integrated Circuits (ASICs): once a design has been verified on an FPGA, it can proceed to backend design and be produced as a dedicated ASIC chip by a chip manufacturer.

Typical Chip

XILINX’s XC6SLX150 FPGA chip

Price: Approximately 500 RMB

Speed: 190 MH/s

Power Consumption: 10W

Product Examples: Pumpkin 2nd Generation, Watermelon Machine.

Main Players in China:

1. Pumpkin Zhang’s Pumpkin 2nd Generation: integrates two XC6SLX150 chips for a mining speed of 380 MH/s. Priced around 2,500 RMB, it cost more than contemporaneous graphics cards but had the advantage of lower power consumption, about 20W.

2. Guilin’s Watermelon Machine: boldly increased the chip count, integrating eight FPGA chips on one PCB for a mining speed of 1.6 GH/s, priced close to 12,000 RMB at the time.

Compared to contemporaneous GPUs, performance and price were similar, and the only advantage was lower power consumption. FPGA mining machines were popular for about half a year in 2012. Chips were hard to procure, production volumes were low, prices were comparable to graphics cards, and the hardware did not hold its resale value as well as graphics cards did, so adoption stayed limited to a small circle of players. During this time, hand-building FPGA mining boards became a hobby for hardware enthusiasts, which let them accumulate early BTC. In the second half of 2012, dedicated ASIC chips based on FPGA designs entered trial production, such as those from Pumpkin Zhang and Shenzhen’s Kuaimao, marking the first time Chinese players led the international competition in mining hardware.

ASIC: An Application-Specific Integrated Circuit is notable for its specialization. Because it is custom-designed, it executes its task faster than an equivalent FPGA design and avoids the logic that would sit unused on an FPGA. In large-scale production it also costs less than an FPGA. You can think of an FPGA as a platform for implementing a design, and an ASIC as the custom logic that remains once the unused parts of that platform are stripped away. In the broad sense of custom-fabricated silicon, CPUs and GPUs are also types of ASICs.

Typical Examples:

1. Pumpkin Zhang began fundraising in the second half of 2012 to develop the first-generation Avalon chip on 110nm technology. Single-chip speed is 0.282 GH/s, power consumption is 6.6W, and after mass production the price per chip is only a few dozen RMB.

2. Shenzhen Kuaimao developed the Kuaimao Blade on 130nm technology, funded through crowdfunding in the second half of 2012, with chip performance similar to Avalon.

The latest 28nm A1 chip in 2014: single-chip speed of 20-40 GH/s, a price of only a few dozen RMB after mass production, and power consumption of about 20W.

As we can see, with the advancement of technology, mining has turned into a fierce arms race. In the first half of 2013, an Avalon machine needed 240 chips to reach only 66 GH/s, while now two tiny 28nm chips reach about 60 GH/s.
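
To make the scale of this arms race concrete, the figures quoted in this article can be turned into a rough power-efficiency comparison. The sketch below is only arithmetic on those quoted numbers; the helper name `watts_per_ghs` is made up for illustration, and the A1 chip is shown at both ends of its quoted 20-40 GH/s range.

```python
# (name, hash rate in GH/s, power in W), all figures as quoted in this article
chips = [
    ("XC6SLX150 FPGA", 0.190, 10.0),
    ("Avalon gen 1 (110nm)", 0.282, 6.6),
    ("A1 28nm, low end", 20.0, 20.0),
    ("A1 28nm, high end", 40.0, 20.0),
]


def watts_per_ghs(rate_ghs: float, power_w: float) -> float:
    """Power needed to sustain 1 GH/s (hypothetical helper for this sketch)."""
    return power_w / rate_ghs


for name, rate, power in chips:
    print(f"{name:22s} {watts_per_ghs(rate, power):8.2f} W per GH/s")

# Nominal total for a 240-chip Avalon machine at the quoted per-chip speed.
avalon_machine_ghs = 240 * 0.282
print(f"240 Avalon chips: {avalon_machine_ghs:.1f} GH/s nominal")
```

On these numbers the FPGA needs roughly 50W per GH/s, the first Avalon chip about 23W, and the A1 between 0.5W and 1W. The nominal 240-chip total comes out slightly above the 66 GH/s quoted for the finished machine.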

I don’t know whether this has opened Pandora’s box, but compared to graphics cards, specialized mining machines are much easier to deploy and manage, and electricity costs less at large scale. Bitcoin mining is gradually becoming centralized. At Huobi’s recent anniversary celebration in Beijing, domestic mining giants such as Pumpkin Zhang and Antminer gathered, and some joked that the people in the room could unite to launch a 51% attack.

In any case, mining is no longer a matter for the public; it is left to professionals. As more of these devices come online, mining difficulty rises and breaking even gets harder. Several domestic oligarchs have launched platforms for trading mining power, which centralizes mining further. Oligarchs with massive computing power will not be willing to connect to third-party mining pools that take a cut of their mining income, so in the near term these giants are likely to establish their own pools, centralizing the pools as well. As I write this, a certain oligarch has announced a mining pool on a two-letter domain name, an investment in the millions that signals their determination. The greater the computing power, the higher the probability of quick returns, and mining pools must secure their own profits. We can predict that mining pools without their own mining power behind them will be eliminated.
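
The link between computing power and quick returns can be illustrated with a small probability sketch. It assumes Bitcoin's ten-minute block target (about 144 blocks per day) and models block discovery as a Poisson process; both are simplifying assumptions, and the function name is hypothetical.

```python
import math

BLOCKS_PER_DAY = 144  # assumes the ten-minute Bitcoin block target


def p_at_least_one_block(share: float, days: float = 1.0) -> float:
    """Chance of finding at least one block in `days`, for a miner or pool
    holding `share` (0..1) of total network hash power, Poisson model."""
    expected_blocks = share * BLOCKS_PER_DAY * days
    return 1.0 - math.exp(-expected_blocks)


for share in (0.0001, 0.001, 0.01, 0.1):
    print(f"share {share:7.2%}: P(block within a day) = "
          f"{p_at_least_one_block(share):.3f}")
```

Under this model a pool with 1% of the network finds a block on most days, while a participant with 0.01% can wait months. That variance is what pushes small miners into pools and favors pools with their own hash power.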

Looking back, in the first half of 2013, LTC (Litecoin) burst onto the market with the scrypt algorithm, which was meant to resist dedicated ASIC miners. This gave a second life to the graphics cards that ASIC miners had pushed out of Bitcoin mining. More graphics cards joined, steadily raising LTC’s mining difficulty. The same issues persisted, however: deployment and maintenance were troublesome, and power requirements were high. Large-scale graphics card mining farms were widespread, which temporarily became an advantage for LTC over BTC.

In reality, the scrypt algorithm cannot resist ASICs; it merely requires large memory, which is expensive. While LTC’s price was too low to justify hardware investment, only a few hardware experts experimented with the XC6SLX150 chip (yes, the same FPGA), reaching a poor few tens of kH/s against the hundreds of kH/s of GPUs at the same price point, which made it essentially useless. The good times did not last long, though. By the end of 2013, LTC’s price surged past 380 RMB, and ASIC chips designed with memory in mind emerged.

One of Litecoin’s founding slogans was to end the unfairness of Bitcoin mining machines, yet its efforts towards fairness proved unsuccessful. Litecoin still lacks application support. Perhaps its future lies in the view that Bitcoin is gold and Litecoin is silver, since the investment across the mining machine industry and the costs of mining are sufficient to support its current price.
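
As a rough illustration of the memory requirement, the sketch below estimates the scrypt scratchpad size and computes one scrypt hash with Python's standard `hashlib.scrypt`. It assumes the commonly cited Litecoin parameters (N=1024, r=1, p=1, with the block header used as both password and salt); these values are not stated in this article, and the 80-byte header here is a zero-filled placeholder rather than real block data.

```python
import hashlib

# Assumed Litecoin scrypt parameters (not taken from this article).
N, R, P = 1024, 1, 1


def scrypt_scratchpad_bytes(n: int, r: int) -> int:
    """Size of scrypt's main working array: n blocks of 128 * r bytes."""
    return 128 * n * r


def scrypt_pow_hash(header: bytes) -> bytes:
    """One scrypt evaluation with the header as password and salt (assumed usage)."""
    return hashlib.scrypt(header, salt=header, n=N, r=R, p=P, dklen=32)


placeholder_header = bytes(80)  # zero-filled stand-in for a block header
print(scrypt_scratchpad_bytes(N, R) // 1024, "KiB of scratchpad per hash")
print(scrypt_pow_hash(placeholder_header).hex())
```

With these assumed parameters the scratchpad is 128 KiB per hash. Memory on that scale raises chip cost but can still be built into dedicated hardware, which is consistent with the memory-oriented ASICs described above.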

Typical Representatives:

Silverfish Mining Machine: 55nm design, 625 kH/s, power consumption 7W. In large-scale production the price per chip is a few dozen RMB (reaching this price takes very high production volumes, so LTC’s price had to rise far enough before mining machine manufacturers would enter the market). Compared to contemporaneous graphics cards, speed is similar, but power consumption drops from around 200W to 7W. Similar machines on the market include the Zeus Mining Machine, also based on 55nm but drawing double the power of Silverfish, which points to a difference in chip design expertise.

But does this mean that graphics cards are no longer viable? In mid-2014, “Darkcoin” (now known as Dash) emerged, an altcoin based on the X11 algorithm that emphasizes “anonymity.” However, the X11 algorithm is not significantly better than Litecoin’s. As for X12, X13, and so on up to X100, serial chained algorithms are inherently open to ASIC implementation; it is only a matter of time, as long as the coin’s value supports the investment in mining hardware. Last month, a foreign friend brought over an FPGA mining machine for Darkcoin that reached about 280 MH/s. Once a coin’s algorithm has been successfully implemented on an FPGA, producing an ASIC chip for it is only a matter of time. Darkcoin’s value has dropped from a high of 70 RMB to a low of around 15 RMB, which shows that altcoins offering only minor innovations and no practical applications see their prices decline once the hype fades.

Thus, attempts to increase memory usage during mining (scrypt, as used by Litecoin, Yacoin, and Memorycoin), attempts to increase the complexity of the cryptographic hash function (such as Quark), and attempts to chain eleven hash functions, BLAKE and Keccak among them (such as X11), have all proven futile, because ASIC miners can overcome them by adding memory capacity and computing power.

Does this mean that the hundreds of millions of graphics cards worldwide are being eliminated from mining? Is there truly an algorithm that cannot be turned into an ASIC? Some say that professional miners are like heavy artillery, while GPU miners are still struggling with small arms. From a hardware design perspective, graphics cards are indeed heavy artillery; however, because they are not designed for one specific algorithm, their mining efficiency is low. Nevertheless, GPUs are undoubtedly better suited to public participation. Graphics cards are like the Windows system, while ASICs are like the Linux system. I cannot say which is more reasonable, but finding an algorithm that fully suits graphics cards and truly resists ASIC implementation is currently the one path to saving them. I believe that, given the current scale, AMD and NVIDIA will not build centralized mining pools.

In my search, the HEFTY1 algorithm caught my attention. HEFTY1 is built around conditional branching. In a conventional ASIC design, the more branching conditions there are, the more resources are wasted. Chip design experts I consulted indicated that manufacturing ASICs for this algorithm would require a significant investment, so it may be a genuinely ASIC-resistant cryptographic algorithm for the coming years.

Algorithms like Quark and X11 simply chain multiple hash algorithms, so if any one of them is compromised (especially those at the end of the chain), the security of the whole currency system is jeopardized. HEFTY1 works differently. Its process is as follows:

1. First, perform the HEFTY1 operation on the input to obtain the result hash1 (256 bits).

2. Using hash1 as input, apply SHA256, KECCAK512, GROESTL512, and BLAKE512 in turn to obtain hash2, hash3, hash4, and hash5, with the last three (512-bit) results condensed to 256 bits.

3. Sequentially extract the first 64 bits from hash2, hash3, hash4, and hash5, and after obfuscation, form the final output result (256 bits).
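
The three steps above can be sketched structurally in Python. This is not the real algorithm: the first-stage HEFTY1 function, pre-standard KECCAK512, GROESTL512, and BLAKE512 are not in Python's standard library, so clearly labeled stand-ins from `hashlib` are used instead. Truncation as the "condense" step and byte interleaving as the "obfuscation" step are also assumptions made only for illustration.

```python
import hashlib


def stage1_stand_in(data: bytes) -> bytes:
    """Placeholder for the first-stage HEFTY1 operation (NOT the real function)."""
    return hashlib.sha256(b"stage1-stand-in" + data).digest()


def combined_hash_sketch(data: bytes) -> bytes:
    # Step 1: first-stage hash of the input, 256 bits.
    hash1 = stage1_stand_in(data)

    # Step 2: four hashes of hash1; 512-bit outputs cut to 256 bits (assumed).
    hash2 = hashlib.sha256(hash1).digest()
    hash3 = hashlib.sha3_512(hash1).digest()[:32]  # stand-in for KECCAK512
    hash4 = hashlib.sha512(hash1).digest()[:32]    # stand-in for GROESTL512
    hash5 = hashlib.blake2b(hash1).digest()[:32]   # stand-in for BLAKE512

    # Step 3: first 64 bits of each result, mixed byte by byte in round-robin
    # order (an assumed mixing rule) into a 256-bit output.
    parts = [h[:8] for h in (hash2, hash3, hash4, hash5)]
    return bytes(parts[i % 4][i // 4] for i in range(32))


digest = combined_hash_sketch(b"example block header")
print(len(digest) * 8, "bits:", digest.hex())
```

Each of the four functions contributes a quarter of the output, so the final value never rests on any single one of them.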

[Diagram: how HEFTY1 combines the results of four cryptographic hash functions]

The diagram above shows how the HEFTY1 algorithm combines the results of four cryptographic hash functions, which provides a significant security advantage: it does not rely on the long-term security of any single function. Because HEFTY1 links four hash algorithms, the system’s security is compromised only if all four are broken simultaneously.

The first altcoin to introduce the HEFTY1 algorithm is Heavycoin (abbreviated HVC), which boasts very fast transactions, completing block confirmations in about two minutes. Unfortunately, HVC has not yet succeeded. Its block reward size and total supply are left to miner voting, which is a poor design: the greed of miners has led to excessive HVC output. Beyond that, it merely introduced an ASIC-resistant algorithm without any applications. Before people could recognize the advantages of this algorithm, it was buried under the confusion created by the contemporaneous X11 algorithm (which also claimed to resist ASICs).

Appendix: comparison data from an industry hardware expert on FPGA mining, with FPGA performance on each algorithm expressed as an equivalent number of graphics cards:

Scrypt algorithm = 20 GTX 290 graphics cards

MAX algorithm = 120 GTX 290 graphics cards

G algorithm = 500 GTX 290 graphics cards

X11 algorithm = 300 GTX 290 graphics cards

HEFTY1 algorithm = 0.5 GTX 290 graphics cards

What is needed is a HEFTY1-based altcoin that effectively resists ASICs and can be mined on both AMD and NVIDIA cards, revitalizing the hundreds of millions of graphics cards worldwide. It must also have a solid and broad application market. Integrating PoS and PoW mechanisms, while gradually exploring the most suitable profit-sharing model, would bring it more in line with current mainstream trends.
