Introduction by Ali Sister: Python’s syntax is very flexible, incorporating many features from other languages that are considered convenient. However, the advantages of Python also imply its disadvantages. Ant Group researcher Wang Yi has gained a deeper understanding of Python’s limitations through his hands-on experience in industrial systems, and he believes that Go+ is the most reliable solution to address these issues. So what are Python’s shortcomings? How can Go+ compensate for them? This article shares Wang Yi’s views and attempts regarding how Go+ can address Python’s limitations.
Recently, Xu Shiwei's (known in the community as Lao Xu) Go+ project made a splash on Hacker News [1]. I was immediately drawn in and contributed to it. Lao Xu and the community later organized a video discussion and invited me to share why I am interested in Go+ and what I hope to achieve with it. After the live discussion, I revised this article based on audience feedback and suggestions from two friends: Hong Mingsheng (head of TensorFlow Runtime) and Wang Yu (Shen Diaomo).

I have worked on distributed deep learning systems for thirteen years, especially since 2016, when Professor Xu Wei asked me to take over as head of his PaddlePaddle project. Hands-on experience with Python in industrial systems has deepened my understanding of its limitations, and Go+ is the most reliable solution I have encountered for addressing them. I look forward to Go+ competing with Python and complementing its shortcomings; to building, on that basis, a numpy-like project (call it numgo+) to support tensor operations for data science; and then a PyTorch-like deep learning library (call it GoTorch). If possible, Go+ could further become a front-end language for the deep learning compiler ecosystem. At Ant Group, I am responsible for SQLFlow, an open-source SQL compiler that translates SQL programs with extended AI syntax into Python programs. My colleagues say that if the Go+ ecosystem matures, they would be happy to have SQLFlow output Go+ programs instead.

Many readers might think I am talking nonsense: Python is such a popular language, so why does it need to be "complemented"?

Advantages of Python

Python's syntax is very flexible, incorporating many convenient features from other languages. For example, like C++, Python allows operator overloading, which is how the authors of numpy overloaded arithmetic operators to perform tensor operations.
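As a minimal sketch of this operator overloading (the Vector class below is hypothetical and far simpler than numpy's ndarray): defining the special methods __add__ and __mul__ is all it takes to make arithmetic operators act on a user-defined type.

```python
# Minimal sketch of operator overloading, in the spirit of numpy's ndarray.
# The Vector class here is a hypothetical toy, for illustration only.
class Vector:
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # Element-wise addition, like numpy's overloaded "+".
        return Vector(x + y for x, y in zip(self.data, other.data))

    def __mul__(self, scalar):
        # Scalar multiplication, like numpy's overloaded "*".
        return Vector(x * scalar for x in self.data)

a = Vector([1.0, 2.0, 3.0])
b = Vector([4.0, 5.0, 6.0])
print((a + b).data)   # [5.0, 7.0, 9.0]
print((a * 2).data)   # [2.0, 4.0, 6.0]
```

This is exactly the mechanism that lets numpy users write `a + b` for tensors instead of calling a named function.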
Like Lisp, Python provides an eval function that recursively invokes the interpreter to evaluate Python expressions, so Python programs can generate and execute Python code. This flexibility lets programmers work freely, which makes Python particularly well suited to exploratory work: graduate students use it for research; data scientists use it to replace expensive commercial systems; and in the subsequently emerging field of deep learning, Python has flourished as well.

Limitations of Python

The advantages of Python also imply its disadvantages. I have personally experienced two pain points.

Difficulty in Ensuring Code Quality

Syntactic flexibility means that the same program can be written in many different ways. In modern software engineering there are no lone heroes; collaboration is key. Multiple possible implementations often lead to disputes during code review, and such disputes are hard to settle because there may be no objective standard for choosing among them. Other languages have similar issues. In Java, the community's solution was to establish design patterns: before writing code, programmers check whether an applicable pattern exists and follow it if so. Java programmers therefore need to learn not only Java syntax but also design patterns. C++ has similar issues; one solution was Google's code style guide, which defines which syntax may be used and which may not. According to Rob Pike, distilling such a selected subset of allowed syntax into a language was the original intent behind Go's design. Python is too flexible for a code style as detailed as C++'s: PEP 8 mostly states formatting requirements and imposes almost no restrictions on syntax selection. Nor can Python define patterns; there are too many to enumerate.
Python adopts dynamic typing for flexibility, so when we look at a Python function we must read its code carefully; otherwise we cannot tell whether it returns a value, or what that value is. Python does offer optional type annotations that let programmers declare input and output types, but few people use them; after all, everyone is drawn to the flexibility, and if flexibility is to be restricted, one might as well use a statically typed language. The result is that a Python function cannot be too long, or it becomes hard to understand. Yet Python programmers prize that flexibility; they want the freedom to write as they please, regardless of whether others understand it, as long as they understand it themselves, especially since they will graduate once their papers are published. Split functions to refine granularity? Impossible; it will never happen.

Is there well-written Python code? Yes. Google Tangent, for example: a niche project with only two authors, its code structure is clear, each function is generally within ten lines, and the comments are as long as the code, making it easy to understand. But this only underscores the contrast with Python's huge user base. When I was responsible for PaddlePaddle, besides trying to learn and summarize Python patterns myself, I configured CI to run various source-inspection tools, but in vain; no tool is intelligent enough to automatically comment code or split overly long function definitions.

Difficulty in Optimizing Computational Efficiency

Python's rich syntax and great flexibility make its interpreter complex to write and its performance hard to optimize. By contrast, Go's syntax is simple; its expressive power far exceeds C's, yet it has fewer keywords than C, which makes performance optimization of Go programs comparatively easier.
Within a few years of Go's inception, the Go compiler's optimization level quickly approached what GCC achieves for C++ programs, whereas C++, like Python, has such rich syntax that developing compiler optimizations is difficult. Some have attempted to write a Python compiler to replace the interpreter and optimize performance before execution. However, Python's syntax is even more flexible than C++'s, making it nearly impossible to write a compiler that fully supports the standard language; several attempts have been abandoned. The common practice today is for the interpreter to optimize at runtime (JIT compilation): with runtime information available, this is comparatively easier than ahead-of-time compilation.

In the AI field, deep learning training is very resource-intensive. TensorFlow's graph mode offers one solution: the user's Python program does not actually perform training when executed; instead, it outputs the training process as a data structure called a "computation graph," which is then executed by the TensorFlow runtime, itself an "interpreter." As long as the TensorFlow runtime is efficient, training is not limited by the efficiency of the Python interpreter. Graph mode is well-intentioned but somewhat redundant: source programs, the various layers of compiler IR, and binary code have long been used to describe computational processes, and the computation graph invented in the early days of TensorFlow reinvented this wheel in an unprofessional way. The graph struggles to express if-else, loops, and function definitions and calls, let alone advanced control-flow constructs such as closures, coroutines, and threads. This amateur compiler design by AI engineers left LLVM author Chris Lattner bemused, prompting him to attempt to replace Python with Swift as TensorFlow's front-end language and to use MLIR instead of TensorFlow's "computation graph" [2].
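The graph-mode idea described above can be sketched with a toy example (all names here are hypothetical and unrelated to TensorFlow's actual API): the user's program computes nothing when it runs; it only builds a data structure describing the computation, which a separate "runtime" then interprets.

```python
# Toy sketch of graph mode: executing the user's code records a graph;
# a separate interpreter performs the arithmetic. Hypothetical names only.
class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __add__(self, other):
        return Node("add", (self, other))   # records, does not compute

    def __mul__(self, other):
        return Node("mul", (self, other))   # records, does not compute

def constant(v):
    return Node("const", value=v)

def run(node):
    # The "runtime" walks the graph and performs the actual arithmetic.
    if node.op == "const":
        return node.value
    args = [run(n) for n in node.inputs]
    return args[0] + args[1] if node.op == "add" else args[0] * args[1]

# Building the graph performs no arithmetic; only run() does.
y = constant(2.0) * constant(3.0) + constant(1.0)
print(run(y))   # 7.0
```

The weakness the article points out is visible even here: a Python if-else or loop in the user's program would execute while the graph is being built, leaving no trace in the graph for the runtime to see.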
Attempts to Address Limitations

While I was responsible for PaddlePaddle, my colleague Chen Xi and I built an unmanned boat to validate Paddle Fluid's capabilities, attempting to use Fluid to implement imitation learning so the boat could learn driving skills from human pilots; details are in a series of blog posts [3]. However, putting a MacBook Pro running Python programs on the boat consumed too much power, and embedded devices are not suited to running training programs written in Python. And if we uploaded data to a server for training after each docking, the boat would learn from humans too slowly. To address this, another colleague, Yang Yang, wrote Paddle Tape, implementing PyTorch-style automatic differentiation in C++ and combining it with the many basic computation units (operators) written in C++ that Paddle Fluid had accumulated. Tape is a deep learning system implemented entirely in C++, with no dependence on Python.

In early 2019, my friend Hong Mingsheng was responsible for the Swift for TensorFlow project at Google, likewise an attempt to reduce Python's role in AI infrastructure. He invited me to share the story of Paddle Tape and the unmanned boat with Chris Lattner's team; the slides are at [4]. At Ant Group, I am responsible for ElasticDL, an open-source distributed deep learning training system that has at various times driven TensorFlow's graph mode, TensorFlow eager execution, PyTorch, and Swift for TensorFlow. I was greatly inspired by Swift for TensorFlow's design philosophy and its strategy of coexisting with the Python ecosystem.

Go+ and Data Science

The above attempts taught me that the criteria for choosing a language must include clear, concise, stable syntax that is easy to learn. I also hope the language's users are a group with a spirit of exploration. Go+ and its user community, built on the Go community, happen to meet these conditions.
Before Go+ emerged, there were attempts to use Go for data science, including tensor-operation libraries implemented in Go (such as gonum), but they were not as concise as Python programs using numpy. One direct reason is that composite literals in Go require explicit data types, while Python's do not; I have written up several comparisons [5]. To define an ndarray value in Go, the user must write:
x := numgo.NdArray([][]float64{{1.0, 2.0, 3.0}, {1.0, 2.0, 3.0}})
Whereas in Python, it is:
x = numpy.ndarray([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
With Go+ automatically inferring data types, the syntax becomes almost identical to Python:
x := numgo.NdArray([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
Furthermore, Lao Xu added a comment explaining that Go+ is preparing to support MATLAB’s tensor definition syntax. This makes the program even simpler:
x := numgo.NdArray([1.0, 2.0, 3.0; 1.0, 2.0, 3.0])
Go+ has already accumulated similar convenient syntax improvements, with examples available in [6]. These extensions are enough to greatly simplify data science programming. Moreover, the Go+ compiler translates Go+ programs written with this syntactic sugar into Go programs, so they can be compiled together with other libraries written in Go, reusing code from the Go ecosystem. Reusing the Go ecosystem is a strong point of the Go+ language. Over the course of Go's development, many foundational technologies for scientific computing have accumulated, such as Go data types encapsulating tensors. These types also have efficient implementations, partly because Go programs can easily call C/C++ code, including well-established scientific computing libraries such as LAPACK, and even NVIDIA's GPU interface library, CUDA. It is worth noting that these C/C++ foundational libraries also underpin Python's data science ecosystem, which is why this article is titled "Go+ Complements Python's Ecosystem."

Go+ and Deep Learning Compilers

The previous section mentioned deep learning, another field where Python is widely used and one naturally connected to data science, since the tensor data structures of PyTorch and TensorFlow are similar to numpy's ndarray. In deep learning, compilers are currently the mainstream research direction. The Go community today has many backend system developers; during the live video session, an audience member commented that they are not an AI engineer and do not pay attention to AI. If that is truly the case, it reflects not just a lack of technical ambition but a lack of responsibility toward one's job.
The boundary between backend systems and AI systems is becoming increasingly blurred, because backend systems are the backend systems of internet services: the entire internet economy is built on servers tirelessly serving the public in place of humans, and AI is the foundation of this logic, as detailed in an older article of mine [7] that enumerates the professions AI technology has displaced over the past twenty years. Moreover, this boundary will disappear entirely in the near future, as technologies such as online learning, reinforcement learning, imitation learning, and federated learning displace supervised learning as the mainstream of internet intelligence (spanning traditional search, advertising, and recommendation, as well as emerging fields like autonomous driving and financial intelligence). AI systems will no longer be divided into training and prediction halves, with AI engineers responsible for the former and backend engineers for the latter.

In the AI field, one important reason deep learning has overtaken traditional machine learning is that in traditional machine learning, each model (which can be understood as a description of knowledge structure) often corresponds to one or even several specialized training algorithms, whereas in deep learning almost all models are trained with one algorithm, stochastic gradient descent (SGD), or its variants. This lets infrastructure engineers develop the training systems while model researchers reuse them, significantly reducing the engineering burden of research and improving the efficiency of model development. The core problem of a deep learning system is autodiff, which follows from the mathematical character of SGD: the algorithm infers model parameters from training data by alternately executing a forward computation process and a backward computation process.
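This alternation of forward and backward passes can be shown with a deliberately tiny example: fitting a one-parameter model y = w·x, with the gradient derived by hand (toy code, not any framework's API).

```python
# Toy SGD: fit y = w * x to data generated with the true w = 2.0.
# The forward pass computes a prediction; the backward pass computes
# the gradient of the squared loss by hand; the update step applies SGD.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
w = 0.0     # initial parameter
lr = 0.05   # learning rate

for epoch in range(100):
    for x, y in data:
        pred = w * x                  # forward: prediction
        # loss = (pred - y)**2, so d(loss)/dw = 2 * (pred - y) * x
        grad = 2.0 * (pred - y) * x   # backward: gradient
        w -= lr * grad                # SGD parameter update

print(round(w, 4))   # converges to ~2.0
```

Autodiff exists to produce the `grad` line automatically: for a model of millions of operations, no human can write the backward pass by hand.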
A model plus its parameters equals knowledge. The engineering challenge is that when researchers define a model, they describe the forward computation process, but the backward computation process is hard for humans to write; ideally, a program derives the backward process automatically from the forward one. This automatic derivation is called autodiff, and there are currently two strategies for it.

The first is derivation at runtime, also known as the dynamic net or tape-based approach. The basic idea is that no matter how complex the forward computation is, even if it includes if-else, loops, function definitions and calls, or even coroutines and multithreading, as long as the basic operations (operators) executed are recorded sequentially on a tape, the backward computation simply backtracks through the tape and calls the corresponding gradient operator for each recorded operator. This is the strategy adopted by PyTorch, TensorFlow eager execution, and Paddle Tape. It has little to do with compilers, though it relates somewhat to JIT compilation.

The second strategy is to derive the backward computation process before execution, which requires introducing a compiler dedicated to autodiff. TensorFlow graph mode, Caffe/Caffe2, Paddle Fluid, Google Tangent, Julia, and Swift for TensorFlow use this strategy. Generally, a compiler translates a source program in a source language into a target program in a target language; however, the first three of these technologies took a shortcut, introducing no source language and instead letting users describe the forward computation through Python library calls.
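The first, tape-based strategy can be sketched in a few dozen lines (a hypothetical toy, far simpler than PyTorch or Paddle Tape): every forward operation appends a gradient rule to a tape, and the backward pass replays the tape in reverse.

```python
# Toy tape-based autodiff. Each forward op records its gradient rule on a
# tape; backward() replays the tape in reverse. A sketch, not PyTorch's API.
tape = []

class Var:
    def __init__(self, value):
        self.value, self.grad = value, 0.0

    def __mul__(self, other):
        out = Var(self.value * other.value)
        def back():                       # gradient rule for "mul"
            self.grad += out.grad * other.value
            other.grad += out.grad * self.value
        tape.append(back)
        return out

    def __add__(self, other):
        out = Var(self.value + other.value)
        def back():                       # gradient rule for "add"
            self.grad += out.grad
            other.grad += out.grad
        tape.append(back)
        return out

def backward(out):
    out.grad = 1.0
    for op in reversed(tape):             # backtrack through the tape
        op()

x, y = Var(3.0), Var(4.0)
z = x * y + x          # forward pass records two ops on the tape
backward(z)
print(x.grad, y.grad)  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

Because the tape records whatever actually executed, Python control flow (if-else, loops, function calls) costs nothing extra, which is exactly why this strategy is so much simpler to implement.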
Google Tangent, Julia, and Swift for TensorFlow let users define functions describing the forward computation in Python, Julia, and Swift, respectively, and can translate a forward computation function into a backward one. Strictly speaking, the Julia authors have implemented several autodiff schemes: some at runtime, some at compile time, and some mixed. While helping me revise this article, Mingsheng reminded me: "For a different vision, where the same language is used to both implement kernels and construct and execute programs/graphs based on the kernels, see [8]." Here, "kernel" refers to the implementation of an operator, the basic operational unit of deep learning.

Both the compile-time and runtime autodiff strategies are applicable to Go+, and neither prevents Go+ from reusing existing technology. Just as the data science field should reuse foundational libraries like LAPACK, the deep learning field should reuse basic operators and gradient operators. The tape-based runtime strategy is simpler to implement; I remember Yang Yang developing Paddle Tape in just a week. The compilation strategy is much more complex: Paddle Fluid, with more than twenty people working on it, took several months, building on the work of Yuan Yu of the TensorFlow team [9], to finalize autodiff for if-else, loops, and function definitions and calls. These attempts remind us how important it is to reuse the community's core technologies: for example, using MLIR instead of computation graphs to describe more complex control flow (computation graphs cannot describe goroutines and selects), or using TVM as the backend compiler, applying deep learning techniques to learn how to optimize deep learning programs. The output of all these technologies is calls to basic operators; from this perspective, the operators accumulated in the deep learning ecosystem are much like built-in functions.
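The second, ahead-of-time strategy can also be sketched: instead of recording operations at runtime, we transform a forward expression into its gradient expression before anything executes (a toy symbolic sketch; real compilers like Tangent work on source code and handle control flow, which this does not).

```python
# Toy ahead-of-time autodiff: derive a gradient expression from a forward
# expression before execution. Expressions are nested tuples; all names
# here are hypothetical toy constructs, not any real compiler's IR.

def grad(expr, wrt):
    """Return an expression for d(expr)/d(wrt), derived symbolically."""
    kind = expr[0]
    if kind == "var":
        return ("const", 1.0) if expr[1] == wrt else ("const", 0.0)
    if kind == "const":
        return ("const", 0.0)
    if kind == "add":            # sum rule
        return ("add", grad(expr[1], wrt), grad(expr[2], wrt))
    if kind == "mul":            # product rule
        return ("add",
                ("mul", grad(expr[1], wrt), expr[2]),
                ("mul", expr[1], grad(expr[2], wrt)))
    raise ValueError(kind)

def evaluate(expr, env):
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    if kind == "const":
        return expr[1]
    a, b = evaluate(expr[1], env), evaluate(expr[2], env)
    return a + b if kind == "add" else a * b

# f(x) = x * x + x; derive f'(x) = 2x + 1 before any execution happens.
f = ("add", ("mul", ("var", "x"), ("var", "x")), ("var", "x"))
df = grad(f, "x")
print(evaluate(df, {"x": 3.0}))   # 7.0
```

The hard part that made Paddle Fluid take months is precisely what this sketch omits: extending the transformation to if-else, loops, and function calls in the source language.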
This is also what Hong Mingsheng repeatedly reminded me of while revising this article. I hope that in the near future Go+ can serve as a new front-end language for deep learning alongside Python, Julia, and Swift, reusing the lower-level IRs, compiler backends, and basic operators.

Conclusion

As I understand it, the core tactical work of the Go+ project going forward is to preserve the simplicity of Go's syntax while simplifying it judiciously: avoiding the excessive flexibility of Python and C++, while being moderately more flexible than Go's minimalist syntax specification. Strategically, the community's direction is to develop exploratory projects such as numgo+ and GoTorch through collaboration, enriching the technical ecosystem, and, further out, to become a front-end language for deep learning compilers, reusing the foundational computing technologies the community has accumulated over the years.

Finally, I would like to thank Lao Xu; Go+ core contributors Chai Shushan and Chen Dongpo; outstanding Go community contributor Asta Xie; and my colleague, ONNX core contributor Zhang Kexiao, for proofreading.
Related Links
[1] https://news.ycombinator.com/item?id=23550042
[2] https://www.tensorflow.org/mlir/dialects
[3] https://zhuanlan.zhihu.com/p/38395601
[4] https://github.com/wangkuiyi/notes/tree/master/s4tf
[5] https://github.com/qiniu/goplus/issues/307
[6] https://github.com/qiniu/goplus/tree/master/tutorial
[7] https://zhuanlan.zhihu.com/p/19901967
[8] https://julialang.org/blog/2018/12/ml-language-compiler/
[9] https://arxiv.org/pdf/1805.01772.pdf