Is Python Being Abandoned by the Academic Community? Scientists May Have Found a Better Alternative

Some scientists have begun using a new programming language that outperforms Python: like Python, it is “immediately usable,” but it offers far greater computational power.

This article is reprinted from the WeChat public account “Nature Portfolio”

Original author: Jeffrey M. Perkel

In 2015, bioinformatician Johannes Köster was, in his own words, “almost full-time writing Python.” By then he had already created a popular workflow management tool, Snakemake, in Python. But his next project demanded more computational performance than Python could provide, so he began searching for a new tool.

Köster, who is now at the University of Duisburg-Essen in Germany, was looking for a programming language with Python’s “immediate usability” and the speed of languages such as C and C++. In other words, “a high-performance language that is, let’s say, ergonomically designed for humans,” he explained. The language he found is Rust.

A cartoon symbolizing speed and safety for programmers. Illustration: Project Twins

Rust was created in 2006 by Graydon Hoare as a side project while he was working at Mozilla, the California-based browser company. Rust combines the performance of languages like C++ with a friendly syntax, a focus on code safety, and a well-designed set of tools that simplify the development process. Part of Mozilla’s Firefox is written in Rust. Microsoft has also reportedly rewritten parts of the Windows operating system in Rust. The annual Stack Overflow Developer Survey has ranked Rust as the “most loved” programming language for five consecutive years. The code-sharing site GitHub states that Rust was the second fastest-growing language on the platform in 2019, up 235% from the previous year.

Scientists are also looking towards Rust. For example, Köster used it to write an application called Varlociraptor, which compares millions of sequence reads against billions of genetic bases to identify genetic mutations.
“The data volume is enormous,” he said, “so it must be as fast as possible.”

However, the power of Rust comes at a cost: the learning curve is steep.

“It takes some time to get started,” said Carol Nichols, founder of the Pennsylvania consulting company Integer 32 and a member of the Rust core team. “But it allows me to do things I couldn’t do before. I think the time spent is worth it.”

Warning: No Safety Rails

Workflows for analyzing scientific data typically use languages like Python, R, and Matlab. These languages interpret and execute code line by line. This model is useful for exploring data, but it is not fast.

C and C++ are fast, but they lack “safety rails,” said Ashley Hauck, a Stockholm-based Rust programmer (or “Rustacean,” as members of the community call themselves). For example, nothing prevents C and C++ programmers from accessing memory that has already been released back to the system, or from freeing the same block of memory twice. In the best case, this causes the program to crash, but it can also return meaningless data or create security vulnerabilities. According to Microsoft’s research, 70% of the security vulnerabilities the company patches each year are related to memory safety.

Memory Rules

The model used by Rust assigns each piece of memory to a single owner and restricts who can access it. Code that violates these rules fails to compile, before it ever has a chance to crash. “Their memory management system is based on this concept of lifetimes, allowing the compiler to track when memory is allocated, when it is released, who owns it, and who can access it,” said Rob Patro, a computational biologist at the University of Maryland. “A whole class of correctness issues can be avoided through the design of the language.”

This concept also helps ensure that parallel computing (software that performs calculations using multiple processors) can be executed safely.
For example, it can prevent multiple threads from accessing the same data simultaneously.

The result is code that is easier to maintain and debug, in a language that is harder to learn. “No other mainstream language has these concepts, and they are central to understanding how to program in Rust,” Nichols said. Stephan Hügel, who researches geographic data visualization at Trinity College Dublin, estimates that he spent two to three months porting to Rust a Python algorithm that converts geographic coordinates from one reference system to another, achieving a fourfold increase in execution speed. Richard Apodaca, founder of a California cheminformatics software company called Metamolecular, said he spent six months becoming proficient in Rust.

Focusing on Usability

To address this issue, Rust developers have worked to improve the user experience, said Manish Goregaokar, who leads the Rust developer tools team and is based in California. For example, the compiler returns particularly informative error messages, even highlighting the erroneous code and suggesting how to fix it. “If your language wants to introduce new concepts, it had better be easy to use,” Goregaokar explained.

The Rust community also provides extensive documentation and online help, including a detailed and popular online “Book” and a “Cookbook” that shows how to solve common problems. The Rust toolchain (the tools programmers use to turn code into programs) is highly praised by users (see “Let’s Oxidize Together” below). “The tools and infrastructure of Rust are really great,” Patro said. Whereas C programmers have to deal with many compilers and auxiliary applications, Rust programmers need only one tool, called Cargo, to compile Rust code, run tests, automatically generate documentation, upload code to repositories, and more. It also automatically downloads and installs third-party packages.
One of Cargo’s plugins, Clippy, can highlight common mistakes and “non-idiomatic” Rust code, which Patro describes as “absolutely fantastic.”
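
The single-owner rule and the thread-safety guarantee described above can be illustrated with a minimal sketch. It is not taken from any project mentioned in this article; the function names (`count_bases`, `consume`, `parallel_count`) and the sample data are hypothetical, and only the Rust standard library is used.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// All names below are hypothetical and chosen only for illustration.

/// Borrows the reads without taking ownership; the caller can keep using them.
fn count_bases(reads: &[String]) -> usize {
    reads.iter().map(|r| r.len()).sum()
}

/// Takes ownership of the reads; they are freed exactly once, when this
/// function returns.
fn consume(reads: Vec<String>) -> usize {
    reads.len()
}

/// Counts bases on several threads. The shared total sits behind a `Mutex`,
/// so it cannot be updated without first acquiring the lock.
fn parallel_count(chunks: Vec<Vec<String>>) -> usize {
    let total = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let total = Arc::clone(&total);
            // `move` transfers ownership of `chunk` into the new thread.
            thread::spawn(move || {
                let n = count_bases(&chunk);
                *total.lock().unwrap() += n;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

fn main() {
    let reads = vec![String::from("ACGT"), String::from("GGC")];
    println!("{} bases", count_bases(&reads)); // borrow: `reads` stays valid
    let n = consume(reads); // move: ownership passes to `consume`
    println!("{} reads consumed", n);
    // println!("{}", reads.len()); // would not compile: `reads` was moved

    let chunks = vec![
        vec![String::from("ACGT")],
        vec![String::from("GG"), String::from("C")],
    ];
    println!("{} bases counted in parallel", parallel_count(chunks));
}
```

Uncommenting the marked line makes the compiler reject the program, which is the mechanism that rules out use-after-free and double-free errors before the code ever runs.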

Let’s Oxidize Together

Below, we explore some features of the Rust language by creating a program that reads GenBank files.

• Visit www.rust-lang.org/learn/get-started to install Rust.
• Copy the code from GitHub at https://github.com/jperkel/gb_read
• Execute “cargo run” in the command line to download external dependencies and compile the program. By default, the program reads the GenBank file “nc_005816.gb” from the GitHub repository, but you can read other files by supplying one when invoking “cargo run”.
• Use “cargo test” to run the tests in the repository.
• Use “cargo doc --open” to create and read documentation.

Popular development environments also have Rust plugins, such as Microsoft’s Visual Studio Code and JetBrains’ IntelliJ. There is also an online Rust Playground that allows real-time experimentation with Rust code. David Lattimore, who lives in Sydney, created a Rust kernel for Jupyter notebooks, providing an interactive environment similar to Python’s REPL.

Another significant advantage for Rust programmers is its ecosystem of third-party packages (which Rust calls “crates”), which currently number more than 50,000 (see “Rust is Becoming More Popular” below). These packages encapsulate algorithms from fields such as bioinformatics (Köster’s Rust-Bio), geosciences (the Geo-Rust project), and mathematics (nalgebra). However, Nichols said, “If the library you want doesn’t have a Rust version, that’s a significant disadvantage of Rust.” Of course, programmers can sometimes use Rust’s “foreign function interface” to bridge the gap.

Figure: Rust is Becoming More Popular

Source: http://www.modulecounts.com
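
To make the “cargo test” and “cargo doc” steps in “Let’s Oxidize Together” more concrete, here is a small sketch of the kind of code those commands act on. It is a hypothetical example, not code from the gb_read repository: the function `locus_name`, the test module, and the sample header line are all invented for illustration. It relies only on the fact that a GenBank record begins with a `LOCUS` line whose second field is the locus name.

```rust
// Hypothetical sketch; none of these names come from the gb_read repository.

/// Returns the locus name from a GenBank `LOCUS` header line,
/// or `None` if the line is not a `LOCUS` line.
///
/// Text in `///` comments like this one is what `cargo doc` renders as HTML.
pub fn locus_name(line: &str) -> Option<&str> {
    let mut fields = line.split_whitespace();
    match fields.next() {
        Some("LOCUS") => fields.next(),
        _ => None,
    }
}

fn main() {
    // Made-up header line, for illustration only.
    let line = "LOCUS       EXAMPLE01    120 bp    DNA";
    match locus_name(line) {
        Some(name) => println!("locus: {}", name),
        None => println!("not a LOCUS line"),
    }
}

// Unit tests live beside the code; `cargo test` compiles and runs them.
#[cfg(test)]
mod locus_tests {
    use super::*;

    #[test]
    fn parses_locus_name() {
        assert_eq!(locus_name("LOCUS  EXAMPLE01  120 bp"), Some("EXAMPLE01"));
    }

    #[test]
    fn rejects_other_lines() {
        assert_eq!(locus_name("DEFINITION  something"), None);
    }
}
```

Keeping documentation comments and tests in the same file as the function is a common Rust convention, which is part of why a single tool can build, test, and document a project.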

Oxidizing Code

However the programming experience unfolds, one point is undeniable: Rust is incredibly fast. In May 2020, Heng Li, a bioinformatician at the Dana-Farber Cancer Institute in Massachusetts, tested various programming languages on a computational biology task that required reading 5.7 million sequence records. Rust outperformed C to take first place. “If we want to write a high-performance parallel computing program that is fast and memory-efficient, Rust is the ideal choice,” Li said.

Luiz Irber, a bioinformatician at the University of California, Davis, rewrote in Rust (“oxidized,” in Rust slang) a tool called Sourmash, which is used for sequence searching and classification, to make the code easier to maintain, to take advantage of modern programming features, and to make it usable on the web, he said.

After team member Avi Srivastava returned from an internship at the California biotechnology company 10x Genomics, where he developed open-source tools in Rust, Patro’s team, led by graduate student Hirak Sarkar, used Rust to write a gene expression analysis tool called Terminus. “The beauty of Rust is that it makes debugging incredibly simple because memory management is so much better,” said Srivastava, who now works at the New York Genome Center.

For many Rust programmers, the human side of the language is also very appealing. Hauck, a member of the LGBT+ community, said the Rust user community has made efforts to welcome her. This community, she said, “has always strived to be an inclusive community: understanding the impact of diversity, and very understanding of how to write and enforce a code of conduct.”

“That is probably one of the most important reasons I continue to use Rust,” Hauck said. “The Rust community is just amazing.”

The original article was published under the title “Why scientists are turning to Rust” in the Technology Feature section of Nature on December 1, 2020.

Copyright Notice:

This article was translated by the Springer Nature Shanghai office. The Chinese content is for reference only; all content is subject to the original English version. Feel free to share it with friends, and for reprints, please email [email protected]. Unauthorized translations are considered copyright infringement, and the copyright holder reserves the right to pursue legal responsibility.

© 2021 Springer Nature Limited. All Rights Reserved

Cover image source: Goran Ivos on Unsplash

