Decoding: The Real Reason Python is Taking the Universe by Storm

Author: Jeff Knupp

Translation: Wu Lei, Huo Jing

It is well known that Python is currently the most widely used and fastest-growing programming language. Its elegant, concise syntax and strong third-party library support are often cited as the reasons it thrives across industries. What you may not know is that there is a deeper story behind the rapid growth of Python’s user base.

Let us start with the rise of big data in recent years to unveil the true reason behind Python’s popularity.

The Frustrated Big Data Programmer

With the rise of big data, most industries found themselves in a state of panic: they had spent a great deal of time and money building big data pipelines, yet the return on investment was very low. Under relentless competitive pressure they were collecting ever more data, but most companies had no clear plan for what to do with it. At the time, almost everyone believed that simply storing large amounts of data would make later analysis easier and that its business value would become apparent. It may sound foolish today, but most people really did believe that once enough data had been collected, the patterns and information within it would emerge on their own.

Enter the ‘Data Scientist’

Then, almost simultaneously, the industry woke up: the insights it wanted and the questions it wanted answered required rigorous mathematical analysis and validation. SQL queries could reveal the most obvious patterns and trends, but extracting the most useful information required a completely different set of skills rooted in statistics and applied mathematics. Such talent seemed to exist only in academia. Moreover, the people analyzing these massive datasets needed not only a very strong mathematical background but also the ability to write software. This explains why the title ‘data scientist’ began appearing so frequently on job boards.

The ‘Web Development Language War’ Between Ruby and Python

Going back a bit: before big data truly took off, Ruby and Python were locked in a fierce battle to become the most popular ‘web development language.’ Both are well suited to building web applications. Ruby’s popularity was closely tied to the Rails framework; in that era, most people who called themselves ‘Ruby programmers’ were really ‘Rails programmers.’ Python, meanwhile, was already well established in academia and a handful of industries. Its closest counterpart to Rails is Django, which was first released about a year after Rails and lagged far behind it in popularity at the time.


Many believed that Python and Ruby were quite similar in capability and that only one of them would ultimately win the ‘web development language war.’ In reality, Ruby’s popularity was closely tied to Rails, while Django represented only a small part of an already vibrant Python ecosystem. As it turned out, the ‘web development language war’ mattered far less than people expected. Even though Ruby, in many respects, won that battle with Rails, it did not stop Python from becoming the most popular language today. Why is that?

The Major Contribution of Oliphant

To unravel this mystery, we need to introduce a key figure: Travis Oliphant. Back in 2006, Oliphant was an assistant professor at BYU and had not yet founded Anaconda (note: Anaconda is one of the most successful commercial data science platforms based entirely on Python). A year earlier, he had created NumPy, building on the earlier scientific computing library Numeric. He is also a founding contributor to SciPy and has served as a director of the PSF.

In 2006, he submitted PEP 3118 with Carl Banks, which was a revision of Python’s ‘buffer protocol.’ This laid an important foundation for Python’s rise.

Python’s Buffer Protocol: The Primary Reason for Python’s Global Popularity

The buffer protocol was (and still is) a very low-level API that lets other libraries directly manipulate memory buffers. These are buffers that the interpreter creates and uses to store certain types of data in contiguous memory (initially, mainly ‘array-like’ data types and data structures whose size is known in advance).
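As a rough illustration of what such a buffer looks like from the Python side, the sketch below uses only the standard library: `array.array` stands in for an ‘array-like’ type, and `memoryview` is the built-in object that requests a buffer through the protocol. The variable names are illustrative and not taken from the original article.

```python
from array import array

# A typed array stores its elements in one contiguous block of memory.
values = array("i", [10, 20, 30, 40])

# memoryview() asks the object for its buffer through the buffer protocol.
view = memoryview(values)

info = {
    "format": view.format,          # struct-style type code of each element
    "itemsize": view.itemsize,      # bytes per element
    "ndim": view.ndim,              # number of dimensions
    "shape": view.shape,            # number of elements along each dimension
    "contiguous": view.contiguous,  # True when the memory has no gaps
}
print(info)
```

The metadata printed here (element format, item size, shape, contiguity) is the kind of information a C extension receives when it requests a buffer from an object.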

The main motivations for such an API are to eliminate the need to copy data that is only being read, to clarify the semantics of buffer ownership, and to keep data in contiguous memory (even for multidimensional data structures), where read access is very fast. The ‘other libraries’ that use this API are generally written in C and are highly performance-sensitive. The new protocol means that if I create an int array in NumPy, other libraries can access the underlying memory buffer directly instead of going through an indirect interface or copying the data first.
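The zero-copy behavior described above can be sketched without NumPy, again assuming the standard library's `array.array` as a stand-in for a numerical array. The example contrasts a `memoryview` slice, which shares memory with the original, with an ordinary slice, which copies, and then reinterprets the same buffer as a two-dimensional grid. It is a minimal sketch, not how NumPy or any particular C extension is implemented.

```python
from array import array

data = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# A memoryview shares the array's memory; slicing the view does not copy.
view = memoryview(data)
tail = view[3:]

# Writing through the view changes the original array in place.
tail[0] = 40.0

# A plain slice of the array, by contrast, is an independent copy.
copied = data[3:]
copied[1] = -1.0

# The same buffer can be reinterpreted as a 2 x 3 grid without copying:
# cast to bytes first, then to doubles with a new shape.
grid = view.cast("B").cast("d", shape=[2, 3])

print(data.tolist())
print(grid.tolist())
```

The write through `tail` is visible in both `data` and `grid`, while the change to `copied` is not, because only the plain slice allocated new memory.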

Now the question arises: what type of programmer would benefit from fast, zero-copy access to large amounts of data?

Of course, it is the data scientists!

Let us summarize the development process:

  • Driven by early work on the NumPy project, Oliphant and Banks proposed a revision of Python’s buffer protocol to simplify direct access to the underlying memory of certain data structures.

  • PEP 3118 (https://www.python.org/dev/peps/pep-3118/) was submitted, recognized, and implemented.

  • Thanks to the implementation of PEP 3118, Python quietly became a very attractive language on which to build numerical computing libraries. Many such libraries were then developed as C extensions (note: through the buffer protocol, C extensions can share and manipulate data easily).

  • Python and Ruby faced off on the web, with most believing that the ‘web development language war’ would have a clear outcome.

  • With the price of magnetic storage devices plummeting, storing large amounts of data for future analysis became feasible (as data became very cheap, it was best to save it first without even considering what to analyze).

  • The kind of programmer in demand changed: people with a statistics background, preferably with applied mathematics training and some prior programming experience, began to be snatched up. The era of the data scientist had arrived!

  • Data scientists wanted a language that was both expressive and fast (with good numerical computing library support), and all these needs pointed to Python.

As we have seen, Python has become immensely popular, rising to become the most favored programming language.


If you have read this far and are already a Python fan, take a moment to be secretly happy, as you now have a deeper understanding of the magic of Python.

If you want to become a data scientist but haven’t yet dived into Python, are you feeling a bit eager now?

