Why Our Company Still Uses Python for Development

Author: Wada Xiwa

https://www.zhihu.com/question/278798145/answer/3416549119

In recent years, I have often seen some large companies that heavily used Python migrating to other language technology stacks. However, what about those small companies/small teams?

I have always wondered how companies that still rely on Python at a meaningful business scale actually develop with the Python stack: what difficulties and lessons they have run into, and what practices have worked well for them.

By chance, I saw a response to the question “Why do software companies rarely use Python for web development?” on Zhihu, and I would like to share it with everyone.


Response:

I have been using Python for over 10 years now. The longest-running project I maintain is an e-commerce platform with an annual transaction volume of several hundred million yuan. Concurrency is not high, usually a few dozen requests; during holidays it can exceed 100, and I have never seen it reach 200 at peak. The database holds around 50 million orders in total, growing by tens of thousands every day. The project has been running for seven or eight years, still on Python 2.7 + Django 1.8, with no plans to upgrade.

Currently we run 1 server with 4 cores and 8GB and 3 servers with 8 cores and 16GB on Alibaba Cloud. The database and Redis are also hosted on Alibaba Cloud, costing about 50,000 yuan a year, and we use Qiniu for the CDN, which adds several tens of thousands more. Three programmers, including myself, maintain it, and we ship new features almost every week. After several years of adjustments, I estimate that less than 70% of the code is still in effective use; the rest has been abandoned for business reasons.

In 2021, we developed another system using Python 3.8 + Django 3, which usually has low traffic but spikes during holidays. So far, the highest record is 350 orders per minute, with a daily total of 150,000 orders, generating approximately 15 million in transaction volume that day. Normally, we have two servers with 8 cores and 16GB, and during holidays, we scale up to 6 servers, with temporary upgrades for the database and others. There are four programmers, including myself, maintaining it.

We also have a few smaller projects that haven’t gained much traction, typically with two people responsible for one project, and one person handling two projects in a cross-functional manner.

Currently, the entire company’s backend technology stack is Python + Django + Gunicorn (with a small project using Tornado). The company has accumulated some foundational frameworks based on Django, and the staff is not large, with about 14-15 programmers who are generally familiar with this framework. Newcomers usually go through a process of strengthening their Python fundamentals -> learning Django -> learning the company’s framework -> entering project development.

The company places particular emphasis on naming conventions and code style, and enforces them during code reviews. Once everyone became familiar with the conventions, the team developed a good tacit understanding, so the drawbacks of Python as a dynamic language have not shown up in any significant way.

In the early days, due to lack of experience, we encountered crashes when concurrency increased slightly. Later, we upgraded the database (which was originally self-built) and implemented some Redis caching, which has significantly reduced such occurrences.

Some queries constructed by Django are overly complex or not optimized, leading to slow queries. The current solution is to regularly monitor slow logs and find the code that needs optimization. The database also needs to be upgraded according to business needs. This is actually the same for any language.
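As a concrete illustration of the slow-log monitoring mentioned above, MySQL's built-in slow query log can capture the offending queries. A minimal configuration sketch (the file path and threshold are illustrative, not the author's actual settings):

```ini
# my.cnf – log every query slower than 1 second (values illustrative)
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 1
```

Reviewing this log periodically points you straight at the ORM call sites that need optimization.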

I have encountered most programming languages, but I feel that Python allows me to express my intentions to the computer as easily as my mother tongue.

From my perspective, once programmers become familiar with Python, they only need to understand the business and convert requirements into code without spending too much time on the technical aspects. Python has a rich library ecosystem, and most problems encountered have existing solutions. Django’s ORM is also excellent, allowing programmers to operate the database conveniently without worrying about table structure changes, complex queries, etc.

Of course, there are drawbacks: Python is relatively heavyweight, and as projects grow larger and more complex, startup times lengthen and memory usage climbs.

Django’s ORM, while convenient, can also lead to inefficient code. For example, I often see people constructing overly complex queries that result in too many joins, leading to long query times, or retrieving unnecessary fields all at once, as well as excessive data queries within for loops.
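The "queries inside for loops" problem is the classic N+1 pattern. A minimal sketch with a fake, query-counting database layer (all names hypothetical) shows why batching, which Django's select_related/prefetch_related do for you, matters:

```python
class FakeDB:
    """Stand-in for a database that counts round trips."""
    def __init__(self):
        self.query_count = 0
        self.orders = [{"id": i, "customer_id": i % 3} for i in range(9)]
        self.customers = {i: {"id": i, "name": "customer-%d" % i}
                          for i in range(3)}

    def fetch_orders(self):
        self.query_count += 1
        return list(self.orders)

    def fetch_customer(self, customer_id):
        self.query_count += 1
        return self.customers[customer_id]

    def fetch_customers(self, customer_ids):
        self.query_count += 1
        return {cid: self.customers[cid] for cid in customer_ids}

db = FakeDB()

# N+1 style: one query for the orders, then one more per order.
for order in db.fetch_orders():
    db.fetch_customer(order["customer_id"])
n_plus_one_queries = db.query_count   # 1 + 9 round trips

# Batched style (what select_related/prefetch_related achieve).
db.query_count = 0
orders = db.fetch_orders()
customers = db.fetch_customers({o["customer_id"] for o in orders})
batched_queries = db.query_count      # 2 round trips
```

With 9 orders the loop costs 10 round trips versus 2 when batched, and the gap grows linearly with the data.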

However, I believe these drawbacks are not fatal. Compared to labor costs and development efficiency, the cost of adding cloud server resources is very low. Moreover, most projects fail before they ever reach the scale where serious performance optimization is necessary. Coding standards and usage patterns can be improved through training, and the quality of most people's code gradually improves.

Besides web applications, we also use Python on some hardware devices (mostly Linux-based single-board computers like Raspberry Pi, 7688, etc.). The advantage is that development can be done on a computer and then directly deployed to the device without needing to hire embedded engineers. After encapsulating the hardware calls, any backend developer in the company can develop.
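A minimal sketch of the encapsulation idea, with a fake pin driver standing in for a real GPIO library such as RPi.GPIO; the class names, pin numbers, and wiring are hypothetical:

```python
class FakePins:
    """Test double recording pin writes; a real board would inject
    an RPi.GPIO-backed implementation instead."""
    def __init__(self):
        self.state = {}

    def write(self, pin, value):
        self.state[pin] = value

class DoorLock:
    """High-level device API that backend developers code against,
    never touching GPIO calls directly."""
    LOCK_PIN = 18  # hypothetical wiring

    def __init__(self, pins):
        self.pins = pins

    def unlock(self):
        self.pins.write(self.LOCK_PIN, 1)

    def lock(self):
        self.pins.write(self.LOCK_PIN, 0)

pins = FakePins()
door = DoorLock(pins)
door.unlock()
```

Because the hardware backend is injected, the same business code runs on a developer's laptop (with the fake) and on the device (with the real driver).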

We also use Python for image processing, web scraping, automated testing, and CI/CD.

For small teams, as long as standards are established, quality is emphasized, and you keep paying attention and optimizing, Python's low entry barrier and high development efficiency are worth far more than its hard-to-quantify performance cost.

I did not expect this response to attract so much attention, so I’ll add a few more points.

The system mentioned earlier, the one handling 350 orders per minute, primarily aggregates orders from several food delivery platforms so that merchants can dispatch couriers through an aggregated delivery platform. The core of the system is synchronizing delivery orders and the management features built around them.

Order notifications from the food delivery platforms (new orders, order status changes, etc.) notify our system via HTTP requests. Initially, we implemented a synchronous approach, meaning that upon receiving a request, we would call the food delivery platform’s order query interface (and several accompanying interfaces) to obtain order details, create orders in the database, and then respond. Due to the large number of network requests, it took considerable time, and as soon as the concurrency increased slightly, we couldn’t handle it. We tried running multiple machines and processes, but the effect was minimal.

I remember that in the early days, processing 30 orders per minute was basically the limit; exceeding that led to noticeable slow responses. However, the food delivery platforms required us to respond within a specified timeframe, so this synchronous processing approach couldn’t last long before we hit a significant bottleneck. We tried multi-threaded task queues, but the results were unsatisfactory and there was a risk of task loss.

Later, we used Celery. After receiving a notification, we placed the message in the Celery queue and returned immediately, allowing Celery worker processes to handle it gradually, thus avoiding overload during peak times. Since placing messages in the Celery queue is a very quick operation, the system can respond to food delivery platform notifications instantly.
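The hand-off pattern described above can be sketched with the standard library, with queue.Queue standing in for Celery and its broker (the function names are hypothetical): the HTTP handler enqueues and returns immediately, while worker threads drain the backlog.

```python
import queue
import threading

task_queue = queue.Queue()
processed = []

def worker():
    """Stand-in for a Celery worker process draining the broker."""
    while True:
        order_id = task_queue.get()
        if order_id is None:          # shutdown sentinel
            break
        # Here the real system would call the platform's order-query
        # APIs and write the order to the database.
        processed.append("order:%d" % order_id)
        task_queue.task_done()

def handle_platform_notification(order_id):
    """HTTP handler: enqueue (like task.delay()) and return at once."""
    task_queue.put(order_id)
    return "OK"   # the delivery platform gets its response immediately

workers = [threading.Thread(target=worker) for _ in range(2)]
for t in workers:
    t.start()

for oid in range(5):
    handle_platform_notification(oid)

task_queue.join()                     # wait until the backlog drains
for _ in workers:
    task_queue.put(None)
for t in workers:
    t.join()
```

The enqueue is nearly free, so response latency to the platform no longer depends on how slow the downstream API calls are.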

Based on the backlog of messages, we adjust the number of Celery worker processes accordingly and can allocate different queues based on message priority, ensuring that new order notifications are processed in a timely manner so that merchants are aware of new orders needing attention.

Initially, we used Redis for Celery’s message distribution, but later switched to RabbitMQ for easier monitoring. After several years of iteration, we are relatively confident in handling peak periods during holidays. We can temporarily increase cloud resources as needed, and Celery’s worker processes are designed for automatic scaling. In principle, unless we encounter extremely extreme situations, we believe we can handle it.
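A hedged configuration sketch of the routing and autoscaling described above; the app, task, and queue names are all hypothetical, and the settings follow Celery's documented `task_routes` option:

```python
# celery_app.py – route high-priority tasks to their own queue
# (app/task/queue names are hypothetical).
from celery import Celery

app = Celery("delivery", broker="amqp://guest@localhost//")

app.conf.task_routes = {
    "tasks.sync_new_order": {"queue": "orders_high"},     # new-order notifications
    "tasks.sync_status_change": {"queue": "orders_low"},  # everything else
}

# Workers are then started per queue with autoscaling, e.g.:
#   celery -A celery_app worker -Q orders_high --autoscale=10,2
```

Dedicated workers on the high-priority queue mean a backlog of status-change messages can never delay new-order notifications.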

In addition to the aforementioned systems, about seven or eight years ago, we used Python 2 + Django 1.8 to develop a data reporting system for the government. Each year, we would open it for a week for companies to report data, with around 4,000-5,000 companies participating, each filling out seven or eight forms. I didn’t pay attention to the concurrency at that time, but conservatively estimated it would be in the dozens.

Initially, we ran it using Django’s built-in runserver mode (due to inexperience), which easily led to stalling. After switching to Gunicorn and running multiple processes, we no longer faced language-level stalling issues. When it was slow, it was likely due to high database load or MongoDB data aggregation.
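For reference, the fix is just the launch command. A typical Gunicorn invocation, where the project name is hypothetical and the worker count follows the common 2 × cores + 1 rule of thumb for a 2-core box:

```shell
# Never serve production traffic with `manage.py runserver`;
# run the WSGI app under Gunicorn with multiple worker processes.
gunicorn myproject.wsgi:application --workers 5 --bind 127.0.0.1:8000 --timeout 30
```

Multiple worker processes sidestep the single-threaded dev server entirely, which is why the language-level stalls disappeared.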

The server configuration was not high, only 2 cores and 8GB, running Python Web, MySQL, MongoDB, and several other application processes. This system ran for three years, and in the fourth year, due to changes in government relations, another company was assigned to redevelop it. Their functionality was not as extensive as ours and was more cumbersome; I don’t know what language they used.

For this project, the Python + MongoDB approach provided us with great flexibility because the data reported each year is different, and the statistical indicators also vary. The entire system supports customizable reporting forms, data validation, data import/export, and custom statistics. I feel it would be very difficult to achieve similar results with another language, or it would require a significantly higher cost.
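The flexibility the author describes comes from treating each year's form as data rather than code. A minimal pure-Python sketch (field names and validation rules are hypothetical); the submission dict can then be stored in MongoDB as-is, so new indicators need no schema migration:

```python
# Each year's report form is data, not code (all names hypothetical).
form_def = {
    "name": "annual_report",
    "fields": [
        {"key": "revenue",   "type": "number", "required": True},
        {"key": "employees", "type": "number", "required": True},
        {"key": "notes",     "type": "text",   "required": False},
    ],
}

def validate(form_def, submission):
    """Validate a submission dict against a form definition."""
    errors = []
    for field in form_def["fields"]:
        value = submission.get(field["key"])
        if value is None:
            if field["required"]:
                errors.append("%s: required" % field["key"])
            continue
        if field["type"] == "number" and not isinstance(value, (int, float)):
            errors.append("%s: must be a number" % field["key"])
    return errors

ok_errors = validate(form_def, {"revenue": 1000, "employees": 12})
bad_errors = validate(form_def, {"revenue": "a lot"})
```

Changing next year's indicators then means editing the form definition, not migrating tables.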

Of course, this system did not require much maintenance; it was essentially a one-time development, and we only needed to ensure it remained accessible. At that time, I led a junior programmer in its development. I handled the core architecture and most of the code implementation, while he worked on simpler logic, UI, and table definitions, which he might not have fully understood given the complexity of my code. The maintainability of such complex systems largely depends on standards, documentation, and training, rather than language-level type constraints.

We also developed an internal office system for a travel agency, primarily targeting Southeast Asian travel agencies. It supports multiple languages and currencies, covering almost all daily operations of a travel agency, including planning, group formation, hotel and transportation bookings, shopping, guiding, customer management, accounting, revenue, finance, reports, and charts.

This was also built with Python 2 + Django 1.8. We deployed an independent web process and database for each travel agency (each agency had its own database name, but all ran on a single MySQL instance). Each web process consumed about 170MB of memory after startup. On 2-core, 8GB machines, each machine could serve about 40 clients. A typical client had around 10 daily active users, while larger agencies could have 20-30 employees operating simultaneously, most of them viewing or entering data, so we estimated concurrency per agency would not exceed 10.

At the beginning of each month, when all agencies were doing accounting and data exporting, they occasionally reported slowdowns. From my observations, most were related to database performance issues. Our solution was to advise clients to wait a while before exporting (or if the data to be exported was large, to do it at night). As long as a few agencies staggered their data exports, there were no issues.

To save costs, we also self-hosted the database on cloud servers, pushing the limits of the cloud servers. During the day, the servers typically ran at over 80% CPU utilization and over 90% memory usage, with CPU usage peaking during data exports.

Before 2020, we had over 100 clients, but tourism was largely halted during the three years of the pandemic. This year, we have started to recover some clients, but the situation is nowhere near what it was, as revenues have plummeted and clients’ willingness to pay has significantly decreased.

I started working with Python around 2012, after using Java and C# extensively. Java was used for Android and web development, while C# was for Windows desktop applications and Windows Phone development.

Back then I found the XML configuration of the Java SSH stack (Struts + Spring + Hibernate) very cumbersome; I wonder if Spring is more convenient now. Framework issues aside, I feel Java itself is relatively verbose.

C# feels better than Java, but it falls short in cross-platform scenarios. Therefore, I now choose C# only for desktop applications; otherwise, I wouldn’t choose it, especially since most desktop applications are now web-based.

Python, on the other hand, is simple enough to let people focus on the business, which is why the company chose Python as its primary language (even though at the time I was less familiar with Python than with Java and C#). The entire team has since shifted to Python, and I believe that with the development of AI, Python will only become more popular.

A few years ago, it was relatively difficult to recruit Python developers; most came from other languages and gradually adapted to Python. In recent years, there have been more people with a basic understanding of Python (thanks to public account advertisements?), but most still need to go through a familiarization and strengthening process.

Generally, those who have worked on a project for more than half a year become quite proficient, while those who are slower may take over a year. The key factor is interest; some people genuinely enjoy programming and will continue to make progress in their spare time.

Different team experiences cannot be fully replicated. I am one of the founders of the company, and I essentially determine the technical direction, so there are no issues with turnover.

I am still very interested in using Python to solve most of the technical problems we encounter, including how to standardize processes and train people to improve code control and skill levels.

In summary, over the years, we have accumulated a certain amount of technical and management experience, and to be honest, I feel less confident about switching to other languages.

