Hello, I'm Cat Brother. In recent years I have often seen large companies that made heavy use of Python migrate to other technology stacks, but what about small companies and small teams?
I have always wanted to know how the companies that still stick with Python, and support business at a certain scale with it, actually develop on the Python stack: what difficulties and lessons they run into, and what good practices they have accumulated.
By chance, I came across an answer on Zhihu to the question “Why do software companies rarely use Python for web development?”, and I would like to share it with everyone here.
Author: Wada Xiwa
Source: https://www.zhihu.com/question/278798145/answer/3416549119
Copyright Statement: This article is reproduced by Python Cat with the original author's permission. The title and layout have been lightly edited for readability. Copyright belongs to the original author; if you wish to reprint it, please contact the original author.
I have been using Python for more than 10 years. The project I have maintained the longest is an e-commerce platform with an annual transaction volume of several hundred million. Its concurrency is not large: usually dozens of concurrent requests, over a hundred during holidays, and never more than 200 at the highest I have seen. The orders table holds about 50 million rows at most, growing by tens of thousands a day. The project has been running for seven or eight years, still on Python 2.7 + Django 1.8, with no plans to upgrade.
On Alibaba Cloud we currently run one 4-core/8GB server and three 8-core/16GB servers; the database and Redis are also on Alibaba Cloud, costing about 50,000 a year in total. We use Qiniu for the CDN, which costs another several tens of thousands a year. Three programmers, myself included, maintain the system, and we add new features almost every week. After several years of adjustments, I estimate that less than 70% of the code is still in active use; some of it has been abandoned for business reasons.
In 2021, I developed another system using Python 3.8 + Django 3. Traffic is usually light but spikes during holidays. The record so far is 350 orders per minute and 150,000 orders in a single day, with nearly 15 million in transactions that day. We normally run two 8-core/16GB servers, temporarily scale out to six during holidays, and upgrade the database temporarily as well.
There are also several small projects that have not really taken off and do not carry much traffic; roughly one person is responsible for each project, with some people covering two projects at once.
Currently, the whole company's backend stack is Python + Django + Gunicorn (one small project uses Tornado). The company has accumulated some in-house framework code on top of Django, and since the company is not large, its fourteen or fifteen programmers are all basically familiar with that framework. New hires generally go through the same path: strengthening Python basics -> learning Django -> learning the company framework -> entering project development.
The company is fairly strict about naming, style, and the like, and pays close attention to them during code review. Once everyone is familiar with the conventions there is a good shared understanding, so the disadvantages of Python as a dynamic language have not shown up much.
In the early days, due to lack of experience, the system would falter when concurrency rose. Later we upgraded the database (it was self-hosted at first) and added Redis caching, which has made such incidents far rarer.
Some of the queries Django constructs are too complex or simply unoptimized, which leads to slow queries. Our current practice is to monitor the slow-query log regularly and optimize the offending code as it shows up; the database itself is also upgraded as the business requires. This is really the same in any language.
I have worked with most programming languages, but Python is the one that lets me express my ideas to the computer as naturally as my mother tongue.
As for Python itself, my feeling is that once a programmer is familiar with it, they only need to understand the business and turn requirements into code, without spending much time on technical issues. Python has a wealth of libraries, and most problems you hit already have ready-made solutions. Django's ORM is also excellent: programmers can operate the database easily without worrying about table-structure changes or complex queries.
There are disadvantages too. Python is relatively heavyweight: as a project grows larger and more complex, startup time increases and memory usage grows.
Django's ORM brings convenience but also invites inefficient code. I frequently see complex querysets that join too many tables and take a long time to run, code that fetches unnecessary fields wholesale, and queries issued inside for loops (the classic N+1 problem), among other things.
Still, I do not think these disadvantages are fatal, because compared with labor costs and development efficiency, the cost of adding cloud servers is extremely low. Moreover, most projects fail before they ever reach the stage where performance optimization matters. Conventions and usage habits can be improved through training, and the quality of most people's code improves gradually.
Beyond the web itself, we also use Python on hardware devices (mostly Linux single-board computers such as the Raspberry Pi, the 7688, and so on). The benefit is that code can be developed on a computer and then run directly on the device, with no need to hire dedicated embedded engineers; once the hardware-calling parts are encapsulated, basically any backend developer in the company can do the development.
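The "encapsulate the hardware calls" idea can be sketched like this; `DoorSwitch` and `_write_gpio` are hypothetical names, and the GPIO write is stubbed out (on a real board it would go through something like RPi.GPIO or a sysfs write):

```python
class DoorSwitch:
    """Hides board-specific details behind an API any backend dev can use."""

    def __init__(self, pin: int):
        self.pin = pin
        self.is_on = False

    def _write_gpio(self, value: int) -> None:
        # Stub: a real implementation would set the pin level here.
        pass

    def turn_on(self) -> None:
        self._write_gpio(1)
        self.is_on = True

    def turn_off(self) -> None:
        self._write_gpio(0)
        self.is_on = False

# Web-side code only ever sees turn_on()/turn_off(), never GPIO details.
switch = DoorSwitch(pin=17)
switch.turn_on()
on_state = switch.is_on
switch.turn_off()
off_state = switch.is_on
```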
We also use Python for image processing and recognition, web scraping, automated testing, and CI/CD.
For small teams, Python's low barrier to entry and high productivity are worth far more than some hard-to-pin-down performance loss. The premise, of course, is that you establish conventions, emphasize quality, and keep paying attention and optimizing.
I didn’t expect so many people to pay attention to this answer, so I’ll add a few more points.
The system I mentioned above that handles 350 orders per minute mainly aggregates orders from several food-delivery platforms, so that merchants can use an aggregated dispatch platform to call riders for delivery. It involves synchronizing the food-delivery orders and the dispatch orders, plus some management features.
The food-delivery platforms push order notifications (new orders, order status changes, and so on) to our system via HTTP requests. In the early days we handled these synchronously: after receiving a request we called the platform's order-query interface (and several other supporting interfaces) to fetch the order details, created the order in the database, and only then responded. Because all those network requests took considerable time, the system could not cope with even a slight rise in concurrency, and adding machines and processes made little difference.
I remember that early on, 30 orders per minute was basically the limit; beyond that, responses slowed noticeably. Meanwhile the platforms required us to respond within a fixed deadline, so the synchronous approach was bound to hit a hard bottleneck before long. We tried a multithreaded task queue, but the results were poor and tasks could be lost.
Later we adopted Celery: on receiving a notification we simply put the message onto a Celery queue and return immediately, letting Celery's worker processes handle it at their own pace, which keeps us from being overwhelmed during peaks. Because enqueueing a message is a very fast operation, the system can respond to the platforms' notifications promptly.
We adjust the number of Celery workers according to the message backlog, and we can assign different queues by message priority, ensuring that new-order notifications are processed promptly so merchants learn as soon as possible that they have new orders.
Initially we used Redis as Celery's broker, but later switched to RabbitMQ for easier monitoring. After several years of iteration we are fairly confident about holiday peaks: we can temporarily add cloud resources as needed, and the Celery workers are set to autoscale. Barring truly extreme situations, we are confident we can handle the load.
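The core pattern here (acknowledge the webhook fast, do the slow work later) does not depend on Celery specifically. Below is a stdlib-only sketch of it using a queue and a worker thread; the function names and payloads are illustrative, and production code would use Celery with a Redis or RabbitMQ broker as described above:

```python
import queue
import threading
import time

task_queue = queue.Queue()
processed = []

def handle_notification(payload):
    """The slow part: in production this calls the platform's order APIs."""
    time.sleep(0.01)                  # stand-in for network round trips
    processed.append(payload)

def worker():
    while True:
        payload = task_queue.get()
        if payload is None:           # sentinel: stop the worker
            break
        handle_notification(payload)
        task_queue.task_done()

def webhook_view(payload):
    """The HTTP handler: enqueue and acknowledge immediately."""
    task_queue.put(payload)           # enqueueing is fast
    return {"status": "ok"}           # meets the platform's response deadline

t = threading.Thread(target=worker)
t.start()
responses = [webhook_view({"order_id": i}) for i in range(5)]
task_queue.join()                     # demo only: wait for the backlog to drain
task_queue.put(None)
t.join()
```

The design point is the same as with Celery: the request handler does constant, fast work (one enqueue), so response time stays flat no matter how slow the downstream processing is.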
Apart from the above, about seven or eight years ago we used Python 2 + Django 1.8 to build a data-reporting system for the government. Each year it opens for one week so that companies can submit their data; about four to five thousand companies participate, each filling in seven or eight forms. We did not pay much attention to concurrency at the time, but a conservative estimate would be dozens.
Initially, out of inexperience, we ran it with Django's built-in runserver, and it stalled easily. After we switched to Gunicorn with several worker processes, we never again hit language-level stalls; when things were slow it was almost always high database load or MongoDB performing aggregations.
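For reference, moving from runserver to multi-process Gunicorn can be as simple as a short config file. This is a sketch with example values, not the author's actual settings:

```python
# gunicorn.conf.py -- an illustrative config file; all values are examples.
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # common rule of thumb
timeout = 60  # restart workers stuck longer than this many seconds
```

It would be launched with something like `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, where `myproject.wsgi` is a placeholder for the project's own WSGI module.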
The server configuration was modest, just 2 cores and 8GB, running the Python web app, MySQL, MongoDB, and several other application processes. The system ran for three years; in the fourth year, because of changes in the government relationship, it was redeveloped by another company. Their version had fewer features than ours and was less user-friendly, and I do not know what language they used.
In this project, the Python + MongoDB approach gave us great flexibility, because the data collected differs every year and so do the statistical indicators. The whole system supports custom forms, data validation, data import and export, and custom statistics. I suspect it would be very difficult to achieve the same results in another language, or at least it would cost considerably more.
Admittedly, this system needed very little maintenance; it was essentially a one-off build that just had to stay online afterwards. I led one junior programmer: I was responsible for the core architecture and most of the code, while he worked on the simpler logic, the UI, form definitions, and so on. He may not have found my code easy to follow. For complex systems like this, code maintainability depends far more on conventions, documentation, and training than on type constraints at the language level.
We also developed an internal office system for travel agencies, mainly targeting Southeast Asian travel agencies, supporting multiple languages and currencies, covering almost all daily operations of travel agencies, including planning, group formation, group splitting, hotel transportation, shopping, guides, clients, accounting, revenue, finance, reports, charts, etc.
This was also built with Python 2 + Django 1.8. We deploy a separate web process plus a separate database for each travel agency (each database has its own name, but they all run in one MySQL instance on one machine). Each web process occupies about 170MB of memory after startup. Using 2-core/8GB machines, each machine can serve about 40 clients. A typical client has around 10 daily users; larger agencies might have 20 to 30 employees working at the same time, mostly viewing or entering data, so per-agency concurrency probably never exceeds 10.
At the beginning of each month, when the agencies do their accounting and export data, they occasionally report stalls. From what I have seen, most of this is database-level performance. Our solution is to ask clients to wait a while before exporting (or, for large exports, to run them at night); as long as a few agencies stagger their exports, things are generally fine.
To save costs, we also built the databases on cloud servers, almost pushing the cloud servers to their limits. During the day, the servers were generally running at over 80% CPU usage and over 90% memory usage, and when exporting data, the CPU would hit 100%.
Before 2020, we had over 100 clients, but during the three years of the pandemic, tourism was basically non-existent. In recent years, we have regained some clients, but it’s nowhere near what it used to be, as income has sharply declined, and clients’ willingness to pay has also decreased significantly.
I started using Python around 2012. Before that, I mostly used Java and C#. Java was used for Android and web development, while C# was used for Windows desktop applications and Windows Phone development.
Back then I found the XML configuration of Java's SSH stack (Struts + Spring + Hibernate) quite cumbersome, and I do not know whether Spring is more convenient nowadays. Framework aside, I found Java itself rather verbose.
C# struck me as better than Java but weaker in cross-platform support, so today I would choose C# only for desktop applications and nothing else. Besides, most desktop applications nowadays are built on web technologies anyway.
Python, by contrast, feels simple enough to let you focus on the business, which is why the company chose it as its main language (even though at the time I was less familiar with Python than with Java and C#) and built the whole team around it. I also firmly believe that as AI develops, Python will only become more popular.
In earlier years it was relatively hard to recruit Python developers; most came from other languages and gradually adapted. In recent years there have been more candidates with a Python background (thanks to public-account advertisements?), but most are not very strong and still need a period of ramp-up and reinforcement.
Generally, those who are good can become quite proficient after half a year on the project, while those who are slower might take over a year. The key still lies in interest; some people just have a natural inclination towards programming and will work on their own interests after work, progressing quickly.
Different teams' experiences cannot simply be copied. I am one of the company's founders and make essentially all of the technical decisions, and we have not had problems such as key people resigning.
I am still very interested in using Python to solve most of the technical problems we encounter, including standardizing and training people so the codebase stays under control, and I am confident about that.
Overall, these years of ups and downs have accumulated some technical and management experience, and to be honest, I am not as confident about switching to another language.
That's all for today's sharing.
