Author: Wada Xiwa
https://www.zhihu.com/question/278798145/answer/3416549119
In recent years, I have often seen large companies that relied heavily on Python migrate to other tech stacks. But what about small companies and small teams? I have long wanted to understand how companies that still insist on Python, and that run businesses of real scale, develop with the Python stack: what difficulties and lessons they run into, and what good practices they have accumulated.
By chance, I saw an answer under the question “Why do software companies rarely use Python for web development?” on Zhihu, and I would like to share it with everyone.
Response:
I have been using Python for over 10 years now. My longest-maintained project is an e-commerce platform with an annual transaction volume of several hundred million. Concurrency is not high: usually a few dozen, over 100 during holidays, and never above 200 at its peak. The database holds around 50 million orders in total, growing by several thousand a day. The project is seven or eight years old and still runs Python 2.7 + Django 1.8, with no plans to upgrade.
Currently, we have one 4-core 8GB server and three 8-core 16GB Aliyun servers. The database and Redis are also on Aliyun, costing about 50,000 a year. We use Qiniu for CDN, which costs several tens of thousands a year. Three programmers, including myself, maintain the system, and we add new features almost every week. After several years of adjustments, I estimate that less than 70% of the code is still live; the rest has been orphaned by business changes.
In 2021, we developed another system using Python 3.8 + Django 3. Usually, there is not much load, but during holidays, it can spike. So far, the highest record is 350 orders per minute, with 150,000 orders in a day, and that day had a transaction volume of about 15 million. We usually configure two 8-core 16GB servers, and during holidays, we expand to six servers, and the database is also temporarily upgraded.
Additionally, we have a few small projects that haven’t taken off and carry little traffic; usually two people are responsible for each project, with each person cross-covering two projects at a time.
Currently, the entire company’s backend tech stack is Python + Django + Gunicorn (there’s a small project using Tornado). The company has accumulated some basic frameworks based on Django, and the entire company is not large, with about 14 to 15 programmers who are familiar with this framework. When new employees come in, they usually go through a process of strengthening their Python basics -> learning Django -> learning the company’s framework -> entering project development.
The company has stricter requirements on naming conventions, styles, etc., and pays more attention during code reviews. After everyone gets familiar, we all have a tacit understanding, so the disadvantages of Python as a dynamic language have not been manifested much.
In the early days, due to lack of experience, we encountered situations where the system crashed slightly under increased concurrency. Later, we upgraded the database (which was self-built in the early days) and implemented some Redis caching, which has reduced such occurrences significantly.
Some of the query languages constructed by Django are overly complex or not optimized, leading to slow queries. The current solution is to regularly monitor slow logs, identify the code causing the slowdown, and upgrade the database according to business needs. This is actually the same for any language.
I have worked with most mainstream programming languages, but Python lets me express my ideas to the computer as naturally as my mother tongue.
From my perspective, once programmers are familiar with Python, they only need to understand the business and convert requirements into code without spending too much time on technical issues. Python has a rich library, and most problems encountered have ready-to-use solutions. Django’s ORM is also great, allowing programmers to operate databases conveniently without worrying about table structure changes and complex queries.
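The convenience described can be sketched with a minimal Django model and query. This is purely illustrative: the `Merchant`/`Order` models and field names are hypothetical, not taken from the author's system.

```python
# Hypothetical models -- names are illustrative, not from the original system.
from django.db import models


class Merchant(models.Model):
    name = models.CharField(max_length=100)


class Order(models.Model):
    merchant = models.ForeignKey(Merchant, on_delete=models.PROTECT)
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    status = models.CharField(max_length=20, default="pending")
    created_at = models.DateTimeField(auto_now_add=True)


def paid_orders_today(merchant, today):
    # No hand-written SQL; the ORM builds the query and tracks schema changes
    # through migrations.
    return (Order.objects
            .filter(merchant=merchant, status="paid", created_at__date=today)
            .order_by("-created_at"))
```

A schema change is just an edit to the model plus a generated migration, which is the "no worrying about table structure changes" part of the argument.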
There are drawbacks: Python is relatively heavy, and as projects grow larger and more complex, startup time increases and memory usage grows.
While Django’s ORM brings convenience, it also leads to some inefficient code. For example, I often see people constructing overly complex queries that lead to long join times or retrieving unnecessary fields all at once, as well as a large number of data queries within for loops.
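The two inefficiencies mentioned, queries inside for loops and over-fetching, have standard ORM-level fixes. A sketch, again assuming a hypothetical `Order` model with a `merchant` foreign key:

```python
# Anti-pattern described above: one extra query per loop iteration (N+1).
for order in Order.objects.all():
    print(order.merchant.name)  # each attribute access hits the database again

# Better: join once, and fetch only the columns actually needed.
orders = (Order.objects
          .select_related("merchant")          # single JOIN instead of N queries
          .only("id", "amount", "merchant__name"))
for order in orders:
    print(order.merchant.name)                 # no extra query
```

`select_related` collapses the per-row lookups into one JOIN, and `only` avoids dragging every column across the wire, which is exactly the kind of fix that shows up when auditing slow logs.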
However, I don’t think these drawbacks are fatal because compared to labor costs and development efficiency, the cost of increasing cloud server resources is extremely low. Moreover, for performance, most projects do not reach the optimization stage before they fail. Some norms or usage methods can be improved through training, and the overall quality of the code written by most people will gradually improve.
Besides the web itself, we also use Python on some hardware devices (mostly single-board computers with Linux, such as Raspberry Pi, 7688, etc.). The advantage is that we can develop on a computer and directly run it on the device without needing to hire embedded engineers. Once the hardware calling part is encapsulated, any backend developer in the company can develop it.
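The "encapsulate the hardware calls once" idea can be sketched as a thin abstraction with a fake driver for desktop development and a real driver for the board. All class and function names here are hypothetical:

```python
# Sketch of the pattern: develop against a fake driver on a PC, swap in the
# real GPIO-backed driver on the device. Names are illustrative.


class FakeRelay:
    """Stands in for the real hardware when developing on a computer."""

    def __init__(self):
        self.state = "off"

    def switch(self, on):
        self.state = "on" if on else "off"


class GpioRelay:
    """Real driver; on the board this would call the GPIO library."""

    def __init__(self, pin):
        self.pin = pin

    def switch(self, on):
        raise NotImplementedError("only available on the board")


def make_relay(on_device=False, pin=17):
    # Backend developers only ever call relay.switch(); everything
    # board-specific hides behind this factory.
    return GpioRelay(pin) if on_device else FakeRelay()


relay = make_relay(on_device=False)
relay.switch(True)
print(relay.state)  # -> on
```

With this split, any backend developer can write and test device logic on their own machine, which matches the claim that no dedicated embedded engineer is needed.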
We also use Python in areas like image processing, web scraping, automated testing, and CICD.
For small teams, Python’s low barrier to entry and high development efficiency are worth more than hard-to-quantify concerns like the so-called performance loss. Of course, the premise is establishing norms, emphasizing quality, and keeping up continuous attention and optimization.
I didn’t expect this answer to attract so much attention, so I will add a few more points.
The system mentioned above that processes 350 orders per minute mainly aggregates orders from several food delivery platforms into the system and allows merchants to use the aggregated delivery platform to call couriers for delivery. The entire process involves synchronizing food delivery orders and delivery orders, as well as some management functions.
The order notifications from the food delivery platform (new orders, order status changes, etc.) are notified to our system via HTTP requests. In the early days, we did it synchronously, meaning that after receiving a request, we called the order query interface of the food delivery platform (and several other supporting interfaces) to obtain order detail data, create the order in the database, and then respond. Due to the large number of network requests, it took considerable time, and the system couldn’t handle even a slight increase in concurrency. We tried running multiple machines and processes, but the effect was minimal.
I remember that in the early days, processing 30 orders per minute was basically the limit; beyond that, responses slowed noticeably, while the food delivery platforms required us to respond within a specified time, so the synchronous approach soon hit a hard bottleneck. We tried in-process, thread-based task queues, but the results were unsatisfactory, and queued tasks could be lost if a process died.
Later, we used Celery. After receiving a notification, we placed the message in the Celery queue and returned, allowing Celery’s worker processes to handle it slowly, thus avoiding being overwhelmed during peak periods. Since putting messages into the Celery queue is a very fast operation, the system can respond immediately to the notifications from the food delivery platform.
Based on the message backlog situation, we appropriately adjust the number of Celery worker processes and can allocate different queues according to message priority, ensuring that new order notifications can be processed promptly, allowing merchants to know as soon as possible that there are new orders to handle.
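The webhook-plus-Celery pattern described above can be sketched as follows. The task names, queue names, and the `fetch_order_detail`/`save_order` helpers are all placeholders, not the author's actual code:

```python
# tasks.py -- sketch of the pattern described: the webhook handler only
# enqueues, worker processes make the slow platform API calls.
from celery import Celery

app = Celery("orders", broker="amqp://guest@localhost//")  # RabbitMQ broker

# Route by priority: new-order notifications drain from their own queue so
# merchants learn about new orders as fast as possible.
app.conf.task_routes = {
    "tasks.sync_new_order": {"queue": "new_orders"},
    "tasks.sync_status_change": {"queue": "status_updates"},
}


@app.task
def sync_new_order(platform, order_id):
    detail = fetch_order_detail(platform, order_id)  # slow HTTP calls (placeholder)
    save_order(detail)                               # placeholder


@app.task
def sync_status_change(platform, order_id):
    ...  # lower-priority processing
```

The webhook view then just calls `sync_new_order.delay(platform, order_id)` and returns immediately, and workers per queue scale independently, e.g. `celery -A tasks worker -Q new_orders -c 8`.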
Initially, we used Redis for Celery’s message distribution, but later switched to RabbitMQ for easier monitoring. After several years of iteration, we are relatively confident in handling peak periods during holidays. We can temporarily increase cloud resources, and the Celery worker processes have also been set to auto-scaling. In principle, unless we encounter extremely extreme situations, we are confident we can handle it.
In addition to the above, about seven or eight years ago, we used Python 2 + Django 1.8 to create a data reporting system for the government. Each year, we would open it for a week for companies to fill in their data, with about 4,000 to 5,000 enterprises participating, each filling in seven or eight forms. I didn’t pay attention to the concurrency at that time, but conservatively estimated it would be dozens.
Initially, we ran it with Django’s built-in runserver (the development server, not meant for production; again, down to inexperience), and it easily stalled. Later, after running several worker processes under Gunicorn, we no longer hit stalls at the language level. When it was slow, it was mostly due to high database load or MongoDB doing data aggregation.
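The runserver-to-Gunicorn fix amounts to a small configuration file. The values below are illustrative, not the project's actual settings (Gunicorn config files are themselves Python):

```python
# gunicorn.conf.py -- illustrative values only
bind = "0.0.0.0:8000"
workers = 4            # several OS processes instead of a single dev server
worker_class = "sync"  # Gunicorn's default synchronous worker
timeout = 30           # seconds before a stuck worker is killed and recycled
```

It would be started with something like `gunicorn myproject.wsgi:application -c gunicorn.conf.py`, where the WSGI module path is hypothetical. Multiple sync workers are what removed the language-level stalls mentioned above.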
The server configuration was not high, only 2C8G, running Python Web, MySQL, and MongoDB, along with a bunch of other application processes. This system ran for three years, and in the fourth year, due to changes in government relations, another company was brought in to redevelop it. Their features were not as extensive as ours, and it was harder to use, and I don’t know what language they used.
In this project, the Python + MongoDB approach gave us great flexibility because the data filled out each year is different, and the statistical indicators also vary. The entire system supports custom reporting forms, data validation, data import/export, and custom statistics. I feel it would be very difficult to achieve such an effect with another language, or the cost would be much higher for the same effect.
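The flexibility described, where form definitions change every year and documents are schemaless, might look like this in spirit. The field definitions and validation rules below are invented for illustration:

```python
# Sketch of "custom forms + data validation" over schemaless documents.
# A form definition is itself data, so a new year's forms need no new code.

FORM_DEF = {
    "fields": [
        {"name": "company", "type": "str", "required": True},
        {"name": "revenue", "type": "float", "required": True},
        {"name": "employees", "type": "int", "required": False},
    ]
}


def validate(form_def, doc):
    """Return a list of error messages; an empty list means the doc is valid."""
    errors = []
    types = {"str": str, "int": int, "float": (int, float)}
    for f in form_def["fields"]:
        value = doc.get(f["name"])
        if value is None:
            if f["required"]:
                errors.append(f"{f['name']} is required")
        elif not isinstance(value, types[f["type"]]):
            errors.append(f"{f['name']} should be {f['type']}")
    return errors


print(validate(FORM_DEF, {"company": "ACME", "revenue": 1.2e6}))  # -> []
```

Because both the form definition and the submitted document are plain data, storing them in MongoDB and adding new statistical indicators each year becomes a data change rather than a schema migration, which is the flexibility the author credits Python + MongoDB for.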
Of course, this system had little maintenance work; it was basically a one-time development, and we just needed to ensure it was accessible. At that time, I led a junior programmer in development. I was responsible for the core architecture and most of the code implementation, while he handled simpler logic, UI, table definitions, etc. He might not have easily understood the code I wrote. The maintainability of the code for such complex systems largely depends on norms, documentation, and training rather than language-level type constraints.
We also developed an internal office system for travel agencies, mainly targeting Southeast Asian travel agencies, supporting multiple languages and currencies, covering almost all daily operations of travel agencies, including planning, group formation, splitting groups, hotels, transportation, shopping, tour guides, customers, accounting, revenue, finance, reports, charts, etc.
This was also done with Python 2 + Django 1.8. We deployed a separate web process and a separate database for each travel agency (each agency had its own database name, but they all lived in one MySQL instance per machine). Each web process used about 170MB of memory after startup. We used 2C8G machines, and each machine could serve about 40 clients. A client typically had around 10 daily active users, with larger agencies having 20 to 30 employees operating simultaneously. Most of the activity was lookups and data entry, so each agency’s concurrency was estimated at no more than 10.
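The one-database-per-agency setup can be sketched as a settings fragment where each web process picks its database from the environment. The variable names and database naming scheme are hypothetical:

```python
# settings.py fragment -- sketch of "one process + one database per agency".
import os

TENANT = os.environ["AGENCY_CODE"]  # e.g. "042", set differently per process

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "travel_" + TENANT,   # independent database name per agency
        "HOST": "127.0.0.1",          # shared MySQL instance on the machine
        "USER": "app",
        "PASSWORD": os.environ["DB_PASSWORD"],
    }
}
```

One codebase, many processes, each fully isolated at the database level: a heavy but simple form of multi-tenancy that matches the ~40 clients per machine described above.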
At the beginning of each month, when each agency was doing accounting and exporting data, they occasionally reported stalling issues. From my observation, most of these were due to database performance issues. Our solution was to advise clients to wait a while before exporting (or if the data to be exported was large, we let them export at night). As long as a few agencies staggered their exports, there were no issues.
To save costs, we also self-built the database on cloud servers, almost squeezing the cloud servers to their limits. During the day, the servers were basically running at over 80% CPU usage and over 90% memory usage, and during data exports, the CPU would be fully utilized.
Before 2020, we had over 100 clients, but during the three years of the pandemic, we hardly did any tourism. In recent years, we have regained some clients, but it is not comparable to before, as income has sharply decreased, and clients’ willingness to pay has also declined significantly.
I started using Python around 2012, before which I mostly used Java and C#. I used Java for Android and Web development, while C# was for Windows desktop and Windows Phone development.
Before that, I found the XML configuration of Java’s SSH stack (Struts + Spring + Hibernate) very cumbersome; it felt like I was mostly writing boilerplate. I don’t know whether Spring is more convenient now, but beyond the framework hassle, I feel that Java itself is relatively verbose.
C# feels better than Java, but it falls short in cross-platform capabilities. So now, I choose C# mainly for desktop applications and won’t choose it in other situations, especially since most situations now use web applications for desktop purposes.
Python, by contrast, feels simple enough to let people focus on the business, which is why our company chose it as the primary language (even though at the time I was actually less familiar with Python than with Java and C#). The entire team has been built around Python ever since. I also believe that as AI develops, Python will only become more popular.
In previous years, it was relatively difficult to recruit Python developers. Most came from other languages and gradually adapted and became familiar with Python. In recent years, there have been more people with Python backgrounds (thanks to public account advertisements?), but most levels are still not high and require a familiarization and strengthening process.
Generally, good programmers can become relatively proficient after six months on the project, while slower ones might take over a year. The key factor is still interest; some people just have a natural inclination for programming and will make rapid progress by working on personal projects after hours.
Different teams’ experiences cannot be copied wholesale. I am one of the company’s founders and decide most technical matters, so we don’t have the problems that come with turnover in technical leadership.
I am still very interested in using Python to solve most of the technical problems we encounter, including how to standardize and lead people to make the code controllable and improve personnel skills.
Overall, over the years, I have accumulated some technical and managerial experience, and honestly, I feel less confident about switching to another language.