In recent years, we often see large companies that once relied heavily on Python migrating to other tech stacks. But what about small companies and small teams?
I have always wanted to know how companies that still stick with Python and have a reasonable scale of business actually develop with the Python stack: what difficulties and lessons they run into, and what good practices they have accumulated.
By chance, I saw an answer under the question “Why do software companies rarely use Python for web development?” on Zhihu, and I would like to share it with everyone here.
Author: Wada Xiwa
(https://www.zhihu.com/question/278798145/answer/3416549119)
Response:
I have been using Python for over 10 years. The project I have maintained the longest is an e-commerce platform with an annual transaction volume of several hundred million. Concurrency is not high: usually a few dozen, exceeding 100 during holidays, and at its peak I have never seen it go above 200. The database holds around 50 million orders in total, with thousands added daily. The project has been running for seven or eight years, still on Python 2.7 + Django 1.8, and there are no plans to upgrade.
Currently we run one 4-core/8GB server and three 8-core/16GB servers on Alibaba Cloud. The database and Redis are also on Alibaba Cloud, costing about 50,000 yuan a year; we use Qiniu for the CDN, which also costs several tens of thousands a year. Three programmers, including myself, maintain the system, and we add new features almost every week. After several years of adjustments, I estimate that less than 70% of the code is still in active use, since some of it has been abandoned for business reasons.
In 2021, we developed another system using Python 3.8 + Django 3. It usually has little traffic, but during holidays, it can spike. So far, the highest record is 350 orders per minute and 150,000 orders in a day, with a transaction volume of about 15 million that day. We usually have two servers with 8 cores and 16GB, and during holidays, we expand to six servers, with temporary upgrades to the database as well. There are four programmers maintaining it, including myself.
There are also a few small projects that haven’t taken off and don’t have much volume; roughly two people are responsible for each project, and some people cover two projects at once.
Currently, the entire backend tech stack of the company is Python + Django + Gunicorn (with a small project using Tornado). The company has accumulated a basic framework based on Django, and the total number of people in the company is not large, about fourteen to fifteen programmers, all of whom are familiar with this framework. When new people join, they generally follow a process of strengthening their Python basics -> learning Django -> learning the company framework -> entering project development.
The company is fairly strict about naming, style, and so on, and pays close attention during code reviews. Once everyone is familiar with the conventions there is a shared understanding, so the drawbacks of Python as a dynamic language rarely show up.
In the early days we had no experience, and the system would fall over under concurrent load. Later we upgraded the database (self-built back then) and added some Redis caching, which made such incidents much rarer.
Some queries built through Django’s ORM are overly complex or poorly optimized, leading to slow queries. Our current approach is to monitor the slow-query log regularly, find the offending code, and optimize it; the database itself also has to be upgraded as the business grows. This is really the same in any language.
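To make the workflow concrete: once the slow log points at a query, we look at the SQL Django generates and its execution plan. Here is a minimal sketch of that kind of check, assuming a Django 3.x project with DEBUG enabled; the Order model and its fields are hypothetical, not our actual code.

```python
from django.db import connection, reset_queries
from myapp.models import Order   # hypothetical model for illustration

reset_queries()
qs = Order.objects.filter(status="paid", created_at__year=2021)

print(qs.query)      # the SQL Django will send to the database
print(qs.explain())  # the database's execution plan (Django >= 2.1)

list(qs[:100])       # force evaluation so the query is recorded
for q in connection.queries:          # only populated when DEBUG=True
    print(q["time"], q["sql"][:120])  # spot statements that take too long
```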
I have experience with most programming languages, but I find Python allows me to express my ideas to the computer as easily as my native language.
From my perspective, once programmers are familiar with Python, they only need to understand the business and convert requirements into code without spending too much time on technical aspects. Python has a rich library, and most issues encountered have ready-made solutions available. Django’s ORM is also great, allowing programmers to easily operate the database without worrying about table structure changes, complex queries, etc.
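As a rough illustration of what that looks like in practice, here is a sketch (not our production code) assuming a standard Django 3.x project; the Order model is hypothetical.

```python
from django.db import models

class Order(models.Model):
    # Hypothetical model used only for illustration.
    order_no = models.CharField(max_length=32, unique=True)
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    status = models.CharField(max_length=16, default="pending")
    created_at = models.DateTimeField(auto_now_add=True)

# Table creation and later structure changes are handled by migrations:
#   python manage.py makemigrations && python manage.py migrate
# Day-to-day queries read like ordinary Python, with no hand-written SQL:
paid = Order.objects.filter(status="paid").order_by("-created_at")[:20]
```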
There are downsides too. Python is relatively heavyweight: as projects grow larger and more complex, startup and load times get longer and memory usage climbs.
Django’s ORM, while convenient, can produce inefficient code. For instance, it’s common to see people build complex queries that generate too many joins and take a long time to run, fetch every field when only a few are needed, or run large queries inside for loops.
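To illustrate, here is a sketch using a hypothetical Order model with a foreign key to Shop; it is not code from our projects.

```python
from myapp.models import Order   # hypothetical models: Order has a ForeignKey to Shop

# Anti-pattern: every loop iteration hits the database again (N+1 queries),
# and every column is fetched even though only a couple are needed.
for order in Order.objects.all():
    print(order.shop.name, order.amount)

# Better: join the related table once and fetch only the needed columns.
orders = (
    Order.objects
    .select_related("shop")                 # one JOIN instead of N extra queries
    .only("id", "amount", "shop__name")     # skip the columns we don't use
)
for order in orders.iterator():             # stream rows instead of loading them all at once
    print(order.shop.name, order.amount)
```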
However, I believe these downsides are not fatal, because compared with labor costs and development efficiency, the cost of adding cloud servers is extremely low. Moreover, most projects fail before they ever reach the stage where performance optimization matters. Coding standards and usage habits can be improved through training, and most people’s code quality gradually gets better.
In addition to web development, we also use Python for some hardware devices (mostly Linux-based single-board computers like Raspberry Pi, 7688, etc.). The benefit is that development can be done on a computer and then directly run on the device, eliminating the need to hire embedded engineers. After encapsulating the hardware calling parts, any backend developer in the company can develop it.
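The “encapsulation” is nothing fancy. Here is a minimal sketch of the idea, assuming a Raspberry Pi with the RPi.GPIO library; the pin number and the relay example are made up for illustration.

```python
import RPi.GPIO as GPIO

class RelaySwitch:
    """Hide the GPIO details so any backend developer only sees on()/off()."""

    def __init__(self, pin):
        self.pin = pin
        GPIO.setmode(GPIO.BCM)           # use BCM pin numbering
        GPIO.setup(self.pin, GPIO.OUT)   # configure the pin as an output

    def on(self):
        GPIO.output(self.pin, GPIO.HIGH)

    def off(self):
        GPIO.output(self.pin, GPIO.LOW)

    def cleanup(self):
        GPIO.cleanup(self.pin)

# Usage looks like any other Python object:
#   switch = RelaySwitch(pin=17)
#   switch.on()
```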
We also use Python for image processing and recognition, web scraping, automated testing, and CI/CD.
For small teams, Python’s low barrier to entry and high development efficiency are worth more than hard-to-pin-down performance losses, provided that standards are established, quality is emphasized, and you keep paying attention and optimizing.
I didn’t expect this answer to attract so much attention, so I will add a few more points.
The system I mentioned that processes 350 orders per minute mainly aggregates orders from several food delivery platforms into the system, allowing merchants to use an aggregated delivery platform to call riders for delivery. The entire process involves synchronizing food delivery orders and delivery orders along with some management functions.
Notifications of orders from the food delivery platform (new orders, order status changes, etc.) are sent to our system via HTTP requests. In the early days, we used a synchronous approach, meaning that upon receiving a request, we would call the food delivery platform’s order query interface (and several other supporting interfaces) to obtain order detail data and create an order in the database before responding. Due to the large number of network requests, this took considerable time, and when the concurrency increased slightly, it couldn’t handle it. We tried running multiple machines and processes, but it had little effect.
I remember that in the early days, being able to process 30 orders per minute was basically the limit. Any more than that would result in obvious slow responses, and the food delivery platform’s notifications required us to respond within a specified time. Hence, this synchronous processing approach couldn’t last long before hitting a significant bottleneck. We tried using multi-threaded task queues, but the results were unsatisfactory and there was a risk of task loss.
Later, we used Celery. After receiving a notification, we would place the message in the Celery queue and return immediately, allowing Celery worker processes to handle it gradually, avoiding being overwhelmed during peak times. Since placing messages in the Celery queue is a very fast operation, the system can immediately respond to notifications from the food delivery platform.
Based on the backlog of messages, we adjust the number of Celery worker processes accordingly, and we can assign different queues based on message priority. This ensures that notifications of new orders can be processed promptly, allowing merchants to know about new orders that need to be handled as soon as possible.
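Put together, the pattern looks roughly like this. It is a sketch rather than our actual code: it assumes Celery is already wired into the Django project, the helper functions are placeholders, and the queue name is made up.

```python
from celery import shared_task
from django.http import JsonResponse

def fetch_order_detail(platform, order_id):
    # Placeholder for the delivery platform's order-query (and supporting) API calls.
    return {"platform": platform, "order_id": order_id}

def save_order(detail):
    # Placeholder for creating the order record in our database.
    pass

@shared_task
def sync_new_order(platform, order_id):
    # Runs in a Celery worker process, so slow network calls don't block the webhook.
    save_order(fetch_order_detail(platform, order_id))

def order_webhook(request):
    # The webhook only enqueues the work and returns immediately,
    # so the delivery platform gets a response within its time limit.
    platform = request.POST.get("platform")
    order_id = request.POST.get("order_id")
    sync_new_order.apply_async(args=[platform, order_id],
                               queue="orders_high")   # new orders go to a high-priority queue
    return JsonResponse({"code": 0})
```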
Initially we used Redis as Celery’s message broker, but later switched to RabbitMQ for easier monitoring. After several years of iteration, we are fairly confident about handling holiday peaks: we temporarily add cloud resources as needed, and Celery’s worker processes scale automatically. In principle, as long as we don’t hit a truly extreme situation, we are confident we can handle it.
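The broker and routing side is plain configuration. Below is a sketch assuming the usual django-celery wiring (config_from_object with the CELERY namespace); the hostnames, task paths, queue names, and worker counts are invented.

```python
# settings.py (sketch)
CELERY_BROKER_URL = "amqp://user:password@rabbitmq-host:5672/myvhost"
CELERY_TASK_ROUTES = {
    "orders.tasks.sync_new_order": {"queue": "orders_high"},
    "orders.tasks.sync_order_status": {"queue": "orders_low"},
}

# Worker started with autoscaling: at most 20 processes, shrinking to 3 when idle.
#   celery -A project worker -Q orders_high,orders_low --autoscale=20,3
```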
In addition to the above-mentioned systems, about seven or eight years ago, we used Python 2 + Django 1.8 to develop a data reporting system for the government, which opens up for a week each year for enterprises to fill in data. Approximately four to five thousand enterprises fill out several forms each, and I conservatively estimate that the concurrency would also be in the dozens.
Initially, due to inexperience, we ran it with Django’s built-in runserver, and it stalled easily. After switching to several worker processes under Gunicorn, we no longer saw language-level stalls; when it was slow, it was mostly the database under heavy load or MongoDB doing aggregations.
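Since Gunicorn’s config file is itself Python, the fix amounted to a few lines. A sketch with illustrative numbers (not the original config):

```python
# gunicorn.conf.py (sketch), started with:
#   gunicorn myproject.wsgi:application -c gunicorn.conf.py
bind = "0.0.0.0:8000"
workers = 4        # several worker processes instead of runserver's single process
timeout = 60       # give up on requests that hang on the database for too long
```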
The server configuration was low, just 2 cores and 8GB, running the Python web processes, MySQL, MongoDB, and a pile of other application processes. The system ran for three years; in the fourth year, because of changes on the government side, it was redeveloped by another company. The replacement had less functionality than ours and was harder to use; I don’t know what language it was written in.
In this project, the Python + MongoDB approach provided us with great flexibility because the data reported each year is different, and the statistical indicators also vary. The entire system supports customizable reporting forms, data validation, data import/export, and custom statistics. I feel it would be very difficult to achieve such results in another language, or that achieving similar results would come at a higher cost.
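The flexibility mostly comes from storing both the form definitions and the submitted data as MongoDB documents. Here is a sketch with pymongo; the database, collection, and field names are invented for illustration.

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["report_system"]

# The form definition is just a document, so each year's forms can change
# without any table-structure migration.
db.form_defs.insert_one({
    "year": 2017,
    "name": "enterprise_basic_info",
    "fields": [
        {"key": "employees", "type": "int", "required": True},
        {"key": "revenue", "type": "decimal", "required": True},
    ],
})

# Submitted data is stored as-is and summarized with the aggregation pipeline.
db.submissions.insert_one({"year": 2017, "form": "enterprise_basic_info",
                           "employees": 58, "revenue": 1200000})
stats = db.submissions.aggregate([
    {"$match": {"year": 2017, "form": "enterprise_basic_info"}},
    {"$group": {"_id": None, "revenue_total": {"$sum": "$revenue"}}},
])
```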
Of course, this system doesn’t require much maintenance; it was basically developed once and then just needed to ensure access. At that time, I led a junior programmer in development. I was responsible for the core architecture and most of the code implementation, while he handled simpler logic, UI, and table definitions. He might have found it difficult to understand the pile of code I wrote. I believe the maintainability of such complex system code largely depends on standards, documentation, and training, rather than type constraints at the language level.
We also developed an internal office system for a travel agency, mainly targeting Southeast Asian travel agencies, supporting multiple languages and currencies, covering almost all daily operations of travel agencies, including planning, group formation, hotel transport, shopping, guides, guests, accounting, revenue, finance, reports, charts, etc.
This was also built with Python 2 + Django 1.8. We deploy a separate web process and database for each travel agency (each client has its own database, but each machine runs a single MySQL instance). Each web process uses about 170MB of memory. On 2-core/8GB machines, one machine can serve about 40 clients. A typical client has around 10 people using the system in a day; the larger agencies might have 20 to 30 employees operating at the same time, and for most clients concurrency never exceeds 10.
At the beginning of each month, when each agency is doing accounting and exporting data, they occasionally report lag issues. From my observation, most of these are performance issues at the database level. Our solution is to tell clients to wait a while before exporting (or if they need to export a lot of data, we ask them to do it at night). As long as a few agencies stagger their data exports, it’s usually fine.
To save costs, we also self-built the database on cloud servers, almost squeezing the cloud servers to their limits. During the day, the servers usually run at over 80% CPU usage and over 90% memory usage, and when exporting data, the CPU can max out.
Before 2020, we had over 100 clients, but during the three years of the pandemic, tourism was basically non-existent. The last few years might have been the quietest time for those servers. Last year and this year, we have recovered some clients, but it cannot be compared to before, as income has sharply decreased, and clients’ willingness to pay has also dropped significantly.
I started using Python around 2012, before which I primarily used Java and C#. Java was used for Android and web development, while C# was for Windows desktop and Windows Phone development.
I found the XML configuration of Java’s SSH stack (Struts + Spring + Hibernate) quite cumbersome; it felt like I was mostly writing boilerplate. I wonder if Spring is more convenient now. Framework aside, I find Java itself fairly heavyweight.
C# feels better than Java, but it lags behind in cross-platform capabilities, so I now only choose C# for desktop applications. In other cases, I wouldn’t choose it, especially since most desktop applications are now also web-based.
Python, on the other hand, is simple enough to let us focus on the business, which is why the company settled on Python as its primary language (even though at the time I was less familiar with Python than with Java and C#). The whole team was built around Python. I would also add that with the development of AI, Python is only going to get more popular.
Hiring Python developers has been relatively difficult over the years. Most of our hires came from other languages and gradually adapted to and became familiar with Python. In recent years there are more candidates with a Python background (thanks to public-account advertising?), but most of them are not very strong yet and still need a period of familiarization and reinforcement.
Generally, those who perform well on projects for more than six months can become quite proficient, while those who are slower may take over a year. More importantly, it depends on interest; some people are just inherently passionate about programming and will work on their interests after work, which leads to rapid progress.
Different teams’ experiences cannot simply be copied. I am one of the company’s founders and I set the technical direction, so staff turnover poses no risk to the tech stack.
I am very interested in using Python to solve most of the technical problems we run into, including how to set standards and mentor people so that code quality stays under control and the team keeps improving, and I am confident about that.
Overall, over the years, I have accumulated some technical and management experience through ups and downs, and to be honest, I feel less confident about switching to other languages.
(End)