Why Our Company Still Uses Python for Development

Author: Wahda Xi Wa

https://www.zhihu.com/question/278798145/answer/3416549119

In recent years, I have often seen large companies that relied heavily on Python migrate to other tech stacks. But what about small companies and small teams?

I have long wanted to know how companies that still stick with Python, and run businesses of real scale, develop on the Python stack: what difficulties they run into, what lessons they have learned, and what has worked well for them.

By chance, I saw an answer under the question “Why do software companies rarely use Python for web development?” on Zhihu, and I want to share it with everyone.

Response:

I have been using Python for over 10 years. The project I have maintained longest is an e-commerce platform with an annual transaction volume in the hundreds of millions. Concurrency is modest: usually a few dozen, over 100 during holidays, and I have never seen it reach 200 even at peak. The database holds about 50 million orders in total, growing by tens of thousands a day. The project has been running for seven or eight years, still on Python 2.7 + Django 1.8, and there are no plans to upgrade.

Currently, we run one 4-core 8GB server and three 8-core 16GB servers on Alibaba Cloud. The database and Redis are also on Alibaba Cloud; the cost is about 50,000 a year. We use Qiniu as our CDN, which costs a few tens of thousands a year. Three programmers, including myself, maintain it, and we add new features almost every week. After several years of changes, I estimate that less than 70% of the code is still in use, since some parts have been abandoned for business reasons.

In 2021, we developed another system using Python 3.8 + Django 3. The usual load is not high, but it spikes during holidays. The highest record so far is 350 orders per minute, with 150,000 orders in a day, resulting in a transaction volume of about 15 million. We usually have two 8-core 16GB servers, and during holidays, we scale up to six servers, with temporary upgrades to the database as well. There are four programmers, including myself, maintaining it.
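A quick back-of-the-envelope check on the figures above helps put the peak in perspective. This sketch only rearranges the numbers quoted in the paragraph (the currency of the transaction volume is not stated, so the order value is left unitless). Note that a 24-hour average understates how concentrated food delivery orders are around mealtimes.

```python
# Figures quoted above for the 2021 system's busiest recorded day.
peak_orders_per_minute = 350
orders_per_day = 150_000
daily_volume = 15_000_000  # "about 15 million"; currency not stated in the original

peak_per_second = peak_orders_per_minute / 60                   # ~5.8 orders/s
average_per_minute = orders_per_day / (24 * 60)                 # ~104 orders/min
peak_to_average = peak_orders_per_minute / average_per_minute   # ~3.4x
average_order_value = daily_volume / orders_per_day             # ~100 per order

print(f"peak: {peak_per_second:.1f} orders/s")
print(f"24h average: {average_per_minute:.0f} orders/min")
print(f"peak/average: {peak_to_average:.1f}x")
print(f"average order value: {average_order_value:.0f}")
```

Even at the record peak, the system sees only a handful of new orders per second, which supports the author's point that raw language speed is rarely the bottleneck at this scale.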

We also have a few smaller projects that haven’t gained much traction; each has about two people responsible for it, and some people work on two projects at once.

Currently, the entire company’s backend tech stack is Python + Django + Gunicorn (with a small project using Tornado). The company has accumulated some basic frameworks based on Django, and there are not many people in the company, about 14 or 15 programmers, all of whom are familiar with this framework. New hires usually go through a process of strengthening their Python basics -> learning Django -> learning the company framework -> entering project development.

The company has strict requirements for naming, style, and so on, and we pay close attention to them in code reviews. Once everyone is used to the conventions, the team works together smoothly, so the drawbacks of Python as a dynamic language have not been a significant problem for us.

In the early days, due to lack of experience, we encountered crashes when concurrency slightly increased. Later, we upgraded the database (which was self-built in the early days) and implemented some Redis caching, which significantly reduced such occurrences.
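The Redis caching mentioned above typically follows the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with an expiry. The sketch below is a minimal illustration under that assumption; TTLCache is a hypothetical in-process stand-in for Redis so the example runs without a server. In a Django project the same shape would use Django's cache framework (cache.get(key) and cache.set(key, value, timeout)) configured with a Redis backend.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for Redis: stores values together with an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock  # injectable so expiry can be tested deterministically

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, timeout: float) -> None:
        self._data[key] = (self._clock() + timeout, value)


def cached(cache: TTLCache, key: str, timeout: float, loader: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or load it (e.g. from the database) and cache it.

    Note: a loader that returns None is never cached, because None signals a miss.
    """
    value = cache.get(key)
    if value is None:
        value = loader()  # the expensive query only runs on a miss
        cache.set(key, value, timeout)
    return value
```

For example, calling cached(cache, "product:42", 60, load_product) twice within a minute runs load_product only once; the key format and timeout here are illustrative, not taken from the original system.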

Some queries generated by Django’s ORM are overly complex or unoptimized, which leads to slow queries. Our current approach is to check the slow query log regularly, find the code responsible, and optimize it; the database itself also gets upgraded as the business requires. This is true whatever language you use.

I have experience with most programming languages, but I find that Python allows me to express my intentions to the computer as easily as my mother tongue.

From my perspective, once programmers become familiar with Python, they only need to understand the business and turn requirements into code, without spending much time on technical details. Python has a rich library ecosystem, so most problems already have ready-made solutions. Django’s ORM is also excellent: it lets programmers work with the database easily, without having to worry much about schema changes or write complex queries by hand.

There are drawbacks too: Python applications get rather heavy, especially as projects grow larger and more complex, with longer startup times and higher memory usage.

Django’s ORM, while convenient, also makes it easy to write inefficient code. For example, I often see people build overly complex queries with too many joins that take a long time to run, fetch fields they don’t need, or run large numbers of queries inside for loops.
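The "queries inside for loops" problem is usually called the N+1 query problem. The sketch below reproduces it with Python’s built-in sqlite3 module so it runs anywhere; the shop/orders tables are hypothetical stand-ins for two Django models where Order has a ForeignKey to Shop. In Django itself, the usual fixes are select_related() for foreign keys (which produces a JOIN), prefetch_related() for reverse and many-to-many relations, and only() or values() to avoid fetching unneeded fields.

```python
import sqlite3

# In-memory schema standing in for two hypothetical Django models: Order -> Shop.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shop (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, shop_id INTEGER REFERENCES shop(id), amount INTEGER);
    """
)
conn.executemany("INSERT INTO shop (id, name) VALUES (?, ?)",
                 [(i, f"shop-{i}") for i in range(1, 4)])
conn.executemany("INSERT INTO orders (shop_id, amount) VALUES (?, ?)",
                 [(i % 3 + 1, i * 10) for i in range(1, 11)])
conn.commit()

executed = []  # every statement we send, like a debug toolbar's query counter


def run(sql, params=()):
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def report_n_plus_one():
    """One query for the orders, then one more per order for its shop: 1 + N queries."""
    rows = []
    for order_id, shop_id, amount in run("SELECT id, shop_id, amount FROM orders ORDER BY id"):
        (name,) = run("SELECT name FROM shop WHERE id = ?", (shop_id,))[0]
        rows.append((order_id, name, amount))
    return rows


def report_join():
    """A single JOIN, roughly the kind of SQL that select_related("shop") produces."""
    return run(
        "SELECT o.id, s.name, o.amount FROM orders o JOIN shop s ON s.id = o.shop_id ORDER BY o.id"
    )
```

With 10 orders, report_n_plus_one() sends 11 statements while report_join() sends 1, and both return the same rows; with thousands of orders the gap grows linearly.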

However, I don’t consider these drawbacks fatal, because the cost of adding cloud server resources is small compared with labor costs and development efficiency. Moreover, most projects fail before they ever reach the point where optimization matters. Many of these problems can be addressed by setting standards and training people, and most people’s code quality improves over time.

Besides web development, we also use Python on some hardware devices (mostly single-board computers running Linux, such as the Raspberry Pi and 7688 boards). The advantage is that code can be developed on a computer and then run directly on the device, so we don’t need to hire embedded engineers. Once the hardware access code is encapsulated, any backend developer in the company can work on it.
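"Encapsulating the hardware calling parts" can be as simple as hiding the device behind a small interface with a fake implementation for development machines. The sketch below is hypothetical (Relay, SysfsGpioRelay, FakeRelay and open_gate are illustrative names) and drives a relay through the legacy Linux sysfs GPIO files; that interface is deprecated in newer kernels in favor of the GPIO character device (libgpiod), so treat the backend as a placeholder.

```python
import os
from abc import ABC, abstractmethod


class Relay(ABC):
    """What application code sees; no hardware details leak through."""

    @abstractmethod
    def set(self, on: bool) -> None: ...

    @abstractmethod
    def is_on(self) -> bool: ...


class SysfsGpioRelay(Relay):
    """Drives a relay through the legacy sysfs GPIO interface.

    Assumes the pin was already exported and configured as an output (for example
    by a startup script); base_path is configurable so the class can be tested.
    """

    def __init__(self, pin: int, base_path: str = "/sys/class/gpio") -> None:
        self._value_path = os.path.join(base_path, f"gpio{pin}", "value")

    def set(self, on: bool) -> None:
        with open(self._value_path, "w") as f:
            f.write("1" if on else "0")

    def is_on(self) -> bool:
        with open(self._value_path) as f:
            return f.read().strip() == "1"


class FakeRelay(Relay):
    """In-memory stand-in used when developing on a laptop."""

    def __init__(self) -> None:
        self._on = False

    def set(self, on: bool) -> None:
        self._on = on

    def is_on(self) -> bool:
        return self._on


def make_relay(pin: int) -> Relay:
    # Use real hardware only when the sysfs node exists; otherwise fall back to the fake.
    if os.path.exists(f"/sys/class/gpio/gpio{pin}/value"):
        return SysfsGpioRelay(pin)
    return FakeRelay()


def open_gate(relay: Relay) -> None:
    """Business logic written once; runs unchanged on the laptop and on the board."""
    relay.set(True)
```

The design choice is that only the small backend classes know about the hardware, so backend developers can write and test business logic like open_gate against FakeRelay.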

We also use Python in areas such as image processing and recognition, web scraping, automated testing, and CI/CD.

For small teams, Python’s low barrier to entry and high productivity outweigh its performance cost, which is often hard to pin down in practice, provided that standards are established, quality is emphasized, and the code is continuously monitored and optimized.

I didn’t expect this answer to attract so much attention, so I will add a few more points.

The system mentioned earlier, which handles 350 orders per minute, mainly aggregates orders from several food delivery platforms into our system, so that merchants can request riders through an aggregated delivery platform. The whole process involves synchronizing food delivery orders and delivery orders, plus some management functions.

Food delivery platforms notify our system of order events (new orders, order status changes, etc.) via HTTP requests. In the early days we handled these synchronously: upon receiving a request, we called the platform’s order query API (and several other supporting APIs) to fetch the order details, created the order in the database, and only then responded. Because of all these network requests, each notification took considerable time, and we couldn’t cope with even a slight increase in concurrency. We tried running more machines and processes, but it had little effect.

I remember that in the early days, 30 orders per minute was basically our limit; beyond that, things slowed down badly. Since the food delivery platforms require a response within a set time, this synchronous approach hit a hard bottleneck and could not be sustained for long. We tried multi-threaded task queues, but the results were unsatisfactory, and there was a risk of losing tasks.

Later, we adopted Celery. After receiving a notification, we put the message into the Celery queue and returned immediately, allowing Celery worker processes to handle it gradually, thus avoiding overload during peak times. Since putting messages into the Celery queue is a very fast operation, the system can respond instantly to notifications from the food delivery platform.
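The core idea is "acknowledge fast, process later". The sketch below shows that shape with the standard library only: a queue.Queue stands in for the Celery broker and a thread stands in for a Celery worker. Unlike a real broker such as Redis or RabbitMQ, this in-memory queue loses messages if the process dies, which is exactly the task-loss risk described above. The names handle_notification and fetch_order_details are hypothetical. In Celery itself, the enqueue step would be a call like process_notification.delay(payload) on a function decorated with @app.task.

```python
import queue
import threading
import time

# Stand-in for the broker; in-memory, so messages are lost if the process crashes.
task_queue = queue.Queue()
processed = []


def handle_notification(payload: dict) -> dict:
    """Webhook handler: enqueue and acknowledge immediately."""
    task_queue.put(payload)        # fast: no outbound network calls here
    return {"status": "accepted"}  # the platform gets its response within its deadline


def fetch_order_details(order_id: str) -> dict:
    """Hypothetical slow call to the delivery platform's order query API."""
    time.sleep(0.01)
    return {"order_id": order_id, "items": []}


def worker() -> None:
    """Plays the role of a Celery worker: drains the queue at its own pace."""
    while True:
        payload = task_queue.get()
        if payload is None:  # sentinel: stop the worker
            task_queue.task_done()
            break
        details = fetch_order_details(payload["order_id"])
        processed.append(details)  # in the real system: create or update the order row
        task_queue.task_done()
```

A usage example: start threading.Thread(target=worker), call handle_notification() for each incoming request, and the responses return immediately while the worker catches up in the background.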

Based on the backlog of messages, we adjust the number of Celery worker processes accordingly and can assign different queues based on message priority, ensuring that new order notifications are processed promptly, allowing merchants to know about new orders that need handling as soon as possible.
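Routing messages to different queues by type lets new-order notifications jump ahead of less urgent work. The sketch below is a standard-library illustration of the idea: the routing table and queue names are hypothetical, and the worker simply drains the highest-priority non-empty queue first. In Celery, routing is configured with the task_routes setting and workers are bound to queues with the -Q option; running dedicated worker processes for the high-priority queue is a common way to guarantee capacity for it.

```python
from collections import deque
from typing import Deque, Dict, Optional

# Hypothetical routing table, similar in spirit to Celery's task_routes setting.
ROUTES = {
    "order.created": "high",
    "order.status_changed": "default",
    "report.export": "low",
}
QUEUE_ORDER = ["high", "default", "low"]  # polled in this order

queues: Dict[str, Deque[dict]] = {name: deque() for name in QUEUE_ORDER}


def publish(message: dict) -> str:
    """Put a message on the queue its type routes to; unknown types go to 'default'."""
    name = ROUTES.get(message["type"], "default")
    queues[name].append(message)
    return name


def next_message() -> Optional[dict]:
    """Take the oldest message from the highest-priority non-empty queue."""
    for name in QUEUE_ORDER:
        if queues[name]:
            return queues[name].popleft()
    return None


def backlog() -> Dict[str, int]:
    """Per-queue backlog, the signal used to decide how many workers to run."""
    return {name: len(q) for name, q in queues.items()}
```

Strict priority like this can starve the low queue under sustained load, which is another reason separate worker pools per queue are often preferred.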

Initially we used Redis as Celery’s message broker, but later switched to RabbitMQ because it is easier to monitor. After several years of iteration, we now have a good sense of how to handle holiday peaks: we temporarily add cloud resources as needed, and the Celery worker processes are set to autoscale. Barring truly extreme situations, we are confident we can handle the load.
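Celery has a built-in worker option for autoscaling the process pool, for example celery -A proj worker --autoscale=10,3 (maximum 10, minimum 3; "proj" is a placeholder). Adding machines for holidays happens outside Celery, and a simple sizing rule based on backlog might look like the hypothetical helper below. It is an illustration of the idea, not Celery’s own autoscaling algorithm.

```python
import math


def desired_workers(backlog: int, tasks_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Hypothetical sizing rule: enough workers to clear the backlog, clamped to [min, max]."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil(backlog / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With tasks_per_worker=50 and bounds of 2 to 16, an empty queue keeps 2 workers, a backlog of 240 asks for 5, and a backlog of 1,000 is capped at 16.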

In addition to the projects above, about seven or eight years ago we developed a data reporting system for the government using Python 2 + Django 1.8. Each year the system opened for one week for enterprises to submit data; around 4,000 to 5,000 enterprises took part, each filling in seven or eight forms. We didn’t closely monitor concurrency, but a conservative estimate is in the dozens.

Initially we ran it with Django’s built-in runserver (again, due to lack of experience), and it lagged frequently. After we switched to running several processes under Gunicorn, we no longer had language-level lag. When it was slow, the cause was mostly high database load or MongoDB data aggregation.
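Django’s runserver is meant for development only; in production, a WSGI server such as Gunicorn runs several worker processes. Gunicorn reads its settings from a Python config file, so a minimal one looks like the sketch below. The values are illustrative assumptions, not the author’s actual settings; the worker formula is the common starting point suggested in the Gunicorn documentation.

```python
# gunicorn.conf.py: illustrative values, not the author's actual configuration.
import multiprocessing

bind = "127.0.0.1:8000"  # typically placed behind Nginx or a cloud load balancer
workers = multiprocessing.cpu_count() * 2 + 1  # (2 x cores) + 1 as a starting point
worker_class = "sync"  # Gunicorn's default: each worker handles one request at a time
timeout = 30  # seconds before an unresponsive worker is killed and restarted
max_requests = 1000  # recycle each worker after this many requests to contain memory growth
max_requests_jitter = 100  # randomize recycling so workers don't all restart together
```

It would be started with something like gunicorn -c gunicorn.conf.py myproject.wsgi:application, where myproject is a placeholder for the Django project package.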

The server configuration was modest, just 2C8G (2 CPU cores, 8 GB of RAM), running the Python web app, MySQL, MongoDB, and several other application processes. The system ran for three years; in the fourth year, because of changes in government relations, it was redeveloped by another company. Their version had fewer features and was less user-friendly than ours, and I don’t know what language they used.

In this project, the combination of Python + MongoDB provided us with great flexibility because the data reported each year varies, and the statistical indicators also change. The entire system supports customizable reporting forms, data validation, data import/export, and custom statistics. I feel it would be very difficult to achieve such results with other languages, or the cost would be significantly higher.
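The flexibility described here comes from treating form definitions as data: each year’s forms can be stored as documents and edited without code changes or table migrations. The sketch below assumes a hypothetical definition format (field key, type, required, min) and shows generic validation plus a simple custom statistic; it is not the author’s actual schema. The cleaned dictionary could then be stored as-is, for example with pymongo’s insert_one().

```python
from typing import Any, Dict, List

# A hypothetical form definition, of the kind that could live in MongoDB
# and change every year without touching the code.
REPORT_FORM = {
    "name": "annual_output",
    "fields": [
        {"key": "company_name", "type": "str", "required": True},
        {"key": "employees", "type": "int", "required": True, "min": 0},
        {"key": "revenue", "type": "float", "required": False, "min": 0},
    ],
}

CASTS = {"str": str, "int": int, "float": float}


def validate(form: Dict[str, Any], submission: Dict[str, Any]) -> Dict[str, Any]:
    """Return {"data": cleaned values, "errors": {field key: message}}."""
    cleaned: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for field in form["fields"]:
        key = field["key"]
        raw = submission.get(key)
        if raw in (None, ""):
            if field.get("required"):
                errors[key] = "required"
            continue
        try:
            value = CASTS[field["type"]](raw)  # convert form input to the declared type
        except (TypeError, ValueError):
            errors[key] = f"expected {field['type']}"
            continue
        if "min" in field and value < field["min"]:
            errors[key] = f"must be >= {field['min']}"
            continue
        cleaned[key] = value
    return {"data": cleaned, "errors": errors}


def total(submissions: List[Dict[str, Any]], key: str) -> float:
    """A simple custom statistic: sum one field across validated submissions."""
    return sum(s.get(key, 0) for s in submissions)
```

Adding a field next year would mean editing REPORT_FORM (or its stored document), not the validation code.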

Of course, this system needed little maintenance; it was essentially developed once, after which we only had to keep it available. At the time, I led a junior programmer on the development: I was responsible for the core architecture and most of the code, while he handled simpler logic, the UI, form definitions, and so on. He might not have found the complex code I wrote easy to understand. I believe the maintainability of complex system code like this depends largely on standards, documentation, and training rather than on type constraints at the language level.

We also developed an internal office system for travel agencies, mainly targeting Southeast Asian travel agencies, supporting multiple languages and currencies, covering almost all daily operations of travel agencies, including planning, group formation, hotel transportation, shopping, guides, guests, accounting, revenue, finance, reports, charts, etc.

This was also built with Python 2 + Django 1.8. We deployed a separate web process and database for each travel agency (each agency had its own database, but each machine ran a single MySQL instance). Each web process used about 170MB of memory at runtime. On 2C8G machines, each machine could serve about 40 clients. Typically each client has around 10 daily users, while larger agencies may have 20 to 30 employees working simultaneously. We estimate that concurrency for most clients does not exceed 10.
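The memory figures explain why about 40 clients per machine is the practical limit. The arithmetic below uses only the numbers in the paragraph, treating 8 GB as 8 × 1024 MB and assuming, as the wording suggests, that each machine also hosts its MySQL instance.

```python
# Figures from the paragraph above; "2C8G" means 2 CPU cores and 8 GB of RAM.
rss_per_process_mb = 170
clients_per_machine = 40
machine_ram_mb = 8 * 1024

web_total_mb = rss_per_process_mb * clients_per_machine  # 6,800 MB for the web processes
headroom_mb = machine_ram_mb - web_total_mb  # what is left for MySQL and the OS

print(f"web processes: {web_total_mb} MB ({web_total_mb / machine_ram_mb:.0%} of RAM)")
print(f"left for MySQL + OS: {headroom_mb} MB")
```

Roughly four-fifths of the RAM goes to the web processes alone, leaving under 1.4 GB for MySQL and the operating system, so memory rather than CPU sets the density.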

At the beginning of each month, when clients are doing accounting and data exporting, they occasionally report lagging issues. From my observations, most of these are performance issues at the database level. Our solution is to tell clients to wait a while before exporting (or if they need to export a lot of data, we advise them to do it at night). As long as several clients stagger their data exports, there are usually no issues.

To save costs, we also self-built the database on cloud servers, almost pushing the cloud servers to their limits. During the day, the servers typically run at over 80% CPU usage and over 90% memory usage, with CPU hitting 100% during data exports.

Before 2020 we had over 100 clients, but the pandemic halted tourism for three years, which was probably the most relaxed period those servers have ever had. Over the last two years we have won back some clients, but nothing like before: their revenue has dropped sharply, and so has their willingness to pay.

I started using Python around 2012, after using Java and C# extensively. I used Java for Android and web development, and C# for Windows desktop applications and Windows Phone development.

Back then, I found Java’s SSH stack (Struts + Spring + Hibernate) cumbersome with all its XML configuration; it felt like most of what I wrote was boilerplate. I wonder whether Spring is easier now. Framework issues aside, I find Java relatively verbose.

C# feels better than Java, but it falls short on cross-platform support, so nowadays I choose C# only for desktop applications. And these days, most desktop applications are web-based too.

Python, on the other hand, is simple enough to let people focus on the business, which is why our company chose it as our primary language (even though at the time I knew Python less well than Java and C#). The whole team has since moved to Python, and I believe Python will become even more popular as AI develops.

A few years ago, recruiting Python developers was fairly difficult; most hires came from other languages and gradually adapted to Python. In recent years there have been more candidates with a Python background (thanks to ads on WeChat official accounts?), but most of them are still not very skilled and need a period of familiarization and training.

Generally, programmers with good skills can become proficient in about half a year, while those who learn slower may take over a year. More importantly, it depends on their interest; some people genuinely enjoy programming and will work on their personal projects after hours, allowing them to progress quickly.

One team’s experience cannot be fully replicated by another. I am one of the company’s founders, so I set the technical direction, and staff turnover has not been a problem for us.

I am keen to solve most of the technical problems we encounter with Python, and equally interested in how to establish standards and train people so that code quality stays under control and people keep improving.

In summary, over the years, I have accumulated some technical and management experience, and to be honest, I feel less confident when it comes to using other languages.
