Recently I have been busy with a request to downsize several cloud hosts. The work touched four major middleware components (Redis, RocketMQ, RabbitMQ, and Nginx) and nearly 50 Tomcat services, covering critical business areas such as orders, payments, and file storage. In an environment this complex, any oversight could affect service availability. I spent nearly a week planning, anticipating what could go wrong and writing out the solutions in advance, which came to a full 10 pages of documentation. After reviewing it, the client remarked: “You truly are the right person for this!” Many of the situations I ran into are specific to this environment, but the general lessons from the process are worth sharing.

The most important point is: before making adjustments, do not rush into action; carry out a comprehensive assessment first. This is a universal principle. However small a routine operation is, treat the production environment with respect.

The first step is to thoroughly confirm application information. The operations involved are not complex, just basic commands, but do not underestimate them; this step is mandatory. Since the system was deployed by someone else a long time ago, many details were unclear. So first use ps aux | grep service_name to check the processes of each service, use netstat -ntlp to record the listening ports (such as those of Redis, RocketMQ, and RabbitMQ), and locate the core configuration files with commands like whereis (e.g., Redis’s /etc/redis.conf). The configuration file matters a great deal, because you do not know whether the previous operator started the service with a file in some arbitrary location. None of this can be skipped; it ensures that every later step has a clear basis.

The second step is backing up critical data. The importance of backups goes without saying; basically everyone in this field understands it, and those who don’t have already had to run for it! Make local backups of the Redis persistence files, the core Tomcat directories, and so on, to prevent data loss during the downsizing and to add a “double insurance” for business continuity. Since there are many Tomcat instances, it is best to write a shell script to automate this so that nothing is missed.
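As a rough illustration of the kind of backup script suggested here, the following is a minimal sketch. The function name backup_dir, the archive naming scheme, and the example paths in the comments are assumptions made for illustration, not the script actually used in this project.

```shell
#!/bin/sh
# backup_dir SRC DEST_DIR
# Archives the directory SRC into DEST_DIR as a timestamped .tar.gz and
# prints the archive path. Returns non-zero if any step fails.
backup_dir() {
    src="${1%/}"
    dest_dir="$2"
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1

    # Turn /app/tomcat-7001 into app_tomcat-7001 so that instances with the
    # same directory name under different parents do not collide.
    name=$(printf '%s' "$src" | sed 's|^/||; s|/|_|g')
    archive="$dest_dir/$name-$(date +%Y%m%d-%H%M%S).tar.gz"

    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing the archive is a cheap check that it is readable.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

# Example loop (hypothetical paths):
#   for d in /app/tomcat-7001 /app/tomcat-7002; do
#       backup_dir "$d" /data/backup || echo "backup failed: $d" >&2
#   done
```

Returning a non-zero status on any failure lets the calling loop report exactly which instance was not backed up, which matters when there are dozens of them.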

The third step is where the real work begins: carrying out the adjustments. Work through the services by type and keep the process under strict control, proceeding in the order “middleware → Tomcat services → special tasks.” Each type of service follows the same flow: “stop service → downsizing operation → restart and verify.” Do not be impatient at this step, and do not assume anything. In a long-running production environment, some problems will be far stranger than you can imagine, as any experienced operator will have seen.
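The verification at the end of this flow can be made less error-prone by comparing the listening ports before and after the change. The sketch below assumes the column layout that netstat -ntlp prints on typical Linux systems; the function names and the /tmp file paths are made up for illustration.

```shell
#!/bin/sh
# listening_ports: reads `netstat -ntlp` output on stdin and prints the
# listening TCP port numbers, one per line, sorted and de-duplicated.
listening_ports() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
             n = split($4, parts, ":")   # local address, e.g. 0.0.0.0:6379 or :::7001
             print parts[n]              # the port is the part after the last colon
         }' | sort -u
}

# missing_ports BEFORE_FILE AFTER_FILE
# Prints ports present in BEFORE_FILE but absent from AFTER_FILE.
# Both files must come from listening_ports (comm needs sorted input).
missing_ports() {
    comm -23 "$1" "$2"
}

# Before the change:  netstat -ntlp | listening_ports > /tmp/ports.before
# After the restart:  netstat -ntlp | listening_ports > /tmp/ports.after
#                     missing_ports /tmp/ports.before /tmp/ports.after
```

An empty result from missing_ports means every port that was listening before is listening again; anything it prints is a service that still needs attention.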

Next, I will briefly discuss the operations for each service and the points to pay attention to:

For Redis, the main task is to establish which configuration file the running instance actually uses. Stop the service with the officially recommended command redis-cli --raw -p port -a password SHUTDOWN. Afterwards, confirm that the process is gone with ps aux | grep redis and that the listening port has disappeared with netstat -ntlp. After downsizing, start the service with /usr/local/bin/redis-server /etc/redis.conf & and use netstat -ntlp again to verify that the port is listening normally. Make sure the configuration file in the startup command is the one the instance was using before, to avoid service anomalies caused by a wrong configuration. None of this is difficult; these are all basic tasks, and you can refer to a previous article: A Simple Tutorial on Redis.
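One way to settle which configuration file the running instance uses is to ask Redis itself: the server section of the INFO command includes a config_file field. A small sketch follows; the function name redis_config_file and the port and password placeholders in the usage comment are assumptions.

```shell
#!/bin/sh
# redis_config_file: reads `redis-cli INFO server` output on stdin and prints
# the value of the config_file field. The value is empty when the server was
# started without a configuration file.
redis_config_file() {
    # INFO output uses CRLF line endings, so strip the carriage returns first.
    tr -d '\r' | awk '/^config_file:/ { sub(/^config_file:/, ""); print }'
}

# Usage before shutting the instance down (placeholders, adjust as needed):
#   redis-cli -p 6379 -a "$REDIS_PASSWORD" INFO server | redis_config_file
```

Recording this value before the SHUTDOWN gives a concrete path to pass to redis-server afterwards instead of relying on the default location.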

For RocketMQ: to stop the service, first run the custom script /app/rocketmq/bin/mystop.sh, then use mqshutdown broker and mqshutdown namesrv as a fallback to make sure the processes are completely stopped. After downsizing, start it with /app/rocketmq/bin/mystart.sh and focus on verifying that the ports are listening normally, so that the service can serve external clients.
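To confirm that nothing is left running after the stop commands, the process list can be filtered for the RocketMQ main classes (BrokerStartup for the broker, NamesrvStartup for the name server). This is a minimal sketch: the function name is an assumption, and it only inspects whatever ps output is piped into it.

```shell
#!/bin/sh
# rocketmq_leftovers: reads `ps -ef` output on stdin and prints any lines that
# belong to a RocketMQ broker or name server. Succeeds (exit 0) if at least
# one such line is found and fails otherwise.
rocketmq_leftovers() {
    grep -E 'org\.apache\.rocketmq\.(broker\.BrokerStartup|namesrv\.NamesrvStartup)'
}

# Stop sequence from the text, followed by the check:
#   /app/rocketmq/bin/mystop.sh
#   mqshutdown broker
#   mqshutdown namesrv
#   if ps -ef | rocketmq_leftovers; then
#       echo "RocketMQ processes still running" >&2
#   fi
```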

For RabbitMQ: use systemctl status rabbitmq-server to confirm the running status of the service. Stop it with systemctl stop rabbitmq-server, start it after downsizing with systemctl start rabbitmq-server, and finally verify that the ports recorded before the adjustment are listening again, to ensure the message queue service is stable.

For Nginx: control the start/stop order to reduce the risk of errors. The machine running Nginx is associated with three front-end Tomcat services, so the order of operations needs particular care. Stop the service with /app/nginx/sbin/nginx -s stop and make sure it has shut down completely. After downsizing, start Nginx first (command: /app/nginx/sbin/nginx -c /app/nginx/conf/nginx.conf), then start the associated Tomcat services, to avoid Tomcat errors caused by Nginx not being ready, which would affect front-end access.
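The startup order described here can be captured in a small wrapper so it is not left to memory. The sketch below also runs nginx -t first, which only tests the configuration without starting the server. The function name start_frontend and the Tomcat directories in the usage comment are hypothetical.

```shell
#!/bin/sh
# start_frontend NGINX_BIN NGINX_CONF TOMCAT_DIR...
# Tests the Nginx configuration, starts Nginx, then starts each Tomcat in the
# order given. Stops at the first failure so Tomcat is never started when
# Nginx did not come up.
start_frontend() {
    nginx_bin="$1"
    nginx_conf="$2"
    shift 2

    "$nginx_bin" -t -c "$nginx_conf" || { echo "nginx config test failed" >&2; return 1; }
    "$nginx_bin" -c "$nginx_conf" || { echo "nginx failed to start" >&2; return 1; }

    for dir in "$@"; do
        "$dir/bin/startup.sh" || { echo "failed to start $dir" >&2; return 1; }
    done
}

# Usage (Tomcat paths are placeholders):
#   start_frontend /app/nginx/sbin/nginx /app/nginx/conf/nginx.conf \
#       /app/tomcat-a /app/tomcat-b /app/tomcat-c
```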

Tomcat service adjustments (this is the focus of this requirement): standardization while accommodating personalized needs.

This adjustment involves multiple groups of Tomcat services (one or two per machine, on ports such as 7001, 7002, 7003, 7005, 7006, 7007, 7008, 7009, 7010, and 7011), and the procedure is essentially the same for all of them. Stop each instance with <tomcat directory>/bin/shutdown.sh and start it with <tomcat directory>/bin/startup.sh, checking the port status with netstat -ntlp after each operation to confirm that the service has fully stopped or started.
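Because shutdown.sh typically returns before the JVM has fully exited, it can help to poll the port rather than check it once. The following is a sketch built on a generic retry helper; the function names, the 30-second timeout, and the example directory are assumptions, and port_closed relies on netstat being installed.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds; fails after the timeout.
wait_until() {
    limit="$1"
    shift
    waited=0
    until "$@"; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# port_closed PORT: succeeds when `netstat -ntl` shows no listener on PORT.
# Caveat: if netstat is missing, there is no output and the port is reported
# as closed.
port_closed() {
    ! netstat -ntl 2>/dev/null | awk -v p="$1" '
        $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
        END { exit !found }'
}

# stop_tomcat TOMCAT_DIR PORT
# Runs shutdown.sh, then waits up to 30 seconds for the port to close.
stop_tomcat() {
    "$1/bin/shutdown.sh" || return 1
    wait_until 30 port_closed "$2"
}

# Usage (hypothetical directory): stop_tomcat /app/tomcat-7001 7001
```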

There was also a small additional requirement: automatically delete business logs so that the disks do not run out of space. This is relatively simple; you can refer to a previous article: Practical Linux Operations: Multi-Dimensional Strategies to Prevent Disk Full from Logs.
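A common way to implement this kind of cleanup is find with an age filter, run from cron. The sketch below is an assumption about how it could look, not the job actually deployed here: the function name, the *.log pattern, the retention period, and the paths are all illustrative.

```shell
#!/bin/sh
# purge_old_logs DIR DAYS
# Deletes regular files named *.log under DIR whose age matches find's
# `-mtime +DAYS` rule (older than DAYS full days). Prints each file it removes.
purge_old_logs() {
    dir="$1"
    days="$2"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -type f -name '*.log' -mtime +"$days" -print -exec rm -f {} +
}

# Example: purge_old_logs /app/logs 7
# Example crontab entry (hypothetical script path), running daily at 02:30:
#   30 2 * * * /usr/local/bin/purge_logs.sh
```

Restricting the match to regular files with a specific suffix keeps the job from touching anything other than logs, even if the directory argument turns out to be broader than intended.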

Finally, to reiterate: the work I took on this time consisted entirely of basic operations and involved no particularly advanced technology. The difficulty lies in how quickly you can deal with problems when something unexpected happens; that is what the client values. Otherwise, someone to run the commands can be found anywhere. In short, the only way to take on this kind of work with confidence is to have solid fundamentals.
