Whether for databases or applications, when issues arise, knowing the exception stack information can greatly assist in identifying the root cause and fundamentally resolving the problem.
The stack information from MySQL can similarly aid in problem-solving. This article from the technical community, 《MySQL Fault Reporting Core Dump Collection Process》 introduces the process of collecting core dumps in MySQL, which can be attempted in practical work.
1. Configure Core Dump
A core file is an exception stack file retained when a program exits abnormally, helping developers quickly locate the issue. Therefore, for online programs, it is necessary to enable the core dump feature. This can be done by configuring core-file or coredumper to enable core dump functionality.
1.1 Core-file
We need to add the following configuration in the [mysqld] section of my.cnf.
[mysqld]
core-file
innodb_buffer_pool_in_core_file=OFF
By default, the generated core file will appear in the corresponding datadir directory.
Before starting mysqld_safe, the operating system also needs some configuration:
echo "core.%p.%e.%s" > /proc/sys/kernel/core_pattern
ulimit -c unlimited
sudo sysctl -w fs.suid_dumpable=2
1.2 Coredumper
If there are no root privileges to make global changes to the operating system, the following configuration can be used.
[mysqld]
coredumper
innodb_buffer_pool_in_core_file=OFF
Before starting mysqld_safe, the operating system also needs some configuration:
ulimit -c unlimited
This will only affect the MySQL user.
The core file will be generated in the datadir directory.
2. Preliminary Analysis On-Site
2.1 Analyze Error Log
First, search for the keyword <span>backtrace</span>
in the error log to find the exception stack.
67912 Build ID: 0eaf4b944b1dbc99a26f9343f301e033bdedeb1d
67913 Server Version: 8.0.25-15-mysqlcluster5.0.7-GA MySQL Cluster, Release GA, Revision b43d2b2e462
67914
67915 Thread pointer: 0x7f914ec27000
67916 Attempting backtrace. You can use the following information to find out
67917 where mysqld died. If you see no messages after this, something went
67918 terribly wrong...
67919 stack_bottom = 7f91608d8c30 thread_stack 0x46000
67920 /mysql/svr/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x200c33d]
67921 /mysql/svr/mysql/bin/mysqld(handle_fatal_signal+0x37b) [0x1169e4b]
67922 /lib64/libpthread.so.0(+0xf5d0) [0x7f91d95c55d0]
67923 /mysql/svr/mysql/lib/plugin/ha_mysql.so(mysql::ha_mysql::build_order_list_string(String&)+0x6c) [0x7f915951aa4c]
67924 /mysql/svr/mysql/lib/plugin/ha_mysql.so(mysql::ha_greatpart::rnd_init_low(bool)+0x1ac) [0x7f915959765c]
67925 /mysql/svr/mysql/bin/mysqld(handler::ha_rnd_init(bool)+0x22) [0xcc6102]
67926 /mysql/svr/mysql/bin/mysqld(TableScanIterator::Init()+0x4e) [0xeed66e]
67927 /mysql/svr/mysql/bin/mysqld(LimitOffsetIterator::Init()+0x16) [0x122afe6]
67928 /mysql/svr/mysql/bin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x281) [0x10a95a1]
67929 /mysql/svr/mysql/bin/mysqld(Query_expression::execute(THD*)+0x2f) [0x10a994f]
67930 /mysql/svr/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x516) [0x1029376]
67931 /mysql/svr/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x9e0) [0xfcce70]
67932 /mysql/svr/mysql/bin/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x4f1) [0xfd0961]
67933 /mysql/svr/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1843) [0xfd26b3]
67934 /mysql/svr/mysql/bin/mysqld(do_command(THD*)+0x210) [0xfd3c70]
67935 /mysql/svr/mysql/bin/mysqld() [0x115ac90]
67936 /mysql/svr/mysql/bin/mysqld() [0x24cf38e]
67937 /lib64/libpthread.so.0(+0x7dd5) [0x7f91d95bddd5]
67938 /lib64/libc.so.6(clone+0x6d) [0x7f91d796dead]
67939
67940 Trying to get some variables.
67941 Some pointers may be invalid and cause the dump to abort.
67942 Query (7f913ef65028): select * from `grid-report`.`dp_bi_main_order_acccum` limit 0, 1000
67943 Connection ID (thread ID): 53
67944 Status: NOT_KILLED
From here, we can see the following information:
-
Query (7f913ef65028): select * from grid-report
.dp_bi_main_order_acccum
limit 0, 1000 statement caused the stack exception -
The core ended in the build_order_list_string function, with an offset of 0x6c within the function -
Server Version: 8.0.25-15-mysqlcluster5.0.7-GA MySQL Cluster, Release GA, Revision b43d2b2e462, here is the commit version number, we need to use the binary corresponding to this version number.
2.2 Debugging Core File
# gdb /mysqld所在目录/mysqld /core文件所在目录/corefile
// Enter gdb command line, type bt command to view the stack
(gdb) bt
#0 0x00007f6cd73619d1 in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000001169e7d in handle_fatal_signal (sig=11) at /builds/mysql-cluster/myrocks/sql/signal_handler.cc:194
#2 <signal handler called>
#3 operator() (__ptr=0x7f6c3636c71a, this=0x7f6c0e1b1f2c) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:79
#4 ~unique_ptr (this=0x7f6c0e1b1f2c, __in_chrg=<optimized out>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:361
#5 ~Gdb_execute_plan (this=0x7f6c0e1b1f2c, __in_chrg=<optimized out>) at /builds/mysql-cluster/myrocks/storage/mysql/gdb_execute_plan.h:69
#6 operator() (this=0x7f6c3635c548, __ptr=0x7f6c0e1b1f2c) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:85
#7 operator() (__ptr=0x7f6c0e1b1f2c, this=0x7f6c3635c548) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:79
#8 ~unique_ptr (this=0x7f6c3635c548, __in_chrg=<optimized out>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:361
#9 mysql::execute_direct_query(std::unique_ptr<mysql::Gdb_execute_node, std::default_delete<mysql::Gdb_execute_node> >&, mysql::Gdb_direct_exec_parameters*, unsigned long, bool) () at /builds/mysql-cluster/myrocks/storage/mysql/gdb_query.cc:1669
#10 0x00007f6c0e19b65c in mem_free (this=<optimized out>) at /builds/mysql-cluster/myrocks/include/sql_string.h:382
#11 ~String (this=<optimized out>, __in_chrg=<optimized out>) at /builds/mysql-cluster/myrocks/include/sql_string.h:235
#12 mysql::ha_greatpart::index_read_map_low(unsigned char const*, unsigned long, ha_rkey_function, unsigned int) ()
at /builds/mysql-cluster/myrocks/storage/mysql/ha_greatpart.cc:881
#13 0x0000000000cc6102 in handler::ha_rnd_init (this=0x7f6c720fb028, scan=<optimized out>) at /builds/mysql-cluster/myrocks/sql/handler.cc:3141
#14 0x0000000000eed66e in TableScanIterator::Init (this=0x7f6b7f486c50) at /builds/mysql-cluster/myrocks/sql/row_iterator.h:213
#15 0x000000000122afe6 in LimitOffsetIterator::Init (this=0x7f6b7f486c88)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:421
#16 0x00000000010a95a1 in Query_expression::ExecuteIteratorQuery(THD*) ()
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:421
#17 0x00000000010a994f in Query_expression::execute(THD*) () at /builds/mysql-cluster/myrocks/sql/sql_union.cc:1310
#18 0x0000000001029376 in Sql_cmd_dml::execute(THD*) () at /builds/mysql-cluster/myrocks/sql/sql_select.cc:575
#19 0x0000000000fcce70 in mysql_execute_command(THD*, bool) () at /builds/mysql-cluster/myrocks/sql/sql_parse.cc:4725
#20 0x0000000000fd0961 in dispatch_sql_command (thd=thd@entry=0x7f6beb676000, parser_state=parser_state@entry=0x7f6c3636daa0,
update_userstat=update_userstat@entry=false) at /builds/mysql-cluster/myrocks/sql/sql_parse.cc:5321
#21 0x0000000000fd26b3 in dispatch_command(THD*, COM_DATA const*, enum_server_command) () at /builds/mysql-cluster/myrocks/sql/sql_parse.cc:1969
#22 0x0000000000fd3c70 in do_command (thd=0x7f6beb676000) at /builds/mysql-cluster/myrocks/sql/sql_parse.cc:1417
#23 0x000000000115ac90 in handle_connection (arg=arg@entry=0x7f6bed741340)
at /builds/mysql-cluster/myrocks/sql/conn_handler/connection_handler_per_thread.cc:307
#24 0x00000000024cf38e in pfs_spawn_thread (arg=0x7f6bed4b63e0) at /builds/mysql-cluster/myrocks/storage/perfschema/pfs.cc:2899
#25 0x00007f6cd735cdd5 in start_thread () from /lib64/libpthread.so.0
#26 0x00007f6cd570cead in clone () from /lib64/libc.so.6
This information needs to be captured in a screenshot. You can contact the development team for assistance in printing some variable information.
3. Information Collection
After the preliminary analysis above, the following files need to be collected and submitted to the development team for further analysis.
-
error log -
general log -
core file, if the development team has already collected it, it does not need to be submitted. -
Table creation statements and data for the query that caused the core dump (data needs to be checked if the development team requires it).
If the general log is large, the DBA needs to extract logs for 1 hour before the core dump or 100,000 lines of logs, which can be done using commands like grep, head, tail, etc.
-
Determine the time point of the core dump from the error log -
Locate the specific line number in the general log corresponding to the time point using grep -
Use head -n x general.log | tail -n 100000 > general.log.coredump
4. Reference Documents
-
MySQL is crashing: a support engineer’s point of view(https://www.percona.com/blog/2015/08/17/mysql-is-crashing-a-support-engineers-point-of-view/)
-
Say Hello to Libcoredumper(https://www.percona.com/blog/2020/10/28/say-hello-to-libcoredumper-a-new-way-to-generate-core-dumps-and-other-improvements/)
If you find this article helpful, please don’t hesitate to click the “like” and “see” at the end of the article, or directly share it on your social media.
