1. Background
1. Storytelling
When a <span>.NET application</span> crashes on Linux, we can configure a few settings so that a core file is produced. Opening that core file in WinDbg often shows a message like this: <span>Signal SIGABRT code SI_USER (Sent by kill, sigsend, raise)</span>, as shown below:
(1.1d): Signal SIGABRT code SI_USER (Sent by kill, sigsend, raise)
libc_so!wait4+0x57:
00007fbd`09313c17 483d00f0ffff cmp rax,0FFFFFFFFFFFFF000h
0:023> ? 1d
Evaluate expression: 29 = 00000000`0000001d
0:023> ~29s
*** WARNING: Unable to verify timestamp for libSystem.Native.so
libc_so!read+0x4c:
00007fbd`0933829c 483d00f0ffff cmp rax,0FFFFFFFFFFFFF000h
Taken literally, the message says that one of the <span>kill, sigsend, raise</span> functions sent a SIGABRT signal with the SI_USER code, which points to the Linux signal mechanism. But what does it specifically mean? That is the topic of this article.
2. Linux Signal Mechanism
1. Introduction to Signal Mechanism
In simple terms, <span>Linux signals</span> are an inter-process communication mechanism that can do roughly three things:
- Notify a process that a certain event has occurred, such as a segmentation fault.
- Allow processes to send simple messages to each other.
- Control process behavior, such as terminating, pausing, continuing, etc.
There are over 60 signals on Linux, and 11 of them can generate core files by default, which is what we are most concerned about. They are summarized in the table below:
| Signal Name | Signal Number | Description |
|---|---|---|
| SIGQUIT | 3 | Usually triggered by Ctrl+\ |
| SIGILL | 4 | Illegal instruction |
| SIGABRT | 6 | Generated by the abort() function |
| SIGFPE | 8 | Floating point exception |
| SIGSEGV | 11 | Segmentation fault (illegal memory access) |
| SIGBUS | 7 | Bus error (memory access alignment issues, etc.) |
| SIGSYS | 31 | Invalid system call |
| SIGTRAP | 5 | Trace/breakpoint trap |
| SIGXCPU | 24 | Exceeded CPU time limit |
| SIGXFSZ | 25 | Exceeded file size limit |
| SIGEMT | 7 (Alpha, SPARC, MIPS only; not defined on x86 or ARM) | Emulator trap |
With this foundation, we can interpret the statement <span>Signal SIGABRT code SI_USER (Sent by kill, sigsend, raise)</span> more accurately.
1) SIGABRT
SIGABRT stands for "signal abort". Its default action terminates the process and generates a core dump.
2) SI_USER
In the Linux kernel source, there is a line that reads: <span>(type == PIDTYPE_PID) ? SI_TKILL : SI_USER</span>, as shown below:
static void prepare_kill_siginfo(int sig, struct kernel_siginfo *info, enum pid_type type)
{
    clear_siginfo(info);
    info->si_signo = sig;
    info->si_errno = 0;
    info->si_code = (type == PIDTYPE_PID) ? SI_TKILL : SI_USER;
    info->si_pid = task_tgid_vnr(current);
    info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
}
The <span>kernel_siginfo.si_code</span> field records where the signal came from. For example, <span>SI_USER</span> means the signal was sent by a user process through <span>kill</span>, while <span>SI_TKILL</span> means it was sent through the <span>tgkill, tkill</span> system calls.
3) kill, sigsend, raise
Anyone familiar with Linux will know the <span>kill</span> and <span>raise</span> functions, both of which are part of the <span>POSIX</span> standard. Their signatures show how they differ:
/* Raise signal SIG, i.e., send SIG to yourself. */
extern int raise (int __sig) __THROW;
/* Send signal SIG to process number PID. If PID is zero,
send SIG to all processes in the current process's process group.
If PID is < -1, send SIG to all processes in process group - PID. */
#ifdef __USE_POSIX
extern int kill (__pid_t __pid, int __sig) __THROW;
#endif /* Use POSIX. */
Unlike the previous two functions, <span>sigsend</span> is not part of the <span>POSIX</span> standard and is only available on some Unix systems, such as Solaris and SunOS. It is still quite powerful: besides a single pid, it can target a process group or a user, which makes it possible to signal processes in bulk. Here is its signature:
int sigsend(idtype_t idtype, id_t id, int sig);
Putting this together, a more accurate interpretation is: <span>Your program may have called kill(pid, SIGABRT), raise(SIGABRT), or abort(), causing it to crash.</span> Is that the case here? You can use WinDbg's <span>~* k</span> to inspect the call stack of every thread, and the culprit does show up:
0:023> k
# Child-SP RetAddr Call Site
00 00007fbd`03c62a70 00007fbd`090bf635 libc_so!wait4+0x57
01 00007fbd`03c62aa0 00007fbd`090c0580 libcoreclr!PROCCreateCrashDump+0x275 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2307]
02 00007fbd`03c62b00 00007fbd`090be22f libcoreclr!PROCCreateCrashDumpIfEnabled+0x770 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2524]
03 00007fbd`03c62b90 00007fbd`090be159 (T) libcoreclr!PROCAbort+0x2f [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2555]
04 (Inline Function) --------`-------- (T) libcoreclr!PROCEndProcess+0x7c [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 1352]
05 00007fbd`03c62bb0 00007fbd`08db667f (T) libcoreclr!TerminateProcess+0x84 [/__w/1/s/src/coreclr/pal/inc/pal_mstypes.h @ 1249]
...
09 00007fbd`03c63950 00007fbd`08d4524e libcoreclr!UMEntryThunk::Terminate+0x38 [/__w/1/s/src/coreclr/inc/clrtypes.h @ 260]
0a (Inline Function) --------`-------- libcoreclr!InteropSyncBlockInfo::FreeUMEntryThunk+0x24 [/__w/1/s/src/coreclr/vm/syncblk.cpp @ 119]
19 00007fbd`03c63e30 00007fbd`092c91f5 libcoreclr!CorUnix::CPalThread::ThreadEntry+0x1fe [/__w/1/s/src/coreclr/pal/inc/pal.h @ 1763]
1a 00007fbd`03c63ee0 00007fbd`09348b00 libc_so!pthread_condattr_setpshared+0x515
1b 00007fbd`03c63f80 ffffffff`ffffffff libc_so!_clone+0x40
1c 00007fbd`03c63f88 00000000`00000000 0xffffffff`ffffffff
In the call stack above, we see the <span>libcoreclr!PROCAbort</span> function, which is defined in coreclr as follows:
/*++
Function:
  PROCAbort()

  Aborts the process after calling the shutdown cleanup handler. This function
  should be called instead of calling abort() directly.

Parameters:
  signal - POSIX signal number

  Does not return
--*/
PAL_NORETURN
VOID
PROCAbort(int signal)
{
    // Do any shutdown cleanup before aborting or creating a core dump
    PROCNotifyProcessShutdown();
    PROCCreateCrashDumpIfEnabled(signal);
    // Restore the SIGABORT handler to prevent recursion
    SEHCleanupAbort();
    // Abort the process after waiting for the core dump to complete
    abort();
}

VOID PROCCreateCrashDumpIfEnabled(int signal, siginfo_t* siginfo, bool serialize)
{
    // If enabled, launch the create minidump utility and wait until it completes
    if (!g_argvCreateDump.empty())
    {
        std::vector<const char*> argv(g_argvCreateDump);
        ...
    }
}
The logic is clear: before aborting, the runtime first calls <span>PROCCreateCrashDumpIfEnabled(signal)</span> to create a dump, so the dump we are analyzing was written by this call. The command line it launches is stored in the <span>libcoreclr!g_argvCreateDump</span> global variable, which you can inspect as shown below:
0:023> x libcoreclr!*g_argvCreateDump*
00007fbd`09192360 libcoreclr!g_argvCreateDump = {size=8}
0:023> dx -r1 (*((libcoreclr!std::vector<const char *, std::allocator<const char *> > *)0x7fbd09192360))
(*((libcoreclr!std::vector<const char *, std::allocator<const char *> > *)0x7fbd09192360)) : {size=8} [Type: std::vector<const char *, std::allocator<const char *> >]
[<Raw View>] [Type: std::vector<const char *, std::allocator<const char *> >]
[size] : 8
[capacity] : 8
[0] : 0x5555b5d71140 : "/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.15/createdump" [Type: char *]
[1] : 0x7fbd08b61d8f : "--name" [Type: char *]
[2] : 0x7ffd1b7e1cec : "/db/xxxx/crash.dmp" [Type: char *]
[3] : 0x7fbd08b6ce5f : "--full" [Type: char *]
[4] : 0x7fbd08b4c7ee : "--diag" [Type: char *]
[5] : 0x7fbd08b58630 : "--crashreport" [Type: char *]
[6] : 0x5555b5dd7230 : "1" [Type: char *]
[7] : 0x0 [Type: char *]
2. Seeing is Believing with C Code
To make this more tangible, we will reproduce the behavior with C code. First, enable core files with the following configuration:
root@ubuntu2404:/data2# ulimit -c unlimited
root@ubuntu2404:/data2# echo /data2/core-%e-%p-%t | sudo tee /proc/sys/kernel/core_pattern
/data2/core-%e-%p-%t
After configuring, you can use any of the <span>abort, kill, raise</span> functions. Here, I will demonstrate using <span>kill</span>.
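#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>

/* fprintf is not async-signal-safe; it is used here only to keep the demo short. */
void sig_handler(int signo, siginfo_t *info, void *context)
{
    (void)context;
    fprintf(stderr, "Received signal: %d (sent by PID: %d, UID: %u)\n",
            signo, (int)info->si_pid, (unsigned int)info->si_uid);
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_sigaction = sig_handler;
    sa.sa_flags = SA_SIGINFO; /* required when sa_sigaction is used */
    sigemptyset(&sa.sa_mask);

    /* The handler is installed for SIGSEGV only, so SIGABRT keeps its default
       action (terminate and dump core). A SIGABRT handler that simply returned
       would swallow the signal sent by kill() and no core file would appear. */
    if (sigaction(SIGSEGV, &sa, NULL) == -1)
    {
        perror("sigaction");
        return 1;
    }

    printf("My PID: %d\n", getpid());
    printf("Press Enter to send SIGABRT to myself...\n");
    getchar();

    kill(getpid(), SIGABRT); // First method
    // raise(SIGABRT);       // Second method
    // abort();              // Third method

    printf("This line may not be reached.\n");
    return 0;
}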
The terminal output is as follows:
root@ubuntu2404:/data2# ./app
My PID: 7403
Press Enter to send SIGABRT to myself...
Aborted (core dumped)
root@ubuntu2404:/data2#
root@ubuntu2404:/data2# ls -lh
total 160K
-rwxr-xr-x 1 root root 21K May 27 10:25 app
-rw-r--r-- 1 root root 813 May 27 10:25 app.c
-rw------- 1 root root 432K May 27 10:25 core-app-7403-1748312729
Opening the core-app-7403-1748312729 file in WinDbg brings back the familiar scene. The screenshot is as follows:

3. Conclusion
To analyze .NET application crashes on Linux, understanding the <span>Linux signal mechanism</span> is a fundamental requirement. The debugging journey is challenging…