Discussing the Linux Signal Mechanism in .NET Dumps

1. Background

1. Storytelling

When a .NET application crashes on Linux, we can configure a few settings so that a core file is produced. After obtaining the core file, we can open it with WinDbg, where we often see a message like this: Signal SIGABRT code SI_USER (Sent by kill, sigsend, raise), as shown below:

(1.1d): Signal SIGABRT code SI_USER (Sent by kill, sigsend, raise)
libc_so!wait4+0x57:
00007fbd`09313c17 483d00f0ffff    cmp     rax,0FFFFFFFFFFFFF000h
0:023> ? 1d
Evaluate expression: 29 = 00000000`0000001d
0:023> ~29s
*** WARNING: Unable to verify timestamp for libSystem.Native.so
libc_so!read+0x4c:
00007fbd`0933829c 483d00f0ffff    cmp     rax,0FFFFFFFFFFFFF000h

Taken literally, it says that a SIGABRT signal carrying the SI_USER code was sent by one of the kill, sigsend, or raise functions, which points to the Linux signal mechanism. But what does it mean specifically? That is the topic of this article.

2. Linux Signal Mechanism

1. Introduction to Signal Mechanism

In simple terms, Linux signals are an inter-process communication mechanism that can do roughly three things:

Notify a process that a certain event has occurred, such as a segmentation fault.
Allow processes to send simple messages to each other.
Control process behavior, such as terminating, pausing, continuing, etc.

There are over 60 signals on Linux (counting the real-time signals), and 11 of them generate a core file by default, which is what we care about most. They are summarized in the table below; the numbers are those used on x86 and ARM unless noted otherwise:

Signal Name	Signal Number	Description
SIGQUIT	3	Usually triggered by Ctrl+\
SIGILL	4	Illegal instruction
SIGABRT	6	Generated by the abort() function
SIGFPE	8	Arithmetic error (floating-point exception, integer division by zero)
SIGSEGV	11	Segmentation fault (illegal memory access)
SIGBUS	7	Bus error (memory access alignment issues, etc.)
SIGSYS	31	Invalid system call
SIGTRAP	5	Trace/breakpoint trap
SIGXCPU	24	Exceeded CPU time limit
SIGXFSZ	25	Exceeded file size limit
SIGEMT	7	Emulator trap (not defined on x86 or ARM; 7 on Alpha, SPARC, and MIPS, where SIGBUS is 10)

With this foundation, we can interpret the statement Signal SIGABRT code SI_USER (Sent by kill, sigsend, raise) more accurately.

1) SIGABRT

Its full name is "signal abort", and it is one of the signals that generate a core dump by default.

2) SI_USER

In the Linux kernel source code, there is this line: (type == PIDTYPE_PID) ? SI_TKILL : SI_USER, as shown below:

static void prepare_kill_siginfo(int sig, struct kernel_siginfo *info,
                                 enum pid_type type)
{
    clear_siginfo(info);
    info->si_signo = sig;
    info->si_errno = 0;
    info->si_code = (type == PIDTYPE_PID) ? SI_TKILL : SI_USER;
    info->si_pid = task_tgid_vnr(current);
    info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
}

The kernel_siginfo.si_code field indicates the source of the signal. For example, SI_USER indicates that the signal was sent by a user process (typically through kill), while SI_TKILL indicates that it was sent through the tkill or tgkill system calls.

3) kill, sigsend, raise

Anyone familiar with Linux will know the kill and raise functions well, as both are part of the POSIX standard. Their signatures make the difference between them clear:

/* Raise signal SIG, i.e., send SIG to yourself.  */
extern int raise (int __sig) __THROW;

/* Send signal SIG to process number PID.  If PID is zero,
   send SIG to all processes in the current process's process group.
   If PID is < -1, send SIG to all processes in process group - PID.  */
#ifdef __USE_POSIX
extern int kill (__pid_t __pid, int __sig) __THROW;
#endif /* Use POSIX.  */

In contrast to the previous two functions, sigsend is not part of the POSIX standard and is only available on some Unix systems, such as Solaris and SunOS. It is still quite powerful: it can target not only a single pid but also a process group or a user, so that signals can be sent to processes in bulk. Here is its signature:

int sigsend(idtype_t idtype, id_t id, int sig);

Summarizing this information, a more accurate interpretation is: your program may have called kill(pid, SIGABRT), raise(SIGABRT), or abort(), which led to the crash. Is that the case? You can use WinDbg's ~* k command to inspect the call stack of every thread, and indeed the call can be found:

0:023> k
# Child-SP          RetAddr               Call Site
00 00007fbd`03c62a70 00007fbd`090bf635     libc_so!wait4+0x57
01 00007fbd`03c62aa0 00007fbd`090c0580     libcoreclr!PROCCreateCrashDump+0x275 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2307]
02 00007fbd`03c62b00 00007fbd`090be22f     libcoreclr!PROCCreateCrashDumpIfEnabled+0x770 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2524]
03 00007fbd`03c62b90 00007fbd`090be159 (T) libcoreclr!PROCAbort+0x2f [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2555]
04 (Inline Function) --------`-------- (T) libcoreclr!PROCEndProcess+0x7c [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 1352]
05 00007fbd`03c62bb0 00007fbd`08db667f (T) libcoreclr!TerminateProcess+0x84 [/__w/1/s/src/coreclr/pal/inc/pal_mstypes.h @ 1249]
...
09 00007fbd`03c63950 00007fbd`08d4524e     libcoreclr!UMEntryThunk::Terminate+0x38 [/__w/1/s/src/coreclr/inc/clrtypes.h @ 260]
0a (Inline Function) --------`--------     libcoreclr!InteropSyncBlockInfo::FreeUMEntryThunk+0x24 [/__w/1/s/src/coreclr/vm/syncblk.cpp @ 119]
19 00007fbd`03c63e30 00007fbd`092c91f5     libcoreclr!CorUnix::CPalThread::ThreadEntry+0x1fe [/__w/1/s/src/coreclr/pal/inc/pal.h @ 1763]
1a 00007fbd`03c63ee0 00007fbd`09348b00     libc_so!pthread_condattr_setpshared+0x515
1b 00007fbd`03c63f80 ffffffff`ffffffff     libc_so!_clone+0x40
1c 00007fbd`03c63f88 00000000`00000000     0xffffffff`ffffffff

In the call stack above, we see the libcoreclr!PROCAbort function, which is defined in coreclr as follows:

/*++
Function:
  PROCAbort()

  Aborts the process after calling the shutdown cleanup handler. This function
  should be called instead of calling abort() directly.

Parameters:
  signal - POSIX signal number

  Does not return
--*/
PAL_NORETURN
VOID
PROCAbort(int signal)
{
    // Do any shutdown cleanup before aborting or creating a core dump
    PROCNotifyProcessShutdown();

    PROCCreateCrashDumpIfEnabled(signal);

    // Restore the SIGABORT handler to prevent recursion
    SEHCleanupAbort();

    // Abort the process after waiting for the core dump to complete
    abort();
}

VOID PROCCreateCrashDumpIfEnabled(int signal, siginfo_t* siginfo, bool serialize)
{
    // If enabled, launch the create minidump utility and wait until it completes
    if (!g_argvCreateDump.empty())
    {
        std::vector<const char*> argv(g_argvCreateDump);
        ...
    }
}

The logic in the code is clear. Before aborting, PROCAbort first calls PROCCreateCrashDumpIfEnabled(signal) to create a dump, launching the dump utility and waiting for it to finish, which is why the thread above is parked in wait4. This means the information seen in the dump was filled in by this function. You can inspect the libcoreclr!g_argvCreateDump global variable, as shown below:

0:023> x libcoreclr!*g_argvCreateDump*
00007fbd`09192360 libcoreclr!g_argvCreateDump = {size=8}
0:023> dx -r1 (*((libcoreclr!std::vector<const char *, std::allocator<const char *> > *)0x7fbd09192360))
(*((libcoreclr!std::vector<const char *, std::allocator<const char *> > *)0x7fbd09192360))                 : {size=8} [Type: std::vector<const char *, std::allocator<const char *> >]
    [<Raw View>]     [Type: std::vector<const char *, std::allocator<const char *> >]
    [size]           : 8
    [capacity]       : 8
    [0]              : 0x5555b5d71140 : "/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.15/createdump" [Type: char *]
    [1]              : 0x7fbd08b61d8f : "--name" [Type: char *]
    [2]              : 0x7ffd1b7e1cec : "/db/xxxx/crash.dmp" [Type: char *]
    [3]              : 0x7fbd08b6ce5f : "--full" [Type: char *]
    [4]              : 0x7fbd08b4c7ee : "--diag" [Type: char *]
    [5]              : 0x7fbd08b58630 : "--crashreport" [Type: char *]
    [6]              : 0x5555b5dd7230 : "1" [Type: char *]
    [7]              : 0x0 [Type: char *]

2. Seeing is Believing with C Code

To make this more tangible, we will demonstrate with C code how to generate a core file. First, apply the following configuration:

root@ubuntu2404:/data2# ulimit -c unlimited 
root@ubuntu2404:/data2# echo /data2/core-%e-%p-%t  | sudo tee /proc/sys/kernel/core_pattern
/data2/core-%e-%p-%t

Once this is configured, you can use any of abort, kill, or raise. Here, I will demonstrate using kill.

#include <stdio.h>
#include <signal.h>
#include <unistd.h>

/* Prints who sent the signal. fprintf is not async-signal-safe;
   it is used here only to keep the demo short. */
void sig_handler(int signo, siginfo_t *info, void *context)
{
    (void)context;
    fprintf(stderr, "Received signal: %d (sent by PID: %d, UID: %d)\n",
            signo, (int)info->si_pid, (int)info->si_uid);
}

int main(void)
{
    struct sigaction sa;

    sa.sa_sigaction = sig_handler;
    sa.sa_flags = SA_SIGINFO;   /* required for the three-argument handler */
    sigemptyset(&sa.sa_mask);

    /* The handler is registered for SIGSEGV only, so SIGABRT keeps its
       default action: terminate the process and dump core. */
    if (sigaction(SIGSEGV, &sa, NULL) == -1)
    {
        perror("sigaction");
        return 1;
    }

    printf("My PID: %d\n", (int)getpid());
    printf("Press Enter to send SIGABRT to myself...\n");
    getchar();

    kill(getpid(), SIGABRT);  // First method
    // raise(SIGABRT);        // Second method
    // abort();               // Third method

    printf("This line may not be reached.\n");
    return 0;
}

The terminal output is as follows:

root@ubuntu2404:/data2# ./app
My PID: 7403
Press Enter to send SIGABRT to myself...

Aborted (core dumped)
root@ubuntu2404:/data2# 
root@ubuntu2404:/data2# ls -lh
 total 160K
-rwxr-xr-x 1 root root  21K May 27 10:25 app
-rw-r--r-- 1 root root  813 May 27 10:25 app.c
-rw------- 1 root root 432K May 27 10:25 core-app-7403-1748312729

Opening the core-app-7403-1748312729 file with WinDbg, the familiar scene returns, haha. The screenshot is as follows:

[Screenshot: core-app-7403-1748312729 opened in WinDbg]

3. Conclusion

To analyze .NET application crashes on Linux, understanding the Linux signal mechanism is a fundamental requirement. The debugging journey is challenging…