Some notes on computer stuff

Why restoring signal handlers is important

October 5, 2014
[issue] [programming] [shell] [c] [gnu/linux] [vim]

I had been seeing strange failures of some commands in gvim (e.g. errors on closing the :Gblame window) for quite a while, but couldn't figure out why the hell they occurred. Overall it probably took more than 20 hours to identify the issue and fix it without causing any other issues. At the end of the day, it took a single line to solve the issue...

A lot of that time was spent identifying when it happens. The answer: when gvim is started directly by the window manager (euclid-wm here). If there was some intermediate process (e.g. a terminal, a script, anything), the issue didn't occur. A simple way to test for the issue:

:!echo text

text

Command terminated

It looks like echo is failing, but echo works, so it must be the shell, but the shell works too... What is really broken is the communication of the child's exit code to its parent process.

This line in euclid-wm's sources caused the issue:

//this is to avoid leaving zombies
signal(SIGCHLD, SIG_IGN);

And indeed, commenting this line out causes euclid-wm to leave a bunch of processes in the zombie state, but it fixes the error in gvim! What's needed is a way to solve the issue without leaving zombies hanging around.
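To see what that line does to exit codes, here is a minimal sketch (not taken from euclid-wm). The helper names run_and_wait() and run_with_sigchld() are made up for illustration. The helper forks a child that exits with a given code and then tries to collect it, once with the default SIGCHLD disposition and once with SIG_IGN.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then try to
 * collect its status. Returns the exit code, or -errno if fork() or
 * waitpid() fails. */
int run_and_wait(int code)
{
    int status;
    pid_t r;
    pid_t pid = fork();

    if (pid < 0)
        return -errno;
    if (pid == 0)
        _exit(code);                    /* child: exit right away */

    do {
        r = waitpid(pid, &status, 0);   /* retry if interrupted by a signal */
    } while (r < 0 && errno == EINTR);

    if (r < 0)
        return -errno;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: run run_and_wait() with SIGCHLD set to `disposition`
 * (SIG_DFL or SIG_IGN), then put the previous disposition back. */
int run_with_sigchld(void (*disposition)(int), int code)
{
    void (*old)(int) = signal(SIGCHLD, disposition);
    int result = run_and_wait(code);

    signal(SIGCHLD, old);
    return result;
}
```

On Linux, run_with_sigchld(SIG_DFL, 7) returns 7, while run_with_sigchld(SIG_IGN, 7) returns -ECHILD: the child is reaped automatically, so no zombie is left, but there is no exit status left to collect either.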

Replacing signal(...) with a proper handler for SIGCHLD didn't help. A loop calling waitpid() did nothing useful. Commenting out setsid() in the spawn() function changed nothing (I thought something was wrong with the associated controlling terminals). Forking one more time in spawn() had no effect. Not exec'ing in spawn(), closing standard file descriptors, explicitly waiting for each child process: nothing helped. Neither did searches on the Web or in books on GNU/Linux programming...

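Literally nothing explained what it could be or how to fix it. I still don't fully understand why the effect was so strange, but the reason is now at least clear enough: child processes inherit the signal dispositions of their parent (an ignored signal stays ignored across fork(), and on Linux an ignored SIGCHLD survives exec too), and the shell feels bad when SIGCHLD is ignored. With SIGCHLD set to SIG_IGN, terminated children are reaped automatically and wait()/waitpid() fails with ECHILD, so presumably the exit status simply never reaches whoever is waiting for it.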

The solution is to call

signal(SIGCHLD, SIG_DFL);

to restore default behaviour in each child process after forking, i.e.:

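if (fork() == 0) { /* ... */
    /* child: undo the SIG_IGN inherited from the window manager */
    signal(SIGCHLD, SIG_DFL);
    /* ... */
    execvp(/* ... */); /* or any other exec*() variant */
    _exit(127); /* reached only if exec fails */
}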
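For completeness, here is a compilable sketch of a spawn-like function built around that fix. It is not euclid-wm's actual spawn(): the signature, the use of execvp() and the exit code 127 on exec failure are assumptions made for illustration. The setsid() call is included only because the original spawn() is mentioned above as calling it.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical spawn(): starts argv[0] as a detached child process.
 * Returns the child's pid in the parent, or -1 if fork() fails. */
pid_t spawn(char *const argv[])
{
    pid_t pid = fork();

    if (pid != 0)
        return pid;             /* parent, or fork() failure (-1) */

    /* Child: the parent ignores SIGCHLD to avoid zombies, so restore the
     * default disposition before running anything else. */
    signal(SIGCHLD, SIG_DFL);
    setsid();                   /* detach from the parent's session */
    execvp(argv[0], argv);
    _exit(127);                 /* reached only if execvp() fails */
}
```

The parent can keep signal(SIGCHLD, SIG_IGN) for itself and still avoid zombies, while every program started through spawn() begins with the default SIGCHLD handling.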

The solution is trivial, but it surely isn't the most obvious one.