8.6. wait and waitpid FunctionsWhen a process terminates, either normally or abnormally, the kernel notifies the parent by sending the SIGCHLD signal to the parent. Because the termination of a child is an asynchronous eventit can happen at any time while the parent is runningthis signal is the asynchronous notification from the kernel to the parent. The parent can choose to ignore this signal, or it can provide a function that is called when the signal occurs: a signal handler. The default action for this signal is to be ignored. We describe these options in Chapter 10. For now, we need to be aware that a process that calls wait or waitpid can
If the process is calling wait because it received the SIGCHLD signal, we expect wait to return immediately. But if we call it at any random point in time, it can block.
The differences between these two functions are as follows.
If a child has already terminated and is a zombie, wait returns immediately with that child's status. Otherwise, it blocks the caller until a child terminates. If the caller blocks and has multiple children, wait returns when one terminates. We can always tell which child terminated, because the process ID is returned by the function. For both functions, the argument statloc is a pointer to an integer. If this argument is not a null pointer, the termination status of the terminated process is stored in the location pointed to by the argument. If we don't care about the termination status, we simply pass a null pointer as this argument. Traditionally, the integer status that these two functions return has been defined by the implementation, with certain bits indicating the exit status (for a normal return), other bits indicating the signal number (for an abnormal return), one bit to indicate whether a core file was generated, and so on. POSIX.1 specifies that the termination status is to be looked at using various macros that are defined in <sys/wait.h>. Four mutually exclusive macros tell us how the process terminated, and they all begin with WIF. Based on which of these four macros is true, other macros are used to obtain the exit status, signal number, and the like. The four mutually-exclusive macros are shown in Figure 8.4. We'll discuss how a process can be stopped in Section 9.8 when we discuss job control. ExampleThe function pr_exit in Figure 8.5 uses the macros from Figure 8.4 to print a description of the termination status. We'll call this function from numerous programs in the text. Note that this function handles the WCOREDUMP macro, if it is defined.
The program shown in Figure 8.6 calls the pr_exit function, demonstrating the various values for the termination status. If we run the program in Figure 8.6, we get
$ ./a.out
normal termination, exit status = 7
abnormal termination, signal number = 6 (core file generated)
abnormal termination, signal number = 8 (core file generated)
Unfortunately, there is no portable way to map the signal numbers from WTERMSIG into descriptive names. (See Section 10.21 for one method.) We have to look at the <signal.h> header to verify that SIGABRT has a value of 6 and that SIGFPE has a value of 8. Figure 8.5. Print a description of the exit status#include "apue.h" #include <sys/wait.h> void pr_exit(int status) { if (WIFEXITED(status)) printf("normal termination, exit status = %d\n", WEXITSTATUS(status)); else if (WIFSIGNALED(status)) printf("abnormal termination, signal number = %d%s\n", WTERMSIG(status), #ifdef WCOREDUMP WCOREDUMP(status) ? " (core file generated)" : ""); #else ""); #endif else if (WIFSTOPPED(status)) printf("child stopped, signal number = %d\n", WSTOPSIG(status)); } Figure 8.6. Demonstrate various exit statuses#include "apue.h" #include <sys/wait.h> int main(void) { pid_t pid; int status; if ((pid = fork()) < 0) err_sys("fork error"); else if (pid == 0) /* child */ exit(7); if (wait(&status) != pid) /* wait for child */ err_sys("wait error"); pr_exit(status); /* and print its status */ if ((pid = fork()) < 0) err_sys("fork error"); else if (pid == 0) /* child */ abort(); /* generates SIGABRT */ if (wait(&status) != pid) /* wait for child */ err_sys("wait error"); pr_exit(status); /* and print its status */ if ((pid = fork()) < 0) err_sys("fork error"); else if (pid == 0) /* child */ status /= 0; /* divide by 0 generates SIGFPE */ if (wait(&status) != pid) /* wait for child */ err_sys("wait error"); pr_exit(status); /* and print its status */ exit(0); } As we mentioned, if we have more than one child, wait returns on termination of any of the children. What if we want to wait for a specific process to terminate (assuming we know which process ID we want to wait for)? In older versions of the UNIX System, we would have to call wait and compare the returned process ID with the one we're interested in. If the terminated process wasn't the one we wanted, we would have to save the process ID and termination status and call wait again. We would need to continue doing this until the desired process terminated. The next time we wanted to wait for a specific process, we would go through the list of already terminated processes to see whether we had already waited for it, and if not, call wait again. What we need is a function that waits for a specific process. This functionality (and more) is provided by the POSIX.1 waitpid function. The interpretation of the pid argument for waitpid depends on its value:
The waitpid function returns the process ID of the child that terminated and stores the child's termination status in the memory location pointed to by statloc. With wait, the only real error is if the calling process has no children. (Another error return is possible, in case the function call is interrupted by a signal. We'll discuss this in Chapter 10.) With waitpid, however, it's also possible to get an error if the specified process or process group does not exist or is not a child of the calling process. The options argument lets us further control the operation of waitpid. This argument is either 0 or is constructed from the bitwise OR of the constants in Figure 8.7.
The waitpid function provides three features that aren't provided by the wait function.
ExampleRecall our discussion in Section 8.5 about zombie processes. If we want to write a process so that it forks a child but we don't want to wait for the child to complete and we don't want the child to become a zombie until we terminate, the trick is to call fork twice. The program in Figure 8.8 does this. We call sleep in the second child to ensure that the first child terminates before printing the parent process ID. After a fork, either the parent or the child can continue executing; we never know which will resume execution first. If we didn't put the second child to sleep, and if it resumed execution after the fork before its parent, the parent process ID that it printed would be that of its parent, not process ID 1. Executing the program in Figure 8.8 gives us
$ ./a.out
$ second child, parent pid = 1
Note that the shell prints its prompt when the original process terminates, which is before the second child prints its parent process ID. Figure 8.8. Avoid zombie processes by calling fork twice#include "apue.h" #include <sys/wait.h> int main(void) { pid_t pid; if ((pid = fork()) < 0) { err_sys("fork error"); } else if (pid == 0) { /* first child */ if ((pid = fork()) < 0) err_sys("fork error"); else if (pid > 0) exit(0); /* parent from second fork == first child */ /* * We're the second child; our parent becomes init as soon * as our real parent calls exit() in the statement above. * Here's where we'd continue executing, knowing that when * we're done, init will reap our status. */ sleep(2); printf("second child, parent pid = %d\n", getppid()); exit(0); } if (waitpid(pid, NULL, 0) != pid) /* wait for first child */ err_sys("waitpid error"); /* * We're the parent (the original process); we continue executing, * knowing that we're not the parent of the second child. */ exit(0); } |