8.10. exec Functions

We mentioned in Section 8.3 that one use of the fork function is to create a new process (the child) that then causes another program to be executed by calling one of the exec functions. When a process calls one of the exec functions, that process is completely replaced by the new program, and the new program starts executing at its main function. The process ID does not change across an exec, because a new process is not created; exec merely replaces the current process (its text, data, heap, and stack segments) with a brand new program from disk.

There are six different exec functions, but we'll often simply refer to "the exec function," which means that we could use any of the six functions. These six functions round out the UNIX System process control primitives. With fork, we can create new processes; and with the exec functions, we can initiate new programs. The exit function and the wait functions handle termination and waiting for termination. These are the only process control primitives we need. We'll use these primitives in later sections to build additional functions, such as popen and system.
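The first difference in these functions is that four of them (execl, execv, execle, and execve) take a pathname argument, whereas the other two (execlp and execvp) take a filename argument. When a filename argument is specified and it contains a slash, it is taken as a pathname; otherwise, the executable file is searched for in the directories specified by the PATH environment variable.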
The PATH variable contains a list of directories, called path prefixes, that are separated by colons. For example, the name=value environment string PATH=/bin:/usr/bin:/usr/local/bin/:. specifies four directories to search. The last path prefix specifies the current directory. (A zero-length prefix also means the current directory. It can be specified as a colon at the beginning of the value, two colons in a row, or a colon at the end of the value.)
If either execlp or execvp finds an executable file using one of the path prefixes, but the file isn't a machine executable that was generated by the link editor, the function assumes that the file is a shell script and tries to invoke /bin/sh with the filename as input to the shell.

The next difference concerns the passing of the argument list (l stands for list and v stands for vector). The functions execl, execlp, and execle require each of the command-line arguments to the new program to be specified as separate arguments. We mark the end of the arguments with a null pointer. For the other three functions (execv, execvp, and execve), we have to build an array of pointers to the arguments, and the address of this array is the argument to these three functions.

Before using ISO C prototypes, the normal way to show the command-line arguments for the three functions execl, execle, and execlp was

    char *arg0, char *arg1, ..., char *argn, (char *)0

This specifically shows that the final command-line argument is followed by a null pointer. If this null pointer is specified by the constant 0, we must explicitly cast it to a pointer; if we don't, it's interpreted as an integer argument. If the size of an integer is different from the size of a char *, the actual arguments to the exec function will be wrong.

The final difference is the passing of the environment list to the new program. The two functions whose names end in an e (execle and execve) allow us to pass a pointer to an array of pointers to the environment strings. The other four functions, however, use the environ variable in the calling process to copy the existing environment for the new program. (Recall our discussion of the environment strings in Section 7.9 and Figure 7.8. We mentioned that if the system supported such functions as setenv and putenv, we could change the current environment and the environment of any subsequent child processes, but we couldn't affect the environment of the parent process.)
Normally, a process allows its environment to be propagated to its children, but in some cases, a process wants to specify a certain environment for a child. One example of the latter is the login program when a new login shell is initiated. Normally, login creates a specific environment with only a few variables defined and lets us, through the shell start-up file, add variables to the environment when we log in.

Before using ISO C prototypes, the arguments to execle were shown as

    char *pathname, char *arg0, ..., char *argn, (char *)0, char *envp[]

This specifically shows that the final argument is the address of the array of character pointers to the environment strings. The ISO C prototype doesn't show this, as all the command-line arguments, the null pointer, and the envp pointer are shown with the ellipsis notation (...).

The arguments for these six exec functions are difficult to remember. The letters in the function names help somewhat. The letter p means that the function takes a filename argument and uses the PATH environment variable to find the executable file. The letter l means that the function takes a list of arguments and is mutually exclusive with the letter v, which means that it takes an argv[] vector. Finally, the letter e means that the function takes an envp[] array instead of using the current environment. Figure 8.14 shows the differences among these six functions.
Every system has a limit on the total size of the argument list and the environment list. From Section 2.5.2 and Figure 2.8, this limit is given by ARG_MAX. This value must be at least 4,096 bytes on a POSIX.1 system. We sometimes encounter this limit when using the shell's filename expansion feature to generate a list of filenames. On some systems, for example, the command

    grep getrlimit /usr/share/man/*/*

can generate a shell error of the form

    Argument list too long
To get around the limitation in argument list size, we can use the xargs(1) command to break up long argument lists. To look for all the occurrences of getrlimit in the man pages on our system, we could use

    find /usr/share/man -type f -print | xargs grep getrlimit

If the man pages on our system are compressed, however, we could try

    find /usr/share/man -type f -print | xargs bzgrep getrlimit

We use the -type f option to the find command to restrict the list to contain only regular files, because the grep commands can't search for patterns in directories, and we want to avoid unnecessary error messages.

We've mentioned that the process ID does not change after an exec, but the new program inherits additional properties from the calling process:
The handling of open files depends on the value of the close-on-exec flag for each descriptor. Recall from Figure 3.6 and our mention of the FD_CLOEXEC flag in Section 3.14 that every open descriptor in a process has a close-on-exec flag. If this flag is set, the descriptor is closed across an exec. Otherwise, the descriptor is left open across the exec. The default is to leave the descriptor open across the exec unless we specifically set the close-on-exec flag using fcntl.

POSIX.1 specifically requires that open directory streams (recall the opendir function from Section 4.21) be closed across an exec. This is normally done by the opendir function calling fcntl to set the close-on-exec flag for the descriptor corresponding to the open directory stream.

Note that the real user ID and the real group ID remain the same across the exec, but the effective IDs can change, depending on the status of the set-user-ID and the set-group-ID bits for the program file that is executed. If the set-user-ID bit is set for the new program, the effective user ID becomes the owner ID of the program file. Otherwise, the effective user ID is not changed (it's not set to the real user ID). The group ID is handled in the same way.

In many UNIX system implementations, only one of these six functions, execve, is a system call within the kernel. The other five are just library functions that eventually invoke this system call. We can illustrate the relationship among these six functions as shown in Figure 8.15.

Figure 8.15. Relationship of the six exec functions

In this arrangement, the library functions execlp and execvp process the PATH environment variable, looking for the first path prefix that contains an executable file named filename.

Example

The program in Figure 8.16 demonstrates the exec functions. We first call execle, which requires a pathname and a specific environment. The next call is to execlp, which uses a filename and passes the caller's environment to the new program.
The only reason the call to execlp works is that the directory /home/sar/bin is one of the current path prefixes. Note also that we set the first argument, argv[0] in the new program, to be the filename component of the pathname. Some shells set this argument to be the complete pathname. This is a convention only. We can set argv[0] to any string we like. The login command does this when it executes the shell. Before executing the shell, login adds a dash as a prefix to argv[0] to indicate to the shell that it is being invoked as a login shell. A login shell will execute the start-up profile commands, whereas a nonlogin shell will not.

The program echoall that is executed twice in the program in Figure 8.16 is shown in Figure 8.17. It is a trivial program that echoes all its command-line arguments and its entire environment list.

When we execute the program from Figure 8.16, we get

    $ ./a.out
    argv[0]: echoall
    argv[1]: myarg1
    argv[2]: MY ARG2
    USER=unknown
    PATH=/tmp
    $ argv[0]: echoall
    argv[1]: only 1 arg
    USER=sar
    LOGNAME=sar
    SHELL=/bin/bash
                            47 more lines that aren't shown
    HOME=/home/sar

Note that the shell prompt appeared before the printing of argv[0] from the second exec. This is because the parent did not wait for this child process to finish.

Figure 8.16. Example of exec functions

    #include "apue.h"
    #include <sys/wait.h>

    char    *env_init[] = { "USER=unknown", "PATH=/tmp", NULL };

    int
    main(void)
    {
        pid_t   pid;

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {  /* specify pathname, specify environment */
            if (execle("/home/sar/bin/echoall", "echoall", "myarg1",
                    "MY ARG2", (char *)0, env_init) < 0)
                err_sys("execle error");
        }

        if (waitpid(pid, NULL, 0) < 0)
            err_sys("wait error");

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {  /* specify filename, inherit environment */
            if (execlp("echoall", "echoall", "only 1 arg", (char *)0) < 0)
                err_sys("execlp error");
        }

        exit(0);
    }

Figure 8.17. Echo all command-line arguments and all environment strings

    #include "apue.h"

    int
    main(int argc, char *argv[])
    {
        int         i;
        char        **ptr;
        extern char **environ;

        for (i = 0; i < argc; i++)      /* echo all command-line args */
            printf("argv[%d]: %s\n", i, argv[i]);

        for (ptr = environ; *ptr != 0; ptr++)   /* and all env strings */
            printf("%s\n", *ptr);

        exit(0);
    }