Section 3.14. fcntl Function

#include <fcntl.h>

int fcntl(int filedes, int cmd, ... /* int arg */ );

Returns: depends on cmd if OK (see following), -1 on error

In the examples in this section, the third argument is always an integer, corresponding to the comment in the function prototype just shown. But when we describe record locking in Section 14.3, the third argument becomes a pointer to a structure.

The fcntl function is used for five different purposes:

Duplicate an existing descriptor (cmd = F_DUPFD)
Get/set file descriptor flags (cmd = F_GETFD or F_SETFD)
Get/set file status flags (cmd = F_GETFL or F_SETFL)
Get/set asynchronous I/O ownership (cmd = F_GETOWN or F_SETOWN)
Get/set record locks (cmd = F_GETLK, F_SETLK, or F_SETLKW)

We'll now describe the first seven of these ten cmd values. (We'll wait until Section 14.3 to describe the last three, which deal with record locking.) Keep Figure 3.6 in mind, since we'll be referring to both the file descriptor flags associated with each file descriptor in the process table entry and the file status flags associated with each file table entry.

`F_DUPFD`	Duplicate the file descriptor filedes. The new file descriptor is returned as the value of the function. It is the lowest-numbered descriptor that is not already open and that is greater than or equal to the third argument (taken as an integer). The new descriptor shares the same file table entry as filedes. (Refer to Figure 3.8.) But the new descriptor has its own set of file descriptor flags, and its `FD_CLOEXEC` file descriptor flag is cleared. (This means that the descriptor is left open across an `exec`, which we discuss in Chapter 8.)
`F_GETFD`	Return the file descriptor flags for filedes as the value of the function. Currently, only one file descriptor flag is defined: the `FD_CLOEXEC` flag.
`F_SETFD`	Set the file descriptor flags for filedes. The new flag value is set from the third argument (taken as an integer). Be aware that some existing programs that deal with the file descriptor flags don't use the constant `FD_CLOEXEC`. Instead, the programs set the flag to either 0 (don't close-on-exec, the default) or 1 (do close-on-exec).
`F_GETFL`	Return the file status flags for filedes as the value of the function. We described the file status flags when we described the `open` function. They are listed in Figure 3.9.

Figure 3.9. File status flags for `fcntl`
File status flag	Description
`O_RDONLY`	open for reading only
`O_WRONLY`	open for writing only
`O_RDWR`	open for reading and writing
`O_APPEND`	append on each write
`O_NONBLOCK`	nonblocking mode
`O_SYNC`	wait for writes to complete (data and attributes)
`O_DSYNC`	wait for writes to complete (data only)
`O_RSYNC`	synchronize reads and writes
`O_FSYNC`	wait for writes to complete (FreeBSD and Mac OS X only)
`O_ASYNC`	asynchronous I/O (FreeBSD and Mac OS X only)

	Unfortunately, the three access-mode flags, `O_RDONLY`, `O_WRONLY`, and `O_RDWR`, are not separate bits that can be tested. (As we mentioned earlier, these three often have the values 0, 1, and 2, respectively, for historical reasons. Also, these three values are mutually exclusive; a file can have only one of the three enabled.) Therefore, we must first use the `O_ACCMODE` mask to obtain the access-mode bits and then compare the result against any of the three values.
`F_SETFL`	Set the file status flags to the value of the third argument (taken as an integer). The only flags that can be changed are `O_APPEND`, `O_NONBLOCK`, `O_SYNC`, `O_DSYNC`, `O_RSYNC`, `O_FSYNC`, and `O_ASYNC`.
`F_GETOWN`	Get the process ID or process group ID currently receiving the `SIGIO` and `SIGURG` signals. We describe these asynchronous I/O signals in Section 14.6.2.
`F_SETOWN`	Set the process ID or process group ID to receive the `SIGIO` and `SIGURG` signals. A positive arg specifies a process ID. A negative arg implies a process group ID equal to the absolute value of arg.
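
The access-mode test described under `F_GETFL` can be sketched as follows. The helper name can_write is hypothetical and chosen only for illustration; the point is that the value returned by `F_GETFL` is masked with `O_ACCMODE` before it is compared.

```c
#include <fcntl.h>

/* Hypothetical helper: returns 1 if fd was opened for writing,
 * 0 if it was opened read-only, and -1 if fcntl fails. */
int can_write(int fd)
{
    int val = fcntl(fd, F_GETFL, 0);
    if (val < 0)
        return -1;

    /* The access modes are not independent bits. With the common
     * values 0, 1, and 2, (val & O_RDONLY) is always 0 and
     * (val & O_WRONLY) misses O_RDWR, so mask first, then compare. */
    int accmode = val & O_ACCMODE;
    return accmode == O_WRONLY || accmode == O_RDWR;
}
```

For the two ends of a pipe, this would report 0 for the read end and 1 for the write end.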

The return value from fcntl depends on the command. All commands return -1 on an error or some other value if OK. The following four commands have special return values: F_DUPFD, F_GETFD, F_GETFL, and F_GETOWN. The first returns the new file descriptor, the next two return the corresponding flags, and the final one returns a positive process ID or a negative process group ID.
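
As a small illustration of the F_DUPFD return value, the sketch below wraps the call in a function. The name dup_at_least is an assumption made for this example; only fcntl itself comes from the system.

```c
#include <fcntl.h>

/* Hypothetical helper: duplicate fd onto the lowest-numbered free
 * descriptor that is greater than or equal to minfd.
 * Returns the new descriptor, or -1 on error (errno is set by fcntl). */
int dup_at_least(int fd, int minfd)
{
    int newfd = fcntl(fd, F_DUPFD, minfd);
    if (newfd < 0)
        return -1;

    /* newfd shares fd's file table entry (and so its file status
     * flags), but has its own descriptor flags with FD_CLOEXEC clear. */
    return newfd;
}
```

Because the two descriptors share a file table entry, setting O_NONBLOCK through one of them with F_SETFL is visible through the other with F_GETFL, whereas FD_CLOEXEC set on the original is not carried over to the duplicate.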

Example

The program in Figure 3.10 takes a single command-line argument that specifies a file descriptor and prints a description of selected file flags for that descriptor.

Note that we use the feature test macro _POSIX_C_SOURCE and conditionally compile the file access flags that are not part of POSIX.1. The following script shows the operation of the program, when invoked from bash (the Bourne-again shell). Results vary, depending on which shell you use.

The clause 5<>temp.foo opens the file temp.foo for reading and writing on file descriptor 5.

Figure 3.10. Print file flags for specified descriptor
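
The core of such a program might look like the following sketch. It is not the original listing: the function describe_flags, its helper add_flag, the buffer-based interface, and the wording of the descriptions are all assumptions made for illustration. It fetches the file status flags with F_GETFL, decodes the access mode through O_ACCMODE, and then tests a few of the remaining flags.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Append ", name" to buf (capacity len) without overflowing it. */
static void add_flag(char *buf, size_t len, const char *name)
{
    size_t used = strlen(buf);
    if (used < len)
        snprintf(buf + used, len - used, ", %s", name);
}

/* Hypothetical helper: describe the file status flags of fd in buf.
 * Returns 0 if OK, -1 on error or if the access mode is unknown. */
int describe_flags(int fd, char *buf, size_t len)
{
    if (len == 0)
        return -1;

    int val = fcntl(fd, F_GETFL, 0);
    if (val < 0)
        return -1;

    switch (val & O_ACCMODE) {      /* mask first, then compare */
    case O_RDONLY: snprintf(buf, len, "read only");  break;
    case O_WRONLY: snprintf(buf, len, "write only"); break;
    case O_RDWR:   snprintf(buf, len, "read write"); break;
    default:       return -1;
    }

    if (val & O_APPEND)
        add_flag(buf, len, "append");
    if (val & O_NONBLOCK)
        add_flag(buf, len, "nonblocking");
    /* O_SYNC may span several bits on some systems; require all. */
    if ((val & O_SYNC) == O_SYNC)
        add_flag(buf, len, "synchronous writes");
    return 0;
}
```

A main function that converts its single command-line argument to an integer and prints the resulting string would complete the program.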

Example

When we modify either the file descriptor flags or the file status flags, we must be careful to fetch the existing flag value, modify it as desired, and then set the new flag value. We can't simply do an F_SETFD or an F_SETFL, as this could turn off flag bits that were previously set.
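
The same fetch, modify, and set sequence applies to the file descriptor flags. A minimal sketch, assuming a helper named set_cloexec (a name invented for this example) that reports failure by returning -1:

```c
#include <fcntl.h>

/* Hypothetical helper: turn on close-on-exec for fd while leaving
 * any other descriptor flags untouched. Returns 0 if OK, -1 on error. */
int set_cloexec(int fd)
{
    int val = fcntl(fd, F_GETFD, 0);    /* fetch the existing flags */
    if (val < 0)
        return -1;

    val |= FD_CLOEXEC;                  /* modify only the bit we want */

    if (fcntl(fd, F_SETFD, val) < 0)    /* store the new value */
        return -1;
    return 0;
}
```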

Figure 3.11 shows a function that sets one or more of the file status flags for a descriptor.
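
A minimal sketch of such a function follows. The name set_fl is the one used in the text, but the int return value and the error handling shown here are assumptions; the listing in the figure may report errors differently.

```c
#include <fcntl.h>

/* Turn on the file status flags given in flags for descriptor fd.
 * Returns 0 if OK, -1 on error (an assumed convention). */
int set_fl(int fd, int flags)
{
    int val = fcntl(fd, F_GETFL, 0);    /* fetch current status flags */
    if (val < 0)
        return -1;

    val |= flags;                       /* turn on flags */

    if (fcntl(fd, F_SETFL, val) < 0)    /* write the merged value back */
        return -1;
    return 0;
}
```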

If we change the statement that turns the flags on to one that turns them off, val &= ~flags, we have a function named clr_fl, which we'll use in some later examples. This statement logically ANDs the one's complement of flags with the current val.
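
Written out in full, clr_fl might look like this sketch. As before, the int return value is an assumption made so that the caller can detect failure.

```c
#include <fcntl.h>

/* Turn off the file status flags given in flags for descriptor fd.
 * Returns 0 if OK, -1 on error (an assumed convention). */
int clr_fl(int fd, int flags)
{
    int val = fcntl(fd, F_GETFL, 0);    /* fetch current status flags */
    if (val < 0)
        return -1;

    val &= ~flags;                      /* turn flags off */

    if (fcntl(fd, F_SETFL, val) < 0)    /* write the result back */
        return -1;
    return 0;
}
```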

If we call set_fl from Figure 3.11 at the beginning of the program, as set_fl(STDOUT_FILENO, O_SYNC), we'll turn on the synchronous-write flag for standard output. This causes each write to wait for the data to be written to disk before returning. Normally in the UNIX System, a write only queues the data for writing; the actual disk write operation can take place sometime later. A database system is a likely candidate for using O_SYNC, so that it knows on return from a write that the data is actually on the disk, in case of an abnormal system failure.

We expect the O_SYNC flag to increase the clock time when the program runs. To test this, we can run the program in Figure 3.4, copying 98.5 MB of data from one file on disk to another and compare this with a version that does the same thing with the O_SYNC flag set. The results from a Linux system using the ext2 file system are shown in Figure 3.12.

The six rows in Figure 3.12 were all measured with a BUFFSIZE of 4,096. The results in Figure 3.5 were measured reading a disk file and writing to /dev/null, so there was no disk output. The second row in Figure 3.12 corresponds to reading a disk file and writing to another disk file. This is why the first and second rows in Figure 3.12 are different. The system time increases when we write to a disk file, because the kernel now copies the data from our process and queues the data for writing by the disk driver. We expect the clock time to increase also when we write to a disk file, but it doesn't increase significantly for this test, which indicates that our writes go to the system cache, and we don't measure the cost to actually write the data to disk.

When we enable synchronous writes, the system time and the clock time should increase significantly. But as the third row shows, the time for writing synchronously is about the same as when we used delayed writes. This implies that the Linux ext2 file system isn't honoring the O_SYNC flag. This suspicion is supported by the sixth row, which shows that the time to do synchronous writes followed by a call to fsync is just as large as calling fsync after writing the file without synchronous writes (row 5). After writing a file synchronously, we expect that a call to fsync will have no effect.

Figure 3.13 shows timing results for the same tests on Mac OS X 10.3. Note that the times match our expectations: synchronous writes are far more expensive than delayed writes, and using fsync with synchronous writes makes no measurable difference. Note also that adding a call to fsync at the end of the delayed writes makes no measurable difference. It is likely that the operating system flushed previously written data to disk as we were writing new data to the file, so by the time that we called fsync, very little work was left to be done.

Compare fsync and fdatasync, which update a file's contents when we say so, with the O_SYNC flag, which updates a file's contents every time we write to the file.
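
The "when we say so" side of that comparison can be sketched as follows. The helper write_all_then_fsync is hypothetical; it issues ordinary delayed writes and then waits for the disk once, at the end, whereas a descriptor with O_SYNC set would wait inside every write call.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all n bytes of buf to fd with ordinary
 * (delayed) writes, then force them to disk with a single fsync.
 * Returns 0 if OK, -1 on error. */
int write_all_then_fsync(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry the write */
            return -1;
        }
        buf += w;                   /* advance past a partial write */
        n -= (size_t)w;
    }
    return fsync(fd);               /* one wait for all queued data */
}
```

Replacing fsync with fdatasync in the last line would request that only the file's data, not its attributes, be flushed, on systems that provide that function.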

Figure 3.11. Turn on one or more of the file status flags for a descriptor

Figure 3.12. Linux `ext2` timing results using various synchronization mechanisms
Operation	User CPU (seconds)	System CPU (seconds)	Clock time (seconds)
read time from Figure 3.5 for `BUFFSIZE` = 4,096	0.03	0.16	6.86
normal `write` to disk file	0.02	0.30	6.87
`write` to disk file with `O_SYNC` set	0.03	0.30	6.83
`write` to disk followed by `fdatasync`	0.03	0.42	18.28
`write` to disk followed by `fsync`	0.03	0.37	17.95
`write` to disk with `O_SYNC` set followed by `fsync`	0.05	0.44	17.95

Figure 3.13. Mac OS X timing results using various synchronization mechanisms
Operation	User CPU (seconds)	System CPU (seconds)	Clock time (seconds)
`write` to `/dev/null`	0.06	0.79	4.33
normal `write` to disk file	0.05	3.56	14.40
`write` to disk file with `O_FSYNC` set	0.13	9.53	22.48
`write` to disk followed by `fsync`	0.11	3.31	14.12
`write` to disk with `O_FSYNC` set followed by `fsync`	0.17	9.14	22.12

With this example, we see the need for fcntl. Our program operates on a descriptor (standard output), never knowing the name of the file that was opened by the shell on that descriptor. We can't set the O_SYNC flag when the file is opened, since the shell opened the file. With fcntl, we can modify the properties of a descriptor, knowing only the descriptor for the open file. We'll see another need for fcntl when we describe nonblocking pipes (Section 15.2), since all we have with a pipe is a descriptor.
