Team BBL
Previous Page Next Page

3.14. fcntl Function

The fcntl function can change the properties of a file that is already open.

#include <fcntl.h>

int fcntl(int filedes, int cmd, ... /* int arg */ );

Returns: depends on cmd if OK (see following), 1 on error


In the examples in this section, the third argument is always an integer, corresponding to the comment in the function prototype just shown. But when we describe record locking in Section 14.3, the third argument becomes a pointer to a structure.

The fcntl function is used for five different purposes.

  1. Duplicate an existing descriptor (cmd = F_DUPFD)

  2. Get/set file descriptor flags (cmd = F_GETFD or F_SETFD)

  3. Get/set file status flags (cmd = F_GETFL or F_SETFL)

  4. Get/set asynchronous I/O ownership (cmd = F_GETOWN or F_SETOWN)

  5. Get/set record locks (cmd = F_GETLK, F_SETLK, or F_SETLKW)

We'll now describe the first seven of these ten cmd values. (We'll wait until Section 14.3 to describe the last three, which deal with record locking.) Refer to Figure 3.6, since we'll be referring to both the file descriptor flags associated with each file descriptor in the process table entry and the file status flags associated with each file table entry.

F_DUPFD

Duplicate the file descriptor filedes. The new file descriptor is returned as the value of the function. It is the lowest-numbered descriptor that is not already open, that is greater than or equal to the third argument (taken as an integer). The new descriptor shares the same file table entry as filedes. (Refer to Figure 3.8.) But the new descriptor has its own set of file descriptor flags, and its FD_CLOEXEC file descriptor flag is cleared. (This means that the descriptor is left open across an exec, which we discuss in Chapter 8.)

F_GETFD

Return the file descriptor flags for filedes as the value of the function. Currently, only one file descriptor flag is defined: the FD_CLOEXEC flag.

F_SETFD

Set the file descriptor flags for filedes. The new flag value is set from the third argument (taken as an integer).

Be aware that some existing programs that deal with the file descriptor flags don't use the constant FD_CLOEXEC. Instead, the programs set the flag to either 0 (don't close-on-exec, the default) or 1 (do close-on-exec).

F_GETFL

Return the file status flags for filedes as the value of the function. We described the file status flags when we described the open function. They are listed in Figure 3.9.


Figure 3.9. File status flags for fcntl

File status flag

Description

O_RDONLY

open for reading only

O_WRONLY

open for writing only

O_RDWR

open for reading and writing

O_APPEND

append on each write

O_NONBLOCK

nonblocking mode

O_SYNC

wait for writes to complete (data and attributes)

O_DSYNC

wait for writes to complete (data only)

O_RSYNC

synchronize reads and writes

O_FSYNC

wait for writes to complete (FreeBSD and Mac OS X only)

O_ASYNC

asynchronous I/O (FreeBSD and Mac OS X only)


 

Unfortunately, the three access-mode flagsO_RDONLY, O_WRONLY, and O_RDWRare not separate bits that can be tested. (As we mentioned earlier, these three often have the values 0, 1, and 2, respectively, for historical reasons. Also, these three values are mutually exclusive; a file can have only one of the three enabled.) Therefore, we must first use the O_ACCMODE mask to obtain the access-mode bits and then compare the result against any of the three values.

F_SETFL

Set the file status flags to the value of the third argument (taken as an integer). The only flags that can be changed are O_APPEND, O_NONBLOCK, O_SYNC, O_DSYNC, O_RSYNC, O_FSYNC, and O_ASYNC.

F_GETOWN

Get the process ID or process group ID currently receiving the SIGIO and SIGURG signals. We describe these asynchronous I/O signals in Section 14.6.2.

F_SETOWN

Set the process ID or process group ID to receive the SIGIO and SIGURG signals. A positive arg specifies a process ID. A negative arg implies a process group ID equal to the absolute value of arg.


The return value from fcntl depends on the command. All commands return 1 on an error or some other value if OK. The following four commands have special return values: F_DUPFD, F_GETFD, F_GETFL, and F_GETOWN. The first returns the new file descriptor, the next two return the corresponding flags, and the final one returns a positive process ID or a negative process group ID.

Example

The program in Figure 3.10 takes a single command-line argument that specifies a file descriptor and prints a description of selected file flags for that descriptor.

Note that we use the feature test macro _POSIX_C_SOURCE and conditionally compile the file access flags that are not part of POSIX.1. The following script shows the operation of the program, when invoked from bash (the Bourne-again shell). Results vary, depending on which shell you use.

     $ ./a.out 0 < /dev/tty
     read only
     $ ./a.out 1 > temp.foo
     $ cat temp.foo
     write only
     $ ./a.out 2 2>>temp.foo
     write only, append
     $ ./a.out 5 5<>temp.foo
     read write

The clause 5<>temp.foo opens the file temp.foo for reading and writing on file descriptor 5.

Figure 3.10. Print file flags for specified descriptor
#include "apue.h"
#include <fcntl.h>
int
main(int argc, char *argv[])
{

    int       val;

    if (argc != 2)
        err_quit("usage: a.out <descriptor#>");

    if ((val = fcntl(atoi(argv[1]), F_GETFL, 0)) < 0)
        err_sys("fcntl error for fd %d", atoi(argv[1]));

    switch (val & O_ACCMODE) {
    case O_RDONLY:
        printf("read only");
        break;

    case O_WRONLY:
        printf("write only");
        break;

    case O_RDWR:
        printf("read write");
        break;

    default:
        err_dump("unknown access mode");
    }

    if (val & O_APPEND)
        printf(", append");
    if (val & O_NONBLOCK)
        printf(", nonblocking");
#if defined(O_SYNC)
    if (val & O_SYNC)
        printf(", synchronous writes");
#endif
#if !defined(_POSIX_C_SOURCE) && defined(O_FSYNC)
    if (val & O_FSYNC)
        printf(", synchronous writes");
#endif
    putchar('\n');
    exit(0);
}

Example

When we modify either the file descriptor flags or the file status flags, we must be careful to fetch the existing flag value, modify it as desired, and then set the new flag value. We can't simply do an F_SETFD or an F_SETFL, as this could turn off flag bits that were previously set.

Figure 3.11 shows a function that sets one or more of the file status flags for a descriptor.

If we change the middle statement to

   val &= ~flags;          /* turn flags off */

we have a function named clr_fl, which we'll use in some later examples. This statement logically ANDs the one's complement of flags with the current val.

If we call set_fl from Figure 3.4 by adding the line

    set_fl(STDOUT_FILENO, O_SYNC);

at the beginning of the program, we'll turn on the synchronous-write flag. This causes each write to wait for the data to be written to disk before returning. Normally in the UNIX System, a write only queues the data for writing; the actual disk write operation can take place sometime later. A database system is a likely candidate for using O_SYNC, so that it knows on return from a write that the data is actually on the disk, in case of an abnormal system failure.

We expect the O_SYNC flag to increase the clock time when the program runs. To test this, we can run the program in Figure 3.4, copying 98.5 MB of data from one file on disk to another and compare this with a version that does the same thing with the O_SYNC flag set. The results from a Linux system using the ext2 file system are shown in Figure 3.12.

The six rows in Figure 3.12 were all measured with a BUFFSIZE of 4,096. The results in Figure 3.5 were measured reading a disk file and writing to /dev/null, so there was no disk output. The second row in Figure 3.12 corresponds to reading a disk file and writing to another disk file. This is why the first and second rows in Figure 3.12 are different. The system time increases when we write to a disk file, because the kernel now copies the data from our process and queues the data for writing by the disk driver. We expect the clock time to increase also when we write to a disk file, but it doesn't increase significantly for this test, which indicates that our writes go to the system cache, and we don't measure the cost to actually write the data to disk.

When we enable synchronous writes, the system time and the clock time should increase significantly. As the third row shows, the time for writing synchronously is about the same as when we used delayed writes. This implies that the Linux ext2 file system isn't honoring the O_SYNC flag. This suspicion is supported by the sixth line, which shows that the time to do synchronous writes followed by a call to fsync is just as large as calling fsync after writing the file without synchronous writes (line 5). After writing a file synchronously, we expect that a call to fsync will have no effect.

Figure 3.13 shows timing results for the same tests on Mac OS X 10.3. Note that the times match our expectations: synchronous writes are far more expensive than delayed writes, and using fsync with synchronous writes makes no measurable difference. Note also that adding a call to fsync at the end of the delayed writes makes no measurable difference. It is likely that the operating system flushed previously written data to disk as we were writing new data to the file, so by the time that we called fsync, very little work was left to be done.

Compare fsync and fdatasync, which update a file's contents when we say so, with the O_SYNC flag, which updates a file's contents every time we write to the file.

Figure 3.11. Turn on one or more of the file status flags for a descriptor
#include "apue.h"
#include <fcntl.h>

void
set_fl(int fd, int flags) /* flags are file status flags to turn on */
{
    int     val;

    if ((val = fcntl(fd, F_GETFL, 0)) < 0)
        err_sys("fcntl F_GETFL error");

    val |= flags;       /* turn on flags */

    if (fcntl(fd, F_SETFL, val) < 0)
        err_sys("fcntl F_SETFL error");
}

Figure 3.12. Linux ext2 timing results using various synchronization mechanisms

Operation

User CPU (seconds)

System CPU (seconds)

Clock time (seconds)

read time from Figure 3.5 for BUFFSIZE = 4,096

0.03

0.16

6.86

normal write to disk file

0.02

0.30

6.87

write to disk file with O_SYNC set

0.03

0.30

6.83

write to disk followed by fdatasync

0.03

0.42

18.28

write to disk followed by fsync

0.03

0.37

17.95

write to disk with O_SYNC set followed by fsync

0.05

0.44

17.95


Figure 3.13. Mac OS X timing results using various synchronization mechanisms

Operation

User CPU (seconds)

System CPU (seconds)

Clock time (seconds)

write to /dev/null

0.06

0.79

4.33

normal write to disk file

0.05

3.56

14.40

write to disk file with O_FSYNC set

0.13

9.53

22.48

write to disk followed by fsync

0.11

3.31

14.12

write to disk with O_FSYNC set followed by fsync

0.17

9.14

22.12


With this example, we see the need for fcntl. Our program operates on a descriptor (standard output), never knowing the name of the file that was opened by the shell on that descriptor. We can't set the O_SYNC flag when the file is opened, since the shell opened the file. With fcntl, we can modify the properties of a descriptor, knowing only the descriptor for the open file. We'll see another need for fcntl when we describe nonblocking pipes (Section 15.2), since all we have with a pipe is a descriptor.

    Team BBL
    Previous Page Next Page