3.6. lseek Function

Every open file has an associated "current file offset," normally a non-negative integer that measures the number of bytes from the beginning of the file. (We describe some exceptions to the "non-negative" qualifier later in this section.) Read and write operations normally start at the current file offset and cause the offset to be incremented by the number of bytes read or written. By default, this offset is initialized to 0 when a file is opened, unless the O_APPEND option is specified. An open file's offset can be set explicitly by calling lseek.
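The interpretation of the offset depends on the value of the whence argument. If whence is SEEK_SET, the file's offset is set to offset bytes from the beginning of the file. If whence is SEEK_CUR, the file's offset is set to its current value plus the offset, which can be positive or negative. If whence is SEEK_END, the file's offset is set to the size of the file plus the offset, which can also be positive or negative.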
Because a successful call to lseek returns the new file offset, we can seek zero bytes from the current position to determine the current offset:

    off_t currpos;
    currpos = lseek(fd, 0, SEEK_CUR);

This technique can also be used to determine whether a file is capable of seeking. If the file descriptor refers to a pipe, FIFO, or socket, lseek sets errno to ESPIPE and returns -1.
Example

The program in Figure 3.1 tests its standard input to see whether it is capable of seeking. If we invoke this program interactively, we get

    $ ./a.out < /etc/motd
    seek OK
    $ cat < /etc/motd | ./a.out
    cannot seek
    $ ./a.out < /var/spool/cron/FIFO
    cannot seek

Figure 3.1. Test whether standard input is capable of seeking

    #include "apue.h"

    int
    main(void)
    {
        if (lseek(STDIN_FILENO, 0, SEEK_CUR) == -1)
            printf("cannot seek\n");
        else
            printf("seek OK\n");
        exit(0);
    }

Normally, a file's current offset must be a non-negative integer. It is possible, however, that certain devices could allow negative offsets. For regular files, the offset must be non-negative. Because negative offsets are possible, we should be careful to compare the return value from lseek as being equal to or not equal to -1, and not test whether it is less than 0.
lseek only records the current file offset within the kernel; it does not cause any I/O to take place. This offset is then used by the next read or write operation.

The file's offset can be greater than the file's current size, in which case the next write to the file will extend the file. This is referred to as creating a hole in a file and is allowed. Any bytes in a file that have not been written are read back as 0.

A hole in a file isn't required to have storage backing it on disk. Depending on the file system implementation, when you write after seeking past the end of the file, new disk blocks might be allocated to store the data, but there is no need to allocate disk blocks for the data between the old end of file and the location where you start writing.

Example

The program shown in Figure 3.2 creates a file with a hole in it.

    $ ./a.out
    $ ls -l file.hole                  check its size
    -rw-r--r-- 1 sar 16394 Nov 25 01:01 file.hole
    $ od -c file.hole                  let's look at the actual contents
    0000000   a   b   c   d   e   f   g   h   i   j  \0  \0  \0  \0  \0  \0
    0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0040000   A   B   C   D   E   F   G   H   I   J
    0040012

We use the od(1) command to look at the contents of the file. The -c flag tells it to print the contents as characters. We can see that the unwritten bytes in the middle are read back as zero. The seven-digit number at the beginning of each line is the byte offset in octal.

To prove that there is really a hole in the file, let's compare the file we've just created with a file of the same size, but without holes:

    $ ls -ls file.hole file.nohole     compare sizes
      8 -rw-r--r-- 1 sar 16394 Nov 25 01:01 file.hole
     20 -rw-r--r-- 1 sar 16394 Nov 25 01:03 file.nohole

Although both files are the same size, the file without holes consumes 20 disk blocks, whereas the file with holes consumes only 8 blocks.

In this example, we call the write function (Section 3.8). We'll have more to say about files with holes in Section 4.12.

Figure 3.2. Create a file with a hole in it

    #include "apue.h"
    #include <fcntl.h>

    char    buf1[] = "abcdefghij";
    char    buf2[] = "ABCDEFGHIJ";

    int
    main(void)
    {
        int     fd;

        if ((fd = creat("file.hole", FILE_MODE)) < 0)
            err_sys("creat error");

        if (write(fd, buf1, 10) != 10)
            err_sys("buf1 write error");
        /* offset now = 10 */

        if (lseek(fd, 16384, SEEK_SET) == -1)
            err_sys("lseek error");
        /* offset now = 16384 */

        if (write(fd, buf2, 10) != 10)
            err_sys("buf2 write error");
        /* offset now = 16394 */

        exit(0);
    }

Because the offset address that lseek uses is represented by an off_t, implementations are allowed to support whatever size is appropriate on their particular platform. Most platforms today provide two sets of interfaces to manipulate file offsets: one set that uses 32-bit file offsets and another set that uses 64-bit file offsets.

The Single UNIX Specification provides a way for applications to determine which environments are supported through the sysconf function (Section 2.5.4). Figure 3.3 summarizes the sysconf constants that are defined.
The c99 compiler requires that we use the getconf(1) command to map the desired data size model to the flags necessary to compile and link our programs. Different flags and libraries might be needed, depending on the environments supported by each platform.
Note that even though you might enable 64-bit file offsets, your ability to create a file larger than 2 GB (2^31-1 bytes) depends on the underlying file system type.