4.12. File Size

The st_size member of the stat structure contains the size of the file in bytes. This field is meaningful only for regular files, directories, and symbolic links.
For a regular file, a file size of 0 is allowed. We'll get an end-of-file indication on the first read of the file.

For a directory, the file size is usually a multiple of a number, such as 16 or 512. We talk about reading directories in Section 4.21.

For a symbolic link, the file size is the number of bytes in the filename. For example, in the following case, the file size of 7 is the length of the pathname usr/lib:

lrwxrwxrwx 1 root 7 Sep 25 07:14 lib -> usr/lib

(Note that symbolic links do not contain the normal C null byte at the end of the name, as the length is always specified by st_size.)

Most contemporary UNIX systems provide the fields st_blksize and st_blocks. The first is the preferred block size for I/O for the file, and the latter is the actual number of 512-byte blocks that are allocated. Recall from Section 3.9 that we encountered the minimum amount of time required to read a file when we used st_blksize for the read operations. The standard I/O library, which we describe in Chapter 5, also tries to read or write st_blksize bytes at a time, for efficiency.
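As a rough illustration of the three fields just described, the following sketch prints st_size, st_blksize, and st_blocks for a pathname, and reads the target of a symbolic link into a buffer sized from st_size. The helper names print_sizes and read_link_target are hypothetical, chosen only for this example. We assume a POSIX system that provides st_blksize and st_blocks. Some file systems report an st_size of 0 for symbolic links; in that case this sketch simply gives up and returns NULL.

```c
#define _XOPEN_SOURCE 700   /* assumption: request the XSI interfaces */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Print the size-related members of the stat structure for path,
 * without following a symbolic link.
 * Returns 0 on success, -1 on error.  (Hypothetical helper.)
 */
int
print_sizes(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) < 0) {
        perror("lstat");
        return -1;
    }
    printf("%s: st_size = %lld, st_blksize = %ld, st_blocks = %lld\n",
        path, (long long)sb.st_size, (long)sb.st_blksize,
        (long long)sb.st_blocks);
    return 0;
}

/*
 * Return the target of the symbolic link path as a malloc'd,
 * null-terminated string, or NULL on error or if path is not a
 * symbolic link.  The caller frees the result.  (Hypothetical helper.)
 */
char *
read_link_target(const char *path)
{
    struct stat sb;
    char *buf;
    ssize_t n;

    if (lstat(path, &sb) < 0 || !S_ISLNK(sb.st_mode))
        return NULL;
    /* st_size bytes for the name, plus one for the null byte we add */
    if ((buf = malloc(sb.st_size + 1)) == NULL)
        return NULL;
    n = readlink(path, buf, sb.st_size + 1);
    if (n < 0 || n > sb.st_size) {  /* error, or link longer than st_size */
        free(buf);
        return NULL;
    }
    buf[n] = '\0';  /* readlink does not store a terminating null byte */
    return buf;
}
```

For the lib link shown earlier, read_link_target would be expected to return the 7-byte string usr/lib, matching the st_size that ls reports.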
Holes in a File

In Section 3.6, we mentioned that a regular file can contain "holes." We showed an example of this in Figure 3.2. Holes are created by seeking past the current end of file and writing some data. As an example, consider the following:

$ ls -l core
-rw-r--r-- 1 sar 8483248 Nov 18 12:18 core
$ du -s core
272 core

The size of the file core is just over 8 MB, yet the du command reports that the amount of disk space used by the file is 272 512-byte blocks (139,264 bytes). (The du command on many BSD-derived systems reports the number of 1,024-byte blocks; Solaris reports the number of 512-byte blocks.) Obviously, this file has many holes.

As we mentioned in Section 3.6, the read function returns data bytes of 0 for any byte positions that have not been written. If we execute the following, we can see that the normal I/O operations read up through the size of the file:
$ wc -c core
8483248 core
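The behavior described above can be sketched in C. The first function below creates a hole by seeking past the start of an empty file and writing a single byte; the second reads the file back and counts the nonzero bytes, to show that read returns zeros for the unwritten positions. The names make_file_with_hole and count_nonzero are hypothetical helpers for this example. Whether the hole actually saves disk space (that is, whether st_blocks stays small) depends on the file system, so the sketch only reports the value and makes no claim about it.

```c
#define _XOPEN_SOURCE 700   /* assumption: request the XSI interfaces */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create (or truncate) path so that bytes 0 .. hole_len-1 are never
 * written and a single byte 'x' is stored at offset hole_len.
 * On success, *sb describes the new file, so the caller can compare
 * sb->st_size with sb->st_blocks * 512.
 * Returns 0 on success, -1 on error.  (Hypothetical helper.)
 */
int
make_file_with_hole(const char *path, off_t hole_len, struct stat *sb)
{
    int fd;

    if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
        return -1;
    if (lseek(fd, hole_len, SEEK_SET) == (off_t)-1 ||
        write(fd, "x", 1) != 1 ||
        fstat(fd, sb) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Read path from beginning to end.  Store the number of bytes read in
 * *total and return the number of those bytes that are nonzero,
 * or -1 on error.  (Hypothetical helper.)
 */
long long
count_nonzero(const char *path, long long *total)
{
    char buf[4096];
    ssize_t n, i;
    long long nonzero = 0;
    int fd;

    *total = 0;
    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        *total += n;
        for (i = 0; i < n; i++)
            if (buf[i] != 0)
                nonzero++;
    }
    close(fd);
    return (n < 0) ? -1 : nonzero;
}
```

With a hole_len of 1,048,576, st_size is 1,048,577 and reading the file returns that many bytes, only one of which is nonzero.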
If we make a copy of this file, using a utility such as cat(1), all these holes are written out as actual data bytes of 0:

$ cat core > core.copy
$ ls -l core*
-rw-r--r-- 1 sar 8483248 Nov 18 12:18 core
-rw-rw-r-- 1 sar 8483248 Nov 18 12:27 core.copy
$ du -s core*
272 core
16592 core.copy

Here, the actual number of bytes used by the new file is 8,495,104 (512 x 16,592). The difference between this size and the size reported by ls is caused by the number of blocks used by the file system to hold pointers to the actual data blocks.

Interested readers should refer to Section 4.2 of Bach [1986], Sections 7.2 and 7.3 of McKusick et al. [1996] (or Sections 8.2 and 8.3 in McKusick and Neville-Neil [2005]), and Section 14.2 of Mauro and McDougall [2001] for additional details on the physical layout of files.
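To see why a copy made with cat loses the holes, consider a minimal sketch of a plain read/write copy loop. This is not the source of cat; plain_copy is a hypothetical function written for this example. It reads st_blksize bytes at a time (falling back to 4,096 if that field is not positive, which is an assumption of this sketch) and writes every byte it reads. Because read returns zeros for the holes, those zeros are written to the destination as ordinary data.

```c
#define _XOPEN_SOURCE 700   /* assumption: request the XSI interfaces */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Copy src to dst with a plain read/write loop.
 * Returns the number of bytes copied, or -1 on error.
 * (Hypothetical helper.)
 */
long long
plain_copy(const char *src, const char *dst)
{
    struct stat sb;
    char *buf = NULL;
    size_t bufsize;
    ssize_t n;
    long long total = -1;
    int in, out = -1;

    if ((in = open(src, O_RDONLY)) < 0)
        return -1;
    if (fstat(in, &sb) < 0)
        goto done;
    /* use the preferred I/O block size of the source as the buffer size */
    bufsize = (sb.st_blksize > 0) ? (size_t)sb.st_blksize : 4096;
    if ((buf = malloc(bufsize)) == NULL)
        goto done;
    if ((out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
        goto done;

    total = 0;
    while ((n = read(in, buf, bufsize)) > 0) {
        /* bytes read from a hole are zeros, and are written as data */
        if (write(out, buf, n) != n) {  /* treat a short write as an error */
            total = -1;
            goto done;
        }
        total += n;
    }
    if (n < 0)
        total = -1;
done:
    free(buf);
    if (out >= 0 && close(out) < 0)
        total = -1;
    close(in);
    return total;
}
```

After such a copy, st_size is the same for both files, while st_blocks for the destination will typically be larger on file systems that support holes, as in the du output above.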