Team BBL

4.12. File Size

The st_size member of the stat structure contains the size of the file in bytes. This field is meaningful only for regular files, directories, and symbolic links.
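As a sketch of the rule above, the C function below returns st_size only for the three file types the text names. The name file_size_if_meaningful and the choice to return -1 for every other type are hypothetical. It uses lstat() so that a symbolic link reports its own size rather than the size of the file it refers to.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Hypothetical helper: return st_size for pathname if it names a
 * regular file, directory, or symbolic link; return -1 for any other
 * file type (device, FIFO, socket) or on error. lstat() is used so a
 * symbolic link reports its own size, not that of the file it names. */
long long file_size_if_meaningful(const char *pathname)
{
    struct stat sb;

    if (lstat(pathname, &sb) < 0)
        return -1;                      /* errno is set by lstat() */
    if (S_ISREG(sb.st_mode) || S_ISDIR(sb.st_mode) || S_ISLNK(sb.st_mode))
        return (long long)sb.st_size;
    return -1;                          /* st_size not meaningful here */
}
```

Treating FIFOs as "not meaningful" is a portable choice; on Solaris, as noted next, a pipe's st_size does carry information.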

Solaris also defines the file size for a pipe as the number of bytes that are available for reading from the pipe. We'll discuss pipes in Section 15.2.

For a regular file, a file size of 0 is allowed. We'll get an end-of-file indication on the first read of the file.
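A minimal C sketch of that behavior, assuming a hypothetical helper named empty_file_hits_eof: it checks that the file is regular with an st_size of 0, and that the very first read() returns 0, the end-of-file indication.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if pathname is a regular file whose
 * st_size is 0 and whose first read() reports end of file, 0 if not,
 * and -1 on error. */
int empty_file_hits_eof(const char *pathname)
{
    struct stat sb;
    char buf[1];
    ssize_t n;
    int fd;

    if ((fd = open(pathname, O_RDONLY)) < 0)
        return -1;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }
    n = read(fd, buf, sizeof(buf));     /* 0 means end of file */
    close(fd);
    if (n < 0)
        return -1;
    return S_ISREG(sb.st_mode) && sb.st_size == 0 && n == 0;
}
```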

For a directory, the file size is usually a multiple of a number, such as 16 or 512. We talk about reading directories in Section 4.21.

For a symbolic link, the file size is the number of bytes in the pathname that the link contains. For example, in the following case, the file size of 7 is the length of the pathname usr/lib:

    lrwxrwxrwx 1 root           7 Sep 25 07:14 lib -> usr/lib

(Note that symbolic links do not contain the normal C null byte at the end of the name, as the length is always specified by st_size.)
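Because there is no terminating null byte, a program that wants the link contents as a C string has to add one itself. The sketch below sizes the buffer check from st_size and terminates the result after readlink(), which does not append a null byte. The name read_link_string is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy the contents of the symbolic link `link`
 * into buf as a null-terminated string. Returns the length, or -1 on
 * error, if `link` is not a symbolic link, or if buf has no room for
 * st_size bytes plus the null byte we add ourselves. */
ssize_t read_link_string(const char *link, char *buf, size_t bufsize)
{
    struct stat sb;
    ssize_t n;

    if (lstat(link, &sb) < 0 || !S_ISLNK(sb.st_mode))
        return -1;
    if ((size_t)sb.st_size + 1 > bufsize)
        return -1;                          /* no room for the '\0' */
    n = readlink(link, buf, bufsize - 1);   /* does NOT append '\0' */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

For the lib -> usr/lib link shown above, this would return 7 and leave "usr/lib" in buf.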

Most contemporary UNIX systems provide the fields st_blksize and st_blocks. The former is the preferred block size for I/O for the file, and the latter is the actual number of 512-byte blocks that are allocated. Recall from Section 3.9 that reading a file took the least time when we used st_blksize as the buffer size for each read. The standard I/O library, which we describe in Chapter 5, also tries to read or write st_blksize bytes at a time, for efficiency.
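A minimal C sketch of sizing reads by st_blksize, roughly what the standard I/O library does internally. The function name read_with_blksize and the 4096-byte fallback for a nonpositive st_blksize are assumptions for illustration.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: read pathname to end of file in chunks of
 * st_blksize bytes and return the total number of bytes read
 * (-1 on error). The 4096 fallback is an assumption, not a standard. */
long long read_with_blksize(const char *pathname)
{
    struct stat sb;
    size_t bufsize;
    long long total = 0;
    ssize_t n;
    char *buf;
    int fd;

    if ((fd = open(pathname, O_RDONLY)) < 0)
        return -1;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }
    bufsize = sb.st_blksize > 0 ? (size_t)sb.st_blksize : 4096;
    if ((buf = malloc(bufsize)) == NULL) {
        close(fd);
        return -1;
    }
    while ((n = read(fd, buf, bufsize)) > 0)
        total += n;
    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```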

Be aware that different versions of the UNIX System use units other than 512-byte blocks for st_blocks. Using this value is nonportable.

Holes in a File

In Section 3.6, we mentioned that a regular file can contain "holes." We showed an example of this in Figure 3.2. Holes are created by seeking past the current end of file and writing some data. As an example, consider the following:

     $ ls -l core
     -rw-r--r-- 1 sar       8483248 Nov 18 12:18 core
     $ du -s core
     272        core

The size of the file core is just over 8 MB, yet the du command reports that the amount of disk space used by the file is 272 512-byte blocks (139,264 bytes). (The du command on many BSD-derived systems reports the number of 1,024-byte blocks; Solaris reports the number of 512-byte blocks.) Obviously, this file has many holes.
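The ls -l versus du -s comparison can be sketched as a heuristic over struct stat. It assumes st_blocks counts 512-byte units, as on Linux; the text warns that this is not portable. The names looks_sparse_st and looks_sparse are hypothetical, and file system compression can also make st_blocks small, so a result of 1 is only a hint that holes are present.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Hypothetical heuristic: 1 if fewer bytes are allocated than st_size
 * claims, 0 if not, -1 if not a regular file. ASSUMES st_blocks is
 * in 512-byte units. */
int looks_sparse_st(const struct stat *sb)
{
    if (!S_ISREG(sb->st_mode))
        return -1;
    return (long long)sb->st_blocks * 512 < (long long)sb->st_size;
}

/* Same check for a pathname; -1 on error. */
int looks_sparse(const char *pathname)
{
    struct stat sb;

    if (stat(pathname, &sb) < 0)
        return -1;
    return looks_sparse_st(&sb);
}
```

With the figures above, 272 blocks times 512 is 139,264 bytes, far less than the 8,483,248 bytes reported by ls, so core would be flagged.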

As we mentioned in Section 3.6, the read function returns bytes with a value of 0 for any byte positions that have not been written. If we execute the following, we can see that the normal I/O operations read all the way up to the size of the file:

     $ wc -c core
      8483248 core

The wc(1) command with the -c option counts the number of characters (bytes) in the file.
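A rough C equivalent of wc -c that also counts zero bytes shows that ordinary read() calls do not skip holes; they return them as zeros. The name count_bytes_and_zeros and the 8192-byte buffer are arbitrary choices for this sketch.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read pathname with plain read() calls and
 * return the total number of bytes (-1 on error). The number of
 * bytes equal to 0, including every byte in a hole, is stored in
 * *zeros. */
long long count_bytes_and_zeros(const char *pathname, long long *zeros)
{
    char buf[8192];
    long long total = 0;
    ssize_t n, i;
    int fd;

    *zeros = 0;
    if ((fd = open(pathname, O_RDONLY)) < 0)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        total += n;
        for (i = 0; i < n; i++)
            if (buf[i] == 0)
                (*zeros)++;
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```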

If we make a copy of this file, using a utility such as cat(1), all these holes are written out as actual data bytes of 0:

       $ cat core > core.copy
       $ ls -l core*
       -rw-r--r--  1 sar      8483248 Nov 18 12:18 core
       -rw-rw-r--  1 sar      8483248 Nov 18 12:27 core.copy
       $ du -s core*
       272     core
       16592   core.copy
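The copy made by cat can be sketched in C as a loop that writes back everything read() returns. Because the holes come back as real 0 bytes, the output file receives actual data blocks where the input had holes. The name copy_all_bytes is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, similar in spirit to `cat in > out`: copy every
 * byte read from infd to outfd, holes included (as 0 bytes). Returns
 * the number of bytes copied, or -1 on error. */
long long copy_all_bytes(int infd, int outfd)
{
    char buf[8192];
    long long total = 0;
    ssize_t n;

    while ((n = read(infd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {                   /* handle partial writes */
            ssize_t w = write(outfd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

A copy that wanted to preserve holes would have to detect runs of zero bytes and lseek over them instead of writing them; that variant is not shown here.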

Here, the actual number of bytes used by the new file is 8,495,104 (512 × 16,592). The difference between this size and the size reported by ls is caused by the blocks that the file system uses to hold pointers to the actual data blocks.

Interested readers should refer to Section 4.2 of Bach [1986], Sections 7.2 and 7.3 of McKusick et al. [1996] (or Sections 8.2 and 8.3 in McKusick and Neville-Neil [2005]), and Section 14.2 of Mauro and McDougall [2001] for additional details on the physical layout of files.
