4.21. Reading DirectoriesDirectories can be read by anyone who has access permission to read the directory. But only the kernel can write to a directory, to preserve file system sanity. Recall from Section 4.5 that the write permission bits and execute permission bits for a directory determine if we can create new files in the directory and remove files from the directorythey don't specify if we can write to the directory itself. The actual format of a directory depends on the UNIX System implementation and the design of the file system. Earlier systems, such as Version 7, had a simple structure: each directory entry was 16 bytes, with 14 bytes for the filename and 2 bytes for the i-node number. When longer filenames were added to 4.2BSD, each entry became variable length, which means that any program that reads a directory is now system dependent. To simplify this, a set of directory routines were developed and are part of POSIX.1. Many implementations prevent applications from using the read function to access the contents of directories, thereby further isolating applications from the implementation-specific details of directory formats.
The telldir and seekdir functions are not part of the base POSIX.1 standard. They are XSI extensions in the Single UNIX Specifications, so all conforming UNIX System implementations are expected to provide them. Recall our use of several of these functions in the program shown in Figure 1.3, our bare-bones implementation of the ls command. The dirent structure defined in the file <dirent.h> is implementation dependent. Implementations define the structure to contain at least the following two members: struct dirent { ino_t d_ino; /* i-node number */ char d_name[NAME_MAX + 1]; /* null-terminated filename */ }
Note that NAME_MAX is not a defined constant with Solarisits value depends on the file system in which the directory resides, and its value is usually obtained from the fpathconf function. A common value for NAME_MAX is 255. (Recall Figure 2.14.) Since the filename is null terminated, however, it doesn't matter how the array d_name is defined in the header, because the array size doesn't indicate the length of the filename. The DIR structure is an internal structure used by these six functions to maintain information about the directory being read. The purpose of the DIR structure is similar to that of the FILE structure maintained by the standard I/O library, which we describe in Chapter 5. The pointer to a DIR structure that is returned by opendir is then used with the other five functions. The opendir function initializes things so that the first readdir reads the first entry in the directory. The ordering of entries within the directory is implementation dependent and is usually not alphabetical. ExampleWe'll use these directory routines to write a program that traverses a file hierarchy. The goal is to produce the count of the various types of files that we show in Figure 4.4. The program shown in Figure 4.22 takes a single argumentthe starting pathnameand recursively descends the hierarchy from that point. Solaris provides a function, ftw(3), that performs the actual traversal of the hierarchy, calling a user-defined function for each file. The problem with this function is that it calls the stat function for each file, which causes the program to follow symbolic links. For example, if we start at the root and have a symbolic link named /lib that points to /usr/lib, all the files in the directory /usr/lib are counted twice. To correct this, Solaris provides an additional function, nftw(3), with an option that stops it from following symbolic links. Although we could use nftw, we'll write our own simple file walker to show the use of the directory routines.
We have provided more generality in this program than needed. This was done to illustrate the ftw function. For example, the function myfunc always returns 0, even though the function that calls it is prepared to handle a nonzero return. Figure 4.22. Recursively descend a directory hierarchy, counting file types#include "apue.h" #include <dirent.h> #include <limits.h> /* function type that is called for each filename */ typedef int Myfunc(const char *, const struct stat *, int); static Myfunc myfunc; static int myftw(char *, Myfunc *); static int dopath(Myfunc *); static long nreg, ndir, nblk, nchr, nfifo, nslink, nsock, ntot; int main(int argc, char *argv[]) { int ret; if (argc != 2) err_quit("usage: ftw <starting-pathname>"); ret = myftw(argv[1], myfunc); /* does it all */ ntot = nreg + ndir + nblk + nchr + nfifo + nslink + nsock; if (ntot == 0) ntot = 1; /* avoid divide by 0; print 0 for all counts */ printf("regular files = %7ld, %5.2f %%\n", nreg, nreg*100.0/ntot); printf("directories = %7ld, %5.2f %%\n", ndir, ndir*100.0/ntot); printf("block special = %7ld, %5.2f %%\n", nblk, nblk*100.0/ntot); printf("char special = %7ld, %5.2f %%\n", nchr, nchr*100.0/ntot); printf("FIFOs = %7ld, %5.2f %%\n", nfifo, nfifo*100.0/ntot); printf("symbolic links = %7ld, %5.2f %%\n", nslink, nslink*100.0/ntot); printf("sockets = %7ld, %5.2f %%\n", nsock, nsock*100.0/ntot); exit(ret); } /* * Descend through the hierarchy, starting at "pathname". * The caller's func() is called for every file. */ #define FTW_F 1 /* file other than directory */ #define FTW_D 2 /* directory */ #define FTW_DNR 3 /* directory that can't be read */ #define FTW_NS 4 /* file that we can't stat */ static char *fullpath; /* contains full pathname for every file */ static int /* we return whatever func() returns */ myftw(char *pathname, Myfunc *func) { int len; fullpath = path_alloc(&len); /* malloc's for PATH_MAX+1 bytes */ /* (Figure 2.15) */ strncpy(fullpath, pathname, len); /* protect against */ fullpath[len-1] = 0; /* buffer overrun */ return(dopath(func)); } /* * Descend through the hierarchy, starting at "fullpath". * If "fullpath" is anything other than a directory, we lstat() it, * call func(), and return. For a directory, we call ourself * recursively for each name in the directory. */ static int /* we return whatever func() returns */ dopath(Myfunc* func) { struct stat statbuf; struct dirent *dirp; DIR *dp; int ret; char *ptr; if (lstat(fullpath, &statbuf) < 0) /* stat error */ return(func(fullpath, &statbuf, FTW_NS)); if (S_ISDIR(statbuf.st_mode) == 0) /* not a directory */ return(func(fullpath, &statbuf, FTW_F)); /* * It's a directory. First call func() for the directory, * then process each filename in the directory. */ if ((ret = func(fullpath, &statbuf, FTW_D)) != 0) return(ret); ptr = fullpath + strlen(fullpath); /* point to end of fullpath */ *ptr++ = '/'; *ptr = 0; if ((dp = opendir(fullpath)) == NULL) /* can't read directory */ return(func(fullpath, &statbuf, FTW_DNR)); while ((dirp = readdir(dp)) != NULL) { if (strcmp(dirp->d_name, ".") == 0 || strcmp(dirp->d_name, "..") == 0) continue; /* ignore dot and dot-dot */ strcpy(ptr, dirp->d_name); /* append name after slash */ if ((ret = dopath(func)) != 0) /* recursive */ break; /* time to leave */ } ptr[-1] = 0; /* erase everything from slash onwards */ if (closedir(dp) < 0) err_ret("can't close directory %s", fullpath); return(ret); } static int myfunc(const char *pathname, const struct stat *statptr, int type) { switch (type) { case FTW_F: switch (statptr->st_mode & S_IFMT) { case S_IFREG: nreg++; break; case S_IFBLK: nblk++; break; case S_IFCHR: nchr++; break; case S_IFIFO: nfifo++; break; case S_IFLNK: nslink++; break; case S_IFSOCK: nsock++; break; case S_IFDIR: err_dump("for S_IFDIR for %s", pathname); /* directories should have type = FTW_D */ } break; case FTW_D: ndir++; break; case FTW_DNR: err_ret("can't read directory %s", pathname); break; case FTW_NS: err_ret("stat error for %s", pathname); break; default: err_dump("unknown type %d for pathname %s", type, pathname); } return(0); } For additional information on descending through a file system and the use of this technique in many standard UNIX System commandsfind, ls, tar, and so onrefer to Fowler, Korn, and Vo [1989]. |