14.9. Memory-Mapped I/OMemory-mapped I/O lets us map a file on disk into a buffer in memory so that, when we fetch bytes from the buffer, the corresponding bytes of the file are read. Similarly, when we store data in the buffer, the corresponding bytes are automatically written to the file. This lets us perform I/O without using read or write.
To use this feature, we have to tell the kernel to map a given file to a region in memory. This is done by the mmap function.
The addr argument lets us specify the address of where we want the mapped region to start. We normally set this to 0 to allow the system to choose the starting address. The return value of this function is the starting address of the mapped area. The filedes argument is the file descriptor specifying the file that is to be mapped. We have to open this file before we can map it into the address space. The len argument is the number of bytes to map, and off is the starting offset in the file of the bytes to map. (Some restrictions on the value of off are described later.) The prot argument specifies the protection of the mapped region. We can specify the protection as either PROT_NONE or the bitwise OR of any combination of PROT_READ, PROT_WRITE, and PROT_EXEC. The protection specified for a region can't allow more access than the open mode of the file. For example, we can't specify PROT_WRITE if the file was opened read-only. Before looking at the flag argument, let's see what's going on here. Figure 14.31 shows a memory-mapped file. (Recall the memory layout of a typical process, Figure 7.6.) In this figure, "start addr" is the return value from mmap. We have shown the mapped memory being somewhere between the heap and the stack: this is an implementation detail and may differ from one implementation to the next. Figure 14.31. Example of a memory-mapped fileThe flag argument affects various attributes of the mapped region. Each implementation has additional MAP_xxx flag values, which are specific to that implementation. Check the mmap(2) manual page on your system for details. The value of off and the value of addr (if MAP_FIXED is specified) are required to be multiples of the system's virtual memory page size. This value can be obtained from the sysconf function (Section 2.5.4) with an argument of _SC_PAGESIZE or _SC_PAGE_SIZE. Since off and addr are often specified as 0, this requirement is not a big deal. Since the starting offset of the mapped file is tied to the system's virtual memory page size, what happens if the length of the mapped region isn't a multiple of the page size? Assume that the file size is 12 bytes and that the system's page size is 512 bytes. In this case, the system normally provides a mapped region of 512 bytes, and the final 500 bytes of this region are set to 0. We can modify the final 500 bytes, but any changes we make to them are not reflected in the file. Thus, we cannot append to a file with mmap. We must first grow the file, as we will see in Figure 14.32. Figure 14.32. Copy a file using memory-mapped I/O#include "apue.h" #include <fcntl.h> #include <sys/mman.h> int main(int argc, char *argv[]) { int fdin, fdout; void *src, *dst; struct stat statbuf; if (argc != 3) err_quit("usage: %s <fromfile> <tofile>", argv[0]); if ((fdin = open(argv[1], O_RDONLY)) < 0) err_sys("can't open %s for reading", argv[1]); if ((fdout = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, FILE_MODE)) < 0) err_sys("can't creat %s for writing", argv[2]); if (fstat(fdin, &statbuf) < 0) /* need size of input file */ err_sys("fstat error"); /* set size of output file */ if (lseek(fdout, statbuf.st_size - 1, SEEK_SET) == -1) err_sys("lseek error"); if (write(fdout, "", 1) != 1) err_sys("write error"); if ((src = mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, fdin, 0)) == MAP_FAILED) err_sys("mmap error for input"); if ((dst = mmap(0, statbuf.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdout, 0)) == MAP_FAILED) err_sys("mmap error for output"); memcpy(dst, src, statbuf.st_size); /* does the file copy */ exit(0); } Two signals are normally used with mapped regions. SIGSEGV is the signal normally used to indicate that we have tried to access memory that is not available to us. This signal can also be generated if we try to store into a mapped region that we specified to mmap as read-only. The SIGBUS signal can be generated if we access a portion of the mapped region that does not make sense at the time of the access. For example, assume that we map a file using the file's size, but before we reference the mapped region, the file's size is truncated by some other process. If we then try to access the memory-mapped region corresponding to the end portion of the file that was truncated, we'll receive SIGBUS. A memory-mapped region is inherited by a child across a fork (since it's part of the parent's address space), but for the same reason, is not inherited by the new program across an exec. We can change the permissions on an existing mapping by calling mprotect.
The legal values for prot are the same as those for mmap (Figure 14.30). The address argument must be an integral multiple of the system's page size.
If the pages in a shared mapping have been modified, we can call msync to flush the changes to the file that backs the mapping. The msync function is similar to fsync (Section 3.13), but works on memory-mapped regions.
If the mapping is private, the file mapped is not modified. As with the other memory-mapped functions, the address must be aligned on a page boundary. The flags argument allows us some control over how the memory is flushed. We can specify the MS_ASYNC flag to simply schedule the pages to be written. If we want to wait for the writes to complete before returning, we can use the MS_SYNC flag. Either MS_ASYNC or MS_SYNC must be specified. An optional flag, MS_INVALIDATE, lets us tell the operating system to discard any pages that are out of sync with the underlying storage. Some implementations will discard all pages in the specified range when we use this flag, but this behavior is not required. A memory-mapped region is automatically unmapped when the process terminates or by calling munmap directly. Closing the file descriptor filedes does not unmap the region.
munmap does not affect the object that was mappedthat is, the call to munmap does not cause the contents of the mapped region to be written to the disk file. The updating of the disk file for a MAP_SHARED region happens automatically by the kernel's virtual memory algorithm as we store into the memory-mapped region. Modifications to memory in a MAP_PRIVATE region are discarded when the region is unmapped. ExampleThe program in Figure 14.32 copies a file (similar to the cp(1) command) using memory-mapped I/O. We first open both files and then call fstat to obtain the size of the input file. We need this size for the call to mmap for the input file, and we also need to set the size of the output file. We call lseek and then write one byte to set the size of the output file. If we don't set the output file's size, the call to mmap for the output file is OK, but the first reference to the associated memory region generates SIGBUS. We might be tempted to use ftruncate to set the size of the output file, but not all systems extend the size of a file with this function. (See Section 4.13.)
We then call mmap for each file, to map the file into memory, and finally call memcpy to copy from the input buffer to the output buffer. As the bytes of data are fetched from the input buffer (src), the input file is automatically read by the kernel; as the data is stored in the output buffer (dst), the data is automatically written to the output file.
Let's compare this memory-mapped file copy to a copy that is done by calling read and write (with a buffer size of 8,192). Figure 14.33 shows the results. The times are given in seconds, and the size of the file being copied was 300 megabytes. For Solaris 9, the total CPU time (user + system) is almost the same for both types of copies: 9.88 seconds versus 9.62 seconds. For Linux 2.4.22, the total CPU time is almost doubled when we use mmap and memcpy (1.06 seconds versus 1.95 seconds). The difference is probably because the two systems implement process time accounting differently. As far as elapsed time is concerned, the version with mmap and memcpy is faster than the version with read and write. This makes sense, because we're doing less work with mmap and memcpy. With read and write, we copy the data from the kernel's buffer to the application's buffer (read), and then copy the data from the application's buffer to the kernel's buffer (write). With mmap and memcpy, we copy the data directly from one kernel buffer mapped into our address space into another kernel buffer mapped into our address space.
Memory-mapped I/O is faster when copying one regular file to another. There are limitations. We can't use it to copy between certain devices (such as a network device or a terminal device), and we have to be careful if the size of the underlying file could change after we map it. Nevertheless, some applications can benefit from memory-mapped I/O, as it can often simplify the algorithms, since we manipulate memory instead of reading and writing a file. One example that can benefit from memory-mapped I/O is the manipulation of a frame buffer device that references a bit-mapped display. Krieger, Stumm, and Unrau [1992] describe an alternative to the standard I/O library (Chapter 5) that uses memory-mapped I/O. We return to memory-mapped I/O in Section 15.9, showing an example of how it can be used to provide shared memory between related processes. |