14.7. readv and writev Functions

The readv and writev functions let us read into and write from multiple noncontiguous buffers in a single function call. These operations are called scatter read and gather write.
The second argument to both functions is a pointer to an array of iovec structures:

    struct iovec {
        void   *iov_base;   /* starting address of buffer */
        size_t  iov_len;    /* size of buffer */
    };

The number of elements in the iov array is specified by iovcnt. It is limited to IOV_MAX (recall Figure 2.10). Figure 14.27 shows a picture relating the arguments to these two functions and the iovec structure.

Figure 14.27. The iovec structure for readv and writev

The writev function gathers the output data from the buffers in order: iov[0], iov[1], through iov[iovcnt-1]; writev returns the total number of bytes output, which should normally equal the sum of all the buffer lengths.

The readv function scatters the data into the buffers in order, always filling one buffer before proceeding to the next. readv returns the total number of bytes that were read. A count of 0 is returned if there is no more data and the end of file is encountered.
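To make the ordering rules concrete, here is a minimal sketch in C. It assumes the POSIX declarations of readv and writev from <sys/uio.h>, each taking a file descriptor, a pointer to the iovec array, and the element count. The helper name gather_scatter_demo and the use of a pipe are illustrative choices, not something taken from the text.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: gather two buffers into a pipe with one writev,
 * then scatter the same bytes into two caller-supplied buffers with one
 * readv. Returns the byte count reported by readv, or -1 on error.
 */
ssize_t
gather_scatter_demo(char *first, size_t firstlen,
                    char *second, size_t secondlen)
{
    static char part1[] = "hello, ";
    static char part2[] = "world";
    int fds[2];
    struct iovec out[2], in[2];
    ssize_t nread = -1;

    if (pipe(fds) < 0)
        return -1;

    /* Gather: iov[0] is output first, then iov[1]. */
    out[0].iov_base = part1;
    out[0].iov_len = sizeof(part1) - 1;    /* 7 bytes, no terminator */
    out[1].iov_base = part2;
    out[1].iov_len = sizeof(part2) - 1;    /* 5 bytes, no terminator */

    /* Scatter: the first buffer is filled before the second is touched. */
    in[0].iov_base = first;
    in[0].iov_len = firstlen;
    in[1].iov_base = second;
    in[1].iov_len = secondlen;

    if (writev(fds[1], out, 2) == 12)
        nread = readv(fds[0], in, 2);

    close(fds[0]);
    close(fds[1]);
    return nread;
}
```

Called with a 4-byte first buffer and a 16-byte second buffer, the sketch should leave "hell" in the first and "o, world" in the second, because the buffer boundaries on the read side need not match those on the write side.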
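Example

In Section 20.8, in the function _db_writeidx, we need to write two buffers consecutively to a file. The second buffer to output is an argument passed by the caller, and the first buffer is one we create, containing the length of the second buffer and a file offset of other information in the file. There are three ways we can do this.

1. Call write twice, once for each buffer.
2. Allocate a buffer of our own that is large enough to contain both buffers, and copy both into it. We then call write once for this new buffer.
3. Call writev to output both buffers.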
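As a rough illustration, the three ways of writing a header buffer followed by a data buffer (two write calls, a copy into a staging buffer followed by one write, and a single writev) might look like the following C sketch. The function names write_twice, copy_then_write, and gather_write are hypothetical, and this is not the code from Section 20.8. Short writes are not retried, to keep the comparison small.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Way 1: two system calls, one per buffer. */
ssize_t
write_twice(int fd, const void *hdr, size_t hlen,
            const void *data, size_t dlen)
{
    ssize_t n1, n2;

    if ((n1 = write(fd, hdr, hlen)) < 0)
        return -1;
    if ((n2 = write(fd, data, dlen)) < 0)
        return -1;
    return n1 + n2;
}

/* Way 2: copy both buffers into a staging buffer, then one write. */
ssize_t
copy_then_write(int fd, const void *hdr, size_t hlen,
                const void *data, size_t dlen)
{
    char *staging;
    ssize_t n;

    if ((staging = malloc(hlen + dlen)) == NULL)
        return -1;
    memcpy(staging, hdr, hlen);
    memcpy(staging + hlen, data, dlen);
    n = write(fd, staging, hlen + dlen);
    free(staging);
    return n;
}

/* Way 3: one writev call; the kernel gathers the two buffers. */
ssize_t
gather_write(int fd, const void *hdr, size_t hlen,
             const void *data, size_t dlen)
{
    struct iovec iov[2];

    /* iov_base is not const-qualified; writev does not modify the data. */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len = hlen;
    iov[1].iov_base = (void *)data;
    iov[1].iov_len = dlen;
    return writev(fd, iov, 2);
}
```

All three produce the same bytes on the descriptor; they differ in the number of system calls and in who does the copying.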
The solution we use in Section 20.8 is to use writev, but it's instructive to compare it to the other two solutions. Figure 14.28 shows the results from the three methods just described.

The test program that we measured output a 100-byte header followed by 200 bytes of data. This was done 1,048,576 times, generating a 300-megabyte file. The test program has three separate cases, one for each of the techniques measured in Figure 14.28. We used times (Section 8.16) to obtain the user CPU time, system CPU time, and wall clock time before and after the writes. All three times are shown in seconds.

As we expect, the system time increases when we call write twice, compared to calling either write or writev once. This correlates with the results in Figure 3.5.

Next, note that the sum of the CPU times (user plus system) is less when we do a buffer copy followed by a single write compared to a single call to writev. With the single write, we copy the buffers to a staging buffer at user level, and then the kernel will copy the data to its internal buffers when we call write. With writev, we should do less copying, because the kernel only needs to copy the data directly into its staging buffers. The fixed cost of using writev for such small amounts of data, however, is greater than the benefit. As the amount of data we need to copy increases, copying the buffers in our program becomes more expensive, and the writev alternative becomes more attractive.
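The measurement approach can be sketched as follows. This is not the test program behind Figure 14.28; it is a minimal C sketch, assuming a hypothetical measure helper that brackets a piece of work with two calls to times and converts clock ticks to seconds using sysconf(_SC_CLK_TCK). The names cpu_secs, loop_args, and writev_loop are likewise illustrative. The worker shown covers only the writev case, with a 100-byte header and 200 bytes of data per call.

```c
#include <sys/times.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

struct cpu_secs { double user, sys, real; };   /* all in seconds */

struct loop_args {
    int       fd;       /* descriptor to write to */
    long      count;    /* number of header+data records */
    long long total;    /* bytes written so far */
    int       failed;   /* set to 1 on error or short write */
};

/* Write count records of 100 + 200 bytes, one writev per record. */
void
writev_loop(void *p)
{
    struct loop_args *a = p;
    static char hdr[100], data[200];
    struct iovec iov[2];
    ssize_t n;
    long i;

    iov[0].iov_base = hdr;
    iov[0].iov_len = sizeof(hdr);
    iov[1].iov_base = data;
    iov[1].iov_len = sizeof(data);
    for (i = 0; i < a->count; i++) {
        n = writev(a->fd, iov, 2);
        if (n != (ssize_t)(sizeof(hdr) + sizeof(data))) {
            a->failed = 1;
            return;
        }
        a->total += n;
    }
}

/* Run work(arg) and report the user, system, and wall clock time it took. */
int
measure(void (*work)(void *), void *arg, struct cpu_secs *out)
{
    struct tms t0, t1;
    clock_t r0, r1;
    long ticks = sysconf(_SC_CLK_TCK);     /* clock ticks per second */

    if (ticks <= 0)
        return -1;
    if ((r0 = times(&t0)) == (clock_t)-1)
        return -1;
    work(arg);
    if ((r1 = times(&t1)) == (clock_t)-1)
        return -1;
    out->user = (double)(t1.tms_utime - t0.tms_utime) / ticks;
    out->sys = (double)(t1.tms_stime - t0.tms_stime) / ticks;
    out->real = (double)(r1 - r0) / ticks;
    return 0;
}
```

With count set to 1,048,576 the worker writes 1,048,576 × 300 = 314,572,800 bytes, which is the 300-megabyte total described above. Swapping in a worker that calls write twice, or one that copies and writes once, would give the other two cases.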
In summary, we should always try to use the fewest system calls necessary to get the job done. If we are writing small amounts of data, we will find it less expensive to copy the data ourselves and use a single write instead of using writev. We might find, however, that the performance benefits aren't worth the extra complexity of managing our own staging buffers.