Section 14.4. STREAMS

14.4. STREAMS

The STREAMS mechanism is provided by System V as a general way to interface communication drivers into the kernel. We need to discuss STREAMS to understand the terminal interface in System V, the use of the poll function for I/O multiplexing (Section 14.5.2), and the implementation of STREAMS-based pipes and named pipes (Sections 17.2 and 17.2.1).

Be careful not to confuse this usage of the word stream with our previous usage of it in the standard I/O library (Section 5.2). The streams mechanism was developed by Dennis Ritchie [Ritchie 1984] as a way of cleaning up the traditional character I/O system (c-lists) and to accommodate networking protocols. The streams mechanism was later added to SVR3, after enhancing it a bit and capitalizing the name. Complete support for STREAMS (i.e., a STREAMS-based terminal I/O system) was provided with SVR4. The SVR4 implementation is described in [AT&T 1990d]. Rago [1993] discusses both user-level STREAMS programming and kernel-level STREAMS programming.

STREAMS is an optional feature in the Single UNIX Specification (included as the XSI STREAMS Option Group). Of the four platforms discussed in this text, only Solaris provides native support for STREAMS. A STREAMS subsystem is available for Linux, but you need to add it yourself. It is not usually included by default.

A stream provides a full-duplex path between a user process and a device driver. There is no need for a stream to talk to a hardware device; a stream can also be used with a pseudo-device driver. Figure 14.13 shows the basic picture for what is called a simple stream.

Figure 14.13. A simple stream

Beneath the stream head, we can push processing modules onto the stream. This is done using an ioctl command. Figure 14.14 shows a stream with a single processing module. We also show the connection between these boxes with two arrows to stress the full-duplex nature of streams and to emphasize that the processing in one direction is separate from the processing in the other direction.

Figure 14.14. A stream with a processing module

Any number of processing modules can be pushed onto a stream. We use the term push, because each new module goes beneath the stream head, pushing any previously pushed modules down. (This is similar to a last-in, first-out stack.) In Figure 14.14, we have labeled the downstream and upstream sides of the stream. Data that we write to a stream head is sent downstream. Data read by the device driver is sent upstream.

STREAMS modules are similar to device drivers in that they execute as part of the kernel, and they are normally link edited into the kernel when the kernel is built. If the system supports dynamically-loadable kernel modules (as do Linux and Solaris), then we can take a STREAMS module that has not been link edited into the kernel and try to push it onto a stream; however, there is no guarantee that arbitrary combinations of modules and drivers will work properly together.

We access a stream with the functions from Chapter 3: open, close, read, write, and ioctl. Additionally, three new functions were added to the SVR3 kernel to support STREAMS (getmsg, putmsg, and poll), and another two (getpmsg and putpmsg) were added with SVR4 to handle messages with different priority bands within a stream. We describe these five new functions later in this section.

The pathname that we open for a stream normally lives beneath the /dev directory. Simply looking at the device name using ls -l, we can't tell whether the device is a STREAMS device. All STREAMS devices are character special files.

Although some STREAMS documentation implies that we can write processing modules and push them willy-nilly onto a stream, the writing of these modules requires the same skills and care as writing a device driver. Generally, only specialized applications or functions push and pop STREAMS modules.

Before STREAMS, terminals were handled with the existing c-list mechanism. (Section 10.3.1 of Bach [1986] and Section 10.6 of McKusick et al. [1996] describe c-lists in SVR2 and 4.4BSD, respectively.) Adding other character-based devices to the kernel usually involved writing a device driver and putting everything into the driver. Access to the new device was typically through the raw device, meaning that every user read or write ended up directly in the device driver. The STREAMS mechanism cleans up this way of interacting, allowing the data to flow between the stream head and the driver in STREAMS messages and allowing any number of intermediate processing modules to operate on the data.

STREAMS Messages

All input and output under STREAMS is based on messages. The stream head and the user process exchange messages using read, write, ioctl, getmsg, getpmsg, putmsg, and putpmsg. Messages are also passed up and down a stream between the stream head, the processing modules, and the device driver.

Between the user process and the stream head, a message consists of a message type, optional control information, and optional data. We show in Figure 14.15 how the various message types are generated by the arguments to write, putmsg, and putpmsg. The control information and data are specified by strbuf structures:

   struct strbuf
     int maxlen;  /* size of buffer */
     int len;     /* number of bytes currently in buffer */
     char *buf;   /* pointer to buffer */
   };

Figure 14.15. Type of STREAMS message generated for write, putmsg, and putpmsg
Function
Control?
Data?
band
flag
Message type generated
write
N/A
yes
N/A
N/A
M_DATA (ordinary)
putmsg
no
no
N/A
0
no message sent, returns 0
putmsg
no
yes
N/A
0
M_DATA (ordinary)
putmsg
yes
yes or no
N/A
0
M_PROTO (ordinary)
putmsg
yes
yes or no
N/A
RS_HIPRI
M_PCPROTO (high-priority)
putmsg
no
yes or no
N/A
RS_HIPRI
error, EINVAL
putpmsg
yes or no
yes or no
0255
0
error, EINVAL
putpmsg
no
no
0255
MSG_BAND
no message sent, returns 0
putpmsg
no
yes
0
MSG_BAND
M_DATA (ordinary)
putpmsg
no
yes
1255
MSG_BAND
M_DATA (priority band)
putpmsg
yes
yes or no
0
MSG_BAND
M_PROTO (ordinary)
putpmsg
yes
yes or no
1255
MSG_BAND
M_PROTO (priority band)
putpmsg
yes
yes or no
0
MSG_HIPRI
M_PCPROTO (high-priority)
putpmsg
no
yes or no
0
MSG_HIPRI
error, EINVAL
putpmsg
yes or no
yes or no
nonzero
MSG_HIPRI
error, EINVAL

When we send a message with putmsg or putpmsg, len specifies the number of bytes of data in the buffer. When we receive a message with getmsg or getpmsg, maxlen specifies the size of the buffer (so the kernel won't overflow the buffer), and len is set by the kernel to the amount of data stored in the buffer. We'll see that a zero-length message is OK and that a len of 1 can specify that there is no control or data.

Why do we need to pass both control information and data? Providing both allows us to implement service interfaces between a user process and a stream. Olander, McGrath, and Israel [1986] describe the original implementation of service interfaces in System V. Chapter 5 of AT&T [1990d] describes service interfaces in detail, along with a simple example. Probably the best-known service interface, described in Chapter 4 of Rago [1993], is the System V Transport Layer Interface (TLI), which provides an interface to the networking system.

Another example of control information is sending a connectionless network message (a datagram). To send the message, we need to specify the contents of the message (the data) and the destination address for the message (the control information). If we couldn't send control and data together, some ad hoc scheme would be required. For example, we could specify the address using an ioctl, followed by a write of the data. Another technique would be to require that the address occupy the first N bytes of the data that is written using write. Separating the control information from the data, and providing functions that handle both (putmsg and getmsg) is a cleaner way to handle this.

There are about 25 different types of messages, but only a few of these are used between the user process and the stream head. The rest are passed up and down a stream within the kernel. (These message types are of interest to people writing STREAMS processing modules, but can safely be ignored by people writing user-level code.) We'll encounter only three of these message types with the functions we use (read, write, getmsg, getpmsg, putmsg, and putpmsg):

M_DATA (user data for I/O)
M_PROTO (protocol control information)
M_PCPROTO (high-priority protocol control information)

Every message on a stream has a queueing priority:

High-priority messages (highest priority)
Priority band messages
Ordinary messages (lowest priority)

Ordinary messages are simply priority band messages with a band of 0. Priority band messages have a band of 1255, with a higher band specifying a higher priority. High-priority messages are special in that only one is queued by the stream head at a time. Additional high-priority messages are discarded when one is already on the stream head's read queue.

Each STREAMS module has two input queues. One receives messages from the module above (messages moving downstream from the stream head toward the driver), and one receives messages from the module below (messages moving upstream from the driver toward the stream head). The messages on an input queue are arranged by priority. We show in Figure 14.15 how the arguments to write, putmsg, and putpmsg cause these various priority messages to be generated.

There are other types of messages that we don't consider. For example, if the stream head receives an M_SIG message from below, it generates a signal. This is how a terminal line discipline module sends the terminal-generated signals to the foreground process group associated with a controlling terminal.

`putmsg` and `putpmsg` Functions

A STREAMS message (control information or data, or both) is written to a stream using either putmsg or putpmsg. The difference in these two functions is that the latter allows us to specify a priority band for the message.

[View full width]
#include <stropts.h> int putmsg(int filedes, const struct strbuf *ctlptr, const struct strbuf *dataptr, int flag); int putpmsg(int filedes, const struct strbuf *ctlptr, const struct strbuf *dataptr, int band , int flag);

Both return: 0 if OK, 1 on error

We can also write to a stream, which is equivalent to a putmsg without any control information and with a flag of 0.

These two functions can generate the three different priorities of messages: ordinary, priority band, and high priority. Figure 14.15 details the combinations of the arguments to these two functions that generate the various types of messages.

The notation "N/A" means not applicable. In this figure, a "no" for the control portion of the message corresponds to either a null ctlptr argument or ctlptr>len being 1. A "yes" for the control portion corresponds to ctlptr being non-null and ctlptr>len being greater than or equal to 0. The data portion of the message is handled equivalently (using dataptr instead of ctlptr).

STREAMS `ioctl` Operations

In Section 3.15, we said that the ioctl function is the catchall for anything that can't be done with the other I/O functions. The STREAMS system continues this tradition.

Between Linux and Solaris, there are almost 40 different operations that can be performed on a stream using ioctl. Most of these operations are documented in the streamio(7) manual page. The header <stropts.h> must be included in C code that uses any of these operations. The second argument for ioctl, request, specifies which of the operations to perform. All the requests begin with I_. The third argument depends on the request. Sometimes, the third argument is an integer value; sometimes, it's a pointer to an integer or a structure.

Example`isastream` Function

We sometimes need to determine if a descriptor refers to a stream or not. This is similar to calling the isatty function to determine if a descriptor refers to a terminal device (Section 18.9). Linux and Solaris provide the isastream function.

#include <stropts.h> int isastream(int filedes);

Returns: 1 (true) if STREAMS device, 0 (false) otherwise

Like isatty, this is usually a trivial function that merely tries an ioctl that is valid only on a STREAMS device. Figure 14.16 shows one possible implementation of this function. We use the I_CANPUT ioctl command, which checks if the band specified by the third argument (0 in the example) is writable. If the ioctl succeeds, the stream is not changed.

We can use the program in Figure 14.17 to test this function.

Running this program on Solaris 9 shows the various errors returned by the ioctl function:

   $ ./a.out /dev/tty /dev/fb /dev/null /etc/motd
   /dev/tty: streams device
   /dev/fb: not a stream: Invalid argument
   /dev/null: not a stream: No such device or address
   /etc/motd: not a stream: Inappropriate ioctl for device

Note that /dev/tty is a STREAMS device, as we expect under Solaris. The character special file /dev/fb is not a STREAMS device, but it supports other ioctl requests. These devices return EINVAL when the ioctl request is unknown. The character special file /dev/null does not support any ioctl operations, so the error ENODEV is returned. Finally, /etc/motd is a regular file, not a character special file, so the classic error ENOTTY is returned. We never receive the error we might expect: ENOSTR ("Device is not a stream").

The message for ENOTTY used to be "Not a typewriter," a historical artifact because the UNIX kernel returns ENOTTY whenever an ioctl is attempted on a descriptor that doesn't refer to a character special device. This message has been updated on Solaris to "Inappropriate ioctl for device."

Figure 14.16. Check if descriptor is a STREAMS device

#include   <stropts.h>
#include   <unistd.h>

int
isastream(int fd)
{
    return(ioctl(fd, I_CANPUT, 0) != -1);
}

Figure 14.17. Test the `isastream` function

#include "apue.h"
#include <fcntl.h>

int
main(int argc, char *argv[])
{
    int     i, fd;

    for (i = 1; i < argc; i++) {
        if ((fd = open(argv[i], O_RDONLY)) < 0) {
            err_ret("%s: can't open", argv[i]);
            continue;
        }

        if (isastream(fd) == 0)
            err_ret("%s: not a stream", argv[i]);
        else
            err_msg("%s: streams device", argv[i]);
     }

     exit(0);
}

Example

If the ioctl request is I_LIST, the system returns the names of all the modules on the streamthe ones that have been pushed onto the stream, including the topmost driver. (We say topmost because in the case of a multiplexing driver, there may be more than one driver. Chapter 12 of Rago [1993] discusses multiplexing drivers in detail.) The third argument must be a pointer to a str_list structure:

   struct str_list {
     int                sl_nmods;   /* number of entries in array */
     struct str_mlist  *sl_modlist; /* ptr to first element of array */
   };

We have to set sl_modlist to point to the first element of an array of str_mlist structures and set sl_nmods to the number of entries in the array:

   struct str_mlist {
     char l_name[FMNAMESZ+1]; /* null terminated module name */
   };

The constant FMNAMESZ is defined in the header <sys/conf.h> and is often 8. The extra byte in l_name is for the terminating null byte.

If the third argument to the ioctl is 0, the count of the number of modules is returned (as the value of ioctl) instead of the module names. We'll use this to determine the number of modules and then allocate the required number of str_mlist structures.

Figure 14.18 illustrates the I_LIST operation. Since the returned list of names doesn't differentiate between the modules and the driver, when we print the module names, we know that the final entry in the list is the driver at the bottom of the stream.

If we run the program in Figure 14.18 from both a network login and a console login, to see which STREAMS modules are pushed onto the controlling terminal, we get the following:

   $ who
   sar        console     May 1 18:27
   sar        pts/7       Jul 12 06:53
   $ ./a.out /dev/console
   #modules = 5
     module: redirmod
     module: ttcompat
     module: ldterm
     module: ptem
     driver: pts
   $ ./a.out /dev/pts/7
   #modules = 4
     module: ttcompat
     module: ldterm
     module: ptem
     driver: pts

The modules are the same in both cases, except that the console has an extra module on top that helps with virtual console redirection. On this computer, a windowing system was running on the console, so /dev/console actually refers to a pseudo terminal instead of to a hardwired device. We'll return to the pseudo terminal case in Chapter 19.

Figure 14.18. List the names of the modules on a stream

#include "apue.h"
#include <fcntl.h>
#include <stropts.h>
#include <sys/conf.h>

int
main(int argc, char *argv[])
{
    int                 fd, i, nmods;
    struct str_list     list;

    if (argc != 2)
        err_quit("usage: %s <pathname>", argv[0]);

    if ((fd = open(argv[1], O_RDONLY)) < 0)
        err_sys("can't open %s", argv[1]);
    if (isastream(fd) == 0)
        err_quit("%s is not a stream", argv[1]);

    /*
     * Fetch number of modules.
     */
    if ((nmods = ioctl(fd, I_LIST, (void *) 0)) < 0)
        err_sys("I_LIST error for nmods");
    printf("#modules = %d\n", nmods);

    /*
     * Allocate storage for all the module names.
     */
    list.sl_modlist = calloc(nmods, sizeof(struct str_mlist));
    if (list.sl_modlist == NULL)
        err_sys("calloc error");
    list.sl_nmods = nmods;

    /*
     * Fetch the module names.
     */
    if (ioctl(fd, I_LIST, &list) < 0)
        err_sys("I_LIST error for list");

    /*
     * Print the names.
     */
    for (i = 1; i <= nmods; i++)
        printf(" %s: %s\n", (i == nmods) ? "driver" : "module",
          list.sl_modlist++->l_name);

    exit(0);
}

`write` to STREAMS Devices

In Figure 14.15 we said that a write to a STREAMS device generates an M_DATA message. Although this is generally true, there are some additional details to consider. First, with a stream, the topmost processing module specifies the minimum and maximum packet sizes that can be sent downstream. (We are unable to query the module for these values.) If we write more than the maximum, the stream head normally breaks the data into packets of the maximum size, with one final packet that can be smaller than the maximum.

The next thing to consider is what happens if we write zero bytes to a stream. Unless the stream refers to a pipe or FIFO, a zero-length message is sent downstream. With a pipe or FIFO, the default is to ignore the zero-length write, for compatibility with previous versions. We can change this default for pipes and FIFOs using an ioctl to set the write mode for the stream.

Write Mode

Two ioctl commands fetch and set the write mode for a stream. Setting request to I_GWROPT requires that the third argument be a pointer to an integer, and the current write mode for the stream is returned in that integer. If request is I_SWROPT, the third argument is an integer whose value becomes the new write mode for the stream. As with the file descriptor flags and the file status flags (Section 3.14), we should always fetch the current write mode value and modify it rather than set the write mode to some absolute value (possibly turning off some other bits that were enabled).

Currently, only two write mode values are defined.

SNDZERO
A zero-length write to a pipe or FIFO will cause a zero-length message to be sent downstream. By default, this zero-length write sends no message.
SNDPIPE
Causes SIGPIPE to be sent to the calling process that calls either write or putmsg after an error has occurred on a stream.

A stream also has a read mode, and we'll look at it after describing the getmsg and getpmsg functions.

`getmsg` and `getpmsg` Functions

STREAMS messages are read from a stream head using read, getmsg, or getpmsg.

[View full width]
#include <stropts.h> int getmsg(int filedes, struct strbuf *restrict ctlptr, struct strbuf *restrict dataptr, int *restrict flagptr); int getpmsg(int filedes, struct strbuf *restrict ctlptr, struct strbuf *restrict dataptr, int *restrict bandptr, int *restrict flagptr);

Both return: non-negative value if OK, 1 on error

Note that flagptr and bandptr are pointers to integers. The integer pointed to by these two pointers must be set before the call to specify the type of message desired, and the integer is also set on return to the type of message that was read.

If the integer pointed to by flagptr is 0, getmsg returns the next message on the stream head's read queue. If the next message is a high-priority message, the integer pointed to by flagptr is set to RS_HIPRI on return. If we want to receive only high-priority messages, we must set the integer pointed to by flagptr to RS_HIPRI before calling getmsg.

A different set of constants is used by getpmsg. We can set the integer pointed to by flagptr to MSG_HIPRI to receive only high-priority messages. We can set the integer to MSG_BAND and then set the integer pointed to by bandptr to a nonzero priority value to receive only messages from that band, or higher (including high-priority messages). If we only want to receive the first available message, we can set the integer pointed to by flagptr to MSG_ANY; on return, the integer will be overwritten with either MSG_HIPRI or MSG_BAND, depending on the type of message received. If the message we retrieved was not a high-priority message, the integer pointed to by bandptr will contain the message's priority band.

If ctlptr is null or ctlptr>maxlen is 1, the control portion of the message will remain on the stream head's read queue, and we will not process it. Similarly, if dataptr is null or dataptr>maxlen is 1, the data portion of the message is not processed and remains on the stream head's read queue. Otherwise, we will retrieve as much control and data portions of the message as our buffers will hold, and any remainder will be left on the head of the queue for the next call.

If the call to getmsg or getpmsg retrieves a message, the return value is 0. If part of the control portion of the message is left on the stream head read queue, the constant MORECTL is returned. Similarly, if part of the data portion of the message is left on the queue, the constant MOREDATA is returned. If both control and data are left, the return value is (MORECTL|MOREDATA).

Read Mode

We also need to consider what happens if we read from a STREAMS device. There are two potential problems.

What happens to the record boundaries associated with the messages on a stream?
What happens if we call read and the next message on the stream has control information?

The default handling for condition 1 is called byte-stream mode. In this mode, a read takes data from the stream until the requested number of bytes has been read or until there is no more data. The message boundaries associated with the STREAMS messages are ignored in this mode. The default handling for condition 2 causes the read to return an error if there is a control message at the front of the queue. We can change either of these defaults.

Using ioctl, if we set request to I_GRDOPT, the third argument is a pointer to an integer, and the current read mode for the stream is returned in that integer. A request of I_SRDOPT takes the integer value of the third argument and sets the read mode to that value. The read mode is specified by one of the following three constants:

RNORM
Normal, byte-stream mode (the default), as described previously.
RMSGN
Message-nondiscard mode. A read takes data from a stream until the requested number of bytes have been read or until a message boundary is encountered. If the read uses a partial message, the rest of the data in the message is left on the stream for a subsequent read.
RMSGD
Message-discard mode. This is like the nondiscard mode, but if a partial message is used, the remainder of the message is discarded.

Three additional constants can be specified in the read mode to set the behavior of read when it encounters messages containing protocol control information on a stream:

RPROTNORM
Protocol-normal mode: read returns an error of EBADMSG. This is the default.
RPROTDAT
Protocol-data mode: read returns the control portion as data.
RPROTDIS
Protocol-discard mode: read discards the control information but returns any data in the message.

Only one of the message read modes and one of the protocol read modes can be set at a time. The default read mode is (RNORM|RPROTNORM).

Example

The program in Figure 14.19 is the same as the one in Figure 3.4, but recoded to use getmsg instead of read.

If we run this program under Solaris, where both pipes and terminals are implemented using STREAMS, we get the following output:

   $ echo hello, world | ./a.out           requires STREAMS-based pipes
   flag = 0, ctl.len = -1, dat.len = 13
   hello, world
   flag = 0, ctl.len = 0, dat.len = 0     indicates a STREAMS hangup
   $ ./a.out                               requires STREAMS-based terminals
   this is line 1
   flag = 0, ctl.len = -1, dat.len = 15
   this is line 1
   and line 2
   flag = 0, ctl.len = -1, dat.len = 11
   and line 2
   ^D                                      type the terminal EOF character
   flag = 0, ctl.len = -1, dat.len = 0    tty end of file is not the same as a hangup
   $ ./a.out < /etc/motd
   getmsg error: Not a stream device

When the pipe is closed (when echo terminates), it appears to the program in Figure 14.19 as a STREAMS hangup, with both the control length and the data length set to 0. (We discuss pipes in Section 15.2.) With a terminal, however, typing the end-of-file character causes only the data length to be returned as 0. This terminal end of file is not the same as a STREAMS hangup. As expected, when we redirect standard input to be a non-STREAMS device, getmsg returns an error.

Figure 14.19. Copy standard input to standard output using `getmsg`

#include "apue.h"
#include <stropts.h>

#define BUFFSIZE     4096

int
main(void)
{
    int             n, flag;
    char            ctlbuf[BUFFSIZE], datbuf[BUFFSIZE];
    struct strbuf   ctl, dat;

    ctl.buf = ctlbuf;
    ctl.maxlen = BUFFSIZE;
    dat.buf = datbuf;
    dat.maxlen = BUFFSIZE;
    for ( ; ; ) {
        flag = 0;       /* return any message */
        if ((n = getmsg(STDIN_FILENO, &ctl, &dat, &flag)) < 0)
            err_sys("getmsg error");
        fprintf(stderr, "flag = %d, ctl.len = %d, dat.len = %d\n",
          flag, ctl.len, dat.len);
        if (dat.len == 0)
            exit(0);
        else if (dat.len > 0)
            if (write(STDOUT_FILENO, dat.buf, dat.len) != dat.len)
                err_sys("write error");
    }
}

14.4. STREAMS

Figure 14.13. A simple stream

Figure 14.14. A stream with a processing module

STREAMS Messages

Figure 14.15. Type of STREAMS message generated for write, putmsg, and putpmsg

putmsg and putpmsg Functions

STREAMS ioctl Operations

Exampleisastream Function

Figure 14.16. Check if descriptor is a STREAMS device

Figure 14.17. Test the isastream function

Example

Figure 14.18. List the names of the modules on a stream

write to STREAMS Devices

Write Mode

getmsg and getpmsg Functions

Read Mode

Example

Figure 14.19. Copy standard input to standard output using getmsg

Figure 14.15. Type of STREAMS message generated for `write`, `putmsg`, and `putpmsg`

`putmsg` and `putpmsg` Functions

STREAMS `ioctl` Operations

Example`isastream` Function

Figure 14.17. Test the `isastream` function

`write` to STREAMS Devices

`getmsg` and `getpmsg` Functions

Figure 14.19. Copy standard input to standard output using `getmsg`