16.3. Addressing

In the previous section, we learned how to create and destroy a socket. Before we learn to do something useful with a socket, we need to learn how to identify the process that we want to communicate with. Identifying the process has two components. The machine's network address helps us identify the computer on the network we wish to contact, and the service helps us identify the particular process on the computer.

16.3.1. Byte Ordering

When communicating with processes running on the same computer, we generally don't have to worry about byte ordering. The byte order is a characteristic of the processor architecture, dictating how bytes are ordered within larger data types, such as integers. Figure 16.4 shows how the bytes within a 32-bit integer are numbered.

Figure 16.4. Byte order in a 32-bit integer
If the processor architecture supports big-endian byte order, then the highest byte address occurs in the least significant byte (LSB). Little-endian byte order is the opposite: the least significant byte contains the lowest byte address. Note that regardless of the byte ordering, the most significant byte (MSB) is always on the left, and the least significant byte is always on the right. Thus, if we were to assign a 32-bit integer the value 0x04030201, the most significant byte would contain 4, and the least significant byte would contain 1, regardless of the byte ordering. If we were then to cast a character pointer (cp) to the address of the integer, we would see a difference from the byte ordering. On a little-endian processor, cp[0] would refer to the least significant byte and contain 1; cp[3] would refer to the most significant byte and contain 4. Compare that to a big-endian processor, where cp[0] would contain 4, referring to the most significant byte, and cp[3] would contain 1, referring to the least significant byte. Figure 16.5 summarizes the byte ordering for the four platforms discussed in this text.
Network protocols specify a byte ordering so that heterogeneous computer systems can exchange protocol information without confusing the byte ordering. The TCP/IP protocol suite uses big-endian byte order. The byte ordering becomes visible to applications when they exchange formatted data. With TCP/IP, addresses are presented in network byte order, so applications sometimes need to translate them between the processor's byte order and the network byte order. This is common when printing an address in a human-readable form, for example. Four common functions (htonl, htons, ntohl, and ntohs) are provided to convert between the processor byte order and the network byte order for TCP/IP applications.
The h is for "host" byte order, and the n is for "network" byte order. The l is for "long" (i.e., 4-byte) integer, and the s is for "short" (i.e., 2-byte) integer. These four functions are defined in <arpa/inet.h>, although some older systems define them in <netinet/in.h>.

16.3.2. Address Formats

An address identifies a socket endpoint in a particular communication domain. The address format is specific to the particular domain. So that addresses with different formats can be passed to the socket functions, the addresses are cast to a generic sockaddr address structure:

    struct sockaddr {
        sa_family_t  sa_family;    /* address family */
        char         sa_data[];    /* variable-length address */
        . . .
    };

Implementations are free to add additional members and define a size for the sa_data member. For example, on Linux, the structure is defined as

    struct sockaddr {
        sa_family_t  sa_family;    /* address family */
        char         sa_data[14];  /* variable-length address */
    };

But on FreeBSD, the structure is defined as

    struct sockaddr {
        unsigned char  sa_len;       /* total length */
        sa_family_t    sa_family;    /* address family */
        char           sa_data[14];  /* variable-length address */
    };

Internet addresses are defined in <netinet/in.h>. In the IPv4 Internet domain (AF_INET), a socket address is represented by a sockaddr_in structure:

    struct in_addr {
        in_addr_t  s_addr;           /* IPv4 address */
    };

    struct sockaddr_in {
        sa_family_t     sin_family;  /* address family */
        in_port_t       sin_port;    /* port number */
        struct in_addr  sin_addr;    /* IPv4 address */
    };

The in_port_t data type is defined to be a uint16_t. The in_addr_t data type is defined to be a uint32_t. These integer data types specify the number of bits in the data type and are defined in <stdint.h>.
In contrast to the AF_INET domain, the IPv6 Internet domain (AF_INET6) socket address is represented by a sockaddr_in6 structure:

    struct in6_addr {
        uint8_t  s6_addr[16];            /* IPv6 address */
    };

    struct sockaddr_in6 {
        sa_family_t      sin6_family;    /* address family */
        in_port_t        sin6_port;      /* port number */
        uint32_t         sin6_flowinfo;  /* traffic class and flow info */
        struct in6_addr  sin6_addr;      /* IPv6 address */
        uint32_t         sin6_scope_id;  /* set of interfaces for scope */
    };

These are the definitions required by the Single UNIX Specification. Individual implementations are free to add additional fields. For example, on Linux, the sockaddr_in structure is defined as

    struct sockaddr_in {
        sa_family_t     sin_family;   /* address family */
        in_port_t       sin_port;     /* port number */
        struct in_addr  sin_addr;     /* IPv4 address */
        unsigned char   sin_zero[8];  /* filler */
    };

where the sin_zero member is a filler field that should be set to all-zero values.

Note that although the sockaddr_in and sockaddr_in6 structures are quite different, they are both passed to the socket routines cast to a sockaddr structure. In Section 17.3, we will see that the structure of a UNIX domain socket address is different from both of the Internet domain socket address formats.

It is sometimes necessary to print an address in a format that is understandable by a person instead of a computer. The BSD networking software included the inet_addr and inet_ntoa functions to convert between the binary address format and a string in dotted-decimal notation (a.b.c.d). These functions, however, work only with IPv4 addresses. Two new functions, inet_ntop and inet_pton, support similar functionality and work with both IPv4 and IPv6 addresses.
The inet_ntop function converts a binary address in network byte order into a text string; inet_pton converts a text string into a binary address in network byte order. Only two domain values are supported: AF_INET and AF_INET6.

For inet_ntop, the size parameter specifies the size of the buffer (str) to hold the text string. Two constants are defined to make our job easier: INET_ADDRSTRLEN is large enough to hold a text string representing an IPv4 address, and INET6_ADDRSTRLEN is large enough to hold a text string representing an IPv6 address. For inet_pton, the addr buffer needs to be large enough to hold a 32-bit address if domain is AF_INET or large enough to hold a 128-bit address if domain is AF_INET6.

16.3.3. Address Lookup

Ideally, an application won't have to be aware of the internal structure of a socket address. If an application simply passes socket addresses around as sockaddr structures and doesn't rely on any protocol-specific features, then the application will work with many different protocols that provide the same type of service.

Historically, the BSD networking software has provided interfaces to access the various network configuration information. In Section 6.7, we briefly discussed the networking data files and the functions used to access them. In this section, we discuss them in a little more detail and introduce the newer functions used to look up addressing information.

The network configuration information returned by these functions can be kept in a number of places. It can be kept in static files (/etc/hosts, /etc/services, etc.), or it can be managed by a name service, such as DNS (Domain Name System) or NIS (Network Information Service). Regardless of where the information is kept, the same functions can be used to access it.

The hosts known by a given computer system are found by calling gethostent.
If the host database file isn't already open, gethostent will open it. The gethostent function returns the next entry in the file. The sethostent function will open the file or rewind it if it is already open. The endhostent function will close the file.

When gethostent returns, we get a pointer to a hostent structure which might point to a static data buffer that is overwritten each time we call gethostent. The hostent structure is defined to have at least the following members:

    struct hostent {
        char   *h_name;       /* name of host */
        char  **h_aliases;    /* pointer to alternate host name array */
        int     h_addrtype;   /* address type */
        int     h_length;     /* length in bytes of address */
        char  **h_addr_list;  /* pointer to array of network addresses */
        . . .
    };

The addresses returned are in network byte order.

Two additional functions, gethostbyname and gethostbyaddr, originally were included with the hostent functions, but are now considered to be obsolete. We'll see replacements for them shortly.

We can get network names and numbers with a similar set of interfaces.
The netent structure contains at least the following fields:

    struct netent {
        char     *n_name;      /* network name */
        char    **n_aliases;   /* alternate network name array pointer */
        int       n_addrtype;  /* address type */
        uint32_t  n_net;       /* network number */
        . . .
    };

The network number is returned in network byte order. The address type is one of the address family constants (AF_INET, for example).

We can map between protocol names and numbers with the following functions.
The protoent structure as defined by POSIX.1 has at least the following members:

    struct protoent {
        char   *p_name;     /* protocol name */
        char  **p_aliases;  /* pointer to alternate protocol name array */
        int     p_proto;    /* protocol number */
        . . .
    };

Services are represented by the port number portion of the address. Each service is offered on a unique, well-known port number. We can map a service name to a port number with getservbyname, map a port number to a service name with getservbyport, or scan the services database sequentially with getservent.
The servent structure is defined to have at least the following members:

    struct servent {
        char   *s_name;     /* service name */
        char  **s_aliases;  /* pointer to alternate service name array */
        int     s_port;     /* port number */
        char   *s_proto;    /* name of protocol */
        . . .
    };

POSIX.1 defines several new functions to allow an application to map from a host name and a service name to an address and vice versa. These functions replace the older gethostbyname and gethostbyaddr functions.

The getaddrinfo function allows us to map a host name and a service name to an address.
We need to provide the host name, the service name, or both. If we provide only one name, the other should be a null pointer. The host name can be either a node name or the host address in dotted-decimal notation.

The getaddrinfo function returns a linked list of addrinfo structures. We can use freeaddrinfo to free one or more of these structures, depending on how many structures are linked together using the ai_next field.

The addrinfo structure is defined to include at least the following members:

    struct addrinfo {
        int               ai_flags;      /* customize behavior */
        int               ai_family;     /* address family */
        int               ai_socktype;   /* socket type */
        int               ai_protocol;   /* protocol */
        socklen_t         ai_addrlen;    /* length in bytes of address */
        struct sockaddr  *ai_addr;       /* address */
        char             *ai_canonname;  /* canonical name of host */
        struct addrinfo  *ai_next;       /* next in list */
        . . .
    };

We can supply an optional hint to select addresses that meet certain criteria. The hint is a template used for filtering addresses and uses only the ai_family, ai_flags, ai_protocol, and ai_socktype fields. The remaining integer fields must be set to 0, and the pointer fields must be null. Figure 16.6 summarizes the flags we can use in the ai_flags field to customize how addresses and names are treated.
If getaddrinfo fails, we can't use perror or strerror to generate an error message. Instead, we need to call gai_strerror to convert the error code returned into an error message.
The getnameinfo function converts an address into a host name and a service name.
The socket address (addr) is translated into a host name and a service name. If host is non-null, it points to a buffer hostlen bytes long that will be used to return the host name. Similarly, if service is non-null, it points to a buffer servlen bytes long that will be used to return the service name. The flags argument gives us some control over how the translation is done. Figure 16.7 summarizes the supported flags.
Example

Figure 16.8 illustrates the use of the getaddrinfo function. If multiple protocols provide the given service for the given host, the program will print more than one entry. In this example, we print out the address information only for the protocols that work with IPv4 (ai_family equals AF_INET). If we wanted to restrict the output to the AF_INET protocol family, we could set the ai_family field in the hint.

When we run the program on one of the test systems, we get
$ ./a.out harry nfs
flags canon family inet type stream protocol TCP
host harry address 192.168.1.105 port 2049
flags canon family inet type datagram protocol UDP
host harry address 192.168.1.105 port 2049
Figure 16.8. Print host and service information

    #include "apue.h"
    #include <netdb.h>
    #include <arpa/inet.h>
    #if defined(BSD) || defined(MACOS)
    #include <sys/socket.h>
    #include <netinet/in.h>
    #endif

    void
    print_family(struct addrinfo *aip)
    {
        printf(" family ");
        switch (aip->ai_family) {
        case AF_INET:
            printf("inet");
            break;
        case AF_INET6:
            printf("inet6");
            break;
        case AF_UNIX:
            printf("unix");
            break;
        case AF_UNSPEC:
            printf("unspecified");
            break;
        default:
            printf("unknown");
        }
    }

    void
    print_type(struct addrinfo *aip)
    {
        printf(" type ");
        switch (aip->ai_socktype) {
        case SOCK_STREAM:
            printf("stream");
            break;
        case SOCK_DGRAM:
            printf("datagram");
            break;
        case SOCK_SEQPACKET:
            printf("seqpacket");
            break;
        case SOCK_RAW:
            printf("raw");
            break;
        default:
            printf("unknown (%d)", aip->ai_socktype);
        }
    }

    void
    print_protocol(struct addrinfo *aip)
    {
        printf(" protocol ");
        switch (aip->ai_protocol) {
        case 0:
            printf("default");
            break;
        case IPPROTO_TCP:
            printf("TCP");
            break;
        case IPPROTO_UDP:
            printf("UDP");
            break;
        case IPPROTO_RAW:
            printf("raw");
            break;
        default:
            printf("unknown (%d)", aip->ai_protocol);
        }
    }

    void
    print_flags(struct addrinfo *aip)
    {
        printf("flags");
        if (aip->ai_flags == 0) {
            printf(" 0");
        } else {
            if (aip->ai_flags & AI_PASSIVE)
                printf(" passive");
            if (aip->ai_flags & AI_CANONNAME)
                printf(" canon");
            if (aip->ai_flags & AI_NUMERICHOST)
                printf(" numhost");
    #if defined(AI_NUMERICSERV)
            if (aip->ai_flags & AI_NUMERICSERV)
                printf(" numserv");
    #endif
    #if defined(AI_V4MAPPED)
            if (aip->ai_flags & AI_V4MAPPED)
                printf(" v4mapped");
    #endif
    #if defined(AI_ALL)
            if (aip->ai_flags & AI_ALL)
                printf(" all");
    #endif
        }
    }

    int
    main(int argc, char *argv[])
    {
        struct addrinfo     *ailist, *aip;
        struct addrinfo     hint;
        struct sockaddr_in  *sinp;
        const char          *addr;
        int                 err;
        char                abuf[INET_ADDRSTRLEN];

        if (argc != 3)
            err_quit("usage: %s nodename service", argv[0]);
        hint.ai_flags = AI_CANONNAME;
        hint.ai_family = 0;
        hint.ai_socktype = 0;
        hint.ai_protocol = 0;
        hint.ai_addrlen = 0;
        hint.ai_canonname = NULL;
        hint.ai_addr = NULL;
        hint.ai_next = NULL;
        if ((err = getaddrinfo(argv[1], argv[2], &hint, &ailist)) != 0)
            err_quit("getaddrinfo error: %s", gai_strerror(err));
        for (aip = ailist; aip != NULL; aip = aip->ai_next) {
            print_flags(aip);
            print_family(aip);
            print_type(aip);
            print_protocol(aip);
            printf("\n\thost %s", aip->ai_canonname ? aip->ai_canonname : "-");
            if (aip->ai_family == AF_INET) {
                sinp = (struct sockaddr_in *)aip->ai_addr;
                addr = inet_ntop(AF_INET, &sinp->sin_addr, abuf,
                    INET_ADDRSTRLEN);
                printf(" address %s", addr ? addr : "unknown");
                printf(" port %d", ntohs(sinp->sin_port));
            }
            printf("\n");
        }
        exit(0);
    }

16.3.4. Associating Addresses with Sockets

The address associated with a client's socket is of little interest, and we can let the system choose a default address for us. For a server, however, we need to associate a well-known address with the server's socket on which client requests will arrive. Clients need a way to discover the address to use to contact a server, and the simplest scheme is for a server to reserve an address and register it in /etc/services or with a name service.

We use the bind function to associate an address with a socket.
There are several restrictions on the address we can use:
For the Internet domain, if we specify the special IP address INADDR_ANY, the socket endpoint will be bound to all the system's network interfaces. This means that we can receive packets from any of the network interface cards installed in the system. We'll see in the next section that the system will choose an address and bind it to our socket for us if we call connect or listen without first binding an address to the socket. We can use the getsockname function to discover the address bound to a socket.
Before calling getsockname, we set alenp to point to an integer containing the size of the sockaddr buffer. On return, the integer is set to the size of the address returned. If the address won't fit in the buffer provided, the address is silently truncated. If no address is currently bound to the socket, the results are undefined. If the socket is connected to a peer, we can find out the peer's address by calling the getpeername function.
Other than returning the peer's address, the getpeername function is identical to the getsockname function.