16.3. Addressing

In the previous section, we learned how to create and destroy a socket. Before we learn to do something useful with a socket, we need to learn how to identify the process that we want to communicate with. Identifying the process has two components. The machine's network address helps us identify the computer on the network we wish to contact, and the service helps us identify the particular process on the computer.

16.3.1. Byte Ordering

When communicating with processes running on the same computer, we generally don't have to worry about byte ordering. The byte order is a characteristic of the processor architecture, dictating how bytes are ordered within larger data types, such as integers. Figure 16.4 shows how the bytes within a 32-bit integer are numbered.

Figure 16.4. Byte order in a 32-bit integer

If the processor architecture supports big-endian byte order, then the highest byte address occurs in the least significant byte (LSB). Little-endian byte order is the opposite: the least significant byte contains the lowest byte address. Note that regardless of the byte ordering, the most significant byte (MSB) is always on the left, and the least significant byte is always on the right. Thus, if we were to assign a 32-bit integer the value 0x04030201, the most significant byte would contain 4, and the least significant byte would contain 1, regardless of the byte ordering. If we were then to cast the address of the integer to a character pointer (cp), we would see a difference from the byte ordering. On a little-endian processor, cp[0] would refer to the least significant byte and contain 1; cp[3] would refer to the most significant byte and contain 4. Compare that to a big-endian processor, where cp[0] would contain 4, referring to the most significant byte, and cp[3] would contain 1, referring to the least significant byte. Figure 16.5 summarizes the byte ordering for the four platforms discussed in this text.

Figure 16.5. Byte order for test platforms

   Operating system   Processor architecture   Byte order
   FreeBSD 5.2.1      Intel Pentium            little-endian
   Linux 2.4.22       Intel Pentium            little-endian
   Mac OS X 10.3      PowerPC                  big-endian
   Solaris 9          Sun SPARC                big-endian

To confuse matters further, some processors can be configured for either little-endian or big-endian operation.

Network protocols specify a byte ordering so that heterogeneous computer systems can exchange protocol information without confusing the byte ordering. The TCP/IP protocol suite uses big-endian byte order. The byte ordering becomes visible to applications when they exchange formatted data. With TCP/IP, addresses are presented in network byte order, so applications sometimes need to translate them between the processor's byte order and the network byte order. This is common when printing an address in a human-readable form, for example.

Four common functions are provided to convert between the processor byte order and the network byte order for TCP/IP applications.

#include <arpa/inet.h>

uint32_t htonl(uint32_t hostint32);

Returns: 32-bit integer in network byte order

uint16_t htons(uint16_t hostint16);

Returns: 16-bit integer in network byte order

uint32_t ntohl(uint32_t netint32);

Returns: 32-bit integer in host byte order

uint16_t ntohs(uint16_t netint16);

Returns: 16-bit integer in host byte order

The h is for "host" byte order, and the n is for "network" byte order. The l is for "long" (i.e., 4-byte) integer, and the s is for "short" (i.e., 2-byte) integer. These four functions are defined in <arpa/inet.h>, although some older systems define them in <netinet/in.h>.

16.3.2. Address Formats

An address identifies a socket endpoint in a particular communication domain. The address format is specific to the particular domain. So that addresses with different formats can be passed to the socket functions, the addresses are cast to a generic sockaddr address structure:

   struct sockaddr {
     sa_family_t   sa_family;   /* address family */
     char          sa_data[];   /* variable-length address */
     .
     .
     .
   };

Implementations are free to add additional members and define a size for the sa_data member. For example, on Linux, the structure is defined as

   struct sockaddr {
     sa_family_t  sa_family;     /* address family */
     char         sa_data[14];   /* variable-length address */
   };

But on FreeBSD, the structure is defined as

   struct sockaddr {
     unsigned char  sa_len;        /* total length */
     sa_family_t    sa_family;     /* address family */
     char           sa_data[14];   /* variable-length address */
   };

Internet addresses are defined in <netinet/in.h>. In the IPv4 Internet domain (AF_INET), a socket address is represented by a sockaddr_in structure:

   struct in_addr {
     in_addr_t       s_addr;       /* IPv4 address */
   };

   struct sockaddr_in {
     sa_family_t    sin_family;   /* address family */
     in_port_t      sin_port;     /* port number */
     struct in_addr sin_addr;     /* IPv4 address */
   };

The in_port_t data type is defined to be a uint16_t. The in_addr_t data type is defined to be a uint32_t. These integer data types specify the number of bits in the data type and are defined in <stdint.h>.

In contrast to the AF_INET domain, the IPv6 Internet domain (AF_INET6) socket address is represented by a sockaddr_in6 structure:

   struct in6_addr {
     uint8_t        s6_addr[16];     /* IPv6 address */
   };

   struct sockaddr_in6 {
     sa_family_t     sin6_family;     /* address family */
     in_port_t       sin6_port;       /* port number */
     uint32_t        sin6_flowinfo;   /* traffic class and flow info */
     struct in6_addr sin6_addr;       /* IPv6 address */
     uint32_t        sin6_scope_id;   /* set of interfaces for scope */
   };

These are the definitions required by the Single UNIX Specification. Individual implementations are free to add additional fields. For example, on Linux, the sockaddr_in structure is defined as

   struct sockaddr_in {
     sa_family_t     sin_family;     /* address family */
     in_port_t       sin_port;       /* port number */
     struct in_addr  sin_addr;       /* IPv4 address */
     unsigned char   sin_zero[8];    /* filler */
   };

where the sin_zero member is a filler field that should be set to all-zero values.

Note that although the sockaddr_in and sockaddr_in6 structures are quite different, they are both passed to the socket routines cast to a sockaddr structure. In Section 17.3, we will see that the structure of a UNIX domain socket address is different from both of the Internet domain socket address formats.

It is sometimes necessary to print an address in a format that is understandable by a person instead of a computer. The BSD networking software included the inet_addr and inet_ntoa functions to convert between the binary address format and a string in dotted-decimal notation (a.b.c.d). These functions, however, work only with IPv4 addresses. Two new functions, inet_ntop and inet_pton, support similar functionality and work with both IPv4 and IPv6 addresses.

#include <arpa/inet.h>

const char *inet_ntop(int domain, const void *restrict addr,
                      char *restrict str, socklen_t size);

Returns: pointer to address string on success, NULL on error

int inet_pton(int domain, const char *restrict str, void *restrict addr);

Returns: 1 on success, 0 if the format is invalid, or -1 on error

The inet_ntop function converts a binary address in network byte order into a text string; inet_pton converts a text string into a binary address in network byte order. Only two domain values are supported: AF_INET and AF_INET6.

For inet_ntop, the size parameter specifies the size of the buffer (str) to hold the text string. Two constants are defined to make our job easier: INET_ADDRSTRLEN is large enough to hold a text string representing an IPv4 address, and INET6_ADDRSTRLEN is large enough to hold a text string representing an IPv6 address. For inet_pton, the addr buffer needs to be large enough to hold a 32-bit address if domain is AF_INET or large enough to hold a 128-bit address if domain is AF_INET6.

16.3.3. Address Lookup

Ideally, an application won't have to be aware of the internal structure of a socket address. If an application simply passes socket addresses around as sockaddr structures and doesn't rely on any protocol-specific features, then the application will work with many different protocols that provide the same type of service.

Historically, the BSD networking software has provided interfaces to access the various network configuration information. In Section 6.7, we briefly discussed the networking data files and the functions used to access them. In this section, we discuss them in a little more detail and introduce the newer functions used to look up addressing information.

The network configuration information returned by these functions can be kept in a number of places: in static files (/etc/hosts, /etc/services, etc.) or in a name service, such as DNS (Domain Name System) or NIS (Network Information Service). Regardless of where the information is kept, the same functions can be used to access it.

The hosts known by a given computer system are found by calling gethostent.

#include <netdb.h>

struct hostent *gethostent(void);

Returns: pointer if OK, NULL on error

void sethostent(int stayopen);

void endhostent(void);

If the host database file isn't already open, gethostent will open it. The gethostent function returns the next entry in the file. The sethostent function will open the file or rewind it if it is already open. The endhostent function will close the file.

When gethostent returns, we get a pointer to a hostent structure which might point to a static data buffer that is overwritten each time we call gethostent. The hostent structure is defined to have at least the following members:

   struct hostent {
     char   *h_name;       /* name of host */
     char  **h_aliases;    /* pointer to alternate host name array */
     int     h_addrtype;   /* address type */
     int     h_length;     /* length in bytes of address */
     char  **h_addr_list;  /* pointer to array of network addresses */
     .
     .
     .
   };

The addresses returned are in network byte order.

Two additional functions, gethostbyname and gethostbyaddr, originally were included with the hostent functions, but are now considered to be obsolete. We'll see replacements for them shortly.

We can get network names and numbers with a similar set of interfaces.

#include <netdb.h>

struct netent *getnetbyaddr(uint32_t net, int type);
struct netent *getnetbyname(const char *name);
struct netent *getnetent(void);

All return: pointer if OK, NULL on error

void setnetent(int stayopen);

void endnetent(void);

The netent structure contains at least the following fields:

   struct netent {
     char     *n_name;      /* network name */
     char    **n_aliases;   /* alternate network name array pointer */
     int       n_addrtype;  /* address type */
     uint32_t  n_net;       /* network number */
     .
     .
     .
   };

The network number is returned in network byte order. The address type is one of the address family constants (AF_INET, for example).

We can map between protocol names and numbers with the following functions.

#include <netdb.h>

struct protoent *getprotobyname(const char *name);
struct protoent *getprotobynumber(int proto);
struct protoent *getprotoent(void);

All return: pointer if OK, NULL on error

void setprotoent(int stayopen);

void endprotoent(void);

The protoent structure as defined by POSIX.1 has at least the following members:

   struct protoent {
     char   *p_name;     /* protocol name */
     char  **p_aliases;  /* pointer to alternate protocol name array */
     int     p_proto;    /* protocol number */
     .
     .
     .
   };

Services are represented by the port number portion of the address. Each service is offered on a unique, well-known port number. We can map a service name to a port number with getservbyname, map a port number to a service name with getservbyport, or scan the services database sequentially with getservent.

#include <netdb.h>

struct servent *getservbyname(const char *name, const char *proto);
struct servent *getservbyport(int port, const char *proto);
struct servent *getservent(void);

All return: pointer if OK, NULL on error

void setservent(int stayopen);

void endservent(void);

The servent structure is defined to have at least the following members:

   struct servent {
     char   *s_name;      /* service name */
     char  **s_aliases;   /* pointer to alternate service name array */
     int     s_port;      /* port number */
     char   *s_proto;     /* name of protocol */
     .
     .
     .
   };

POSIX.1 defines several new functions to allow an application to map from a host name and a service name to an address and vice versa. These functions replace the older gethostbyname and gethostbyaddr functions.

The getaddrinfo function allows us to map a host name and a service name to an address.

#include <sys/socket.h>
#include <netdb.h>

int getaddrinfo(const char *restrict host,
                const char *restrict service,
                const struct addrinfo *restrict hint,
                struct addrinfo **restrict res);

Returns: 0 if OK, nonzero error code on error

void freeaddrinfo(struct addrinfo *ai);

We need to provide the host name, the service name, or both. If we provide only one name, the other should be a null pointer. The host name can be either a node name or the host address in dotted-decimal notation.

The getaddrinfo function returns a linked list of addrinfo structures. We can use freeaddrinfo to free one or more of these structures, depending on how many structures are linked together using the ai_next field.

The addrinfo structure is defined to include at least the following members:

   struct addrinfo {
     int               ai_flags;       /* customize behavior */
     int               ai_family;      /* address family */
     int               ai_socktype;    /* socket type */
     int               ai_protocol;    /* protocol */
     socklen_t         ai_addrlen;     /* length in bytes of address */
     struct sockaddr  *ai_addr;        /* address */
     char             *ai_canonname;   /* canonical name of host */
     struct addrinfo  *ai_next;        /* next in list */
     .
     .
     .
   };

We can supply an optional hint to select addresses that meet certain criteria. The hint is a template used for filtering addresses and uses only the ai_family, ai_flags, ai_protocol, and ai_socktype fields. The remaining integer fields must be set to 0, and the pointer fields must be null. Figure 16.6 summarizes the flags we can use in the ai_flags field to customize how addresses and names are treated.

Figure 16.6. Flags for addrinfo structure

   Flag             Description
   AI_ADDRCONFIG    Query for whichever address type (IPv4 or IPv6) is configured.
   AI_ALL           Look for both IPv4 and IPv6 addresses (used only with AI_V4MAPPED).
   AI_CANONNAME     Request a canonical name (as opposed to an alias).
   AI_NUMERICHOST   The host address is specified in numeric format; don't try to translate it.
   AI_NUMERICSERV   The service is specified as a numeric port number; don't try to translate it.
   AI_PASSIVE       Socket address is intended to be bound for listening.
   AI_V4MAPPED      If no IPv6 addresses are found, return IPv4 addresses mapped in IPv6 format.

If getaddrinfo fails, we can't use perror or strerror to generate an error message. Instead, we need to call gai_strerror to convert the error code returned into an error message.

#include <netdb.h>

const char *gai_strerror(int error);

Returns: a pointer to a string describing the error
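The following sketch combines a hint, getaddrinfo, freeaddrinfo, and gai_strerror. It passes AI_NUMERICHOST and AI_NUMERICSERV, which tell getaddrinfo that the host and service strings are already numeric, so no name service is consulted. The helper name numeric_lookup and the sample arguments are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: resolve a numeric IPv4 host and numeric port.
 * Returns 0 and stores the port (host byte order) in *portp if OK,
 * or the getaddrinfo error code on failure.
 */
int
numeric_lookup(const char *host, const char *service, int *portp)
{
    struct addrinfo  hint, *ailist;
    int              err;

    memset(&hint, 0, sizeof(hint));  /* unused fields must be 0 or null */
    hint.ai_family = AF_INET;
    hint.ai_socktype = SOCK_STREAM;
    hint.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;
    if ((err = getaddrinfo(host, service, &hint, &ailist)) != 0)
        return(err);
    *portp = ntohs(((struct sockaddr_in *)ailist->ai_addr)->sin_port);
    freeaddrinfo(ailist);            /* releases every entry in the list */
    return(0);
}

int
main(void)
{
    int port, err;

    if ((err = numeric_lookup("192.168.1.105", "2049", &port)) == 0)
        printf("port %d\n", port);
    /* a host name is rejected here because of AI_NUMERICHOST */
    if ((err = numeric_lookup("harry", "2049", &port)) != 0)
        printf("getaddrinfo error: %s\n", gai_strerror(err));
    return(0);
}
```

Clearing the hint with memset is a compact way to satisfy the rule that unused integer fields be 0 and pointer fields be null on common platforms.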

The getnameinfo function converts an address into a host name and a service name.

#include <sys/socket.h>
#include <netdb.h>

int getnameinfo(const struct sockaddr *restrict addr, socklen_t alen,
                char *restrict host, socklen_t hostlen,
                char *restrict service, socklen_t servlen,
                unsigned int flags);

Returns: 0 if OK, nonzero on error

The socket address (addr) is translated into a host name and a service name. If host is non-null, it points to a buffer hostlen bytes long that will be used to return the host name. Similarly, if service is non-null, it points to a buffer servlen bytes long that will be used to return the service name.

The flags argument gives us some control over how the translation is done. Figure 16.7 summarizes the supported flags.

Figure 16.7. Flags for the getnameinfo function

   Flag             Description
   NI_DGRAM         The service is datagram based instead of stream based.
   NI_NAMEREQD      If the host name can't be found, treat this as an error.
   NI_NOFQDN        Return only the node name portion of the fully-qualified domain name for local hosts.
   NI_NUMERICHOST   Return the numeric form of the host address instead of the name.
   NI_NUMERICSERV   Return the numeric form of the service address (i.e., the port number) instead of the name.

Example

Figure 16.8 illustrates the use of the getaddrinfo function. If multiple protocols provide the given service for the given host, the program will print more than one entry. In this example, we print out the address information only for the protocols that work with IPv4 (ai_family equals AF_INET). If we wanted to restrict the output to the AF_INET protocol family, we could set the ai_family field in the hint.

When we run the program on one of the test systems, we get

   $ ./a.out harry nfs
   flags canon family inet type stream protocol TCP
       host harry address 192.168.1.105 port 2049
   flags canon family inet type datagram protocol UDP
       host harry address 192.168.1.105 port 2049

Figure 16.8. Print host and service information

#include "apue.h"
#include <netdb.h>
#include <arpa/inet.h>
#if defined(BSD) || defined(MACOS)
#include <sys/socket.h>
#include <netinet/in.h>
#endif

void
print_family(struct addrinfo *aip)
{
    printf(" family ");
    switch (aip->ai_family) {
    case AF_INET:
        printf("inet");
        break;
    case AF_INET6:
        printf("inet6");
        break;
    case AF_UNIX:
        printf("unix");
        break;
    case AF_UNSPEC:
        printf("unspecified");
        break;
    default:
        printf("unknown");
    }
}

void
print_type(struct addrinfo *aip)
{
    printf(" type ");
    switch (aip->ai_socktype) {
    case SOCK_STREAM:
        printf("stream");
        break;
    case SOCK_DGRAM:
        printf("datagram");
        break;
    case SOCK_SEQPACKET:
        printf("seqpacket");
        break;
    case SOCK_RAW:
        printf("raw");
        break;
    default:
        printf("unknown (%d)", aip->ai_socktype);
    }
}

void
print_protocol(struct addrinfo *aip)
{
    printf(" protocol ");
    switch (aip->ai_protocol) {
    case 0:
        printf("default");
        break;
    case IPPROTO_TCP:
        printf("TCP");
        break;
    case IPPROTO_UDP:
        printf("UDP");
        break;
    case IPPROTO_RAW:
        printf("raw");
        break;
    default:
        printf("unknown (%d)", aip->ai_protocol);
    }
}

void
print_flags(struct addrinfo *aip)
{
    printf("flags");
    if (aip->ai_flags == 0) {
        printf(" 0");
    } else {
        if (aip->ai_flags & AI_PASSIVE)
            printf(" passive");
        if (aip->ai_flags & AI_CANONNAME)
            printf(" canon");
        if (aip->ai_flags & AI_NUMERICHOST)
            printf(" numhost");
#if defined(AI_NUMERICSERV)
        if (aip->ai_flags & AI_NUMERICSERV)
            printf(" numserv");
#endif
#if defined(AI_V4MAPPED)
        if (aip->ai_flags & AI_V4MAPPED)
            printf(" v4mapped");
#endif
#if defined(AI_ALL)
        if (aip->ai_flags & AI_ALL)
            printf(" all");
#endif
    }
}

int
main(int argc, char *argv[])
{
    struct addrinfo     *ailist, *aip;
    struct addrinfo     hint;
    struct sockaddr_in  *sinp;
    const char          *addr;
    int                 err;
    char                abuf[INET_ADDRSTRLEN];

    if (argc != 3)
        err_quit("usage: %s nodename service", argv[0]);
    hint.ai_flags = AI_CANONNAME;
    hint.ai_family = 0;
    hint.ai_socktype = 0;
    hint.ai_protocol = 0;
    hint.ai_addrlen = 0;
    hint.ai_canonname = NULL;
    hint.ai_addr = NULL;
    hint.ai_next = NULL;
    if ((err = getaddrinfo(argv[1], argv[2], &hint, &ailist)) != 0)
        err_quit("getaddrinfo error: %s", gai_strerror(err));
    for (aip = ailist; aip != NULL; aip = aip->ai_next) {
        print_flags(aip);
        print_family(aip);
        print_type(aip);
        print_protocol(aip);
        printf("\n\thost %s", aip->ai_canonname?aip->ai_canonname:"-");
        if (aip->ai_family == AF_INET) {
            sinp = (struct sockaddr_in *)aip->ai_addr;
            addr = inet_ntop(AF_INET, &sinp->sin_addr, abuf,
                INET_ADDRSTRLEN);
            printf(" address %s", addr?addr:"unknown");
            printf(" port %d", ntohs(sinp->sin_port));
        }
        printf("\n");
    }
    exit(0);
}

16.3.4. Associating Addresses with Sockets

The address associated with a client's socket is of little interest, and we can let the system choose a default address for us. For a server, however, we need to associate a well-known address with the server's socket on which client requests will arrive. Clients need a way to discover the address to use to contact a server, and the simplest scheme is for a server to reserve an address and register it in /etc/services or with a name service.

We use the bind function to associate an address with a socket.

#include <sys/socket.h>

int bind(int sockfd, const struct sockaddr *addr, socklen_t len);

Returns: 0 if OK, -1 on error

There are several restrictions on the address we can use:

The address we specify must be valid for the machine on which the process is running; we can't specify an address belonging to some other machine.
The address must match the format supported by the address family we used to create the socket.
The port number in the address cannot be less than 1,024 unless the process has the appropriate privilege (i.e., is the superuser).
Usually, only one socket endpoint can be bound to a given address, although some protocols allow duplicate bindings.

For the Internet domain, if we specify the special IP address INADDR_ANY, the socket endpoint will be bound to all the system's network interfaces. This means that we can receive packets from any of the network interface cards installed in the system. We'll see in the next section that the system will choose an address and bind it to our socket for us if we call connect or listen without first binding an address to the socket.

We can use the getsockname function to discover the address bound to a socket.

#include <sys/socket.h>

int getsockname(int sockfd, struct sockaddr *restrict addr,
                socklen_t *restrict alenp);

Returns: 0 if OK, -1 on error

Before calling getsockname, we set alenp to point to an integer containing the size of the sockaddr buffer. On return, the integer is set to the size of the address returned. If the address won't fit in the buffer provided, the address is silently truncated. If no address is currently bound to the socket, the results are undefined.

If the socket is connected to a peer, we can find out the peer's address by calling the getpeername function.

#include <sys/socket.h>

int getpeername(int sockfd, struct sockaddr *restrict addr,
                socklen_t *restrict alenp);

Returns: 0 if OK, -1 on error

Other than returning the peer's address, the getpeername function is identical to the getsockname function.
