TCP and Resolving the “Address Already in Use” Issue

This article assumes that the error is caused by exhaustion of ephemeral ports, not by another service already running on the port that the application is trying to bind to. On Linux, run “netstat -anp” to rule out the latter.

We have come a long way since the introduction of TCP (RFC 793, which builds on IP as specified in RFC 791). It has become the ubiquitous protocol for communication between applications, including but not limited to email (SMTP), file transfer (FTP) and the web (HTTP). It is so prevalent, and so well hidden behind higher-level libraries, that an application developer shielded from the complexities of TCP programming can be baffled by the obscure errors it produces when something does go wrong. The following paragraphs attempt to shed light on one such common message: “Address Already in Use”. Why does it happen, how can its cause be identified, and how can it be resolved? Those are the questions this article hopes to answer.

An application developer might be using a toolkit that abstracts the application from protocol specifics, sockets and so on. One example is Apache HttpClient, whose Java API lets consumers make requests without dealing with the under-the-hood details of sockets being created, opened, used for data transfer and closed. Even such a toolkit does not prevent a protocol error from being reported, and of course that error needs to be investigated and fixed. How quickly and confidently it gets fixed depends on a working knowledge of TCP, which is what is coming your way.

There are many texts on TCP that cover all its manifold details, the nuts and bolts; we are not going there. Still interested? Please read on.

TCP Overview

Transmission Control Protocol (TCP) provides guaranteed delivery semantics, and that is what makes it more complicated than its cousin, the User Datagram Protocol (UDP), which is not covered here. TCP emphasizes reliable, in-sequence delivery. It achieves this by:

  1. Being connection-oriented. Before any data (carried in units called segments, often loosely called packets) is transferred from the sender to the receiver, a connection has to be established. Once the data transfer is complete, the connection is terminated.

    In the illustration above, the client initiates the connection by transmitting a SYN segment, to which the server responds with a SYN-ACK. On receiving it, the client transmits an ACK and the connection moves to the ESTABLISHED state on the client. Once the server receives the ACK, the connection is ESTABLISHED on the server as well.

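  2. Using sequence numbers. Every byte of data sent is numbered, and the receiver uses these sequence numbers to put data back in order, discard duplicates and acknowledge what has arrived, so that the sender can retransmit anything that was lost. This is what provides in-order, guaranteed delivery.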
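Item 2 can be made concrete with a toy receiver. The sketch below is in Java (the language of the toolkits this article mentions) and assumes a simplified model in which each segment is labeled with the sequence number of its first byte; windows, retransmission timers and sequence-number wraparound are left out, and the class name ReorderBuffer is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Toy receiver-side reassembly: each segment is labeled with the sequence
// number of its first byte, and bytes are handed to the application only
// in order. Not real TCP: no windows, retransmission timers or wraparound.
class ReorderBuffer {
    private long nextExpected;                                      // next byte owed to the app
    private final TreeMap<Long, byte[]> pending = new TreeMap<>();  // segments waiting for a gap
    private final StringBuilder delivered = new StringBuilder();

    ReorderBuffer(long firstDataSeq) {
        this.nextExpected = firstDataSeq;
    }

    void receive(long seq, String data) {
        byte[] bytes = data.getBytes(StandardCharsets.US_ASCII);
        if (seq + bytes.length <= nextExpected) {
            return;                                   // pure duplicate: already delivered
        }
        pending.putIfAbsent(seq, bytes);
        // Drain every buffered segment that now touches the in-order edge.
        while (!pending.isEmpty() && pending.firstKey() <= nextExpected) {
            Map.Entry<Long, byte[]> e = pending.pollFirstEntry();
            long start = e.getKey();
            byte[] seg = e.getValue();
            int skip = (int) (nextExpected - start);  // bytes of this segment already delivered
            if (skip < seg.length) {
                delivered.append(new String(seg, skip, seg.length - skip, StandardCharsets.US_ASCII));
                nextExpected = start + seg.length;
            }
        }
    }

    String delivered() { return delivered.toString(); }

    /** The value a cumulative ACK would carry: the next byte expected. */
    long nextExpected() { return nextExpected; }
}
```

For example, with new ReorderBuffer(1000), calling receive(1005, "world") delivers nothing yet; a following receive(1000, "hello") fills the gap, delivered() returns "helloworld" and nextExpected() is 1010. Receiving "hello" at 1000 again is recognized as a duplicate and dropped.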

An easy analogy for TCP is a telephone call. If Jack needs to speak with Mary, he dials her number, they say hello, and an interactive conversation follows. Eventually either Jack or Mary initiates the end of the conversation and they say goodbye. The connection is then closed and the call is complete.

This brings us to closing the connection once communication is over, which follows an elaborate procedure. In the illustration below, the client initiates the close by sending a FIN (Step 1). The server responds with an ACK (Step 2) and moves to the CLOSE_WAIT state. In Step 3 the server transmits its own FIN and moves to LAST_ACK, where it waits for the client to acknowledge that FIN. Step 4 is the client's ACK. When the client receives the server's FIN and sends this ACK, it moves to TIME_WAIT (having passed through FIN_WAIT_1 and FIN_WAIT_2 while waiting for Steps 2 and 3). Once the server receives the final ACK, it moves the connection to the CLOSED state.

From the illustration and the explanation above, two subtle facets need to be clearly understood in order to resolve the “Address Already in Use” issue:

  1. The side that initiates the close moves to the TIME_WAIT state and the other moves to the CLOSE_WAIT state.

  2. How long a socket stays in TIME_WAIT depends on the TCP stack, is chosen with network latency in mind and, on many systems, is configurable. The client (more precisely, the client's socket) remains in TIME_WAIT for that fixed period. During this time its local port, a limited resource, stays tied up and is typically not available for a new connection. Defaults differ between operating systems' TCP stacks and range from 4 minutes down to 30 seconds or even lower. BSD-derived stacks have traditionally used a Maximum Segment Lifetime (MSL) of 30 seconds, which gives a TIME_WAIT of 60 seconds; Windows, Solaris and Linux each have their own defaults (Linux, for instance, uses a fixed 60 seconds). Where the stack allows it, the default can be overridden with another value. See your operating system's documentation or man pages for how to set the TIME_WAIT value.
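The open and close sequences can be traced with a toy state model. This is a sketch only: the names TcpState and ConnectionModel are hypothetical, each step collapses a send and its receipt into one transition, and no real segments are exchanged. It does show the first facet above: the active closer ends up in TIME_WAIT, while the passive closer passes through CLOSE_WAIT and LAST_ACK to CLOSED.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the RFC 793 states that the three-way handshake and the
// four-step close pass through. It only tracks state names; nothing is sent.
enum TcpState {
    CLOSED, LISTEN, SYN_SENT, SYN_RECEIVED, ESTABLISHED,
    FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, LAST_ACK, TIME_WAIT
}

class ConnectionModel {
    TcpState client = TcpState.CLOSED;
    TcpState server = TcpState.LISTEN;          // the server has already called listen()
    final List<String> log = new ArrayList<>();

    // Records one step and the states both sides are in afterwards.
    private void step(String what, TcpState newClient, TcpState newServer) {
        client = newClient;
        server = newServer;
        log.add(what + " -> client=" + client + ", server=" + server);
    }

    /** Three-way handshake. */
    void open() {
        step("client sends SYN", TcpState.SYN_SENT, server);
        step("server replies SYN-ACK", client, TcpState.SYN_RECEIVED);
        step("client receives SYN-ACK, sends ACK", TcpState.ESTABLISHED, server);
        step("server receives ACK", client, TcpState.ESTABLISHED);
    }

    /** The client is the active closer, as in Steps 1-4 of the text. */
    void clientCloses() {
        step("Step 1: client sends FIN", TcpState.FIN_WAIT_1, server);
        step("Step 2: server ACKs the FIN", TcpState.FIN_WAIT_2, TcpState.CLOSE_WAIT);
        step("Step 3: server sends its FIN", client, TcpState.LAST_ACK);
        step("Step 4: client sends the final ACK", TcpState.TIME_WAIT, server);
        step("server receives the final ACK", client, TcpState.CLOSED);
    }

    /** After 2 * MSL the active closer finally releases the connection. */
    void timeWaitExpires() {
        step("TIME_WAIT timer expires", TcpState.CLOSED, server);
    }

    public static void main(String[] args) {
        ConnectionModel m = new ConnectionModel();
        m.open();
        m.clientCloses();
        m.timeWaitExpires();
        m.log.forEach(System.out::println);
    }
}
```

Running main prints ten transitions; the last one is "TIME_WAIT timer expires -> client=CLOSED, server=CLOSED", which is the only point at which the client's side of the connection is fully released.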

Significance of TIME_WAIT

The TCP specification requires TIME_WAIT for two important reasons. First, a socket in TIME_WAIT is, as the name suggests, waiting in case the server did not receive the last ACK (Step 4) and retransmits its FIN (Step 3). TIME_WAIT takes care of this eventuality: the client can still respond with the final ACK. Second, if a new connection were established between the same pair of client and server socket addresses, a delayed segment from the preceding connection might still be in the network and be delivered. That segment from a now-closed connection could then be mistaken for one of the segments of the new connection, so the application would receive data never intended for it and the data stream would be corrupted.

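TIME_WAIT and 2*MSL

The Maximum Segment Lifetime (MSL) is the longest time a TCP segment is assumed to survive in the network before it is discarded. RFC 793 sets the MSL to 2 minutes, although implementations commonly use 30 seconds, 1 minute or 2 minutes. TIME_WAIT lasts twice the MSL (2*MSL): long enough for the final ACK to be lost and the peer's retransmitted FIN to arrive, and for any stray segments of the old connection still in the network to expire. With an MSL of 30 seconds, TIME_WAIT therefore lasts 60 seconds; with the RFC's 2 minutes, it lasts 4 minutes.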
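One way to picture the effect of TIME_WAIT is a table of recently closed connections keyed by their four values (local address, local port, remote address, remote port), with each entry reserved for 2*MSL. This is a conceptual sketch, not how any kernel stores TIME_WAIT sockets; FourTuple and TimeWaitTable are hypothetical names, the addresses are documentation examples, and the record syntax needs Java 16 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Conceptual model only: a closed connection's four-tuple stays reserved
// for 2 * MSL so that late segments from it cannot leak into a new
// connection that happens to reuse exactly the same addresses and ports.
record FourTuple(String localIp, int localPort, String remoteIp, int remotePort) {}

class TimeWaitTable {
    private final Duration timeWait;                          // 2 * MSL
    private final Map<FourTuple, Instant> reservedUntil = new HashMap<>();

    TimeWaitTable(Duration msl) {
        this.timeWait = msl.multipliedBy(2);
    }

    Duration timeWait() { return timeWait; }

    /** The active closer enters TIME_WAIT when it sends the final ACK. */
    void enterTimeWait(FourTuple t, Instant now) {
        reservedUntil.put(t, now.plus(timeWait));
    }

    /** A new connection may use the tuple only once its TIME_WAIT has run out. */
    boolean canReuse(FourTuple t, Instant now) {
        Instant until = reservedUntil.get(t);
        if (until == null || !now.isBefore(until)) {
            reservedUntil.remove(t);    // expired or never reserved: free again
            return true;
        }
        return false;
    }
}
```

With an MSL of 30 seconds, a tuple closed at time 0 cannot be reused at 59 seconds but can at 60, while a tuple that differs only in its local port is free immediately.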

How many ports are there?

Under the hood, data flows through “sockets”. A socket is a communication endpoint. Once bound, a TCP socket is identified by a local IP address and port, and a connection by four values: local address, local port, remote address and remote port. For the purposes of this discussion, we will assume that each connection from a client to a server uses one, and only one, local port on the client. Ports are finite resources on a system; only so many are available.

Port numbers are unsigned 16-bit integers, so there are 65,536 possible values; port 0 is reserved, which leaves 65,535 usable ports (1 to 65535). A TCP socket can bind to any one of the available ports. However, the port numbers are divided into three ranges:

  1. Well Known Ports. These are in the range from 0 to 1023. On UNIX systems, an application typically needs administrative (root) privileges to bind to one of them. Many are assigned to well-known application protocols: for instance, SMTP uses port 25, HTTP port 80 and Telnet port 23.

  2. Registered Ports. They are in the range from 1024 to 49151. The list of registered ports is maintained by the IANA (Internet Assigned Numbers Authority).

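  3. Dynamic / Private Ports, from 49152 to 65535. The TCP stack picks a port from this kind of range when the calling application does not specify a port to bind to. The range actually used is OS-specific: modern Windows versions use the IANA range, while Linux defaults to 32768 to 60999 (see /proc/sys/net/ipv4/ip_local_port_range).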
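The three ranges can be expressed as a small lookup. This is an illustrative sketch with hypothetical names (PortRange, PortRanges); the boundaries are the IANA ones listed above, not whatever range a particular OS actually uses for ephemeral ports. The switch expression needs Java 14 or later.

```java
// The three IANA port ranges, as a small lookup.
enum PortRange { WELL_KNOWN, REGISTERED, DYNAMIC }

final class PortRanges {
    private PortRanges() {}

    static PortRange classify(int port) {
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("not a 16-bit port number: " + port);
        }
        if (port <= 1023) return PortRange.WELL_KNOWN;
        if (port <= 49151) return PortRange.REGISTERED;
        return PortRange.DYNAMIC;
    }

    /** Number of port values in each range, both ends inclusive. */
    static int size(PortRange r) {
        return switch (r) {
            case WELL_KNOWN -> 1023 - 0 + 1;       // 1024
            case REGISTERED -> 49151 - 1024 + 1;   // 48128
            case DYNAMIC -> 65535 - 49152 + 1;     // 16384
        };
    }
}
```

classify(80) gives WELL_KNOWN, classify(8080) gives REGISTERED and classify(50000) gives DYNAMIC; the three sizes (1024, 48128 and 16384) add up to the 65,536 possible 16-bit values.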

For instance, a web browser lets the TCP stack pick its local port when it connects to a web server running on port 80. The browser is assigned a port from the dynamic range, for example between 49152 and 65535. Note that this assignment is temporary (ephemeral): it lasts only for the duration of the connection between the browser and the web server, plus any time the port then spends in TIME_WAIT.
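The ephemeral assignment can be observed with the standard java.net classes. In this sketch (EphemeralPortDemo is a hypothetical name), the client socket never calls bind(), so connect() leaves the choice of local port to the TCP stack; the listener binds to port 0 so that the example does not depend on any particular port being free. Everything runs over the loopback interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Shows the TCP stack picking an ephemeral local port: the client never
// calls bind(), so connect() assigns one automatically.
final class EphemeralPortDemo {
    private EphemeralPortDemo() {}

    /** Returns {clientLocalPort, serverPort} for one loopback connection. */
    static int[] connectOnce() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port for the listener as well.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()));
            try (Socket accepted = server.accept()) {
                // The server sees the client's ephemeral port as the remote port.
                if (accepted.getPort() != client.getLocalPort()) {
                    throw new IllegalStateException("port mismatch");
                }
            }
            return new int[] { client.getLocalPort(), server.getLocalPort() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = connectOnce();
        System.out.println("client ephemeral port " + ports[0] + " -> server port " + ports[1]);
    }
}
```

The printed client port varies from run to run and normally falls inside the OS's configured ephemeral range. Because the accepted socket is closed before the client socket here, it is the server's end of this short connection that goes through TIME_WAIT.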

The “Address Already in Use” issue

When the number of ports in TIME_WAIT keeps climbing while new connections still have to be created, conditions are ripe for the “Address Already in Use” issue to crop up. It means that no free port is left on which to create a new connection: the pool of available ports has run dry.

To establish how many new connections can be created with a given number of ports, we need to know how long a port spends in the TIME_WAIT state. As discussed earlier, this can range from 4 minutes to 30 seconds, and the value can be overridden. For this example, we will assume that it is 30 seconds.

Number of ports available: 65535 - 49152 + 1 = 16384 (counting both ends of the range). [Note: in practice fewer ports will be available, since some of them may already be in use by other applications.]

If TIME_WAIT lasts 30 seconds, a port becomes available for reuse 30 seconds after its connection is closed. The sustainable connection rate across all available ports is therefore 16384 / 30, or roughly 546 new connections per second. If the application keeps opening connections faster than this, the available ports are exhausted and this error appears.
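The arithmetic above fits in a few lines. The sketch assumes, as the text does, that every new connection goes to the same destination and holds its local port for the full TIME_WAIT period; PortBudget is a hypothetical name, and the second case assumes typical Linux defaults (ports 32768 to 60999 and a 60-second TIME_WAIT) purely for comparison.

```java
// Back-of-the-envelope limit: P ephemeral ports, each held for T seconds
// in TIME_WAIT, allow roughly P / T new connections per second to one destination.
final class PortBudget {
    private PortBudget() {}

    /** Number of ports in [low, high], both ends inclusive. */
    static int portsInRange(int low, int high) {
        return high - low + 1;
    }

    static double maxConnectionsPerSecond(int low, int high, double timeWaitSeconds) {
        return portsInRange(low, high) / timeWaitSeconds;
    }

    public static void main(String[] args) {
        // The article's example: IANA dynamic range, 30-second TIME_WAIT.
        System.out.printf("IANA range, 30 s TIME_WAIT: %.1f connections/s%n",
                maxConnectionsPerSecond(49152, 65535, 30));
        // Assumed typical Linux defaults, for comparison only.
        System.out.printf("Linux defaults, 60 s TIME_WAIT: %.1f connections/s%n",
                maxConnectionsPerSecond(32768, 60999, 60));
    }
}
```

main prints about 546.1 connections per second for the article's example and about 470.5 for the assumed Linux defaults: a wider port range does not help much if TIME_WAIT is also longer.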

How to resolve the issue
  1. The first recommendation is to check for leaks: perhaps connections are not being closed. If new connections keep being created while old ones remain open, sooner or later no more sockets will be available.

  2. Can connection pooling be used? If the client repeatedly connects to the same server, or the same set of servers, it is a prime candidate for pooling connections. Of course, this might not always be possible. One case where it does not work is when JNDI is used to authenticate users against an LDAP store: if 100 distinct authentication requests require 100 binds to the LDAP directory, 100 different connections will be opened. However, it might be possible to authenticate a user by comparing the credentials instead of binding. That approach does not need 100 connections for 100 authentications, since no binds take place. Confirm whether this is possible with the LDAP directory flavor in use.
    If you are using Apache Commons HttpClient 3.x, use MultiThreadedHttpConnectionManager to pool connections; in HttpClient 4.3 and later, the equivalent is PoolingHttpClientConnectionManager.

  3. Another question to ask is whether the application can be designed so as not to require so many connections. Perhaps connections can be reused and kept alive. Of course, these suggestions only make sense if the issue has surfaced early in the life cycle and there is the time and the liberty to revisit the design.

  4. Reduce the period a socket has to spend in the TIME_WAIT state. This is risky, because a lower value can lead to the situation described earlier, in which a segment transmitted during a previous incarnation of a connection between the same endpoints arrives during a new incarnation and is mistaken for new data.

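  5. Set the SO_REUSEADDR option on the socket to true. This allows the socket to bind to a local port that is still held by a connection in the TIME_WAIT state. The option must be set before the socket is bound (in Java, with setReuseAddress(true)).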
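Remedies 1 and 5 can be sketched with the standard java.net API: try-with-resources guarantees that every socket is closed, and ServerSocket.setReuseAddress(true) sets SO_REUSEADDR before bind(). This is an illustration under stated assumptions (SocketHygiene is a hypothetical name): it shows the common server case of listening again on a port whose previous connection is still in TIME_WAIT, and the exact SO_REUSEADDR semantics differ between operating systems, notably on Windows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class SocketHygiene {
    private SocketHygiene() {}

    /** Remedy 5: set SO_REUSEADDR on an unbound socket, then bind. */
    static ServerSocket listenWithReuse(int port) throws IOException {
        ServerSocket server = new ServerSocket();   // unbound, so the option can still take effect
        server.setReuseAddress(true);               // SO_REUSEADDR
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        return server;
    }

    /**
     * Remedy 1: every socket lives in try-with-resources, so it is closed even
     * if an exception is thrown. The server side closes its end first, so the
     * accepted socket (local port = listening port) is left in TIME_WAIT.
     * Returns the port that was used.
     */
    static int serveOneConnection() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = listenWithReuse(0);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            try (Socket accepted = server.accept()) {
                // accepted is closed on exit, before client: the server is the active closer
            }
            client.getInputStream().read();  // returns -1 once the server's FIN arrives
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Tries to listen again on a port whose previous connection may still be in TIME_WAIT. */
    static boolean canRebind(int port) {
        try (ServerSocket again = listenWithReuse(port)) {
            return again.isBound();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int port = serveOneConnection();
        System.out.println("rebind to port " + port + " with SO_REUSEADDR: " + canRebind(port));
    }
}
```

On Linux and macOS, main should report that the rebind succeeded even though the server's side of the previous connection is still in TIME_WAIT. The JDK leaves the initial SO_REUSEADDR setting of a ServerSocket platform-dependent, so setting it explicitly also documents the intent.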

References:

TCP/IP Illustrated, Vol. 1 (W. Richard Stevens) and Vol. 2 (Gary R. Wright and W. Richard Stevens)

http://www.iana.org/assignments/port-numbers