I am working on a system that runs several processes which communicate with each other over sockets. In particular, one process, the server, reads frames from a camera (among other things), and another, a client, writes them to a file (among other things).

I had no problems when working at frame rates such as 10, 15 or 30 FPS. But once I started using a camera capable of delivering 120 FPS, things went south: as soon as the frame-writing process connected to the camera-reading one, the frame rate dropped from 120 to about 50 FPS.

After checking many things (Is the disk slow? Is the codec slow? Are other processes interfering? Etc.), I found that the server was waiting for an answer from the client, and the client was not sending the answer because it was still waiting for data from the server. There was no deadlock, to be sure, just a huge delay: the server had already sent the data, but the client wouldn’t receive it for about 40 ms.

It turns out this was a bad interaction between Nagle’s algorithm and TCP delayed ACKs.

Delayed ACKs work like this, in my context: when receiving data, the client may combine several ACKs into one instead of sending an ACK for every packet it receives. Not a bad idea in principle; if the client will just keep on receiving data, it can send a combined ACK every now and then.
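The receiver’s decision can be sketched as a tiny predicate, following RFC 1122 (ACK at least every second full-sized segment, and never delay longer than the delayed-ACK timer, which is about 40 ms on Linux). This is an illustrative sketch of the logic, not a kernel API; `ack_now` is a hypothetical name:

```cpp
// Sketch of the receiver's delayed-ACK decision (RFC 1122, simplified).
// ack_now: should the receiver send an ACK right now?
bool ack_now(int unacked_segments, bool timer_expired) {
    if (unacked_segments >= 2) return true;  // second segment arrived: ACK at once
    if (timer_expired)         return true;  // timer fired: stop waiting, ACK what we have
    return false;                            // keep waiting, hoping to combine ACKs
}
```

The third branch is exactly the “just in case there might be an opportunity to combine some ACKs” behaviour described above.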

Nagle’s algorithm works like this, again in my context: when sending data, the server may combine several small writes into a larger segment, thus reducing the number of packets carrying very little payload. The algorithm uses ACKs to decide when data may go out: as long as previously sent data remains unacknowledged, the server won’t send a new segment smaller than the maximum segment size. Not a bad idea in principle, either.
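Nagle’s rule (RFC 896) can also be sketched as a predicate. Again this is an illustrative simplification, not kernel code; `nagle_allows_send` is a hypothetical name:

```cpp
#include <cstddef>

// Sketch of Nagle's decision rule (RFC 896, simplified).
// mss = maximum segment size for the connection.
bool nagle_allows_send(std::size_t bytes_buffered, std::size_t mss, bool unacked_data) {
    if (bytes_buffered >= mss) return true;  // a full segment is always sent immediately
    if (!unacked_data)         return true;  // nothing in flight: send the small segment
    return false;                            // small segment + unACKed data: wait for the ACK
}
```

That last branch is what bit me: my writes were small, so they sat in the buffer until an ACK arrived.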

The two algorithms can interact badly, as they did in my case. The server has already “sent” the data: it has called send several times. The client has received some of the data, but its operating system has not sent an ACK yet, just in case there might be an opportunity to combine some ACKs. The client is waiting for more data from the server, but that data is sitting in an operating system buffer on the server’s side. Finally, after about 40 ms, the client’s TCP stack decides it’s not worth waiting any longer and sends an ACK. The server receives the ACK, and its operating system immediately flushes the output buffer.

One solution is to disable Nagle’s algorithm — which is easier to do in Linux than disabling delayed ACKs. This can be done by calling setsockopt with the TCP_NODELAY option:

// m_fd is the connected TCP socket; needs <sys/socket.h> and <netinet/tcp.h>
int flag = 1;
int opt = ::setsockopt(m_fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
// opt is -1 on failure (check errno)