Monday, January 9, 2017

The magic 40 milliseconds delay in TCP socket programming

Recently we implemented a protocol on top of TCP sockets, and then built a service based on that protocol.

During benchmarking, we found that the service itself is very fast, at about 100 microseconds per request, while the same service benchmarked through the protocol is much slower, at about 80 milliseconds per request.
After some investigation, we narrowed the problem down to the TCP layer. Our protocol uses so-called pack4 framing: before every chunk of data we send a 4-byte integer giving the number of bytes in the chunk. We therefore call write() twice to send anything, i.e., the client does write-write-read for each request. This pattern is exactly what triggers the 40 ms delay. By default, TCP applies two optimizations for small packets that interact badly here: on the receiver, delayed acknowledgment batches ACKs for consecutive small packets, waiting up to 40 ms before replying; on the sender, Nagle's algorithm holds back a new small packet while an earlier one is still unacknowledged. Since the first packet in pack4 framing is only 4 bytes, the receiver may delay its ACK for up to 40 ms, and the sender cannot transmit the payload until that ACK arrives. This slows the service badly.
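The problematic client-side send path can be sketched as below. The helper name send_pack4_slow is hypothetical, and error handling is kept minimal:

```c
#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <unistd.h>      /* write */

/* Two separate write() calls: the 4-byte length prefix leaves the
   sender as its own tiny segment, and the payload follows in a
   second one. With Nagle's algorithm enabled, the payload is held
   back until the prefix segment has been ACKed, and the receiver's
   delayed ACK can postpone that for up to 40 ms. */
ssize_t send_pack4_slow(int fd, const void *buf, uint32_t len) {
    uint32_t netlen = htonl(len);  /* length prefix, network byte order */
    if (write(fd, &netlen, sizeof netlen) != (ssize_t)sizeof netlen)
        return -1;
    return write(fd, buf, len);
}
```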

The fix is simple: merge the two write calls, i.e., send the data length and the data in one TCP packet.
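One way to merge the writes without an extra buffer copy is writev(), which hands the prefix and the payload to the kernel in a single call. Again, send_pack4 is a hypothetical helper name and error handling is minimal:

```c
#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>     /* writev */

/* A single writev() submits the prefix and the payload together, so
   the kernel can send them as one segment and there is no small
   unacknowledged packet for Nagle's algorithm to wait on. */
ssize_t send_pack4(int fd, const void *buf, uint32_t len) {
    uint32_t netlen = htonl(len);  /* length prefix, network byte order */
    struct iovec iov[2] = {
        { .iov_base = &netlen,     .iov_len = sizeof netlen },
        { .iov_base = (void *)buf, .iov_len = len },
    };
    return writev(fd, iov, 2);
}
```

Copying both pieces into one buffer before a single write() achieves the same effect; another common workaround, discussed in the reference below, is disabling Nagle's algorithm with the TCP_NODELAY socket option.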

The reference below explains a similar problem in more detail.

References:

http://jerrypeng.me/2013/08/mythical-40ms-delay-and-tcp-nodelay/
