Angler

LMAX Exchange

In my last couple of posts, I’ve been looking at how UDP network packets are received by the Linux kernel. Reading through the source code showed that a number of statistics are available for monitoring receive errors, buffer overruns, and queue depths.

In the course of investigating network throughput issues in our systems at LMAX Exchange, we have written some tooling for monitoring the available statistics. The result of that work is a small utility that provides an interface for monitoring system-wide or socket-specific statistics from a Java program.

The code is available in the Angler GitHub repository.

Who is it for?

This utility may be of use to you if you are interested in metrics and alerting around network throughput on Linux. Currently, only UDP socket monitoring is available, though we have plans to add similar functionality for TCP sockets.

Angler works by reading and parsing files in the /proc/ filesystem, and reporting metrics back to your application. It is then up to the user to decide how to handle the data. Perhaps the correct action is simply to report the numbers to a time-series database for charting or threshold alerting. Another valid use-case would be to apply back-pressure to a publishing system in the event of buffer overflow or increasing queue depth.
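To give a feel for the kind of parsing involved, here is a minimal sketch that pulls the local port, receive queue depth and drop count out of a single row of /proc/net/udp. It is not Angler's implementation: the class name UdpSocketRow is hypothetical, the sample row is made up, and the column positions assume a kernel that prints the trailing drops column. Unlike Angler, this version allocates freely.

```java
/** Hypothetical holder for the per-socket values of one /proc/net/udp row. */
final class UdpSocketRow {
    final int localPort;
    final long receiveQueueDepth;
    final long drops;

    UdpSocketRow(int localPort, long receiveQueueDepth, long drops) {
        this.localPort = localPort;
        this.receiveQueueDepth = receiveQueueDepth;
        this.drops = drops;
    }

    /** Parses one data row (not the header row) of /proc/net/udp. */
    static UdpSocketRow parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 13) {
            throw new IllegalArgumentException("unexpected /proc/net/udp row: " + line);
        }
        // fields[1] is local_address, printed as HEXADDRESS:HEXPORT
        int localPort = Integer.parseInt(fields[1].substring(fields[1].indexOf(':') + 1), 16);
        // fields[4] is tx_queue:rx_queue, both hexadecimal
        long rxQueue = Long.parseLong(fields[4].substring(fields[4].indexOf(':') + 1), 16);
        // fields[12] is the drops counter, printed in decimal
        long drops = Long.parseLong(fields[12]);
        return new UdpSocketRow(localPort, rxQueue, drops);
    }

    public static void main(String[] args) {
        // Made-up sample row in the layout described above.
        String sample = "   47: 0100007F:1F90 00000000:0000 07 00000000:00000300 "
                + "00:00000000 00000000  1000        0 12345 2 ffff880000000000 17";
        UdpSocketRow row = parse(sample);
        System.out.println(row.localPort + " " + row.receiveQueueDepth + " " + row.drops);
    }
}
```

A caller would typically read every line of /proc/net/udp, skip the header, and compare successive drops values to decide whether to alert or apply back-pressure.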

Angler is designed for use in latency-sensitive systems, and is garbage-free in steady state. It can, of course, be used in systems where garbage-collection is not an issue.

Available statistics

Angler offers an API to monitor individual sockets specified by either a host:port combination (an instance of java.net.InetSocketAddress), or all sockets listening to a particular IP address (an instance of java.net.InetAddress).
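For context on how a host:port pair relates to what the kernel reports, the sketch below converts an IPv4 InetSocketAddress into the hexadecimal local_address form used in /proc/net/udp. This is an illustration only, not Angler's code: the class ProcNetAddressKey and its of method are hypothetical, IPv6 is not handled, and the byte order is passed in explicitly so the behaviour is easy to check. The kernel prints the address as a native 32-bit integer, so on little-endian hosts the octets appear reversed.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteOrder;

/** Hypothetical helper: formats an IPv4 socket address the way /proc/net/udp prints it. */
final class ProcNetAddressKey {
    static String of(InetSocketAddress socketAddress, ByteOrder hostOrder) {
        InetAddress address = socketAddress.getAddress();
        if (address == null || address.getAddress().length != 4) {
            throw new IllegalArgumentException("this sketch handles resolved IPv4 addresses only");
        }
        byte[] octets = address.getAddress(); // always in network (big-endian) order
        boolean littleEndian = hostOrder == ByteOrder.LITTLE_ENDIAN;
        StringBuilder key = new StringBuilder(13);
        for (int i = 0; i < 4; i++) {
            // Reverse the octets on little-endian hosts to match the kernel's output.
            int index = littleEndian ? 3 - i : i;
            key.append(String.format("%02X", octets[index] & 0xFF));
        }
        // The port is printed as four hexadecimal digits in its natural order.
        key.append(':').append(String.format("%04X", socketAddress.getPort()));
        return key.toString();
    }

    public static void main(String[] args) {
        InetSocketAddress loopback = new InetSocketAddress("127.0.0.1", 8080);
        System.out.println(of(loopback, ByteOrder.nativeOrder()));
    }
}
```

Matching a monitored address against /proc/net/udp rows then becomes a string comparison on the local_address column.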

To begin monitoring a socket, use one of the beginMonitoring methods on UdpSocketMonitor:




Once a socket monitoring request has been made, available socket statistics will be provided to the application on the next invocation of the monitor’s poll method.

The callback method is invoked for each monitored socket reporting the receive queue depth and drop count:

System-wide statistics are available from /proc/net/softnet_stat and /proc/net/snmp. See previous posts for more information on exactly what is reported in these files.

The softnet data is provided by SoftnetStatsMonitor, and is made available to the following callback method:

Increases in these numbers can indicate that the kernel's softirq processing is not getting enough CPU time to dequeue incoming packets from the network device.
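As a rough illustration of where such numbers come from, the following sketch parses one row of /proc/net/softnet_stat. Each row describes one CPU, and the columns are hexadecimal; the first three are the processed, dropped and time-squeeze counters. The class SoftnetStatLine is hypothetical and is not how Angler itself is written, and the sample row is invented.

```java
/** Hypothetical holder for the first three columns of a /proc/net/softnet_stat row. */
final class SoftnetStatLine {
    final long processed;
    final long dropped;
    final long timeSqueezed;

    SoftnetStatLine(long processed, long dropped, long timeSqueezed) {
        this.processed = processed;
        this.dropped = dropped;
        this.timeSqueezed = timeSqueezed;
    }

    /** Parses one per-CPU row; all columns are printed in hexadecimal. */
    static SoftnetStatLine parse(String line) {
        String[] columns = line.trim().split("\\s+");
        if (columns.length < 3) {
            throw new IllegalArgumentException("unexpected softnet_stat row: " + line);
        }
        return new SoftnetStatLine(
                Long.parseLong(columns[0], 16),  // packets processed
                Long.parseLong(columns[1], 16),  // packets dropped because the backlog was full
                Long.parseLong(columns[2], 16)); // times processing ran out of budget or time
    }

    public static void main(String[] args) {
        // Invented sample row for a single CPU.
        SoftnetStatLine cpu0 = parse("0000272d 00000003 0000000a 00000000 00000000");
        System.out.println(cpu0.processed + " " + cpu0.dropped + " " + cpu0.timeSqueezed);
    }
}
```

Because the counters are cumulative, a monitor would normally keep the previous sample for each CPU and report the difference between polls.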

SNMP data is provided by SystemNetworkManagementMonitor, and is delivered to the following callback:

These statistics report a global view of receive errors, which could be caused by buffer overruns, memory exhaustion or other factors.
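For readers who want to see the raw data behind these counters, here is a small sketch that extracts the Udp section of /proc/net/snmp. The file lists each protocol as a header row of counter names followed by a row of decimal values, so the sketch pairs the two by position rather than hard-coding column indexes. The class ProcNetSnmpUdp is hypothetical and is not Angler's implementation; the sample values are invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reads the "Udp:" name and value rows of /proc/net/snmp. */
final class ProcNetSnmpUdp {
    static Map<String, Long> parse(List<String> lines) {
        for (int i = 0; i + 1 < lines.size(); i++) {
            String header = lines.get(i);
            String values = lines.get(i + 1);
            // The "UdpLite:" rows do not match this prefix, so only the Udp pair is used.
            if (header.startsWith("Udp:") && values.startsWith("Udp:")) {
                String[] names = header.trim().split("\\s+");
                String[] numbers = values.trim().split("\\s+");
                Map<String, Long> counters = new LinkedHashMap<>();
                // Index 0 is the "Udp:" label on both rows, so start at 1.
                for (int c = 1; c < Math.min(names.length, numbers.length); c++) {
                    counters.put(names[c], Long.parseLong(numbers[c]));
                }
                return counters;
            }
        }
        throw new IllegalArgumentException("no Udp section found");
    }

    public static void main(String[] args) {
        // Invented sample in the two-row layout described above.
        List<String> sample = Arrays.asList(
                "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors",
                "Udp: 1234 5 6 789 6 0");
        System.out.println(parse(sample));
    }
}
```

On a Linux host the input could come from Files.readAllLines(Paths.get("/proc/net/snmp")); counters such as InErrors and RcvbufErrors are the ones most relevant to receive-side problems.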

A complete example of these methods can be found in the ExampleApplication.

Production use

At LMAX Exchange, we have been using Angler in production for some time, so consider it production-ready. We poll the files in /proc/ up to 100 times per second on some services, in order to get a more fine-grained view of receive buffer depths. So far, we have not encountered any issues with this approach; a careful review of the kernel source code responsible for supplying the statistics indicates only a very small chance of lock contention.

Version 1.0.3 is currently available on Maven Central.

Contributions and feedback are welcome!
