So far we have:
- Talked about the regulations and how we might solve this with Linux software
- Built a “PTP Bridge” with Puppet
- Started recording metrics with collectd and InfluxDB, and
- Finished recording metrics
- Drawn lots of graphs with Grafana and found contention on our firewall
In this post we will look at introducing a second ASIC based firewall to route PTP through so that PTP accuracy is not affected by our management firewall getting busy.
The New Design
Here is the adjusted design diagram, it’s not that much more complicated:
Routing To Another Firewall
In order for this design to work every PTP Client must send it’s PTP traffic back through the PTP firewall, rather than the management firewall. Each client needs a routing rule to reflect this. There are similar changes needed on the PTP Bridge to send through the PTP firewall.
With the new firewall in place and the routing changes applied, everything should “just work”, but it doesn’t. We see multicast leaving the PTP Bridge correctly:
[root@ptp-bridge ~]# tcpdump -i p4p3 multicast -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on p4p3, link-type EN10MB (Ethernet), capture size 65535 bytes
16:10:07.519357 IP 192.168.0.4.319 > 224.0.1.129.319: UDP, length 44
16:10:07.519405 IP 192.168.0.4.320 > 224.0.1.129.320: UDP, length 44
Restart sfptpd on a client and we see it cannot subscribe to the multicast group:
2015-12-15 16:01:18.306138: info: using SO_TIMESTAMPNS software timestamps
2015-12-15 16:01:18.406259: notice: Now in state: PTP_LISTENING
2015-12-15 16:01:18.406353: info: creating NTP sync-module
2015-12-15 16:01:30.306177: warning: failed to receive Announce within 12.000 seconds
A packet trace shows that we see PIM announcements from the right firewall, and the IGMPv3 packets from sfptpd are leaving the client:
16:07:15.444343 IP 192.168.0.2 > 224.0.0.13: PIMv2, Hello, length 34
16:07:18.307942 IP 192.168.0.5 > 224.0.0.22: igmp v3 report, 2 group record(s)
16:07:18.369939 IP 192.168.0.5 > 224.0.0.22: igmp v3 report, 2 group record(s)
16:07:18.686936 IP 192.168.0.5 > 224.0.0.22: igmp v3 report, 2 group record(s)
Looking at the multicast groups on the new firewall, we don’t have the PTP multicast addresses appearing as registered groups that the firewall can forward.
To cut a very, very long story short, this was a TTL problem compounded by two firewall brands behaving differently. The ptp4l process sets the TTL of it’s multicast packets to 1 to restrict them to the same subnet. Our management firewall accepted this and did the multicast forwarding anyway, the PTP firewall did not. Setting udp_ttl=2 for the ptp4l process that’s doing the multicast fixed this. The lmaxexchange-linuxptp module has been changed to take this option now too.
Our mean accuracy has improved substantially – at a glance the hardware change has saved 15% of the tolerance the regulations allow.
There are interesting patterns in the most recent data; regular and somewhat uniform spikes in client offset are seen on both clients. The graph below draws both client offsets together:
The spikes are recorded at almost exactly the same second, they last for a few seconds, they appear to come in groups of two and are generally over 9000ns in size. Identifying how regular these spikes are may indicate what the problem is. If they are perfectly regular on both clients, I would start to think it might be something on the firewall, like an administrative or housekeeping function.
We can use an InfluxDB query to get the raw data:
SELECT “time”,”value” FROM “tail_ns” WHERE “instance” = ‘sfptpd_stats’ AND “type_instance” = ‘offset-system’ AND (“host” = ‘client1’) AND ( “value” > ‘0’ OR “value” < ‘0’ ) AND time >= ‘2016-02-05 09:43:19’ AND value > 9000
Now with a little bit of Python we can work out the difference in time between spikes:
Unfortunately there’s not as much uniformity in the spikes as I was hoping to see:
$ python diff.py < foo | awk ‘{print $4}’ | sort | uniq -c | sort -n | tail
6 0:01:08
6 0:01:10
6 0:01:14
6 0:01:16
8 0:01:00
8 0:01:03
8 0:01:17
9 0:01:04
9 0:01:05
10 0:01:06
The most common time difference is around 60-70 seconds, which is between the individual spikes in a pair of two. The difference between the pairs of spikes themselves though is quite varied, anywhere from 2 minutes to 8 minutes apart. This doesn’t mean it’s not the firewall, but this data doesn’t reliably pin the problem to some X minute recurring job in the internals of the firewall’s operating system. The network engineers in my team are also quick to point out that this is an ASIC based firewall that’s doing nothing but shifting a couple PTP packets per second, so firewall spikes of 20μs would be a very strange thing to be encountering. Before getting lost in a rabbit hole chasing performance spikes (something our engineers here know all too much about), there may be another way to mitigate this.
Attracting Attention
Faster Message Rates
2016-02-05 14:47:24.187549: info: received new Sync interval -2 from master (was 0)
After half an hour it appears as if the size of the spikes has lessened from upwards of 20μs to 15μs. Observing over a longer period appears to show the intensity of a spike has lowered slightly, see the before (top) and after (bottom) graphs below:
Changing the Delay Request rate of the sfptpd client is also simple. Setting ptp_delayreq_interval to -2 in the sfptpd configuration file means the client will send a DelayReq message 4 times a second. We also have two clients, so by changing only one we can compare the difference between the two.
An interesting pattern has emerged a day after changing one of our clients to send 4 Delay Requests per second; the spikes are no longer uniform across both clients as seen in the graph below:
This debunks the theory that it’s the firewalls causing the jitter, each client is seeing it’s own variance at different intervals. One of our key performance guys, Mark Price, tells me 10-20μs is well within the range of Operating System jitter he’s seen, so on an un-tuned system, even with no workload, there’s a good chance this is what we are seeing.
A Different Architecture
In the very first post I mused about a design that didn’t require a firewall, something that involved connecting the PTP Bridge to multiple separate management networks and having it send multiple independent PTP multicast streams. The idea had security implications – the PTP Bridge would join networks that don’t usually join, networks that are usually protected from each other by a firewall. With careful attention to detail these implications can be mitigated.
Along side this, I’ve got new options in my choice of software. Solar Flare are interested in us replacing the PTP Bridge component with a purely Solar Flare approach – I’m told there is an undocumented feature that allows sfptpd to work in UDP Unicast mode, so it would be able to talk directly to our S300 Grand Master. There’s also been some advances with PTPd that make it viable to replace linuxptp on the PTP Bridge, as well as replace sfptpd as the PTP clients.
In the next post we’ll remote the PTP firewall from the mix and look at purely switched designs.