leading zeros – builtins and FFSL

I took the time to start using Mike’s HDR histogram port to C the other day. It was pretty painless to use, once I’d overcome a couple of initial problems. One of which is that the C compiler I have on CentOS 6.4 was too old. So I came up with a workaround for older compilers/machines. 

HDRHistogram is Azul’s high dynamic range histogram. The readme on github explains what it is. Suffice it to say that I’m in the process of dumping the various implementations of histogram code I’ve evolved over the years in favour of this for latency and performance management.

So, in one routine Mike uses __lzcnt64() – which is an intrinsic only available in recent versions of gcc and clang as part of the bucket indexing code. My ancient opteron running gcc 4.6 and CentOS 6.4 machines running even older gcc 4.4.6 don’t have that. 

Here’s the routine;

static int32_t get_bucket_index(struct hdr_histogram* h, int64_t value)
{
// smallest power of 2 containing value
    int32_t pow2ceiling = 64 - __lzcnt64(value | h->sub_bucket_mask);
    return pow2ceiling - (h->sub_bucket_half_count_magnitude + 1);
}

Which we can rewrite as;

static int32_t get_power_of_two(int64_t value)
{
   return 64 - __lzcnt64(value); // smallest power of 2 containing value
}

static int32_t get_bucket_index(struct hdr_histogram* h, int64_t value)
{
    int32_t pow2ceiling = get_power_of_two(value | h->sub_bucket_mask);
    return pow2ceiling - (h->sub_bucket_half_count_magnitude + 1);
}

 I got misled initially by the “smallest power of 2 containing value” comment. Its actually “find the most significant bit” which is exactly what lzcnt64 does. 

The obvious thing to do – for gcc – is this;

static int32_t get_power_of_two_gcc(int64_t x)
{
     return 64 - __builtin_clzl(x);
}

 However we want to compile it on clang too. As it happens, clang supports that builtin, but it got me thinking about how to do this in vanilla C without relying on fancy builtins that may or may not work. This means basic library calls only. 

So, the alternative implementation;

static int32_t get_power_of_two(int64_t x)
{
 x |= x >> 1;
 x |= x >> 2;
 x |= x >> 4;
 x |= x >> 8;
 x |= x >> 16;
 x |= x >> 32;
 return ffsl(x+1)-1;
}

 Which uses a variant on the rounding down bitshift algorithm from “hackers delight” to give us the number of the highest bit set. This compiles down to a handful of fast primitive assembly ops. Loads, ors and a bsr for the find first set bit. 

As always there’s bound to be a cleverer/faster way of doing this. For example the ffsl(x+1)-1 could be replaced with a ffsl(x – (x>>1)). That may be faster. Or not… 

Either way, I now can use the fast shiny new toy on my ancient hardware/software. We could use feature test macros to switch between __lzcnt64() or __builtin_clzl(), however I think I’ll just use the gcc builtin. It seems to work under clang too, and looking at the assembly, its a bit more compact than the vanilla version. 

So that was a mini adventure.

Any opinions, news, research, analyses, prices or other information ("information") contained on this Blog, constitutes marketing communication and it has not been prepared in accordance with legal requirements designed to promote the independence of investment research. Further, the information contained within this Blog does not contain (and should not be construed as containing) investment advice or an investment recommendation, or an offer of, or solicitation for, a transaction in any financial instrument. LMAX Group has not verified the accuracy or basis-in-fact of any claim or statement made by any third parties as comments for every Blog entry.

LMAX Group will not accept liability for any loss or damage, including without limitation to, any loss of profit, which may arise directly or indirectly from use of or reliance on such information. No representation or warranty is given as to the accuracy or completeness of the above information. While the produced information was obtained from sources deemed to be reliable, LMAX Group does not provide any guarantees about the reliability of such sources. Consequently any person acting on it does so entirely at his or her own risk. It is not a place to slander, use unacceptable language or to promote LMAX Group or any other FX and CFD provider and any such postings, excessive or unjust comments and attacks will not be allowed and will be removed from the site immediately.