Further notes on Hotspot compiler flags

LMAX Exchange

Continuing on from my last post, here we’ll be looking at flags used to control the C2 or server compiler of the Hotspot JVM.

In writing this article, I discovered that the C2 compiler flags did not operate as I expected, and I’ve drawn some possibly incorrect conclusions about how to achieve the required effects. Any enlightenment from those in the know would be welcomed…

Configuration

In order to reduce the noise created in the compilation logs, we’ll be disabling tiered compilation so that only the server compiler will be used. This is done using the following flag:

-XX:-TieredCompilation

We’ll also be producing more detailed compiler logs using:

-XX:+UnlockDiagnosticVMOptions
-XX:+LogCompilation

These flags will cause the JVM to generate a file called hotspot_<pid>.log in the current working directory, containing detailed information on the operation of the compiler.

C2 Compile Thresholds

The server compiler seems to behave in a slightly different manner to the profile-guided client compiler.
Looking at the flags related to Tier4 compilation options (Tier4 is the server compiler), we can see a similar set to those for the Tier3 thresholds described in the last post:


[mark@metal jvm-warmup-talk]$ java -XX:+PrintFlagsFinal 2>&1 | grep Tier4
intx Tier4BackEdgeThreshold = 40000
intx Tier4CompileThreshold = 15000
intx Tier4InvocationThreshold = 5000
intx Tier4LoadFeedback = 3
intx Tier4MinInvocationThreshold = 600

So we might expect to be able to run the same experiments regarding the triggering of invocation, back-edge and compile thresholds.
In practice, however, when tiered compilation is disabled, these thresholds do not seem to affect compilation in the same way as the client compiler flags do. To control the operation of the server compiler, we need to use the following flags:

-XX:CompileThreshold
-XX:BackEdgeThreshold

For the server compiler, CompileThreshold seems to act as the invocation threshold. Setting an artificially low threshold (of -XX:CompileThreshold=200) shows this:


[mark@metal jvm-warmup-talk]$ bash ./scripts/c2-invocation-threshold.sh
LOG: Loop count is: 200
181 75 com.epickrram.t.w.e.t.C2InvocationThresholdMain::exerciseServerCompileThreshold (6 bytes)
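The driver behind this script is not reproduced in the post; a minimal sketch of what it might look like (class and method names are assumptions, chosen to line up with the log output above) is:

```java
// Hypothetical reconstruction of the example program: a tiny method with no
// loops, so only its invocation count can push it over -XX:CompileThreshold.
public class C2InvocationThresholdExample
{
    // lload_0, ldc2_w, ladd, lreturn -- likely 6 bytes of byte-code,
    // matching the "(6 bytes)" entry in the PrintCompilation output
    static long exerciseServerCompileThreshold(final long input)
    {
        return input + 17L;
    }

    public static void main(final String[] args)
    {
        final int loopCount = args.length == 0 ? 200 : Integer.parseInt(args[0]);
        System.out.println("LOG: Loop count is: " + loopCount);

        long accumulator = 0L;
        for (int i = 0; i < loopCount; i++)
        {
            accumulator = exerciseServerCompileThreshold(accumulator);
        }
        // publish the result so the JIT cannot eliminate the calls entirely
        System.out.println(accumulator);
    }
}
```

Running something like this with -XX:-TieredCompilation -XX:+PrintCompilation -XX:CompileThreshold=200 should reproduce an early C2 compilation of the method.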

Note that we are no longer seeing information about the tier in the PrintCompilation output. In order to confirm that the server compiler is operating here, we can look at the more detailed LogCompilation output for compile task 75:


[mark@metal jvm-warmup-talk]$ grep "id='75'" hotspot_pid31473.log
<task_queued compile_id='75'
method='com/epickrram/talk/warmup/example/threshold/C2InvocationThresholdMain
exerciseServerCompileThreshold (J)J'
bytes='6' count='100' backedge_count='1' iicount='200'
stamp='0.089' comment='count' hot_count='200'/>

<nmethod compile_id='75' compiler='C2' ...
method='com/epickrram/talk/warmup/example/threshold/C2InvocationThresholdMain
exerciseServerCompileThreshold (J)J'
bytes='6' count='100' backedge_count='1' iicount='200' stamp='0.182'/>

<task compile_id='75'
method='com/epickrram/talk/warmup/example/threshold/C2InvocationThresholdMain
exerciseServerCompileThreshold (J)J'
bytes='6' count='100' backedge_count='1' iicount='200' stamp='0.182'>

We can see that the compiler being used in this compile task is C2 and that the interpreter invocation count iicount is 200.

C2 BackEdge Threshold

The server compiler’s handling of loop back-edge thresholds seems to differ again from the tiered C1 flags. Using this example program, we can see that an on-stack replacement is triggered when the back-edge count is 14563.
This is despite the BackEdgeThreshold flag being set to a lower value. No amount of threshold-wrangling makes the JVM exhibit the same behaviour as the client compiler in terms of the relationship between the Tier3 InvocationThreshold, CompileThreshold and BackEdgeThreshold flags.
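The example program itself is not reproduced in the post; a plausible sketch (names assumed, chosen to match the (JI)J signature in the log entries) is a method containing a single hot loop, invoked once, so that only back-edges can trigger compilation:

```java
// Hypothetical reconstruction: one long-running loop inside a single
// invocation, so the loop back-edge counter (not the invocation counter)
// is what triggers an OSR (on-stack replacement) compilation.
public class C2LoopBackedgeThresholdExample
{
    static long exerciseServerLoopBackedgeThreshold(final long seed, final int loopCount)
    {
        long accumulator = seed;
        for (int i = 0; i < loopCount; i++) // each iteration taken is one back-edge
        {
            accumulator += i;
        }
        return accumulator;
    }

    public static void main(final String[] args)
    {
        final int loopCount = args.length == 0 ? 10000 : Integer.parseInt(args[0]);
        System.out.println("LOG: Loop count is: " + loopCount);
        // a single call, so the interpreter invocation count stays at 1
        System.out.println(exerciseServerLoopBackedgeThreshold(0L, loopCount));
    }
}
```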


[mark@metal jvm-warmup-talk]$ bash ./scripts/c2-loop-backedge-threshold.sh 14600
LOG: Loop count is: 14600
133 2 % com.epickrram.t.w.e.t.C2LoopBackedgeThresholdMain::exerciseServerLoopBackedgeThreshold @ 5 (25 bytes)

[mark@metal jvm-warmup-talk]$ grep "id='2'" hotspot_pid32675.log
<task_queued compile_id='2' compile_kind='osr'
method='com/epickrram/talk/warmup/example/threshold/C2LoopBackedgeThresholdMain
exerciseServerLoopBackedgeThreshold (JI)J'
bytes='25' count='1' backedge_count='14563' iicount='1' osr_bci='5'
stamp='0.134' comment='backedge_count' hot_count='14563'/>

<nmethod compile_id='2' compile_kind='osr' compiler='C2' ...
method='com/epickrram/talk/warmup/example/threshold/C2LoopBackedgeThresholdMain
exerciseServerLoopBackedgeThreshold (JI)J' bytes='25'
count='10000' backedge_count='5037' iicount='1' stamp='0.136'/>

<task compile_id='2' compile_kind='osr'
method='com/epickrram/talk/warmup/example/threshold/C2LoopBackedgeThresholdMain
exerciseServerLoopBackedgeThreshold (JI)J'
bytes='25' count='10000' backedge_count='5037' iicount='1' osr_bci='5' stamp='0.134'>

What is interesting is that the nmethod node contains a count equal to the default value of -XX:CompileThreshold (10000). If we reduce this threshold to 5000, we can see that the on-stack replacement happens sooner:


[mark@metal jvm-warmup-talk]$ bash ./scripts/c2-loop-backedge-threshold.sh
LOG: Loop count is: 10000
126 6 % com.epickrram.talk.warmup.example.threshold.C2LoopBackedgeThresholdMain::exerciseServerLoopBackedgeThreshold @ 5 (25 bytes)


[mark@metal jvm-warmup-talk]$ grep "id='6'" hotspot_pid1598.log
<task_queued compile_id='6' compile_kind='osr'
method='com/epickrram/talk/warmup/example/threshold/C2LoopBackedgeThresholdMain
exerciseServerLoopBackedgeThreshold (JI)J'
bytes='25' count='1' backedge_count='7793' iicount='1' osr_bci='5'
stamp='0.126' comment='backedge_count' hot_count='7793'/>

<nmethod compile_id='6' compile_kind='osr' compiler='C2' ...
method='com/epickrram/talk/warmup/example/threshold/C2LoopBackedgeThresholdMain
exerciseServerLoopBackedgeThreshold (JI)J' bytes='25'
count='5000' backedge_count='2659' iicount='1' stamp='0.128'/>

<task compile_id='6' compile_kind='osr'
method='com/epickrram/talk/warmup/example/threshold/C2LoopBackedgeThresholdMain
exerciseServerLoopBackedgeThreshold (JI)J'
bytes='25' count='5000' backedge_count='2659' iicount='1' osr_bci='5' stamp='0.126'>

Here, OSR occurs after a back-edge count of 7793, while the nmethod node has count='5000'.
From these observations, we can infer that loop back-edge compilation triggers are related to the CompileThreshold flag, and that if we wish to control when the server compiler kicks in, we need to alter only the CompileThreshold flag.

Inlining

When a method is compiled to native code, the compiler has the option to perform a further optimisation: inlining.
Inlining callee methods reduces method-dispatch overhead, and can give the compiler a broader scope for further optimisation, e.g. dead-code elimination or escape analysis.
Inlining decisions are based on the size of the method to be inlined. There are two thresholds that we need to be concerned with:

-XX:MaxInlineSize
-XX:FreqInlineSize

These thresholds are specified in byte-codes. Let’s start with an example of a method that is small enough for inlining:


private static long shouldInline(final long input)
{
return (input * System.nanoTime()) + 37L;
}

Using javap to inspect the byte-code of this method, we can see that it is only 10 byte-codes in length:


private static long shouldInline(long);
descriptor: (J)J
flags: ACC_PRIVATE, ACC_STATIC
Code:
stack=4, locals=2, args_size=1
0: lload_0
1: invokestatic #18 // Method java/lang/System.nanoTime:()J
4: lmul
5: ldc2_w #19 // long 37l
8: ladd
9: lreturn

Running the example and adding the -XX:+PrintInlining flag will cause the compiler to interleave information about inlining decisions into the compilation output.


[mark@metal jvm-warmup-talk]$ bash ./scripts/small-method-inlining-threshold.sh 250
72 19 3 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::inlineCandidateCaller (5 bytes)
  @ 1 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::shouldInline (10 bytes)
    @ 1 java.lang.System::nanoTime (0 bytes) intrinsic
72 20 3 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::shouldInline (10 bytes)
  @ 1 java.lang.System::nanoTime (0 bytes) intrinsic

In this log excerpt, we can see that after the usual output of PrintCompilation, we also get information about methods being inlined.
If we look at the byte-code of the caller method:


private static long inlineCandidateCaller(long);
descriptor: (J)J
flags: ACC_PRIVATE, ACC_STATIC
Code:
stack=2, locals=2, args_size=1
0: lload_0
1: invokestatic #17 // Method shouldInline:(J)J
4: lreturn

Here we can see that the invocation of the shouldInline method is at byte-code 1, so the output of PrintInlining is referring to the call-site that is inlined (the @ 1 part of the log entry).
If we reduce the MaxInlineSize parameter to be less than 10 byte-codes using -XX:MaxInlineSize=9, then inlining will fail:


[mark@metal jvm-warmup-talk]$ bash ./scripts/reduced-inlining-threshold.sh 250
105 19 3 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::inlineCandidateCaller (5 bytes)
  @ 1 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::shouldInline (10 bytes) callee is too large

Note the message callee is too large – this is something to look out for if you expect methods in hot code-paths to be inlined; it means that the compiler did not inline this method due to its size.
Now, the default value of MaxInlineSize is 35 byte-codes, which is not a lot of code. The compilation process is a trade-off between achieving good performance and the space overhead of compiled code, among other things.

The compiler will, however, still inline a method that exceeds MaxInlineSize if it is called often enough. For methods called frequently enough, the size threshold that determines inlining is FreqInlineSize.
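As an illustration (this class is my own, not from the post), here is a method that by my rough count compiles to just under 40 byte-codes: over the default MaxInlineSize of 35, but far below the default FreqInlineSize of 325. It should therefore be rejected at cold call-sites yet remain eligible for inlining once hot; the exact size can be confirmed with javap -c.

```java
// Hypothetical example: enough long arithmetic to exceed MaxInlineSize (35),
// while staying well below FreqInlineSize (325).
public class FrequentInlineExample
{
    static long tooBigForColdInlining(final long input)
    {
        long result = input;
        result = result * 31L + 7L;
        result = result ^ (result >>> 17);
        result = result * 13L + 3L;
        result = result ^ (result << 5);
        return result;
    }

    public static void main(final String[] args)
    {
        long accumulator = 1L;
        // enough invocations for the call-site to be treated as hot
        for (int i = 0; i < 100_000; i++)
        {
            accumulator = tooBigForColdInlining(accumulator);
        }
        System.out.println(accumulator);
    }
}
```

Running this with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining should show callee is too large for early compilations of the caller, and inline (hot) once the invocation counts climb.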
Let’s re-run our experiment, and increase the number of invocations:


[mark@metal jvm-warmup-talk]$ bash ./scripts/reduced-inlining-threshold.sh 25000
79 19 3 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::inlineCandidateCaller (5 bytes)
  @ 1 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::shouldInline (10 bytes) callee is too large

...

80 22 4 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::shouldInline (10 bytes)
  @ 1 java.lang.System::nanoTime (0 bytes) (intrinsic)
  @ 1 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::shouldInline (10 bytes) inline (hot)

First, we see the same message declaring the callee method to be too large, but later on in the compilation process, the callee method is inlined. This corresponds with the message inline (hot), meaning that the runtime has decided this method is called frequently enough to inline.

If we reduce the FreqInlineSize to be less than 10 byte-codes using -XX:FreqInlineSize=9, then inlining will once again fail:


[mark@metal jvm-warmup-talk]$ bash ./scripts/reduced-freq-inlining-threshold.sh 25000
77 22 4 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::shouldInline (10 bytes)
  @ 1 java.lang.System::nanoTime (0 bytes) (intrinsic)
  @ 1 com.epickrram.talk.warmup.example.threshold.InliningThresholdMain::shouldInline (10 bytes) hot method too big

This time the failure is denoted by the message hot method too big.

Summary

We have seen that, in addition to the Tier3 client compiler thresholds, the compilation of longer-running programs will be controlled by server-specific thresholds.

Inlining decisions are based on the size of the callee method, and the frequency with which it is called. The Hotspot compiler will attempt to aggressively inline hot methods, so it is important to understand whether the design of our code is hindering the ability of the compiler to perform available optimisations.

In my next post, I’ll be looking at some of the tooling available to help analyse and understand the operation of the JVM Hotspot compiler.
