LMAX Exchange - unique vision for global FX

We all know by now that continuous integration is part of good
software development: check in regularly and have a suite of automated
tests run to confirm that everything is working as expected. If a test
fails, jump on it quickly and get the build back to green. Simple, right?

But what happens when something goes wrong in a way that can’t be fixed
quickly? For example, the build server has a hardware fault or runs out
of disk space. You can’t just roll back the faulty change; it’s going to
take time to get the build back to green. As your CI system grows, it
may take time just to understand what went wrong. If your team is small,
they may all be taken up fixing the problem, but if the team is larger, a
pair focusses on fixing the build as quickly as possible and the other
developers carry on working. Now you have two problems.

You still have the build problem, but now you also have a process
problem because you’re no longer doing continuous integration. When
things are working well in continuous integration, you have a continuous
stream of commits proceeding through the build pipeline. If a bug is
introduced, the build quickly picks it up and you can identify the
problem change easily because it can only be one of a few commits.

Continuous integration working well - a stream of commits passing through the pipeline.

On the other hand, if developers keep working while the build is
broken, they build up a large backlog of commits, which makes it more
difficult to identify which revision broke the build. It also makes it
significantly harder to resolve the build problem because the code keeps
changing, and you can easily wind up with multiple build breakages
starting to overlap and interact.

Broken continuous integration - a huge pile of commits building up.
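
To put rough numbers on that backlog, here is a toy model. It is only a sketch under stated assumptions: the commit rate of six per hour and the function names are hypothetical, and the bisection count assumes a single culprit, which is exactly what stops being true once breakages overlap.

```python
import math


def candidate_commits(commit_rate_per_hour: float, hours_broken: float) -> int:
    """Commits that land while the build is red; any of them could hide a new breakage."""
    return math.ceil(commit_rate_per_hour * hours_broken)


def bisect_builds(candidates: int) -> int:
    """Worst-case extra builds a binary search needs to isolate ONE culprit.

    Assumption: exactly one bad commit. With several overlapping
    breakages a plain bisection is no longer reliable.
    """
    return math.ceil(math.log2(candidates)) if candidates > 1 else 0


if __name__ == "__main__":
    # Hypothetical team committing six times an hour.
    for hours in (0.5, 4, 24):
        n = candidate_commits(commit_rate_per_hour=6, hours_broken=hours)
        print(f"{hours} h red: {n} suspect commits, "
              f"up to {bisect_builds(n)} bisection builds")
```

Each bisection step is itself a full build, so the cost of finding the culprit grows with every commit that lands while the build is red, and that is before counting the time spent untangling breakages that interact.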

To avoid this problem, many companies put up an embargo on commits or
close the source tree to prevent any further changes from being
committed. This controls change in the build environment and makes it
easier to resolve the problem, but it doesn’t prevent the build-up of
changes. The result is that when the embargo is lifted, there is a huge
swarm of incoming changes all at once, introducing merging problems and
making it difficult to identify the culprit if any of them introduces
another problem. There could well be multiple problems introduced by
that batch of changes, with their effects overlapping and interacting,
making it even harder. Essentially, the longer an embargo is up, the
greater the chance that it will need to be put back up because of
problems in the batch of changes developed during the embargo.

So what’s the answer? Simple: stop working. The team as a whole will go
faster if developers simply stop writing code once they reach the point
where they would normally commit but can’t because there’s an embargo.
For short embargoes, most developers won’t be affected at all, but as
the embargo lasts longer, more and more developers will have to stop
work. This feels really bad, but it ensures we keep doing continuous
integration and overall benefits the team’s productivity. For build
problems that are hard to understand, it also means that gradually more
and more developers are available to spitball ideas about what’s wrong
and to pick up lines of investigation to help get the build working
again.

Also, not coding doesn’t mean that developers can’t do anything at all.
Maybe now is a good time to do those higher-level design sessions and
ensure everyone is pushing in the same direction, or to read up on
technology that is either in use but not fully understood, or that could
be of benefit if it were introduced. If there are spikes to be played,
they can usually still be picked up and worked on. You could write a
blog post (like, say, this one). Or even just take an early lunch.

The bottom line is that build breakages are always hugely expensive.
Pretending that everything is normal and that you can continue work
when the build system is broken doesn’t make them any less expensive;
it just makes you look busier while creating the next problem.

LMAX Group blog - FX industry thought leadership

Go Faster By Not Working
