LMAX Exchange blog - FX industry thought leadership

All the latest business and technology views and insights on the FX industry from LMAX Exchange management and staff

Adventures with LVM snapshots

At LMAX Exchange we use LVM to snapshot volumes for two use cases:

1. Take a snapshot of a slave database so it can catch up quickly while the work happens on the snapshotted volume.
2. Backups in case we need to roll back.

While soak testing the integration of this with our in-house deployment tool, scotty, in our CI environment, we found that every now and then we would get a merge error:

journal: Merging snapshot invalidated. Aborting merge.

This message is emitted by lvconvert’s progress polling code when it gets back -1 (DM_PERCENT_INVALID) as the merge percentage.

The snapshot is healthy before merging, there are very few (if any) block changes in the LV while our Jenkins jobs are running, and the snapshot and source volume are both quite small (a few hundred MiB).

We reproduced the issue outside of the Jenkins tests with this Bash loop. It generally takes 15-30 minutes of running this to fail:

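while true; do
    date
    # Drop the placeholder LV so there is room for the snapshot
    /sbin/lvremove -f vg_os/journal_reserved_snap
    # Snapshot the journal volume
    /sbin/lvcreate -s -n journal_snap -L 160.00m vg_os/journal
    # Unmount, merge the snapshot back into its origin, then remount
    /bin/umount /mnt
    /sbin/lvconvert --merge -i 5 vg_os/journal_snap
    /bin/mount /mnt
    # Recreate the placeholder LV to reserve the space again
    /sbin/lvcreate -L 160.00m -n vg_os/journal_reserved_snap
    echo
done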
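The loop above runs forever and relies on someone watching the output. A minimal sketch of a variant that stops at the first failing step and reports the iteration number is shown here. The helper name `repeat_until_failure` and the wrapper `one_snapshot_cycle` are hypothetical and not part of our tooling; the LVM commands are only shown in a comment so the sketch can be read and run without touching any volumes.

```shell
# Hypothetical helper: run a command repeatedly until it fails or a limit is reached.
# Prints the failing iteration and returns 1, or returns 0 if every run succeeded.
repeat_until_failure() {
    max_iterations=$1
    shift
    i=1
    while [ "$i" -le "$max_iterations" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $max_iterations iterations without failure"
    return 0
}

# Example wiring (assumption: one_snapshot_cycle wraps the commands from the loop
# above, chained with && so that a failed merge fails the whole cycle):
#   one_snapshot_cycle() {
#       /sbin/lvcreate -s -n journal_snap -L 160.00m vg_os/journal &&
#       /bin/umount /mnt &&
#       /sbin/lvconvert --merge -i 5 vg_os/journal_snap &&
#       /bin/mount /mnt
#   }
#   repeat_until_failure 100000 one_snapshot_cycle
```

Whether lvconvert returns a non-zero exit status in every failure mode is an assumption here, so checking the snapshot state afterwards is still worthwhile.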

When the merge fails, the snapshot is left in the merging state but invalid, and oddly 100% full:

# lvs --all
...
[journal_snap]        vg_os Swi-I-s--- 160.00m      journal 100.00
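In the attribute string above, the first character `S` marks a merging snapshot and the fifth character `I` marks an invalid snapshot. A small sketch of a check for that combination follows; the function name `is_invalid_merging_snapshot` is hypothetical, and it only inspects a string passed to it rather than calling LVM itself.

```shell
# Hypothetical check: succeeds if an lv_attr string describes a merging
# snapshot (first character 'S') that is invalid (fifth character 'I').
is_invalid_merging_snapshot() {
    case "$1" in
        S???I*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example with the attribute string from the lvs output above
if is_invalid_merging_snapshot "Swi-I-s---"; then
    echo "snapshot is merging but invalid"
fi
```

On a real system the attribute string could be obtained with something like `lvs --all --noheadings -o lv_attr vg_os/journal_snap`, with surrounding whitespace trimmed before passing it in.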

Device-Mapper says this about the merge:

May  5 10:45:55 lddev-build-scotty04 kernel: device-mapper: snapshots: Cancelling snapshot handover.
May  5 10:45:55 lddev-build-scotty04 kernel: device-mapper: snapshots: Snapshot is invalid: can't merge

And the *-cow and *-real device-mapper devices still exist:

# dmsetup ls | grep journal
vg_os-journal            (253:3)
vg_os-journal_snap        (253:6)
vg_os-journal_snap-cow    (253:5)
vg_os-journal-real        (253:4)
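To pick out just the leftover helper devices from a listing like the one above, a filter along these lines may be useful. This is a sketch only: the function name `leftover_snapshot_devices` is hypothetical, and it reads `dmsetup ls` style output on standard input rather than running dmsetup itself.

```shell
# Hypothetical filter: read `dmsetup ls` style output on stdin and print
# only the device names ending in -cow or -real.
leftover_snapshot_devices() {
    awk '$1 ~ /-(cow|real)$/ { print $1 }'
}

# Example with the listing shown above
printf '%s\n' \
    'vg_os-journal            (253:3)' \
    'vg_os-journal_snap        (253:6)' \
    'vg_os-journal_snap-cow    (253:5)' \
    'vg_os-journal-real        (253:4)' | leftover_snapshot_devices
```

Note that -cow and -real devices are also present while a healthy snapshot exists, so this is only meaningful after a merge was expected to have completed.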

So that felt like a kernel bug. As we were in the process of upgrading, we tried our new kernel candidate (4.4.x), which didn’t have that problem (at least, none was detected after 2 days of operation and 60,000 iterations of the loop), but it did have two other problems.

The first was an error message like this:

# /sbin/lvremove -f vg_os/lv_test_snap
Internal error: Reserved memory (15552512) not enough: used 25280512. Increase activation/reserved_memory?
Logical volume "lv_test_snap" successfully removed

The amount of memory used was increasing by a large block every hundred or so iterations, and worked out to be about 400 bytes per iteration.

The second was that, when running some tests manually, we noticed LVM had become slow after many iterations. Basic commands could take several seconds to complete.

A Google search led us to modify some values in the activation section of /etc/lvm/lvm.conf:

use_mlockall = 0

And

reserved_memory = 8192

Setting use_mlockall to 1 removed the reserved memory error, but did not change the delay.

This made us think that something else was going on.

Looking at the audit trail in /etc/lvm/archive, we saw some 110,000 files. This was on an EXT4 filesystem. Even running ‘ls’ on that directory took some time.
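A quick way to see how large the archive has grown, without waiting for a full ‘ls’, is to count the files. The sketch below assumes archive files end in `.vg`; the function name `count_archive_files` is hypothetical.

```shell
# Hypothetical helper: count the metadata archive files directly inside a directory.
# Assumes archive files carry a .vg suffix.
count_archive_files() {
    find "$1" -maxdepth 1 -type f -name '*.vg' | wc -l | tr -d ' '
}

# Example (on a host with LVM configured):
#   count_archive_files /etc/lvm/archive
```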

To test the theory that the LVM commands were reading all of these files in every time they ran, and storing them in internal structures, we tried two things:

  1. Deleting the files – The error message went away and the commands completed at their usual speed.
  2. Changing the file system mount options from defaults to noatime – The error message stayed, but the commands were not quite so slow.

All those metadata updates as the LVM commands accessed each file in that directory really slowed us down. The ideal situation however is for LVM to clean up after itself.

So the solution was to change the archive retention settings in /etc/lvm/lvm.conf from these values:

# Configuration option backup/retain_min.
# Minimum number of archives to keep.
retain_min = 10
# Configuration option backup/retain_days.
# Minimum number of days to keep archive files.
retain_days = 30

to values better suited to our use case. For us this was:

retain_min = 100
retain_days = 0

Note that setting retain_days to 0 removes the age-based protection, so every archive file becomes eligible for deletion. Make sure retain_min is set to a non-zero value.

These settings are applied whenever an LVM command is run. With these values, the archive directory was purged of all but the last 100 files. LVM then ran with minimal delay and no reported memory errors, and we have a stable snapshot system again.
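To preview the effect of a given retain_min before changing the configuration, a dry-run listing can help. The sketch below is an approximation of the retention rule for the retain_days = 0 case (keep the newest N files, treat the rest as prunable), not LVM's own implementation. The function name `list_prunable_archives` is hypothetical, and it only prints names; it deletes nothing.

```shell
# Hypothetical dry run: print the files in a directory that fall outside
# the newest $2 entries by modification time. Nothing is deleted.
# Assumes file names contain no whitespace or newlines.
list_prunable_archives() {
    dir=$1
    keep=$2
    # ls -t sorts newest first; tail skips the first $keep entries.
    ls -1t "$dir" | tail -n "+$((keep + 1))"
}

# Example (on a host with LVM configured):
#   list_prunable_archives /etc/lvm/archive 100 | wc -l
```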

Any opinions, news, research, analyses, prices or other information ("information") contained on this Blog, constitutes marketing communication and it has not been prepared in accordance with legal requirements designed to promote the independence of investment research. Further, the information contained within this Blog does not contain (and should not be construed as containing) investment advice or an investment recommendation, or an offer of, or solicitation for, a transaction in any financial instrument. LMAX Exchange has not verified the accuracy or basis-in-fact of any claim or statement made by any third parties as comments for every Blog entry.

LMAX Exchange will not accept liability for any loss or damage, including without limitation to, any loss of profit, which may arise directly or indirectly from use of or reliance on such information. No representation or warranty is given as to the accuracy or completeness of the above information. While the produced information was obtained from sources deemed to be reliable, LMAX Exchange does not provide any guarantees about the reliability of such sources. Consequently any person acting on it does so entirely at his or her own risk. It is not a place to slander, use unacceptable language or to promote LMAX Exchange or any other FX, Spread Betting and CFD provider and any such postings, excessive or unjust comments and attacks will not be allowed and will be removed from the site immediately.

LMAX Exchange will clearly identify and mark any content it publishes or that is approved by LMAX Exchange.

FX and CFDs are leveraged products that can result in losses exceeding your deposit. They are not suitable for everyone so please ensure you fully understand the risks involved. The information on this website is not directed at residents of the United States of America, Australia (we will only deal with Australian clients who are "wholesale clients" as defined under the Corporations Act 2001), Canada (although we may deal with Canadian residents who meet the "Permitted Client" criteria), Singapore or any other jurisdiction where FX trading and/or CFD trading is restricted or prohibited by local laws or regulations.

LMAX Limited operates a multilateral trading facility. LMAX Limited is authorised and regulated by the Financial Conduct Authority (firm registration number 509778) and is a company registered in England and Wales (number 6505809). Our registered address is Yellow Building, 1A Nicholas Road, London, W11 4AN.
