Adventures with LVM snapshots

LMAX Exchange

At LMAX Exchange we use LVM snapshots of volumes for two use cases:

1. Taking a snapshot of a slave database so that it can catch up quickly, while the work happens on the snapshotted volume.
2. Backups, in case we need to roll back (both flows are sketched below).
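Both use cases are built from the same basic LVM primitives: create a snapshot, then later either remove it (keeping the origin's changes) or merge it back into the origin (rolling the origin back to the snapshot's contents). A minimal sketch, using a hypothetical vg_data/db volume and mount point rather than our real layout, which is driven by scotty:

# Take a point-in-time snapshot of the origin volume; the size only needs
# to cover the changes made while the snapshot exists
lvcreate --snapshot --name db_snap --size 10G vg_data/db

# Keep the origin's changes: simply drop the snapshot once it is no longer needed
lvremove -f vg_data/db_snap

# ...or roll the origin back to the snapshot's contents: the merge starts
# once the origin is no longer in use, e.g. after unmounting it
umount /var/lib/mysql
lvconvert --merge vg_data/db_snap
mount /dev/vg_data/db /var/lib/mysql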

Every now and then in our CI environment, as we soak tested the integration of this with our in-house deployment tool scotty, we found that we would get a merge error:

journal: Merging snapshot invalidated. Aborting merge.

This is thrown by lvconvert’s progress polling code when it gets back -1 (DM_PERCENT_INVALID) as the merge percentage.

The snapshot is healthy before merging, there are very few (if any) block changes in the LV while our Jenkins jobs are running, and the snapshot and origin volume are quite small (a few hundred MiB).
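The merge percentage that lvconvert polls ultimately comes from the device-mapper snapshot target, and dmsetup shows the same information directly. Roughly, with illustrative figures (327680 sectors is the 160 MiB volume used below; the device name matches the dmsetup ls output further down):

# dmsetup status vg_os-journal_snap
0 327680 snapshot 2048/327680 16

A healthy snapshot reports allocated/total sectors followed by the metadata sectors; an invalidated one reports "Invalid" on that line instead, which lvconvert maps to DM_PERCENT_INVALID and turns into the error above.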

We reproduced the issue outside of the Jenkins tests with this Bash loop; it generally takes 15-30 minutes of running before it fails:

while true; do
    date

    # Remove the reserved LV so its space is free for the snapshot
    /sbin/lvremove -f vg_os/journal_reserved_snap

    # Snapshot the journal volume
    /sbin/lvcreate -s -n journal_snap -L 160.00m vg_os/journal

    # Unmount the origin so the merge starts immediately, merge the snapshot
    # back (polling progress every 5 seconds), then remount
    /bin/umount /mnt
    /sbin/lvconvert --merge -i 5 vg_os/journal_snap
    /bin/mount /mnt

    # Recreate the reserved LV ready for the next iteration
    /sbin/lvcreate -L 160.00m -n vg_os/journal_reserved_snap

    echo
done

When the merge fails, the snapshot is left in the merging state but invalid, and oddly 100% full (in the attributes below, the capital ‘S’ means a merging snapshot and the capital ‘I’ means it has been invalidated):

# lvs --all
...
[journal_snap]        vg_os Swi-I-s--- 160.00m      journal 100.00

Device-mapper says this about the merge in the kernel log:

May  5 10:45:55 lddev-build-scotty04 kernel: device-mapper: snapshots: Cancelling snapshot handover.
May  5 10:45:55 lddev-build-scotty04 kernel: device-mapper: snapshots: Snapshot is invalid: can't merge

And the *-cow and *-real device-mapper devices still exist:

# dmsetup ls | grep journal
vg_os-journal             (253:3)
vg_os-journal_snap        (253:6)
vg_os-journal_snap-cow    (253:5)
vg_os-journal-real        (253:4)

So that felt like a kernel bug. As we were in the process of upgrading anyway, we tried our new kernel candidate (4.4.x), which didn’t have that problem (at least, none was detected after 2 days of operation and 60,000 iterations of the loop), but it did have two other problems.

The first was an error message like this:

[root@testsystem ~]# /sbin/lvremove -f vg_os/lv_test_snap
Internal error: Reserved memory (15552512) not enough: used 25280512. Increase activation/reserved_memory?
Logical volume "lv_test_snap" successfully removed

The “used” figure was increasing by a large block every hundred or so iterations (roughly 40 KiB each jump), which works out to about 400 bytes per iteration.
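Since the error message itself prints the figure, the growth is easy to track from the reproduction loop; a tiny sketch, writing to a hypothetical log file:

# Pull the "used" figure out of lvremove's output each iteration so the
# growth per iteration can be followed over time
/sbin/lvremove -f vg_os/journal_reserved_snap 2>&1 | grep -o 'used [0-9]*' >> /tmp/lvm_used.log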

We also noticed, when running some tests manually, that LVM was running slowly after many iterations; basic commands could take several seconds.

A Google search led us to modifying some values in /etc/lvm/lvm.conf, in the activation section:

use_mlockall = 0

And

reserved_memory = 8192

Setting use_mlockall to 1 made the reserved memory error go away, but did not reduce the delay.
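For reference, both options live in the activation section of /etc/lvm/lvm.conf; roughly, with an illustrative reserved_memory value (the units are KiB):

activation {
    # Pin the whole process with mlockall() instead of locking individual
    # maps; this made the reserved memory error disappear in our tests
    use_mlockall = 1

    # KiB of memory LVM reserves for use while devices are suspended
    reserved_memory = 32768
}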

This made us think that something else was going on.

Looking at the audit trail of old metadata in /etc/lvm/archive, we saw some 110,000 files. This was on an ext4 filesystem. Even running ‘ls’ on that directory took some time.

So, to test the theory that the LVM commands were reading these in every time they ran and storing them in internal structures, we tried several things:

  1. Deleting the files. The error message went away and the commands completed at the usual speed (see the sketch after this list).
  2. Changing the filesystem mount options from defaults to noatime. The error message stayed, but the commands were not quite so slow.
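For point 1, one way to check the size of the problem and clear it down by hand, assuming the stock /etc/lvm/archive location (this was only ever a test; the proper fix follows below):

# How many metadata archive files have built up?
ls /etc/lvm/archive | wc -l

# One-off manual clear-down (LVM archive files end in .vg)
find /etc/lvm/archive -type f -name '*.vg' -delete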

All those filesystem metadata (atime) updates as the LVM commands accessed each file in that directory really slowed us down. The ideal situation, however, is for LVM to clean up after itself.
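For the noatime test in point 2, a remount is enough to try it without editing fstab, assuming the archive directory lives on the root filesystem:

mount -o remount,noatime /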

So the solution was to modify the retention values in the backup section of lvm.conf. The defaults are:

# Configuration option backup/retain_min.
# Minimum number of archives to keep.
retain_min = 10

# Configuration option backup/retain_days.
# Minimum number of days to keep archive files.
retain_days = 30

We changed these to values which better suited our use case:

retain_min = 100
retain_days = 0

Note that with retain_days set to 0, age no longer protects any archive files, so make sure retain_min is set to a non-zero value or everything will be removed.
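Put together, the backup section of /etc/lvm/lvm.conf ends up looking roughly like this (archive and archive_dir shown at their stock defaults for context):

backup {
    # Archive the previous metadata on every change
    archive = 1
    archive_dir = "/etc/lvm/archive"

    # Keep the last 100 archive files, with no additional retention by age
    retain_min = 100
    retain_days = 0
}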

These settings are applied whenever any LVM-related command is run. With these values the archive directory was purged of all but the last 100 files, LVM then ran with minimal delay and no reported memory issues, and we have a stable snapshot system again.
