Bug #555

SQLite WAL grows without bound when sqlfs is under heavy load

Added by hans almost 5 years ago. Updated almost 4 years ago.

Status: New
Start date: 01/24/2013
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Component: libsqlfs

Description

When sqlfs is under heavy load (such as three concurrent fsx sessions), the SQLite WAL log grows without bound until it fills all available space. I used the attached patch to run sqlite3_wal_checkpoint_v2() at regular intervals. I ran three instances of fsx:

  1. ~/code/guardianproject/libsqlfs/tests/fsx -d -l 10485760 -o 1048576 /mnt/testfile-big
  2. ~/code/guardianproject/libsqlfs/tests/fsx -d -l 10485760 -o 1048576 /mnt/testfile-big2
  3. ~/code/guardianproject/libsqlfs/tests/fsx -d -c 25 /mnt/testfile-c25

With sqlite3_wal_checkpoint_v2() run at an interval of:
  • 1000: the WAL log grew to ~248MB in ~1 minute
  • 100: the WAL log grew to ~600MB in ~1 minute
  • 1: the WAL log grew to ~40MB in ~1 minute

Fixes suggested by sjlombardo:

  1. We could do nothing and leave it as is. The WAL is pretty much working as documented; it's just never getting a chance to checkpoint completely because the database is never idle. If this level of continuous activity is unlikely to occur in live usage, then most applications should never see this sort of behavior.
  2. We could introduce some sort of periodic blocking checkpoint, e.g. after every N commits, explicitly force a blocking checkpoint via sqlite3_wal_checkpoint_v2(). That would balance out the performance impact of running the checkpoints.
  3. We could disable WAL and switch back to using the standard journal mode. The changes we made to obtain reserved locks using BEGIN IMMEDIATE, plus the addition of the busy handler, should make the library stable under load even without WAL, though we'd lose the performance boost on writes and the improved read-write concurrency.
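
Option 2 can be sketched roughly as follows. This is a minimal Python illustration using the standard `sqlite3` module, not the libsqlfs C code or either attached patch. The function name, the `blocks` table, and the default of 100 commits are assumptions made up for the example; `PRAGMA wal_checkpoint(RESTART)` is used as the SQL-level counterpart of calling sqlite3_wal_checkpoint_v2() in a blocking mode.

```python
import os
import sqlite3

CHECKPOINT_EVERY = 100  # hypothetical N; the ticket tried 1, 100, and 1000


def write_with_periodic_checkpoint(db_path, n_commits, every=CHECKPOINT_EVERY):
    """Commit n_commits small writes, forcing a blocking checkpoint every `every` commits.

    Returns the (busy, log_frames, checkpointed_frames) row reported by each checkpoint.
    """
    # isolation_level=None puts the connection in autocommit mode so that
    # BEGIN/COMMIT are issued explicitly below.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA wal_autocheckpoint=0")  # rely only on the explicit checkpoints
    conn.execute("CREATE TABLE IF NOT EXISTS blocks (id INTEGER PRIMARY KEY, data BLOB)")
    results = []
    for i in range(1, n_commits + 1):
        conn.execute("BEGIN IMMEDIATE")  # take the reserved lock up front
        conn.execute("INSERT INTO blocks (data) VALUES (?)", (os.urandom(4096),))
        conn.execute("COMMIT")
        if i % every == 0:
            # RESTART waits (subject to the busy timeout) for readers to move off the old WAL
            results.append(conn.execute("PRAGMA wal_checkpoint(RESTART)").fetchone())
    conn.close()
    return results
```

With a single connection and no concurrent readers, each checkpoint should report `busy == 0`. The interesting case in this ticket is the one where other connections keep the database busy, so the checkpoint cannot complete.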

0002-use-sqlite3_wal_checkpoint_v2-to-slow-WAL-log-growth.patch - better implementation idea (1.23 KB) hans, 01/24/2013 03:02 am

0001-use-sqlite3_wal_checkpoint_v2-to-slow-WAL-log-growth.patch - tests were run with this patch (1.76 KB) hans, 01/24/2013 03:02 am

Associated revisions

Revision c40ac891
Added by Hans-Christoph Steiner over 3 years ago

set WAL journal size limit to 10% of available space or 10M

Previously, there was no limit to the size of the WAL log file, and under
heavy load, it could grow quite a bit. This sets a limit as either 10Megs
or 10% of the available space, whichever is larger.
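
The sizing rule in this commit message can be illustrated with a short Python sketch. It is not the code from revision c40ac891, which is not shown on this page. The function names are hypothetical, "10Megs" is assumed to mean 10 MiB, and the use of `PRAGMA journal_size_limit` is an assumption about how such a limit would be applied.

```python
import shutil
import sqlite3

MIN_LIMIT_BYTES = 10 * 1024 * 1024  # "10Megs" read as 10 MiB (assumption)


def wal_size_limit(free_bytes):
    """Return the larger of 10 MiB and 10% of the free space, in bytes."""
    return max(MIN_LIMIT_BYTES, free_bytes // 10)


def apply_wal_size_limit(conn, directory):
    """Set journal_size_limit from the free space in `directory`; return the value SQLite reports."""
    free_bytes = shutil.disk_usage(directory).free
    limit = wal_size_limit(free_bytes)
    # The pragma returns the limit now in effect as a single-row result.
    (applied,) = conn.execute(f"PRAGMA journal_size_limit={limit}").fetchone()
    return applied
```

Note that in WAL mode `journal_size_limit` only bounds the size of the WAL file left on disk after a checkpoint resets it; it does not by itself stop the WAL from growing while checkpoints are unable to complete.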

refs #555 https://dev.guardianproject.info/issues/555

History

#1 Updated by hans almost 5 years ago

0001 is the patch I ran the tests with, 0002 is a patch that has a different implementation of the counter that I think makes more sense.

#2 Updated by abeluck almost 5 years ago

  • Component set to libsqlfs

#3 Updated by abeluck almost 5 years ago

FYI https://www.sqlite.org/wal.html is required reading for understanding this ticket ;-)

#4 Updated by hans almost 5 years ago

An update from sjlombardo:

Been looking into the WAL growth a bit. It's a really tricky situation and there may not be an easy answer. I should say the WAL reset can't complete… as the WAL grows, the reads slow down, and it becomes worse. I've tried a few options for interleaving checkpoints, but over time they eventually report that the database is busy, regardless of how you interleave them. It's difficult to say: if there are sufficient opportunities for checkpointing to occur, then it shouldn't cause problems; however, under continuous heavy load all bets are off.
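
The "database is busy" result described above can be reproduced in miniature. The following Python sketch is only an illustration with the standard `sqlite3` module, not sjlombardo's test setup; the function name and the table `t` are made up for the example. It shows a checkpoint being blocked while another connection holds a read transaction open on an older snapshot.

```python
import sqlite3


def checkpoint_with_open_reader(db_path):
    """Return the checkpoint busy flag (while a reader is open, after it finishes)."""
    # timeout=0 makes a blocked checkpoint report busy immediately instead of waiting.
    writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE t (x INTEGER)")
    writer.execute("INSERT INTO t VALUES (1)")

    # The reader pins a snapshot by keeping a read transaction open.
    reader = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    reader.execute("BEGIN")
    reader.execute("SELECT count(*) FROM t").fetchall()

    # This write lands in the WAL after the reader's snapshot.
    writer.execute("INSERT INTO t VALUES (2)")
    busy_while_reading = writer.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]

    reader.execute("COMMIT")  # release the snapshot
    busy_after = writer.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]

    reader.close()
    writer.close()
    return busy_while_reading, busy_after
```

The first flag is expected to be 1 and the second 0. Under continuous load there is always some reader or writer in the position of `reader` here, which is consistent with the checkpoints never completing.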

Another possibility would be to do some read/write locking in the library. I've got a small POC that I could push to a branch of my fork, called rwlock:

https://github.com/sjlombardo/libsqlfs/tree/rwlock

#5 Updated by hans over 4 years ago

  • Target version changed from 0.1 to 61

#6 Updated by hans over 4 years ago

sjlombardo posted this possible solution to this issue; it needs to be reviewed:
https://github.com/sjlombardo/libsqlfs/commit/9906642f89187288738d0be69c99e35063de0172

#7 Updated by hans almost 4 years ago

  • Target version deleted (61)
