Bug #555
SQLite WAL grows without bound when sqlfs is under heavy load
Status: | New | Start date: | 01/24/2013 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 0% |
Category: | - | | |
Target version: | - | | |
Component: | libsqlfs | | |
Description
When sqlfs is under heavy load (like 3x fsx sessions), the SQLite WAL log grows without bound until it fills all available space. I used the attached patch to run sqlite3_wal_checkpoint_v2() at regular intervals. I ran three instances of fsx:
- ~/code/guardianproject/libsqlfs/tests/fsx -d -l 10485760 -o 1048576 /mnt/testfile-big
- ~/code/guardianproject/libsqlfs/tests/fsx -d -l 10485760 -o 1048576 /mnt/testfile-big2
- ~/code/guardianproject/libsqlfs/tests/fsx -d -c 25 /mnt/testfile-c25
With the checkpoint interval set to:
- 1000, the WAL log grew to ~248MB in ~1 minute
- 100, the WAL log grew to ~600MB in ~1 minute
- 1, the WAL log grew to ~40MB in ~1 minute
Fixes suggested by sjlombardo:
- we could do nothing and leave it as is. The WAL is pretty much working as documented; it's just never getting a chance to checkpoint completely because the database is never idle. If you don't think this level of continuous activity is likely in live usage, then most applications should never see this sort of behavior
- we could introduce some sort of periodic blocking checkpoint, e.g. after every N commits, explicitly force a blocking checkpoint via sqlite3_wal_checkpoint_v2. That would balance out the performance impact of running the checkpoints
- we could disable WAL and switch back to the standard journal mode. The changes we made to obtain reserved locks using BEGIN IMMEDIATE, plus the addition of the busy handler, should make the library stable under load even without WAL, though we'd lose the performance boost on writes and the improved read-write concurrency
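The second suggestion (a blocking checkpoint after every N commits) could be sketched roughly as below. This is an illustration under stated assumptions, not the code from the attached patches: `commit_counter`, `checkpoint_fn`, `commit_counter_init`, `commit_counter_tick`, and `count_calls` are all hypothetical names. The checkpoint itself sits behind a callback so the counting logic can be tried without a database; in libsqlfs the callback would presumably wrap sqlite3_wal_checkpoint_v2() in a blocking mode such as SQLITE_CHECKPOINT_RESTART.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: runs one blocking checkpoint and returns 0 on
 * success. A real callback would presumably wrap
 *   sqlite3_wal_checkpoint_v2(db, NULL, SQLITE_CHECKPOINT_RESTART, NULL, NULL)
 * and return its result code (SQLITE_OK is 0). */
typedef int (*checkpoint_fn)(void *ctx);

/* Hypothetical per-connection state; not taken from the patches. */
struct commit_counter {
    unsigned interval;   /* N: commits between forced checkpoints */
    unsigned pending;    /* commits since the last successful checkpoint */
    checkpoint_fn run;   /* performs the checkpoint */
    void *ctx;           /* passed through to run(), e.g. the db handle */
};

static void commit_counter_init(struct commit_counter *c, unsigned interval,
                                checkpoint_fn run, void *ctx)
{
    assert(interval > 0);
    c->interval = interval;
    c->pending = 0;
    c->run = run;
    c->ctx = ctx;
}

/* Call after every successful COMMIT.
 * Returns 0 if no checkpoint was due, 1 if a checkpoint ran and succeeded,
 * and -1 if it was attempted but failed (for example with SQLITE_BUSY).
 * On failure the pending count is kept, so the next commit retries. */
static int commit_counter_tick(struct commit_counter *c)
{
    c->pending++;
    if (c->pending < c->interval)
        return 0;
    if (c->run(c->ctx) != 0)
        return -1;
    c->pending = 0;
    return 1;
}

/* Stand-in callback for experiments: counts calls instead of checkpointing. */
static int count_calls(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

Keeping the pending count on failure means a busy database is retried on the next commit rather than waiting another N commits, which seems the safer choice when the WAL is already growing.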
Associated revisions
set WAL journal size limit to 10% of available space or 10M
Previously, there was no limit to the size of the WAL log file, and under
heavy load, it could grow quite a bit. This sets a limit as either 10Megs
or 10% of the available space, whichever is larger.
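The limit described in this commit message amounts to the arithmetic sketched below. The helper names (`wal_size_limit`, `wal_limit_pragma`) are hypothetical, "10Megs" is read here as 10 MiB, and how the actual commit obtains the available space is not shown on this page (something like statvfs() is one possibility). The resulting string would be handed to sqlite3_exec(); that call is left out so the sketch needs only the C standard library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: "10Megs" in the commit message means 10 MiB. */
#define WAL_LIMIT_FLOOR_BYTES (10LL * 1024 * 1024)

/* Returns the larger of 10% of the available space and the 10 MiB floor. */
static int64_t wal_size_limit(int64_t available_bytes)
{
    assert(available_bytes >= 0);
    int64_t tenth = available_bytes / 10;
    return tenth > WAL_LIMIT_FLOOR_BYTES ? tenth : WAL_LIMIT_FLOOR_BYTES;
}

/* Formats the PRAGMA statement that applies the limit.
 * Returns 0 on success, -1 if the buffer is too small. */
static int wal_limit_pragma(char *buf, size_t len, int64_t available_bytes)
{
    int n = snprintf(buf, len, "PRAGMA journal_size_limit = %lld;",
                     (long long)wal_size_limit(available_bytes));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In WAL mode, journal_size_limit controls how large a WAL file is left on disk after a checkpoint or reset completes. It does not by itself stop the file from growing while checkpoints cannot finish.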
History
#1 Updated by hans almost 5 years ago
- File 0002-use-sqlite3_wal_checkpoint_v2-to-slow-WAL-log-growth.patch added
- File 0001-use-sqlite3_wal_checkpoint_v2-to-slow-WAL-log-growth.patch added
- Target version set to 0.1
0001 is the patch I ran the tests with; 0002 is a patch with a different implementation of the counter that I think makes more sense.
#2 Updated by abeluck almost 5 years ago
- Component set to libsqlfs
#3 Updated by abeluck almost 5 years ago
FYI https://www.sqlite.org/wal.html is required reading for understanding this ticket ;-)
#4 Updated by hans almost 5 years ago
An update from sjlombardo:
been looking into the WAL growth a bit. It's a really tricky situation and there may not be an easy answer. I should say the WAL reset can't complete… as the WAL grows, the reads slow down, and it becomes worse. I've tried a few options for interleaving checkpoints, but over time they eventually report that the database is busy, regardless of how you interleave them. It's difficult to say: if there are sufficient cases where the checkpointing can occur, then it shouldn't cause problems; however, under continuous heavy load all bets are off
another possibility would be to do some read/write locking in the library. I've got a small POC; I could push it on a branch of my fork called rwlock:
https://github.com/sjlombardo/libsqlfs/tree/rwlock
#5 Updated by hans over 4 years ago
- Target version changed from 0.1 to 61
#6 Updated by hans over 4 years ago
sjlombardo posted this possible solution to this issue, it needs to be reviewed:
https://github.com/sjlombardo/libsqlfs/commit/9906642f89187288738d0be69c99e35063de0172
#7 Updated by hans almost 4 years ago
- Target version deleted (61)