Feature #525: handle SQLITE_FULL, SQLITE_IOERR, SQLITE_NOMEM throughout libsqlfs
ensure sqlfs_t_init never returns 0 in busy situations
|Status:||In Progress||Start date:||01/11/2013|
I experienced some elusive crashes today that I traced back to sqlfs_t_init returning 0, I expect due to this snippet at the very end:
r = ensure_existence(sql_fs, "/", TYPE_DIR); if( !r ) return 0; return (void *) sql_fs;
There was heavy contention for the database (multiple threads thrashing it), and I believe ensure_existence() was failing with a SQLITE BUSY error.
This is BAD NEWS because this function is called by the pervasive get_sqlfs() in newly spawned threads to get a handle on the database. If this fails, then new threads will crash, because they don't expect get_sqlfs() to return 0.
#1 Updated by abeluck over 4 years ago
Perhaps we could have two different sqlfs_t_init functions, one for actual initing (when the database is first created), and another when it's merely another thread/process getting a handle on an existing database?
This is low priority, because thanks to Hans, we've resolved some of the multithreading issues, but I'm sure this will come back to bite us in the ass again, so I want to record have a record of it.
#2 Updated by hans over 4 years ago
- Status changed from New to Resolved
- Assignee set to abeluck
- Target version set to 0.1
This was fixed using SQLite's busy timeout, in this commit:
Abel, if you agree its resolved, please close.
#4 Updated by abeluck over 4 years ago
- Status changed from Resolved to In Progress
- Assignee deleted (
- Target version changed from 0.1 to 61
- Parent task set to #525
As discussed on IRC, we're bumping this to Beta 2 waiting implementation of #525
Resolving the contention issue means that at the moment we no longer see crashes due to 0 being returned from sqlfs_t_init, but, technically, it is still possible. Such circumstances could be when the underlying disk is full or there is even greater thread contention, or other Dooms Day scenarios I haven't thought of.
The correct fix is to never return 0 from sqlfs_t_init, and instead error out as gracefully as possible with a POSIX I/O error (see #525), this will pop the error up to the client and they can decide how to explode.