Bug #522

Feature #525: handle SQLITE_FULL, SQLITE_IOERR, SQLITE_NOMEM throughout libsqlfs

ensure sqlfs_t_init never returns 0 in busy situations

Added by abeluck over 4 years ago. Updated over 3 years ago.

Status:In ProgressStart date:01/11/2013
Priority:LowDue date:
Assignee:-% Done:

0%

Category:-
Target version:-
Component:libsqlfs

Description

I experienced some elusive crashes today that I traced back to sqlfs_t_init returning 0, I expect due to this snippet at the very end:

    r = ensure_existence(sql_fs, "/", TYPE_DIR);
    if( !r )
        return 0;
    return (void *) sql_fs;

There was heavy contention for the database (multiple threads thrashing it), and I believe ensure_existence() was failing with a SQLITE BUSY error.

This is BAD NEWS because this function is called by the pervasive get_sqlfs() in newly spawned threads to get a handle on the database. If this fails, then new threads will crash, because they don't expect get_sqlfs() to return 0.

History

#1 Updated by abeluck over 4 years ago

Idea:

Perhaps we could have two different sqlfs_t_init functions, one for actual initing (when the database is first created), and another when it's merely another thread/process getting a handle on an existing database?

This is low priority, because thanks to Hans, we've resolved some of the multithreading issues, but I'm sure this will come back to bite us in the ass again, so I want to record have a record of it.

#2 Updated by hans over 4 years ago

  • Status changed from New to Resolved
  • Assignee set to abeluck
  • Target version set to 0.1

This was fixed using SQLite's busy timeout, in this commit:

https://github.com/guardianproject/libsqlfs/commit/1048fcfd956e78adb1df225de29a44bf6871eb60

Abel, if you agree its resolved, please close.

#3 Updated by abeluck over 4 years ago

  • Component set to libsqlfs

#4 Updated by abeluck over 4 years ago

  • Status changed from Resolved to In Progress
  • Assignee deleted (abeluck)
  • Target version changed from 0.1 to 61
  • Parent task set to #525

As discussed on IRC, we're bumping this to Beta 2 waiting implementation of #525

Resolving the contention issue means that at the moment we no longer see crashes due to 0 being returned from sqlfs_t_init, but, technically, it is still possible. Such circumstances could be when the underlying disk is full or there is even greater thread contention, or other Dooms Day scenarios I haven't thought of.

The correct fix is to never return 0 from sqlfs_t_init, and instead error out as gracefully as possible with a POSIX I/O error (see #525), this will pop the error up to the client and they can decide how to explode.

#5 Updated by hans over 3 years ago

  • Target version deleted (61)

Also available in: Atom PDF