SQLite Write-Ahead Logging - Nikola's Digital Garden

#readwise # Write-Ahead Logging ![rw-book-cover](https://readwise-assets.s3.amazonaws.com/static/images/article1.be68295a7e40.png) ## Metadata - Author: [[sqlite.org]] - Full Title: Write-Ahead Logging - URL: https://www.sqlite.org/wal.html ## Summary WAL lets commits append to a separate log so readers can keep using the original database. A checkpoint copies log changes back into the main database and runs automatically when the WAL reaches about 1000 pages. Checkpoints stop when any reader still needs pages, so long-lived readers can let the WAL grow without bound. Apps can run checkpoints manually, and the WAL file must stay with the database or data may be lost. ## Highlights There are advantages and disadvantages to using WAL instead of a rollback journal. Advantages include: 1. WAL is significantly faster in most scenarios. 2. WAL provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently. 3. Disk I/O operations tends to be more sequential using WAL. 4. WAL uses many fewer fsync() operations and is thus less vulnerable to problems on systems where the fsync() system call is broken. ([View Highlight](https://read.readwise.io/read/01k2whpcz3zkep9a2w3bs3g4dq)) --- But there are also disadvantages: 1. All processes using a database must be on the same host computer; WAL does not work over a network filesystem. This is because WAL requires all processes to share a small amount of memory and processes on separate host machines obviously cannot share memory with each other. 2. Transactions that involve changes against multiple [ATTACHed](https://www.sqlite.org/wal.html/lang_attach.html) databases are atomic for each individual database, but are not atomic across all databases as a set. 3. It is not possible to change the [page_size](https://www.sqlite.org/wal.html/pragma.html#pragma_page_size) after entering WAL mode, either on an empty database or by using [VACUUM](https://www.sqlite.org/wal.html/lang_vacuum.html) or by restoring from a backup using the [backup API](https://www.sqlite.org/wal.html/backup.html). You must be in a rollback journal mode to change the page size. 4. WAL might be very slightly slower (perhaps 1% or 2% slower) than the traditional rollback-journal approach in applications that do mostly reads and seldom write. 5. There is an additional quasi-persistent "-wal" file and "-shm" shared memory file associated with each database, which can make SQLite less appealing for use as an [application file-format](https://www.sqlite.org/wal.html/appfileformat.html). 6. There is the extra operation of [checkpointing](https://www.sqlite.org/wal.html/wal.html#ckpt) which, though automatic by default, is still something that application developers need to be mindful of. --- **The traditional rollback journal works by writing a copy of the original unchanged database content into a separate rollback journal file and then writing changes directly into the database file.** In the event of a crash or [ROLLBACK](https://www.sqlite.org/wal.html/lang_transaction.html), the original content contained in the rollback journal is played back into the database file to revert the database file to its original state. The [COMMIT](https://www.sqlite.org/wal.html/lang_transaction.html) occurs when the rollback journal is deleted. The WAL approach inverts this. **The original content is preserved in the database file and the changes are appended into a separate WAL file. A [COMMIT](https://www.sqlite.org/wal.html/lang_transaction.html) occurs when a special record indicating a commit is appended to the WAL. Thus a COMMIT can happen without ever writing to the original database, which allows readers to continue operating from the original unaltered database while changes are simultaneously being committed into the WAL. Multiple transactions can be appended to the end of a single WAL file.** ([View Highlight](https://read.readwise.io/read/01k2whx6twh18nevy36g9dph9w)) ^pey8go --- Of course, one wants to eventually transfer all the transactions that are appended in the WAL file back into the original database. Moving the WAL file transactions back into the database is called a "checkpoint". Another way to think about the difference between rollback and write-ahead log is that in the rollback-journal approach, there are two primitive operations, reading and writing, whereas with a write-ahead log there are now three primitive operations: reading, writing, and checkpointing. By default, SQLite does a checkpoint automatically when the WAL file reaches a threshold size of 1000 pages. (The [SQLITE_DEFAULT_WAL_AUTOCHECKPOINT](https://www.sqlite.org/wal.html/compile.html#default_wal_autocheckpoint) compile-time option can be used to specify a different default.) Applications using WAL do not have to do anything in order for these checkpoints to occur. But if they want to, applications can adjust the automatic checkpoint threshold. Or they can turn off the automatic checkpoints and run checkpoints during idle moments or in a separate thread or process. ([View Highlight](https://read.readwise.io/read/01k2whymtdedsv5568nfehefzh)) --- When a read operation begins on a WAL-mode database, it first remembers the location of the last valid commit record in the WAL. Call this point the "end mark". Because the WAL can be growing and adding new commit records while various readers connect to the database, each reader can potentially have its own end mark. But for any particular reader, the end mark is unchanged for the duration of the transaction, thus ensuring that a single read transaction only sees the database content as it existed at a single point in time. When a reader needs a page of content, it first checks the WAL to see if that page appears there, and if so it pulls in the last copy of the page that occurs in the WAL prior to the reader's end mark. If no copy of the page exists in the WAL prior to the reader's end mark, then the page is read from the original database file. Readers can exist in separate processes, so to avoid forcing every reader to scan the entire WAL looking for pages (the WAL file can grow to multiple megabytes, depending on how often checkpoints are run), a data structure called the "wal-index" is maintained in shared memory which helps readers locate pages in the WAL quickly and with a minimum of I/O. The wal-index greatly improves the performance of readers, but the use of shared memory means that all readers must exist on the same machine. This is why the write-ahead log implementation will not work on a network filesystem. ([View Highlight](https://read.readwise.io/read/01k2wj24wsnhdt6ry1q1jk1b4f)) --- Writers merely append new content to the end of the WAL file. Because writers do nothing that would interfere with the actions of readers, writers and readers can run at the same time. However, since there is only one WAL file, there can only be one writer at a time. ([View Highlight](https://read.readwise.io/read/01k2wj2vmpkby6v1bnxechj1x2)) --- A checkpoint operation takes content from the WAL file and transfers it back into the original database file. A checkpoint can run concurrently with readers, however the checkpoint must stop when it reaches a page in the WAL that is past the end mark of any current reader. The checkpoint has to stop at that point because otherwise it might overwrite part of the database file that the reader is actively using. The checkpoint remembers (in the wal-index) how far it got and will resume transferring content from the WAL to the database from where it left off on the next invocation. ([View Highlight](https://read.readwise.io/read/01k2wj4fp24ztq9xtg6kbmkcf9)) --- Write transactions are very fast since they only involve writing the content once (versus twice for rollback-journal transactions) and because the writes are all sequential. Further, syncing the content to the disk is not required, as long as the application is willing to sacrifice durability following a power loss or hard reboot. (Writers sync the WAL on every transaction commit if [PRAGMA synchronous](https://www.sqlite.org/wal.html/pragma.html#pragma_synchronous) is set to FULL but omit this sync if [PRAGMA synchronous](https://www.sqlite.org/wal.html/pragma.html#pragma_synchronous) is set to NORMAL.) ([View Highlight](https://read.readwise.io/read/01k2wjkm8rev70gefhy3fjy01h)) --- ead performance deteriorates as the WAL file grows in size since each reader must check the WAL file for the content and the time needed to check the WAL file is proportional to the size of the WAL file. The wal-index helps find content in the WAL file much faster, but performance still falls off with increasing WAL file size. Hence, to maintain good read performance it is important to keep the WAL file size down by running checkpoints at regular intervals. ([View Highlight](https://read.readwise.io/read/01k2wjkxesk8jqfzw7qdzhse6y)) --- The default strategy is to allow successive write transactions to grow the WAL until the WAL becomes about 1000 pages in size, then to run a checkpoint operation for each subsequent COMMIT until the WAL is reset to be smaller than 1000 pages. By default, the checkpoint will be run automatically by the same thread that does the COMMIT that pushes the WAL over its size limit. This has the effect of causing most COMMIT operations to be very fast but an occasional COMMIT (those that trigger a checkpoint) to be much slower. If that effect is undesirable, then the application can disable automatic checkpointing and run the periodic checkpoints in a separate thread, or separate process. (Links to commands and interfaces to accomplish this are [shown below](https://www.sqlite.org/wal.html/#how_to_checkpoint).) ([View Highlight](https://read.readwise.io/read/01k2wjmkr352r2zt481knab7jm)) --- Note that with [PRAGMA synchronous](https://www.sqlite.org/wal.html/pragma.html#pragma_synchronous) set to NORMAL, the checkpoint is the only operation to issue an I/O barrier or sync operation (fsync() on unix or FlushFileBuffers() on windows). If an application therefore runs checkpoint in a separate thread or process, the main thread or process that is doing database queries and updates will never block on a sync operation. This helps to prevent "latch-up" in applications running on a busy disk drive. The downside to this configuration is that transactions are no longer durable and might rollback following a power failure or hard reset. ([View Highlight](https://read.readwise.io/read/01k2wjp90snrahrmrjftzw5ndb)) --- Notice too that there is a tradeoff between average read performance and average write performance. To maximize the read performance, one wants to keep the WAL as small as possible and hence run checkpoints frequently, perhaps as often as every COMMIT. To maximize write performance, one wants to amortize the cost of each checkpoint over as many writes as possible, meaning that one wants to run checkpoints infrequently and let the WAL grow as large as possible before each checkpoint. The decision of how often to run checkpoints may therefore vary from one application to another depending on the relative read and write performance requirements of the application. The default strategy is to run a checkpoint once the WAL reaches 1000 pages and this strategy seems to work well in test applications on workstations, but other strategies might work better on different platforms or for different workloads. ([View Highlight](https://read.readwise.io/read/01k2wjr4ayxxrr73ytfgj1qx5x)) --- An SQLite database connection defaults to [journal_mode=DELETE](https://www.sqlite.org/wal.html/pragma.html#pragma_journal_mode). To convert to WAL mode, use the following pragma: ```sql PRAGMA journal_mode=WAL; ``` ([View Highlight](https://read.readwise.io/read/01k2wjrqzajrm1kx3k5g5afk8f)) --- programs that want more control can force a checkpoint using the [wal_checkpoint pragma](https://www.sqlite.org/wal.html/pragma.html#pragma_wal_checkpoint) or by calling the [sqlite3_wal_checkpoint()](https://www.sqlite.org/wal.html/c3ref/wal_checkpoint.html) C interface. The automatic checkpoint threshold can be changed or automatic checkpointing can be completely disabled using the [wal_autocheckpoint pragma](https://www.sqlite.org/wal.html/pragma.html#pragma_wal_autocheckpoint) or by calling the [sqlite3_wal_autocheckpoint()](https://www.sqlite.org/wal.html/c3ref/wal_autocheckpoint.html) C interface. A program can also use [sqlite3_wal_hook()](https://www.sqlite.org/wal.html/c3ref/wal_hook.html) to register a callback to be invoked whenever any transaction commits to the WAL. This callback can then invoke [sqlite3_wal_checkpoint()](https://www.sqlite.org/wal.html/c3ref/wal_checkpoint.html) or [sqlite3_wal_checkpoint_v2()](https://www.sqlite.org/wal.html/c3ref/wal_checkpoint_v2.html) based on whatever criteria it thinks is appropriate. (The automatic checkpoint mechanism is implemented as a simple wrapper around [sqlite3_wal_hook()](https://www.sqlite.org/wal.html/c3ref/wal_hook.html).) ([View Highlight](https://read.readwise.io/read/01k2wjtv958v01p4gjfqtdkpxb)) --- There are three subtypes of checkpoints that vary in their aggressiveness: `PASSIVE`, `FULL`, and `RESTART`. The default checkpoint style is `PASSIVE`, which does as much work as it can without interfering with other database connections, and which might not run to completion if there are concurrent readers or writers. All checkpoints initiated by [sqlite3_wal_checkpoint()](https://www.sqlite.org/wal.html/c3ref/wal_checkpoint.html) and by the automatic checkpoint mechanism are `PASSIVE`. `FULL` and `RESTART` checkpoints try harder to run the checkpoint to completion and can only be initiated by a call to [sqlite3_wal_checkpoint_v2()](https://www.sqlite.org/wal.html/c3ref/wal_checkpoint_v2.html). See the [sqlite3_wal_checkpoint_v2()](https://www.sqlite.org/wal.html/c3ref/wal_checkpoint_v2.html) documentation for additional information on `FULL` and `RESET` checkpoints. ([View Highlight](https://read.readwise.io/read/01k2wjxj0shpz0gc4x5zpc8dn1)) --- When the last connection to a database closes, that connection does one last checkpoint and then deletes the WAL and its associated shared-memory file, to clean up the disk. ([View Highlight](https://read.readwise.io/read/01k2wk3nfszxqgzs4szq4t71xe)) ---