Introduction
The (at least initial) purpose of this page is to list some problems in the ports system that could be solved using SQLite database for pkgdb. As the discussion on mailing list and at the in this document shows, SQLite is not the *only* way these problems could be solved, but it may be the most convenient and with minimal development cost.
Until announced at the freebsd-hackers mailing list, this is unfinished WORK IN PROGRESS.
Some benefits of SQLite
- A single-file SQL database
- ACID transaction safety
- Extremely well supported by both open and closed source projects and companies (it is *the* database within iPhone and Android, and is also used in OS X)
- Public-domain license ("freer" then BSD)
- Very stable API and file format
Can be imported as a single C file+header ("amalgamation")
1. transaction safety
The problem:
beastie:/usr/ports/distfiles/xorg# pkg_info | wc -l pkg_info: the package info for package 'xf86-input-summa-1.1.0' is corrupt 250 beastie:/usr/ports/distfiles/xorg# pkg_info | wc -l pkg_info: show_file: can't open '+COMMENT' for reading 251
The output above was gained by executing pkg_info twice in a row while installing several ports (x.org 7.2 in particular). As it shows, there's no concept of "transactions" in the "all-or-nothing" sense, and no support for concurrently running package tools even if they are as simple as pkg_info. Problems that can arise from these are:
- Potential for inconsistent data if the machine crashes in the middle of the package recording phase. Additionally, many such inconsistent states cannot be recovered with existing tools, but only manually editing the package records with a text editor
- No transaction isolation - concurrently executing utilities see inconsistent data from each other. In particular, this is preventing multiple concurrent portupgrades or installs, including parallel builds.
How SQLite could help:
Has built-in support for ACID transactions, including transaction isolation and automatic data recovery (without additional developer or user support). http://www.sqlite.org/lockingv3.html and http://www.sqlite.org/wal.html
Alternatives:
- Build transaction support and recovery in the existing infrastructure. In effect, this would duplicate the work of a "real" database on a tree of text files.
2. Tool/library availability
Text files are incredibly well supported for simple operations (reading and writing) by many tools, and developers can easily support them in their own programs when needed. But moving away from the most simple operations, anything more complex than sequential reading and appending to the end of the file quickly becomes tedious and error-prone. Non-atomic operations require locking but since the ports database is used by many utilities (some of which are not in the base system), there's no guarantee that all those programs will consistently lock it.
How SQlite could help:
- It's available for a huge number of platforms (endian-aware) under a BSD-compatible license, and has a nice developer-friendly API.
- All programs would use the SQLite library to access the database, and the library deals with locking, etc. This library has been ported to practically every language available in the wild.
- Complex operations are done in the same way the simple operations are (SQL)
- There's a built-in command line utility that allows human interaction with the database and which can be called from external programs and scripts, if needed
Alternatives:
- Build a library to work with the pkgdb, modify ALL programs and utilities to use it, make interfaces to this library for various languages (perl, python...).
3. Speed
Tired of a simple command like:
pkg_info | grep apache
taking minutes to complete on a system with many installed ports? Be assured that a SQLite database query fetching the information which pkg_info displays in this mode takes on the order of milliseconds to complete. In all probability, in the above command, time taken by "grep" to parse the output will probably match the time taken by sqlite to generate it.
4. SQL squeamishness
Many hard-core C developers are understandably wary of "helper" languages like SQL. Though SQL really is evil to some extent and exists only because there is nothing better and more wide-spread out there, it shouldn't be dismissed so easily. Think of it the same way you would think of Regular expressions inside a C program - it's a separate, helper language to do something in an easier way.
Besides, in all probability, the raw interface with SQL will be abstracted into lower levels of libpkg and regular mechanisms and functions will be used to pass data within the programs. However, the programs will *have* to be aware they are working on a transactional database and avoid doing silly things like locking the database as the first line in main() and releasing it in atexit()...