`dbstreams`: Rationale and Philosophy

Database as Stream

dbstreams is C++ class library modelled on std::iostreams in the C++ standard library. It treats a database connection as a source or sink for data. The basic operations are the same as for iostreams: open, close, read, and write (but not seek). State information similarly borrows from std::ios: a stream has a state word with enumerated values: goodbit, badbit, failbit, eofbit. And the notation is largely the same, with operator<< and operator>> serving the purpose of sending data to and reading data from the database. In general, the design mimics iostreams, permitting the same kind of terse, economical code.

Databases are similar to but not the same as files

They are not uniform. They vary both by feature set and library API.
They are record- rather than byte-oriented.
Their output is self-describing, and applications need access to the metadata.
They have much more complex error states.

A dbstream has record semantics. Manipulators and operators allow the programmer to extract from the stream by column name or number, and to advance to the next record. There are also read and write functions designed to interact with any STL container, provided the container's value_type defines read and write functions for a dbstream. This pattern is both easy to program and efficient, because only one copy need be made of your data en route to your application.

To handle errors in a consistent way, dbstreams has a dbstatus class. (Functionality can be added by extending it.) In addition to the native message and error code, dbstatus holds the file and line number where the error arose. The overall state of the connection can be determined in the same way as for a std::iostream, using methods (e.g. rdstate) of the same names.

Simplicity, not Portability

The dbstreams library does not seek to conceal or homogenize servers. There are no catalog operations. There is no transaction method. No attempt is made to unify various error numbers and messages. (In this way, dbstreams is also like iostreams: I/O is the same, but the content of the I/O differs.) That was a design choice and perhaps merits some explanation.

From the programmer's point of view, less is more. The more done to make all servers look alike, the thicker the layer of mediation needed, the less transparent that layer will be. If you depend on the mediation layer to unify error codes and whatnot, you will frequently be disappointed, because the devil is in the details.

If you know your database, abstraction layers interfere with getting at the real information. The dbstreams philosophy is that you know your database, its catalog, its flavor of SQL. In fact, more likely than not, it's the only database you know. Instead of hiding it, then, dbstreams exposes the server's features for you to exploit.

Lastly, “it's not portable until it's ported.” True portability is hard, and in most environments changing database vendors is extremely hard. dbstreams's goal isn't portability so much as simplicity, because the real problem most programmers face isn't the next database. It's the current one.

Containers

C libraries and those closely related typically define a recordset object of some kind, a container-of-containers, where is row is represented by a container of “typeless” elements that can then be converted to something useful. That was necessary (or helpful, anyway) in C, where there is no standard container class.

C++ has standard containers. By defining the container<->stream interaction directly, we obviate any need for a recordset. Poof! One aspect of library (and application) complexity disappears.

Errors and Exceptions

Exceptions are thrown when

the underlying library is used in an invalid way
the underlying connection becomes unusable

An example of in an invalid way would be trying to insert data before opening the connection. A connection is unusable when the library says so, as in when the remote server is no longer listening to it.

The first kind of error is in the “shouldn't happen” category. The dbstream interface is well defined, and a correctly designed application uses it such that the underlying libary is used rationally. The stream itself ensures the native library is given valid buffers and so on, and manages the stream state. Failure to do so is a serious error. By making such cases noisy, one hopes the error can be found and fixed.

The second kind of error is “unrecoverable”. The stream must be closed; nothing good can come from trying to continue using it. By putting the interaction in a try/catch block, control naturally returns to the top level, where it belongs.

“Normal” errors — invalid SQL, say, or requesting data for a nonexistent column name — do not throw exceptions. The operation fails and dbstatus::notify is called, but the stream is still valid. It can accept a new query; in some cases, it can continue with the next logical insert/extract operation. The application can monitor the stream's status, and can override dbstatus::notify as desired.

Library as Language: Good design is invisible™

A good library provides the application programmer with nouns and verbs suited to the problem domain. The programmer is freed to focus on the needs of the application. The application winds up being written in terms of the library more than the base language.

Every database programmer knows how far short database libraries fall of that description.

Properties of a good library

Terse, to get out of the way.
Expressive, basic functionality should be discernable on first reading.
Powerful, requiring minimum input to accomplish the task.
Flexible, allowing different levels of access.
Extensible, permitting uses not foreseen by the library author.

It would be too much to claim to have achieved all that. However, those are properties of the std::iostream library, and by adhering to that model as closely as possible, one might at least hope to stand on the shoulders of giants without falling off. This for sure: hewing closely to the std::iostream example was sometimes difficult, but in every case made dbstreams a better library.

Beneficial surprise

To illustrate, consider operator>> and operator<< to read and write data, and the role of a manipulator. In a dbstream, as in iostream, there are data and manipulators. Data sent to the stream are stored, whereas manipulators change the stream's state. The same is true of extraction: variables hold and manipulators do.

Like all good designs, this turns out to be more important and useful than seems on first reading. Manipulators can be of arbitrary complexity, which is good because sometimes databases seem to be arbitrarily complex. And for data the story is the same: any object, no matter how complex, can be inserted into the stream if it can be expressed in datatypes recognized by the database.

Now ask yourself: Is SQL data? Clearly not. Ergo, SQL is never inserted into the stream. Can't be; it's not data. How to send SQL to the server? Put it in a query object, and send that. A query object — unlike straight SQL — is distinct from character data. When the stream (and programmer) sees character data being sent, they both know for certain that those data will be stored. They likewise know what to expect when a query object is inserted into the stream.

Rationale

Many database client libraries — SQLite, Postgres, MySQL, to name but three — already offer a C++ interface. And quite a few cross-database libraries also exist. Why another?

In short: for the same reason you're reading this. They're not good enough.

Your humble author has looked at a dozen C++ libraries in some detail. None of them adheres to the principles described above. They are not terse; they do not simplify the programmer's work; they do not hide the complexity of the underlying API. They are verbose and idiosyncratic. The offer C-holdover features, e.g. recordsets, that are better replaced by standard containers. They are too hard to use. At least harder than need be.

In writing dbstreams, the intention is to make it the last database API you'll ever need to learn. That might not save you from learning another server, but at least you won't have to learn another API. Or, if you're confronted with a database for which there isn't yet a provider, you'll be able to write a provider, because the provider is just an API localization layer, something you'd likely write yourself anyway. Then you can get on with your application.

$Id: rationale.html,v 1.5 2008/02/16 15:32:42 jklowden Exp $

Comments, questions, and encouraging words are welcome. Please email the author, James K. Lowden.

dbstreams: Rationale and Philosophy