.Dd 22 December 2013 .Os SQLite\ VTAB .Dt CSVTAB \&3s "SQLite Virtual Table Functions" .Sh NAME .Nm csvtab .Nd virtual table representing a CSV file .Sh LIBRARY SQLite User-defined function Library (csvtab.so) .Sh SYNOPSIS .Li create virtual table .Ar tablename using .Nm Pf ( Pa csvfile Ns , Op Ar colname ... Ns ) .Sh DESCRIPTION The .Nm virtual table renders .Pa csvfile as a table with user-defined column names. If the .Ar colname list is empty, the first line of the file is used for column names. .Pp To use .Nm in the SQLite shell .Ic sqlite3 , add the following to your .Pa ${HOME}/.sqliterc file: .Bd -literal \&.load /usr/local/lib/sqlite3/csvtab.so .Ed .Sh IMPLEMENTATION\ DETAILS .Pp .Ar csvfile is mapped into memory using .Xr mmap 2 . The contents of the virtual table are .Em not cached. That is, every query initiates a call to .Xr mmap 2 and each row is re-parsed. .Pp Within .Nm , the caller provides the parser with a buffer and a callback function that the parser invokes for each parsed column. The .Fn parse function parses a single row in the buffer, and returns a .Fa "char *" pointing to the position immediately following the last character in the row. Parsing is completed when .Fn parse returns a pointer one past the end of the buffer. .Pp This version of .Nm supplies no indexing or sorting features. If the query given to SQLite mentions a .Nm column in a .Va WHERE , .Va JOIN , .Va GROUP\ BY , or .Va ORDER\ BY clause, SQLite will require the data to be sorted. Because .Nm makes no attempt to do so, SQLite will sort the data itself by writing the rows returned by .Nm initially to a temporary table with an appropriate index. For files with millions of records, this will take some time. You may find it faster to import the CSV into a table using the SQLite shell or with a .Ic "INSERT INTO ... AS SELECT * FROM ..." query. . .Sh NOTES .Pp The CSV parser is a separate module independent of .Nm . It is a pure context-free language parser, written expressly (and reluctantly) for the purpose because existing libraries were not suited to the task. For example, Robert Gamble's well regarded libcsv (http://sourceforge.net/projects/libcsv/) parses buffer-by-buffer, not record-by-record. The SQLite virtual table methods request data row-by-row. .Pp The first version of .Nm .Em was written using libcsv, but the result was rather more complicated than the present version: the library forked a parser, which fed records into a pipe that the virtual table function .Fn xNext read. Each parsed value had several copies in memory: .Pp .Bl -enum -compact .It The buffer provided to the parser .It The buffer returned by the parser to the caller .It The kernel buffer for IPC pipe .It The buffer into which .Fn xNext read the value .It The buffer SQLite creates for .Fa SQLTRANSIENT data .El .Pp The first two copies are necessary because the parser's output is not a mere copy of its input; quotes are removed from quoted data and may appear any number of times in a field. The last copy is also required if .Nm doesn't retain the parsed data for subsequent reference by SQLite. The present implementation has only those three copies. . .Sh AUTHORS .Nm was contributed by James K. Lowden .\" .Sh BUGS