Paper Review: DBToaster: A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases

High-level

Summary:

DBToaster compiles queries into high-performance C++ code that incrementally and continuously answers standing aggregate queries using in-memory views.

Note that there are two papers on this work: a short workshop paper in VLDB 09 and a full-length paper in VLDB 12 (DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views).

Evaluation:

Performance: 1-3 orders of magnitude improvement in processing times for a financial application and a data warehouse loading application, compared to PostgreSQL, HSQLDB, a commercial DBMS 'A', the Stanford STREAM engine, and a commercial stream processor 'B'.

Takeaways:

  • What is the practical problem here? View maintenance (VM). A "view" is a virtual table defined by a query over existing tables. Maintaining a view means keeping its result up to date as the base tables change, and doing so incrementally is essential for efficiency: it avoids the linear scans and joins of full re-evaluation, which become asymptotically expensive with multiple joins. DBToaster recursively considers deltas of the maintenance queries themselves and compiles to thoroughly transform queries into code. Intermediate results are kept in maps that parameterize the sub-queries; on each update to a base table, the update handler modifies the underlying maps instead of re-running the query. Essentially, maps "cache" the equality constraints in the WHERE clause: the key is one side of the equality and the value is the final value of the sub-query.
  • DBToaster key technical idea: recursively compile a query into increments, because the delta of a query is naturally expressed as a (simpler) sub-query, whose delta can be taken in turn (TODO: example here?)
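
To make the map idea concrete, here is a minimal Python sketch of delta processing for one standing query. It is an illustration only, not DBToaster's generated C++ and not taken from the paper: the query `SELECT SUM(R.a * S.c) FROM R, S WHERE R.b = S.b`, the class name `SumJoinView`, and the map names are all assumptions made for this example. Each insert handler adds the delta of the query to the running result by looking up a map keyed on the join attribute `b`, then updates its own map, so no scan or join is performed.

```python
from collections import defaultdict


class SumJoinView:
    """Maintains q = SUM(R.a * S.c) over R(a, b) joined with S(b, c) on b.

    Hypothetical illustration of map-based delta processing; the names
    here are invented for this sketch.
    """

    def __init__(self):
        self.q = 0
        # Map keyed on the join attribute b: b -> SUM(R.a) over R rows with that b.
        self.sum_a_by_b = defaultdict(int)
        # Map keyed on the join attribute b: b -> SUM(S.c) over S rows with that b.
        self.sum_c_by_b = defaultdict(int)

    def insert_r(self, a, b):
        # Delta of q for inserting R(a, b): a * SUM(S.c WHERE S.b = b).
        # The inner sum is a sub-query, answered by a map lookup.
        self.q += a * self.sum_c_by_b.get(b, 0)
        self.sum_a_by_b[b] += a

    def insert_s(self, b, c):
        # Delta of q for inserting S(b, c): SUM(R.a WHERE R.b = b) * c.
        self.q += self.sum_a_by_b.get(b, 0) * c
        self.sum_c_by_b[b] += c


def recompute(r_rows, s_rows):
    """Naive full re-evaluation (nested-loop join), for comparison only."""
    return sum(a * c for (a, b) in r_rows for (b2, c) in s_rows if b == b2)


if __name__ == "__main__":
    view = SumJoinView()
    view.insert_r(2, 1)
    view.insert_s(1, 10)
    view.insert_r(3, 1)
    print(view.q == recompute([(2, 1), (3, 1)], [(1, 10)]))
```

Each handler does constant work per insert, whereas `recompute` is quadratic in the table sizes. The maps `sum_a_by_b` and `sum_c_by_b` are themselves simple views whose deltas are trivial (`+= a`, `+= c`), which is where the recursion bottoms out in this small case.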

Details

  • Limitations: TODO
  • Misc: DBToaster includes a debugger and profiler for tracing delta processing functions
  • Query compilation:
    • Uses a custom query algebra to define map data structures
    • Per the paper, the full map algebra has approximately 70 simplification rules