We want to compare three methods of inserting data into a DuckDB table:

  1. Individual Inserts
  2. Individual Inserts - Wrapped in a commit
  3. Batched Value Inserts

For our test we run through the JDBC driver, since we want to measure DuckDB performance as used by QStudio. The differences are so large that the same principle should apply to any DuckDB instance.

Benchmark Code

Individual Inserts
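The exact benchmark code is not reproduced here; the following is a minimal sketch of what the individual-insert loop might look like over JDBC. The `bench` table name and two-column schema are assumptions, and the commented-out `setAutoCommit`/`commit` calls show the "wrapped in a commit" variant.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class IndividualInsertBenchmark {
    public static void main(String[] args) throws Exception {
        int rows = 4000; // row counts benchmarked individually: 1000, 2000, 4000
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) { // in-memory DuckDB
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE bench (id INTEGER, val VARCHAR)"); // assumed schema
            }
            // For the "wrapped in a commit" variant, turn auto-commit off here:
            // conn.setAutoCommit(false);
            long start = System.currentTimeMillis();
            try (PreparedStatement ps = conn.prepareStatement("INSERT INTO bench VALUES (?, ?)")) {
                for (int i = 0; i < rows; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row" + i);
                    ps.execute(); // one statement executed per row
                }
            }
            // ...and commit once at the end:
            // conn.commit();
            System.out.println("Individual inserts: " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```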

Batched Inserts
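Again a hedged sketch rather than the original code: the batched variant queues rows with `addBatch()` and sends them all in a single `executeBatch()` call, using the same assumed `bench` table.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class BatchInsertBenchmark {
    public static void main(String[] args) throws Exception {
        int rows = 40000; // batched inserts were tested up to 40,000 rows
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) { // in-memory DuckDB
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE bench (id INTEGER, val VARCHAR)"); // assumed schema
            }
            long start = System.currentTimeMillis();
            try (PreparedStatement ps = conn.prepareStatement("INSERT INTO bench VALUES (?, ?)")) {
                for (int i = 0; i < rows; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row" + i);
                    ps.addBatch();   // queue the row client-side
                }
                ps.executeBatch();   // send all queued rows in one call
            }
            System.out.println("Batch insert: " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```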

DuckDB Benchmark Results

The table shows the time in milliseconds to insert the given number of rows into an in-memory table using each approach.

| rows  | DuckDB Individual with Commit (ms) | DuckDB Individual Inserts (ms) | DuckDB Batch Insert (ms) | H2 Batch Insert (ms) | ColumnarDB Individual (ms) |
|-------|-----------------------------------|-------------------------------|--------------------------|----------------------|----------------------------|
| 1000  | 400                               | 500                           | 70                       |                      |                            |
| 2000  | 926                               | 1130                          | 114                      |                      |                            |
| 4000  | 2340                              | 2948                          | 220                      | 51                   |                            |
| 20000 |                                   |                               | 1150                     | 123                  | 450                        |
| 40000 |                                   |                               | 2264                     | 240                  | 800                        |

*Note: H2 is an embedded Java database. ColumnarDB is an enterprise column-oriented database.

Graph showing insert time taken. Smaller = better.

DuckDB Insert Methods Benchmark Results per Row

Observations

  • Batched inserts are roughly 10x faster than individual inserts, even on small data sets.
  • Batched inserts scale sub-linearly, whereas individual inserts do not: the larger the number of individual inserts, the slower each insert becomes.

Open question: is there a size limit for batched inserts?
I tested 40,000 inserts, which ran in 2,264 ms.

We should also try the DuckDB Appender API.
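As a rough sketch of what that might look like with the DuckDB JDBC driver (untested here; the `bench` table is the same assumed schema as above):

```java
import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBAppender;
import org.duckdb.DuckDBConnection;

public class AppenderSketch {
    public static void main(String[] args) throws Exception {
        try (DuckDBConnection conn =
                 (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) { // in-memory DuckDB
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE bench (id INTEGER, val VARCHAR)"); // assumed schema
            }
            // The appender writes rows directly into the table, bypassing per-row SQL statements.
            try (DuckDBAppender appender =
                     conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "bench")) {
                for (int i = 0; i < 40000; i++) {
                    appender.beginRow();
                    appender.append(i);
                    appender.append("row" + i);
                    appender.endRow();
                }
            }
        }
    }
}
```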