DuckDB: Analytics, Vector Search, and More

DuckDB is an in-process Online Analytical Processing (OLAP) database that's incredibly fast and easy to use. Think of it as the SQLite for analytics. Instead of setting up a complex database server, DuckDB runs directly inside your application (like a Python script or R session), making it perfect for interactive data analysis on your local machine. 🦆

It's designed to handle analytical queries—think aggregations, joins, and window functions—on larger-than-memory datasets with impressive speed.

DuckDB vs Row Databases

DuckDB uses a columnar-vectorized query execution engine, which makes analytical queries significantly faster than row-based databases like SQLite or PostgreSQL for this type of workload.
Just like SQLite, It's a single file with no external dependencies. There's no server to install, manage, or maintain. You can easily share your database file.
Rich, modern SQL dialect, including support for complex types, window functions, and more.

Get Started

pip install duckdb pandas

In essence, DuckDB is as simple as any dataframe library, but it comes with power of SQL, and most importantly — it reduces dev effort by providing a way to make less interations before shipping your solutions.

import duckdb
import pandas as pd
 
pandas_df = pd.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM pandas_df")

Ducks to DataFrame

Duck supports Pandas, Polars, ArrowTable and others:

dataframe = duckdb.sql("SELECT 42").df()

Writing to Disk

duckdb.sql("SELECT 42").write_parquet("out.parquet") # Write to a Parquet file
duckdb.sql("SELECT 42").write_csv("out.csv")         # Write to a CSV file
duckdb.sql("COPY (SELECT 42) TO 'out.parquet'")

Resources

DuckDB Documentation