DuckDB: Analytics, Vector Search, and More
DuckDB is an in-process Online Analytical Processing (OLAP) database that's incredibly fast and easy to use. Think of it as the SQLite for analytics. Instead of setting up a complex database server, DuckDB runs directly inside your application (like a Python script or R session), making it perfect for interactive data analysis on your local machine. 🦆
It's designed to handle analytical queries—think aggregations, joins, and window functions—on larger-than-memory datasets with impressive speed.
DuckDB vs Row Databases
- DuckDB uses a columnar-vectorized query execution engine, which makes analytical queries significantly faster than row-based databases like SQLite or PostgreSQL for this type of workload.
- Just like SQLite, It's a single file with no external dependencies. There's no server to install, manage, or maintain. You can easily share your database file.
- Rich, modern SQL dialect, including support for complex types, window functions, and more.
Get Started
pip install duckdb pandas
In essence, DuckDB is as simple as any dataframe library, but it comes with power of SQL, and most importantly — it reduces dev effort by providing a way to make less interations before shipping your solutions.
import duckdb
import pandas as pd
pandas_df = pd.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM pandas_df")
Ducks to DataFrame
Duck supports Pandas, Polars, ArrowTable and others:
dataframe = duckdb.sql("SELECT 42").df()
Writing to Disk
duckdb.sql("SELECT 42").write_parquet("out.parquet") # Write to a Parquet file
duckdb.sql("SELECT 42").write_csv("out.csv") # Write to a CSV file
duckdb.sql("COPY (SELECT 42) TO 'out.parquet'")