Custom DuckDB Wasm builds for Cloudflare Workers

Tobias Müller

What if you could run full SQL queries, including JOINs, aggregations, and even remote Parquet file reads, directly inside a Cloudflare Worker? No database server, no connection pool, no cold-start latency from external services. Just DuckDB, compiled to WebAssembly, running at the edge.

That's now possible with Ducklings, a minimal DuckDB WASM build I created specifically for browsers and serverless environments. In this post, I'll focus on the @ducklings/workers package, which makes it possible to deploy DuckDB to Cloudflare Workers for the first time (to my knowledge).

The challenge: DuckDB in resource-limited serverless environments like Workers#

DuckDB has had a WASM build for a while through duckdb-wasm, but it was designed for browsers. Cloudflare Workers are a different beast: they don't have access to XMLHttpRequest, SharedArrayBuffer, or threads. Most critically, the Workers runtime doesn't support synchronous I/O; everything must go through async fetch().

This matters because DuckDB's httpfs extension, the one that lets you query remote Parquet, CSV, and JSON files over HTTP, relies on synchronous HTTP calls internally. In a browser, that works via XMLHttpRequest. In a Worker, it simply doesn't.

The solution is Emscripten's Asyncify, a compile-time transformation that lets synchronous C/C++ code call asynchronous JavaScript functions. Ducklings uses Asyncify to bridge DuckDB's synchronous HTTP calls to the Workers fetch() API, making httpfs work transparently.
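
To make the idea concrete, here's a conceptual sketch of what an Asyncify-suspended import can look like on the JavaScript side. The function name and the memory helpers are hypothetical, and the actual wiring inside Ducklings may differ; Asyncify.handleAsync is the Emscripten runtime hook that suspends the WASM stack while a Promise is pending.

// Placeholder declarations for this sketch; in a real build these come from
// Emscripten's generated runtime and glue code.
declare const Asyncify: { handleAsync(fn: () => Promise<number>): number };
declare function readUrlFromWasm(ptr: number): string;
declare function writeBodyToWasm(ptr: number, bytes: Uint8Array): void;

// Hypothetical import that DuckDB's C++ HTTP layer calls as if it were
// synchronous. Asyncify pauses the WASM stack, awaits the fetch(), then
// resumes execution with the returned byte count.
function duckdbHttpGet(urlPtr: number, bodyPtr: number): number {
  return Asyncify.handleAsync(async () => {
    const url = readUrlFromWasm(urlPtr);                        // decode URL from WASM memory
    const response = await fetch(url);                          // async I/O, the only kind Workers allow
    const body = new Uint8Array(await response.arrayBuffer());
    writeBodyToWasm(bodyPtr, body);                             // copy the response back into WASM memory
    return body.length;                                         // the C++ caller sees a plain return value
  });
}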

What Ducklings provides#

Ducklings is a from-scratch WASM build of DuckDB, optimized for minimal size:

Package              Size (gzipped)   API     HTTP transport
@ducklings/workers   ~9.6 MB          Async   fetch() via Asyncify

The @ducklings/workers package includes the Parquet, JSON, and httpfs extensions compiled in statically. No runtime extension loading: just what you need, baked into the binary.

The Workers build is larger because Asyncify adds instrumentation to every function in the call chain between DuckDB's HTTP layer and the JavaScript fetch() call. That's the price for making synchronous C++ code work in an async-only environment.

Getting started#

Prerequisites#

You'll need a Cloudflare Workers paid plan ($5/month). The DuckDB WASM binary is ~9.6 MB gzipped, which exceeds the free tier's 3 MB code size limit. The paid plan allows up to 10 MB.

Project setup#

Create a new project and install dependencies:

mkdir duckdb-worker && cd duckdb-worker
npm init -y
npm install @ducklings/workers
npm install -D @cloudflare/vite-plugin vite wrangler typescript @cloudflare/workers-types

Configuration files#

wrangler.jsonc - Cloudflare Workers configuration:

{
  "$schema": "node_modules/wrangler/config-schema.json",
  "name": "duckdb-worker",
  "main": "src/index.ts",
  "compatibility_date": "2026-01-11",
  "compatibility_flags": ["nodejs_compat"]
}

The nodejs_compat flag is required for some Node.js APIs that Emscripten's glue code uses.

vite.config.ts - Build configuration with the Ducklings Vite plugin:

import { defineConfig } from 'vite';
import { cloudflare } from '@cloudflare/vite-plugin';
import { ducklingsWorkerPlugin } from '@ducklings/workers/vite-plugin';

export default defineConfig({
  plugins: [
    ducklingsWorkerPlugin(),
    cloudflare(),
  ],
});

The ducklingsWorkerPlugin() handles resolving the WASM module import so that Vite and Wrangler can bundle it correctly.

tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "strict": true,
    "skipLibCheck": true,
    "types": ["@cloudflare/workers-types/2026-01-11"],
    "jsx": "react-jsx",
    "lib": ["ES2022"]
  },
  "include": ["src"]
}

package.json scripts:

{
  "type": "module",
  "scripts": {
    "dev": "vite dev",
    "build": "vite build",
    "deploy": "vite build && wrangler deploy"
  }
}

Writing the Worker#

Here's a complete Worker that exposes a SQL query API:

src/index.ts:

import {
  init,
  DuckDB,
  version,
  tableToIPC,
  sanitizeSql,
  DuckDBError,
  type Connection,
} from '@ducklings/workers';
import wasmModule from '@ducklings/workers/wasm';

let db: DuckDB | null = null;
let conn: Connection | null = null;
let initialized = false;

async function ensureInitialized(): Promise<void> {
  if (initialized && db && conn) return;

  // Initialize the WASM module
  await init({ wasmModule });

  // Create database and connection
  db = new DuckDB();
  conn = db.connect();

  // Create sample data
  await conn.execute(`
    CREATE TABLE readings (
      sensor_id INTEGER,
      temperature DOUBLE,
      humidity DOUBLE,
      recorded_at TIMESTAMP
    )
  `);

  await conn.execute(`
    INSERT INTO readings VALUES
      (1, 22.5, 45.0, '2026-01-27 10:00:00'),
      (1, 23.1, 44.2, '2026-01-27 11:00:00'),
      (2, 19.8, 62.1, '2026-01-27 10:00:00'),
      (2, 20.3, 60.5, '2026-01-27 11:00:00')
  `);

  initialized = true;
}

export default {
  async fetch(request: Request): Promise<Response> {
    await ensureInitialized();

    const url = new URL(request.url);

    if (url.pathname === '/query' && request.method === 'POST') {
      const { sql } = await request.json() as { sql: string };

      // Block dangerous queries
      try {
        sanitizeSql(sql);
      } catch (e) {
        if (e instanceof DuckDBError) {
          return Response.json({ error: e.message }, { status: 400 });
        }
        throw e;
      }

      const rows = await conn!.query(sql);
      return Response.json({ data: rows, rowCount: rows.length });
    }

    if (url.pathname === '/stats') {
      const stats = await conn!.query(`
        SELECT
          sensor_id,
          AVG(temperature) AS avg_temp,
          AVG(humidity) AS avg_humidity,
          COUNT(*) AS reading_count
        FROM readings
        GROUP BY sensor_id
      `);
      return Response.json({ data: stats });
    }

    return Response.json({
      name: 'DuckDB Worker',
      version: version(),
      endpoints: {
        '/query': 'POST SQL queries',
        '/stats': 'GET sensor statistics',
      },
    });
  },
};

A few things to note about the Workers API:

  • init() takes a pre-compiled WASM module: In Workers, you import the WASM binary directly and pass it to init(). There's no fetch step; the module is bundled into the Worker.

  • All query methods are async: query(), execute(), and queryArrow() all return Promises. This is because of Asyncify: under the hood, DuckDB's C++ code may need to pause and wait for an async fetch() call.

  • sanitizeSql() is built in: It blocks duckdb_secrets(), PRAGMA, COPY ... TO, and EXPORT DATABASE patterns. Use it when accepting SQL from untrusted sources.

Querying remote files#

This is where it gets interesting. Because httpfs works in Workers, you can query remote Parquet, CSV, and JSON files directly:

// Remote Parquet file
const data = await conn.query(`
  SELECT *
  FROM 'https://example.com/data.parquet'
  LIMIT 100
`);

// Remote CSV
const csv = await conn.query(`
  SELECT *
  FROM read_csv('https://example.com/data.csv')
`);

// Remote JSON
const json = await conn.query(`
  SELECT *
  FROM read_json('https://example.com/data.json')
`);

No additional configuration is needed; httpfs is compiled in and initialized automatically.
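
Remote files can also be mixed with local tables in a single statement. A small sketch, using the readings table from earlier and a placeholder metadata file:

// Join the in-memory readings table against a remote Parquet file.
// The URL and its columns (sensor_id, location) are placeholders.
const enriched = await conn.query(`
  SELECT r.sensor_id, r.temperature, m.location
  FROM readings r
  JOIN 'https://example.com/sensor_metadata.parquet' m
    ON r.sensor_id = m.sensor_id
`);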

Accessing S3 and R2 storage#

For private data, you can configure secrets to access S3-compatible storage. Here's how to set up Cloudflare R2:

interface Env {
  R2_ACCESS_KEY_ID: string;
  R2_SECRET_ACCESS_KEY: string;
  R2_ACCOUNT_ID: string;
}

async function ensureInitialized(env: Env): Promise<void> {
  // ... init code ...

  // Create R2 secret for accessing private buckets
  await conn.execute(`
    CREATE SECRET r2 (
      TYPE R2,
      KEY_ID '${env.R2_ACCESS_KEY_ID}',
      SECRET '${env.R2_SECRET_ACCESS_KEY}',
      ACCOUNT_ID '${env.R2_ACCOUNT_ID}'
    )
  `);
}

// Then query R2 files directly
const data = await conn.query(`
  SELECT * FROM 'r2://my-bucket/data/events.parquet'
`);

Set the secrets via Wrangler:

wrangler secret put R2_ACCESS_KEY_ID
wrangler secret put R2_SECRET_ACCESS_KEY
wrangler secret put R2_ACCOUNT_ID

The same approach works for AWS S3 (TYPE S3) and Google Cloud Storage (TYPE GCS).
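
For example, an S3 secret could look like the sketch below; the environment variable names and region are placeholders, and the syntax mirrors the R2 example above:

// Hypothetical env bindings: AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
await conn.execute(`
  CREATE SECRET s3 (
    TYPE S3,
    KEY_ID '${env.AWS_ACCESS_KEY_ID}',
    SECRET '${env.AWS_SECRET_ACCESS_KEY}',
    REGION 'us-east-1'
  )
`);

// Then query the bucket directly
const s3Data = await conn.query(`
  SELECT * FROM 's3://my-bucket/data/events.parquet'
`);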

Returning Arrow IPC#

If your client understands Apache Arrow, you can return results in Arrow IPC format for efficient data transfer:

import { tableToIPC } from '@ducklings/workers';

// In your request handler:
const table = await conn.queryArrow('SELECT * FROM readings');
const ipcBytes = tableToIPC(table, { format: 'stream' });

return new Response(ipcBytes, {
  headers: {
    'Content-Type': 'application/vnd.apache.arrow.stream',
  },
});

Consumers can read the result with any Arrow-compatible library:

# Python
import pyarrow as pa
import requests

response = requests.get('https://your-worker.dev/arrow')
reader = pa.ipc.open_stream(response.content)
table = reader.read_all()

// JavaScript (Flechette)
import { tableFromIPC } from '@ducklings/workers';

const response = await fetch('https://your-worker.dev/arrow');
const bytes = new Uint8Array(await response.arrayBuffer());
const table = tableFromIPC(bytes);

Deploying#

With everything in place, deploy to Cloudflare:

# Local development
npm run dev

# Deploy to production
npm run deploy

That's it. Wrangler handles bundling the WASM binary into the Worker.

Test it:

# Check the API
curl https://duckdb-worker.<your-subdomain>.workers.dev/

# Run a query
curl -X POST https://duckdb-worker.<your-subdomain>.workers.dev/query \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT 42 AS answer"}'

Prepared statements#

For parameterized queries, use prepared statements to avoid SQL injection:

const stmt = await conn.prepare(
  'SELECT * FROM readings WHERE sensor_id = ? AND temperature > ?'
);
stmt.bindInt32(1, 1);
stmt.bindDouble(2, 22.0);

const results = await stmt.run();
await stmt.close();

Available binding methods: bindBoolean, bindInt32, bindInt64, bindFloat, bindDouble, bindString, bindBlob, bindNull, bindDate, bindTimestamp.
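
As a hedged sketch of the other binders, assuming they follow the same (index, value) convention as bindInt32/bindDouble above and that bindTimestamp accepts a JavaScript Date (check the package docs for the exact types):

const insert = await conn.prepare(
  'INSERT INTO readings VALUES (?, ?, ?, ?)'
);
insert.bindInt32(1, 3);
insert.bindDouble(2, 21.7);
insert.bindNull(3);                                        // no humidity reading for this row
insert.bindTimestamp(4, new Date('2026-01-27T12:00:00Z')); // assumption: a Date is accepted

await insert.run();
await insert.close();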

Streaming large results#

For large result sets, avoid loading everything into memory:

const stream = await conn.queryStreaming('SELECT * FROM large_table');

for await (const chunk of stream) {
  // Process each chunk individually
  for (let row = 0; row < chunk.rowCount; row++) {
    const id = chunk.getInt32(row, 0);
    const name = chunk.getString(row, 1);
    // ...
  }
}

Limitations#

A few things to be aware of:

  • Memory: Workers have a 128 MB memory limit. DuckDB itself uses a portion of that, so you'll want to keep datasets and query results reasonably sized.

  • WASM size: The ~9.6 MB gzipped binary requires a paid Workers plan ($5/month). The free tier caps at 3 MB.

  • No dynamic extension loading: Only Parquet, JSON, and httpfs are available. You can't INSTALL or LOAD other extensions at runtime.

  • Single-threaded: No parallel query execution. Workers are single-threaded by nature, and the WASM build has threading disabled.

  • In-memory only: No persistent storage. Each new Worker instance starts fresh. Use httpfs to read external data, or populate tables on init.
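
A common pattern for the last point is to hydrate tables from object storage inside ensureInitialized(), so every new instance starts with the same data. A sketch with a placeholder URL:

// Populate an in-memory table from a remote Parquet file at startup
await conn.execute(`
  CREATE TABLE events AS
  SELECT * FROM 'https://example.com/events.parquet'
`);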

Wrapping up#

Ducklings makes it possible to run DuckDB at the edge on Cloudflare Workers, something that wasn't feasible before due to the async I/O constraint. The @ducklings/workers package handles the Asyncify plumbing, provides a TypeScript API, and includes httpfs, Parquet, and JSON extensions out of the box.

This opens up some interesting use cases: SQL APIs without a database server, edge data transformations, Parquet-to-JSON converters, analytics endpoints that query directly from object storage, or lightweight ETL pipelines running globally.

The project is open source at github.com/tobilg/ducklings.
