Using Iceberg Catalogs in the Browser with DuckDB-Wasm

Using Iceberg Catalogs in the Browser with DuckDB-Wasm

Tobias MΓΌller
5 min read

With recent updates of DuckDB itself, DuckDB-Wasm and the Iceberg extension it is now possible to query Iceberg catalog directly from the browser, with no backends. involved.

Example clients that work:

πŸ’‘
Be aware the credentials you’re using should not be shared. The below examples are for demo purposes, and not for public or production usage. It’s possible to set this up in a safe manner though, for example by granting a fine-grained AWS IAM role to users of a frontend application when they log in via Cognito. This will grant the users temporary permissions to use the respective services.

Disable the built-in http client in DuckDB-Wasm

To remotely querying Iceberg data, we first need to disable the internal http client of DuckDB-Wasm, and then use the official httpfs extension build:

-- Load the httpfs extension, this disables the built-in http client
LOAD http;

Load the Iceberg extension

You can then load the Iceberg extension:

-- Load Iceberg extension
LOAD iceberg;

Query Iceberg data in S3 Tables

I published another blog post β€œQuery S3 Tables with DuckDB” in Q1/2025 which outlines the basic setup of a S3 Tables buckets, as well as a Namespace and an actual Table, please refer to it when trying to set this up in your environment.

So, if you’d like to attach an S3 Tables bucket as a database to DuckDB-Wasm, you can do this like the following:

ATTACH 'arn:aws:s3tables:us-east-1:12345678912:bucket/duckdb-test'
  AS test_db (
    TYPE iceberg,
    ENDPOINT_TYPE s3_tables
);

The attach target is the ARN of the S3 Table bucket.

You then can check which tables are available:

SHOW ALL TABLES;

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   database   β”‚ schema  β”‚    name     β”‚                column_names                 β”‚      column_types       β”‚ temporary β”‚
β”‚   varchar    β”‚ varchar β”‚   varchar   β”‚                  varchar[]                  β”‚        varchar[]        β”‚  boolean  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ test_db      β”‚ test    β”‚ daily_sales β”‚ [sale_date, product_category, sales_amount] β”‚ [DATE, VARCHAR, DOUBLE] β”‚ false     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

To query the table, just use normal SQL statements like you would with any other local table:

SELECT * FROM test_db.test.daily_sales;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sale_date  β”‚ product_category β”‚ sales_amount β”‚
β”‚    date    β”‚     varchar      β”‚    double    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 2024-01-15 β”‚ Laptop           β”‚        900.0 β”‚
β”‚ 2024-01-15 β”‚ Monitor          β”‚        250.0 β”‚
β”‚ 2024-01-16 β”‚ Laptop           β”‚       1350.0 β”‚
β”‚ 2024-02-01 β”‚ Monitor          β”‚        300.0 β”‚
β”‚ 2024-02-01 β”‚ Keyboard         β”‚         60.0 β”‚
β”‚ 2024-02-02 β”‚ Mouse            β”‚         25.0 β”‚
β”‚ 2024-02-02 β”‚ Laptop           β”‚       1050.0 β”‚
β”‚ 2024-02-03 β”‚ Laptop           β”‚       1200.0 β”‚
β”‚ 2024-02-03 β”‚ Monitor          β”‚        375.0 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Query Iceberg data in R2 Data Catalog / R2

There are a few steps to set up an R2 bucket for the usage as an Iceberg catalog.

Creating a R2 bucket

To create a R2 bucket, go to you account’s Cloudflare Dashboard, and in the left sidebar click on β€œStorage & databases” β†’ β€œR2 Object Storage” and then on β€œOverview”:

After that, click on β€œCreate bucket” in the top right corner. You’ll see the next screen below:

Setting a CORS policy

To set a CORS policy which is necessary to allow specific origin requests from the browser, you need to go to the β€œCORS” Policy” section and click on β€œAdd”. Then, you can copy & paste the CORS policy below and click on β€œSave”.

πŸ’‘
Make sure to add your frontend domain to the AllowedOrigins, otherwise this will NOT work.
[
  {
    "AllowedOrigins": [
      "https://sql-workbench.com",
      "https://embedded.sql-workbench.com",
      "https://terminal.sql-workbench.com",
      "https://shell.duckdb.org"
    ],
    "AllowedMethods": [
      "GET",
      "HEAD",
      "PUT",
      "POST"
    ],
    "AllowedHeaders": [
      "Authorization",
      "Cache-Control",
      "Content-Range",
      "Content-Type",
      "Content-Length",
      "Range",
      "X-Amz-Acl",
      "X-Amz-Content-Sha256",
      "X-Amz-Date",
      "X-Amz-Security-Token",
      "X-Iceberg-Access-Delegation",
      "X-Host-Override"
    ],
    "ExposeHeaders": [
      "Content-Type",
      "Access-Control-Allow-Origin",
      "ETag"
    ]
  }
]

Enabling R2 Data Catalog

Now, enable the R2 Data Catalog in your R2 bucket’s settings:

After clicking on β€œEnable”, it takes a few seconds, and you should see your Catalog URI and your Warehouse Name. Those are both needed when querying the data later.

Creating a Account Token for R2 / R2 Data Catalog access

To access the data via queries, DuckDB needs an Account API Token. In the left sidebar menu, click on β€œManage account” β†’ β€œAccount API Tokens”

Then, on the next screen, click on β€œCreate Token” and on the following screen on β€œGet started” in the Custom Token section.

Next, enter a name for your token, and create the two Permission entries as outlined below:

Click on β€œContinue to summary” and then create your access token

πŸ’‘
Save the newly created Account API Token somewhere safe, you will not be able to retrieve it again. Only a rotation or new creation is possible in case you loose the token afterwards.

Querying data

You now have all the pieces you need for querying the data from your R2 Data Catalog.

As the R2 Data Catalog doesn’t send CORS headers, you need to use our cors.sqlqry.run CORS proxy, which provides a /iceberg route that needs to be postfixed with the R2 Data Catalog base URL concatenated with your Cloudflare Account ID and the name of the R2 bucket your created.

Create a DuckDB secret for the R2 Data Catalog as shown belog. All requests will subsequently use this secret to authorize themselves:

CREATE OR REPLACE SECRET r2secret (
    TYPE iceberg, 
    ENDPOINT 'https://cors.sqlqry.run/iceberg/catalog.cloudflarestorage.com/YOUR_CLOUDFLARE_ACCOUNT_ID/YOUR_BUCKET_NAME', 
    TOKEN 'YOUR_ACCOUNT_API_TOKEN'
);

Then, attach the remote Iceberg catalog as database by supplying the Warehouse Name from the R2 Data Catalog settings:

ATTACH IF NOT EXISTS 'YOUR_WAREHOUSE_NAME' AS r2lake (TYPE iceberg);

If you haven’t created a schema (namespace) yet, you can do this like this with plain SQL:

CREATE SCHEMA IF NOT EXISTS. r2lake.test;

Create an example table:

CREATE TABLE IF NOT EXISTS r2lake.test.sales (
    sale_id INT,
    product_id INT,
    quantity INT,
    price DECIMAL(10, 2),
    sale_date DATE
);

Insert data:

INSERT INTO r2lake.test.sales 
    (sale_id, product_id, quantity, price, sale_date) 
VALUES
    (1, 1, 10, 100.00, '2021-01-01'),
    (2, 2, 20, 234.45, '2021-01-02'),
    (3, 3, 30, 30.99, '2021-01-03');

Query the data:

SELECT * FROM r2lake.test.sales;

Summary

Thanks to the innovations of the DuckDB team, it’s now possible to directly query Iceberg catalogs from the browser. The use cases for this may be in-browser SQL clients for example, or demo applications that demonstrate how Iceberg itself works.