Blog Posts
Insights and articles about cloud computing, efficient analytics, and open source.
Use your favorite AI tool to read the latest AWS News
You can use the unofficial AWS News MCP Server, which combines the news articles, blog posts and updates of more than 40 different AWS news feeds into a single source. Connecting to the AWS News MCP Server You can use different protocols to access it: ...
Using Amazon SageMaker Lakehouse with DuckDB
Preconditions To use the Amazon SageMaker Lakehouse with DuckDB, you first have to create an S3 Table bucket, a namespace and an actual S3 Table. All those steps are described in my other blog post “Query S3 Tables with DuckDB”, so please make sure yo...
Welcome to the age of $10/month Lakehouses
Recap: Data Warehouses, Data Lakes, Lakehouses? As a short recap, what do these mean, and how are they differentiated? Modern Data Warehouses, like Amazon Redshift, Google BigQuery, and Snowflake, offer fast, SQL-optimized performance for structured ...
Using DuckDB databases as lightweight Data Lake access layer
Data Lakes come in many different flavors. AWS, Azure, Google Cloud, Snowflake, Databricks and others all have their specialties, strengths and weaknesses. What they have in common is that most, if not all, of them use Object Storage...
Handling GTFS data with DuckDB
The General Transit Feed Specification (GTFS) is a standardized, open data format for public transportation schedules and geographic information. In practice, a GTFS feed is simply a ZIP archive of text (CSV) tables - such as stops.txt, routes.txt, a...
Cost-efficient event ingestion into Iceberg S3 Tables on AWS
Amazon S3 Tables launched on December 3rd, 2024, and provides “storage that is optimized for tabular data such as daily purchase transactions, streaming sensor data, and ad impressions in Apache Iceberg format”. While S3 Tables can be queried ...
Query S3 Tables with DuckDB
DuckDB has gained a new preview feature that allows querying Iceberg data in AWS S3 Tables. Setting up an S3 Table There are multiple steps that need to be performed to set up an S3 Table that can then be queried with tools like DuckDB. As the ...
Querying IP addresses and CIDR ranges with DuckDB
I had a use case that eventually required performing IP address lookups in a given list of CIDR ranges, as I maintain an open source project that gathers IP address range data from public cloud providers, and also wrote an article on my blog about an...
Chat with a Duck
A while ago I published sql-workbench.com and the accompanying blog post called "Using DuckDB-WASM for in-browser Data Engineering". The SQL Workbench enables its users to analyze local or remote data directly in the browser. This lowers the bar rega...
Using DuckDB-WASM for in-browser Data Engineering
Introduction DuckDB, the in-process DBMS specialized in OLAP workloads, has grown very rapidly during the last year, both in functionality and in popularity amongst its users, and also with developers that contribute many projects to the Open Sou...