Welcome to Unpacking Data.
Exploring data engineering, AI engineering, and building robust systems at scale.

Featured Post


AI Engineering

codex local LLM LM Studio MLX qwen agentic model routing DSPy apple silicon

June 22, 2026⋅5 min read

Running Codex on a local model with LM Studio

Most of the jobs my orchestrator hands to Codex are small ones that do not need a frontier model, so I wired the Codex harness to a Qwen MoE running on my own Mac through LM Studio. The wiring was less obvious than expected, and this blogpost collects everything that broke on the way to a working setup, plus the DSPy-style loop it makes affordable.

Dan

Running time-series forecasting in the browser with Rust and WebAssembly

For WaySightAI I set the constraint that raw time-series data never leaves the user's browser, which meant the CSV parsing, cleaning, stationarity tests, and forecasting all had to run client-side. The core math is written in Rust and compiled to WebAssembly, and this blogpost goes through the architecture, the benchmark numbers, and the tradeoffs.

Dan


AI Engineering

AWS SageMaker AgentCore LLM agents MLOps time series forecasting code interpreter

June 8, 2026⋅5 min read

Orchestrating a forecasting pipeline with Bedrock AgentCore and SageMaker

A forecasting system where an LLM agent runs the whole lifecycle: exploratory analysis in a Code Interpreter sandbox, feature engineering, Bayesian hyperparameter tuning and training on SageMaker, and deployment behind a live endpoint. This blogpost goes through the three-layer architecture and the patterns that keep LLM-generated code contained.

Dan


AI Engineering

text-to-sql fine-tuning evaluation data engineering qwen bird-interact cosql

June 3, 2026⋅5 min read

Fine-tuning a local 9B model for multi-turn text-to-SQL

My text-to-SQL agent handled single questions and got lost on follow-ups, so I fine-tuned Qwen 3.5 9B on multi-turn data and compared it with Claude Sonnet 4.6 on the same rows and scorer. The runs point at table and column selection, rather than SQL writing, as the real bottleneck for small models.

Dan


AI Engineering

RAG FAISS Strands LLM context engineering vector search AI agents

November 9, 2025⋅5 min read

Building a code search chatbot with FAISS and the Strands SDK

A chatbot that answers questions about the Strands SDK by searching its own source code: FAISS for retrieval, mem0 for conversation memory, Gemini for generation. Retrieval was done in a day while context assembly filled the rest of the week, and this blogpost walks through both, including the re-ranking corner I cut.

Dan


Data Engineering

data quality pyspark delta live tables

November 18, 2022⋅5 min read

Building data quality checks in your pySpark data pipelines

Data quality is a critical part of any production data pipeline. To report accurate SLA metrics and ensure that the data is correct, you need a way to validate the data and report the results for further analysis.

Dan


Data Engineering

pyspark databricks json schema tinsel

July 31, 2022⋅5 min read

Improve your PySpark ETL's performance by providing explicit schema

Have you ever stumbled upon a Spark ETL and wondered how simply loading a dataset can take hours, even though the filtered subset you asked for is relatively small?

Dan


Data Engineering

nutter e2e testing integration testing pyspark databricks hypothesis

July 19, 2022⋅5 min read

Integration Testing for your Databricks CI/CD Data Pipelines with Microsoft Nutter

In this blogpost we continue our journey of testing data pipelines. If you haven't read the first post, make sure you do.

Dan


Data Engineering

pyspark databricks hypothesis property-based testing

July 15, 2022⋅5 min read

Automate all your PySpark Unit Test with Hypothesis!

Unit testing is often regarded as a main pillar of software testing. It usually involves testing a single unit of code and covering all the edge cases the developer can think of.

Dan
