About Dan
I'm Dan, a software engineer writing about data engineering, AI engineering, and the practical work of building reliable data systems.
Background
My work centers on big data processing, data pipelines, and analytics with Apache Spark, Databricks, and modern Python tooling. Lately I've also been exploring AI engineering: retrieval, evaluation, fine-tuning, and the supporting pieces that LLM systems need to hold up in real production use.
Topics I Cover
- Data Engineering: Best practices, PySpark optimization, data pipelines, and big data processing
- Testing & Quality: Testing strategies for data pipelines, property-based testing, data quality and validation
- AI Engineering: Building and evaluating LLM systems, retrieval, fine-tuning, and ML infrastructure
- Performance: Optimization techniques and performance tuning for big data applications