Writing & Research

Articles & Engineering Notes

This is the hub for my long-form writing on data science, machine learning, and the engineering patterns that support them in production.

Each article aims to balance clarity with rigor: you will find walkthroughs of algorithms, postmortems from experiments, and the practical guardrails that emerge from shipping systems.

Featured Articles

Deep dives and foundational pieces worth exploring first

Mermaid diagram showing three pillars of LLM evaluation: What to Evaluate (Faithfulness vs Helpfulness), How to Evaluate (Methods and Metrics), and Making it Systematic (Process and Monitoring), connected in a circular feedback loop

Beyond the Vibe Check: A Systematic Approach to LLM Evaluation

Stop relying on gut feelings to evaluate LLM outputs. Learn systematic approaches to build trustworthy evaluation pipelines with measurable metrics, proven methods, and production-ready practices. A practical guide covering faithfulness vs helpfulness, LLM-as-judge techniques, bias mitigation, and continuous monitoring.

~60 min

Decision tree diagram showing when to use contextual bandits versus alternatives

When to Use Contextual Bandits: The Decision Framework

Stop running month-long A/B tests that leave value on the table. Learn when contextual bandits are the right choice for adaptive, personalized optimization—and when to stick with simpler alternatives.

~20 min

Neural network architecture diagram for contextual bandits

Neural Contextual Bandits for High-Dimensional Data

When linear models fail, neural networks step in. Learn when to use neural bandits, how to quantify uncertainty with bootstrap ensembles, and how to handle high-dimensional action spaces with embeddings and two-stage selection.

~22 min

Latest articles

Use the filters to surface the topics, stacks, and case studies that match your current problem.

Comparison diagram showing PPO with value network versus GRPO with group-based advantage estimation

GRPO: Eliminating the Value Network

Group Relative Policy Optimization replaces PPO's learned value function with a simple insight: sample multiple outputs per prompt and use their rewards relative to the group as advantages. The result is 33% memory savings, a simpler implementation, and the algorithm powering DeepSeek-R1.

Series: Policy Optimization for LLMs: From Fundamentals to Production (Part 3)

Feb 3, 2026 ~32 min

Diagram showing PPO four-model architecture for LLM training

PPO for Language Models: The RLHF Workhorse

Deep dive into Proximal Policy Optimization—the algorithm behind most LLM alignment. Understand trust regions, the clipped objective, GAE, and why PPO's four-model architecture creates problems at scale.

Series: Policy Optimization for LLMs: From Fundamentals to Production (Part 2)

Jan 18, 2026 ~28 min

Diagram showing the reinforcement learning loop applied to language model fine-tuning

Reinforcement Learning Foundations for LLM Alignment

Master the RL fundamentals powering modern LLM training: from MDPs and policy gradients through value functions and actor-critic methods. The mathematical foundations you need before diving into PPO, GRPO, and beyond.

Series: Policy Optimization for LLMs: From Fundamentals to Production (Part 1)

Jan 11, 2026 ~35 min

Diagram showing the production architecture for contextual bandits deployments

Deploying Contextual Bandits: Production Guide and Offline Evaluation

Systems design, offline evaluation, and monitoring strategies for running contextual bandits safely in production.

Series: Adaptive Optimization at Scale: Contextual Bandits from Theory to Production (Part 5)

Nov 21, 2025 ~24 min

Neural Contextual Bandits for High-Dimensional Data

Series: Adaptive Optimization at Scale: Contextual Bandits from Theory to Production (Part 4)

Nov 19, 2025 ~22 min

Implementing Contextual Bandits: Complete Algorithm Guide

Complete Python implementations of ε-greedy, UCB, LinUCB, and Thompson Sampling. Learn which algorithm to use for your problem with default hyperparameters and practical tuning guidance.

Series: Adaptive Optimization at Scale: Contextual Bandits from Theory to Production (Part 3)

Nov 17, 2025 ~25 min

Visual comparison of regret growth curves for different bandit algorithms

Contextual Bandit Theory: Regret Bounds and Exploration

Understand the theory behind contextual bandits: regret bounds, the exploration-exploitation tradeoff, reward models, and why certain algorithms work. Math that directly informs practice.

Series: Adaptive Optimization at Scale: Contextual Bandits from Theory to Production (Part 2)

Nov 15, 2025 ~18 min

When to Use Contextual Bandits: The Decision Framework

Series: Adaptive Optimization at Scale: Contextual Bandits from Theory to Production (Part 1)

Nov 13, 2025 ~20 min

Beyond the Vibe Check: A Systematic Approach to LLM Evaluation

Nov 5, 2025 ~60 min

Stylized visualization of the Differential Transformer attention mechanism.

Differential Transformer Notes

I pulled together notes on the Differential Transformer and its take on attention.

Oct 12, 2024 ~4 min

Diagram highlighting the OpenELM model family and efficiency focus.

OpenELM Notes

I wrote about OpenELM and how Apple approaches efficient language models.

Apr 30, 2023 ~8 min

Writing approach

The blog is a living lab notebook. Some essays are polished deep dives; others capture lessons while they are still fresh. Both have a place in the learning loop.

Perfection slows the feedback cycle, so I share drafts, return with new data, and document the missteps alongside the breakthroughs.

If something sparks a question or disagreement, please reach out. Dialogue keeps the writing honest and ensures the next revision is better informed.