Article Series
Deep-dive series that build understanding across multiple articles, from foundations to production.
Policy Optimization for LLMs: From Fundamentals to Production
From PPO fundamentals to GRPO and GDPO — the complete policy optimization series for aligning language models with reinforcement learning.
Adaptive Optimization at Scale: Contextual Bandits from Theory to Production
A 5-part journey from decision frameworks and regret theory through algorithm implementations to production deployment of contextual bandit systems.