Resources
Papers, books, courses, and tools I've found genuinely useful, organized by the topics I write about. This is an opinionated selection, not a comprehensive list.
Reinforcement Learning & LLM Alignment
Papers
- Proximal Policy Optimization Algorithms
→ The foundational PPO paper. Read alongside my PPO deep dive.
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
→ Where group-relative policy optimization (GRPO) originated.
- Direct Preference Optimization with Group Relative Policy Optimization
→ Bridges DPO and GRPO. Covered in detail in part 4 of my RL series.
- Deep Reinforcement Learning from Human Preferences
→ The original RLHF paper — essential context for understanding alignment.
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
→ DPO simplified alignment by removing the separate reward model and RL loop, training directly on preference pairs with a classification-style loss. A must-read.
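The "no reward model" point is easiest to see in the loss itself. Below is a minimal sketch for a single preference pair. It assumes you already have the summed log-probabilities of the chosen and rejected responses under two models: the policy being trained and a frozen reference model. The function and argument names are my own, and `beta=0.1` is only an illustrative default.

```python
import math


def softplus(x: float) -> float:
    # log(1 + e^x), written to avoid overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Each argument is a summed log-probability of a full response under
    either the trained policy or the frozen reference model.
    """
    # Implicit "reward" of each response: beta * log(pi_theta / pi_ref).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)
```

When the policy still equals the reference model, the margin is 0 and the loss is log 2. Shifting probability toward the chosen response, relative to the reference, drives the loss toward 0. No reward model is trained or queried anywhere.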
Books
- Reinforcement Learning: An Introduction
→ Currently re-reading. Chapters 1–6 are essential foundations.
Contextual Bandits & Personalization
Papers
- A Contextual-Bandit Approach to Personalized News Article Recommendation
→ The LinUCB paper that kicked off practical contextual bandits.
- Thompson Sampling for Contextual Bandits with Linear Payoffs
→ Rigorous treatment of Thompson Sampling in the contextual setting.
- An Introduction to Multi-Armed Bandits
→ Comprehensive monograph — works as both a textbook and reference.
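LinUCB's core loop is short enough to sketch. This is a minimal, pure-Python version of the disjoint model, which fits one ridge regression per arm. It assumes a single context vector shared by all arms (the paper also allows per-arm features) and an exploration weight `alpha` that you would tune. The class and function names are mine. The sketch stores the inverse of each arm's design matrix A directly and updates it with the Sherman-Morrison formula, so it never has to re-invert A.

```python
import math
from typing import List


class LinUCBArm:
    """Per-arm state for disjoint LinUCB (one ridge regression per arm)."""

    def __init__(self, dim: int):
        # A starts as the identity (ridge penalty 1), so A^-1 does too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context

    def _a_inv_times(self, v: List[float]) -> List[float]:
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x: List[float], alpha: float) -> float:
        theta = self._a_inv_times(self.b)  # ridge estimate: A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        a_inv_x = self._a_inv_times(x)
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))  # sqrt(x^T A^-1 x)
        return mean + alpha * width

    def update(self, x: List[float], reward: float) -> None:
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x),
        # using the fact that A^-1 is symmetric.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms: List[LinUCBArm], x: List[float], alpha: float = 1.0) -> int:
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=lambda k: scores[k])
```

Each round, call `choose_arm` with the current context and observe the reward. Then call `update` on the chosen arm only. The `alpha * width` term is what makes rarely tried arms and unfamiliar contexts look optimistic.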
Books & Courses
- Bandit Algorithms
→ The definitive textbook. Dense but rewarding if you want the full theory.
LLM Evaluation
Papers
- RAGAS: Automated Evaluation of Retrieval Augmented Generation
→ Framework for reference-free RAG metrics: faithfulness, answer relevance, and context relevance.
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
→ Foundational work on using LLMs to evaluate LLMs.
- [LLM evaluation survey — TODO]
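One technique from the MT-Bench paper is worth having in code: the position-swap check for pairwise judging. You ask the judge twice, with the answer order swapped, and declare a win only when both orderings agree. An inconsistent result counts as a tie. Below is a minimal sketch. Here `judge` is a hypothetical callable standing in for your LLM call, and it returns "A", "B", or "tie" for the answer shown first or second. The names are mine, not from the paper or any library.

```python
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]


def consistent_pairwise_verdict(
    judge: Callable[[str, str, str], Verdict],
    question: str,
    answer_1: str,
    answer_2: str,
) -> str:
    """Judge both orderings; only declare a winner if the verdicts agree."""
    first = judge(question, answer_1, answer_2)   # answer_1 shown in position A
    second = judge(question, answer_2, answer_1)  # answer_1 shown in position B
    if first == "A" and second == "B":
        return "answer_1"
    if first == "B" and second == "A":
        return "answer_2"
    # Inconsistent across orderings, or an explicit tie.
    return "tie"
```

A judge that always prefers whichever answer it sees first produces a tie under this check, instead of silently favoring one side.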
Tools
RAG & Retrieval Systems
Papers
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
→ The original RAG paper. Still the right starting point.
- Retrieval-Augmented Generation for Large Language Models: A Survey
→ Comprehensive survey of RAG architectures and techniques.
- [Advanced RAG techniques paper — TODO]
Tools
- LlamaIndex
→ My go-to for RAG pipelines. Great abstractions for data connectors and indexing.
- Elasticsearch
→ Hybrid search (BM25 + dense vectors) is underrated. Scales well.
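Hybrid search needs some way to merge the BM25 ranking with the dense-vector ranking. Elasticsearch offers built-in options for this. The sketch below is engine-agnostic and only illustrates one common approach, reciprocal rank fusion (RRF). In RRF, each document scores the sum of 1/(k + rank) across the lists it appears in. The doc IDs are made up, and k=60 is a commonly used default rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs (best first) into a single ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF looks only at ranks, so you never have to calibrate BM25 scores against cosine similarities, which live on different scales. Weighted score combination is the usual alternative when you do want to tune that balance.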
ML Systems & Production
Books
- Designing Machine Learning Systems
→ The best end-to-end overview of MLOps. Covers data, training, deployment, and monitoring.
- Hands-On Large Language Models
→ Currently reading. Practical patterns for shipping LLM applications.
Courses
- Stanford CS329S: Machine Learning Systems Design
→ Covers the full lifecycle from problem framing to production monitoring.
- Full Stack Deep Learning
→ Practical. Fills the gap between "I trained a model" and "it's in production."