Resources

Papers, books, and courses I've found genuinely useful — organized by the topics I write about. This is an opinionated selection, not a comprehensive list.

Reinforcement Learning & LLM Alignment

Papers

Proximal Policy Optimization Algorithms Schulman et al., 2017
→ The foundational PPO paper. Read alongside my PPO deep dive.
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Shao et al., 2024
→ Where group-relative policy optimization (GRPO) originated.
Direct Preference Optimization with Group Relative Policy Optimization GDPO, 2024
→ Bridges DPO and GRPO. Covered in detail in part 4 of my RL series.
Deep Reinforcement Learning from Human Preferences Christiano et al., 2017
→ The original RLHF paper — essential context for understanding alignment.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Rafailov et al., 2023
→ DPO simplified alignment by removing the separately trained reward model and the RL loop. A must-read.

Books

Reinforcement Learning: An Introduction Sutton & Barto, 2nd edition, 2018
→ Currently re-reading. Chapters 1–6 are essential foundations.

Contextual Bandits & Personalization

Papers

A Contextual-Bandit Approach to Personalized News Article Recommendation Li et al. (LinUCB), 2010
→ The LinUCB paper that kicked off practical contextual bandits.
Thompson Sampling for Contextual Bandits with Linear Payoffs Agrawal & Goyal, 2013
→ Rigorous treatment of Thompson Sampling in the contextual setting.
Introduction to Multi-Armed Bandits Slivkins, 2019 (monograph)
→ Comprehensive; works as both a textbook and a reference.

Books

Bandit Algorithms Lattimore & Szepesvári, 2020
→ The definitive textbook. Dense but rewarding if you want the full theory.

LLM Evaluation

Papers

RAGAS: Automated Evaluation of Retrieval Augmented Generation Es et al., 2023
→ Framework for RAG-specific metrics: faithfulness, answer relevance, and context relevance.
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena Zheng et al., 2023
→ Foundational work on using LLMs to evaluate LLMs.

Tools

RAGAS
→ Best open-source option for RAG evaluation. Good defaults, easy to extend.
DeepEval
→ Pytest-style LLM testing. Great for CI pipelines.
LangSmith
→ End-to-end tracing and eval. Useful if you're already in the LangChain ecosystem.

RAG & Retrieval Systems

Papers

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Lewis et al., 2020
→ The original RAG paper. Still the right starting point.
Retrieval-Augmented Generation for Large Language Models: A Survey Gao et al., 2023
→ Comprehensive survey of RAG architectures and techniques.

Tools

LlamaIndex
→ My go-to for RAG pipelines. Great abstractions for data connectors and indexing.
Elasticsearch
→ Hybrid search (BM25 + dense vectors) is underrated. Scales well.

ML Systems & Production

Books

Designing Machine Learning Systems Chip Huyen, 2022
→ The best overview of MLOps end-to-end. Covers data, training, deployment, monitoring.
Hands-On Large Language Models Alammar & Grootendorst, 2024
→ Currently reading. Practical patterns for shipping LLM applications.

Courses

Stanford CS329S: Machine Learning Systems Design Chip Huyen
→ Covers the full lifecycle from problem framing to production monitoring.
Full Stack Deep Learning Pieter Abbeel et al.
→ Practical. Fills the gap between "I trained a model" and "it's in production."