Articles tagged grpo

[Figure: comparison diagram showing PPO with a value network versus GRPO with group-based advantage estimation]

GRPO: Eliminating the Value Network

Group Relative Policy Optimization (GRPO) replaces PPO's learned value function with a simple insight: sample multiple outputs for the same prompt and use their relative rewards as advantages. The result is 33% memory savings, a simpler implementation, and the algorithm powering DeepSeek-R1.
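As a rough illustration of "relative rewards as advantages", here is a minimal sketch that assumes the common formulation in which each reward is normalized by the mean and standard deviation of its own sampling group. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are illustrative assumptions, not details taken from the article.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output relative to the other outputs in its group.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    No learned value network is involved; the group itself is the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ here
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Outputs that beat the group average receive positive advantages and the rest receive negative ones. If every output in a group earns the same reward, all advantages are zero and that group contributes no learning signal.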

Series
Policy Optimization for LLMs: From Fundamentals to Production, Part 3

~32 min
