I build and write about production-grade AI systems.
I'm Vitor Sousa, a Data Scientist & AI Engineer at Wellhub. This is a curated technical notebook covering experiments, deployments, and engineering lessons from LLM products.
11 articles · 3 projects
Selected writing
GDPO: Multi-Reward RL Done Right
When GRPO meets multiple rewards, advantages collapse. GDPO fixes this by normalizing each reward independently before combining. Learn why this matters for tool calling, math reasoning, and any multi-objective LLM alignment setup.
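The scale problem is easy to see on a toy group. This is an illustrative sketch, not code from the article: the function names and toy numbers are mine, and it only contrasts the two normalization orders the blurb describes (sum-then-normalize vs. normalize-each-reward-then-combine).

```python
import numpy as np

def summed_then_normalized(rewards, eps=1e-8):
    # GRPO-style handling of multiple rewards: sum the raw rewards
    # per sample, then z-score the totals across the group.
    total = rewards.sum(axis=1)
    return (total - total.mean()) / (total.std() + eps)

def per_reward_normalized(rewards, eps=1e-8):
    # GDPO-style sketch: z-score each reward dimension independently
    # across the group, then combine the normalized advantages.
    norm = (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + eps)
    return norm.sum(axis=1)

# Four samples scored on two rewards with very different scales,
# e.g. a 0/1 tool-call correctness check and a large-scale score.
rewards = np.array([
    [0.0, 10.0],
    [1.0, 10.0],
    [0.0, 90.0],
    [1.0, 90.0],
])
```

With sum-then-normalize, the large-scale reward dominates: sample 1 wins on the first reward yet still gets a negative advantage. Normalizing each reward first keeps both objectives on equal footing.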
GRPO: Eliminating the Value Network
Group Relative Policy Optimization replaces PPO's learned value function with a simple insight: sample multiple outputs and use their relative rewards as advantages. 33% memory savings, simpler implementation, and the algorithm powering DeepSeek-R1.
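The core trick fits in a few lines. A minimal sketch of group-relative advantages (my own naming, not the article's code): score several sampled completions for the same prompt and use each reward's z-score within the group as its advantage, with the group mean standing in for PPO's learned value baseline.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    # Z-score each completion's reward against its own group:
    # the group mean replaces the value network's baseline estimate.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled completions for one prompt, scored 0 or 1.
adv = group_relative_advantages([0.0, 1.0, 1.0, 0.0])
# above-average completions get positive advantage, below-average negative
```

Because the baseline comes from sampling rather than a trained critic, the value network and its optimizer state disappear entirely, which is where the memory savings come from.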
PPO for Language Models: The RLHF Workhorse
Deep dive into Proximal Policy Optimization—the algorithm behind most LLM alignment. Understand trust regions, the clipped objective, GAE, and why PPO's four-model architecture creates problems at scale.
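For reference, the clipped objective itself is small. A per-token sketch (to be maximized; function name and test values are mine): the probability ratio between the new and old policy is capped so a single update cannot push the policy far outside the trust region.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s). Taking the min of the
    # unclipped and clipped terms gives a pessimistic lower bound:
    # the update gets no extra credit for moving the ratio past
    # 1 +/- clip_eps in the advantage's favored direction.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

With a positive advantage and ratio 2.0, the objective is capped at 1.2 x advantage; with a negative advantage and ratio 0.5, the clipped term wins and the bound stays pessimistic.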
Selected projects
RAG System with LlamaIndex, Elasticsearch & Llama3
A deep dive into building a local-first retrieval-augmented generation system for document Q&A.
Elasticsearch · LlamaIndex · Llama3 · RAG · Vector Search
LoRA and DoRA Implementation
I implemented LoRA and DoRA from scratch in PyTorch to understand the methods end to end.
llms · peft · pytorch
Large Language Models with MLX
I explored chat tooling on Apple Silicon using MLX to understand the runtime and packaging story.
llms · mistral · llama2