I build and write about production-grade AI systems.
I'm Vitor Sousa, a Data Scientist & AI Engineer at Wellhub. This is a curated technical notebook covering experiments, deployments, and engineering lessons from LLM products.
11 articles · 3 projects
Selected writing
GDPO: Multi-Reward RL Done Right
When GRPO meets multiple rewards, advantages collapse. GDPO fixes this by normalizing each reward independently before combining. Learn why this matters for tool calling, math reasoning, and any multi-objective LLM alignment setup.
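The scale problem is easy to see on a toy group. This is an illustrative sketch, not code from the article: the function names and toy numbers are mine, and it only contrasts the two normalization orders the blurb describes (sum-then-normalize vs. normalize-each-reward-then-combine).

```python
import numpy as np

def summed_then_normalized(rewards, eps=1e-8):
    # GRPO-style handling of multiple rewards: sum the raw rewards
    # per sample, then z-score the totals across the group.
    total = rewards.sum(axis=1)
    return (total - total.mean()) / (total.std() + eps)

def per_reward_normalized(rewards, eps=1e-8):
    # GDPO-style sketch: z-score each reward dimension independently
    # across the group, then combine the normalized advantages.
    norm = (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + eps)
    return norm.sum(axis=1)

# Four samples scored on two rewards with very different scales,
# e.g. a 0/1 tool-call correctness check and a large-scale score.
rewards = np.array([
    [0.0, 10.0],
    [1.0, 10.0],
    [0.0, 90.0],
    [1.0, 90.0],
])
```

With sum-then-normalize, the large-scale reward dominates: sample 1 wins on the first reward yet still gets a negative advantage. Normalizing each reward first keeps both objectives on equal footing.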
GRPO: Eliminating the Value Network
Group Relative Policy Optimization replaces PPO's learned value function with a simple insight: sample multiple outputs and use their relative rewards as advantages. 33% memory savings, simpler implementation, and the algorithm powering DeepSeek-R1.
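The core trick fits in a few lines. A minimal sketch of group-relative advantages (my own naming, not the article's code): score several sampled completions for the same prompt and use each reward's z-score within the group as its advantage, with the group mean standing in for PPO's learned value baseline.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    # Z-score each completion's reward against its own group:
    # the group mean replaces the value network's baseline estimate.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled completions for one prompt, scored 0 or 1.
adv = group_relative_advantages([0.0, 1.0, 1.0, 0.0])
# above-average completions get positive advantage, below-average negative
```

Because the baseline comes from sampling rather than a trained critic, the value network and its optimizer state disappear entirely, which is where the memory savings come from.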
PPO for Language Models: The RLHF Workhorse
Deep dive into Proximal Policy Optimization—the algorithm behind most LLM alignment. Understand trust regions, the clipped objective, GAE, and why PPO's four-model architecture creates problems at scale.
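For reference, the clipped objective itself is small. A per-token sketch (to be maximized; function name and test values are mine): the probability ratio between the new and old policy is capped so a single update cannot push the policy far outside the trust region.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s). Taking the min of the
    # unclipped and clipped terms gives a pessimistic lower bound:
    # the update gets no extra credit for moving the ratio past
    # 1 +/- clip_eps in the advantage's favored direction.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

With a positive advantage and ratio 2.0, the objective is capped at 1.2 x advantage; with a negative advantage and ratio 0.5, the clipped term wins and the bound stays pessimistic.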
Selected projects
RAG System with LlamaIndex, Elasticsearch & Llama3
A deep dive into building a local-first retrieval-augmented generation system for document Q&A.
Elasticsearch · LlamaIndex · Llama3 · RAG · Vector Search
LoRA and DoRA Implementation
I implemented LoRA and DoRA from scratch in PyTorch to understand the methods end to end.
llms · peft · pytorch
Large Language Models with MLX
I explored chat tooling on Apple Silicon using MLX to understand the runtime and packaging story.
llms · mistral · llama2