Tools & Setup
The tools, frameworks, and services I use for ML engineering, research, and daily work.
ML & Deep Learning
PyTorch: Primary framework for building and training deep learning models.
Transformers: Hugging Face library for pretrained transformer models.
PEFT: Parameter-efficient fine-tuning methods like LoRA and QLoRA.
TRL: Transformer Reinforcement Learning library for RLHF and alignment.
DeepSpeed: Distributed training and inference optimisation for large models.
Axolotl: Streamlined fine-tuning pipeline for LLMs.
Unsloth: Fast and memory-efficient LLM fine-tuning.
LLM Tooling
LangChain: Framework for building applications powered by language models.
LangGraph: Stateful agent orchestration with cycles and persistence.
LlamaIndex: Data framework for connecting LLMs with external data.
DSPy: Programmatic prompt optimisation and LM pipeline compiler.
LiteLLM: Unified interface for 100+ LLM providers.
vLLM: High-throughput LLM serving with PagedAttention.
Ollama: Run open-source LLMs locally with a single command.
Data & Analytics
Pandas: Data manipulation and analysis in Python.
NumPy: Fundamental numerical computing library.
scikit-learn: Classical ML algorithms and preprocessing utilities.
XGBoost: Gradient boosting for tabular data and competitions.
LightGBM: Fast gradient boosting framework for large datasets.
Spark: Distributed data processing at scale.
BigQuery: Serverless data warehouse for analytics workloads.
SQL: The foundation for querying structured data everywhere.
Evaluation & Monitoring
Weights & Biases: Experiment tracking, model registry, and visualisation.
MLflow: Open-source platform for the full ML lifecycle.
RAGAS: Evaluation framework for retrieval-augmented generation.
DeepEval: Unit testing framework for LLM outputs.
Opik: LLM evaluation and tracing platform.
LangSmith: Debugging, testing, and monitoring for LLM applications.
RL & Bandits
Stable Baselines3: Reliable RL algorithm implementations in PyTorch.
Gymnasium: Standard API for reinforcement learning environments.
Vowpal Wabbit: Fast online learning and contextual bandit algorithms.
Search & Retrieval
Elasticsearch: Distributed search and analytics engine.
Weaviate: Open-source vector database for AI applications.
FAISS: Efficient similarity search and clustering of dense vectors.
Infrastructure
Docker: Containerisation for reproducible environments.
Kubernetes: Container orchestration for scalable deployments.
Terraform: Infrastructure as code for cloud provisioning.
AWS: Cloud platform for compute, storage, and ML services.
Azure: Microsoft cloud with strong ML and enterprise tooling.
Airflow: Workflow orchestration for data and ML pipelines.
Kubeflow: ML toolkit for Kubernetes-native model training and serving.
Databricks: Unified analytics platform for data engineering and ML.