Large Language Models with MLX
Chat inference on Apple Silicon using MLX: exploring the runtime, quantization options, and packaging story for local LLM deployment with Mistral and Llama 2.
Personal project
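To give a flavor of the workflow, here is a minimal sketch of 4-bit quantized chat inference with the mlx-lm Python package. The model repo, prompt, and token budget below are illustrative assumptions, not artifacts of this project:

```python
# A minimal sketch, assuming the mlx-lm package (pip install mlx-lm)
# and an existing 4-bit community conversion of Mistral 7B Instruct.
# The repo name below is an assumption, not a project artifact.
from mlx_lm import load, generate

# load() fetches the quantized weights and tokenizer from Hugging Face.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

# Format a single-turn conversation with the model's chat template.
messages = [{"role": "user", "content": "Summarize MLX in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# generate() runs the decode loop on the GPU via Metal.
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```

When a pre-converted checkpoint isn't available, mlx-lm can also quantize a Hugging Face model locally via its convert entry point, e.g. `python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.2 -q`.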