Use MLX LM to run large language models on Apple silicon

Apple’s MLX is a fast, native way to run and fine-tune machine learning models directly on your Mac. Built by Apple’s machine learning research team, MLX is designed around Apple silicon’s unified memory: the CPU and GPU share the same memory pool, so arrays don’t need to be copied when work moves between them.

In this post, you’ll learn how to:

  • Set up MLX in a Conda environment
  • Run a large language model (LLM) locally with mlx-lm

Prerequisites

  • macOS on Apple silicon (M1/M2/M3/M4)
  • A Conda installation such as Miniforge (recommended: miniforge.org)
  • Python 3.10+
  • At least 10 GB of free disk space for model weights

Create a Conda environment

Conda handles dependencies cleanly, and the MLX wheels are built for arm64. Let’s make a fresh environment:

# Create a new conda env with Python 3.11
conda create -n mlx python=3.11 -y

# Activate it
conda activate mlx
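
With the environment active, it’s worth a quick check that this Python is an arm64 build, so that pip pulls the Apple silicon wheels. A minimal check, run inside python:

import platform

# Should print 'arm64' on Apple silicon; 'x86_64' means you are in a Rosetta/Intel environment
print(platform.machine())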

Now install MLX and the optional mlx-lm helper:

# Core MLX library
pip install -U mlx

# (Optional) LLM command-line tools
pip install -U mlx-lm

Note: MLX ships as a regular Python package with precompiled C++/Metal kernels, so there are no extra drivers or toolkits to install. It automatically uses your Mac’s GPU.
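
To confirm the install worked and that MLX sees the GPU, you can run a quick sanity check in Python. This is a minimal sketch; the array size is arbitrary:

import mlx.core as mx

# MLX is lazy: operations build a graph, and eval() forces the computation
x = mx.random.normal((1024, 1024))
y = x @ x.T
mx.eval(y)

print(y.shape)              # 1024 x 1024
print(mx.default_device())  # Device(gpu, 0) on Apple silicon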

Run an LLM in one line

Once mlx-lm is installed, you can pull and run a model directly from Hugging Face. Let’s try a quantized Mistral 7B:

mlx_lm.generate \
  --model mlx-community/Mistral-7B-Instruct-v0.3-4bit \
  --prompt "Explain Swift's async/await in one paragraph for iOS developers."

You can also start an interactive chat session. The first run downloads the weights; later runs start much faster because the weights are cached under ~/.cache/huggingface in your home folder.

mlx_lm.chat --model mlx-community/Mistral-7B-Instruct-v0.3-4bit
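
If you want to see where the weights ended up, the Hugging Face cache layout is easy to inspect. A small sketch, assuming the default cache location under your home folder:

from pathlib import Path

# huggingface_hub stores downloaded snapshots as models--<org>--<name> directories
cache = Path.home() / ".cache" / "huggingface" / "hub"
for entry in sorted(cache.glob("models--mlx-community--*")):
    print(entry.name)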

