Apple’s MLX is an array framework for machine learning, built by Apple’s machine learning research team and designed around Apple silicon’s unified memory. Because the CPU and GPU share the same memory pool, arrays never need to be copied between devices; an operation simply runs on whichever device you dispatch it to. That makes MLX one of the most direct ways to run and fine-tune models locally on your Mac.
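To make that concrete, here is a minimal sketch of what unified memory looks like in code (it assumes MLX is already installed; setup is covered below). MLX operations are lazy and accept an optional stream argument, so the same array can be computed on the GPU or the CPU without any copies:
import mlx.core as mx

a = mx.random.normal((1024, 1024))   # allocated once in unified memory
b = a @ a                            # dispatched to the GPU by default
c = mx.add(a, a, stream=mx.cpu)      # same array, computed on the CPU, no copy
mx.eval(b, c)                        # MLX is lazy; eval forces the computation
print(b.shape, c.shape)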

In this post, you’ll learn how to:
- Set up MLX in a Conda environment
- Run a large language model (LLM) locally with mlx-lm
Prerequisites
- macOS on Apple silicon (M1/M2/M3/M4)
- Conda or Miniforge installed (recommended: miniforge.org)
- Python 3.10+
- At least 10 GB of free disk space for model weights
Create a Conda environment
Conda handles dependencies cleanly, and the MLX wheels are built for arm64. Let’s make a fresh environment:
# Create a new conda env with Python 3.11
conda create -n mlx python=3.11 -y
# Activate it
conda activate mlx
Now install MLX and the optional mlx-lm helper:
# Core MLX library
pip install -U mlx
# (Optional) LLM command-line tools
pip install -U mlx-lm
Note: MLX installs as a regular Python wheel with its C++ and Metal kernels precompiled, so no extra drivers or toolkits are needed. It automatically uses your Mac’s GPU.
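A quick sanity check after installing is to import MLX and print the default device; on Apple silicon it should report the GPU. This is just a minimal verification snippet, not an official install step:
import mlx.core as mx

print(mx.default_device())   # on Apple silicon this should report the GPU, e.g. Device(gpu, 0)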
Run an LLM in one line
Once mlx-lm is installed, you can pull and run a model directly from Hugging Face. Let’s try a quantized Mistral 7B:
mlx_lm.generate \
  --model mlx-community/Mistral-7B-Instruct-v0.3-4bit \
  --prompt "Explain Swift's async/await in one paragraph for iOS developers."
You can also start an interactive chat session. The first run downloads the weights to the Hugging Face cache (a .cache directory in your home folder), so later runs skip the download and start much faster.
mlx_lm.chat --model mlx-community/Mistral-7B-Instruct-v0.3-4bit
