Running LLaVA, an Open Source GPT-4V alternative, on a Mac with Apple Silicon

Nov 30, 2023

Multimodal LLMs have brought LLMs to a new dimension: when LLMs can see, their capabilities multiply. GPT-4V(ision) has produced some impressive demos, including understanding jokes in images, acting as an e-sports commentator, turning mockups into apps, and reproducing pictures in novel styles.

What if you could have a GPT-4V-like model running on your laptop? LLaVA is an open-source multimodal model similar to GPT-4V, reaching a relative score of 85.1% compared with GPT-4V in visual chat. Here's how to get LLaVA up and running on macOS.



Prerequisites: Install miniconda.

mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

Follow LLaVA installation instructions:

Clone the repository:

git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA

Install Python dependencies:

conda create -n llava python=3.10 -y
conda activate llava
python -m pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install torch==2.1.0 torchvision==0.16.0
pip uninstall bitsandbytes


You can use either the Web UI or the CLI for inference. You'll need to add --device mps when running the model worker or the CLI on Apple Silicon.
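
Before starting a worker, it's worth confirming that your PyTorch build actually exposes the MPS backend — a quick sanity check, separate from LLaVA itself:

```python
import torch

# Prints True on Apple Silicon with a recent PyTorch build;
# if it prints False, the model worker would fall back to the (much slower) CPU.
print(torch.backends.mps.is_available())
```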

Web UI:

First start the controller and the Gradio web server, then start a model worker, and go to http://localhost:7860/. You can either specify liuhaotian/llava-v1.5-13b as --model-path (models are downloaded to ~/.cache/huggingface/hub/), or download the models from Hugging Face and specify the local model path.

Sometimes the Web UI shows the error "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE." and the model worker crashes. Restarting the model worker resolves this.

# activate the conda env in each shell where you start the controller, web server or model worker
conda activate llava

# controller and gradio web server
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload

# launch model worker
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --device mps --model-path liuhaotian/llava-v1.5-13b
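
If you downloaded the weights yourself, point --model-path at the local directory instead. A sketch — the path below is hypothetical; substitute wherever you placed the model:

```shell
# ~/models/llava-v1.5-13b is a hypothetical location; use your actual model directory
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 \
    --worker http://localhost:40000 --device mps --model-path ~/models/llava-v1.5-13b
```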


CLI:

Substitute --image-file with an image URL or the path to a local file.

conda activate llava
python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-file "" \
    --device mps

Ending notes

If you just want to spin up the Web UI, an even easier way is llamafile, the latest hotness, which packages an LLM into a single executable with a local web server and CLI. Follow Simon Willison's guide: running an LLM locally becomes a matter of downloading and executing a single file.
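
For example, running the LLaVA llamafile looks like this — a sketch; the file name below is taken from that guide and may have changed, so check the guide for the current download link:

```shell
# make the downloaded llamafile executable, then run it
chmod 755 llava-v1.5-7b-q4-server.llamafile
./llava-v1.5-7b-q4-server.llamafile
# then open http://127.0.0.1:8080/ in your browser
```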