Running LLaVA, an Open Source GPT-4V alternative on a Mac with Apple Silicon

Nov 30, 2023

Multimodal LLMs have brought LLMs into new dimensions. When LLMs can see, their capabilities multiply. GPT-4V(ision) produced some impressive demos, including understanding jokes in images, acting as an e-sports commentator, turning mock-ups into apps, and reproducing pictures in novel styles.

What if you could have a GPT-4V-like model running on your laptop? LLaVA is an open-source multimodal model similar to GPT-4V, reaching an 85.1% relative score compared with GPT-4V on visual chat. Here's how to get LLaVA up and running on macOS.

LLaVA WebUI

Installation

Prerequisites: Install miniconda.

mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
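
After installation, you may want to initialize conda for your shell so that conda activate works in new terminals (a minimal sketch assuming zsh, the macOS default; swap in bash if that's your shell):

~/miniconda3/bin/conda init zsh
# open a new terminal (or source ~/.zshrc) so the change takes effect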

Follow the LLaVA installation instructions:

Clone the repository:

git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA

Install Python dependencies:

conda create -n llava python=3.10 -y
conda activate llava
python -m pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install torch==2.1.0 torchvision==0.16.0  # pin versions that work with the MPS backend
pip uninstall bitsandbytes  # bitsandbytes is not supported on Apple silicon

Usage

You can use either the Web UI or the CLI for inference. You need to add --device mps when running the model worker or the CLI on Apple silicon.
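
To confirm the MPS backend is actually available before launching anything, a quick check (it simply asks PyTorch whether Metal acceleration is usable):

conda activate llava
python -c "import torch; print(torch.backends.mps.is_available())"  # should print True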

Web UI:

First start the controller and the Gradio web server, then start a model worker, and go to http://localhost:7860/. You can either specify liuhaotian/llava-v1.5-13b as --model-path (models are downloaded to ~/.cache/huggingface/hub/), or download the model from https://huggingface.co/liuhaotian/llava-v1.5-13b ahead of time and point --model-path at the local directory (see the example after the commands below).

Sometimes I got the error "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE." on the Web UI and the model worker crashed. Restarting the model worker resolves this.

# activate the conda environment in every terminal where you start the controller, web server, or model worker
conda activate llava

# start the controller and the Gradio web server (run each in its own terminal)
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload

# launch model worker
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --device mps --model-path liuhaotian/llava-v1.5-13b
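
If you prefer to fetch the weights yourself rather than letting the model worker download them, one option (a sketch assuming the huggingface-cli that ships with huggingface_hub) is:

pip install -U "huggingface_hub[cli]"
huggingface-cli download liuhaotian/llava-v1.5-13b --local-dir ./llava-v1.5-13b
# then pass the local directory instead of the repo id: --model-path ./llava-v1.5-13b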

CLI:

Substitute --image-file with your image URL or the path to a local file.

conda activate llava
python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-file "https://llava-vl.github.io/static/images/view.jpg" \
    --device mps
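
The same command works with a local image; the path below is just a placeholder:

python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-file ~/Pictures/view.jpg \
    --device mps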

Ending notes

An even easier way, if you just want to spin up a Web UI, is llamafile, the latest hotness, which packages an LLM plus a local web server or CLI into a single executable. Follow Simon Willison's guide here, and running LLaVA locally is a matter of downloading and executing a single file.
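
If it helps, the rough shape of that workflow (a sketch; the exact file name and download URL come from the llamafile release assets linked in the guide) is:

# grab the LLaVA llamafile listed on the release page (name and URL may differ)
curl -L -o llava.llamafile <llamafile-download-url>
chmod +x llava.llamafile
./llava.llamafile   # starts a local web server; open the localhost URL it prints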