Serge is a chat interface based on llama.cpp for running Alpaca models. Fully self-hosted, no API key required. Fits in 4GB RAM and runs on CPU.

  • SvelteKit front end
  • Redis for storing chat records and parameters
  • FastAPI + langchain for the API, wrapping calls to llama.cpp with Python bindings


Setting up Serge is simple; it can be started with one command:

docker run -d \
         -v weights:/usr/src/app/weights -v datadb:/data/db/ \
         -p 8008:8008

Then just open http://localhost:8008/ in your browser. That’s it!

API documentation can be found at http://localhost:8008/api/docs
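
Because the container is started in the background (-d), the web UI and API may take a moment to respond. The following is a minimal sketch for waiting until one of the URLs above answers. It assumes curl is installed; wait_for_url is a hypothetical helper written for this illustration and is not part of Serge.

```shell
# Hypothetical helper (not part of Serge): poll a URL once per second
# until it responds successfully or the attempt limit is reached.
wait_for_url() {
    url="$1"
    attempts="${2:-30}"   # default: try up to 30 times
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard body
        if curl -fs -o /dev/null "$url"; then
            echo "ready: $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "not reachable: $url" >&2
    return 1
}
```

For example, wait_for_url http://localhost:8008/api/docs prints "ready: ..." once the API documentation page is served, and returns a non-zero status if it never comes up within the attempt limit.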

