Exllama Homepage, Documentation and Downloads – Llama HF Implementation Rewrite
Exllama is a more memory-efficient rewrite of the HF Transformers implementation of Llama, for use with quantized weights. Its main features:

- Designed for use with quantized weights
- Fast and memory-efficient inference (not just attention)
- Map across multiple devices
- Built-in (multiple) LoRA support
- Companion library for funky sampling functions

Note that this project is in the proof-of-concept & preview […]
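For context, loading a quantized model and generating text with Exllama looks roughly like the sketch below, modeled on the repository's example scripts. It assumes the script is run from the repo root (where model.py, tokenizer.py and generator.py live); the model directory is a placeholder, and exact class and attribute names may differ between versions.

```python
# Minimal sketch of Exllama usage, based on the repo's example scripts.
# Paths are placeholders; run from the repository root so the imports resolve.
import os, glob

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_directory = "/path/to/llama-13b-4bit-128g/"   # placeholder: a GPTQ-quantized model dir

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)            # read the model's config.json
config.model_path = model_path                       # point at the quantized weights
# config.set_auto_map("16,24")                       # optional: split layers across two GPUs (GB per device)

model = ExLlama(config)                              # load the quantized weights
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)                          # key/value cache used during generation
generator = ExLlamaGenerator(model, tokenizer, cache)

generator.settings.temperature = 0.95                # sampling settings
generator.settings.top_p = 0.65
generator.settings.top_k = 100

print(generator.generate_simple("Once upon a time,", max_new_tokens = 128))
```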