Local AI Hub
LLM ConfigsLLM Runner AIORankingsCoding AgentsLLM RunnersLLM Web UIMultimodal☕Support
Buy Me a Coffee

© 2026 Local AI Hub. All rights reserved.

LLM Runners

Discover the best tools for running local LLMs on your own hardware

RankApp NameOSPlatformSupportedExplanation
1
Ollama
Windows / macOS / LinuxCUDA / ROCm / Metal / CPULLM, RAG, MCP, APIMost popular local LLM runner. Single command: 'ollama run llama3'. Automatic model management, OpenAI-compatible API, Docker support, MCP agent integration.
2
LM Studio
Windows / macOS / LinuxCUDA / Vulkan / Metal / CPULLM, RAG, Chat, APIGUI-based local LLM client. Easy GGUF/Q4 model download and testing. Built-in chat interface, server mode for API access, model comparison tools.
3
llama.cpp
Windows / macOS / LinuxCUDA / Metal / Vulkan / CPU / WebGPULLM, Embeddings, RAGC++ inference engine for LLMs. GGUF format standard. High performance on all hardware. Supports 3-bit to 8-bit quantization. Base for most local LLM tools.
4
KoboldCPP
Windows / macOS / LinuxCUDA / Vulkan / Metal / CPU / WebGPULLM, Roleplay, RAG, Story modeWeb-based LLM frontend built on llama.cpp. Excellent for character roleplay and story mode. Smart context management, NovelAI models support, API compatible.
5
llamafile
Windows / macOS / LinuxCUDA / Metal / Vulkan / CPULLM, Embeddings, Image GenMozilla project: self-extracting executable containing llama.cpp + models. Single file distribution. No dependencies needed. Runs on any device. Code licensed under Apache 2.0.
6
vLLM
Linux / WindowsCUDA / TensorRT / ROCmLLM Serving, API, BatchHigh-throughput LLM serving engine. PagedAttention technology for memory efficiency. Supports Llama, Mistral, Qwen, Gemma. Production-ready API, multi-GPU support.
7
MLX
macOSApple Neural Engine / MetalLLM, Fine-tuning, InferenceApple-optimized ML framework. Best performance on Mac with Apple Silicon. Fast inference, easy fine-tuning. Supports Llama, Mistral, Gemma models natively.
8
Jan
Windows / macOS / LinuxCUDA / Vulkan / Metal / CPULLM, RAG, API, ExtensionsOpen-source, cross-platform LLM runner. Desktop app with clean UI. Supports GGUF, Safetensors models. Extension system, local API server, offline-first design.
9
Text Generation WebUI
Windows / macOS / LinuxCUDA / ROCm / Vulkan / CPULLM, Roleplay, RAG, TrainingFeature-rich web UI for local LLMs. One-click install, extension system. Supports llama.cpp, transformers, ExLlamaV2. Training, embedding, image generation plugins.
10
SillyTavern
Windows / macOS / Linux / DockerNode.js (connects to any backend)LLM Frontend, Roleplay, RAGAdvanced LLM frontend for character roleplay. Connects to Ollama, KoboldCPP, llama.cpp, API backends. Token visualization, story mode, avatar system, extensions.
11
ExLlamaV2
Linux / WindowsCUDALLM Inference, High PerformanceCUDA-optimized LLM inference engine. Faster than llama.cpp for FP16/A16. Supports GPTQ, AWQ, FP16 models. Best performance for NVIDIA GPUs 10XX+.
12
llama.cpp Forks — Aquila
Windows / macOS / LinuxCUDA / CPULLM, Chinese Languagellama.cpp fork optimized for Chinese language models. Supports Aquila, Baichuan, ChatGLM models. Better Chinese tokenization and context handling.
13
llama.cpp Fork — GPTQ.cpp
Windows / macOS / LinuxCUDA / CPULLM, GPTQ Modelsllama.cpp fork for GPTQ quantized models. Runs GPTQ models natively without conversion. Lower VRAM usage, good for 4-bit quantized models.
14
llama.cpp Fork —llama.cpp-Metal
macOSMetal / Apple SiliconLLM, Apple GPU AccelerationMetal-optimized branch of llama.cpp. Best performance on Mac with Apple Silicon. Uses Metal Performance Shaders for GPU acceleration.
15
llama.cpp Fork — WebLLM
Any (Browser)WebGPU / WASMLLM, Browser-based InferenceRun LLMs directly in browser via WebGPU. No server needed. Supports Phi-2, Llama, Mistral in browser. Privacy-first, fully client-side execution.
16
llama.cpp Fork — Ollama.cpp
Windows / macOS / LinuxCUDA / Vulkan / CPULLM, Lightweight RuntimeLightweight C++ runtime inspired by Ollama. Minimal dependencies, fast startup. Good for embedded systems and edge devices. Simple API for local inference.