OBLAIDISH NEWS
WhichLLM ranks local AI models by hardware performance

The WhichLLM GitHub tool benchmarks local large language models against specific hardware, helping developers pick the fastest, most efficient model for their system.

The WhichLLM GitHub repository, built by andyyyy64, benchmarks local large language models (LLMs) across real-world hardware setups, ranking them by speed, memory use, and compatibility [GitHub Repository]. Instead of relying on vendor claims, it tests models like Llama 3, Mistral, and Phi-3 on CPUs and GPUs from Intel, AMD, and NVIDIA, reporting which runs fastest on a Raspberry Pi, a MacBook, or a mid-tier gaming rig.

The tool runs standardized inference tasks (text generation, prompt loading, and token streaming) and logs latency, RAM draw, and CPU utilization for each. Results are sortable by device type, VRAM capacity, or model size, so a developer with 16GB RAM and no GPU can filter to see only viable models. One test showed Phi-3-mini outperforming Llama 3-8B on Intel Core i5 systems with 32GB RAM, even though it is the smaller model, due to optimized quantization [GitHub Repository].
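The filtering described above can be sketched in a few lines of Python. This is an illustration only, not WhichLLM's actual code or data format: the `BenchmarkRecord` fields, the `viable_models` helper, and the sample values are all hypothetical placeholders, not measurements from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    """One benchmark result. Field names are assumptions, not WhichLLM's schema."""
    model: str
    tokens_per_second: float
    peak_ram_gb: float
    requires_gpu: bool


def viable_models(records, ram_gb, has_gpu):
    """Return the records that fit the given machine, fastest first."""
    fits = [
        r for r in records
        if r.peak_ram_gb <= ram_gb and (has_gpu or not r.requires_gpu)
    ]
    return sorted(fits, key=lambda r: r.tokens_per_second, reverse=True)


# Placeholder values for illustration only; these are NOT real measurements.
SAMPLE = [
    BenchmarkRecord("model-a", 12.0, 6.5, False),
    BenchmarkRecord("model-b", 30.0, 14.0, True),
    BenchmarkRecord("model-c", 7.5, 18.0, False),
    BenchmarkRecord("model-d", 9.0, 10.0, False),
]

if __name__ == "__main__":
    # A machine with 16GB RAM and no GPU.
    for record in viable_models(SAMPLE, ram_gb=16, has_gpu=False):
        print(record.model, record.tokens_per_second)
```

With these placeholder records, the 16GB, no-GPU machine keeps only `model-a` and `model-d`: `model-b` needs a GPU and `model-c` exceeds the available RAM.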

Hardware directly shapes which models work in practice. A model that runs at 20 tokens/second on a high-end GPU may stutter below 1 token/second on integrated graphics. WhichLLM exposes these gaps, so users can avoid buying more hardware than they need or shipping a model too slow for its target device.
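For readers who want to see how a tokens/second figure is typically derived, here is a minimal sketch. It assumes a hypothetical `generate` callable that yields tokens one at a time. It is not how WhichLLM measures throughput, and `fake_generate` merely stands in for a real model runtime.

```python
import time
from typing import Callable, Iterable


def measure_throughput(
    generate: Callable[[str], Iterable[str]],
    prompt: str,
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time a streaming generation and report basic latency figures."""
    start = clock()
    first_token_at = None
    count = 0
    for _ in generate(prompt):
        if first_token_at is None:
            first_token_at = clock()  # marks time to first token
        count += 1
    elapsed = clock() - start
    return {
        "tokens": count,
        "time_to_first_token_s": (
            first_token_at - start if first_token_at is not None else None
        ),
        "tokens_per_second": count / elapsed if elapsed > 0 else 0.0,
    }


def fake_generate(prompt: str) -> Iterable[str]:
    """Hypothetical stand-in for a model: echoes the prompt word by word."""
    for word in prompt.split():
        yield word


if __name__ == "__main__":
    print(measure_throughput(fake_generate, "which model runs on my machine"))
```

The `clock` parameter is injectable so the function can be checked against known timestamps; on real hardware the default `time.perf_counter` applies, and results vary from run to run.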

Engineers shipping AI features need models that meet latency and resource constraints. WhichLLM replaces guesswork with data, showing exactly which model runs reliably on consumer hardware.

As local AI adoption grows, from edge devices to offline apps, tools that map models to real hardware will become essential. WhichLLM doesn’t rank models by size or hype. It answers a simpler question: which one actually works on your machine?
