AI & ML Advanced · By Samson Tanimawo, PhD · Published Apr 7, 2026 · 6 min read

On-Device LLMs: The 7B Sweet Spot

By 2026, a 7B-parameter quantised LLM runs comfortably on flagship phones and competently on laptops. It is the sweet spot for ‘local AI that works’.

Why 7B specifically

Models below 3B noticeably underperform on instruction following and reasoning. Models above 13B either don't fit in mobile memory or run too slowly to be usable. 7B is the largest size class that fits in roughly 4GB at 4-bit quantisation and runs at 5-15 tok/s on flagship hardware.
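The ~4GB figure is easy to sanity-check. A minimal back-of-envelope sketch; the function names, effective bits-per-weight, and KV-cache architecture parameters are my assumptions (a Mistral-7B-like config), not figures from the article:

```python
# Back-of-envelope memory budget for a quantised LLM.
# Assumptions: ~4.5 effective bits per weight (4-bit weights plus
# quantisation scales), and a Mistral-7B-like architecture for the
# KV cache (32 layers, 8 KV heads, head dim 128, fp16 cache).

def model_memory_gb(n_params: float, bits_per_weight: float = 4.5) -> float:
    """Approximate resident weight memory in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

def kv_cache_gb(n_tokens: int, n_layers: int = 32, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """KV-cache memory in GiB: one K and one V tensor per layer per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return n_tokens * per_token / 2**30

print(f"7B weights:        {model_memory_gb(7e9):.1f} GB")   # ~3.7 GB
print(f"13B weights:       {model_memory_gb(13e9):.1f} GB")  # ~6.8 GB, past mobile budgets
print(f"4k-token KV cache: {kv_cache_gb(4096):.1f} GB")      # ~0.5 GB
```

Weights plus a few-thousand-token KV cache lands just over 4GB for 7B, while 13B at the same quantisation is already near 7GB before any cache, which is why the class above 7B falls off the mobile cliff.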

It’s also the size where capability meets cost: a 7B model handles 70-80% of typical chat and tool-use tasks adequately, and the gap to frontier models shrinks every six months.

The model lineup

Runtime stacks
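As a concrete example of one widely used stack, llama.cpp ships CLI tools for quantising and running GGUF models locally; the model file names below are placeholders, not specific releases:

```shell
# Quantise an f16 GGUF export down to 4-bit (Q4_K_M is a common choice).
# File names are hypothetical placeholders.
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# Run interactively, offloading all layers to the GPU where available
# (-ngl 99), generating up to 128 tokens.
./llama-cli -m model-Q4_K_M.gguf -ngl 99 -p "Summarise this note:" -n 128
```

Equivalent paths exist on each platform (e.g. MLX on Apple silicon, MediaPipe/LiteRT on Android); the llama.cpp route is shown here because it spans phones, laptops, and servers with one model file.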

Where on-device wins

The use cases compound. Apple’s, Google’s, and Microsoft’s 2026 product strategies all assume that on-device 7B-class models become the baseline.