Llama 4 Scout, Meta's smaller release in the 4-series, scored 89.2% on HumanEval, the highest of any open-weight model, and came within two percentage points of GPT-4 on MBPP.

Notably, it lags the closed flagships on SWE-bench, a benchmark built from real-world software engineering tasks, where post-training and tool integration dominate.