
LLM Comparison 2026

An independent, citation-backed comparison of 12 leading large language models — covering parameter count, context window, input/output modalities, license, and knowledge cutoff. Column structure mirrors Wikipedia's Comparison of large language models for easy cross-reference.

By Daniel Rozin, Founder & Editor-in-Chief · Last updated: · Methodology

Key takeaways (as of May 2026)

  • Largest context windows: GPT-4.1, Gemini 2.0 Flash, and Gemini 2.5 Pro, all at 1M tokens.[2][5][6]
  • Most open: DeepSeek-V3 (MIT, 671B MoE) and Llama 3.3 70B (community license, 70.6B), both with downloadable weights.[10][7]
  • Multimodal leaders: GPT-4o (text/image/audio in, text/audio out) and Gemini 2.0 Flash (text/image/audio/video in, text/image out).[1][5]
  • Parameters undisclosed: OpenAI (GPT-4 series) and Anthropic (Claude 3 series) have not published parameter counts, and the table marks several other vendors' models the same way. Cells read “Undisclosed”, not estimated figures.

Comparison table

All data as of May 2026.[*] “Undisclosed” means the vendor has not published the value — not that it is unknown to us. Context window definitions vary; see methodology for details.

| Model | Vendor | Parameters | Context window | Input modalities | Output modalities | License | Knowledge cutoff |
|---|---|---|---|---|---|---|---|
| GPT-4o | OpenAI | Undisclosed[1] | 128K | Text, Image, Audio | Text, Audio | Proprietary | Oct 2023 |
| GPT-4.1 | OpenAI | Undisclosed[2] | 1M | Text, Image | Text | Proprietary | Undisclosed |
| Claude 3.5 Sonnet | Anthropic | Undisclosed[3] | 200K | Text, Image | Text | Proprietary | Apr 2024 |
| Claude 3.7 Sonnet | Anthropic | Undisclosed[4] | 200K | Text, Image | Text | Proprietary | Undisclosed |
| Gemini 2.0 Flash | Google DeepMind | Undisclosed[5] | 1M | Text, Image, Audio, Video | Text, Image | Proprietary | Aug 2024 |
| Gemini 2.5 Pro | Google DeepMind | Undisclosed[6] | 1M | Text, Image, Audio, Video | Text | Proprietary | Undisclosed |
| Llama 3.3 70B | Meta AI | 70.6B[7] | 128K | Text | Text | Llama 3 Community License (open weights) | Dec 2023 |
| Mistral Large 2 | Mistral AI | Undisclosed[8] | 128K | Text | Text | Proprietary | Undisclosed |
| Grok-3 | xAI | Undisclosed[9] | 131K | Text, Image | Text | Proprietary | Undisclosed |
| DeepSeek-V3 | DeepSeek | 671B total / 37B active (MoE)[10] | 128K | Text | Text | MIT (open weights) | Undisclosed |
| Qwen2.5-72B | Alibaba Cloud | 72.7B[11] | 128K | Text | Text | Qwen License (Apache 2.0 base with usage restrictions) | Undisclosed |
| Command R+ | Cohere | Undisclosed[12] | 128K | Text | Text | Proprietary | Undisclosed |
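
For readers who want to cross-check the key takeaways against the table programmatically, the rows can be loaded into a small data structure and queried. The following is a minimal sketch, not part of the site's tooling: the names `ModelRow`, `context_tokens`, `largest_context`, and `open_weights` are hypothetical, only four of the eight columns are encoded, and treating "K" as 1,000 and "M" as 1,000,000 tokens is an assumption, since vendors define context windows differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    model: str
    vendor: str
    context_window: str  # label as printed in the table, e.g. "128K" or "1M"
    license: str


def context_tokens(label: str) -> int:
    """Convert a label such as '128K' or '1M' to an approximate token count.

    Assumption: K = 1,000 and M = 1,000,000; vendors may count differently.
    """
    units = {"K": 1_000, "M": 1_000_000}
    return int(float(label[:-1]) * units[label[-1]])


# Values copied from the comparison table above.
ROWS = [
    ModelRow("GPT-4o", "OpenAI", "128K", "Proprietary"),
    ModelRow("GPT-4.1", "OpenAI", "1M", "Proprietary"),
    ModelRow("Claude 3.5 Sonnet", "Anthropic", "200K", "Proprietary"),
    ModelRow("Claude 3.7 Sonnet", "Anthropic", "200K", "Proprietary"),
    ModelRow("Gemini 2.0 Flash", "Google DeepMind", "1M", "Proprietary"),
    ModelRow("Gemini 2.5 Pro", "Google DeepMind", "1M", "Proprietary"),
    ModelRow("Llama 3.3 70B", "Meta AI", "128K",
             "Llama 3 Community License (open weights)"),
    ModelRow("Mistral Large 2", "Mistral AI", "128K", "Proprietary"),
    ModelRow("Grok-3", "xAI", "131K", "Proprietary"),
    ModelRow("DeepSeek-V3", "DeepSeek", "128K", "MIT (open weights)"),
    ModelRow("Qwen2.5-72B", "Alibaba Cloud", "128K",
             "Qwen License (Apache 2.0 base with usage restrictions)"),
    ModelRow("Command R+", "Cohere", "128K", "Proprietary"),
]


def largest_context(rows):
    """Return the models that share the largest context window."""
    best = max(context_tokens(r.context_window) for r in rows)
    return [r.model for r in rows if context_tokens(r.context_window) == best]


def open_weights(rows):
    """Return the models whose license cell is marked '(open weights)'."""
    return [r.model for r in rows if "open weights" in r.license]


if __name__ == "__main__":
    print("Largest context:", largest_context(ROWS))
    print("Open weights:", open_weights(ROWS))
```

Filtering on the literal "(open weights)" marker keeps the sketch faithful to how the License column is worded; a model whose license cell lacks that marker is not counted, whatever its actual distribution terms.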

How we compile this table

Our full methodology covers column definitions, source tiers, the “Undisclosed” policy, recency policy, COI disclosure, and our correction process. Column structure intentionally mirrors Wikipedia's LLM comparison article for editor cross-reference.


Sources

  1. GPT-4o (OpenAI). GPT-4o system card; OpenAI models API reference. Accessed May 2026.
  2. GPT-4.1 (OpenAI). OpenAI models API reference. Accessed May 2026.
  3. Claude 3.5 Sonnet (Anthropic). Anthropic models list. Accessed May 2026.
  4. Claude 3.7 Sonnet (Anthropic). Anthropic Claude 3.7 Sonnet announcement; Anthropic models list. Accessed May 2026.
  5. Gemini 2.0 Flash (Google DeepMind). Google AI Gemini model page. Accessed May 2026.
  6. Gemini 2.5 Pro (Google DeepMind). Google AI Gemini 2.5 Pro model page. Accessed May 2026.
  7. Llama 3.3 70B (Meta AI). Meta Llama 3.3 announcement; Llama 3.3 70B HuggingFace model card; Llama 3 technical report (arXiv 2407.21783). Accessed May 2026.
  8. Mistral Large 2 (Mistral AI). Mistral AI models. Accessed May 2026.
  9. Grok-3 (xAI). xAI Grok-3 announcement. Accessed May 2026.
  10. DeepSeek-V3 (DeepSeek). DeepSeek-V3 technical report (arXiv 2412.19437); DeepSeek-V3 GitHub. Accessed May 2026.
  11. Qwen2.5-72B (Alibaba Cloud). Qwen2.5 technical report (arXiv 2412.15115); Qwen2.5-72B HuggingFace model card. Accessed May 2026.
  12. Command R+ (Cohere). Cohere Command R+ model page. Accessed May 2026.
  * Data reflects information available from primary vendor sources as of May 2026. This is a fast-moving field; check the "Last updated" date above and consult vendor documentation for the latest specifications.