How We Compare Large Language Models

A Versus B

Last updated: May 22, 2026 · Author: Daniel Rozin

This page describes every column, data source, and editorial decision behind the LLM comparison guide. Column structure mirrors the Wikipedia “Comparison of large language models” article so editors can cross-reference our data directly.

1. Column definitions (Wikipedia-parity)

Column	Definition	Primary source type
Model name	Official model identifier as used in vendor API	Vendor API model list or release announcement
Vendor	Organization that trained and maintains the model	Vendor corporate page
Parameters	Total parameter count (MoE: total / active); 'Undisclosed' when vendor has not published	Vendor technical report (arXiv) or model card; NOT leaked or estimated figures
Context window	Maximum tokens per inference call as stated by the vendor; vendors differ on whether this counts input only or input+output (see note)	Vendor API documentation (capability table)
Modality (input)	Input types: text, image, audio, video, code	Vendor API capabilities table or model card
Modality (output)	Output types: text, image, audio, code	Vendor API capabilities table or model card
License	End-user license; 'Open weights' if weights are downloadable	Vendor Terms of Service or GitHub license file
Knowledge cutoff	Nominal training data cutoff date published by vendor; 'Undisclosed' when not published	Vendor API documentation or model card
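The Parameters rule ("total / active" for MoE models, "Undisclosed" when nothing is published) can be expressed as a small formatting function. This is a hypothetical sketch, not A Versus B's actual tooling: the names `ParamCount` and `format_parameters` are invented here, and it assumes counts are stored as raw integers and rendered in billions (B) or trillions (T).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParamCount:
    """Hypothetical record for the Parameters column.

    total:  total parameter count, or None if the vendor has not published it.
    active: active parameters per token for MoE models; None for dense models.
    """
    total: Optional[int]
    active: Optional[int] = None


def _human(n: int) -> str:
    # Render a raw count in billions (B) or trillions (T), dropping a trailing ".0".
    if n >= 10**12:
        value, unit = n / 10**12, "T"
    else:
        value, unit = n / 10**9, "B"
    text = f"{value:.1f}".rstrip("0").rstrip(".")
    return f"{text}{unit}"


def format_parameters(p: ParamCount) -> str:
    # No published figure: the cell reads "Undisclosed", never an estimate.
    if p.total is None:
        return "Undisclosed"
    # MoE models are shown as "total / active".
    if p.active is not None:
        return f"{_human(p.total)} / {_human(p.active)}"
    return _human(p.total)
```

Keeping "Undisclosed" as an explicit branch, rather than a default number, means a missing figure can never be silently rendered as a guess.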

Context window note: Some vendors report context as input-only tokens; others report combined input+output. We report the figure stated in the vendor's API documentation and flag the definition used in the table footnote.
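The context-window rule can be sketched as follows: the vendor's stated figure is reported unchanged, and only a footnote marker records which definition the vendor uses. The enum, the marker strings "[1]" and "[2]", and the function names are hypothetical illustrations; the real table's footnote scheme may differ.

```python
from enum import Enum


class ContextDefinition(Enum):
    """How a vendor's documentation counts the context window (hypothetical labels)."""
    INPUT_ONLY = "input-only"
    INPUT_PLUS_OUTPUT = "input+output"


# Hypothetical footnote markers, one per definition.
FOOTNOTE_MARKERS = {
    ContextDefinition.INPUT_ONLY: "[1]",
    ContextDefinition.INPUT_PLUS_OUTPUT: "[2]",
}


def format_context_cell(tokens: int, definition: ContextDefinition) -> str:
    # Report the vendor's figure as-is; no conversion between definitions.
    return f"{tokens:,} tokens {FOOTNOTE_MARKERS[definition]}"


def footnote_text(definition: ContextDefinition) -> str:
    # Footer line explaining what the marker means.
    return f"{FOOTNOTE_MARKERS[definition]} Vendor documentation counts {definition.value} tokens."
```

Flagging rather than converting avoids inventing an output-token budget the vendor never stated.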

2. Data sources

Tier 1 (required): Vendor API documentation, model card, or official release blog post — cited with URL and access date.
Tier 1 (required): arXiv technical report authored by the vendor's research team, used for parameter counts and architecture details.
Tier 2 (acceptable for benchmarks): LMSYS Chatbot Arena leaderboard (chat.lmsys.org), a public, community-run leaderboard cited with its snapshot date; Hugging Face Open LLM Leaderboard for open-weight models.
Disallowed: Vendor self-reported benchmark numbers without independent reproduction, Twitter/X announcements as sole source, leaked parameter estimates, AI-generated summaries.
Also disallowed: Wikipedia and any Wikipedia mirror or fork, which is never a citable source for a cell value. The about.sameAs Wikipedia link (schema §1) is an entity reference only, never a citation. This prevents circular sourcing (WP:CIRCULAR).
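The tier rules above amount to a simple allow/deny check per citation. The sketch below is hypothetical: the source-type labels are invented for illustration, and the Wikipedia test only matches wikipedia.org hostnames, so mirrors and forks would need a separately maintained domain list.

```python
from urllib.parse import urlparse

# Hypothetical source-type labels mirroring the tiers above.
TIER_1 = {"vendor_docs", "model_card", "vendor_blog", "vendor_arxiv_report"}
TIER_2_BENCHMARK = {"lmsys_arena", "hf_open_llm_leaderboard"}
DISALLOWED = {
    "vendor_self_reported_benchmark",
    "social_post_only",
    "leaked_estimate",
    "ai_summary",
}


def is_wikipedia(url: str) -> bool:
    # Matches wikipedia.org and its language subdomains (en., de., ...).
    host = (urlparse(url).hostname or "").lower()
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def source_allowed(source_type: str, url: str, is_benchmark: bool) -> bool:
    # Wikipedia is never citable for a cell value, whatever label it carries.
    if is_wikipedia(url):
        return False
    if source_type in DISALLOWED:
        return False
    if source_type in TIER_1:
        return True
    # Tier 2 sources are acceptable only for benchmark cells.
    return is_benchmark and source_type in TIER_2_BENCHMARK
```

Checking the URL before the label means a mislabeled Wikipedia link is still rejected.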

Undisclosed values: When a vendor has not published a figure (e.g., OpenAI for GPT-4 parameters, Anthropic for Claude 3 parameters), the cell reads “Undisclosed” and cites the Tier 1 model card or documentation that omits the figure. We do not substitute estimated or leaked numbers.

3. Recency policy

Context windows, knowledge cutoffs, and model versions are updated within 2 weeks of a vendor releasing a new stable model. The page's dateModified stamp reflects the last substantive content edit. All time-sensitive cells carry an “as of [YYYY-MM]” note in the table footer.
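The two-week window and the footer note format can be checked mechanically. This is a minimal sketch under stated assumptions: the function names are hypothetical, and it treats a row as overdue only from the day after the 14-day deadline, if it has not been edited since the release.

```python
from datetime import date, timedelta

# "Within 2 weeks" of a new stable release.
UPDATE_WINDOW = timedelta(weeks=2)


def update_deadline(release: date) -> date:
    # Last day on which the update still counts as on time.
    return release + UPDATE_WINDOW


def is_overdue(release: date, last_updated: date, today: date) -> bool:
    # Overdue: no edit since the release, and the deadline has passed.
    return last_updated < release and today > update_deadline(release)


def as_of_note(d: date) -> str:
    # Footer note in the "as of [YYYY-MM]" format.
    return f"as of {d:%Y-%m}"
```

For example, a model released on 2026-05-01 must be reflected in the table by 2026-05-15.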

4. Conflict-of-interest disclosure

A Versus B has no paid relationships with any AI vendor. No model vendor reviewed or approved this guide before publication. A Versus B does not license or resell any of the APIs listed in the comparison table. The author (Daniel Rozin) holds no equity in any listed organization.

5. Correction policy

Corrections backed by a primary source may be submitted to contact@aversusb.net. We aim to respond within 48 hours and publish corrections with a visible correction notice and an updated dateModified timestamp.

CC-BY-4.0 covers aversusb.net editorial text and table layout; vendor names, logos, and marks remain the property of their owners.
