Last updated: · Author: Daniel Rozin
How We Compare Large Language Models
This page describes every column, data source, and editorial decision behind the LLM comparison guide. Column structure mirrors the Wikipedia “Comparison of large language models” article so editors can cross-reference our data directly.
1. Column definitions (Wikipedia-parity)
| Column | Definition | Primary source type |
|---|---|---|
| Model name | Official model identifier as used in vendor API | Vendor API model list or release announcement |
| Vendor | Organization that trained and maintains the model | Vendor corporate page |
| Parameters | Total parameter count (MoE: total / active); 'Undisclosed' when the vendor has not published a figure | Vendor technical report (arXiv) or model card; NOT leaked or estimated figures |
| Context window | Maximum token window per inference call as stated by the vendor (input-only or combined input+output; see note) | Vendor API documentation (capability table) |
| Modality (input) | Input types: text, image, audio, video, code | Vendor API capabilities table or model card |
| Modality (output) | Output types: text, image, audio, code | Vendor API capabilities table or model card |
| License | License governing end use; 'Open weights' when the weights are publicly downloadable | Vendor Terms of Service or GitHub license file |
| Knowledge cutoff | Nominal training data cutoff date published by vendor; 'Undisclosed' when not published | Vendor API documentation or model card |
Context window note: Some vendors report context as input-only tokens; others report combined input+output. We report the figure stated in the vendor's API documentation and flag the definition used in the table footnote.
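The column rules above, including the 'Undisclosed' fallback, the MoE "total / active" format, and the context-window definition flag, can be expressed as a typed record. A minimal Python sketch follows; the class, field, and method names (ModelRow, context_definition, parameters_cell) are illustrative assumptions, not the data schema this site actually uses.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

# Hypothetical encoding of the context-window definition flagged in the table footnote.
ContextDefinition = Literal["input-only", "input+output"]


@dataclass
class ModelRow:
    """One comparison-table row; field names are illustrative only."""
    model_name: str
    vendor: str
    total_params: Optional[int]    # None when the vendor has not published a figure
    active_params: Optional[int]   # set only for mixture-of-experts (MoE) models
    context_window: int            # tokens, as stated in the vendor API documentation
    context_definition: ContextDefinition
    input_modalities: list[str] = field(default_factory=list)
    output_modalities: list[str] = field(default_factory=list)
    license: str = ""
    knowledge_cutoff: Optional[str] = None  # "YYYY-MM", or None when undisclosed

    def parameters_cell(self) -> str:
        """Render Parameters: 'total / active' for MoE, 'Undisclosed' if unpublished."""
        if self.total_params is None:
            return "Undisclosed"
        if self.active_params is not None:
            return f"{self.total_params:,} / {self.active_params:,}"
        return f"{self.total_params:,}"

    def context_cell(self) -> str:
        """Render Context window with the definition the vendor uses."""
        return f"{self.context_window:,} ({self.context_definition})"

    def knowledge_cutoff_cell(self) -> str:
        """Render Knowledge cutoff, falling back to 'Undisclosed'."""
        return self.knowledge_cutoff or "Undisclosed"
```

For example, a hypothetical closed model with no published parameter count and a 128,000-token combined window renders as "Undisclosed" and "128,000 (input+output)"; no estimate is ever substituted for the missing figure.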
2. Data sources
- Tier 1 (required): Vendor API documentation, model card, or official release blog post — cited with URL and access date.
- Tier 1 (required): arXiv technical report authored by the vendor research team — used for parameter counts and architecture details.
- Tier 2 (acceptable for benchmarks): the LMSYS Chatbot Arena leaderboard (chat.lmsys.org), a public, community-run leaderboard cited with its snapshot date; for open-weight models, also the Hugging Face Open LLM Leaderboard.
- Disallowed: Vendor self-reported benchmark numbers without independent reproduction, Twitter/X announcements as sole source, leaked parameter estimates, AI-generated summaries.
- Wikipedia and any Wikipedia mirror/fork: never a cite-worthy source for a cell value. The about.sameAs link to Wikipedia (schema §1) is an entity reference only, never a citation. This prevents circular sourcing (WP:CIRCULAR).
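The sourcing tiers above amount to a per-cell acceptance rule. The Python sketch below encodes it under stated assumptions: the SourceKind labels and the is_acceptable helper are hypothetical, and a benchmark cell is assumed to need an independent leaderboard (Tier 2), since vendor self-reported scores are disallowed, while every other cell needs a Tier 1 source.

```python
from enum import Enum


class SourceKind(Enum):
    """Hypothetical labels for the source categories listed above."""
    VENDOR_DOCS = "vendor_docs"        # API docs, model card, official release post
    VENDOR_ARXIV = "vendor_arxiv"      # arXiv report authored by the vendor team
    ARENA_LEADERBOARD = "arena"        # LMSYS Chatbot Arena snapshot
    OPEN_LLM_LEADERBOARD = "open_llm"  # Hugging Face Open LLM Leaderboard
    SOCIAL_POST = "social_post"        # Twitter/X announcement
    LEAKED_ESTIMATE = "leaked"
    AI_SUMMARY = "ai_summary"
    WIKIPEDIA = "wikipedia"            # includes mirrors and forks


TIER_1 = {SourceKind.VENDOR_DOCS, SourceKind.VENDOR_ARXIV}
TIER_2_BENCHMARK = {SourceKind.ARENA_LEADERBOARD, SourceKind.OPEN_LLM_LEADERBOARD}
NEVER_CITABLE = {SourceKind.LEAKED_ESTIMATE, SourceKind.AI_SUMMARY, SourceKind.WIKIPEDIA}


def is_acceptable(sources: list[SourceKind], is_benchmark: bool) -> bool:
    """Return True if the sources cited for one cell satisfy the policy."""
    kinds = set(sources)
    # Leaks, AI summaries, and Wikipedia are never cite-worthy, even as supplements.
    if kinds & NEVER_CITABLE:
        return False
    required = TIER_2_BENCHMARK if is_benchmark else TIER_1
    # At least one qualifying source must be present; a Twitter/X post
    # may accompany it but can never stand alone.
    return bool(kinds & required)
```

Under this rule a Twitter/X announcement passes only when paired with a vendor model card, and a vendor-reported benchmark score fails unless a leaderboard snapshot is also cited.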
Undisclosed values: When a vendor (e.g., OpenAI for GPT-4 parameters, Anthropic for Claude 3 parameters) has not published a figure, the cell reads “Undisclosed” and cites the Tier 1 model card or documentation that omits it. We do not substitute estimated or leaked numbers.
3. Recency policy
Context windows, knowledge cutoffs, and model versions are updated within two weeks of a vendor releasing a new stable model. The page's dateModified stamp reflects the last substantive content edit. All time-sensitive cells carry an “as of [YYYY-MM]” note in the table footer.
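The two-week update window and the “as of” footer note reduce to simple date arithmetic. A minimal sketch, assuming hypothetical function names and that “two weeks” means 14 calendar days counted from the vendor's release date:

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: the two-week recency window is measured in calendar days.
UPDATE_WINDOW = timedelta(weeks=2)


def update_overdue(release: date, last_updated: Optional[date], today: date) -> bool:
    """True if a new stable release has not been reflected within the window."""
    # A table edit on or after the release date means the release is covered.
    if last_updated is not None and last_updated >= release:
        return False
    return today - release > UPDATE_WINDOW


def as_of_note(snapshot: date) -> str:
    """Format the footer note for a time-sensitive cell."""
    return f"as of {snapshot:%Y-%m}"
```

Exactly 14 days after a release still counts as on time in this sketch; day 15 without an update is flagged as overdue.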
4. Conflict-of-interest disclosure
A Versus B has no paid relationships with any AI vendor. No model vendor reviewed or approved this guide before publication. A Versus B does not license or resell any of the APIs in this table. The author (Daniel Rozin) holds no equity in any listed organization.
5. Correction policy
Corrections with a primary source may be submitted to contact@aversusb.net. We aim to respond within 48 hours and publish corrections with a visible correction notice and updated dateModified timestamp.
The CC BY 4.0 license covers aversusb.net editorial text and table layout; vendor names, logos, and marks remain the property of their owners.