2026-06-05

Measuring TTRPG Complexity with Language Models

The first complexity matrix came back and said 3D6 Eas — a game designed for accessible play — was roughly as complex as OSRIC. OSRIC is a 412-page AD&D retroclone that takes an experienced GM to run. Something was wrong.

The number was 4.8 on a 1–10 index. OSRIC scored 7.3. Sword of Cepheus, the primary design target, scored 4.0. We were closer to OSRIC than to our own benchmark. And the player combat turn was clocking in at 14 steps versus 5 for Dungeon World and ICRPG.

The problem turned out to be the measurement, not the game. Getting to the right measurement required working through two layers of methodological error, and the process of doing that with a language model taught me something useful about how to use them well.


the project, briefly

3D6 Eas is a low-fantasy medieval TTRPG set in a world called Eas (pronounced Ey-aas). 11th century tech level. Core mechanic is 3d6 + modifiers vs. a Target Number; success margin becomes the Effect, which determines outcome quality. Careers are a lifepath system in the Cepheus Engine lineage. Skills are a GURPS-derived cascade. ORC licensed.
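A minimal sketch of that core roll, for readers who think in code. It assumes a total that meets the Target Number counts as a success (the post does not say either way) and treats the Effect as the signed margin. The names `resolve` and `Outcome` are illustrative, not taken from the rulebook.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    total: int     # 3d6 plus modifiers
    success: bool  # assumption: meeting the TN succeeds
    effect: int    # signed margin against the TN


def resolve(modifiers: int, target_number: int,
            rng: Optional[random.Random] = None) -> Outcome:
    """Roll 3d6 + modifiers against a Target Number and report the margin."""
    rng = rng or random.Random()
    roll = sum(rng.randint(1, 6) for _ in range(3))
    total = roll + modifiers
    effect = total - target_number
    return Outcome(total=total, success=effect >= 0, effect=effect)
```

How the margin maps onto named Effect categories is left out here, because the post does not give the bands.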

Design goals: complete enough to run a campaign, accessible enough to pick up without a tutorial session, under 180 pages total. The gap it’s trying to occupy: Cepheus-level content depth without the GM prep overhead.

To understand where the design stood, i asked Claude to build a complexity matrix benchmarked against 8 games — Dungeon World, ICRPG, Sword of Cepheus, OSRIC, Fantastic Heroes & Witchery, Qin: The Warring States, Cepheus Modern, and Modern War. DW and ICRPG are the award-winner targets (DW won the 2012 Golden Geek RPG of the Year and the 2013 ENnie for Best Rules; ICRPG has 10,000+ playtesters and runs Creative Commons). SoC at 140 pages is the completeness benchmark. OSRIC is the “this is too complex” warning sign.


the first error: measuring documentation, not complexity

The combat turn table had our game at 14 steps. Step 1: “Conditions (GM describes).” Step 3: “GM sets TN.”

Every TTRPG requires the GM to describe the scene before players act. ICRPG literally has the GM write TARGET on an index card at the start of every scene, which is more explicit than anything 3D6 does. But ICRPG’s combat turn counted 5 steps. Ours counted 14.

The matrix was penalizing us for naming something every other game does but doesn’t document. Explicit procedure description was being counted as added complexity. That’s not a complexity measurement; it’s a measurement of how thorough your documentation is.

Same issue with ability scores. 3D6 has 9: STR, DEX, CON, INT, WIS, CHA, SOC, LUCK, MAGERY. The matrix flagged SOC, LUCK, and MAGERY as “exception cases” — three complexity penalties.

But Sword of Cepheus already has SOC as a core characteristic. Same concept, same name. LUCK functions identically to ICRPG’s Hero Coin: you spend a token to get a reroll or a boost. If those count as exceptions in our system they should count in the systems that already have them. They don’t.

The matrix was adding complexity for named equivalents of things the benchmark games do without naming. The fix here is straightforward: don’t count implicit universal steps, and apply like-for-like equivalents consistently. When you do, the player combat turn drops from 14 steps to 3–4. The scores that looked like exception cases turn out to be the same engine used three more times.
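One way to picture the corrected counting rule is as a filter over a documented turn. The step list below is a hypothetical excerpt: only "Conditions (GM describes)" and "GM sets TN" are named in the post, and the `universal` flag is an assumed annotation a reviewer would have to supply by hand.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    universal: bool  # True if every benchmark game does this, documented or not


# Hypothetical excerpt of a documented combat turn (not the full 14 steps).
documented_turn = [
    Step("Conditions (GM describes)", universal=True),
    Step("Player declares action", universal=True),
    Step("GM sets TN", universal=True),
    Step("Roll 3d6 + modifiers", universal=False),
    Step("Compare to TN", universal=False),
    Step("Read Effect from margin", universal=False),
]


def counted_steps(turn: List[Step]) -> List[Step]:
    """Keep only steps that are specific to this system."""
    return [step for step in turn if not step.universal]


system_specific = counted_steps(documented_turn)
```

The filter is trivial on purpose. The judgment call is the `universal` flag, and that is the part the matrix got wrong.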


the second error: counting options as rules

The player knowledge count had us at 55+ rules — comparable to OSRIC. This one took longer to diagnose.

The useful frame here came from a design history i hadn’t fully connected to game complexity: the shift from AD&D 2e to D&D 3.0 is the story of spaghetti code becoming object-oriented design.

When WotC acquired TSR in 1997, they applied design principles from Magic: The Gathering’s architecture to tabletop RPGs. MTG’s Sixth Edition rules overhaul in 1999 introduced the Stack and — more relevantly — keyword abilities and creature subtypes as modular tags. Instead of paragraph-long card text for Flying or Trample, you get one keyword that references a central rule. The card is a template instance. The rule is the engine.

D&D 3.0 in 2000, designed by Monte Cook, Jonathan Tweet, and Skip Williams, brought that architecture to TTRPGs. Feats like [Cleave] and [Combat Reflexes] are keyword abilities. Spell descriptors like [Fire] and [Mind-Affecting] are tags — a monster with “Immunity to [Mind-Affecting]” covers all past and future spells with that descriptor without needing a list. The Undead creature type is a template that grants specific properties: d12 hit die, no Constitution score, immunity to critical hits and poison. You assign the tag; the central rule handles everything it implies.

D&D 3e looks more complex than AD&D 2e because it has more feats, more subtypes, more descriptors. But each individual feat costs the player nothing new to learn, because they all plug into the same underlying structure. The options are content. The rules are engines.

The complexity matrix was treating options as rules. 28 skills counted as 28 facts. 13 careers counted as 13 rules. 9 ability scores counted as 9 subsystems. In reality:

  • Skill(Specialization) is 1 cascade engine with N tags. You learn the engine once — Roll + Skill(Blacksmithing) vs TN — and every skill works identically regardless of whether the tag is “Blacksmithing” or “J-Drive Repair.” Adding a new specialization is adding content, not learning a new procedure.
  • Score → Modifier is 1 engine. SOC, LUCK, and MAGERY are three more parameters running through it. Not three more subsystems.
  • Each career entry is a template instance in the lifepath engine, not a new rule.

When you count only engines, 3D6 Eas has 11 rules. Sword of Cepheus has 10. The one-rule gap is mass combat — a strategic scale engine SoC skips and 3D6 includes because the setting needs it.
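The difference between the two counting methods can be sketched in a few lines. The counts (28 skills, 13 careers, 9 ability scores) come from the post; the option names are placeholders, and the sketch covers only these three categories, not the full rule count.

```python
from collections import Counter

# (engine, option) pairs. Counts are from the post; names are placeholders.
options = (
    [("skill cascade", f"skill_{i}") for i in range(28)]
    + [("lifepath", f"career_{i}") for i in range(13)]
    + [("score -> modifier", f"score_{i}") for i in range(9)]
)

# Naive method: every option is treated as a separate rule to learn.
naive_rule_count = len(options)

# Corrected method: each engine is learned once, however many options use it.
options_per_engine = Counter(engine for engine, _ in options)
engine_count = len(options_per_engine)

print(naive_rule_count, engine_count)
```

Under the naive method these three categories alone contribute 50 "rules"; under the engine method they contribute 3.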


what the 5-pillar framework makes visible

The clearest way i’ve found to count rules consistently is the Core Game Loop framework. Every TTRPG is built on five pillars:

  1. Resolution: how the game answers “did i succeed and what happens next?” Binary vs. Graded matters here. Graded systems (DW’s 10+/7-9/6- bands, 3D6’s Effect categories) mean failures generate narrative consequences rather than dead ends. That’s a feature, not complexity.
  2. Attrition: the depleting resource bundle. 3D6 has two personal attrition engines: HP from wounds, FP from sustained effort (physical exhaustion and magical drain). But there’s a third engine operating at a different scale: Superiority. Superiority handles extended contests where the unit of comparison is resources, not individuals: sappers vs. wall-builders racing against the clock, or a merchant-and-bard network spreading a rumor vs. an isolated faction with no reach. The underlying principle is the same in every case: the side with more aggregate resources (time, skill, capital, reach) has better odds. That one principle covers work contests, social campaigns, and operational logistics as template instances. It’s not three separate rules; it’s one engine with three application domains. That’s what separates it from just counting heads (3 arguing vs. 2 arguing): Superiority makes resources the variable, not bodies.
  3. Scale/Scope: which zoom level the rules operate at. 3D6 has three: personal combat, operational travel, strategic mass conflict. SoC has two. That gap is the mass combat engine.
  4. Recovery: how tension releases and at what rate. This is the genre dial. Fast recovery (DW’s Make Camp = full heal) produces heroic pacing. Slow recovery (days of rest, medical care required) produces gritty adventure. 3D6 sits in the gritty-adventure zone, same quadrant as SoC. An optional Fast Play mode (full FP recovery between scenes) moves it to heroic for one-shots or pickup play, with no engine changes, just a recovery rate adjustment.
  5. Decision Space: the quality of choices available. The distinction that matters is Orthogonal vs. Redundant. A sword that does 1d8 and a longsword that does 1d8+1 is a false choice: one is mathematically superior. A sword with a free parry vs. an axe that deals double damage to shields is orthogonal: they solve different problems. 3D6’s SOC/LUCK trade-off is orthogonal by design: high SOC gives legal privilege, wealth, and social deference; low LUCK is the cost (LUCK = 16 − SOC modifier). A noble PC and a street agent have genuinely different problem domains. That’s not complexity, that’s meaningful differentiation.

The counting rule across all five pillars: rules are engines. Options are template instances within engines. Feats, ability scores, skills, careers, spells: these are options. Don’t count them as rules.


what the full comparison reveals: the player-facing dimension

Once the corrected methodology was applied to all 9 books, four clusters emerged:

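| Cluster | Books | Score | Engines | GM rolls? | Tables |
|---|---|---|---|---|---|
| Award Winners | DW, ICRPG | 1.2–1.3 | 1 | DW: No / ICRPG: Yes | 0–4 |
| Cepheus + 3D6 | CM, 3D6, SoC | 3.2–4.0 | 1 | Mixed | 23–33 |
| Specialized | Qin, MW, FHW | 5.0–5.5 | 3–5 | Yes | 24–37 |
| Reference | OSRIC | 7.3 | 7+ | Yes | 47 |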

The corrected score of 3.5 places 3D6 in the right family. But the full comparison surfaced something the complexity score alone didn’t capture: whether the GM rolls dice.

This matters more than table count. In Sword of Cepheus and Cepheus Modern, the GM rolls for every NPC action. At 6 steps per entity, 4 NPCs in an encounter means 24 steps of GM work per round. In DW and 3D6, the GM doesn’t roll at all: the encounter has 2–3 steps for the GM regardless of how many enemies are on the table. ICRPG is partially player-facing: the GM rolls for monsters and hazards on the GM’s turn, but the single TARGET keeps the overhead lower than Cepheus-family games. That’s constant vs. linear scaling, the difference between a GM tracking narrative momentum and a GM doing modifier math under pressure for every entity every round.
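The scaling difference is easy to state as a function. This is a sketch under the figures given in the comparison (6 GM steps per entity for SoC and CM, a fixed 2–3 steps for player-facing games, with 3 used here as the upper bound); the function and parameter names are illustrative.

```python
def gm_steps_per_round(entities: int, steps_per_entity: int,
                       player_facing: bool, fixed_steps: int = 3) -> int:
    """GM workload for one round of an encounter.

    Player-facing games cost the GM a fixed number of steps; GM-rolled
    games cost steps_per_entity for every NPC on the table.
    """
    if player_facing:
        return fixed_steps
    return entities * steps_per_entity


for n in (1, 4, 8):
    cepheus = gm_steps_per_round(n, steps_per_entity=6, player_facing=False)
    eas = gm_steps_per_round(n, steps_per_entity=6, player_facing=True)
    print(f"{n} entities: GM-rolled {cepheus} steps, player-facing {eas} steps")
```

With 4 entities the GM-rolled case gives 24 steps, matching the figure above, while the player-facing case stays at 3 however large the encounter gets.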

3D6 was designed player-facing from the start. The complexity score, even the corrected one, gave 3D6 and SoC the same mark in the resolution dimension because they share the same core engine. The GM-facing column wasn’t in the original measurement criteria. Adding it splits the Cepheus cluster: SoC and CM scale with encounter size; 3D6 doesn’t.

The table counts place 3D6 in the Cepheus family: 24 tables vs. Cepheus Modern’s 23, SoC’s 33, OSRIC’s 47. But table count is another metric that needs unpacking before it means anything.

Most of those tables are prep tools, not runtime lookups. A random encounter table, a career skills list, an equipment price table: these are used before or between sessions, offline, at the GM’s leisure. They reduce prep burden; they don’t slow the table down while players are waiting their turn. A GM with a random encounter table spends less time inventing content from scratch, not more time interrupting play to consult a chart.

The tables that actually add runtime complexity are a different category: THAC0 matrices, saving throw grids, combat action look-ups — tables the GM has to consult mid-round while players wait. OSRIC’s 47 tables include a lot of those. 3D6’s 24 tables are weighted toward the other kind: character creation, equipment, career paths, encounter seeds.

So the table count confirms content richness in the Cepheus family range. It doesn’t confirm runtime complexity. By GM load (no rolls, a constant 2–3 steps per round), 3D6 is closest to DW. The clusters made visible what the single-score matrix was averaging away, but even the cluster data needed one more pass before the table number meant what it appeared to mean.


what this means for using language models on design problems

The LLM produced a plausible matrix with real data from real books. The numbers were internally consistent. The benchmarks were accurate. The methodology was just wrong, and wrong in a way the model couldn’t catch because it was never given the right measuring instrument.

Once i pushed back with the right question (“is naming a universal implicit step the same as adding complexity?”) and the right frame (umbrella mechanics vs template instances), the model corrected cleanly. The synthesis was good. The counting was good. What it couldn’t do on its own was notice that the unit of measurement was broken.

This is the pattern i keep seeing. LLMs are excellent at “given a methodology, apply it consistently across N examples.” They are unreliable at “determine whether this methodology is asking the right question.” That second judgment requires knowing what a table actually feels like when you’re running it: which rules cause a full stop at the table and which ones players absorb mid-session. That’s outside the text.

Right use pattern: bring the framework, let the model count. Bring the right question, let the model synthesize. The model can tell you your combat turn has 14 steps under one definition of “step.” You have to be the one who notices that 3 of those steps happen in every TTRPG and shouldn’t count.


The corrected finding: 3D6 Eas has 11 rules. SoC has 10. The gap is one strategic-scale engine. Everything else — the skills, careers, heritages, spells, DR values — is content. 80 options across 11 engines.

The cluster data gives that number meaning. By complexity score (3.5), 3D6 is Cepheus family — complete content, manageable system. By GM load (no GM rolls, 2–3 steps per round constant regardless of encounter size), 3D6 sits closest to DW among all 9 books. ICRPG has GM rolls for monsters; every Cepheus-family book scales with entity count. The design that looked like “almost OSRIC” turned out to be “Cepheus depth with DW runtime.” That was always the intent. The measurement just had to catch up.

Whether 11 engines is the right number for a GM picking this up cold on a Saturday afternoon is still an open question. But it’s a question you can actually work with once you’re measuring the right thing.

Side-by-Side Comparison — All 8 Books + 3D6 Eas


1. At a Glance — Raw Stats


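| | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Pages | 2 | ~100 | 140 | 86 | 293 | 272 | 427 | 412 | ~150 |
| Year/style | 2012 | 2018 | 2020 | 2019 | 2020 | 2005 | 2014 | 2006 | 2026 |
| Core die | 2d6 | d20 | 2d6 | 2d6 | 2d6 | Yin/Yang d10 | d20 | d20 | 3d6 |
| Genre | Fantasy | Universal | Sword & Sorcery | Modern | Military | Wuxia | Fantasy+SciFi | Fantasy | Medieval Fantasy |
| License | CC | CC | OGL | OGL | OGL | Publisher | OGL | OGL | ORC |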

2. Resolution — Engines


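| | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Resolution engines | 1 | 2 | 2 | 2 | 3 | 5 | 5+ | 7+ | 3 |
| One mechanic for everything? | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
| Outcome type | Graded 3-band | Binary+Effort | Graded Effect | Graded Effect | Graded+mods | Yin/Yang levels | Binary | Binary/matrix | Graded Effect |
| Roll type | 2d6+stat | d20+stat | 2d6+skill | 2d6+skill | 2d6+skill+mods | d10 pair | d20+variable | d20+variable | 3d6+mods |

Failure drives story? (some cells missing in the source; surviving values in source order): ✅ 6- GM move, Partial, Partial, Partial, Partial, Fail Effect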

DW/ICRPG/SoC/CM/3D6 all use one mechanic for everything. MW/Qin/FHW/OSRIC split resolution across subsystems.


3. Player Turn — Steps

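| Step | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Roll | 2d6+stat | d20+stat | 2d6+skill | 2d6+skill | 2d6+skill+mods | Yin/Yang d10 | d20 | d20 | 3d6+mods |
| Compare to threshold | ✓ TARGET | ✓ TARGET | ✓ TN | ✓ TN | ✓ TN+mods | ✓ Aspect | ✓ THAC0/AC | ✓ THAC0/AC | TN |
| Engine calls | 2 | 2 | 3 | 3 | 4 | 5 | 4 | 5 | 3-4 |

Steps with cells missing in the source (surviving values in source order):

  • Identify which action/move: none
  • Calculate margin: ✓ Effect, ✓ Effect, ✓ Effect, ✓ Specific table, Effect
  • Determine category: ✓ 10+/7-9/6-, Degrees
  • Roll damage: ✓ Effort
  • Apply DR/cover: none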

3D6 comes in at 3–4 engine calls against SoC’s 3. The Effect step is shared with SoC; it’s the one step DW and ICRPG eliminate.


4. GM Load — The Decisive Dimension

4A. Does the GM Roll?


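| | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| GM rolls dice? | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
| Entities per encounter | N/A | N/A | 2-6 | 2-5 | 4-12 | 2-6 | 3-6 | 3-8 | 2-6 |
| Steps per entity | 2 | 2 | 6 | 6 | 7-8 | 6-7 | 6-8 | 7-8 | 3 |
| Steps for 4 entities | 2 | 2 | 24 | 24 | 28-40 | 24-28 | 24-32 | 28-32 | 3 |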

4B. Entity Card Complexity


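| | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Stats per entity | 3-4 tags | 3-4 fields | 7-9 | 7-9 | 8-10 | 8-10 | 7-9 | 7-9 | 3 fields |
| Lookups per entity action | 0 | 0 | 2-3 | 2-3 | 3-5 | 2-3 | 3-4 | 3-4 | 0 |
| Morale check? | No | No | Yes | Yes | Yes | Yes | Yes | Yes | No |
| Saving throws for entity? | No | No | No | No | No | No | Yes | Yes | No |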

This is the split that matters. DW and 3D6 give the GM 2-3 steps/round regardless of encounter size — GM never rolls. ICRPG has GM rolls for monsters/hazards on the GM’s turn (p.91: “The GM represents environmental hazards and monsters and rolls for those on her turn”). All other books scale GM steps linearly with entity count.


5. Player Knowledge

5A. What a Player Must Learn

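| Knowledge | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Engines to learn | 1 | 2 | 4 | 4 | 5 | 5 | 7+ | 7+ | 4-5 |
| Ability scores | 6 | 6 | 6+3 | 6+3 | 6 | 5 Aspects | 6 | 6 | 6+3 |
| Skills | 0 | 0 | 25 | 20 | 15 | 20 | 0 | 0 | 28 |
| Special resources | HP | HP | HP+Stamina | HP+Stamina | HP+Wounds | HP+Chi+Breath | HP+Psi | HP+Spells | HP+FP |
| Character sheet fields | 10-12 | 8 | 20+ | 18 | 15 | 22 | 20 | 18 | 25+ |
| Total engines | 1 | 2 | 4 | 4 | 5 | 5 | 7+ | 7+ | 4-5 |

Play as you read? (no per-book values survive in the source)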

5B. What a GM Must Know

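| Knowledge | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Player engines | 1 | 2 | 4 | 4 | 5 | 5 | 7+ | 7+ | 4-5 |
| Adjudication rules | Agenda+Principles+Moves | TARGET+Loot | TN table+reactions | TN table | Modifier stack | Social | Subsystems | All | TN+Attitudes+Hazards |
| Entity creation time | 10 sec | 30 sec | 5 min | 3 min | 10 min | 10 min | 5 min | 5 min | 30 sec |
| Special subsystems | None | None | Sorcery+Corruption | Psi+Cyber | Panic+Injury | 4 magic systems | Psionics+Vehicles | Alignment+Planes | Mass combat+Magic+Surgery |
| Total GM engines | ~5 | ~5 | ~15 | ~12 | ~18 | ~18 | ~20 | ~30+ | ~18 |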

6. Reference Tables

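| Chapter | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Intro | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Core Mechanic | 1 | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 3 |
| Character Creation | 0 | 0 | 4 | 3 | 3 | 3 | 4 | 6 | 4 |
| Skills | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| Equipment | 0 | 1 | 5 | 6 | 10+ | 4 | 8 | 8 | 6+ |
| Combat | 0 | 1 | 4 | 5 | 6 | 3 | 6 | 4 | 5 |
| Magic | 0 | 0 | 8 | 2 | 0 | 3 | 5 | 8 | 5 |
| Monsters/NPCs | 0 | 1 | 3 | 2 | 2 | 2 | 4 | 6 | 0 |
| Treasure | 0 | 1 | 4 | 2 | 2 | 1 | 4 | 8 | 0 |
| Environment/Travel | 0 | 0 | 2 | 1 | 3 | 5 | 4 | 5 | 0 |
| Total | 1 | 4 | 33 | 23 | 28+ | 24 | 37 | 47 | 24 |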

Books with <10 tables: DW (1), ICRPG (4).
Books with 10-30 tables: CM (23), Qin (24), 3D6 Eas (24), MW (28+).
Books with 30+ tables: SoC (33), FHW (37), OSRIC (47).

Note — table type matters more than table count. Prep tables (random encounters, equipment prices, career skills, treasure) reduce GM prep burden without adding runtime complexity — the GM consults them offline, not while players wait. Runtime tables (THAC0 matrices, saving throw grids, combat lookups) add complexity at the table. OSRIC’s 47 tables skew toward runtime lookups. 3D6’s 24 tables skew toward prep tools. Table count is a content-richness signal, not a complexity signal, unless the table type is specified.
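The Total row in the chapter-by-chapter count can be rechecked by summing each book’s column. This is a small sketch with the per-chapter figures transcribed from the table; the "10+" and "6+" Equipment entries are entered at their floor values, which is an assumption about how the totals were produced.

```python
books = ["DW", "ICRPG", "SoC", "CM", "MW", "Qin", "FHW", "OSRIC", "3D6 Eas"]

# Tables per chapter, one value per book, in the column order above.
tables_by_chapter = {
    "Intro":              [0, 0, 1, 0, 0, 0, 0, 0, 0],
    "Core Mechanic":      [1, 0, 1, 1, 1, 2, 2, 2, 3],
    "Character Creation": [0, 0, 4, 3, 3, 3, 4, 6, 4],
    "Skills":             [0, 0, 1, 1, 1, 1, 0, 0, 1],
    "Equipment":          [0, 1, 5, 6, 10, 4, 8, 8, 6],  # 10+ and 6+ as floors
    "Combat":             [0, 1, 4, 5, 6, 3, 6, 4, 5],
    "Magic":              [0, 0, 8, 2, 0, 3, 5, 8, 5],
    "Monsters/NPCs":      [0, 1, 3, 2, 2, 2, 4, 6, 0],
    "Treasure":           [0, 1, 4, 2, 2, 1, 4, 8, 0],
    "Environment/Travel": [0, 0, 2, 1, 3, 5, 4, 5, 0],
}

# Sum each book's column across all chapters.
totals = {
    book: sum(row[i] for row in tables_by_chapter.values())
    for i, book in enumerate(books)
}

for book, total in totals.items():
    print(f"{book}: {total}")
```

The column sums reproduce the Total row (with MW at its floor of 28). The prep-vs-runtime split would need a per-table tag that the count does not record.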


7. Character Creation


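| | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Steps | 5 | 4 | 8 | 7 | 6 | 7 | 5 | 5 | 6 |
| Time | 5 min | 10 min | 30 min | 20 min | 20 min | 60+ min | 45 min | 30 min | 20-30 min |
| Sheet fields | 10-12 | 8 | 20+ | 18 | 15 | 22 | 20 | 18 | 25+ |

Rows with cells missing in the source (surviving values in source order):

  • Solo?: none
  • Lifepath/career?: No, No, No, No, No, No
  • Playbook first?: No, No, No, No, No, No
  • Example?: ✅ fiction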

8. Time to First Play


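| | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Character creation | 5 min | 10 min | 30 min | 20 min | 20 min | 60+ min | 45 min | 30 min | 20-30 min |
| Rules reading | 2 min | 15 min | 30 min | 15 min | 60 min | 45 min | 60 min | 60 min | 30-45 min |
| GM prep (first session) | 5 min | 5 min | 15 min | 10 min | 30 min | 30 min | 30 min | 30+ min | 15-30 min |
| Total to first play | ~12 min | ~30 min | ~75 min | ~45 min | ~110 min | ~135 min | ~135 min | ~120 min | ~65-105 min |

Includes tutorial? (some cells missing in the source; surviving values in source order): No (play sheets), No, No, No, ✅ fiction, No, No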
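The "Total to first play" row is the sum of the three phases above it, which can be checked with a short sketch. Each figure is entered as a (low, high) range in minutes; "60+" and "30+" are entered at their floors, which is an assumption about how the totals were rounded.

```python
# (low, high) minutes for: character creation, rules reading, GM prep.
phases = {
    "DW":      [(5, 5), (2, 2), (5, 5)],
    "ICRPG":   [(10, 10), (15, 15), (5, 5)],
    "SoC":     [(30, 30), (30, 30), (15, 15)],
    "CM":      [(20, 20), (15, 15), (10, 10)],
    "MW":      [(20, 20), (60, 60), (30, 30)],
    "Qin":     [(60, 60), (45, 45), (30, 30)],   # chargen listed as 60+
    "FHW":     [(45, 45), (60, 60), (30, 30)],
    "OSRIC":   [(30, 30), (60, 60), (30, 30)],   # GM prep listed as 30+
    "3D6 Eas": [(20, 30), (30, 45), (15, 30)],
}


def time_to_first_play(parts):
    """Sum the low ends and the high ends of each phase separately."""
    return (sum(lo for lo, _ in parts), sum(hi for _, hi in parts))


totals = {book: time_to_first_play(parts) for book, parts in phases.items()}

for book, (lo, hi) in totals.items():
    label = f"~{lo} min" if lo == hi else f"~{lo}-{hi} min"
    print(f"{book}: {label}")
```

Every total in the table matches the sum of its column, including the 65–105 minute range for 3D6 Eas.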

9. Teaching & Presentation


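| | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Teaching quality | Excellent | Excellent | High | High | Medium | Medium | Low | Medium | Medium |
| Worked examples | 1 | 1 | 4 | 0 | 0 | 1 | 0 | 1 | 0 |
| Cross-references | None | None | Minimal | Minimal | Moderate | Heavy | Heavy | Very heavy | Minimal |
| Book structure | Play sheets | Core+optional | Linear | Linear | Doctrine-heavy | Setting-first | Catalog | Catalog | Linear |

Rows with cells missing in the source (surviving values in source order):

  • Self-contained chapters?: none
  • Reader can play alone?: ❌ (missing chapters)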

10. All Dimensions — Summary Matrix

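| Dimension (1-10 scale) | DW | ICRPG | SoC | CM | MW | Qin | FHW | OSRIC | 3D6 Eas |
|---|---|---|---|---|---|---|---|---|---|
| Core engine simplicity | 10 | 9 | 8 | 8 | 6 | 5 | 4 | 2 | 7 |
| Player turn speed | 10 | 10 | 8 | 8 | 6 | 6 | 6 | 5 | 8 |
| GM load (lighter load scores higher) | 10 | 10 | 2 | 2 | 1 | 2 | 2 | 1 | 9 |
| Player knowledge (less to learn scores higher) | 10 | 10 | 6 | 7 | 5 | 4 | 4 | 4 | 5 |
| GM knowledge (less to learn scores higher) | 10 | 10 | 5 | 6 | 3 | 3 | 3 | 2 | 3 |
| Table burden (fewer tables scores higher) | 10 | 9 | 3 | 4 | 3 | 4 | 2 | 1 | 4 |
| Chargen speed | 10 | 9 | 5 | 6 | 6 | 2 | 4 | 5 | 6 |
| Time to play | 10 | 9 | 5 | 7 | 3 | 2 | 2 | 3 | 6 |
| Teaching quality | 10 | 10 | 8 | 7 | 4 | 5 | 2 | 3 | 4 |
| Content depth | 2 | 3 | 8 | 6 | 9 | 8 | 9 | 10 | 7 |
| Rule efficiency (content vs. speed) | 2 | 3 | 7 | 6 | 5 | 4 | 3 | 2 | 6 |
| Modularity | 5 | 8 | 7 | 9 | 4 | 5 | 5 | 3 | 7 |
| GM tools | 6 | 4 | 7 | 3 | 9 | 3 | 7 | 10 | 5 |
| Setting depth | 1 | 2 | 3 | 1 | 6 | 10 | 3 | 1 | 5 |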

11. Book Clusters

Low Complexity ────────────────────────────────────────── High Complexity

DW (1.2)           SoC (4.0)                  FHW (5.5)
ICRPG (1.3)        CM (3.2)                   OSRIC (7.3)
                   3D6 Eas (3.5)              MW (5.2)
                                              Qin (5.0)

Award Winners ──── Cepheus Family + 3D6 ──── Comprehensive/Heavy

Cluster 1 — Award Winners (1.2-1.3): DW, ICRPG

  • One engine. Zero-lookup play. Play as you read.
  • DW: GM never rolls (GM Moves are narrative, not dice). ICRPG: GM rolls for monsters/hazards on GM turn.
  • Sacrifice: Content depth. Minimal equipment, no skills, no careers, thin setting.

Cluster 2 — Cepheus Family + 3D6 (3.2-4.0): SoC, CM, 3D6 Eas

  • One engine. Effect math. GM rolls in SoC/CM but NOT in 3D6. Moderate tables.
  • Balance: Content depth WITH manageable system. Skills, equipment, careers exist but follow engine rules.

Cluster 3 — Specialized (5.0-5.5): Qin, MW, FHW

  • Multiple engines. GM rolls. Heavy modifiers. Career/class depth.
  • Sacrifice: Speed. Character creation 45-60+ min. GM load 24-40 steps/round.

Cluster 4 — Comprehensive Reference (7.3): OSRIC

  • 7+ engines. GM rolls everything. 47 tables. Reference-first structure.
  • Sacrifice: Teachability. Cannot learn from this book alone. GM must pre-read extensively.

12. Where Each Book Wins (Best in Class)

| Metric | Best Book | Why |
|---|---|---|
| Fastest first play | DW (12 min) | Play sheets. Read and play. |
| Easiest GM load | DW, 3D6 | GM never rolls. DW uses narrative moves; 3D6 is player-facing by design. ICRPG GM rolls for monsters/hazards. |
| Lowest table burden | DW (1 table) | Everything fits on character sheet. |
| Most modular | CM (86 pages) | Use only chapters you need. |
| Best GM tools | OSRIC | 47 tables, random dungeon gen, monster gen. |
| Best setting depth | Qin | 70+ pages of social conventions, culture, history. |
| Best chargen speed | DW (5 min) | Pick playbook, assign stats, go. |
| Best teaching | DW/ICRPG | Playable in 15 min. Every chapter is a procedure. |
| Most content depth per page | SoC (140 pages) | Complete game in minimal pages. |
| Best compression rules | ICRPG | TARGET + Hearts + Effort die = everything. |
| Best player-facing design | DW/3D6 | All outcomes from one roll. Symmetrical. |
| Highest orthogonal options | SoC/3D6 | Careers, SOC/LUCK trade-off create real differences. |

13. Where 3D6 Eas Sits Among Them

                     GM Never Rolls
                           │
               DW ●        │        ● 3D6 Eas
                           │
    ICRPG ● ─ ─ ─ ─ ─ ─ ─ ─┤  (GM rolls monsters/hazards only)
                           │
                           ├── SoC ●  (GM rolls all NPCs)
                           │   CM ●
                           │
                           ├── Qin ●
                           │   MW ●
                           │   FHW ●
                           │
                           └── OSRIC ●

                  GM Rolls for Everything

3D6 is player-facing — GM never rolls. It shares that property only with DW among the 9 books. ICRPG is partial: the GM rolls for monsters and environmental hazards on the GM’s turn (p.91), but the single TARGET keeps overhead lower than the Cepheus family. The books on the bottom half cannot fix their GM load without rebuilding their resolution system from scratch. 3D6 does not have that problem.

What 3D6 shares with the Cepheus family: Effect math, table count (~24), career depth, skill systems, equipment granularity.
What separates 3D6 from the Cepheus family: Player-facing (GM never rolls), FP (broader attrition), mass combat, SOC/LUCK orthogonal design.
What 3D6 shares with DW: GM never rolls, single core engine, graded outcomes, constant GM load regardless of encounter size.
What separates 3D6 from the award winners: Effect calculation, 24 tables vs 1-4, FP tracking, no worked examples, no fast play mode.
