Summary of "Jonathan Blow on JSON and custom serialization"
High-level summary
- Core argument: parsing involves two separate problems: (1) reading primitive values (numbers, strings) and (2) validating file data and mapping it into the exact structures the program expects. JSON helps with (1) but not with (2). Without a validating load step, type and presence checks must happen dynamically wherever the data is used, which costs performance and spreads error handling and uncertainty across the program.
- JSON is fine for quick prototyping, but a custom, well-designed serialization format (a text form for development plus a binary form for shipping) is superior for production: more readable, faster to load, and free of runtime ambiguity.
Technical concepts and system design
Two-phase parsing
- Primitive reading
  - Straightforward, especially for binary formats.
  - Involves parsing raw numbers, strings, booleans, etc.
- Structural validation & placement
  - Crucial step the loader must perform.
  - Maps parsed primitives into concrete, typed program structures and validates them.
  - JSON does not solve this; you must implement it to avoid runtime errors and checks.
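The two phases can be sketched as follows. This is a minimal Python illustration under stated assumptions, not code from the talk: the `Door` entity, its fields, `DOOR_SCHEMA`, and `LoadError` are hypothetical names invented for the example. JSON parsing covers phase 1; everything after `json.loads` is the phase 2 work that JSON itself does not provide.

```python
import json
from dataclasses import dataclass


class LoadError(Exception):
    """Raised once, at load time, when file data does not match the expected shape."""


@dataclass
class Door:
    # Hypothetical entity type, used only for illustration.
    x: float
    y: float
    locked: bool


# Expected field names and the primitive types each one may hold (an assumption).
DOOR_SCHEMA = {"x": (int, float), "y": (int, float), "locked": (bool,)}


def load_door(text: str) -> Door:
    # Phase 1: primitive reading. The JSON parser handles numbers, strings, booleans.
    try:
        raw = json.loads(text)
    except json.JSONDecodeError as err:
        raise LoadError(f"malformed input: {err}") from err

    # Phase 2: structural validation and placement into a typed structure.
    if not isinstance(raw, dict):
        raise LoadError("expected an object at the top level")
    unknown = sorted(set(raw) - set(DOOR_SCHEMA))
    if unknown:
        raise LoadError(f"unknown fields: {unknown}")
    for name, allowed in DOOR_SCHEMA.items():
        if name not in raw:
            raise LoadError(f"missing field: {name}")
        value = raw[name]
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        if isinstance(value, bool) and bool not in allowed:
            raise LoadError(f"field {name} must be a number, got a boolean")
        if not isinstance(value, allowed):
            raise LoadError(f"field {name} has wrong type: {type(value).__name__}")
    return Door(x=float(raw["x"]), y=float(raw["y"]), locked=raw["locked"])
```

After `load_door` returns, the rest of the program can read `door.x` or `door.locked` without re-checking types or presence, because every failure mode has already been reported in one place.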
Runtime costs of leaving data in generic JSON trees
- Slower access (random access into a generic tree).
- Requires runtime checks everywhere the code touches data.
- Encourages dropping or mishandling errors (because checks are scattered and ad-hoc).
Versioning
- Versioning is only difficult when forward compatibility is required.
- JSON provides no built-in version semantics or validation — you still need version-aware loading/validation logic.
- The approach used by Blow's team:
  - Per-field version stamps (member version numbers) determine which fields to read or skip during load.
  - Deprecated fields are retained in the loader to support older versions; when shipping, files are typically updated to the latest version and entries left over from development churn are removed.
  - Forward compatibility is not supported: newer-format files cannot be meaningfully read by older code.
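Per-field version stamps can be sketched roughly as follows. This is a minimal Python illustration, not the loader described in the talk: the field table, the field names, `CURRENT_VERSION`, and the flat list-of-values file layout are all hypothetical. Each field records the version that introduced it and, if deprecated, the version that dropped it; the loader derives the expected layout from the file's version and refuses files newer than itself.

```python
from dataclasses import dataclass
from typing import Any, Optional

CURRENT_VERSION = 3  # hypothetical: the newest format this loader understands


@dataclass(frozen=True)
class Field:
    name: str
    added: int                     # first file version that contains this field
    removed: Optional[int] = None  # first file version that no longer contains it
    default: Any = None            # value used when an older file lacks the field


# Hypothetical field table. Order matters: values appear in files in this order.
FIELDS = [
    Field("position_x", added=1, default=0.0),
    Field("position_y", added=1, default=0.0),
    Field("old_color_index", added=1, removed=3),  # deprecated, kept so old files load
    Field("scale", added=2, default=1.0),
    Field("tint", added=3, default="white"),
]


def fields_in_version(version: int) -> list:
    """Return the fields a file of the given version contains, in file order."""
    return [
        f for f in FIELDS
        if f.added <= version and (f.removed is None or version < f.removed)
    ]


def load_entity(version: int, values: list) -> dict:
    # No forward compatibility: a file newer than the loader is rejected outright.
    if version < 1 or version > CURRENT_VERSION:
        raise ValueError(f"unsupported file version {version}")
    layout = fields_in_version(version)
    if len(values) != len(layout):
        raise ValueError(f"expected {len(layout)} values, got {len(values)}")

    # Start from defaults for every live field, then overwrite with file data.
    entity = {f.name: f.default for f in FIELDS if f.removed is None}
    for f, value in zip(layout, values):
        if f.removed is None:
            entity[f.name] = value
        # Deprecated fields are read (to stay in step with the file) and discarded.
    return entity
```

A version 1 file such as `load_entity(1, [1.0, 2.0, 7])` yields an entity with the deprecated value dropped and `scale` and `tint` filled from defaults. A real loader might convert deprecated values into their replacements instead of discarding them.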
Readability vs. human-editability
- Entity text format is human-readable and usually cleaner than raw JSON.
  - Supports comments, concise lines, and markers.
- They use markers (a star) to indicate values that differ from defaults so default lines can be omitted.
  - This makes files smaller and easier to reason about.
  - Changing a default can automatically propagate to entities that did not override it.
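The override-marker idea can be sketched as below. The summary does not give the actual syntax, so the `* name value` line format, the `#` comment character, and the field names are assumptions made for illustration. Only fields that differ from the defaults are written; loading starts from the defaults and applies the starred lines on top.

```python
# Hypothetical defaults for one entity type; every value is a float for simplicity.
DEFAULTS = {"scale": 1.0, "health": 100.0, "speed": 4.0}


def serialize(entity: dict, defaults: dict) -> str:
    """Write only the fields whose values differ from the defaults."""
    lines = []
    for name in defaults:
        if entity[name] != defaults[name]:
            # The star marks a value that overrides the default.
            lines.append(f"* {name} {entity[name]!r}")
    return "\n".join(lines)


def parse(text: str, defaults: dict) -> dict:
    """Start from the defaults, then apply each starred override line."""
    entity = dict(defaults)
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if len(parts) != 3 or parts[0] != "*":
            raise ValueError(f"malformed line: {raw_line!r}")
        _, name, value = parts
        if name not in defaults:
            raise ValueError(f"unknown field: {name}")
        entity[name] = float(value)
    return entity
```

Because omitted fields are resolved at load time, parsing the same file against a changed defaults table picks up the new default for every field the entity did not override.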
Binary shipping format
- Many entities are packed together into a binary format for shipping.
- The binary format loads faster. Floats are written as hex floats, which encode a value's exact bits, to avoid floating-point divergence between platforms.
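To illustrate why hex floats avoid divergence, here is a small Python sketch using the built-in `float.hex` and `float.fromhex`. The helper names are hypothetical, and this shows the general property of hexadecimal float notation rather than the format from the talk: the text records the exact mantissa and exponent, so reading it back never depends on how a platform rounds decimal text.

```python
import struct


def to_hex_text(value: float) -> str:
    # float.hex() writes the exact mantissa and exponent,
    # for example 0.1 becomes '0x1.999999999999ap-4'.
    return value.hex()


def from_hex_text(text: str) -> float:
    # float.fromhex() reconstructs the identical double; no decimal rounding occurs.
    return float.fromhex(text)


def bits(value: float) -> bytes:
    # The raw 8-byte little-endian IEEE 754 representation, for exact comparison.
    return struct.pack("<d", value)


value = 0.1 + 0.2

# The hex round trip preserves every bit of the original value.
restored = from_hex_text(to_hex_text(value))
assert bits(restored) == bits(value)

# Decimal text with too few digits silently changes the value.
truncated = float(f"{value:.6f}")
assert truncated != value
```

Full-precision decimal output can also round-trip, but only if both the writer and the reader convert decimals correctly; hex notation sidesteps that dependency.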
Hierarchy and entity design
- Prefer limited, controlled hierarchies; avoid deep arbitrary hierarchical data for internal entity systems.
- Field names in the text format are for human readability only:
  - The loader skips names and relies on known field ordering derived from the version number.
Save system differences
- Level source data / entities are stored in the described entity format.
- Campaign/save state uses a different approach:
  - It stores diffs of entities plus a large undo stack rather than serializing entire entities on each save, which reduces save size.
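The diff-based save idea can be sketched as follows. This is a simplified Python illustration, not the save system from the talk: entities are modeled as plain dictionaries keyed by a hypothetical entity ID, the undo stack is left out, and deleted entities are not handled. A save records only the fields that differ from the level's source data, and loading reapplies them on top of that data.

```python
_MISSING = object()  # sentinel for "field not present in the base data"


def diff_entities(base: dict, current: dict) -> dict:
    """Return {entity_id: {field: new_value}} for fields that differ from base."""
    diff = {}
    for entity_id, fields in current.items():
        base_fields = base.get(entity_id, {})
        changed = {
            name: value
            for name, value in fields.items()
            if base_fields.get(name, _MISSING) != value
        }
        if changed:
            diff[entity_id] = changed
    return diff


def apply_diff(base: dict, diff: dict) -> dict:
    """Rebuild the current state from the base data plus a saved diff."""
    result = {entity_id: dict(fields) for entity_id, fields in base.items()}
    for entity_id, changed in diff.items():
        result.setdefault(entity_id, {}).update(changed)
    return result


# Level source data (hypothetical) and the state after the player opens one door.
level = {1: {"x": 0, "open": False}, 2: {"x": 5, "open": False}}
played = {1: {"x": 0, "open": False}, 2: {"x": 5, "open": True}}

save = diff_entities(level, played)  # only entity 2's "open" field is stored
assert apply_diff(level, save) == played
```

The save contains one changed field instead of two full entities, which is where the size reduction comes from.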
Practical implications / recommendations
- Don’t rely on JSON to solve versioning or validation problems — design your serialization to include explicit version semantics and a validation/load step that produces concrete, typed program structures.
- Use a compact text format with explicit defaults and version stamps for development/readability.
- Ship a binary packed format for performance and determinism.
- If you don’t need forward compatibility:
  - Keep versioned fields simple.
  - Skip or keep deprecated fields as needed.
  - Require consumers to update to the latest versions where feasible.
Main speaker / sources
- Jonathan Blow (primary speaker)
- Questions/interaction came from Twitch chat / an interviewer during the live session.