Specification
Version: 0.1.0-draft
Status: Draft
Scope: a minimal standard for machine-addressable canonical text references.
1. Purpose
TextRefs defines a minimal registry standard for stable, machine-addressable references to texts.
A conforming TextRefs registry MUST provide persistent identifiers for canonical references and MUST describe the citation systems by which those references are formed. It MAY record dereferenceable locations for those references and curated mappings to external identifiers or other references.
The standard is deliberately small. Its centre is a single idea: a reference is an abstract identity, separate from any location, edition, or translation where the referenced text can be read.
2. Conformance
A dataset conforms to the TextRefs Standard if it satisfies all of the following:
- It represents registry data using the object types defined in this standard.
- Every registry object includes the required fields for its object type.
- Every Work.key and CitationSystem.key is a flat, stable key that occupies one URI path segment.
- Every CanonicalReference points to one known Work and one known CitationSystem.
- Every CanonicalReference.locator validates syntactically against the referenced CitationSystem and semantically by being a registered reference point for the referenced Work.
- Every CitationSystem declares valid and invalid examples for automated tests.
- Every dereferenceable location is represented as an entry in the resolver_targets array of its CanonicalReference, and every external identifier or cross-reference equivalence through a MappingAssertion.
- Every registry object includes administrative metadata.
- Registry records contain identifiers, metadata, mappings, provenance, and resolver targets rather than primary text content.
3. Normative language
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in BCP 14, RFC 2119, and RFC 8174 when, and only when, they appear in all capitals.
4. Identity versus location
TextRefs separates identity from location.
- Identity is abstract and language-independent. Work, CitationSystem, and CanonicalReference answer the question “which passage”: for example the New Testament, book-chapter-verse, John.3.16. There is exactly one such identity, regardless of how many editions, translations, or websites carry it.
- Location and equivalence answer “where can I read it” and “what else is this the same as”. The resolver_targets array embedded in each CanonicalReference lists places where the reference can be read (specific translations, editions, or providers). MappingAssertion records that a Work is equivalent to an external identifier or to another Work.
A reference such as John.3.16 is the same identity whether read in Greek, the King James Version, or the Lutherbibel. The translation is a property of the location, never of the identity. This is what lets the model scale to works with many editions and translations (see §13).
TextRefs registry records store identifiers, metadata, mappings, provenance, and resolver targets. This keeps the registry legally reusable and stable across editions. A conforming record MUST NOT include full text, critical apparatus, commentary, translation text, or copyrighted edition content.
5. Core object types
A conforming registry MUST support these object types. Each top-level object MUST carry a type field matching one of them.
| Type | Layer | Purpose |
|---|---|---|
| Work | identity | An abstract textual work. |
| CitationSystem | identity | A notation that fragments works into locators. |
| CanonicalReference | identity + location | One abstract reference point in a work, with embedded resolver targets. |
| MappingAssertion | equivalence | A curated equivalence between a Work and an external identifier. |
Dereferenceable locations are not a separate object type. They are recorded as entries in the resolver_targets array embedded in each CanonicalReference (see §9). This keeps language-tagged locations co-located with the reference they describe, and means a work with N translations adds N array entries — not N standalone records.
Every object additionally carries the shared administrative metadata of §12 (omitted from the diagram for clarity).
classDiagram
class Work {
+URI id
+string key
+string preferred_label
}
class CitationSystem {
+URI id
+string key
+string preferred_label
+string locator_regex
+string normalization_version
}
class CanonicalReference {
+URI id
+string work_key
+string citation_system_key
+string locator
+string normalization_version
+ResolverTargetEntry[] resolver_targets
}
class ResolverTargetEntry {
+IRI url
+string language
+string edition
+string provider
+enum access
+string license
}
class MappingAssertion {
+URI id
+URI subject
+enum relation
+string source
}
CanonicalReference --> "1" Work : work_key
CanonicalReference --> "1" CitationSystem : citation_system_key
CanonicalReference *-- "0..*" ResolverTargetEntry : resolver_targets
MappingAssertion --> "1" Work : subject
MappingAssertion.subject MUST be a Work IRI. Per-passage external identifiers (e.g. the CTS URN of a single verse) are derived from work-level mappings and the reference locator at resolve time, not stored as separate assertions (see §10).
6. Work
A Work represents an abstract textual work, independent of editions, translations, manuscripts, files, websites, or resolver targets.
Only canonical texts with an established reference system SHOULD be accepted as Work records. The existence of an author, title, edition, file, or web page is not by itself sufficient.
A Work.key is a single flat registry key used to identify the abstract work in references and deterministic UUID seeds. Choose a stable, human-readable key such as plato.respublica or new-testament, and treat the whole string as the identifier. Rich bibliographic and authority data belongs in external systems and is connected to TextRefs records through MappingAssertions.
```json
{
  "id": "https://textrefs.org/id/work/plato.respublica",
  "key": "plato.respublica",
  "type": "Work",
  "preferred_label": "Republic (Plato)",
  "status": "candidate",
  "created": "2026-05-31",
  "modified": "2026-05-31"
}
```
Required: id, key, type (Work), preferred_label, status, plus administrative metadata (§12). The id MUST be a persistent TextRefs HTTP URI of the form https://textrefs.org/id/work/{key}, where {key} is one flat key and occupies exactly one URI path segment. The key MUST be stable and suitable for deterministic identity generation.
External identifiers for a Work (e.g. Wikidata Q-ID, DOI, VIAF) are recorded as MappingAssertions whose subject is the Work. They are not fields on the Work itself.
7. CitationSystem
A CitationSystem defines the notation and validation rules used to identify locations within one or more works. It is independent of any edition, provider, resolver service, or software implementation. Different versification or pagination traditions are different citation systems.
A CitationSystem.key is a single flat registry key for a locator notation and its validation rules. Choose a stable, human-readable key such as bekker, stephanus, or bible-book-chapter-verse. The key is used by canonical references through citation_system_key, so changing the key changes identity.
```json
{
  "id": "https://textrefs.org/id/system/bible-book-chapter-verse",
  "key": "bible-book-chapter-verse",
  "type": "CitationSystem",
  "preferred_label": "Bible book-chapter-verse (OSIS-style)",
  "normalization_version": "1.0.0",
  "locator_regex": "^(?<book>[A-Za-z][A-Za-z0-9_]*)\\.(?<chapter>[1-9][0-9]*)\\.(?<verse>[1-9][0-9]*)$",
  "examples": {
    "valid": ["Genesis.1.1", "Psalms.23.1", "Matthew.5.3"],
    "invalid": ["Genesis.0.1", "Genesis.1", "1.1.1", "Genesis 1:1"]
  },
  "status": "candidate",
  "created": "2026-05-31",
  "modified": "2026-05-31"
}
```
Required: id, key, type (CitationSystem), preferred_label, normalization_version, locator_regex, examples.valid, examples.invalid, plus administrative metadata. The id MUST be a persistent TextRefs HTTP URI of the form https://textrefs.org/id/system/{key}, where {key} is one flat key and occupies exactly one URI path segment.
- locator_regex MUST be a valid ECMAScript regular expression.
- locator_regex provides machine-checkable pre-validation for locator shape only; it need not fully describe citation systems whose valid references cannot be expressed completely as a regular language.
- Citation systems SHOULD use an anchored locator_regex when the pattern is intended to describe the full locator string.
- Regex success does not by itself prove that a reference point exists in a work.
- normalization_version MUST use semantic versioning.
- examples.valid MUST all match locator_regex; examples.invalid MUST all fail it.
- Unicode handling for keys and locators MUST follow Identifier syntax.
- A pull request that adds or changes a citation system MUST include the profile, valid examples, and invalid examples. See Citation-system profiles.
- A CanonicalReference links to its citation system through citation_system_key. JSON-LD serializations MAY additionally expose that relation with skos:inScheme.
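The rule that examples.valid must all match locator_regex and examples.invalid must all fail it lends itself to an automated check. The following is a minimal sketch, not the registry's actual validator: the interface CitationSystemProfile and the function checkCitationSystemExamples are hypothetical names introduced for illustration, and only the fields needed for this check are modelled.

```typescript
// Hypothetical minimal shape: only the CitationSystem fields this check needs.
interface CitationSystemProfile {
  key: string;
  locator_regex: string;
  examples: { valid: string[]; invalid: string[] };
}

// Returns a list of problems; an empty list means the examples agree with the regex.
function checkCitationSystemExamples(profile: CitationSystemProfile): string[] {
  let pattern: RegExp;
  try {
    // locator_regex is an ECMAScript pattern, so the built-in RegExp can compile it.
    pattern = new RegExp(profile.locator_regex);
  } catch {
    return [profile.key + ": locator_regex does not compile"];
  }
  const problems: string[] = [];
  for (const example of profile.examples.valid) {
    if (!pattern.test(example)) problems.push("valid example rejected: " + example);
  }
  for (const example of profile.examples.invalid) {
    if (pattern.test(example)) problems.push("invalid example accepted: " + example);
  }
  return problems;
}

// The profile from the example record above.
const bibleBcvProfile: CitationSystemProfile = {
  key: "bible-book-chapter-verse",
  locator_regex:
    "^(?<book>[A-Za-z][A-Za-z0-9_]*)\\.(?<chapter>[1-9][0-9]*)\\.(?<verse>[1-9][0-9]*)$",
  examples: {
    valid: ["Genesis.1.1", "Psalms.23.1", "Matthew.5.3"],
    invalid: ["Genesis.0.1", "Genesis.1", "1.1.1", "Genesis 1:1"],
  },
};

console.log(checkCitationSystemExamples(bibleBcvProfile));
```

A check like this covers locator shape only. As noted above, a regex match does not show that the reference point exists in a work.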
8. CanonicalReference
A CanonicalReference represents one atomized, language-independent reference point, identified by combining a work, a citation system, a normalized locator, and a normalization version. It also carries the set of dereferenceable external locations for that reference as an embedded resolver_targets array (see §9).
```json
{
  "id": "https://textrefs.org/id/ref/{uuid}",
  "type": "CanonicalReference",
  "work_key": "new-testament",
  "citation_system_key": "bible-book-chapter-verse",
  "locator": "John.3.16",
  "normalization_version": "1.0.0",
  "resolver_targets": [
    {
      "url": "https://www.stepbible.org/?q=version=SBLG|reference=John.3.16",
      "language": "grc",
      "edition": "SBL Greek New Testament",
      "provider": "STEP Bible",
      "access": "open",
      "license": "CC-BY-4.0"
    }
  ],
  "status": "candidate",
  "created": "2026-05-31",
  "modified": "2026-05-31"
}
```
Required: id, type (CanonicalReference), work_key, citation_system_key, locator, normalization_version, resolver_targets (MAY be empty), plus administrative metadata.
- work_key MUST reference a known Work; citation_system_key MUST reference a known CitationSystem.
- work_key and citation_system_key MUST be treated as opaque flat keys. Implementations MUST NOT infer author, corpus, title, hierarchy, or resolver behaviour by splitting either key.
- locator MUST match the system’s locator_regex; additional profile-specific validation MAY be required for systems that are not fully regex-checkable.
- An accepted CanonicalReference MUST represent an attested reference point for the referenced Work under the referenced CitationSystem.
- normalization_version is part of the reference’s identity and is fixed when the reference is minted; it records the normalization in force at that time and need not equal the citation system’s current normalization_version. Its correctness is verified by the deterministic identifier (see §14 and Identifier syntax).
- The id MUST be generated deterministically per Identifier syntax; its UUID component is the deterministic seed output.
- resolver_targets MUST validate per §9.
9. Embedded resolver targets
resolver_targets is the array on each CanonicalReference that records dereferenceable external locations where the reference can be read, typically specific translations, editions, or providers. Each entry is a plain object; it has no independent id or type of its own, because its identity is the parent reference plus its position in the array.
```json
{
  "url": "https://www.biblegateway.com/passage/?search=John%203%3A16&version=KJV",
  "language": "en",
  "edition": "King James Version",
  "provider": "Bible Gateway",
  "access": "open",
  "license": "CC0-1.0",
  "license_url": null,
  "last_checked": "2026-01-01"
}
```
Required per entry: url, access.
- url MUST be a dereferenceable external IRI (RFC 3987).
- language MUST be present when the entry is language-specific (e.g. a translation), as a BCP 47 language tag (RFC 5646). Tags MUST include an ISO 15924 script subtag when the entry uses a non-default script for the language (e.g. grc-Grek, hbo-Hebr, grc-Latn).
- edition SHOULD name the specific edition or version when known.
- access MUST be one of open, paywalled, restricted, unknown.
- license SHOULD be a current SPDX license identifier (e.g. CC0-1.0, CC-BY-4.0) when the licence of the target resource is known. For licences not in the SPDX list, omit license and use the optional license_url to point at the licence text.
- Values implying permission to host copyrighted full text (e.g. a license of proprietary accompanied by hosted text) are forbidden; the no-text rule in §2 governs.
- A CanonicalReference whose resolver_targets is an empty array remains a valid identity record; adding or removing an entry MUST NOT change the parent reference’s id.
- Tombstoning a single bad URL is done by removing the entry; tombstoning the whole reference uses the parent status field. There is no independent status on individual entries.
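Two of the rules above are easy to illustrate in code: the required fields and access values of an entry, and the guarantee that adding an entry leaves the parent id untouched. The sketch below makes several assumptions: the names ResolverTargetEntry, ReferenceWithTargets, checkResolverTarget, and appendResolverTarget are hypothetical, and the check covers only url and access. IRI, BCP 47, and SPDX syntax checks are deliberately left out.

```typescript
// Allowed values for the access field, as listed in this section.
const ACCESS_VALUES: readonly string[] = ["open", "paywalled", "restricted", "unknown"];

// Hypothetical types: only the fields used in this sketch.
interface ResolverTargetEntry {
  url: string;
  access: string;
  language?: string;
  edition?: string;
  provider?: string;
  license?: string;
}

interface ReferenceWithTargets {
  id: string;
  resolver_targets: ResolverTargetEntry[];
}

// Checks the two required fields; returns a list of problems (empty when the entry passes).
function checkResolverTarget(entry: Partial<ResolverTargetEntry>): string[] {
  const problems: string[] = [];
  if (!entry.url) problems.push("url is required");
  if (entry.access === undefined) {
    problems.push("access is required");
  } else if (ACCESS_VALUES.indexOf(entry.access) === -1) {
    problems.push("access must be one of: " + ACCESS_VALUES.join(", "));
  }
  return problems;
}

// Appends an entry and returns a new record; the parent id is copied unchanged.
function appendResolverTarget(
  ref: ReferenceWithTargets,
  entry: ResolverTargetEntry
): ReferenceWithTargets {
  const problems = checkResolverTarget(entry);
  if (problems.length > 0) throw new Error(problems.join("; "));
  return { id: ref.id, resolver_targets: ref.resolver_targets.concat([entry]) };
}
```

Returning a new record rather than mutating the old one is a design choice of this sketch, not a requirement of the standard.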
10. MappingAssertion
A MappingAssertion records a curated equivalence claim between a TextRefs Work and an external identifier (CTS URN, Wikidata Q-ID, DOI, ARK, …) or another TextRefs Work. There is no separate object type for external identifiers; they are always expressed as mapping targets.
```json
{
  "id": "https://textrefs.org/id/mapping/{uuid}",
  "type": "MappingAssertion",
  "subject": "https://textrefs.org/id/work/new-testament",
  "relation": "exactMatch",
  "target": {
    "target_kind": "wikidata",
    "identifier": "https://www.wikidata.org/entity/Q18813"
  },
  "source": "manual-curation",
  "status": "candidate",
  "created": "2026-05-31",
  "modified": "2026-05-31"
}
```
Required: id, type (MappingAssertion), subject, relation, target, source, plus administrative metadata.
- subject MUST be a Work IRI of the form https://textrefs.org/id/work/{work_key}. Per-passage external identifiers (e.g. the CTS URN of a single verse) are derived from work-level mappings combined with the reference locator at resolve time; they MUST NOT be stored as separate MappingAssertion records.
- target.identifier MUST be an IRI (RFC 3987) that identifies a textual resource: a work, edition, manuscript, citation system, or another TextRefs Work.
- target.target_kind is OPTIONAL and is a human-readable scheme hint (e.g. "cts", "doi", "wikidata", "textrefs"). Validators MUST NOT key behaviour off it. The presence or absence of target_kind carries no normative weight; the IRI in identifier is authoritative. See Appendix B for non-normative examples.
- relation MUST be one of the SKOS-compatible values exactMatch or closeMatch. Use exactMatch only when the mapped resource identifies the same work with sufficient precision; if there is any uncertainty about edition, coverage, or work boundaries, use closeMatch.
- source documents the basis for the assertion. A structured W3C PROV-O mapping is reserved for a future version.
11. Identifier policy
TextRefs identifiers MUST be persistent HTTP URIs (RFC 3986) or IRIs (RFC 3987), independent of external URLs, resolver targets, edition identifiers, provider-specific identifiers, and website structures. The deterministic UUID seed remains ASCII-only; see Identifier syntax.
Work identifiers MUST use https://textrefs.org/id/work/{key} and CitationSystem identifiers MUST use https://textrefs.org/id/system/{key}. In both cases {key} is the complete flat key and MUST NOT contain additional path segments. For example, https://textrefs.org/id/work/plato.respublica is valid; https://textrefs.org/id/work/plato/respublica is not.
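The single-path-segment rule can be checked mechanically. The sketch below is illustrative only: workIdFromKey and isSingleSegmentWorkId are hypothetical helper names, and the check covers just the "no additional path segments" rule. The full flat-key syntax is defined in Identifier syntax and is not reproduced here.

```typescript
const WORK_ID_PREFIX = "https://textrefs.org/id/work/";

// Builds a Work identifier from a flat key, rejecting keys that would add path segments.
function workIdFromKey(key: string): string {
  if (key.length === 0 || key.indexOf("/") !== -1) {
    throw new Error("Work key must be non-empty and occupy one path segment: " + key);
  }
  return WORK_ID_PREFIX + key;
}

// True when the identifier has the Work prefix followed by exactly one path segment.
function isSingleSegmentWorkId(id: string): boolean {
  if (id.indexOf(WORK_ID_PREFIX) !== 0) return false;
  const rest = id.slice(WORK_ID_PREFIX.length);
  return rest.length > 0 && rest.indexOf("/") === -1;
}

console.log(isSingleSegmentWorkId("https://textrefs.org/id/work/plato.respublica")); // true
console.log(isSingleSegmentWorkId("https://textrefs.org/id/work/plato/respublica")); // false
```

The same shape would apply to CitationSystem identifiers with the https://textrefs.org/id/system/ prefix.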
A CanonicalReference identifier MUST be generated deterministically. The identity seed MUST include work_key, citation_system_key, locator, and normalization_version, in that order (see Identifier syntax).
A MappingAssertion identifier MUST be generated deterministically from subject, relation, and target.identifier, in that order, using the mapping namespace (see Identifier syntax). It MUST remain UUID-based and MUST NOT be derived from provider URLs, corpus paths, or resolver structures. Resolver-target entries do not have their own identifiers.
An implementation MUST NOT silently change the identity-defining fields of an existing CanonicalReference. Because those fields seed the deterministic identifier, any change produces a new CanonicalReference with a new identifier. The prior reference MUST be retained as a tombstone (status deprecated or withdrawn, §12) and SHOULD be linked to its replacement through an exactMatch MappingAssertion (§10).
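To see why a change to any identity-defining field yields a new reference, it helps to look at the seed itself. The sketch below only assembles the four seed fields in the stated order. It is an assumption-laden illustration: the separator "|" and the names ReferenceIdentity and identitySeed are hypothetical, and the actual seed serialization, namespace, and UUID derivation are defined in Identifier syntax, not here.

```typescript
// The four identity-defining fields of a CanonicalReference.
interface ReferenceIdentity {
  work_key: string;
  citation_system_key: string;
  locator: string;
  normalization_version: string;
}

// Hypothetical separator, chosen for illustration only.
const SEED_SEPARATOR = "|";

// Joins the identity fields in the order required by this section.
function identitySeed(ref: ReferenceIdentity): string {
  return [
    ref.work_key,
    ref.citation_system_key,
    ref.locator,
    ref.normalization_version,
  ].join(SEED_SEPARATOR);
}

const mintedReference: ReferenceIdentity = {
  work_key: "new-testament",
  citation_system_key: "bible-book-chapter-verse",
  locator: "John.3.16",
  normalization_version: "1.0.0",
};

// Changing one identity-defining field changes the seed, and therefore the identifier.
const renormalizedReference: ReferenceIdentity = {
  work_key: mintedReference.work_key,
  citation_system_key: mintedReference.citation_system_key,
  locator: mintedReference.locator,
  normalization_version: "2.0.0",
};

console.log(identitySeed(mintedReference) === identitySeed(renormalizedReference)); // false
```

Fields outside the seed, such as resolver_targets or status, can change without affecting the identifier, which is what allows the prior record to be kept as a tombstone under its original id.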
A conforming registry SHOULD publish each /id/{type}/{key} IRI at two static URLs: the canonical URL itself (HTML for browsers) and a sibling with a .json extension carrying the JSON-LD payload. The HTML representation SHOULD advertise the JSON-LD sibling via <link rel="alternate" type="application/json" href="…json"> in the document head. Accept-header content negotiation is not required.
12. Administrative metadata
Every registry object MUST include:
```json
{
  "status": "active",
  "created": "2026-01-01",
  "modified": "2026-01-01"
}
```
- created and modified MUST be ISO 8601 calendar dates in YYYY-MM-DD form.
- status MUST be one of:
  - candidate: proposed but not yet accepted as stable.
  - active: accepted and recommended for use.
  - deprecated: retained but no longer recommended.
  - withdrawn: removed from active use because it was erroneous or has been superseded. If a successor exists, it is linked by an exactMatch MappingAssertion; see Versioning for tombstones.
  - blocked: retained as a visible tombstone because of a rights, trust, or policy dispute.
Deprecated, withdrawn, and blocked records SHOULD remain visible unless removal is required for legal, privacy, or safety reasons.
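The administrative metadata rules can be sketched as a small check. The function names isCalendarDate and checkAdministrativeMetadata are hypothetical, and this is one possible reading of "ISO 8601 calendar date in YYYY-MM-DD form": the string must have that exact shape and must name a real calendar day.

```typescript
const STATUS_VALUES: readonly string[] = [
  "candidate",
  "active",
  "deprecated",
  "withdrawn",
  "blocked",
];

// True for strings of the form YYYY-MM-DD that name a real calendar day.
// Note: years below 0100 are rejected by this sketch because Date.UTC remaps them.
function isCalendarDate(value: string): boolean {
  const parts = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (parts === null) return false;
  const year = Number(parts[1]);
  const month = Number(parts[2]);
  const day = Number(parts[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Out-of-range days or months roll over, so a round-trip mismatch means an invalid date.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

// Returns a list of problems; an empty list means the metadata passes this sketch.
function checkAdministrativeMetadata(record: {
  status?: string;
  created?: string;
  modified?: string;
}): string[] {
  const problems: string[] = [];
  if (record.status === undefined || STATUS_VALUES.indexOf(record.status) === -1) {
    problems.push("status must be one of: " + STATUS_VALUES.join(", "));
  }
  if (record.created === undefined || !isCalendarDate(record.created)) {
    problems.push("created must be a YYYY-MM-DD date");
  }
  if (record.modified === undefined || !isCalendarDate(record.modified)) {
    problems.push("modified must be a YYYY-MM-DD date");
  }
  return problems;
}
```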
13. Worked example: a multi-translation work
This is the case that motivates separating identity from location. The New Testament exists in many editions and translations, yet John.3.16 is one reference in the OSIS-style book-chapter-verse system.
One identity — a single Work, CitationSystem, and CanonicalReference. The reference embeds all language-tagged locations as resolver_targets:
```json
{
  "work": {
    "key": "new-testament",
    "type": "Work",
    "preferred_label": "New Testament (SBLGNT)"
  },
  "citation_system": {
    "key": "bible-book-chapter-verse",
    "type": "CitationSystem",
    "locator_regex": "^(?<book>[A-Za-z][A-Za-z0-9_]*)\\.(?<chapter>[1-9][0-9]*)\\.(?<verse>[1-9][0-9]*)$"
  },
  "canonical_reference": {
    "type": "CanonicalReference",
    "work_key": "new-testament",
    "citation_system_key": "bible-book-chapter-verse",
    "locator": "John.3.16",
    "resolver_targets": [
      {
        "url": "https://www.stepbible.org/?q=version=SBLG|reference=John.3.16",
        "language": "grc",
        "edition": "SBL Greek New Testament",
        "provider": "STEP Bible",
        "access": "open",
        "license": "CC-BY-4.0"
      }
    ]
  }
}
```
Adding another edition or translation appends one entry to resolver_targets. The reference identity (its UUID, its work, its citation system, its locator) does not change.
Divergent versification is the one case that does create separate references. Where traditions number verses differently (e.g. the Psalms in the Masoretic text versus the Vulgate/Septuagint), each tradition is a distinct CitationSystem, its references are distinct CanonicalReferences, and the equivalence between them is recorded as a closeMatch MappingAssertion — not by collapsing them into one identity.
14. Validation requirements
A conforming validator MUST check:
1. required fields for each object type and for each resolver_targets entry;
2. object type values and TextRefs URI patterns, including Work and CitationSystem IDs whose keys occupy exactly one path segment;
3. flat-key syntax and uniqueness for Work.key and CitationSystem.key;
4. administrative metadata and status values;
5. citation-system locator_regex syntax, and its valid/invalid examples;
6. canonical-reference locator syntax (the normalization_version is the value fixed at minting, verified by the deterministic identifier in item 8, not matched against the system’s current version);
7. canonical-reference semantic validity: accepted records must be registered, attested reference points for their Work and CitationSystem;
8. deterministic-identifier correctness for canonical references and mapping assertions;
9. UUID-based identifier shape for CanonicalReference and MappingAssertion records;
10. resolver_targets entries: access values, BCP 47 syntax of language and its presence for language-specific entries, and SPDX syntax of license when present;
11. mapping relation values and the Work-IRI shape of MappingAssertion.subject;
12. absence of forbidden full-text/apparatus/commentary content.
A validator SHOULD report errors in a machine-readable format, and SHOULD distinguish syntactically valid, registered, mapped, and resolvable references. An input locator that matches locator_regex but has no corresponding registered CanonicalReference is syntactically valid but not a valid TextRefs reference.
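The difference between a syntactically valid locator and a registered reference can be made explicit in a validator's result type. The following sketch assumes a hypothetical classifyLocator function and a plain list of registered locators for one work and citation system; a real registry lookup would look different. It covers only the first two of the distinctions named above (mapped and resolvable are left out).

```typescript
type LocatorVerdict = "syntactically-invalid" | "syntactically-valid-only" | "registered";

// Classifies an input locator against a locator_regex and a list of registered locators.
function classifyLocator(
  locator: string,
  locatorRegex: string,
  registeredLocators: string[]
): LocatorVerdict {
  if (!new RegExp(locatorRegex).test(locator)) return "syntactically-invalid";
  return registeredLocators.indexOf(locator) !== -1
    ? "registered"
    : "syntactically-valid-only";
}

const bcvLocatorRegex =
  "^(?<book>[A-Za-z][A-Za-z0-9_]*)\\.(?<chapter>[1-9][0-9]*)\\.(?<verse>[1-9][0-9]*)$";
// Hypothetical registry contents for illustration.
const registeredSample = ["John.3.16"];

console.log(classifyLocator("John.3.16", bcvLocatorRegex, registeredSample));   // "registered"
console.log(classifyLocator("John.99.99", bcvLocatorRegex, registeredSample));  // "syntactically-valid-only"
console.log(classifyLocator("John 3:16", bcvLocatorRegex, registeredSample));   // "syntactically-invalid"
```

Only the "registered" verdict corresponds to a valid TextRefs reference; the middle verdict is the case described in the last sentence above.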
A normative JSON Schema 2020-12 document, generated from the canonical Zod schemas, is published at https://textrefs.org/schemas/v1/textrefs.schema.json. The Zod schemas are the implementation source of truth; the JSON Schema is the published machine-readable contract.
15. Extensions
Implementations MAY define extensions, but extensions MUST NOT change the meaning of standard fields and MUST NOT make non-standard fields required for conformance. Content-related extensions MUST be defined separately from this standard.
16. Normative references
This standard relies on the following external standards. Each is normative wherever it is cited above.
| Topic | Standard |
|---|---|
| Normative keywords | BCP 14 / RFC 2119 / RFC 8174 |
| Language tags | BCP 47 / RFC 5646 |
| Script subtags | ISO 15924 |
| Dates | ISO 8601 |
| URIs | RFC 3986 |
| IRIs | RFC 3987 |
| UUIDs | RFC 4122 |
| Unicode normalization (NFC) | Unicode Standard Annex #15 |
| Regular expression dialect | ECMA-262 §22.2 |
| Versioning | SemVer 2.0.0 |
| Linked-data serialization | JSON-LD 1.1 |
| Concepts and mapping relations | SKOS |
| Dates, provenance, language, licence | Dublin Core Terms |
| URL, provider, edition, work type | schema.org |
| Licence identifiers | SPDX License List |
| Machine-readable schema | JSON Schema 2020-12 |
Appendix A. Conformance boundary
This standard defines the minimum requirements for a TextRefs registry. Applications, resolvers, editorial tools, APIs, and visualizations may be built on top of it; they conform only insofar as their registry records satisfy this standard.
Build on the core registry by keeping these concerns in application, extension, or resolver layers:
- full-text hosting, edition/manuscript modelling, translation hosting, textual apparatus, commentary, thematic annotation;
- authority-file or catalogue modelling for agents, organisations, subjects, genres, or corpora;
- citation-style rendering, recommendation systems, legal rights clearance for external content.
Appendix B. Well-known external identifier schemes (informative)
The following identifier schemes commonly satisfy §10’s “textual resource” rule and are useful values for MappingAssertion.target.identifier. Treat this table as implementation guidance: the authoritative rule is still whether the IRI identifies a textual resource.
| Scheme | target_kind hint | Example identifier |
|---|---|---|
| TextRefs | textrefs | https://textrefs.org/id/ref/988e0b39-… |
| CTS URN | cts | urn:cts:greekLit:tlg0031.tlg004:3.16 |
| DTS | dts | https://dts.example/api/collection?id=urn:cts:… |
| DOI | doi | https://doi.org/10.5281/zenodo.7702622 |
| ARK | ark | https://n2t.net/ark:/12148/btv1b8451636f |
| Handle | handle | https://hdl.handle.net/1887/4531 |
| PURL | purl | https://purl.org/dc/terms/ |
| URN:NBN | urn-nbn | urn:nbn:de:bvb:12-bsb00012345-2 |
| Wikidata | wikidata | https://www.wikidata.org/entity/Q42 |
TextRefs keeps mappings focused on textual resources. Identifiers of agents, organisations, instruments, or non-textual datasets (e.g. ROR, ORCID, ISNI) belong in external authority systems reached through mapped textual resources, not in MappingAssertion.target.
A passage-level external identifier (e.g. the CTS URN of a single verse) is derived at resolve time from the work-level mapping plus the reference locator; it is not stored as a separate MappingAssertion. For example, a Work mapping new-testament → urn:cts:greekLit:tlg0031.tlg004 plus the reference locator John.3.16 can yield a derived passage URN for that verse. Source data carries the work-level mapping plus a locator template; the registry does not store one mapping per passage.
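The derivation described above might look like the sketch below. It is a guess at one workable shape, not the registry's resolver: the function name derivePassageIdentifier is hypothetical, and the template (append ":" plus chapter and verse, dropping the book name) is an assumption chosen to reproduce the CTS URN shown in the table. The actual locator template format is not defined in this document.

```typescript
// Derives a passage-level identifier from a work-level URN and a book.chapter.verse locator.
// Assumed template: "{work-level URN}:{chapter}.{verse}" (hypothetical, for illustration).
function derivePassageIdentifier(workLevelUrn: string, locator: string): string | null {
  const parts = /^[A-Za-z][A-Za-z0-9_]*\.([1-9][0-9]*)\.([1-9][0-9]*)$/.exec(locator);
  if (parts === null) return null; // locator does not have the book.chapter.verse shape
  return workLevelUrn + ":" + parts[1] + "." + parts[2];
}

console.log(derivePassageIdentifier("urn:cts:greekLit:tlg0031.tlg004", "John.3.16"));
// → "urn:cts:greekLit:tlg0031.tlg004:3.16"
```

Because the result is computed on demand, nothing passage-specific needs to be stored: the registry keeps one work-level MappingAssertion, and the resolver applies the template to each locator.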