Identifier syntax
TextRefs canonical-reference identifiers use deterministic UUID v5 generation. The algorithm is intentionally strict so that independent implementations produce identical identifiers from the same input.
Namespace
The TextRefs reference namespace UUID is:
b1a3670e-2ac7-544c-a1b9-396e0dc193f7
This namespace is derived from uuidv5(uuid.NAMESPACE_DNS, "textrefs.org/reference") and is frozen for v0.1.0-draft.
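The derivation above can be reproduced with Python's standard uuid module. This is a minimal sketch rather than a normative check. The constant and function names are illustrative and not part of TextRefs, and an implementation would more likely hard-code the frozen value than recompute it.

```python
import uuid

# Namespace value as published in this document (frozen for v0.1.0-draft).
PUBLISHED_REFERENCE_NAMESPACE = uuid.UUID("b1a3670e-2ac7-544c-a1b9-396e0dc193f7")


def derive_reference_namespace() -> uuid.UUID:
    """Recompute the namespace from the DNS namespace and the documented name."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, "textrefs.org/reference")


if __name__ == "__main__":
    derived = derive_reference_namespace()
    # Print both values so a mismatch is visible instead of silently trusted.
    print("derived:  ", derived)
    print("published:", PUBLISHED_REFERENCE_NAMESPACE)
    print("match:    ", derived == PUBLISHED_REFERENCE_NAMESPACE)
```

Running the script prints both values side by side, which is one way to confirm that a local UUID library agrees with the frozen namespace before generating any identifiers.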
Seed sequence
The UUID seed string is the following four-field sequence, in this exact order:
work_key
citation_system_key
locator
normalization_version
Serialization rules:
- Encode the seed as UTF-8.
- Join the four values with a single line feed character, U+000A.
- Use each field exactly as normalized, with no leading or trailing whitespace.
- End the seed after normalization_version, with no final trailing line feed.
- Use the registry key fields themselves; labels, URIs, aliases, and external identifiers belong in metadata or mappings.
- Each field MUST already be normalized by its owning profile before UUID generation.
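The serialization rules above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes every field has already been normalized by its owning profile, so it only guards against stray whitespace. Rejecting an embedded line feed is an extra safeguard assumed here, not a rule stated in this document.

```python
import uuid

# Reference namespace as published in this document.
REFERENCE_NAMESPACE = uuid.UUID("b1a3670e-2ac7-544c-a1b9-396e0dc193f7")


def build_seed(work_key: str, citation_system_key: str,
               locator: str, normalization_version: str) -> str:
    """Join the four fields with U+000A, in the documented order."""
    fields = (work_key, citation_system_key, locator, normalization_version)
    for value in fields:
        if value != value.strip():
            raise ValueError(f"leading or trailing whitespace in {value!r}")
        if "\n" in value:
            # Assumption: an embedded LF would make the field boundaries ambiguous.
            raise ValueError(f"embedded line feed in {value!r}")
    # str.join adds separators only between fields, so there is no trailing LF.
    return "\n".join(fields)


def reference_uuid(work_key: str, citation_system_key: str,
                   locator: str, normalization_version: str) -> uuid.UUID:
    """UUID v5 over the seed; uuid.uuid5 encodes a str name as UTF-8."""
    seed = build_seed(work_key, citation_system_key, locator, normalization_version)
    return uuid.uuid5(REFERENCE_NAMESPACE, seed)


def reference_uri(work_key: str, citation_system_key: str,
                  locator: str, normalization_version: str) -> str:
    """Canonical reference URI built from the generated UUID."""
    ref = reference_uuid(work_key, citation_system_key, locator, normalization_version)
    return f"https://textrefs.org/id/ref/{ref}"


if __name__ == "__main__":
    print(reference_uri("plato.respublica", "stephanus", "514a", "1.0.0"))
```

Passing the seed to uuid.uuid5 as a str, rather than as pre-encoded bytes, keeps the sketch compatible with Python versions whose uuid5 accepts only strings. Either way the hashed bytes are the UTF-8 encoding of the seed.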
Flat key syntax
work_key and citation_system_key are flat, opaque registry keys. Treat the whole string as the identifier when validating records, generating UUIDs, and constructing TextRefs URIs.
Keys MUST match this regular expression:
^[a-z0-9][a-z0-9._-]*$
Keys therefore MUST start with a lowercase letter or digit and contain only ASCII lowercase letters, ASCII digits, ., _, and -.
The canonical Work URI is https://textrefs.org/id/work/{work_key}. The canonical CitationSystem URI is https://textrefs.org/id/system/{citation_system_key}. In both cases the key occupies exactly one URI path segment.
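A minimal sketch of key validation and URI construction, using hypothetical helper names. One detail worth noting: in Python, `$` also matches just before a trailing newline, so the sketch uses `re.fullmatch` to reject a key such as "stephanus\n" that a naive `re.match` with `$` would accept.

```python
import re

# Same character rules as the documented expression; fullmatch supplies the anchors.
KEY_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*")


def is_valid_key(key: str) -> bool:
    """True when the whole string is a flat registry key."""
    return KEY_PATTERN.fullmatch(key) is not None


def work_uri(work_key: str) -> str:
    """Canonical Work URI; the key fills exactly one path segment."""
    if not is_valid_key(work_key):
        raise ValueError(f"invalid work_key: {work_key!r}")
    return f"https://textrefs.org/id/work/{work_key}"


def system_uri(citation_system_key: str) -> str:
    """Canonical CitationSystem URI; the key fills exactly one path segment."""
    if not is_valid_key(citation_system_key):
        raise ValueError(f"invalid citation_system_key: {citation_system_key!r}")
    return f"https://textrefs.org/id/system/{citation_system_key}"


if __name__ == "__main__":
    print(work_uri("plato.respublica"))
    print(system_uri("stephanus"))
```

Because the allowed characters exclude `/`, `?`, `#`, and `%`, a valid key needs no percent-encoding and cannot spill into another path segment.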
Unicode normalization
Deterministic identifiers depend on byte-identical seed strings. Before validation and UUID generation:
- work_key and citation_system_key MUST follow the flat key syntax above.
- locator MUST be normalized to Unicode NFC.
- locator MUST NOT contain leading or trailing whitespace, control characters, or internal whitespace unless the citation-system profile explicitly allows it.
- Implementations MUST NOT apply NFKC, case folding, digit folding, punctuation folding, transliteration, or script conversion unless the citation-system profile explicitly defines that rule.
- Profiles for mixed-script locators MUST state the allowed scripts and enforce them through locator_regex.
- Any change to locator normalization that can change a normalized locator MUST change the citation system’s normalization_version.
The seed bytes used for UUID v5 generation are ASCII-restricted (keys) and NFC-normalized UTF-8 (locators). This is independent of whether downstream TextRefs identifiers are expressed as URIs (RFC 3986) or IRIs (RFC 3987).
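The locator rules above might be enforced along these lines. This is a sketch under stated assumptions: the function name and the allow_internal_whitespace flag are hypothetical stand-ins for whatever a citation-system profile declares, the flag is assumed to relax only the internal-whitespace rule, and "control characters" is read as Unicode general category Cc.

```python
import unicodedata


def normalize_locator(locator: str, allow_internal_whitespace: bool = False) -> str:
    """Apply NFC only, then reject whitespace and control characters."""
    # NFC is the only transformation applied: no NFKC, no case or digit folding.
    normalized = unicodedata.normalize("NFC", locator)
    if normalized != normalized.strip():
        raise ValueError("locator has leading or trailing whitespace")
    if any(unicodedata.category(ch) == "Cc" for ch in normalized):
        raise ValueError("locator contains a control character")
    if not allow_internal_whitespace and any(ch.isspace() for ch in normalized):
        raise ValueError("locator contains internal whitespace")
    return normalized


if __name__ == "__main__":
    # "e" followed by a combining acute accent composes to a single code point.
    composed = normalize_locator("e\u0301")
    print(len(composed), composed.encode("utf-8"))
```

A profile-specific locator_regex check would run on the returned string. It is left out here because its pattern is defined per citation system.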
Example
Input tuple:
work_key = plato.respublica
citation_system_key = stephanus
locator = 514a
normalization_version = 1.0.0
Seed string:
plato.respublica
stephanus
514a
1.0.0
Result:
c9e0b270-39de-503c-a231-33d8ae4503b4
Canonical URI:
https://textrefs.org/id/ref/c9e0b270-39de-503c-a231-33d8ae4503b4
MappingAssertion seed
MappingAssertion identifiers are also deterministic UUID v5, derived from the assertion’s content so that recompiling the same source produces a byte-identical record.
The mapping namespace UUID is:
f16bb214-4241-549d-ad41-7b011f02befb
This namespace is derived from uuidv5(uuid.NAMESPACE_DNS, "textrefs.org/mapping") and is frozen for v0.1.0-draft.
The seed string is the following three-field sequence, joined with single line feed characters and no trailing newline:
subject
relation
target.identifier
subject MUST be the canonical Work IRI (https://textrefs.org/id/work/{work_key}). relation MUST be the literal string exactMatch or closeMatch. target.identifier MUST be used as supplied by the source record, after any IRI normalization the source profile already applies. target.target_kind is a non-normative hint and does NOT enter the seed.
The canonical URI is https://textrefs.org/id/mapping/{uuid}.
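The three-field mapping seed can be sketched the same way. The helper names below are hypothetical, and the target identifier in the usage lines is a made-up example.org placeholder, not a real mapping target.

```python
import uuid

# Mapping namespace as published in this document.
MAPPING_NAMESPACE = uuid.UUID("f16bb214-4241-549d-ad41-7b011f02befb")
ALLOWED_RELATIONS = frozenset({"exactMatch", "closeMatch"})


def mapping_seed(work_key: str, relation: str, target_identifier: str) -> str:
    """Join subject, relation, and target.identifier with U+000A."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"relation must be exactMatch or closeMatch, got {relation!r}")
    subject = f"https://textrefs.org/id/work/{work_key}"
    # target_identifier is used as supplied; target_kind never enters the seed.
    return "\n".join((subject, relation, target_identifier))


def mapping_uri(work_key: str, relation: str, target_identifier: str) -> str:
    """Canonical MappingAssertion URI from the UUID v5 of the seed."""
    seed = mapping_seed(work_key, relation, target_identifier)
    return f"https://textrefs.org/id/mapping/{uuid.uuid5(MAPPING_NAMESPACE, seed)}"


if __name__ == "__main__":
    # Hypothetical target identifier, for illustration only.
    print(mapping_uri("plato.respublica", "exactMatch",
                      "https://example.org/works/republic"))
```

Because the seed depends only on subject, relation, and target.identifier, recompiling the same source record yields the same URI, and changing any one of the three yields a different one.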
Immutability
Once a deterministic identifier is published, it is permanent. If a record is found to be wrong, it MUST be marked deprecated, withdrawn, or blocked; the original URI MUST remain dereferenceable as a tombstone.
See Specification §11 for the normative identifier policy.