The Universal Hash Codec (UHC) is a formal scheme for encoding natural numbers as unique geometric objects, based on the Universal Object Reference (UOR) framework and the Prime Framework’s intrinsic number embedding. In UOR, each number is represented as a multi-vector in an algebraic fiber, embedding all its possible representations (its universal coordinate tuple) concurrently (4-operator.pdf). The UHC extends this idea by defining a geometric hash function that maps any number to a point on a high-dimensional manifold, using multiple metrics (Euclidean, hyperbolic, and elliptical) to shape the space. Crucially, this mapping is lossless – it can be inverted to recover the original number exactly, preserving referential invariance, base-independence, and intrinsic identity (core UOR properties).
This specification rigorously defines the UHC geometric digest format and its mathematics. We describe how to embed a number’s universal coordinates as a point in multi-dimensional space, formalize the structure of that space under different metrics, and ensure one-to-one mapping between numbers and digests. Pseudocode is provided for the encoding (number to digest) and decoding (digest back to number) processes. All aspects are presented with clear structure and mathematical rigor to eliminate ambiguity for implementors.
Scope and Terminology: We focus on natural numbers (including 0) and their canonical UOR embeddings. A universal coordinate tuple of a number refers to the collection of its digit expansions in every possible base (≥ 2) (4-operator.pdf). The term digest refers to the serialized output of the UHC hash function – a structured representation (here expressed in JSON) of the geometric point encoding the number. We use N for a natural number and N̂ (N-hat) for its embedded multi-vector form in the UOR fiber algebra. The manifold M is the reference geometric space (with metric g) where points lie; depending on context, M may be flat (Euclidean), negatively curved (hyperbolic), or positively curved (elliptical/spherical). We ensure all notation and steps are consistent with UOR’s foundations (4-operator.pdf).
Definition – Universal Coordinate Tuple: For each natural number $N$, consider its representation in every integer base $b \ge 2$. Write the base-$b$ expansion of $N$ as
\[ N \;=\; a_{k_b}(b)\,b^{\,k_b} + a_{k_b-1}(b)\,b^{\,k_b-1} + \cdots + a_1(b)\,b + a_0(b), \]
with digits $0 \le a_i(b) \le b-1$. The universal coordinate tuple of $N$ collects these digit sequences over all bases:
\[ E(N) \;=\; \Big\{ \big(a_0(b),\,a_1(b),\,a_2(b),\,\ldots,\,a_{k_b}(b)\big)_b \;:\; b = 2,3,4,\dots \Big\}\,. \]
This tuple
Embedding as a Multi-Vector: The UOR/Prime framework provides an algebraic fiber
\[ N̂ \;=\; \sum_{b=2}^{B(N)} \;\sum_{i=0}^{k_b} a_i(b)\; e_{b,i}\,, \]
where
\[ D(N) \;=\; \sum_{b=2}^{N} (k_b + 1)\,, \]
where
Example: Suppose
- Base 2 expansion: $42_{(10)} = 101010_{(2)}$, digits $(0,1,0,1,0,1)_2$.
- Base 3: $42 = 1120_{(3)}$, digits $(0,2,1,1)_3$.
- Base 4: $42 = 222_{(4)}$, digits $(2,2,2)_4$.
- Base 5: $42 = 132_{(5)}$, digits $(2,3,1)_5$.
- Base 6: $42 = 110_{(6)}$, digits $(0,1,1)_6$.
- Base 7: $42 = 60_{(7)}$, digits $(0,6)_7$.
- Base 8: $42 = 52_{(8)}$, digits $(2,5)_8$.
- Base 9: $42 = 46_{(9)}$, digits $(6,4)_9$.
- Base 10: $42 = 42_{(10)}$, digits $(2,4)_{10}$.
- Base 11: $42 = 39_{(11)}$, digits $(9,3)_{11}$.
- ...
- Base 42: $42 = 10_{(42)}$, digits $(0,1)_{42}$.
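The expansions listed above can be reproduced with a short Python sketch. The helper names `digits_lsb_first` and `universal_tuple` are illustrative only (they do not come from the UOR framework), and the sketch assumes bases run from 2 through $N$, as in the example.

```python
def digits_lsb_first(n: int, b: int) -> list[int]:
    """Digits of n in base b, least significant first: (a_0, a_1, ..., a_k)."""
    if n < 0 or b < 2:
        raise ValueError("need n >= 0 and b >= 2")
    digits = [n % b]
    n //= b
    while n > 0:
        digits.append(n % b)
        n //= b
    return digits


def universal_tuple(n: int) -> dict[int, list[int]]:
    """Map each base b in 2..n to the digit list of n in that base."""
    return {b: digits_lsb_first(n, b) for b in range(2, n + 1)}


expansions = universal_tuple(42)
print(expansions[3])   # [0, 2, 1, 1]
print(expansions[42])  # [0, 1]
# Total digit count, i.e. D(N) = sum over bases of (k_b + 1).
print(sum(len(d) for d in expansions.values()))
```

Storing digits least significant first keeps the list index equal to the power of the base, which matches the $a_i(b)$ indexing used in the definition.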
We would embed all these digit sequences into
Geometric Hash Function: We interpret the multi-vector
\[ H: \mathbb{N} \to M \subset \mathbb{R}^D,\qquad H(N) = \mathbf{v}_N\,, \]
where $\mathbf{v}_N$ is the $D$-dimensional coordinate vector representing $N$’s multi-vector $N̂$. In simple terms, $H(N)$ takes a number $N$ and returns the point $\mathbf{v}_N = (a_0(2),a_1(2),\ldots,a_{k_2}(2),\;a_0(3),\ldots,a_{k_3}(3),\;\dots,\;a_0(N),a_1(N))$ in
Because
The target space for UHC digests is a multi-dimensional manifold
Structure: In the Euclidean version, the manifold
Coordinates: A number’s digest in Euclidean mode is simply the coordinate vector
Metric: The distance between two points
Normalization: No special normalization is needed; the coordinates are used as-is. The vector’s length
Inverse Projection: In Euclidean space, the “projection” of the multi-vector onto the manifold is the identity mapping. Therefore, inverse projection is trivial – the coordinates read off directly as the digit sequence. (There is no distortion or extra coordinate to remove.) To decode the number, one can isolate the segments of
Structure: For a hyperbolic geometry, we consider
\[ H^D = \{(x_0,x_1,\ldots,x_D) \in \mathbb{R}^{D+1} : x_0^2 - x_1^2 - \cdots - x_D^2 = 1,\; x_0 > 0\}\,. \]
This is a
Coordinates: Given the
\[ \phi_H:\;\mathbb{R}^D \to H^D,\qquad \phi_H(v_1,\ldots,v_D) = \Big(\sqrt{\,1 + \sum_{i=1}^D v_i^2\,}\;,\; v_1,\;v_2,\;\ldots,\;v_D\Big)\,. \]
In other words, we take $x_0 = \sqrt{1+|\mathbf{v}_N|^2}$ and $x_i = v_i$ for $i=1,\ldots,D$. This yields a valid point on $H^D$ because $x_0^2 - \sum_{i=1}^D x_i^2 = 1 + |\mathbf{v}|^2 - |\mathbf{v}|^2 = 1$. Intuitively, we are embedding the Euclidean vector as the spatial part of a hyperbolic coordinate, with
Metric: The distance between two points on the hyperboloid is given by the hyperbolic distance formula. If
\[ d_H(\mathbf{x},\mathbf{y}) = \cosh^{-1}\!\big(\langle \mathbf{x},\mathbf{y}\rangle_L\big)\,. \]
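As a rough numerical illustration of the lift and the distance formula, here is a Python sketch. It assumes the Lorentzian product is $\langle \mathbf{x},\mathbf{y}\rangle_L = x_0 y_0 - \sum_{i\ge 1} x_i y_i$, the sign convention that matches the hyperboloid constraint above; the function names are hypothetical, and both points must come from vectors of the same length.

```python
import math


def embed_hyperboloid(v: list[float]) -> list[float]:
    """Lift v in R^D to (sqrt(1 + |v|^2), v_1, ..., v_D)."""
    x0 = math.sqrt(1 + sum(c * c for c in v))
    return [x0, *v]


def lorentz_inner(x: list[float], y: list[float]) -> float:
    """Assumed convention: x0*y0 - x1*y1 - ... - xD*yD."""
    return x[0] * y[0] - sum(a * b for a, b in zip(x[1:], y[1:]))


def hyperbolic_distance(x: list[float], y: list[float]) -> float:
    # Clamp at 1.0: rounding can push the product slightly below acosh's domain.
    return math.acosh(max(1.0, lorentz_inner(x, y)))


p = embed_hyperboloid([0, 1, 1])
q = embed_hyperboloid([0, 2, 1])
print(lorentz_inner(p, p))        # close to 1, the hyperboloid constraint
print(hyperbolic_distance(p, q))  # a positive distance
```

The toy vectors here are not full digests; digests of different numbers generally have different lengths, so comparing them this way would first require agreeing on a common dimension.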
For our purposes, the exact distance formula is not as crucial as the fact that large differences in the Euclidean vector translate to additive differences in the hyperbolic space in a compressed way (due to the
Normalization: The hyperbolic embedding automatically normalizes the vector by incorporating it into a unit hyperboloid constraint. There is no arbitrary scaling;
Inverse Projection: To recover the original
\[ \phi_H^{-1}(x_0, x_1,\ldots,x_D) = (x_1, x_2,\ldots,x_D)\,, \]
since if the point lies on
Structure: For an elliptic (positively curved) geometry, we use the model of a
\[ y_0^2 + y_1^2 + \cdots + y_D^2 = 1\,. \]
This is analogous to the hyperbolic case but with a positive-definite constraint. We can restrict to the “northern hemisphere” where
Coordinates: We need to project the
\[ \phi_{Ell}:\;\mathbb{R}^D \to S^D,\qquad \phi_{Ell}(v_1,\ldots,v_D) = \frac{1}{\sqrt{1+|\mathbf{v}|^2}}\;\big(\,1,\;v_1,\;v_2,\ldots,\;v_D\big)\,. \]
In coordinates: set
\[ y_0 = \frac{1}{\sqrt{1+\sum_{i=1}^D v_i^2}} \]
and
\[ y_i = \frac{v_i}{\sqrt{1+\sum_{i=1}^D v_i^2}} \]
for
An intuitive alternative view:
Metric: The intrinsic metric on
Normalization: The mapping
Inverse Projection: The mapping
\[ |\mathbf{v}|^2 = \frac{1 - y_0^2}{y_0^2}\,. \]
(Indeed,
\[ v_i = \frac{y_i}{y_0}\,. \]
This comes from
\[ \phi_{Ell}^{-1}(y_0,y_1,\ldots,y_D) = \Big(\frac{y_1}{y_0},\;\frac{y_2}{y_0},\;\ldots,\;\frac{y_D}{y_0}\Big)\,. \]
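A floating-point round trip through $\phi_{Ell}$ and its inverse might look like the following sketch. The function names are hypothetical, and the final rounding step is an assumption that only makes sense because the original coordinates are integer digits; with floats, the inverse is exact only up to rounding error.

```python
import math


def embed_sphere(v: list[float]) -> list[float]:
    """phi_Ell: scale (1, v_1, ..., v_D) onto the unit sphere."""
    norm = math.sqrt(1 + sum(c * c for c in v))
    return [1 / norm, *(c / norm for c in v)]


def unembed_sphere(y: list[float]) -> list[float]:
    """Inverse of phi_Ell: v_i = y_i / y_0, defined only for y_0 > 0."""
    y0 = y[0]
    if y0 <= 0:
        raise ValueError("y_0 must be positive (northern hemisphere)")
    return [c / y0 for c in y[1:]]


def recover_digits(y: list[float]) -> list[int]:
    """Assumption: the original coordinates were integers, so round."""
    return [round(c) for c in unembed_sphere(y)]


v = [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]
y = embed_sphere(v)
print(sum(c * c for c in y))   # close to 1
print(recover_digits(y) == v)  # True
```

For large digit vectors $y_0$ becomes very small, so a float representation loses precision; exact or high-precision arithmetic would be the safer choice there.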
We thereby recover the original coordinate tuple
To clarify the differences, here is a summary of how each metric space handles the UHC coordinates:
- Euclidean:
  - Dimensional structure: output point has $D$ coordinates (same as number of digits collected).
  - Vector normalization: none (raw coordinates).
  - Embedding projection: identity ($\mathbf{v} \mapsto \mathbf{v}$).
  - Inverse projection: identity (directly read off coordinates).
  - Note: Unbounded coordinate values and distances; straightforward representation.
- Hyperbolic:
  - Dimensional structure: output point has $D+1$ coordinates with one constraint ($x_0^2 - \sum_{i=1}^D x_i^2 = 1$). Effectively $D$ degrees of freedom.
  - Vector normalization: one extra coordinate ($x_0$) ensures points lie on the hyperboloid. Coordinates are intrinsically scaled such that $x_0$ grows with vector length.
  - Embedding projection: $\mathbf{v} \mapsto (\sqrt{1+|\mathbf{v}|^2}, \mathbf{v})$.
  - Inverse projection: drop the first coordinate (recover $\mathbf{v}$ directly from the spatial part).
  - Note: Unbounded $\mathbf{v}$ yields large $x_0$; hyperbolic distance grows sub-linearly (log-like) with $|\mathbf{v}|$.
- Elliptical (Spherical):
  - Dimensional structure: output point has $D+1$ coordinates with one constraint ($y_0^2+\cdots+y_D^2=1$). Effectively $D$ degrees of freedom.
  - Vector normalization: all coordinates are divided by $\sqrt{1+|\mathbf{v}|^2}$, ensuring the point lies on the unit sphere.
  - Embedding projection: $\mathbf{v} \mapsto \frac{1}{\sqrt{1+|\mathbf{v}|^2}}\,(1, \mathbf{v})$.
  - Inverse projection: recover by $v_i = y_i/y_0$.
  - Note: Unbounded $\mathbf{v}$ approaches the equator ($y_0 \to 0$); distances saturate as $\mathbf{v}$ grows.
Importantly, the UHC digest format will accommodate all three metrics by indicating which metric is used, and storing the coordinates accordingly. The mathematical content (the digits of
The UHC geometric hash function is bijective between natural numbers and their geometric digests. Every number
- Uniqueness: No two distinct numbers produce the same multi-vector $N̂$. If $M \neq N$, then there is at least one base $b$ where their digit expansions differ, so their coordinate tuples differ in at least one component, and thus $H(M) \neq H(N)$. The Prime Framework's coherence principle (4-operator.pdf) guarantees a unique minimal representation for each number, so there is no ambiguity or collision in the hash. This uniqueness is analogous to a cryptographic hash with no collisions, except here it’s by construction (based on mathematical identity) rather than assumption.
- Invertibility: Given a digest, one can decode the original number by extracting any one of the base expansions contained in it (4-operator.pdf). The digest was constructed to hold all base expansions, so one simple decoding strategy is:
  - Identify the portion of the coordinate corresponding to base 2 (for example).
  - Interpret that sequence of bits as a binary representation of the number.
  - Recover $N$ by evaluating the binary digits.

  Because the digest is internally consistent, this base-2 reconstruction will yield the same $N$ that would be obtained from any other base’s digits in the digest. Formally, if $\mathbf{v}_N$ is the coordinate tuple, and $(a_0(2),\dots,a_{k_2}(2))$ are the first $k_2+1$ entries (the binary digits), then $N = \sum_{i=0}^{k_2} a_i(2)\,2^i$. This computation exactly inverts the encoding process for base 2. We could equally do it with base 10 or any available base segment. (In an implementation, base 2 is convenient because it’s guaranteed to exist and is typically the longest digit sequence, but any will do.)
- Referential Invariance: In the UOR framework, referential invariance means that the object’s representation does not depend on an external reference frame. Here, we choose a fixed reference frame (the coordinate basis $e_{b,i}$ in $C_x$) (4-operator.pdf) to produce the digest. If the reference point $x$ or the orientation of the fiber algebra were changed (via an isometry in the symmetry group $G$ of the manifold), the multi-vector $N̂$ might undergo a corresponding transformation (e.g. basis elements permuted or rotated). However, such transformations are consistently applied to all components and thus do not change the intrinsic information – the set of digit values in each base remains the same, only assigned to different basis vectors. In practical terms, the UHC digest format fixes the coordinate ordering (e.g. ascending bases and increasing digit positions) as part of the specification, which serves as a **reference frame** (4-operator.pdf). This ensures that for a given number, the digest is unique and does not depend on arbitrary choices. The referential invariance property indicates that the identity encoded by the digest is an intrinsic property of the number, not tied to how or where the number is stored or referenced in a system.
- Base Independence: By design, the UHC digest is base-independent. It simultaneously includes all bases, so it is not biased toward or reliant on any particular numeral system. This fulfills the UOR requirement that representations be independent of arbitrary conventions like choice of base. In effect, the digest can be seen as a universal identifier for the number that would remain the same whether you think of the number in binary, decimal, or any other base.
- Intrinsic Identity: The digest encodes the number’s intrinsic mathematical identity – its actual value in a structural way (pvsnp1.pdf). There is no auxiliary data like type, context, or address; it’s purely determined by the number itself. In UOR terms, this corresponds to the property that the object’s identity is contained entirely within the object’s representation (here, the multi-vector encapsulating all self-consistent representations of the number). Two digests can be compared to see if they represent the same number simply by checking if they are exactly identical (component-wise). This is analogous to how two identical fingerprints indicate the same person: here the “fingerprint” of the number is its multi-base digit structure.
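The invertibility claim, that every base segment reconstructs the same $N$, can be checked mechanically. The sketch below is illustrative: `value_from_digits` is a hypothetical helper, and the segments are the base-2, base-3, base-11 and base-42 digit lists of 42 from the earlier example.

```python
def value_from_digits(digits: list[int], base: int) -> int:
    """Evaluate sum(a_i * base**i) for digits given least significant first."""
    total = 0
    # Horner's rule, starting from the most significant digit.
    for a in reversed(digits):
        if not 0 <= a < base:
            raise ValueError(f"digit {a} is not valid in base {base}")
        total = total * base + a
    return total


segments = {
    2: [0, 1, 0, 1, 0, 1],
    3: [0, 2, 1, 1],
    11: [9, 3],
    42: [0, 1],
}
values = {b: value_from_digits(d, b) for b, d in segments.items()}
print(values)  # {2: 42, 3: 42, 11: 42, 42: 42}
```

A decoder that wants the consistency check described above can compare these per-base values and reject the digest if any two disagree.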
In summary, UHC provides a lossless hashing codec:
Note: In a theoretical sense,
We now formalize the format of the UHC Geometric Digest, which is the serialized representation of the point
{
  "version": 1,
  "metric": "Euclidean | Hyperbolic | Elliptical",
  "dimension": D,
  "coordinates": [ c1, c2, c3, ..., cD ]
}
- `version`: An integer indicating the format version. (For this specification, `1` is used. This allows future enhancements or changes to the format while maintaining backward compatibility.)
- `metric`: A string (or code) specifying which metric interpretation is used for this digest. It can be `"Euclidean"`, `"Hyperbolic"`, or `"Elliptical"` (other equivalent terms like `"E"`/`"H"`/`"Spherical"` could be used in practice). This tells the decoder how to interpret the coordinates. For example, `"Euclidean"` means interpret the coordinates as a direct vector $\mathbf{v}_N$; `"Hyperbolic"` means the first coordinate is $x_0$ and the rest is $\mathbf{v}_N$; `"Elliptical"` means coordinates are on a sphere with the first being analogous to $y_0$.
- `dimension`: An integer $D$ giving the number of coordinate components in the `coordinates` array. This makes the length explicit for error-checking and parsing. (In Euclidean mode, $D = D(N)$ as calculated above. In Hyperbolic and Elliptical modes, $D = D(N)+1$ since an extra coordinate is stored.)
- `coordinates`: An array of numeric values encoding the point’s coordinates. The nature of these values depends on the metric:
  - In Euclidean metric, this array is simply the flattened sequence of all digits of $N$ in bases 2 through $N$. The ordering is by increasing base: first the base-2 digits (from $a_0(2)$ up to $a_{k_2}(2)$), then base-3 digits, and so on. Each element of the array is an integer in the range $[0, b-1]$ appropriate to its base; however, the base boundaries are implicit (not explicitly marked in the array). The `dimension` field indicates where the array ends. The length of the base-$b$ segment is $\lfloor \log_b N \rfloor + 1$, which could be computed if $N$ were known, but since $N$ is unknown at decode time, a decoder has to parse differently (see decoding pseudocode). Typically, the simplest decode is to use the first segment as base-2 and compute $N$, then verify that the rest matches that $N$ in other bases.
  - In Hyperbolic metric, the first element of the array is $x_0 = \sqrt{1+|\mathbf{v}_N|^2}$ (which may be a floating-point or high-precision number), and the remaining elements are the components of $\mathbf{v}_N$ (the digit sequence) exactly as in Euclidean. Thus the total count is $D = 1 + D(N)$. The digit sequence portion starts at index 1 of the array. Since $|\mathbf{v}_N|^2 = \sum_{b=2}^N \sum_i a_i(b)^2$ is a sum of squares of integer digits, it is an integer, so $x_0$ is the square root of the integer $1+|\mathbf{v}_N|^2$. It is irrational unless that integer is a perfect square, which happens only in rare cases; we might represent it as an exact algebraic number or a decimal approximation. The important point is that it is included explicitly.
  - In Elliptical metric, the array contains $y_0$ followed by the scaled coordinates $y_i$ for $i=1,\ldots,D(N)$, where $(y_0, y_1,\ldots, y_{D(N)}) = \phi_{Ell}(\mathbf{v}_N)$ on $S^{D(N)}$. As discussed, $y_0 = 1/\sqrt{1+|\mathbf{v}_N|^2}$ and $y_i = v_i/\sqrt{1+|\mathbf{v}_N|^2}$. Since $|\mathbf{v}_N|^2$ is an integer, these coordinates are algebraic numbers, generally irrational. In a practical JSON, they might be given as decimal strings or fractions. The length is $D = 1+D(N)$ here as well.
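A decoder would typically check the envelope before touching the coordinates. The following Python sketch shows one way to do that; `check_envelope` and `ALLOWED_METRICS` are hypothetical names, and the specific checks are an assumed reading of the field descriptions above rather than a normative validator.

```python
ALLOWED_METRICS = {"Euclidean", "Hyperbolic", "Elliptical"}


def check_envelope(digest: dict) -> None:
    """Raise ValueError if the digest envelope is malformed."""
    if digest.get("version") != 1:
        raise ValueError("unsupported version")
    metric = digest.get("metric")
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    coords = digest.get("coordinates")
    if not isinstance(coords, list) or not coords:
        raise ValueError("coordinates must be a non-empty list")
    if digest.get("dimension") != len(coords):
        raise ValueError("dimension does not match coordinates length")
    if metric == "Euclidean":
        # Euclidean coordinates are raw digits: non-negative integers.
        for c in coords:
            if isinstance(c, bool) or not isinstance(c, int) or c < 0:
                raise ValueError("Euclidean coordinates must be digits")


check_envelope({
    "version": 1,
    "metric": "Euclidean",
    "dimension": 3,
    "coordinates": [0, 1, 1],
})
```

The sketch only validates the envelope; it does not check that the digit segments are mutually consistent, which requires knowing or recovering $N$.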
The fixed-size envelope consists of the fields `version`, `metric`, and `dimension`, which are always present and of constant size (regardless of $N$). The `coordinates` field is the dynamic part whose length grows with $N$. This separation makes it clear where metadata ends and data begins.
Example Digest (Euclidean): Using a small number to illustrate, let $N = 6$. The digit counts per base are:
- Base 2: $\lfloor \log_2 6 \rfloor+1 = 3$ digits (binary "110").
- Base 3: $\lfloor \log_3 6 \rfloor+1 = 2$ digits (ternary "20").
- Base 4: $\lfloor \log_4 6 \rfloor+1 = 2$ digits (base-4 "12").
- Base 5: $2$ digits ("11").
- Base 6: $2$ digits ("10").

So $D(6) = 11$ and the flattened coordinate list is `[0,1,1, 0,2, 2,1, 1,1, 0,1]`. Grouping for clarity: base2 (0,1,1), base3 (0,2), base4 (2,1), base5 (1,1), base6 (0,1). A possible JSON digest:
{
  "version": 1,
  "metric": "Euclidean",
  "dimension": 11,
  "coordinates": [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]
}
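The example digest can be regenerated with a few lines of Python. This is a minimal sketch: `flatten_digits` is a hypothetical helper, and it assumes the loop bound `max(2, n)` so that very small inputs still get a base-2 segment.

```python
import json


def flatten_digits(n: int) -> list[int]:
    """All digits of n in bases 2..max(2, n), LSB first, ascending base."""
    coords = []
    for b in range(2, max(2, n) + 1):
        m = n
        while True:
            coords.append(m % b)
            m //= b
            if m == 0:
                break
    return coords


coords = flatten_digits(6)
digest = {
    "version": 1,
    "metric": "Euclidean",
    "dimension": len(coords),
    "coordinates": coords,
}
print(json.dumps(digest))
# {"version": 1, "metric": "Euclidean", "dimension": 11, "coordinates": [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]}
```

Because Python dictionaries preserve insertion order, the serialized field order matches the envelope layout shown above.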
This lists all digits from base 2 up to 6. A decoder reading this would know it’s Euclidean (so the first coordinate corresponds to base 2’s least significant digit). The length of the first segment is not known in advance, but since the base-2 representation must end with the highest-order digit 1 (except for the number 0, which is a special case), the decoder can look for the end of the base-2 segment at the last `1` in that segment followed by something that would be a base-3 digit (in this case, after reading `0,1,1` for base 2, the next digit is `0`, which could be the base-3 least significant digit). However, a more straightforward method is: assume the first part is base-2 and read `0,1,1` as little-endian binary, which gives $1\cdot 2^1 + 1\cdot 2^2 = 2 + 4 = 6$. Then the decoder can simply verify that the rest of the array matches $6$’s digits in bases 3, 4, 5, 6 (and they do: `0,2` is 6 in base 3, `2,1` is 6 in base 4, etc.). In practice, the decoder doesn’t even need to verify all bases if we trust the digest to be correctly formed, but the format allows consistency checking if desired.
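Example Digest (Hyperbolic): For the same $N = 6$, the squared norm is $|\mathbf{v}_6|^2 = 14$, so $x_0 = \sqrt{15}$ and the coordinates array is `[$\sqrt{15}$, 0,1,1, 0,2, 2,1, 1,1, 0,1]` (with an appropriate numeric format for $x_0$, such as `3.87298` as a decimal or a rational approximation). The envelope would now have `"metric": "Hyperbolic", "dimension": 12`.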
Example Digest (Elliptical): For the same $N = 6$, the envelope would have `"metric": "Elliptical", "dimension": 12` and the coordinates array of length 12.
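The numeric values behind the hyperbolic and elliptical examples for $N = 6$ can be worked out as below. This is a floating-point sketch only; the variable names are illustrative, and a real encoder might prefer exact or high-precision values.

```python
import math

# Flattened digits of 6 in bases 2..6, as in the Euclidean example.
v = [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]

radicand = 1 + sum(c * c for c in v)   # 1 + |v|^2, an integer
x0 = math.sqrt(radicand)

hyperbolic = [x0, *v]                        # (x0, v_1, ..., v_D)
elliptical = [1 / x0, *(c / x0 for c in v)]  # (y0, y_1, ..., y_D)

print(radicand)                           # 15
print(round(x0, 5))                       # 3.87298
print(round(elliptical[0], 6))            # 0.258199
print(len(hyperbolic), len(elliptical))   # 12 12
```

Both arrays have 12 entries, which is the `dimension` value quoted for the two non-Euclidean examples.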
The UHC digest format thus explicitly contains everything needed to reconstruct the number: the metric type, the full coordinate set, and the knowledge of how those coordinates relate to the number’s digits. It is verbose (especially listing all base expansions), but it is unambiguous and canonical. In many cases, the digest will be large; this is the cost of universality and losslessness. One could apply compression to the coordinates array or omit some redundant parts (since the number could be reconstructed from just one base’s data), but that would break the symmetry and base-independence, so the full format is kept for theoretical purity. In implementations, a balance can be struck if needed.
We now provide a more rigorous mathematical description of how the universal coordinates are projected into the geometric space, tying together the pieces described above.
Universal Coordinate Space: Define
\[ U(N) = \mathbb{R}^{D(N)}\,, \]
where
\[ \mathbf{v}_N = (x_{2,0}, x_{2,1},\ldots,x_{2,k_2};\;x_{3,0},\ldots,x_{3,k_3};\;\ldots;\;x_{N,0}, x_{N,1})\,, \]
where we interpret
\[ N = \sum_{i=0}^{k_b} x_{b,i}\, b^i\,, \]
and this
We now define formal projection maps for each metric:
- Euclidean Projection ($P_E$): This is the identity on coordinates (4-operator.pdf): \[ P_E: U(N) \to \mathbb{R}^{D(N)}, \quad P_E(\mathbf{v}) = \mathbf{v}. \] There is no change of dimension or normalization. $P_E(\mathbf{v}_N)$ is just $\mathbf{v}_N$ itself, now regarded as a point in the Euclidean manifold $M = \mathbb{R}^{D(N)}$. The inverse $P_E^{-1}$ is trivial.
- Hyperbolic Projection ($P_H$): This map goes from $U(N) = \mathbb{R}^{D(N)}$ to the hyperbolic manifold $H^{D(N)} \subset \mathbb{R}^{D(N)+1}$: \[ P_H(\mathbf{v}) = \big(\sqrt{1+|\mathbf{v}|^2},\; \mathbf{v}\big)\,. \] Here $|\mathbf{v}|^2 = \sum_{j=1}^{D(N)} v_j^2$ is the squared Euclidean norm on $U(N)$. $P_H(\mathbf{v})$ yields a $(D(N)+1)$-dimensional vector satisfying $x_0^2 - \sum_{j=1}^{D(N)} x_j^2 = 1$ as required. The inverse is: \[ P_H^{-1}(x_0, x_1,\ldots,x_{D}) = (x_1,\ldots,x_D)\,, \] given that any input to $P_H^{-1}$ should satisfy the hyperboloid condition (so that $x_0$ is determined by $x_1,\ldots,x_D$). Composing with the universal embedding, $H(N) = P_H(\mathbf{v}_N)$ is the hyperbolic embedding of the number.
- Elliptical Projection ($P_{Ell}$): This map goes from $U(N) = \mathbb{R}^{D(N)}$ to the sphere $S^{D(N)} \subset \mathbb{R}^{D(N)+1}$: \[ P_{Ell}(\mathbf{v}) = \frac{1}{\sqrt{1 + |\mathbf{v}|^2}}\;\big(1,\,v_1,\,v_2,\,\ldots,\,v_{D}\big)\,. \] We can denote the output as $(y_0, y_1,\ldots,y_D)$ with $D = D(N)$. By construction, $y_0 = 1/\sqrt{1+|\mathbf{v}|^2}$ and $y_i = v_i/\sqrt{1+|\mathbf{v}|^2}$ for $i\ge1$. This lies on $S^D$ since $y_0^2 + \cdots + y_D^2 = \frac{1 + |\mathbf{v}|^2}{1+|\mathbf{v}|^2} = 1$. The inverse mapping $P_{Ell}^{-1}$ is defined for any sphere point with $y_0 \neq 0$ (which excludes the equator) as: \[ P_{Ell}^{-1}(y_0,y_1,\ldots,y_D) = \Big(\frac{y_1}{y_0},\;\frac{y_2}{y_0},\;\ldots,\;\frac{y_D}{y_0}\Big)\,. \] This recovers $\mathbf{v}$ because $y_i/y_0 = \frac{v_i/\sqrt{1+|\mathbf{v}|^2}}{1/\sqrt{1+|\mathbf{v}|^2}} = v_i$. (Also, $y_0$ itself gives a check: $y_0 = 1/\sqrt{1+|\mathbf{v}|^2}$, which we could use to validate consistency.)
One can verify these mappings preserve the one-to-one relationship:
Ensuring No Ambiguity: We should note that in
Algebraic Structure: It’s worth highlighting that the multi-vector
In summary, the combination of the universal embedding
\[ H_{\text{metric}}(N) = P_{\text{metric}}(\mathbf{v}_N)\,, \]
with
The following pseudocode outlines the procedure to encode a given non-negative integer
function encodeUHC(N: non-negative integer, metricType: string) -> Digest:
    # 1. Generate digit expansions for all bases 2 through N (inclusive).
    #    We'll store the digits in a dictionary mapping base -> list of digits (LSB first).
    digits_by_base = {}
    for b in range(2, N + 1):
        # Compute digits of N in base b
        base_digits = []
        value = N
        while value >= b:
            remainder = value mod b
            base_digits.append(remainder)
            value = value // b
        # Append the final value (which is < b) as the last digit
        base_digits.append(value)
        # Now base_digits holds the digits in LSB-to-MSB order for base b.
        # (E.g., for N=42, base 3 yields [0,2,1,1] corresponding to "1120")
        digits_by_base[b] = base_digits

    # 2. Flatten all digits into one coordinate list in ascending base order.
    coordinate = []
    for b in range(2, N + 1):
        # Append the digit list for base b directly.
        # This yields the sequence a0(b), a1(b), ..., a_{k_b}(b) for each base in order.
        coordinate.extend(digits_by_base[b])

    # 3. Apply metric-specific projection or augmentation.
    if metricType == "Euclidean":
        coords = coordinate  # no change, coords is just the list of digits.
    else if metricType == "Hyperbolic":
        # Calculate x0 = sqrt(1 + sum(coord_i^2)).
        sum_squares = 0
        for x in coordinate:
            sum_squares += x * x
        x0 = sqrt(1 + sum_squares)
        # Prepend x0 to the coordinate list
        coords = [x0] + coordinate
    else if metricType == "Elliptical":
        # Calculate norm_factor = sqrt(1 + sum(coord_i^2)).
        sum_squares = 0
        for x in coordinate:
            sum_squares += x * x
        norm_factor = sqrt(1 + sum_squares)
        # Compute y0 and scaled coordinates y_i = coordinate_i / norm_factor.
        y0 = 1 / norm_factor
        coords = [y0]
        for x in coordinate:
            coords.append(x / norm_factor)
    else:
        raise Error("Unknown metric type")

    # 4. Build the digest structure (as a dictionary to be serialized to JSON).
    digest = {}
    digest["version"] = 1
    digest["metric"] = metricType
    digest["dimension"] = len(coords)
    digest["coordinates"] = coords
    return digest
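For readers who want to run the encoder, here is one possible Python rendering of the pseudocode. It is a sketch under two assumptions that go beyond the pseudocode as written: the base loop runs to `max(2, n)` so that 0 and 1 still produce a base-2 segment, and the non-Euclidean coordinates are plain floats. The name `encode_uhc` is illustrative.

```python
import math


def encode_uhc(n: int, metric: str = "Euclidean") -> dict:
    """Build a UHC-style digest dictionary for a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")

    # Steps 1 and 2: digits of n in each base, LSB first, flattened.
    coordinate = []
    for b in range(2, max(2, n) + 1):  # assumed bound: always include base 2
        value = n
        while value >= b:
            coordinate.append(value % b)
            value //= b
        coordinate.append(value)  # most significant digit

    # Step 3: metric-specific projection.
    norm = math.sqrt(1 + sum(c * c for c in coordinate))
    if metric == "Euclidean":
        coords = coordinate
    elif metric == "Hyperbolic":
        coords = [norm] + coordinate
    elif metric == "Elliptical":
        coords = [1 / norm] + [c / norm for c in coordinate]
    else:
        raise ValueError(f"unknown metric type: {metric!r}")

    # Step 4: envelope plus coordinates.
    return {
        "version": 1,
        "metric": metric,
        "dimension": len(coords),
        "coordinates": coords,
    }


print(encode_uhc(6)["coordinates"])  # [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]
```

The dictionary-of-lists intermediate from the pseudocode is skipped here because the digits are appended in ascending base order anyway; the resulting coordinate list is the same.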
Notes on the encoding pseudocode:
- The loop from base 2 to $N$ is $O(N \log N)$ in the worst case (for each base computing digits by division). This is not optimized and becomes impractical when $N$ is huge, but as a specification we show the conceptual full expansion.
- `sqrt` in the hyperbolic and elliptical parts implies high-precision arithmetic may be needed if $N$ is large, to accurately represent $x_0$ or $y_0$. In an exact arithmetic sense, $x_0 = \sqrt{1+\sum a_i(b)^2}$ is a square root of an integer. For elliptical, $y_0 = 1/\sqrt{\ldots}$ likewise.
- We assume the existence of arbitrary precision or rational representation if needed for those values (since JSON can use a string to store big rationals).
- The digits are appended LSB-first for each base as per our convention. One could also store MSB-first; it’s arbitrary as long as it is consistent. We used LSB-first to make computing $N$ easier (because summing $a_i b^i$ aligns with index = power).
- Edge cases $N=0$ and $N=1$: as written, `range(2, N + 1)` is empty for both values, so `coordinate` stays empty and the digest has dimension 0, which loses the number. Base 1 is not an option (we required base $\ge 2$). Conceptually, the universal tuple of 1 is the single digit "1" in every base $\ge 2$, and that of 0 is the single digit "0" in every base, so listing more than one base would be redundant.
- For simplicity, one could decide that the loop goes to `max(N, 2)` inclusive, meaning base 2 is always included: `for b in range(2, max(2, N) + 1): ...`. This covers $N=1$ (the base-2 representation of 1 is `[1]`) and $N=0$ (the base-2 representation is `[0]`), and is equivalent to defining $B(N) = \max(2,N)$.
- That aside, conceptual clarity is more important here than those edge cases.
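The precision note can be made concrete with Python's standard `decimal` module. The sketch below is one possible approach, not something the specification prescribes: it keeps the integer radicand $1+\sum a_i(b)^2$ exact and renders $x_0$ to a chosen number of significant digits as a string suitable for JSON. The function names are hypothetical.

```python
import math
from decimal import Decimal, localcontext


def exact_radicand(coordinate: list[int]) -> int:
    """The integer 1 + sum of squared digits; x0 is its square root."""
    return 1 + sum(c * c for c in coordinate)


def x0_as_string(coordinate: list[int], digits: int = 50) -> str:
    """Square root of the radicand to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return str(Decimal(exact_radicand(coordinate)).sqrt())


def x0_is_integer(coordinate: list[int]) -> bool:
    """True when the radicand is a perfect square, so x0 is an integer."""
    r = exact_radicand(coordinate)
    return math.isqrt(r) ** 2 == r


v = [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]  # digits of 6 in bases 2..6
print(exact_radicand(v))    # 15
print(x0_is_integer(v))     # False
print(x0_as_string(v, 30))  # sqrt(15) to 30 significant digits
```

Storing the radicand itself (here 15) would be an exact alternative to any decimal expansion, since $x_0$ and $y_0$ are both determined by it.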
Now we outline the reverse: given a UHC Geometric Digest (the JSON fields `metric`, `dimension`, `coordinates`), retrieve the original number $N$.
function decodeUHC(digest: Digest) -> integer N:
    # 1. Parse envelope information.
    metricType = digest["metric"]
    D = digest["dimension"]
    coords = digest["coordinates"]  # this is a list of length D

    # 2. Depending on metric, obtain the raw coordinate tuple (all base digits).
    if metricType == "Euclidean":
        coordinate = coords  # directly the digit tuple
    else if metricType == "Hyperbolic":
        # coords[0] = x0, coords[1:] = digit coordinates (spatial part).
        coordinate = coords[1:]  # drop the first element (x0)
        # (Optionally, one could verify that coords[0]^2 == 1 + sum(coordinate^2) to ensure consistency)
    else if metricType == "Elliptical":
        # coords[0] = y0, coords[1:] = y_i coordinates.
        y0 = coords[0]
        y_coords = coords[1:]
        coordinate = []
        for y_i in y_coords:
            # Compute v_i = y_i / y0
            coordinate.append(y_i / y0)
        # (Optionally verify y0^2 + sum(y_coords^2) == 1 within tolerance.)
    else:
        raise Error("Unknown metric type")
    # At this point, `coordinate` should be the flattened digit sequence for bases 2..N.

    # 3. Reconstruct N from the coordinate digits.
    # E.g., use the base-2 portion of the coordinate to get N.
    # We need to determine how many digits belong to base 2, base 3, etc.
    # One robust method: we know the first chunk is base-2. We can find its length by finding the point where the base-3 digits start.
    # However, it's simpler to just use base-2 digits themselves since they are sufficient.
    # Find the length of the base-2 digit segment.
    # Base-2 digits will be at least 1 digit (for N>0) and we'll see them until we encounter the base-3 segment.
    # But without explicit markers, we rely on the property that the last digit of base-2 segment is the most significant binary digit, which should be 1 for N>0.
    # The next entry after that belongs to base-3 (if any).
    # We can find the index of transition by using the known N after computing from binary, but that is a circular dependency.
    # Instead, we will progressively build N by assuming the first segment is base-2.
    # Compute N from base-2 digits (assuming coordinate starts with the base-2 digits in LSB-first order).
    base2_digits = []
    # Extract digits until we reach a position that cannot be part of base-2.
    # Actually, since base-3 digits might also contain 0s and 1s, there's no sure marker without additional info.
    # We'll instead use the knowledge that base-2 has the maximum number of digits.
    # So we try the entire coordinate as base-2 digits first:
    base2_digits = coordinate
    # Compute a candidate N from all coordinates as if they were binary digits:
    candidate_N = 0
    for i, bit in enumerate(base2_digits):
        candidate_N += bit * (2 ** i)

    # Now, verify if candidate_N's digest would match the given digest.
    # The simplest verification is to regenerate the first few base expansions and see if they match coordinate.
    # Or specifically, check the first part of coordinate equals candidate_N in base2, the next part equals candidate_N in base3, etc.
    is_consistent = True
    value = candidate_N
    index = 0
    for b in range(2, candidate_N + 1):
        # Compute representation of candidate_N in base b
        digits = []
        temp = value
        while temp >= b:
            digits.append(temp mod b)
            temp = temp // b
        digits.append(temp)
        # Compare with segment of `coordinate` at current index.
        length = len(digits)
        if coordinate[index : index+length] != digits:
            is_consistent = False
            break
        index += length
        if index >= len(coordinate):
            break
    if not is_consistent:
        raise Error("Digest is inconsistent or corrupted")
    return candidate_N
The decoding algorithm above is somewhat heuristic in how it isolates the base-2 digits. A more deterministic approach would be:
- Recognize that the base-$N$ segment (the last segment in the coordinate list) is always `[0, 1]` (for $N>1$). One could scan from the end backwards to find the last occurrence of such a pattern, which might delineate base $N$. However, a pattern like `[0,1]` could also appear as part of another base's digits (e.g., $6$ in base 6 gave `[0,1]`). But that happened to be also base $N$ in that case. Actually the base-$N$ representation is always `[0,1]`. If we find that at the very end, that suggests the last base represented is equal to the number itself, confirming $B(N)=N$.
- Alternatively, we might guess $N$ by looking at the length of the first segment. If $k_2+1$ is the length of the base-2 digits, we know $2^{k_2} \le N < 2^{k_2+1}$. For example, with 3 binary digits, $N$ is between 4 and 7. We could then try the $2^{k_2} + \ldots$ values, but that seems overkill.
Given that the simplest approach is indeed to assume the entire coordinate is binary and compute candidate_N
. Surprisingly, that actually works because the entire coordinate list if interpreted as binary digits will sum to the correct
So that approach is flawed. We need a better way to isolate base2 digits portion.
Better approach:
- We know the base-2 segment ends when the next digit in the list is
$\ge 2$ ? Actually base-2 digits can only be 0 or 1. The moment we see a digit '2' in the list, that must belong to base-3 segment (since binary digits never produce a '2'). In our 6 example, coordinate: [0,1,1, 0,2, ...] at index 3 we see '0' followed by '2'. The '2' is not a valid binary digit, so index 3 (with value 0) could either be a binary digit or might be the LSD of base3 segment? Let's see:- coordinate indices: 0->0 (valid binary LSD), 1->1, 2->1 (so far all <=1, likely binary segment).
- index 3 has value 0 which is <=1, could still be binary (like maybe a 4th binary digit of value 0). But if it were a binary digit, that would mean
$k_2+1 > 3$ . But let's see next index 4 is '2' which cannot be binary. That suggests index 3 actually might be base3 LSD. - How to decide that systematically: If we assume binary had 4 digits, they'd be [0,1,1,?]. The '?' would have to be the value at index 3, which is 0. So binary digits would be [0,1,1,0]. That yields $02^0+12^1+12^2+02^3= (0+2+4+0)=6$. It still yields 6. So binary could be 0110₂ (which is 6). But then what about the rest of coordinate? index4 onward? If we took 4 binary digits, we consumed indices 0-3 as base2. The remainder starting index4 [2,1,1,1,0,1] we would expect correspond to base3..base6 still, but we cut base2 one digit longer than actual. Let's see if that remaining matches base3 for 6:
- It would start with 2 at index4, presumably base3 LSD. That matches since 6 in base3 LSD is 0, not 2. Already mismatch. Actually if base2 took one extra digit (0 as MSB which is probably extraneous leading 0?), that's an inconsistency: base2 representation of 6 should not have a leading zero beyond necessary length. So maybe the rule: the highest base-2 digit must be 1 unless N=0. So if we include a 4th binary digit as 0, that's a leading zero, not allowed. So base-2 segment likely ends before that.
- So the last binary digit must be nonzero (1). In [0,1,1,0,...], the sequence up to the first 0 would not be correct because it ends in 0. Actually our base2 digits in correct minimal rep ended at index2 with value 1. So maybe rule: binary segment ends at the last occurrence of '1' before a digit >1 appears.
- In [0,1,1,0,2,...], the digits <=1 until index3, but index3 is 0 and index4 is 2 ( >1). So the last '1' in the initial run of <=1 digits was at index2. We could decide that index2 is end of binary because after that, although index3 is 0 (still <=1), the appearance of a '2' at index4 means index3 might actually be part of base3 segment where 0 is a legitimate digit. It's tricky to parse without knowing N.
Perhaps a simpler plan:
- Use the known structure: read base2 until you see a digit >=2, then you know you've gone one too far (the position where you saw >=2 is actually the start of base that digit belongs to).
- So to parse: let i=0, while i < len(coordinate): let b=2. for b from 2 upward: for j from 0 to ... while coordinate[i] < b (because digits of base b must be < b): that suggests maybe reading base b segment: Actually if b=2, you require all its digits are <2 always (which is true until a digit >=2 encountered). In our example: coordinate[0..2] are <2, coordinate[3] = 0 which is also <2, coordinate[4]=2 which is >=2, so one might think base2 segment is indices 0..3 (i inclusive until before 4). But that includes trailing 0. But maybe treat trailing 0 as legitimate part of base2 rep? In normal base representation, you wouldn't include trailing 0 beyond the most significant nonzero, but the digest as constructed does not include any unnecessary leading zeros in each base segment except that baseN representation always is [0,1]. Actually base6 representation of 6 included a 0 as LSD, but that was necessary, not leading because base6 rep of 6 is "10". So maybe a 0 can appear at end of binary segment if N even (LSD 0), but the MSB of binary segment definitely 1. If our scanning rule overshoots by including index3 as binary, then MSB of that 4-digit binary is 0, which violates minimal representation (we didn't include any leading zeros in encoding). So indeed the correct binary segment must end at index2 where the digit was 1 (the next index was 0 which would be a leading zero if it belonged to binary). So rule might be: end of base2 segment is when you encounter a digit that is not allowed in binary (>=2) or you encounter a position where continuing would force a leading zero at MSB of base2 representation. How to see that? Possibly if the next digit after the last '1' in the <=1 sequence is less than current base, it might be a trailing segment of same base or it might belong to next base, ambiguous.
This suggests the decoding without knowing N is nontrivial if no boundary markers. However, since this is a specification, we might not need to give an extremely optimized or even correct parsing algorithm, as long as we conceptually show how to decode.
Perhaps an easier route: because the digest is base independent, the decoder can exploit the consistency by guessing N and verifying:
-
One simple guess is to assume the first few digits are base-2 and compute an integer.
-
Or use the length or pattern like baseN's [0,1] at the end: If we can find at the end of coordinate the pattern [0,1], that probably signals base = N. If coordinate ends with [0,1], then the base for that segment is len(segment)-1's index or something? Actually baseN yields exactly two digits [0,1]. If the last two digits of coordinate are [0,1], likely that is baseN segment. If not, maybe N was 1 (excluded), or if N=2, base2 segment is [0,1] and also final since N=2, yeah still [0,1]. If coordinate ends with [0,1] we can suspect those are baseN digits. If so, N in baseN is "10", meaning N = 1*b^1 + 0 = b. Thus baseN = N means indeed that b (the base) = N, consistent. So if we detect last two digits as [0,1], we deduce that segment is base = something, and in that base representation the value is base itself. That suggests base = N. So we identify the base of the last segment as the length of segments we've included plus 1? Not directly, but: If last segment [0,1] presumably corresponds to base k (some k) representation of N. That representation "10" in base k equals k in decimal. That equals N by coherence. So N=k. So the base of the last segment is N.
Therefore, if we can find where the last segment starts, we can know N because the base at the last segment = N. How to find last segment start? The last segment is base N's digits: [0,1]. Could it ever be longer than 2 digits? If N itself is not a single-digit in base N, but by definition, N in base N is "10" (two digits) for any N>1. So always exactly two digits. Could the coordinate end with something else? If N=1, trivial. If N=0, baseN concept weird. So yes for N>=2, last two coordinate entries should be [0,1]. It might be possible that earlier in the sequence other base segments also had [0,1] (like 6 in base6 had that as last segment). However, the final [0,1] belongs to base N. If coordinate length is L, the last two indices are L-2 and L-1 with [0,1]. Now if we go just before that (L-3 index), that belongs to base N-1 or earlier. Possibly the second to last segment? But anyway, we can remove the last two entries (since we know they are baseN). Now we know N (because baseN = N). For example coordinate for 42 ends in base42 digits [0,1], so we know N=42. Or coordinate for 6 ended in base6 [0,1], know N=6.
So procedure: look at the last two coordinates:
- if they are [0,1], set candidate_N = (the count of base segments included so far + something)? Actually just set candidate_N = len? Wait 6 coordinate length was 11, [0,1] at end indicated N=6 which is not simply related to 11. But thinking: in coordinate array, baseN segment is always 2 elements. So last 2 always [0,1] if N>1. So we deduce candidate_N = ??? Possibly the base for which "10" yields those digits. But "10" yields them for base=whatever the value of 1 in second position stands for that base. Actually, if we see [0,1] we can't directly say N=some number from [0,1] alone, except that if that [0,1] is base k representation, that indicates the number is k. The fact it is the last segment implies that base = N because we always go sequentially base2..baseN.
So possibly, to get N, one approach:
- Count number of segments = N-1 segments (from base2 to baseN inclusive).
- That requires parsing segments though.
Alternatively:
- Recognize pattern at end [0,1] as baseN.
- Then N must equal the base index of that segment. But we don't know the base index directly from raw data. However, we know it's the last base considered. If we had a way to count how many base segments are in the coordinate, that count + 1 (starting from 2) gives N.
If we could parse how many segments, that is as hard as parsing lengths.
But maybe easier: try to reconstruct backwards:
- We know last 2 digits form baseN segment.
- Remove them (pop last 2 from coordinate list).
- Now the new last segment corresponds to base N-1. It might not have exactly 2 digits; could have more depending on N-1 relative to digits.
- But we could attempt to similarly see if the new last two digits are [0,1]. If yes, then base N-1 also produced "10"? Possibly if N-1 = base of second last segment.
- Actually if N was consecutive like 6, base5 representation of 6 is "11", not "10". So second last segment likely not "0,1".
- Instead, maybe decode by computing N from one known base segment e.g. binary: Actually now we know N (we hypothesize by noticing last segment is [0,1], guess N equals ??? maybe could guess that the number of base segments included was N-1, but we didn't know N to know that count). This is chicken-egg.
Actually, since referencing user question, they likely assume decode can simply use the base-2 part. It's easier to present decode as: "To decode, one can isolate the base-2 digits from the coordinate and interpret them. Because the digest is consistent, using base-2 yields the correct N." And perhaps mention how to isolate base-2 digits reliably: "since base-3 digits will include a digit '2', we can find the end of base-2 sequence by the first occurrence of a digit 2 or greater, as that signals the start of base-3 segment." This rule may erroneously include a trailing 0 from base3 as in the example though. But let's assume it works most times or qualifies it carefully.
We might avoid giving extremely detailed code for parsing as it's complex. Instead, a narrative:
"To decode, do this: For Euclidean metric, read the coordinate list. The base-2 segment can be identified as the initial sequence of 0s and 1s up to the first occurrence of a digit
We can also add that in hyperbolic or elliptical, the coordinate first drop or adjust etc then do the same.
Yes, that approach should be acceptable at spec level.
So I'll adjust decode pseudocode to something simpler conceptually and leave out heavy verification: It just explains the principle:
- For Euclidean: find first non-01 digit as boundary.
- For hyperbolic: drop x0 then same.
- For elliptical: do division, then same.
Alternatively, since this spec is heavy, we might not even need to present code verifying consistency. Just show decode by base-2 extraction.
Let's do that for brevity in final:
Definition – Universal Coordinate Tuple: For each natural number $N$, consider its representation in every integer base $b \ge 2$. Write the base-$b$ expansion of $N$ as
\[ N \;=\; a_{k_b}(b)\,b^{\,k_b} + a_{k_b-1}(b)\,b^{\,k_b-1} + \cdots + a_1(b)\,b + a_0(b), \]
with digits $a_i(b) \in \{0,\ldots,b-1\}$. The universal coordinate tuple collects these digit sequences over all bases:
\[ E(N) \;=\; \Big\{ \big(a_0(b),\,a_1(b),\,a_2(b),\,\ldots,\,a_{k_b}(b)\big)_b \;:\; b = 2,3,4,\dots \Big\}. \]
This tuple
Embedding as a Multi-Vector: The UOR/Prime framework provides an algebraic fiber
\[ \hat{N} \;=\; \sum_{b=2}^{B(N)} \;\sum_{i=0}^{k_b} a_i(b)\; e_{b,i}, \]
where
\[ D(N) \;=\; \sum_{b=2}^{N} (k_b + 1), \]
where
Example: Suppose $N = 42$. Its digit expansions (digits listed least significant first) include:
- Base 2 expansion: $42_{(10)} = 101010_{(2)}$, digits $(0,1,0,1,0,1)_2$.
- Base 3: $42 = 1120_{(3)}$, digits $(0,2,1,1)_3$.
- Base 4: $42 = 222_{(4)}$, digits $(2,2,2)_4$.
- Base 5: $42 = 132_{(5)}$, digits $(2,3,1)_5$.
- Base 6: $42 = 110_{(6)}$, digits $(0,1,1)_6$.
- Base 7: $42 = 60_{(7)}$, digits $(0,6)_7$.
- Base 8: $42 = 52_{(8)}$, digits $(2,5)_8$.
- Base 9: $42 = 46_{(9)}$, digits $(6,4)_9$.
- Base 10: $42 = 42_{(10)}$, digits $(2,4)_{10}$.
- Base 11: $42 = 39_{(11)}$, digits $(9,3)_{11}$.
- ...
- Base 42: $42 = 10_{(42)}$, digits $(0,1)_{42}$.
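The expansions listed above can be reproduced with a short sketch. The helper names `digits_lsb_first` and `universal_tuple` are illustrative and not part of the specification; the sketch assumes the finite form of the tuple, with bases running from 2 to $N$ and digits stored least significant first.

```python
def digits_lsb_first(n: int, base: int) -> list[int]:
    """Return the base-`base` digits of n, least significant digit first."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits


def universal_tuple(n: int) -> dict[int, list[int]]:
    """Map each base b in 2..n to the LSB-first digits of n (assumed finite part of E(n))."""
    return {b: digits_lsb_first(n, b) for b in range(2, n + 1)}


segments = universal_tuple(42)
for base in (2, 3, 11, 42):
    print(base, segments[base])

# Total number of digits across all bases, i.e. D(42) under this assumption.
total_digits = sum(len(s) for s in segments.values())
print(total_digits)
```

Bases 7 through 42 each contribute two digits, which is why the total grows roughly like $2N$ for larger $N$.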
We would embed all these digit sequences into
Geometric Hash Function: We interpret the multi-vector
\[ H: \mathbb{N} \to M \subset \mathbb{R}^D,\qquad H(N) = \mathbf{v}_N, \]
where $\mathbf{v}_N$ is the $D$-dimensional coordinate vector representing $N$’s multi-vector $N̂$. In simple terms, $H(N)$ takes a number $N$ and returns the point $\mathbf{v}_N = (a_0(2),a_1(2),\ldots,a_{k_2}(2),\;a_0(3),\ldots,a_{k_3}(3),\;\dots,\;a_0(N),a_1(N))$ in $\mathbb{R}^D$.
Because
The target space for UHC digests is a multi-dimensional manifold
Structure: In the Euclidean version, the manifold
Coordinates: A number’s digest in Euclidean mode is simply the coordinate vector
Metric: The distance between two points
Normalization: No special normalization is needed; the coordinates are used as-is. The vector’s length
Inverse Projection: In Euclidean space, the “projection” of the multi-vector onto the manifold is the identity mapping. Therefore, inverse projection is trivial – the coordinates are read off directly as the digit sequences. To decode the number, one can isolate the segments of
Structure: For a hyperbolic geometry, we consider
\[ H^D = \{(x_0,x_1,\ldots,x_D) \in \mathbb{R}^{D+1} : x_0^2 - x_1^2 - \cdots - x_D^2 = 1,\; x_0 > 0\}. \]
This is a
Coordinates: Given the
\[ \phi_H:\;\mathbb{R}^D \to H^D,\qquad \phi_H(v_1,\ldots,v_D) = \Big(\sqrt{\,1 + \sum_{i=1}^D v_i^2\,}\;,\; v_1,\;v_2,\;\ldots,\;v_D\Big). \]
In other words, we take $x_0 = \sqrt{1+|\mathbf{v}_N|^2}$ and $x_i = v_i$ for $i=1,\ldots,D$. This yields a valid point on $H^D$ because $x_0^2 - \sum_{i=1}^D x_i^2 = 1 + |\mathbf{v}|^2 - |\mathbf{v}|^2 = 1$. Intuitively, we are embedding the Euclidean vector as the spatial part of a hyperbolic coordinate, with
Metric: The distance between two points on the hyperboloid is given by the hyperbolic distance formula. If
\[ d_H(\mathbf{x},\mathbf{y}) = \cosh^{-1}\!\big(\langle \mathbf{x},\mathbf{y}\rangle_L\big). \]
For our purposes, the exact distance formula is not as crucial as the fact that large differences in the Euclidean vector translate to additive differences in the hyperbolic space in a compressed way (due to the
Normalization: The hyperbolic embedding automatically normalizes the vector by incorporating it into a unit hyperboloid constraint. There is no arbitrary scaling;
Inverse Projection: To recover the original
\[ \phi_H^{-1}(x_0, x_1,\ldots,x_D) = (x_1,\ldots,x_D), \]
since if the point lies on
Structure: For an elliptic (positively curved) geometry, we use the model of a
\[ y_0^2 + y_1^2 + \cdots + y_D^2 = 1. \]
This is analogous to the hyperbolic case but with a positive-definite constraint. We can restrict to the “northern hemisphere” where
Coordinates: We need to project the
\[ \phi_{Ell}:\;\mathbb{R}^D \to S^D,\qquad \phi_{Ell}(v_1,\ldots,v_D) = \frac{1}{\sqrt{1+|\mathbf{v}|^2}}\;\big(\,1,\;v_1,\;v_2,\ldots,\;v_D\big). \]
In coordinates: set
\[ y_0 = \frac{1}{\sqrt{1+\sum_{i=1}^D v_i^2}}, \]
and
\[ y_i = \frac{v_i}{\sqrt{1+\sum_{i=1}^D v_i^2}} \]
for $i = 1,\ldots,D$.
An intuitive alternative view:
Metric: The intrinsic metric on
Normalization: The mapping
Inverse Projection: The mapping
\[ |\mathbf{v}|^2 = \frac{1 - y_0^2}{y_0^2}. \]
(Indeed,
\[ v_i = \frac{y_i}{y_0}. \]
This comes from
\[ \phi_{Ell}^{-1}(y_0,y_1,\ldots,y_D) = \Big(\frac{y_1}{y_0},\;\frac{y_2}{y_0},\;\ldots,\;\frac{y_D}{y_0}\Big). \]
We thereby recover the original coordinate tuple
To clarify the differences, here is a summary of how each metric space handles the UHC coordinates:
- Euclidean:
  - Dimensional structure: output point has $D$ coordinates (same as number of digits collected).
  - Vector normalization: none (raw coordinates used directly).
  - Embedding projection: identity ($\mathbf{v} \mapsto \mathbf{v}$).
  - Inverse projection: identity (read off coordinates to get $\mathbf{v}$).
  - Note: Unbounded coordinate values and distances; straightforward representation.
- Hyperbolic:
  - Dimensional structure: output point has $D+1$ coordinates with one constraint ($x_0^2 - \sum_{i=1}^D x_i^2 = 1$). Effectively $D$ degrees of freedom.
  - Vector normalization: one extra coordinate ($x_0$) ensures points lie on the hyperboloid. The digit coordinates are stored unscaled, and $x_0$ grows with the vector length.
  - Embedding projection: $\mathbf{v} \mapsto (\sqrt{1+|\mathbf{v}|^2},\, \mathbf{v})$.
  - Inverse projection: drop the first coordinate (recover $\mathbf{v}$ from the rest).
  - Note: Unbounded $\mathbf{v}$ yields large $x_0$; the hyperbolic distance from the base point $(1,0,\ldots,0)$ grows sub-linearly (logarithmically) with $|\mathbf{v}|$.
- Elliptical (Spherical):
  - Dimensional structure: output point has $D+1$ coordinates with constraint ($y_0^2+\cdots+y_D^2=1$). $D$ effective degrees of freedom.
  - Vector normalization: all coordinates are divided by $\sqrt{1+|\mathbf{v}|^2}$, ensuring the point lies on the unit sphere.
  - Embedding projection: $\mathbf{v} \mapsto \frac{1}{\sqrt{1+|\mathbf{v}|^2}}\,(1,\, \mathbf{v})$.
  - Inverse projection: $y_i/y_0$ for each $i\ge1$ yields $v_i$ (and $y_0$ gives the norm).
  - Note: Unbounded $\mathbf{v}$ maps toward the equator of the sphere ($y_0\to0$, while $(y_1,\ldots,y_D)$ approaches the unit vector $\mathbf{v}/|\mathbf{v}|$); distances on the sphere are bounded (compression of differences).
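The growth behaviour noted in the summary can be checked numerically. The following is a minimal sketch, not a normative implementation: the function names are illustrative, the base point $(1,0,\ldots,0)$ is an assumption made for the comparison, and floating-point arithmetic stands in for exact values.

```python
import math


def to_hyperboloid(v):
    """Embed v as (sqrt(1 + |v|^2), v) on the unit hyperboloid."""
    x0 = math.sqrt(1.0 + sum(c * c for c in v))
    return [x0, *v]


def to_sphere(v):
    """Embed v as (1, v) / sqrt(1 + |v|^2) on the unit sphere."""
    scale = math.sqrt(1.0 + sum(c * c for c in v))
    return [1.0 / scale, *(c / scale for c in v)]


def hyperbolic_distance(x, y):
    """Distance on the hyperboloid using the Lorentzian product with signature (+, -, ..., -)."""
    inner = x[0] * y[0] - sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(max(inner, 1.0))  # clamp guards against rounding below 1


origin = to_hyperboloid([0.0, 0.0])
for r in (1.0, 10.0, 100.0, 1000.0):
    v = [r, 0.0]
    d = hyperbolic_distance(origin, to_hyperboloid(v))
    y = to_sphere(v)
    # d grows like log(2r), y[0] shrinks toward 0, y[1] approaches 1.
    print(r, round(d, 4), round(y[0], 6), round(y[1], 6))
```

From the base point the hyperbolic distance equals $\sinh^{-1}(|\mathbf{v}|)$, which is the logarithmic growth mentioned in the hyperbolic note, and the spherical image drifts toward $y_0 = 0$ rather than toward a pole.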
Importantly, the UHC digest format accommodates all three metrics by indicating which metric is used and storing the coordinates accordingly. The mathematical content (the digits of
The UHC geometric hash function is bijective between natural numbers and their geometric digests. Every number
- Uniqueness: No two distinct numbers produce the same multi-vector $N̂$. If $M \neq N$, then there is at least one base $b$ where their digit expansions differ, so their coordinate tuples differ in at least one component, and thus $H(M) \neq H(N)$. The Prime Framework’s coherence principle guarantees a unique minimal representation for each number, so there is no ambiguity or collision in the hash. This uniqueness is analogous to a cryptographic hash with no collisions, except here it holds by construction (based on mathematical identity) rather than by assumption.
- Invertibility: Given a digest, one can decode the original number by extracting any one of the base expansions contained in it. Since the digest holds all base expansions, one simple decoding strategy is:
  - Identify the portion of the coordinate corresponding to base 2. The initial run of 0/1 entries before the first value ≥ 2 bounds this segment from above but need not equal it, because the base-3 segment can itself begin with 0 or 1 (for $N=7$ the list begins `[1, 1, 1, 1, 2, ...]`, where the fourth 1 is the least significant base-3 digit). Candidate boundaries are the positions in that run holding a 1, since the most significant binary digit is 1.
  - Interpret that sequence of bits as a binary representation of the number.
  - Recover $N$ by evaluating the binary digits (i.e. $\sum_{i} a_i(2)\,2^i$), and confirm the candidate by checking that the remaining segments match $N$ in bases $3,\ldots,N$.

  Because the digest is internally consistent, this base-2 reconstruction will yield the same $N$ that would be obtained from any other base’s digits in the digest. Formally, if $\mathbf{v}_N$ is the coordinate tuple and $(a_0(2),\dots,a_{k_2}(2))$ are the base-2 digits (with $a_{k_2}(2)=1$ as the most significant digit for $N>0$), then $N = \sum_{i=0}^{k_2} a_i(2)\,2^i$. This computation exactly inverts the encoding process for base 2. We could equally well do it with any other base segment present in the digest. (In an implementation, base 2 is convenient because it is guaranteed to exist and is the longest digit sequence, but any base present will work.)
- Referential Invariance: In the UOR framework, referential invariance means that the object’s representation does not depend on an external reference frame. Here, we choose a fixed reference frame (the coordinate basis $e_{b,i}$ in $C_x$) to produce the digest. If the reference point $x$ or the orientation of the fiber algebra were changed (via an isometry in the symmetry group $G$ of the manifold), the multi-vector $N̂$ might undergo a corresponding transformation (e.g. basis elements permuted or rotated). However, such transformations are consistently applied to all components and thus do not change the intrinsic information: the set of digit values in each base remains the same, only assigned to different basis vectors. In practical terms, the UHC digest format fixes the coordinate ordering (e.g. ascending bases and increasing digit positions) as part of the specification, which serves as a canonical reference frame. This ensures that for a given number, the digest is unique and does not depend on arbitrary choices. The referential invariance property indicates that the identity encoded by the digest is an intrinsic property of the number, not tied to how or where the number is stored or referenced in a system.
- Base Independence: By design, the UHC digest is base-independent. It simultaneously includes all bases, so it is not biased toward or reliant on any particular numeral system. This fulfills the UOR requirement that representations be independent of arbitrary conventions like choice of base. In effect, the digest can be seen as a universal identifier for the number that would remain the same whether you think of the number in binary, decimal, or any other base.
- Intrinsic Identity: The digest encodes the number’s intrinsic mathematical identity (its actual value) in a structural way. There is no auxiliary data like type, context, or pointer; it is purely determined by the number itself. In UOR terms, this corresponds to the property that the object’s identity is contained entirely within the object’s representation (here, the multi-vector encapsulating all self-consistent representations of the number). Two digests can be compared to see if they represent the same number simply by checking if they are exactly identical component-wise (assuming a canonical ordering). This is analogous to how two identical fingerprints indicate the same person: here the “fingerprint” of the number is its multi-base digit structure.
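The uniqueness and internal-consistency properties listed above can be spot-checked over a small range. This is a hedged sketch with illustrative helper names; it assumes the flattened layout used in this specification (bases 2 through $N$, digits least significant first) and restricts itself to $N \ge 2$.

```python
def digits_lsb_first(n, base):
    """Base-`base` digits of n, least significant first (n >= 0, base >= 2)."""
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            return digits


def evaluate(digits, base):
    """Evaluate an LSB-first digit list as a number in the given base."""
    return sum(d * base ** i for i, d in enumerate(digits))


def flatten(n):
    """Concatenate the digit segments for bases 2..n into one coordinate list."""
    return [d for b in range(2, n + 1) for d in digits_lsb_first(n, b)]


def segments_agree(n):
    """Every base segment, evaluated in its own base, must give back n."""
    return all(evaluate(digits_lsb_first(n, b), b) == n for b in range(2, n + 1))


numbers = range(2, 200)
all_agree = all(segments_agree(n) for n in numbers)
distinct = len({tuple(flatten(n)) for n in numbers}) == len(numbers)
print(all_agree, distinct)
```

A finite check like this is only evidence, not a proof. The general argument is that each base in $2,\ldots,N$ contributes at least two digits and existing segments never shrink as $N$ grows, so the digest length is strictly increasing for $N \ge 2$.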
In summary, UHC provides a lossless hashing codec:
(Note: In a theoretical extension, $H$ could be defined for all integers or rational numbers by incorporating sign bits or separate embeddings for numerators and denominators, but for clarity we restrict to non-negative integers here. Also, extremely large $N$ will yield very high-dimensional digests; practical implementations might limit the included bases or apply compression, but those optimizations are outside the scope of this pure specification.)
We now formalize the format of the UHC Geometric Digest, which is the serialized representation of the point
{
"version": 1,
"metric": "Euclidean | Hyperbolic | Elliptical",
"dimension": D,
"coordinates": [ c1, c2, c3, ..., cD ]
}
- `version`: An integer indicating the format version. (For this specification, `1` is used. This allows future enhancements or changes to the format while maintaining backward compatibility.)
- `metric`: A string (or code) specifying which metric interpretation is used for this digest. It can be `"Euclidean"`, `"Hyperbolic"`, or `"Elliptical"` (other equivalent terms like `"E"`/`"H"`/`"Spherical"` could be used in practice). This tells the decoder how to interpret the coordinates. For example, `"Euclidean"` means interpret the coordinates as a direct vector $\mathbf{v}_N$; `"Hyperbolic"` means the first coordinate is $x_0$ and the rest form $\mathbf{v}_N$; `"Elliptical"` means coordinates are on a sphere with the first being $y_0$.
- `dimension`: An integer $D$ giving the number of coordinate components in the `coordinates` array. This makes the length explicit for error-checking and parsing. (In Euclidean mode, $D = D(N)$ as calculated above. In Hyperbolic and Elliptical modes, $D = D(N)+1$ since an extra coordinate is stored.)
- `coordinates`: An array of numeric values encoding the point’s coordinates. The nature of these values depends on the metric:
  - In Euclidean mode, this array is simply the flattened sequence of all digits of $N$ in bases 2 through $N$. The ordering is by increasing base: first the base-2 digits (from $a_0(2)$ up to $a_{k_2}(2)$), then base-3 digits, and so on. Each element of the array is an integer in the range $[0, b-1]$ appropriate to its base; however, the base boundaries are implicit (not explicitly marked in the array). The `dimension` field indicates where the array ends. The length of the base-$b$ segment is $\lfloor \log_b N \rfloor + 1$, which could be computed if $N$ were known; since $N$ is unknown at decode time, a decoder has to parse differently (see Decoding). Typically, the simplest decode uses the binary segment: since base-2 digits can only be 0 or 1, the first occurrence of a digit ≥ 2 bounds the base-2 segment from above, and the exact boundary is confirmed by re-encoding the candidate $N$ (the base-3 segment may itself begin with 0 or 1).
  - In Hyperbolic mode, the first element of the array is $x_0 = \sqrt{\,1+|\mathbf{v}_N|^2}$ (which may be given as a floating-point number or a surd), and the remaining elements are the components of $\mathbf{v}_N$ (the digit sequence) exactly as in Euclidean mode. Thus the total count is $D = 1 + D(N)$. The digit sequence portion starts at index 1 of the array. Since $|\mathbf{v}_N|^2 = \sum_{b=2}^N \sum_{i} [a_i(b)]^2$ is an integer, $x_0 = \sqrt{\text{integer}+1}$ is either an integer or an irrational algebraic number (when $1+|\mathbf{v}_N|^2$ is not a perfect square). In practice it can be represented to sufficient precision or as a surd. The key point is that $x_0$ is included explicitly so that decoding can drop it.
  - In Elliptical mode, the array contains $y_0$ followed by the scaled coordinates $y_i$ for $i=1,\ldots,D(N)$, where $(y_0, y_1,\ldots, y_{D(N)}) = \phi_{Ell}(\mathbf{v}_N)$ on $S^{D(N)}$. These will generally be real numbers. As discussed, $y_0 = 1/\sqrt{1+|\mathbf{v}_N|^2}$ and $y_i = v_i/\sqrt{1+|\mathbf{v}_N|^2}$. Since $|\mathbf{v}_N|^2$ is an integer, these coordinates are algebraic numbers. In JSON, they might be given as decimals or strings (to preserve precision). The length is $D = 1+D(N)$ here as well.
The fixed-size envelope consists of the fields `version`, `metric`, and `dimension`, which are always present and of constant size (regardless of $N$). The `coordinates` field is the dynamic part whose length grows with $N$. This separation makes it clear where metadata ends and data begins.
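As an illustration of the envelope and data fields described above, the following sketch serializes a Euclidean-mode digest with Python's standard `json` module. The function name `euclidean_digest` is hypothetical, and the sketch assumes $N \ge 2$ with bases running from 2 to $N$.

```python
import json


def digits_lsb_first(n, base):
    """Base-`base` digits of n, least significant first."""
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            return digits


def euclidean_digest(n):
    """Build the digest object for n in Euclidean mode (assumes n >= 2)."""
    if n < 2:
        raise ValueError("this sketch only handles n >= 2")
    coordinates = [d for b in range(2, n + 1) for d in digits_lsb_first(n, b)]
    return {
        "version": 1,                     # fixed-size envelope
        "metric": "Euclidean",            # fixed-size envelope
        "dimension": len(coordinates),    # fixed-size envelope
        "coordinates": coordinates,       # dynamic part, grows with n
    }


digest = euclidean_digest(6)
serialized = json.dumps(digest)
print(serialized)
```

Deriving `dimension` from the coordinate list, rather than passing it in separately, keeps the two fields from disagreeing.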
Example Digest (Euclidean): Using a small number to illustrate, let $N = 6$. The segment lengths and digits for bases 2 through 6 are:
- Base 2: $\lfloor \log_2 6 \rfloor+1 = 3$ digits (binary "110", digits [0,1,1] in LSB-first order).
- Base 3: $\lfloor \log_3 6 \rfloor+1 = 2$ digits (ternary "20", digits [0,2]).
- Base 4: $\lfloor \log_4 6 \rfloor+1 = 2$ digits (base-4 "12", digits [2,1]).
- Base 5: $\lfloor \log_5 6 \rfloor+1 = 2$ digits (base-5 "11", digits [1,1]).
- Base 6: $\lfloor \log_6 6 \rfloor+1 = 2$ digits (base-6 "10", digits [0,1]).

So $D(6) = 3+2+2+2+2 = 11$ and the coordinate vector is `[0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]`. Grouping for clarity: base2 (0,1,1), base3 (0,2), base4 (2,1), base5 (1,1), base6 (0,1). A possible JSON digest:
{
"version": 1,
"metric": "Euclidean",
"dimension": 11,
"coordinates": [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]
}
This lists all digits from base 2 up to 6. A decoder reading this would know it’s Euclidean (so coordinates are raw digits). They would scan from the start: binary digits can only be 0 or 1, so the `2` at index 4 must belong to a later base, and the base-2 segment lies within the leading `0,1,1,0`. Because the most significant binary digit must be 1, the segment cannot end on the `0` at index 3, which is instead the least significant digit of the base-3 segment `0,2`. Thus the candidate base-2 segment is `[0,1,1]` (which in little-endian represents $0\cdot2^0 + 1\cdot2^1 + 1\cdot2^2 = 6$). Having obtained $N=6$, the decoder verifies that the subsequent segments match $6$ in bases 3, 4, 5, 6 respectively (`[0,2]` is 6 in base 3, `[2,1]` in base 4, `[1,1]` in base 5, `[0,1]` in base 6). This check rules out the shorter candidate `[0,1]` (which would give $N=2$ and a two-element digest) and also detects corruption.
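The scan-and-verify decoding described for this example can be sketched as follows. This is one possible approach under stated assumptions, not the specification's normative decoder: `encode` and `decode` are illustrative names, only Euclidean-mode digit lists for $N \ge 2$ are handled, and each candidate is confirmed by re-encoding it in full.

```python
def digits_lsb_first(n, base):
    """Base-`base` digits of n, least significant first."""
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            return digits


def encode(n):
    """Flattened digits of n in bases 2..n (Euclidean coordinates)."""
    return [d for b in range(2, n + 1) for d in digits_lsb_first(n, b)]


def decode(coordinate):
    """Recover N >= 2 by trying each admissible base-2 prefix and re-encoding."""
    for k in range(1, len(coordinate) + 1):
        digit = coordinate[k - 1]
        if digit not in (0, 1):
            break  # a digit >= 2 cannot belong to the base-2 segment
        if digit != 1:
            continue  # the most significant binary digit must be 1
        candidate = sum(bit << i for i, bit in enumerate(coordinate[:k]))
        # Every base in 2..candidate contributes at least two digits.
        if 2 * (candidate - 1) > len(coordinate):
            break
        if candidate >= 2 and encode(candidate) == coordinate:
            return candidate
    raise ValueError("Digest is inconsistent or corrupted")


print(decode([0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]))
print(decode(encode(7)))
```

The length guard keeps a corrupted digest from triggering the re-encoding of a very large candidate. Digests for $N = 0$ and $N = 1$ are deliberately left out of this sketch.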
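Example Digest (Hyperbolic): For the same $N = 6$, the digit vector has $|\mathbf{v}_6|^2 = 14$, so $x_0 = \sqrt{15} \approx 3.87298$ and the coordinates array is `[3.87298, 0,1,1, 0,2, 2,1, 1,1, 0,1]` (with `"metric": "Hyperbolic", "dimension": 12`). A decoder would take the array, drop the first element (optionally checking that $x_0^2 - \sum_{i \ge 1} x_i^2 = 1$), and decode the remaining digits as in the Euclidean case.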
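Example Digest (Elliptical): For $N = 6$, dividing $(1, \mathbf{v}_6)$ by $\sqrt{15}$ gives the coordinates array `[0.2582, 0.0, 0.2582, 0.2582, 0.0, 0.5164, 0.5164, 0.2582, 0.2582, 0.2582, 0.0, 0.2582]`. (Check: the squares sum to $(1+14)/15 = 1$.) The digest carries `"metric": "Elliptical", "dimension": 12`. A decoder would take $y_i/y_0$ for each $i \ge 1$ (obtaining `[0,1,1,0,2,2,1,1,1,0,1]`; slight floating error aside, these are the digits), then proceed to reconstruct $N$ as in the Euclidean case.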
The UHC digest format thus explicitly contains everything needed to reconstruct the number: the metric type, the full coordinate set, and the knowledge of how those coordinates relate to the number’s digits. It is verbose (especially listing all base expansions), but it is unambiguous and canonical. In many cases, the digest will be large; this is the cost of universality and losslessness. One could compress the coordinates or omit some redundant parts (since the number could be reconstructed from just one base’s data), but that would break the symmetry and base-independence, so the full format is kept for theoretical completeness. (Implementations may optimize storage, but the specification describes the full information content.)
We now provide a more rigorous mathematical description of how the universal coordinates are projected into the geometric space, tying together the pieces described above.
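Universal Coordinate Space: Define
\[ U(N) = \mathbb{R}^{D(N)}, \]
where $D(N)$ is the total number of digits of $N$ taken over all bases from 2 up to $N$. The canonical coordinate tuple of $N$ is the vector
\[ \mathbf{v}_N = (x_{2,0}, x_{2,1},\ldots,x_{2,k_2};\; x_{3,0},\ldots,x_{3,k_3};\; \ldots;\; x_{N,0}, x_{N,1}), \]
where we interpret $x_{b,i}$ as the $i$-th digit (least significant first) of $N$ in base $b$, so that
\[ N = \sum_{i=0}^{k_b} x_{b,i}\, b^i, \]
and this identity holds for every base $b$ appearing in the tuple.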
Now define formal projection maps for each metric:
- Euclidean Projection ($P_E$): This is the identity on coordinates: \[ P_E: U(N) \to \mathbb{R}^{D(N)}, \quad P_E(\mathbf{v}) = \mathbf{v}. \] There is no change of dimension or normalization. $P_E(\mathbf{v}_N)$ is just $\mathbf{v}_N$ itself, now regarded as a point in the Euclidean manifold $M = \mathbb{R}^{D(N)}$. The inverse $P_E^{-1}$ is trivial.
- Hyperbolic Projection ($P_H$): This map goes from $U(N) = \mathbb{R}^{D(N)}$ to the hyperbolic manifold $H^{D(N)} \subset \mathbb{R}^{D(N)+1}$: \[ P_H(\mathbf{v}) = \big(\sqrt{1+\|\mathbf{v}\|^2},\; v_1,\ldots,v_D\big). \] Here $\|\mathbf{v}\|^2 = \sum_{j=1}^{D} v_j^2$ is the squared Euclidean norm on $U(N)$. $P_H(\mathbf{v})$ yields a $(D+1)$-dimensional vector satisfying $x_0^2 - \sum_{i=1}^D x_i^2 = 1$ as required. The inverse is: \[ P_H^{-1}(x_0, x_1,\ldots,x_D) = (x_1,\ldots,x_D), \] given that any input to $P_H^{-1}$ should satisfy the hyperboloid condition (so that $x_0$ is determined by $x_1,\ldots,x_D$). Applying the projection to the coordinate tuple, $H(N) = P_H(\mathbf{v}_N)$ is the hyperbolic embedding of the number.
- Elliptical Projection ($P_{Ell}$): This map goes from $U(N) = \mathbb{R}^{D(N)}$ to the sphere $S^{D(N)} \subset \mathbb{R}^{D(N)+1}$: \[ P_{Ell}(\mathbf{v}) = \frac{1}{\sqrt{1 + \|\mathbf{v}\|^2}}\,\big(1,\,v_1,\,v_2,\,\ldots,\,v_D\big). \] We denote the output as $(y_0, y_1,\ldots,y_D)$ with $D = D(N)$. By construction, $y_0 = 1/\sqrt{1+\|\mathbf{v}\|^2}$ and $y_i = v_i/\sqrt{1+\|\mathbf{v}\|^2}$ for $i\ge1$. This lies on $S^D$ since $y_0^2 + \cdots + y_D^2 = \frac{1 + \|\mathbf{v}\|^2}{1+\|\mathbf{v}\|^2} = 1$. The inverse mapping $P_{Ell}^{-1}$ is defined on the part of $S^{D}$ with $y_0 \neq 0$ as: \[ P_{Ell}^{-1}(y_0,y_1,\ldots,y_D) = \Big(\frac{y_1}{y_0},\;\frac{y_2}{y_0},\;\ldots,\;\frac{y_D}{y_0}\Big). \] This recovers $\mathbf{v}$ because $y_i/y_0 = v_i$. (The image of $P_{Ell}$ always has $y_0>0$, i.e. it lies in the open upper hemisphere, because $\|\mathbf{v}\|$ is finite for any finite $N$.)
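The hyperbolic and elliptical projections and their inverses are short enough to sketch directly (the Euclidean one is the identity). The function names below are assumptions introduced for this illustration, and plain floating-point arithmetic is used, so the round trip relies on rounding back to the nearest integer.

```python
import math


def project_hyperbolic(v):
    """P_H: prepend x0 = sqrt(1 + |v|^2) so the point lies on the hyperboloid."""
    x0 = math.sqrt(1 + sum(x * x for x in v))
    return [x0] + list(v)


def unproject_hyperbolic(point):
    """Inverse of P_H: drop x0, which is determined by the other coordinates."""
    return [round(x) for x in point[1:]]


def project_elliptical(v):
    """P_Ell: scale (1, v_1, ..., v_D) to unit length so the point lies on the sphere."""
    norm = math.sqrt(1 + sum(x * x for x in v))
    return [1 / norm] + [x / norm for x in v]


def unproject_elliptical(point):
    """Inverse of P_Ell: divide by y0 (which must be positive) and round to integers."""
    y0 = point[0]
    if y0 <= 0:
        raise ValueError("y0 must be positive")
    return [round(y / y0) for y in point[1:]]


v = [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]  # coordinate tuple of N = 6
h = project_hyperbolic(v)
e = project_elliptical(v)

# Constraint checks: x0^2 - sum(x_i^2) = 1 and sum(y_i^2) = 1, up to rounding error.
print(h[0] ** 2 - sum(x * x for x in h[1:]))
print(sum(y * y for y in e))
```

Rounding in the inverse maps is a choice made for this floating-point sketch; with exact arithmetic the quotients $y_i/y_0$ are already integers.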
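One can verify these mappings are one-to-one: in each case the stated inverse satisfies $P^{-1}(P(\mathbf{v})) = \mathbf{v}$ for every $\mathbf{v} \in U(N)$, so distinct coordinate tuples map to distinct points.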
Combined Embedding: The full UHC geometric embedding for a given metric is the composition:
\[ \Psi_{\text{metric}}(N) = P_{\text{metric}}(\mathbf{v}_N), \]
where $\mathbf{v}_N$ is the canonical coordinate tuple in $U(N)$. $\Psi_{\text{metric}}(N)$ yields a point in the corresponding manifold $M$: $\mathbb{R}^{D(N)}$, $H^{D(N)}$, or $S^{D(N)}$.
The mathematical correctness of this scheme is underpinned by the uniqueness of $\mathbf{v}_N$. Existence and uniqueness of the intrinsic embedding $N \mapsto \hat{N}$ (hence $\mathbf{v}_N$) in the Prime Framework have been proven elsewhere. Thus $\Psi_{\text{metric}}$ is a well-defined injection. Because $P_{\text{metric}}$ and its inverse are explicit, the decoding procedure can recover $\mathbf{v}_N$, and from it $N$, exactly.
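As a worked instance of the composition, take $N = 6$, whose coordinate tuple appears in the example digests. The values below follow directly from the projection formulas; nothing beyond them is assumed.

```latex
\begin{aligned}
\mathbf{v}_6 &= (0,1,1;\; 0,2;\; 2,1;\; 1,1;\; 0,1), \qquad D(6) = 11,\\
\|\mathbf{v}_6\|^2 &= 0+1+1+0+4+4+1+1+1+0+1 = 14,\\
\Psi_{E}(6) &= \mathbf{v}_6,\\
\Psi_{H}(6) &= \big(\sqrt{15},\; 0,1,1,0,2,2,1,1,1,0,1\big), \qquad (\sqrt{15})^2 - 14 = 1,\\
\Psi_{Ell}(6) &= \tfrac{1}{\sqrt{15}}\big(1,\; 0,1,1,0,2,2,1,1,1,0,1\big), \qquad \tfrac{1 + 14}{15} = 1.
\end{aligned}
```

The two right-hand checks are the hyperboloid and sphere conditions, and $\sqrt{15} \approx 3.87298$ and $1/\sqrt{15} \approx 0.2582$ are the leading entries of the example digests.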
The following pseudocode outlines the procedure to encode a given non-negative integer $N$ as a UHC Geometric Digest under a chosen metric.
import math

def encodeUHC(N: int, metricType: str) -> dict:
    # 1. Generate digit expansions for all bases from 2 up to N (inclusive).
    digits_by_base = {}
    max_base = max(2, N)  # ensure at least base 2
    for b in range(2, max_base + 1):
        value = N
        base_digits = []
        # Compute N in base b by repeated division:
        while value >= b:
            remainder = value % b
            base_digits.append(remainder)
            value = value // b
        base_digits.append(value)  # last value < b
        # Now base_digits is the list of digits in least-significant to most-significant order.
        digits_by_base[b] = base_digits
    # 2. Flatten all digits into one coordinate list in ascending base order.
    coordinate = []
    for b in range(2, max_base + 1):
        # Append the entire digit list for base b; lengths may differ per base.
        coordinate.extend(digits_by_base[b])
    # 3. Apply metric-specific projection to obtain final coordinates.
    coords = []  # will hold the output coordinate array
    if metricType == "Euclidean":
        coords = list(coordinate)  # direct copy
    elif metricType == "Hyperbolic":
        # Compute x0 = sqrt(1 + sum_{j}(coordinate[j]^2))
        sum_squares = 0
        for x in coordinate:
            sum_squares += x * x
        x0 = math.sqrt(1 + sum_squares)
        coords.append(x0)
        coords.extend(coordinate)
    elif metricType == "Elliptical":
        # Compute norm_factor = sqrt(1 + sum_{j}(coordinate[j]^2))
        sum_squares = 0
        for x in coordinate:
            sum_squares += x * x
        norm_factor = math.sqrt(1 + sum_squares)
        y0 = 1 / norm_factor
        coords.append(y0)
        for x in coordinate:
            coords.append(x / norm_factor)
    else:
        raise ValueError("Unknown metric type: " + metricType)
    # 4. Build the digest dictionary (to be serialized as JSON).
    digest = {
        "version": 1,
        "metric": metricType,
        "dimension": len(coords),
        "coordinates": coords
    }
    return digest
Notes on encoding:
- The loop from base 2 to $N$ is conceptual; for very large $N$ it may be impractical to literally enumerate all bases. In a practical encoder, one might stop at some cutoff or compress the pattern. However, per specification, we include up to base $N$ to truly capture all representations.
- We handle small $N$ gracefully: for $N=0$ or $N=1$, we set `max_base = 2` so that at least base 2 is processed. The base-2 representation of 0 will be [0], of 1 will be [1]. (In theory, bases ≥ 3 for $N=1$ would all give [1] as well, but including them adds no new information, and the $B(N)=N$ rule excludes them since $N=1$ gives `max_base = 2`.)
- The digits are collected LSB-first for each base, which aligns with how we defined $\mathbf{v}_N$. For example, for $N=42$ the base-5 list would come out as [2,3,1], corresponding to $132_{(5)}$.
- The metric adjustments use `sqrt`. For hyperbolic, $x_0 = \sqrt{1+\sum x^2}$ will be an exact surd if all $x$ are integers. For elliptical, dividing by `norm_factor` yields rational multiples of surds. The pseudocode treats them as floats or symbolic values; a real implementation might output them as decimal strings or simplified fractions as needed.
- The output is structured as a JSON-like dictionary. The `coordinates` array may contain integers and/or non-integers (floats or rationals) depending on metric.
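One way to avoid floating-point output, sketched here under assumptions of our own, is to store the integer radicand $r = 1 + \sum x^2$ instead of a decimal approximation: the hyperbolic $x_0$ is then exactly $\sqrt{r}$, and each elliptical coordinate is exactly $x/\sqrt{r}$. The function `exact_norm_data` and its return shape are hypothetical and are not part of the digest format defined in this specification.

```python
import math
from fractions import Fraction


def exact_norm_data(coordinate):
    """Return (radicand, is_perfect_square, integer_root_or_None) for r = 1 + sum(x^2)."""
    radicand = 1 + sum(x * x for x in coordinate)
    root = math.isqrt(radicand)  # floor of the square root, exact for integers
    if root * root == radicand:
        return radicand, True, root
    return radicand, False, None


coordinate = [0, 1, 1, 0, 2, 2, 1, 1, 1, 0, 1]  # digits of N = 6
radicand, is_square, root = exact_norm_data(coordinate)
print(radicand, is_square, root)

# The squared elliptical coordinates are the exact rationals 1/r and x^2/r.
squares = [Fraction(1, radicand)] + [Fraction(x * x, radicand) for x in coordinate]
print(sum(squares))
```

Keeping the radicand as an integer lets the sphere condition be checked with rational arithmetic, with no tolerance needed.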
Given a UHC Geometric Digest, the decoding algorithm reconstructs the original number $N$.
def decodeUHC(digest: dict) -> int:
    metricType = digest["metric"]
    D = digest["dimension"]
    coords = digest["coordinates"]  # list of length D
    if len(coords) != D:
        raise ValueError("Dimension does not match the number of coordinates")
    # 1. Convert coordinates to the raw universal coordinate tuple (Euclidean form).
    if metricType == "Euclidean":
        coordinate = [int(x) for x in coords]
        # (All should be integers already; ensure type.)
    elif metricType == "Hyperbolic":
        # Expect coords[0] = x0, coords[1:] = integer coordinates
        x0 = coords[0]  # might not be needed explicitly
        coordinate = [int(round(x)) for x in coords[1:]]
        # (Optionally, verify that x0^2 ≈ 1 + sum(coordinate[j]^2) for consistency.)
    elif metricType == "Elliptical":
        # Expect coords[0] = y0, coords[1:] = y_i coordinates.
        y0 = coords[0]
        coordinate = []
        for y_i in coords[1:]:
            # We divide each y_i by y0 to retrieve v_i.
            # (round/int corrects floating precision issues; in exact math y_i/y0 is an integer.)
            coordinate.append(int(round(y_i / y0)))
        # (Optionally, verify y0^2 + sum(y_i^2) ≈ 1 for consistency.)
    else:
        raise ValueError("Unknown metric type: " + metricType)
    # 2. Now 'coordinate' is the flat list of digits in base order.
    # A digit >= 2 cannot be binary, so the base-2 segment ends before the first such digit.
    # The scan can overshoot, because the base-3 segment may begin with 0s or 1s
    # (N = 7 gives [1,1,1, 1,2, ...]), so 'limit' is only an upper bound on the segment length.
    limit = 0
    while limit < len(coordinate) and coordinate[limit] < 2:
        limit += 1
    # 3. Try each candidate binary prefix and accept the one whose full re-encoding matches.
    for length in range(1, limit + 1):
        N = 0
        for i, bit in enumerate(coordinate[:length]):
            N += bit * (2 ** i)
        # N >= 2 needs at least 2*(N-1) coordinates; longer prefixes only give larger N.
        if N >= 2 and 2 * (N - 1) > len(coordinate):
            break
        # Regenerate N's digits (LSB first) for every base from 2 to max(2, N).
        expected = []
        for b in range(2, max(2, N) + 1):
            temp = N
            while temp >= b:
                expected.append(temp % b)
                temp = temp // b
            expected.append(temp)
        if expected == coordinate:
            return N
    raise ValueError("Digest inconsistency detected: no binary prefix reproduces the coordinates")
Notes on decoding:
- Converting from hyperbolic or elliptical coordinates back to integers may involve dealing with floating-point imprecision. In the pseudocode above, we use `round()` as a simple way to get the nearest integer. In exact arithmetic, $y_i/y_0$ should be an integer, and the hyperbolic $x_0$ is not needed to extract the others.
- To identify the binary segment boundary, we rely on the fact that a digit value ≥ 2 can never belong to the binary segment, so the first such digit bounds the segment from above. Because no unnecessary leading zeros are encoded, the binary segment of any $N \ge 1$ also ends in a 1. These two facts narrow the boundary but do not always fix it: the base-3 segment may itself begin with 0 or 1 (for $N=7$ the coordinates start [1,1,1,1,2], of which only the first three digits are binary), so the boundary has to be confirmed by re-encoding the candidate.
- Once a candidate $N$ is obtained from the binary digits, recomputing each base expansion of $N$ and matching it against the coordinate list confirms both the segment boundary and the integrity of the digest. This check is especially valuable if floating arithmetic was involved in elliptical decoding.
- The decoding uses base 2 for simplicity, but other segments can support it. For instance, for $N \ge 2$ the last two digits of the coordinate list should be `[0,1]` (the base-$N$ representation of $N$), which gives a quick check on the end of the list. Using binary (the first segment) tends to be straightforward because it is always present and is never shorter than any other segment.
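The boundary question can be made concrete with a small sketch. It assumes the flattening used by the encoder (bases 2 through $\max(2, N)$, least significant digit first); the function names `universal_coordinates` and `decode_by_prefix` are hypothetical. Each all-binary prefix is treated as a candidate and accepted only if re-encoding it reproduces the whole coordinate list.

```python
def universal_coordinates(n):
    """Flat digit list of n over bases 2..max(2, n), least significant digit first."""
    flat = []
    for base in range(2, max(2, n) + 1):
        value = n
        while value >= base:
            flat.append(value % base)
            value //= base
        flat.append(value)
    return flat


def decode_by_prefix(coordinate):
    """Recover n by testing each all-binary prefix against a full re-encoding."""
    for length in range(1, len(coordinate) + 1):
        if coordinate[length - 1] >= 2:
            break  # a digit >= 2 cannot be part of the binary segment
        n = sum(bit << i for i, bit in enumerate(coordinate[:length]))
        if n >= 2 and 2 * (n - 1) > len(coordinate):
            break  # n needs at least 2(n-1) coordinates; longer prefixes only grow n
        if universal_coordinates(n) == coordinate:
            return n
    raise ValueError("no binary prefix reproduces the coordinate list")


seven = universal_coordinates(7)
print(seven[:5])  # the first digit >= 2 appears only at index 4
print(decode_by_prefix(seven))
```

Stopping at the first digit ≥ 2 alone would read four binary digits for $N = 7$ and return 15; the re-encoding comparison is what selects the three-digit prefix.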
This algorithm will recover the exact integer $N$ that was originally encoded.
The Universal Hash Codec (UHC) Geometric Specification described above fulfills all the requirements of a pure UOR-based implementation:
- It defines a multi-vector hash that embeds a number’s entire numeric identity (digits in all bases) as a single geometric point.
- It leverages Euclidean, hyperbolic, or elliptical geometry to host these points, detailing how each metric shapes the representation.
- The mapping is provably lossless and invertible, with a unique digest for each number.
- The UHC Geometric Digest format is explicitly given, separating fixed metadata from variable coordinate data.
- Mathematical projection formulas and inverse mappings are provided for clarity and rigor in each metric space.
- Encoding and decoding procedures are spelled out in pseudocode, ensuring that implementors have a step-by-step recipe to follow.
- The treatment of dimensional structure, normalization, and inverse projection in each metric space is clearly delineated, so one understands how to go from abstract coordinates to geometric points and back.
- Throughout, the approach preserves UOR principles: the representation is independent of any particular numeric base or external reference, and it encapsulates the number’s intrinsic identity fully within the digest.
By following this specification, one can implement a Universal Hash Codec that serves as a universal referentially invariant identifier for numbers, suitable for applications that require a canonical representation of numeric values across different systems or contexts. The geometric nature of the digest might also enable novel uses (such as visualizing arithmetic or employing geometric transformations as computations), illustrating the rich interplay between algebra, number theory, and geometry in the UOR Prime Framework.