PRIBOR: CHE — Contextual Hyper-Embedding (uint8)
A more economical alternative to classical attention
CHE (Contextual Hyper-Embedding, uint8) offers a radical gain in efficiency compared with the standard attention mechanisms of large language models. Similar methods have appeared in recent research, but none reach CHE’s level of economy.
1 · Memory efficiency
- Standard attention: float16 / float32 matrices → 700 to 4000 bits per token
- CHE (uint8): 8 bits per token
➡ A reduction of roughly ×90 to ×500 in memory footprint (worked out in the sketch below).
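For concreteness, here is a small arithmetic check of the figures above. It is a sketch only: the 700 to 4000 bit range is taken directly from the text rather than measured, and actual values depend on model width, precision, and KV-cache layout.

```python
# Per-token memory comparison using the figures quoted above.
# The 700-4000 bit range is an assumption carried over from the text.
STANDARD_BITS_PER_TOKEN = (700, 4000)  # float16 / float32 representations
CHE_BITS_PER_TOKEN = 8                 # one uint8 per token

for bits in STANDARD_BITS_PER_TOKEN:
    print(f"{bits} bits -> x{bits / CHE_BITS_PER_TOKEN:.0f} reduction")
# 700 bits  -> x88 reduction
# 4000 bits -> x500 reduction
```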
2 · Comparable approaches
Several projects already explore integer-based attention, confirming that the paradigm shift is underway:
- INT-FlashAttention (Peking University, 2024): fully INT8 attention pipeline, 72% faster inference with 82% smaller quantization error.
- SageAttention (OpenReview, 2024): INT8 attention with smoothing, plug-and-play.
- LLM.int8() (NeurIPS 2022): 8-bit matrix multiplication for transformer inference.
In other words, uint8 quantization is already the new normal for efficient attention.
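To make "integer-based attention" concrete, the snippet below is a minimal, generic sketch of symmetric INT8 quantization applied to the attention-score matmul. It is not the actual kernel of INT-FlashAttention, SageAttention, or LLM.int8(); the helper quantize_int8 and the toy shapes are illustrative assumptions.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x ≈ scale * q."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy attention scores computed from INT8 inputs instead of float16/float32.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 64)).astype(np.float32)  # 4 tokens, head dim 64
K = rng.standard_normal((4, 64)).astype(np.float32)

q_q, s_q = quantize_int8(Q)
k_q, s_k = quantize_int8(K)

# Integer matmul accumulated in int32, rescaled to float once at the end.
scores_int8 = (q_q.astype(np.int32) @ k_q.T.astype(np.int32)) * (s_q * s_k)
scores_fp32 = Q @ K.T
print("max abs error:", np.abs(scores_int8 - scores_fp32).max())
```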
🔗 Proof of concept – Combinatorial Magic Logic (Paul Jorion Blog, 2025)
3 · CHE’s distinctive principle
CHE compresses each token into a single uint8 value: the truncated form SHA-256[0:8], embedded within an ℝ⁴ triplet.
- No 700×700 matrix.
- No softmax.
- No floating-point computation.
Just a compact integer-based representation: lighter, faster, and natively compatible with existing quantized-attention architectures.
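As a rough illustration of the per-token mapping described above, here is a minimal sketch. It assumes that "SHA-256[0:8]" denotes the first 8 bits (one byte) of the token's SHA-256 digest; how that byte is placed within the ℝ⁴ triplet is not specified in the text and is left out.

```python
import hashlib

def che_code(token: str) -> int:
    """Map a token to a single uint8 value.

    Illustration only: reads "SHA-256[0:8]" as the first byte of the
    token's SHA-256 digest. The embedding of this value within the
    ℝ⁴ triplet mentioned in the text is not specified and is omitted.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return digest[0]  # an integer in 0..255, i.e. a uint8

for tok in ["the", "cat", "sat"]:
    print(tok, "->", che_code(tok))
```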
Contact: pauljorion@pribor.ai