PRIBOR: CHE — Contextual Hyper-Embedding (uint8)
A more economical alternative to classical attention
CHE (Contextual Hyper-Embedding, uint8) offers a radical gain in efficiency compared with the standard attention mechanisms of large language models. Similar methods have appeared in recent research, but none reach CHE’s level of economy.
1 · Memory efficiency
- Standard attention: float16 / float32 matrices → 700 to 4000 bits per token
- CHE (uint8): 8 bits per token
➡ A reduction of roughly ×90 to ×500 in memory footprint (worked out in the sketch below).
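For concreteness, here is a small arithmetic check of the figures above. It is a sketch only: the 700 to 4000 bit range is taken directly from the text rather than measured, and actual values depend on model width, precision, and KV-cache layout.

```python
# Per-token memory comparison using the figures quoted above.
# The 700-4000 bit range is an assumption carried over from the text.
STANDARD_BITS_PER_TOKEN = (700, 4000)  # float16 / float32 representations
CHE_BITS_PER_TOKEN = 8                 # one uint8 per token

for bits in STANDARD_BITS_PER_TOKEN:
    print(f"{bits} bits -> x{bits / CHE_BITS_PER_TOKEN:.0f} reduction")
# 700 bits  -> x88 reduction
# 4000 bits -> x500 reduction
```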
2 · Comparable approaches
Several projects already explore integer-based attention, confirming that the paradigm shift is underway:
- INT-FlashAttention (Peking University, 2024): fully INT8 attention pipeline, 72% faster inference with 82% smaller quantization error.
- SageAttention (OpenReview, 2024): INT8 attention with smoothing, plug-and-play.
- LLM.int8() (NeurIPS 2022): 8-bit matrix multiplication for transformer inference.
In other words, uint8 quantization is already the new normal for efficient attention.
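To make "integer-based attention" concrete, the snippet below is a minimal, generic sketch of symmetric INT8 quantization applied to the attention-score matmul. It is not the actual kernel of INT-FlashAttention, SageAttention, or LLM.int8(); the helper quantize_int8 and the toy shapes are illustrative assumptions.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x ≈ scale * q."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy attention scores computed from INT8 inputs instead of float16/float32.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 64)).astype(np.float32)  # 4 tokens, head dim 64
K = rng.standard_normal((4, 64)).astype(np.float32)

q_q, s_q = quantize_int8(Q)
k_q, s_k = quantize_int8(K)

# Integer matmul accumulated in int32, rescaled to float once at the end.
scores_int8 = (q_q.astype(np.int32) @ k_q.T.astype(np.int32)) * (s_q * s_k)
scores_fp32 = Q @ K.T
print("max abs error:", np.abs(scores_int8 - scores_fp32).max())
```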
🔗 Proof of concept – Combinatorial Magic Logic (Paul Jorion Blog, 2025)
3 · CHE’s distinctive principle
CHE compresses each token into a single uint8 value: the truncated form SHA-256[0:8], embedded within an ℝ⁴ triplet.
- No 700×700 matrix.
- No softmax.
- No floating-point computation.
Just a compact integer-based representation: lighter, faster, and natively compatible with existing quantized-attention architectures.
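As a rough illustration of the per-token mapping described above, here is a minimal sketch. It assumes that "SHA-256[0:8]" denotes the first 8 bits (one byte) of the token's SHA-256 digest; how that byte is placed within the ℝ⁴ triplet is not specified in the text and is left out.

```python
import hashlib

def che_code(token: str) -> int:
    """Map a token to a single uint8 value.

    Illustration only: reads "SHA-256[0:8]" as the first byte of the
    token's SHA-256 digest. The embedding of this value within the
    ℝ⁴ triplet mentioned in the text is not specified and is omitted.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return digest[0]  # an integer in 0..255, i.e. a uint8

for tok in ["the", "cat", "sat"]:
    print(tok, "->", che_code(tok))
```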
Contact: pauljorion@pribor.ai