How it works

How Cubrim compresses, step by step

Cubrim turns a stream of bytes into a compact archive through a fixed six-stage pipeline. The stages below follow one input from raw bytes to the finished archive and show what happens to the data at each step.

Stages: 1 Domainize → 2 Build the cube → 3 Distance map → 4 Run-length encode → 5 Code the values → 6 Assemble & safeguard


Domainize

Bytes → values

The input bytes become a sequence of values. Each position in the file carries one value in the range 0–255. This is the raw material the cube is built from.

Build the cube

φ: index → coordinates

Every value lands at a coordinate inside an N-dimensional cube. By default N = 2 and the edge length is 256, so the file becomes a 256×256 grid: the mixed-radix function φ maps position i to (i mod 256, ⌊i / 256⌋). In the illustration, the value at each cell sets its height and colour.
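The mapping can be sketched in a few lines of Python. This is an illustration only, not Cubrim's code: the names `phi` and `phi_inverse` are made up for this example, and raising an error for positions beyond the cube is an assumption, since the text above does not say how longer inputs are handled.

```python
def phi(i, edge=256, dims=2):
    """Map a linear position to mixed-radix coordinates (fastest axis first)."""
    coords = []
    for _ in range(dims):
        i, digit = divmod(i, edge)   # peel off one base-`edge` digit per axis
        coords.append(digit)
    if i:
        # Assumption for this sketch: positions outside the cube are rejected.
        raise ValueError("position does not fit in the cube")
    return tuple(coords)


def phi_inverse(coords, edge=256):
    """Rebuild the linear position from its coordinates."""
    i = 0
    for digit in reversed(coords):
        i = i * edge + digit
    return i


data = b"Cubrim"
values = list(data)                                  # domainize: one value 0..255 per position
cells = {phi(i): v for i, v in enumerate(values)}    # coordinate -> value
```

With the defaults, position 300 lands at (44, 1), because 300 = 1 × 256 + 44, and `phi_inverse` maps it back to 300.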

Distance map

Gaps between populated points

Only the cells that actually hold data are kept. Along each axis, Cubrim records the distance from one populated point to the next (the gap) instead of absolute coordinates. Sparse data turns into a short list of gaps.
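A simplified sketch of gap coding follows. It makes two assumptions that the text above does not confirm: a cell counts as "populated" when its value is non-zero, and gaps are taken over linear positions rather than separately along each axis. The function names are hypothetical.

```python
def to_gaps(positions):
    """Replace sorted absolute positions with the distance from the previous one."""
    gaps, prev = [], -1          # start just before position 0
    for p in positions:
        gaps.append(p - prev)
        prev = p
    return gaps


def from_gaps(gaps):
    """Invert to_gaps: accumulate gaps back into absolute positions."""
    positions, prev = [], -1
    for g in gaps:
        prev += g
        positions.append(prev)
    return positions


values = [0, 0, 7, 0, 0, 9, 0, 0, 4]
# Assumption: a populated cell is one with a non-zero value.
populated = [i for i, v in enumerate(values) if v != 0]
gaps = to_gaps(populated)
```

Here the populated positions 2, 5 and 8 become the gaps 3, 3, 3, and `from_gaps` recovers the positions exactly.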

Run-length encode

Collapse repeated gaps

The gap streams are run-length encoded: a run of identical gaps collapses into a single (value, count) pair. Regular, clustered layouts shrink dramatically here.
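As a minimal illustration of this step, the sketch below run-length encodes a list of gaps into (value, count) pairs and expands them again. It shows the general technique only; Cubrim's actual pair layout is not described here, and the function names are invented for the example.

```python
from itertools import groupby


def rle_encode(seq):
    """Collapse each run of identical items into one (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(seq)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


gaps = [3, 3, 3, 3, 1, 1, 5]
pairs = rle_encode(gaps)
```

Seven gaps collapse to three pairs: (3, 4), (1, 2), (5, 1). A stream with no repeats gains nothing from this step, which is why regular layouts benefit most.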

Code the values

bitpack · RLE-codes · Huffman

The value stream is packed with one of three schemes. The default, bitpack, stores each value in the fewest fixed-width bits; clustered streams use run-length codes; and the entropy scheme assigns short canonical-Huffman codewords to frequent values and long ones to rare values.
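The fixed-width packing idea can be sketched as follows. The bit order (big-endian, zero-padded at the end) and the function names are assumptions made for this example, not Cubrim's documented format; the run-length and Huffman schemes are not shown.

```python
def bitpack(values):
    """Pack non-negative ints using the fewest fixed bits per value."""
    width = max(max(values, default=0).bit_length(), 1)   # at least 1 bit
    acc = 0
    for v in values:
        acc = (acc << width) | v
    total_bits = width * len(values)
    nbytes = (total_bits + 7) // 8
    acc <<= nbytes * 8 - total_bits        # pad the final byte with zero bits
    return width, acc.to_bytes(nbytes, "big")


def bitunpack(width, count, blob):
    """Read `count` values of `width` bits each back out of `blob`."""
    acc = int.from_bytes(blob, "big")
    total_bits = len(blob) * 8
    mask = (1 << width) - 1
    return [(acc >> (total_bits - width * (k + 1))) & mask for k in range(count)]


values = [3, 0, 5, 7, 1]
width, blob = bitpack(values)
```

The largest value, 7, needs 3 bits, so five values occupy 15 bits and fit in 2 bytes instead of 5. The decoder needs the width and the count, which is the kind of parameter a header would have to carry.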

Assemble & safeguard

header + blob, or raw-store

A compact header (the parameters needed to rebuild everything) is prepended and the streams are concatenated into the final archive. If the cube path would not beat storing the input verbatim, Cubrim falls back to raw-store, so the output never grows. Decoding reverses every step exactly: lossless, byte for byte.
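The fallback decision can be sketched as a size comparison. Everything format-specific here is hypothetical: the one-byte mode tags `RAW` and `CUBE`, the `assemble` function, and the opaque `header` and `streams` arguments stand in for details this page does not specify.

```python
RAW, CUBE = 0, 1   # hypothetical one-byte mode tags, not Cubrim's real format


def assemble(original, header, streams):
    """Concatenate header and streams, or store the input verbatim if that is no larger."""
    candidate = header + b"".join(streams)
    if len(candidate) < len(original):
        return bytes([CUBE]) + candidate
    return bytes([RAW]) + original          # raw-store fallback


def is_raw(archive):
    """Tell the decoder which path produced the archive."""
    return archive[0] == RAW


small = assemble(b"a" * 100, b"H", [b"xy"])     # cube path wins
stored = assemble(b"abc", b"HDR", [b"xx"])      # cube path would be larger
```

In this sketch the mode tag itself costs one byte; how the real archive accounts for that overhead is not stated in the text above.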

Stage illustrations (colour legend in each panel: low, mid, high)

Domainize: byte values 0–255 as coloured cells; height = value magnitude.
Build the cube: the flat 1D ribbon folds into a 2D grid; 24×24 cells shown (scaled-down illustration).
Distance map: empty cells dimmed; arrows show gaps; only populated points kept.
Run-length encode: repeated gap bars collapse to one block; (value × count) = compact pair.
Code the values: frequent values → short bit codes; rare values → long codes (Huffman).
Assemble & safeguard: streams assembled into one archive; raw-store fallback if cube > input.

Lossless, always

Every stage is reversible. Decoding reads the header and walks the pipeline backwards to reconstruct the original bytes exactly; there is no lossy mode.