# Codec Roadmap

What `compressors` currently supports, how each codec is plumbed in, and
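what's still on the to-do list. ✅ marks codecs with a working
evaluation harness in this repo and at least one rate-distortion +
encode-complexity sweep on Kodak under `results/<codec>/`. ⏳ marks
items that are supported but not yet swept, in progress, planned, or
deferred; ❌ marks a codec that is intentionally excluded.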

## Supported

### Conventional, via Pillow built-ins

The `compressors.pillow` harness (`src/compressors/pillow/`) is a thin
wrapper around `PIL.Image.save(format=..., quality=q)`. It accepts any
PIL-registered format and routes per-format kwargs through
`_quality_dispatch.QUALITY_DISPATCH`.

- ✅ **JPEG** — libjpeg-turbo backend.
- ✅ **AVIF** — libavif (Pillow's built-in or `pillow-avif-plugin`).
  Three speed settings evaluated (default, `speed=0`, `speed=10`) to
  characterize the full speed-quality envelope.

The harness also supports every other format Pillow can write, but
these have not been swept on Kodak yet. They include:

- ⏳ **WebP** — libwebp (lossy + lossless).
- ⏳ **JPEG 2000** — OpenJPEG. Note: in Pillow's default
  `quality_mode="rates"`, each `quality_layers` entry is a compression
  ratio, so the kwarg is *inverted* (higher value = lower output
  quality); this is already handled in `_quality_dispatch.py`.
- ⏳ **PNG / TIFF Adobe Deflate** — lossless baselines.
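A minimal sketch of what a per-format quality dispatch can look like, assuming a harness-level quality `q` in [1, 100] where higher is better. `QUALITY_DISPATCH_SKETCH`, `encode_bpp`, and the JPEG 2000 mapping `101 - q` are hypothetical illustrations, not the contents of `_quality_dispatch.py`. The point is why a dispatch layer exists at all: most formats take `quality` directly, while JPEG 2000 expects its rate as a compression ratio.

```python
import io
from typing import Any, Callable

# Hypothetical table: PIL format name -> (q in [1, 100], higher = better) -> save kwargs.
# Illustrative only; the real QUALITY_DISPATCH in _quality_dispatch.py may differ.
QualityFn = Callable[[int], dict[str, Any]]


def _jpeg2000_kwargs(q: int) -> dict[str, Any]:
    # With quality_mode="rates", each quality_layers entry is a compression
    # ratio, so a *larger* number means *lower* quality. Invert q so that
    # q=100 -> ratio 1 (least compression) and q=1 -> ratio 100.
    return {"quality_mode": "rates", "quality_layers": [101 - q]}


QUALITY_DISPATCH_SKETCH: dict[str, QualityFn] = {
    "JPEG": lambda q: {"quality": q},
    "WEBP": lambda q: {"quality": q},
    "AVIF": lambda q: {"quality": q},
    "JPEG2000": _jpeg2000_kwargs,
}


def encode_bpp(img: Any, fmt: str, q: int) -> float:
    """Encode `img` (e.g. a PIL.Image.Image) in memory and return bits per pixel."""
    if not 1 <= q <= 100:
        raise ValueError(f"quality must be in [1, 100], got {q}")
    buf = io.BytesIO()
    img.save(buf, format=fmt, **QUALITY_DISPATCH_SKETCH[fmt](q))
    width, height = img.size
    # Rate = total encoded bits divided by the number of pixels.
    return 8 * len(buf.getvalue()) / (width * height)
```

With Pillow installed, `encode_bpp(Image.open(path).convert("RGB"), "JPEG", 75)` would return the JPEG rate for one image; the function only relies on the object having `.save()` and `.size`.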

### Pillow plugins (third-party PIL backends)

- ✅ **JPEG-XL** — via `pillow-jxl-plugin` (libjxl). Lives in
  `src/compressors/jxl/`, which is a thin wrapper that imports the
  plugin and forwards to the pillow harness with `--format JXL`.
- ✅ **JPEG-LS** — via `pillow_jpls` (CharLS). Used internally by
  FRAPPE, WaLLoC, and LiVeAction to entropy-code their integer
  latents, so it is exercised by their sweeps. There is no standalone
  JPEG-LS harness yet, but adding one should take only one line of glue.

### Diffusers VQ autoencoders

`src/compressors/diffusers/` evaluates the four pretrained quantized
autoencoders shipped in upstream `diffusers`. None of these has a
native rate knob; quality is implemented as a target-pixel-ratio
resize-down inside the encoder pipeline. See
`src/compressors/diffusers/_codec.py` for the registry.

- ✅ **LDM-SR** (`CompVis/ldm-super-resolution-4x-openimages`) —
  `VQModel`, 4× downsample, codebook 8192.
- ✅ **VQ-Diffusion ITHQ** (`microsoft/vq-diffusion-ithq`) — `VQModel`,
  8× downsample, codebook 4096.
- ✅ **Kandinsky 2.1 MoVQ** (`kandinsky-community/kandinsky-2-1`) —
  `VQModel`, 8× downsample, codebook 16384.
- ✅ **Stable Cascade VQGAN** (`stabilityai/stable-cascade`) —
  `PaellaVQModel`, 4× downsample, codebook 8192.
- ❌ **Kandinsky 3 MoVQ** — intentionally absent. The published
  `kandinsky-community/kandinsky-3` `movq/diffusion_pytorch_model.fp16.safetensors`
  ships with `quantize.embedding.weight` all zeros; every index decodes
  to the zero vector and PSNR is pinned at ~12 dB. Re-add if HF fixes
  the upload.
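Because these models emit codebook indices rather than entropy-coded bits, one simple way to reason about their rate is a fixed-length index cost: each latent position costs log2(codebook size) bits. The sketch below assumes that fixed-length cost and a resize that scales both sides by the square root of the target pixel ratio, snapped to a multiple of the downsample factor. `resized_dims` and `vq_bpp` are hypothetical helpers; the actual resize and rate accounting in `_codec.py` may differ.

```python
import math


def resized_dims(width: int, height: int, pixel_ratio: float, factor: int) -> tuple[int, int]:
    """Scale both sides by sqrt(pixel_ratio), snapped to a multiple of `factor`."""
    side_scale = math.sqrt(pixel_ratio)

    def snap(n: int) -> int:
        return max(factor, round(n * side_scale / factor) * factor)

    return snap(width), snap(height)


def vq_bpp(width: int, height: int, *, factor: int, codebook_size: int,
           pixel_ratio: float = 1.0) -> float:
    """Fixed-length index cost in bits per pixel of the *original* image."""
    w, h = resized_dims(width, height, pixel_ratio, factor)
    n_indices = (w // factor) * (h // factor)  # one codebook index per latent position
    return n_indices * math.log2(codebook_size) / (width * height)
```

For an uncropped 768×512 Kodak image, Kandinsky 2.1 MoVQ (8×, 16384 entries) costs 14 bits per 64 pixels, i.e. 0.21875 bpp at full resolution; a pixel ratio of 0.25 (each side halved) brings that down to 0.0546875 bpp. An entropy coder over the indices would typically do better whenever codebook usage is skewed.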

### Vendored learned codecs

Upstream `compressai` pins legacy torch versions, so the minimal model
and entropy-coding closure is vendored under
`src/compressors/compressai_baselines/` (BSD-3-Clause-Clear; see
`src/compressors/compressai_baselines/LICENSE`). The CompressAI
baselines compute bpp from forward-pass likelihoods
(`-log2(likelihoods).sum() / n_pixels`), i.e. the ideal code length
rather than the size of an actual bitstream, so no `_CXX` rANS coder
is needed.
MCUCoder reuses the same vendored CompressAI layers (`AttentionBlock`,
`conv`, `deconv`) but ships its own model architecture and a
per-channel min/max + Huffman entropy coder.
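The likelihood-based rate is the ideal code length summed over every latent tensor (typically the main latent `y` and the hyper-latent `z`), divided by the number of input pixels. A minimal NumPy sketch, assuming the model returns a dict of per-element likelihoods as CompressAI models do under `out["likelihoods"]`; `estimated_bpp` is a hypothetical helper, and with torch tensors `torch.log2` drops in for `np.log2`.

```python
import numpy as np


def estimated_bpp(likelihoods: dict[str, np.ndarray], num_pixels: int) -> float:
    """Ideal code length of all latents, in bits per pixel of the input image."""
    # -log2(p) is the information content of each quantized latent element;
    # summing over y and z gives the total rate in bits.
    total_bits = sum(float(-np.log2(p).sum()) for p in likelihoods.values())
    return total_bits / num_pixels
```

For instance, a 768×512 image whose `y` (192×32×48) elements each have likelihood 0.5 and whose `z` (128×8×12) elements each have likelihood 0.25 costs 294912 + 24576 bits, or 0.8125 bpp.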

- ✅ **cheng2020-anchor** — Cheng, Sun, Takeuchi, Katto, CVPR 2020,
  GMM hyperprior without attention. 6 quality levels.
- ⏳ **mbt2018** — Minnen, Ballé, Toderici, NeurIPS 2018, joint
  autoregressive + hierarchical priors. 8 quality levels. Sweep
  in flight (q=7,8 are slow on CPU due to the autoregressive context
  model).
- ✅ **MCUCoder** — Hojjat, Haberer, Landsiedel, DAGM-GCPR 2025.
  Asymmetric variable-rate codec with a tiny (~3-conv) encoder and a
  heavy `AttentionBlock`-based decoder; 12 native quality levels via
  `model.p = used_filter / 12`. MIT-licensed, from Kiel University. MS-SSIM-trained
  checkpoint pulled from `zenodo:14988203` and cached under
  `~/.cache/compressors/mcucoder/`. fp32 PyTorch only — the upstream
  INT8 / TFLite / CMSIS-NN deployment path is out of scope for this
  baseline.
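MCUCoder's entropy coder is described above only as per-channel min/max plus Huffman. The following is a minimal stdlib sketch of that general scheme, not MCUCoder's code: each channel is linearly mapped onto `2**bits` integer levels using its own min and max, and the resulting symbols are Huffman-coded. Pooling all channels into one Huffman table, 8-bit levels, and two float32 side-info values per channel are all assumptions, and every function name here is hypothetical.

```python
import heapq
import itertools
from collections import Counter
from typing import Hashable, Sequence


def minmax_quantize(channel: Sequence[float], bits: int = 8) -> tuple[list[int], float, float]:
    """Map one channel onto integers in [0, 2**bits - 1] using its own min/max."""
    lo, hi = min(channel), max(channel)
    levels = (1 << bits) - 1
    if hi == lo:  # constant channel: every sample maps to level 0
        return [0] * len(channel), lo, hi
    scale = levels / (hi - lo)
    return [round((v - lo) * scale) for v in channel], lo, hi


def dequantize(q: Sequence[int], lo: float, hi: float, bits: int = 8) -> list[float]:
    """Invert minmax_quantize up to half a quantization step."""
    levels = (1 << bits) - 1
    if hi == lo:
        return [lo] * len(q)
    return [lo + k * (hi - lo) / levels for k in q]


def huffman_code_lengths(symbols: Sequence[Hashable]) -> dict[Hashable, int]:
    """Code length (bits) per symbol for a Huffman code fitted to `symbols`."""
    freqs = Counter(symbols)
    if not freqs:
        return {}
    if len(freqs) == 1:  # a lone symbol still needs 1 bit per occurrence
        return {next(iter(freqs)): 1}
    tie = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: d + 1 for s, d in itertools.chain(a.items(), b.items())}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def total_bits(channels: Sequence[Sequence[float]], bits: int = 8) -> int:
    """Huffman payload plus two float32 side-info values (min, max) per channel."""
    symbols: list[int] = []
    for ch in channels:
        q, _, _ = minmax_quantize(ch, bits)
        symbols.extend(q)
    lengths = huffman_code_lengths(symbols)
    payload = sum(count * lengths[s] for s, count in Counter(symbols).items())
    return payload + 2 * 32 * len(channels)
```

The per-channel side information is why this scheme suits many-sample channels: with tiny channels, the 64 bits of min/max per channel can dominate the Huffman payload.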

### In-house / authored, via external packages

- ✅ **FRAPPE** — progressive multi-scale autoencoder. Weights on the
  Hub at `danjacobellis/FRAPPE` (`config.json` +
  `FRAPPE_pytorch_model.safetensors`). `compressors.frappe.load_from_hub()`
  is the canonical loader.
- ✅ **WaLLoC** — wavelet-domain learned codec, DCC 2025. Installed via
  `pip install walloc`; checkpoint `RGB_16x.pth` from the Hub
  (`danjacobellis/walloc`).
- ✅ **LiVeAction** — DCC 2026 (in press). Installed via
  `pip install livecodec`; checkpoint `lsdir_f16c48.pth` from the Hub
  (`danjacobellis/liveaction`).

## Planned (not yet implemented)

- ⏳ **VVC intra (H.266)** — currently the strongest conventional
  still-image codec. Candidate implementations:
  - **VVenC + VVdeC** (Fraunhofer HHI) — production C++; BSD-3-Clear.
    Likely path. Open question: Pillow plugin vs subprocess wrapper.
  - **VTM** (reference) — slow, research-only; BSD-3-Clear.
  - **uvg266** — fast research encoder; BSD-3.

### Pluggable entropy coding

For learned codecs whose encoder/decoder are essentially **lossy
analysis/synthesis transforms followed by a separate entropy coding
stage** — currently FRAPPE, WaLLoC, and LiVeAction — the goal is to
**mix and match the entropy coder independently of the analysis
transform**. The same trained transform should be combinable with
multiple entropy coders (range / arithmetic with various probability
models, ANS, learned context models) so that both the rate-distortion
curve and the encoder's runtime cost can be studied in isolation.

Implications:

- Treat the analysis transform and the entropy coder as separate,
  swappable components rather than baking them together inside a
  single codec class.
- The entropy-coder interface should be generic enough to be reused
  across analysis transforms.
- Evaluation (RD + complexity) should report per (transform × entropy
  coder) combination.

The FRAPPE harness already has a four-function pluggable contract
(`compressors.frappe.entropy_coding`: `arrange_latents` /
`unarrange_latents` / `encode_latents` / `decode_latents`), with
candidate alternative entropy coders living under
`experiments/encoder_optimization/`. Open questions: which broader
set of entropy coders to support out of the gate; whether
`constriction` / `torchac` / CompressAI's coders can be reused as-is
or need wrapping.
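To make the (transform × entropy coder) split concrete, here is a minimal stdlib sketch under stated assumptions: the `Transform` and `EntropyCoder` interfaces, the toy transforms, and `evaluate` are all hypothetical; latents are assumed to be already quantized to uint8; and `zlib`/`lzma` are general-purpose stand-ins rather than coders any of these codecs actually use. It does not reproduce FRAPPE's four-function contract, whose exact signatures are not given here.

```python
import itertools
import lzma
import zlib
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


class Transform(Protocol):
    """Hypothetical analysis/synthesis interface producing uint8 latents."""

    name: str

    def analysis(self, x: Sequence[int]) -> list[int]: ...

    def synthesis(self, latents: Sequence[int]) -> list[int]: ...


class EntropyCoder(Protocol):
    """Hypothetical lossless coder over uint8 symbols."""

    name: str

    def encode(self, symbols: Sequence[int]) -> bytes: ...

    def decode(self, payload: bytes, n: int) -> list[int]: ...


@dataclass
class IdentityTransform:
    """Toy transform: latents are the input samples themselves."""

    name: str = "identity"

    def analysis(self, x: Sequence[int]) -> list[int]:
        return list(x)

    def synthesis(self, latents: Sequence[int]) -> list[int]:
        return list(latents)


@dataclass
class DeltaTransform:
    """Toy stand-in for a learned transform: first differences mod 256."""

    name: str = "delta"

    def analysis(self, x: Sequence[int]) -> list[int]:
        if not x:
            return []
        return [x[0]] + [(b - a) % 256 for a, b in zip(x, x[1:])]

    def synthesis(self, latents: Sequence[int]) -> list[int]:
        out, acc = [], 0
        for d in latents:
            acc = (acc + d) % 256  # running sum undoes the differencing
            out.append(acc)
        return out


@dataclass
class ByteCoder:
    """Adapter turning any bytes -> bytes compressor into an EntropyCoder."""

    name: str
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]

    def encode(self, symbols: Sequence[int]) -> bytes:
        return self.compress(bytes(symbols))

    def decode(self, payload: bytes, n: int) -> list[int]:
        out = list(self.decompress(payload))
        if len(out) != n:
            raise ValueError(f"{self.name}: expected {n} symbols, got {len(out)}")
        return out


def evaluate(transforms: Sequence[Transform], coders: Sequence[EntropyCoder],
             x: Sequence[int]) -> dict:
    """Bits and round-trip check for every (transform x entropy coder) pair."""
    report = {}
    for t, c in itertools.product(transforms, coders):
        latents = t.analysis(x)
        payload = c.encode(latents)
        x_hat = t.synthesis(c.decode(payload, len(latents)))
        report[(t.name, c.name)] = {"bits": 8 * len(payload), "lossless": x_hat == list(x)}
    return report


CODERS = [
    ByteCoder("zlib", zlib.compress, zlib.decompress),
    ByteCoder("lzma", lzma.compress, lzma.decompress),
]
```

Because `evaluate` only sees the two interfaces, wrappers around `constriction`, `torchac`, or CompressAI's coders would, in principle, slot in as further `EntropyCoder` implementations without touching any transform.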

## Evaluation

- ✅ **Rate-distortion** — bpp / PSNR / SSIM / LPIPS / DISTS on Kodak,
  with SSIM, LPIPS, and DISTS computed via `piq.SSIMLoss`, `piq.LPIPS`,
  and `piq.DISTS`. Same metric stack across every codec module above.
- ✅ **Encode-complexity** — wallclock timing on 512×512 Kodak center
  crops, decomposed into per-stage timings (`resize` / `analysis` /
  `transfer` / `store` for codecs that internally resize-down for
  quality control; single `encode` stage for native-rate codecs). Uses
  the `throughput.image.wallclock` singleton.
- ⏳ **Static cost (FLOPs / MACs)** — not implemented yet. Candidate
  libraries: `thop`, `fvcore`, `ptflops`, `calflops`. Hardware-independent
  first-order proxy for cost.
- ⏳ **Decode-complexity** — out of scope so far; deferred along with
  FLOP counting.
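Per-stage wallclock timing of the kind described for encode complexity can be sketched with `time.perf_counter` and a context manager. `StageTimer` is hypothetical and is not the `throughput.image.wallclock` API; the stage names simply mirror the ones listed above.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Hypothetical per-stage wallclock accumulator."""

    def __init__(self) -> None:
        self.totals: defaultdict[str, float] = defaultdict(float)  # stage -> seconds
        self.counts: defaultdict[str, int] = defaultdict(int)      # stage -> timed calls

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even if the stage raised, so partial runs stay visible.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self) -> dict[str, float]:
        """Mean milliseconds per call for each stage."""
        return {k: 1e3 * self.totals[k] / self.counts[k] for k in self.totals}
```

Usage would wrap each step, e.g. `with timer.stage("analysis"): latents = model(x)`. For GPU codecs, kernel launches are asynchronous, so a stage should call `torch.cuda.synchronize()` before its clock stops; otherwise work queued during `analysis` is billed to whichever later stage first blocks on the result, typically `transfer`.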
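As a hand-check for whichever FLOP counter is adopted, the multiply-accumulate count of a square-kernel `Conv2d` layer follows directly from its output size: H_out · W_out · C_out · (C_in / groups) · k². The helpers below are a minimal sketch of that formula (bias ignored) using the same output-size rule as PyTorch's `Conv2d`; they are hypothetical and not part of any listed library. Libraries differ on whether they report MACs or FLOPs (roughly 2 × MACs), so that convention should be pinned down before comparing numbers across codecs.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0,
               dilation: int = 1) -> int:
    """Output length along one spatial axis (same rule as torch.nn.Conv2d)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv2d_macs(h: int, w: int, c_in: int, c_out: int, kernel: int,
                stride: int = 1, padding: int = 0, groups: int = 1) -> int:
    """Multiply-accumulates of one square-kernel Conv2d forward pass, bias ignored."""
    h_out = conv2d_out(h, kernel, stride, padding)
    w_out = conv2d_out(w, kernel, stride, padding)
    # Each output element is a dot product over (c_in / groups) * kernel**2 inputs.
    return h_out * w_out * c_out * (c_in // groups) * kernel * kernel
```

For example, a 5×5, stride-2 convolution from 3 to 192 channels on a 512×512 crop (padding 2) produces a 256×256 output and costs 943,718,400 MACs. Transposed convolutions, as used in most synthesis transforms, need their own output-size rule.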

## In planning

Once enough measurements have been collected across codecs × testbeds
× resolutions, package the results as a public leaderboard in the
spirit of the [timm leaderboard](https://huggingface.co/spaces/timm/leaderboard)
— a Hugging Face Space combining rate-distortion and computational
metrics for side-by-side codec comparison. Available testbeds: Mac
mini (Apple Silicon), Raspberry Pi, Intel GPU, NVIDIA GPU, various
x86 CPUs.
