Codec Roadmap#

What the compressors package currently supports, how each codec is plumbed in, and what’s still on the to-do list. ✅ marks codecs with a working evaluation harness in this repo and at least one rate-distortion + encode-complexity sweep on Kodak under results/<codec>/.

Supported#

Conventional, via Pillow built-ins#

The compressors.pillow harness (src/compressors/pillow/) is a thin wrapper around PIL.Image.save(format=..., quality=q). It accepts any PIL-registered format and routes per-format kwargs through _quality_dispatch.QUALITY_DISPATCH.

  • JPEG — libjpeg-turbo backend.

  • AVIF — libavif (Pillow’s built-in or pillow-avif-plugin). Three speed settings evaluated (default, speed=0, speed=10) to characterize the full speed-quality envelope.

The harness also supports — but hasn’t been swept on Kodak yet — every other format Pillow ships, including:

  • WebP — libwebp (lossy + lossless).

  • JPEG 2000 — OpenJPEG. Note: in PIL the quality_layers kwarg is inverted (higher value = lower output quality); already wired in _quality_dispatch.py.

  • PNG / TIFF Adobe Deflate — lossless baselines.
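The dispatch idea described above can be sketched roughly as follows. This is an illustration only: the table contents, the helper names save_kwargs and encode_bpp, the 1..100 quality range, and the 100 / q rule for JPEG 2000 are assumptions, and the real mapping in _quality_dispatch.py may differ.

```python
import io

# Hypothetical stand-in for QUALITY_DISPATCH: format name -> function that turns
# a harness quality q (assumed 1..100, higher = better) into Image.save kwargs.
QUALITY_DISPATCH = {
    "JPEG": lambda q: {"quality": q},
    "WEBP": lambda q: {"quality": q},
    # Assumed inversion for JPEG 2000: in "rates" mode quality_layers acts as a
    # compression ratio, so a higher q has to produce a smaller number.
    "JPEG2000": lambda q: {"quality_mode": "rates", "quality_layers": [100.0 / q]},
    "PNG": lambda q: {},  # lossless baseline: no quality knob
}


def save_kwargs(fmt: str, q: int) -> dict:
    """Look up the per-format keyword arguments for quality q."""
    if not 1 <= q <= 100:
        raise ValueError("q must be in 1..100")
    return QUALITY_DISPATCH[fmt.upper()](q)


def encode_bpp(img, fmt: str, q: int) -> float:
    """Encode a PIL image in memory and return the rate in bits per pixel."""
    buf = io.BytesIO()
    img.save(buf, format=fmt, **save_kwargs(fmt, q))
    return 8 * len(buf.getvalue()) / (img.width * img.height)
```

With Pillow installed, encode_bpp(Image.open(path).convert("RGB"), "JPEG", 75) would give the rate for one image. Keeping the inversion inside the table means callers can treat q as "higher is better" for every format.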

Pillow plugins (third-party PIL backends)#

  • JPEG-XL — via pillow-jxl-plugin (libjxl). Lives in src/compressors/jxl/, which is a thin wrapper that imports the plugin and forwards to the pillow harness with --format JXL.

  • JPEG-LS — via pillow_jpls (CharLS). Used internally by FRAPPE, WaLLoC, and LiVeAction to entropy-code their integer latents. No standalone harness yet, but adding one is one line of glue.

Diffusers VQ autoencoders#

src/compressors/diffusers/ evaluates the four pretrained quantized autoencoders shipped in upstream diffusers. None of these has a native rate knob; quality is implemented as a target-pixel-ratio resize-down inside the encoder pipeline. See src/compressors/diffusers/_codec.py for the registry.

  • LDM-SR (CompVis/ldm-super-resolution-4x-openimages) — VQModel, 4× downsample, codebook 8192.

  • VQ-Diffusion ITHQ (microsoft/vq-diffusion-ithq) — VQModel, 8× downsample, codebook 4096.

  • Kandinsky 2.1 MoVQ (kandinsky-community/kandinsky-2-1) — VQModel, 8× downsample, codebook 16384.

  • Stable Cascade VQGAN (stabilityai/stable-cascade) — PaellaVQModel, 4× downsample, codebook 8192.

  • Kandinsky 3 MoVQ — intentionally absent. The published kandinsky-community/kandinsky-3 movq/diffusion_pytorch_model.fp16.safetensors ships with quantize.embedding.weight all zeros; every index decodes to the zero vector and PSNR is pinned at ~12 dB. Re-add if HF fixes the upload.
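To see why a resize-down step can act as a rate knob for these models, the sketch below computes a nominal rate from the figures listed above. It assumes the downsample factor applies per side and that every codebook index is stored with a fixed-length code of log2(codebook size) bits; the harness may account for rate differently, and the function names are hypothetical.

```python
import math

# (downsample factor per side, codebook size), taken from the list above.
VQ_MODELS = {
    "ldm-sr": (4, 8192),
    "vq-diffusion-ithq": (8, 4096),
    "kandinsky-2.1-movq": (8, 16384),
    "stable-cascade-vqgan": (4, 8192),
}


def nominal_bpp(downsample: int, codebook_size: int, pixel_ratio: float = 1.0) -> float:
    """Bits per original pixel if each index costs log2(codebook_size) bits.

    pixel_ratio is the fraction of the original pixel count kept by the
    resize-down step (1.0 means no resize).
    """
    bits_per_index = math.log2(codebook_size)
    indices_per_pixel = pixel_ratio / downsample ** 2
    return bits_per_index * indices_per_pixel


def resized_shape(width: int, height: int, pixel_ratio: float) -> tuple[int, int]:
    """Scale both sides by sqrt(pixel_ratio) so the pixel count shrinks by pixel_ratio."""
    scale = math.sqrt(pixel_ratio)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions the rate is linear in the pixel ratio, so halving both image sides cuts the nominal bpp to a quarter.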

Vendored learned codecs#

Upstream compressai pins legacy torch versions, so the minimal model + entropy-coding closure is vendored under src/compressors/compressai_baselines/ (BSD-3-Clause-Clear; see src/compressors/compressai_baselines/LICENSE). The CompressAI baselines compute bpp from forward-pass likelihoods (-log2(likelihoods).sum() / n_pixels); no _CXX rANS coder needed. MCUCoder reuses the same vendored CompressAI layers (AttentionBlock, conv, deconv) but ships its own model architecture and a per-channel min/max + Huffman entropy coder.

  • cheng2020-anchor — Cheng, Sun, Takeuchi, Katto, CVPR 2020, GMM hyperprior without attention. 6 quality levels.

  • mbt2018 — Minnen, Ballé, Toderici, NeurIPS 2018, joint autoregressive + hierarchical priors. 8 quality levels. Sweep in flight (q=7,8 are slow on CPU due to the autoregressive context model).

  • MCUCoder — Hojjat, Haberer, Landsiedel, DAGM-GCPR 2025. Asymmetric variable-rate codec with a tiny (~3-conv) encoder and a heavy AttentionBlock-based decoder; 12 native quality levels via model.p = used_filter / 12. MIT, Kiel University. MS-SSIM-trained checkpoint pulled from zenodo:14988203 and cached under ~/.cache/compressors/mcucoder/. fp32 PyTorch only — the upstream INT8 / TFLite / CMSIS-NN deployment path is out of scope for this baseline.
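The likelihood-based bpp estimate used for the CompressAI baselines can be illustrated with a small, framework-free sketch. It assumes the likelihoods arrive as a dict of named groups (CompressAI models return one entry per latent, such as "y" and "z"); plain Python lists stand in for tensors here, and the function name is hypothetical.

```python
import math
from typing import Dict, Iterable


def bpp_from_likelihoods(likelihoods: Dict[str, Iterable[float]], n_pixels: int) -> float:
    """Estimated rate: sum of -log2(p) over every latent symbol, per image pixel.

    With tensors, the same quantity is -log2(likelihoods).sum() / n_pixels,
    accumulated over all likelihood groups.
    """
    total_bits = 0.0
    for group in likelihoods.values():
        total_bits -= sum(math.log2(p) for p in group)
    return total_bits / n_pixels
```

For example, six symbols with likelihood 0.5 and one with likelihood 0.25 cost 6 + 2 = 8 bits, which is 2 bpp for a 4-pixel image. This is an estimate of the rate an ideal entropy coder would reach, which is why no actual rANS bitstream has to be produced.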

In-house / authored, via external packages#

  • FRAPPE — progressive multi-scale autoencoder. Weights on the Hub at danjacobellis/FRAPPE (config.json + FRAPPE_pytorch_model.safetensors). compressors.frappe.load_from_hub() is the canonical loader.

  • WaLLoC — wavelet-domain learned codec, DCC 2025. Installed via pip install walloc; checkpoint RGB_16x.pth from the Hub (danjacobellis/walloc).

  • LiVeAction — DCC 2026 (in press). Installed via pip install livecodec; checkpoint lsdir_f16c48.pth from the Hub (danjacobellis/liveaction).

Planned (not yet implemented)#

  • VVC intra (H.266) — currently the strongest conventional still-image codec. Candidate implementations:

    • VVenC + VVdeC (Fraunhofer HHI) — production C++; BSD-3-Clause-Clear. Likely path. Open question: Pillow plugin vs subprocess wrapper.

    • VTM (reference) — slow, research-only; BSD-3-Clause-Clear.

    • uvg266 — fast research encoder; BSD-3-Clause.

Pluggable entropy coding#

For learned codecs whose encoder/decoder are essentially lossy analysis/synthesis transforms followed by a separate entropy coding stage — currently FRAPPE, WaLLoC, and LiVeAction — the goal is to mix and match the entropy coder independently of the analysis transform. The same trained transform should be combinable with multiple entropy coders (range / arithmetic with various probability models, ANS, learned context models) so that both the rate-distortion curve and the encoder’s runtime cost can be studied in isolation.

Implications:

  • Treat the analysis transform and the entropy coder as separate, swappable components rather than baking them together inside a single codec class.

  • The entropy-coder interface should be generic enough to be reused across analysis transforms.

  • Evaluation (RD + complexity) should report per (transform × entropy coder) combination.

The FRAPPE harness already has a four-function pluggable contract (compressors.frappe.entropy_coding: arrange_latents / unarrange_latents / encode_latents / decode_latents), with candidate alternative entropy coders living under experiments/encoder_optimization/. Open questions: which broader set of entropy coders to support out of the gate; whether constriction / torchac / CompressAI’s coders can be reused as-is or need wrapping.
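One way to picture a four-function contract like the one named above is the toy sketch below. Only the four function names come from the FRAPPE harness; the signatures, the Latents type, the int8 packing, and the zlib-based coder are assumptions chosen to keep the example self-contained.

```python
import struct
import zlib
from typing import List, Protocol, Tuple

Latents = List[List[int]]  # toy stand-in: channels x flattened integer symbols


class EntropyCoder(Protocol):
    """Assumed shape of the contract; the real signatures may differ."""

    def arrange_latents(self, latents: Latents) -> Tuple[bytes, Tuple[int, int]]: ...
    def unarrange_latents(self, raw: bytes, shape: Tuple[int, int]) -> Latents: ...
    def encode_latents(self, raw: bytes) -> bytes: ...
    def decode_latents(self, payload: bytes) -> bytes: ...


class ZlibCoder:
    """Toy coder: int8 symbols packed channel-major, then DEFLATE-compressed."""

    def arrange_latents(self, latents):
        shape = (len(latents), len(latents[0]))
        flat = [s for channel in latents for s in channel]
        return struct.pack(f"{len(flat)}b", *flat), shape

    def unarrange_latents(self, raw, shape):
        channels, length = shape
        flat = struct.unpack(f"{channels * length}b", raw)
        return [list(flat[c * length:(c + 1) * length]) for c in range(channels)]

    def encode_latents(self, raw):
        return zlib.compress(raw, level=9)

    def decode_latents(self, payload):
        return zlib.decompress(payload)


def roundtrip(coder: EntropyCoder, latents: Latents) -> Tuple[Latents, int]:
    """Run latents through a coder; return the restored latents and the bit count."""
    raw, shape = coder.arrange_latents(latents)
    payload = coder.encode_latents(raw)
    restored = coder.unarrange_latents(coder.decode_latents(payload), shape)
    return restored, 8 * len(payload)
```

Because roundtrip only depends on the four methods, the same transform output could be passed to any coder that implements them, which is the per (transform × entropy coder) comparison described above. The toy packing only handles symbols in the int8 range.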

Evaluation#

  • Rate-distortion — bpp / PSNR / SSIM / LPIPS / DISTS on Kodak via piq.SSIMLoss, piq.LPIPS, piq.DISTS. Same metric stack across every codec module above.

  • Encode-complexity — wallclock timing on 512×512 Kodak center crops, decomposed into per-stage timings (resize / analysis / transfer / store for codecs that internally resize-down for quality control; single encode stage for native-rate codecs). Uses the throughput.image.wallclock singleton.

  • Static cost (FLOPs / MACs) — not implemented yet. Candidate libraries: thop, fvcore, ptflops, calflops. Hardware-independent first-order proxy for cost.

  • Decode-complexity — out of scope so far; deferred along with FLOP counting.
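The encode-complexity measurement above can be sketched with two small helpers: one computing the 512×512 center-crop box and one accumulating wall-clock time per named stage. Both are hypothetical illustrations; they are not the throughput.image.wallclock singleton, whose interface is not shown here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


def center_crop_box(width: int, height: int, size: int = 512):
    """Return the (left, upper, right, lower) box of a centered size x size crop."""
    if width < size or height < size:
        raise ValueError("image is smaller than the requested crop")
    left = (width - size) // 2
    upper = (height - size) // 2
    return left, upper, left + size, upper + size


class StageTimer:
    """Hypothetical helper: accumulates wall-clock seconds per named stage."""

    def __init__(self):
        self.seconds = defaultdict(float)

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so repeated runs of the same stage add up.
            self.seconds[name] += time.perf_counter() - start
```

A codec that resizes internally would wrap each step in its own block (for example timer.stage("resize"), then timer.stage("analysis")), while a native-rate codec would use a single timer.stage("encode"). The box is in the order PIL's Image.crop expects.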

In planning#

Once enough measurements have been collected across codecs × testbeds × resolutions, package the results as a public leaderboard in the spirit of the timm leaderboard — a Hugging Face Space combining rate-distortion and computational metrics for side-by-side codec comparison. Available testbeds: Mac mini (Apple Silicon), Raspberry Pi, Intel GPU, NVIDIA GPU, various x86 CPUs.