Codec Roadmap
What compressors currently supports, how each codec is plumbed in, and
what’s still on the to-do list. ✅ marks codecs with a working
evaluation harness in this repo and at least one rate-distortion +
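encode-complexity sweep on Kodak under results/<codec>/. ⏳ marks work that is wired up but not yet swept, in flight, or planned; ❌ marks a codec that is intentionally excluded.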
Supported
Conventional, via Pillow built-ins
The compressors.pillow harness (src/compressors/pillow/) is a thin
wrapper around PIL.Image.Image.save(format=..., quality=q). It accepts any
PIL-registered format and routes per-format kwargs through
_quality_dispatch.QUALITY_DISPATCH.
✅ JPEG — libjpeg-turbo backend.
✅ AVIF — libavif (Pillow’s built-in or pillow-avif-plugin). Three speed settings evaluated (default, speed=0, speed=10) to characterize the full speed-quality envelope.
The harness also supports every other format Pillow ships, though these have not been swept on Kodak yet, including:
⏳ WebP — libwebp (lossy + lossless).
⏳ JPEG 2000 — OpenJPEG. Note: in PIL the quality_layers kwarg is inverted. With the default quality_mode="rates", each value is a target compression ratio, so a higher value means lower output quality. Already wired in _quality_dispatch.py.
⏳ PNG / TIFF Adobe Deflate — lossless baselines.
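The per-format routing through _quality_dispatch.QUALITY_DISPATCH can be pictured as a table from a single 0 to 100 quality knob to Pillow save() kwargs. This is a hypothetical sketch: the table contents, the save_kwargs helper, and especially the linear q-to-compression-ratio mapping for JPEG 2000 are assumptions, not the repo's actual values. The kwarg names themselves (quality, optimize, quality_mode, quality_layers, compression="tiff_adobe_deflate") are real Pillow save options.

```python
from typing import Any, Callable, Dict

# Hypothetical: each entry maps a 0..100 quality knob to Pillow save() kwargs.
QualityToKwargs = Callable[[int], Dict[str, Any]]


def _jpeg2000_kwargs(q: int) -> Dict[str, Any]:
    # With quality_mode="rates", each quality_layers value is a target
    # compression ratio, so a HIGHER q must map to a LOWER ratio.
    ratio = max(1.0, 101.0 - q)  # hypothetical linear mapping: q=100 -> 1:1
    return {"quality_mode": "rates", "quality_layers": [ratio]}


QUALITY_DISPATCH: Dict[str, QualityToKwargs] = {
    "JPEG": lambda q: {"quality": q},
    "WEBP": lambda q: {"quality": q},
    "AVIF": lambda q: {"quality": q},
    "JPEG2000": _jpeg2000_kwargs,
    # Lossless baselines ignore q entirely.
    "PNG": lambda q: {"optimize": True},
    "TIFF": lambda q: {"compression": "tiff_adobe_deflate"},
}


def save_kwargs(fmt: str, q: int) -> Dict[str, Any]:
    """Kwargs that would be forwarded to Image.save(fp, **kwargs)."""
    key = fmt.upper()
    if key not in QUALITY_DISPATCH:
        raise ValueError(f"no quality mapping registered for {fmt!r}")
    return {"format": key, **QUALITY_DISPATCH[key](q)}
```

Keeping the per-format quirks in one table means the harness loop itself never needs to know which formats invert their quality knob.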
Pillow plugins (third-party PIL backends)
✅ JPEG-XL — via pillow-jxl-plugin (libjxl). Lives in src/compressors/jxl/, a thin wrapper that imports the plugin and forwards to the pillow harness with --format JXL.
✅ JPEG-LS — via pillow_jpls (CharLS). Used internally by FRAPPE, WaLLoC, and LiVeAction to entropy-code their integer latents. No standalone harness yet, but adding one is one line of glue.
Diffusers VQ autoencoders
src/compressors/diffusers/ evaluates the four pretrained quantized
autoencoders shipped in upstream diffusers. None of these has a
native rate knob; rate is instead controlled by resizing the input down
to a target pixel ratio inside the encoder pipeline. See
src/compressors/diffusers/_codec.py for the registry.
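One way to realize a target-pixel-ratio resize-down is to scale both sides by the square root of the ratio, so the pixel count shrinks by roughly that ratio while the aspect ratio is preserved. The helper below is a hypothetical sketch, not the code in _codec.py; snapping to a multiple of the autoencoder's downsampling factor is an assumption that keeps the latent grid integral. The decoded image would presumably be resized back to the original size before metrics are computed.

```python
import math
from typing import Tuple


def resized_shape(height: int, width: int, pixel_ratio: float,
                  multiple: int = 8) -> Tuple[int, int]:
    """(height, width) whose pixel count is about pixel_ratio * height * width.

    Hypothetical helper: scales both sides by sqrt(pixel_ratio), then rounds
    down to a multiple of the downsampling factor (never below one latent cell).
    """
    if not 0.0 < pixel_ratio <= 1.0:
        raise ValueError("pixel_ratio must be in (0, 1]")
    scale = math.sqrt(pixel_ratio)

    def snap(side: int) -> int:
        return max(multiple, int(side * scale) // multiple * multiple)

    return snap(height), snap(width)
```

For example, a 512×512 crop at pixel_ratio=0.25 with an 8× downsampling factor would be encoded at 256×256.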
✅ LDM-SR (CompVis/ldm-super-resolution-4x-openimages) — VQModel, 4× downsample, codebook 8192.
✅ VQ-Diffusion ITHQ (microsoft/vq-diffusion-ithq) — VQModel, 8× downsample, codebook 4096.
✅ Kandinsky 2.1 MoVQ (kandinsky-community/kandinsky-2-1) — VQModel, 8× downsample, codebook 16384.
✅ Stable Cascade VQGAN (stabilityai/stable-cascade) — PaellaVQModel, 4× downsample, codebook 8192.
❌ Kandinsky 3 MoVQ — intentionally absent. The published kandinsky-community/kandinsky-3 checkpoint (movq/diffusion_pytorch_model.fp16.safetensors) ships with quantize.embedding.weight all zeros; every index decodes to the zero vector and PSNR is pinned at ~12 dB. Re-add if HF fixes the upload.
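Because each of these models emits one codebook index per latent cell, an upper bound on rate follows from the table: with downsampling factor f, codebook size K, and fixed-length indices, bpp = ceil(log2 K) / f² at full resolution, shrinking further when the input is resized down first. The sketch assumes fixed-length index storage; if the harness entropy-codes the indices, measured bpp will typically be lower. The function name is hypothetical; the (f, K) pairs are copied from the list above.

```python
import math
from typing import Optional, Tuple

# (downsample factor f, codebook size K), from the list above.
VQ_MODELS = {
    "LDM-SR": (4, 8192),
    "VQ-Diffusion ITHQ": (8, 4096),
    "Kandinsky 2.1 MoVQ": (8, 16384),
    "Stable Cascade VQGAN": (4, 8192),
}


def vq_index_bpp(height: int, width: int, downsample: int, codebook_size: int,
                 coded_shape: Optional[Tuple[int, int]] = None) -> float:
    """Bits per ORIGINAL pixel when every index costs ceil(log2(K)) bits.

    coded_shape is the (height, width) actually fed to the encoder, e.g. after
    a resize-down; it defaults to the original size.
    """
    ch, cw = coded_shape if coded_shape is not None else (height, width)
    n_indices = (ch // downsample) * (cw // downsample)
    bits_per_index = math.ceil(math.log2(codebook_size))
    return n_indices * bits_per_index / (height * width)


if __name__ == "__main__":
    for name, (f, k) in VQ_MODELS.items():
        print(f"{name}: {vq_index_bpp(512, 512, f, k):.4f} bpp at full resolution")
```

For instance, LDM-SR at full resolution comes out at 13/16 = 0.8125 bpp under this assumption, so lower rates are only reachable through the resize-down.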
Vendored learned codecs
Upstream compressai pins legacy torch versions, so the minimal model and
entropy-coding closure is vendored under
src/compressors/compressai_baselines/ (BSD-3-Clause-Clear; see src/compressors/compressai_baselines/LICENSE). The CompressAI baselines compute bpp from forward-pass likelihoods (-log2(likelihoods).sum() / n_pixels), so the compiled _CXX rANS coder is not needed. MCUCoder reuses the same vendored CompressAI layers (AttentionBlock, conv, deconv) but ships its own model architecture and a per-channel min/max + Huffman entropy coder.
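The likelihood-based rate estimate amounts to the ideal code length, the sum of -log2 p over every latent symbol, divided by the pixel count. Below is a dependency-free sketch with plain floats standing in for tensors; the function name is hypothetical, and the "y"/"z" keys mirror how CompressAI's hyperprior models return their likelihoods. A real rANS bitstream would come out slightly longer than this estimate because of coder overhead.

```python
import math
from typing import Iterable, Mapping


def bpp_from_likelihoods(likelihoods: Mapping[str, Iterable[float]],
                         n_pixels: int) -> float:
    """Estimated bits per pixel: sum of -log2(p) over all latent symbols / n_pixels.

    Same quantity as -log2(likelihoods).sum() / n_pixels, summed over every
    latent (e.g. {"y": ..., "z": ...}), using plain Python floats.
    """
    total_bits = 0.0
    for probs in likelihoods.values():
        total_bits += sum(-math.log2(p) for p in probs)
    return total_bits / n_pixels
```

With tensors, the equivalent is summing -torch.log2(l).sum() over the likelihood tensors and dividing by the number of pixels.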
✅ cheng2020-anchor — Cheng, Sun, Takeuchi, Katto, CVPR 2020, GMM hyperprior without attention. 6 quality levels.
⏳ mbt2018 — Minnen, Ballé, Toderici, NeurIPS 2018, joint autoregressive + hierarchical priors. 8 quality levels. Sweep in flight (q=7,8 are slow on CPU due to the autoregressive context model).
✅ MCUCoder — Hojjat, Haberer, Landsiedel, DAGM-GCPR 2025. Asymmetric variable-rate codec with a tiny (~3-conv) encoder and a heavy
AttentionBlock-based decoder; 12 native quality levels via model.p = used_filter / 12. MIT-licensed, from Kiel University. MS-SSIM-trained checkpoint pulled from zenodo:14988203 and cached under ~/.cache/compressors/mcucoder/. fp32 PyTorch only; the upstream INT8 / TFLite / CMSIS-NN deployment path is out of scope for this baseline.
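MCUCoder's rate control and entropy coder can be approximated roughly as follows. This is a hypothetical sketch, not the upstream implementation: it assumes model.p = used_filter / 12 means "transmit only the leading used_filter of 12 latent channels", uses 8-bit per-channel min/max quantization with two float32 values of side information per channel, and ignores the cost of sending the Huffman table itself.

```python
import heapq
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def keep_channels(latent: Sequence[Sequence[float]], used_filter: int,
                  total: int = 12) -> List[List[float]]:
    """Assumed reading of model.p = used_filter / total: keep the leading channels."""
    p = used_filter / total
    n_keep = round(p * len(latent))
    return [list(ch) for ch in latent[:n_keep]]


def quantize_per_channel(latent: Sequence[Sequence[float]],
                         bits: int = 8) -> List[Tuple[List[int], float, float]]:
    """Map each channel's [min, max] range onto integers 0..2**bits - 1."""
    levels = (1 << bits) - 1
    out = []
    for ch in latent:
        lo, hi = min(ch), max(ch)
        step = (hi - lo) / levels or 1.0  # constant channel: avoid division by zero
        out.append(([round((v - lo) / step) for v in ch], lo, hi))
    return out


def huffman_code_lengths(symbols: Sequence[int]) -> Dict[int, int]:
    """Code length in bits for each distinct symbol of an optimal prefix code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tiebreak, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def payload_bits(latent: Sequence[Sequence[float]], used_filter: int) -> int:
    """Huffman payload plus float32 min/max side info; the code table is not counted."""
    quantized = quantize_per_channel(keep_channels(latent, used_filter))
    symbols = [s for codes, _, _ in quantized for s in codes]
    side_info = 64 * len(quantized)  # two float32 values per kept channel
    if not symbols:
        return side_info
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols) + side_info
```

Under these assumptions, rate grows roughly linearly with used_filter, which is what makes the 12 levels a native rate knob rather than a resize.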
Planned (not yet implemented)
⏳ VVC intra (H.266) — currently the strongest conventional still-image codec. Candidate implementations:
VVenC + VVdeC (Fraunhofer HHI) — production C++; BSD-3-Clause-Clear. Likely path. Open question: Pillow plugin vs subprocess wrapper.
VTM (reference) — slow, research-only; BSD-3-Clause-Clear.
uvg266 — fast research encoder; BSD-3.
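For the VVenC open question, a subprocess wrapper reduces to: write the input to a temporary file, run the encoder binary, and read the bitstream size back. The sketch below is hypothetical and deliberately leaves the encoder's argv as a caller-supplied template with {in} / {out} placeholders rather than guessing VVenC, VTM, or uvg266 flags; RGB-to-YUV conversion and the decode pass are omitted. FAKE_ENCODER is a trivial Python one-liner standing in for a real encoder.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def encode_via_subprocess(raw: bytes, height: int, width: int,
                          cmd_template: Sequence[str]) -> float:
    """Run an external encoder on raw input and return bits per pixel.

    cmd_template is a hypothetical argv in which "{in}" and "{out}" are
    replaced by the temporary input and output paths.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.yuv"
        dst = Path(tmp) / "output.bin"
        src.write_bytes(raw)
        argv = [arg.replace("{in}", str(src)).replace("{out}", str(dst))
                for arg in cmd_template]
        # check=True surfaces encoder failures as CalledProcessError.
        subprocess.run(argv, check=True, capture_output=True)
        return dst.stat().st_size * 8 / (height * width)


# Stand-in "encoder": copies the first 10 input bytes to the output file.
FAKE_ENCODER = [
    sys.executable, "-c",
    "import sys; data = open(sys.argv[1], 'rb').read(); "
    "open(sys.argv[2], 'wb').write(data[:10])",
    "{in}", "{out}",
]

if __name__ == "__main__":
    print(encode_via_subprocess(bytes(48), 4, 4, FAKE_ENCODER))  # 5.0
```

A Pillow plugin would avoid the temp-file round trip and process-spawn overhead in encode timings, while the subprocess route needs no Python bindings for the C++ encoder.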
Pluggable entropy coding
For learned codecs whose encoder/decoder are essentially lossy analysis/synthesis transforms followed by a separate entropy-coding stage (currently FRAPPE, WaLLoC, and LiVeAction), the goal is to mix and match the entropy coder independently of the analysis transform. The same trained transform should be combinable with multiple entropy coders (range/arithmetic coding with various probability models, ANS, learned context models) so that both the rate-distortion curve and the encoder’s runtime cost can be studied in isolation.
Implications:
Treat the analysis transform and the entropy coder as separate, swappable components rather than baking them together inside a single codec class.
The entropy-coder interface should be generic enough to be reused across analysis transforms.
Evaluation (RD + complexity) should report per (transform × entropy coder) combination.
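Reporting per (transform × entropy coder) combination falls out naturally once both are independent components. The sketch below is hypothetical throughout: the EntropyCoder protocol, the toy byte-level "transforms", and stdlib zlib/lzma standing in for real entropy coders are illustrative assumptions, chosen only so the grid runs without model weights.

```python
import lzma
import zlib
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Protocol, Tuple


class EntropyCoder(Protocol):
    name: str

    def encode(self, symbols: bytes) -> bytes: ...

    def decode(self, payload: bytes) -> bytes: ...


@dataclass
class ByteCoder:
    """Adapter turning a stdlib compress/decompress pair into an EntropyCoder."""
    name: str
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]

    def encode(self, symbols: bytes) -> bytes:
        return self.compress(symbols)

    def decode(self, payload: bytes) -> bytes:
        return self.decompress(payload)


def identity(x: bytes) -> bytes:
    return x


def delta(x: bytes) -> bytes:
    # Byte-wise differences mod 256: a toy stand-in for an analysis transform.
    return bytes((b - a) % 256 for a, b in zip(b"\x00" + x, x))


TRANSFORMS: Dict[str, Callable[[bytes], bytes]] = {"identity": identity, "delta": delta}
CODERS: List[EntropyCoder] = [
    ByteCoder("zlib", zlib.compress, zlib.decompress),
    ByteCoder("lzma", lzma.compress, lzma.decompress),
]


def evaluate_grid(image: bytes, n_pixels: int) -> Dict[Tuple[str, str], float]:
    """bpp for every (transform, coder) pair; each pair is checked for lossless round trip."""
    results: Dict[Tuple[str, str], float] = {}
    for (t_name, transform), coder in product(TRANSFORMS.items(), CODERS):
        symbols = transform(image)
        payload = coder.encode(symbols)
        if coder.decode(payload) != symbols:
            raise RuntimeError(f"{coder.name} failed to round-trip {t_name}")
        results[(t_name, coder.name)] = len(payload) * 8 / n_pixels
    return results
```

Adding a coder or a transform extends the grid without touching the other axis, which is the property the implications above ask for.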
The FRAPPE harness already has a four-function pluggable contract
(compressors.frappe.entropy_coding: arrange_latents /
unarrange_latents / encode_latents / decode_latents), with
candidate alternative entropy coders living under
experiments/encoder_optimization/. Open questions: which broader
set of entropy coders to support out of the gate; whether
constriction / torchac / CompressAI’s coders can be reused as-is
or need wrapping.
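The four-function contract can be illustrated with a toy coder. The function names come from compressors.frappe.entropy_coding, but the signatures, the (channels, height, width) int8 layout, the 12-byte shape header, and the use of zlib are all assumptions for illustration; per the JPEG-LS entry above, the real FRAPPE path entropy-codes its integer latents with JPEG-LS instead.

```python
import struct
import zlib
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width): hypothetical layout
Latents = List[List[List[int]]]


def arrange_latents(latents: Latents) -> Tuple[bytes, Shape]:
    """Flatten integer latents (assumed to fit in int8) channel-major into bytes."""
    c, h, w = len(latents), len(latents[0]), len(latents[0][0])
    flat = [v for ch in latents for row in ch for v in row]
    return struct.pack(f"{len(flat)}b", *flat), (c, h, w)


def unarrange_latents(buf: bytes, shape: Shape) -> Latents:
    """Inverse of arrange_latents."""
    c, h, w = shape
    flat = struct.unpack(f"{c * h * w}b", buf)
    return [[list(flat[(k * h + i) * w:(k * h + i + 1) * w]) for i in range(h)]
            for k in range(c)]


def encode_latents(buf: bytes, shape: Shape) -> bytes:
    """Toy entropy coder: little-endian shape header + zlib payload."""
    return struct.pack("<3I", *shape) + zlib.compress(buf, 9)


def decode_latents(payload: bytes) -> Tuple[bytes, Shape]:
    c, h, w = struct.unpack("<3I", payload[:12])
    return zlib.decompress(payload[12:]), (c, h, w)
```

Swapping in a different coder only touches encode_latents / decode_latents; the arrange/unarrange pair, and the analysis transform upstream of it, stay fixed.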
Evaluation
✅ Rate-distortion — bpp / PSNR / SSIM / LPIPS / DISTS on Kodak via piq.SSIMLoss, piq.LPIPS, piq.DISTS. Same metric stack across every codec module above.
✅ Encode-complexity — wallclock timing on 512×512 Kodak center crops, decomposed into per-stage timings (resize / analysis / transfer / store for codecs that internally resize down for quality control; a single encode stage for native-rate codecs). Uses the throughput.image.wallclock singleton.
⏳ Static cost (FLOPs / MACs) — not implemented yet. Candidate libraries: thop, fvcore, ptflops, calflops. FLOP/MAC counts would serve as a hardware-independent, first-order proxy for cost.
⏳ Decode-complexity — out of scope so far; deferred along with FLOP counting.
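Per-stage wallclock timing can be collected with a small context-manager recorder. This is a hypothetical sketch: StageTimer and its median summary are assumptions for illustration, not the API of the repo's throughput.image.wallclock singleton, and the placeholder loop body stands in for each stage's real work.

```python
import statistics
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Dict, Iterator, List


class StageTimer:
    """Hypothetical per-stage wallclock recorder."""

    def __init__(self) -> None:
        self.samples: DefaultDict[str, List[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even if the stage raises, so failures still show up.
            self.samples[name].append(time.perf_counter() - start)

    def summary(self) -> Dict[str, float]:
        """Median seconds per stage (less sensitive to outliers than the mean)."""
        return {name: statistics.median(xs) for name, xs in self.samples.items()}


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):  # repeated runs on the same crop
        for stage in ("resize", "analysis", "transfer", "store"):
            with timer.stage(stage):
                sum(range(10_000))  # placeholder for the real stage's work
    print(timer.summary())
```

Native-rate codecs would simply record a single "encode" stage with the same recorder.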
In planning
Once enough measurements have been collected across codecs × testbeds × resolutions, package the results as a public leaderboard in the spirit of the timm leaderboard — a Hugging Face Space combining rate-distortion and computational metrics for side-by-side codec comparison. Available testbeds: Mac mini (Apple Silicon), Raspberry Pi, Intel GPU, NVIDIA GPU, various x86 CPUs.