Free Hunch: Denoiser Covariance Estimation for Diffusion Models Without Extra Costs

Aalto University

ICLR 2025

TL;DR:

  • Covariance for Diffusion Models. We reuse information already present in training data and the generative trajectory to estimate the denoiser covariance—no retraining, no score Hessians.
  • Two lightweight updates. A time update transfers covariance across noise levels; a space update performs BFGS-like low-rank corrections along the sampler path.
  • Works effectively for Reconstruction Guidance. Better covariance → stable reconstruction guidance for linear inverse problems → sharper details at low step counts.
[Figure: deblurring comparison across methods with few solver steps]

Teaser. With few sampler steps, an accurate denoiser covariance is crucial for high-fidelity reconstruction guidance on linear inverse problems.

Abstract

The conditional score for inverse problems needs the denoiser mean and covariance of \(p(x_0 \mid x_t)\). Prior work either adds heavy test-time compute, modifies training/architecture, or uses crude (often diagonal) covariances. Free Hunch (FH) integrates two free sources: (i) data covariance (DCT-diagonal for images) and (ii) curvature observed along the generative trajectory via a BFGS-style online update. A simple time-transfer rule moves covariance between noise levels. On ImageNet inverse problems (deblurring, inpainting, super-resolution), FH improves quality—especially LPIPS—at small step counts, while staying training-free and architecture-agnostic.

Method at a glance

Tweedie link (2nd order). The first- and second-order Tweedie identities connect the score to the denoiser mean and covariance; the covariance term involves the Hessian of \(\log p(x_t)\), which is expensive to compute directly. Instead of forming score Hessians, FH approximates the covariance via four ingredients (a sketch follows the list):

  1. Time update. Transfer \(\Sigma_{0\mid t}(x_t) \to \Sigma_{0\mid t+\Delta t}(x_t)\) analytically using a local Gaussian approximation of \(p(x_t)\) and the forward SDE evolution.
  2. Space update. When the sampler moves from \(x_t\) to \(x_t+\Delta x\) at the same time \(t\), use a BFGS-like low-rank correction based on finite differences of the denoiser mean \(\mu_{0\mid t}\).
  3. Efficient representation. Maintain \(\Sigma\) as \(D + U U^\top - V V^\top\) so both updates and inverses stay cheap (Woodbury identity on small \(k \times k\) systems).
  4. Initialization. For images, start from a DCT-diagonal data covariance; it’s a strong prior and avoids over- or under-scaling early in sampling.
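
For reference, under the common variance-exploding convention \(x_t = x_0 + \sigma_t\,\varepsilon\) (the paper's exact parameterization may differ), the first- and second-order Tweedie identities read

\[
\mu_{0\mid t}(x_t) = \mathbb{E}[x_0 \mid x_t] = x_t + \sigma_t^2\,\nabla_{x_t}\log p(x_t),
\qquad
\Sigma_{0\mid t}(x_t) = \mathrm{Cov}[x_0 \mid x_t] = \sigma_t^2\left(I + \sigma_t^2\,\nabla_{x_t}^2\log p(x_t)\right),
\]

so \(\Sigma_{0\mid t} = \sigma_t^2\,\partial\mu_{0\mid t}/\partial x_t\): finite differences of the denoiser mean carry curvature information, which is what the space update exploits.

The NumPy sketch below illustrates steps 1–4 under the same convention. It is not the authors' code: the names (LowRankCov, dct_diagonal_init, time_update_diag, space_update) are hypothetical, the time update is shown only for the diagonal case, and the diagonal is kept in pixel space for brevity (in practice it can live in the DCT basis from step 4, with matvecs wrapping a forward/inverse DCT).

```python
import numpy as np
from scipy.fft import dctn


class LowRankCov:
    """Covariance kept as  Sigma = diag(d) + U U^T - V V^T  (step 3)."""

    def __init__(self, d, U=None, V=None):
        dim = d.shape[0]
        self.d = d                                     # positive diagonal, shape (dim,)
        self.U = U if U is not None else np.zeros((dim, 0))
        self.V = V if V is not None else np.zeros((dim, 0))

    def matvec(self, x):
        return self.d * x + self.U @ (self.U.T @ x) - self.V @ (self.V.T @ x)

    def solve(self, b):
        """Apply Sigma^{-1} b via the Woodbury identity: with W = [U, V] and
        S = diag(+1, ..., -1, ...), only a small k x k system is solved."""
        W = np.concatenate([self.U, self.V], axis=1)
        if W.shape[1] == 0:
            return b / self.d
        signs = np.concatenate([np.ones(self.U.shape[1]), -np.ones(self.V.shape[1])])
        Dinv_b = b / self.d
        Dinv_W = W / self.d[:, None]
        K = np.diag(1.0 / signs) + W.T @ Dinv_W        # small k x k system
        return Dinv_b - Dinv_W @ np.linalg.solve(K, W.T @ Dinv_b)


def dct_diagonal_init(train_images, sigma):
    """Step 4: per-DCT-coefficient data variance from a batch of 2-D training
    images, combined with the noise level via 1/c_post = 1/c_data + 1/sigma^2."""
    coeffs = np.stack([dctn(img, norm="ortho") for img in train_images])
    c_data = coeffs.reshape(len(train_images), -1).var(axis=0)
    return 1.0 / (1.0 / np.maximum(c_data, 1e-12) + 1.0 / sigma**2)


def time_update_diag(c_post, sigma_old, sigma_new):
    """Step 1 (diagonal case): under a local Gaussian model of p(x_0), recover
    the implied data precision and re-apply the new noise level analytically."""
    prec_data = np.maximum(1.0 / c_post - 1.0 / sigma_old**2, 0.0)
    return 1.0 / (prec_data + 1.0 / sigma_new**2)


def space_update(cov, x, x_new, mu, mu_new, sigma):
    """Step 2: BFGS-style rank-2 correction enforcing the secant condition
    Sigma @ dx ~= sigma^2 * dmu implied by second-order Tweedie."""
    s_vec = (x_new - x).ravel()
    y_vec = sigma**2 * (mu_new - mu).ravel()
    if y_vec @ s_vec <= 1e-12:                         # curvature guard: skip bad pairs
        return cov
    Bs = cov.matvec(s_vec)
    cov.U = np.column_stack([cov.U, y_vec / np.sqrt(y_vec @ s_vec)])
    cov.V = np.column_stack([cov.V, Bs / np.sqrt(s_vec @ Bs)])
    return cov
```

Each space update appends one column to U and one to V, so after k sampler steps the Woodbury solve only requires a \(2k \times 2k\) linear system rather than anything of the data dimension.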
[Figure: posterior geometry and the effect of the covariance choice]

Geometry. Poor (diagonal) covariances distort guidance geometry; FH aligns the local posterior shape with the sampler’s trajectory.

Why it helps (guidance scale)

Diagonal/identity covariances can over-amplify the conditional term, especially at high noise levels and in high dimensions—forcing post-hoc clipping or ad-hoc scaling. With FH, the guidance magnitude is naturally calibrated, reducing the need for such tricks.
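
Concretely, for a linear observation \(y = A x_0 + n\) with \(n \sim \mathcal{N}(0, \sigma_y^2 I)\), reconstruction guidance typically approximates the conditional term with a Gaussian likelihood around the denoiser mean (a standard, ΠGDM-style formulation; the paper's exact form may differ):

\[
\nabla_{x_t}\log p(y \mid x_t) \approx \left(\frac{\partial \mu_{0\mid t}}{\partial x_t}\right)^{\!\top} A^\top \left(A\,\Sigma_{0\mid t}\,A^\top + \sigma_y^2 I\right)^{-1} \left(y - A\,\mu_{0\mid t}(x_t)\right).
\]

If \(\Sigma_{0\mid t}\) is replaced by a poorly scaled diagonal or identity surrogate, the inverse in the middle is mis-sized and the gradient is over- or under-amplified; post-hoc clipping and guidance-scale tuning exist largely to compensate for this, whereas a calibrated covariance lets the unscaled term behave.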

[Figure: LPIPS vs. guidance strength; FH needs little or no scaling]

Less tuning, more fidelity. With a better covariance, the optimal guidance scale is close to 1, i.e., no extra scaling is needed.

Results snapshot

Across four linear inverse problems on ImageNet 256×256—Gaussian/motion deblurring, random inpainting, and 4× super-resolution—FH (and FH+Online) consistently improves perceptual quality at low step counts (e.g., 15/30-step Heun), with strong LPIPS and crisp details. See the paper for full tables, ablations, and more qualitative results.

Qualitative Results from the Paper

FH improves reconstruction guidance quality over competing covariance approximation methods.

Get the code