CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos

Tejas Panambur^🎓,★,† Ishan Rajendrakumar Dave^,★

Chongjian Ge Ersin Yumer Xue Bai

^🎓University of Massachusetts, Amherst Adobe

★ Equal contribution ^† Work done as an intern at Adobe, USA.


Abstract

Modern text-to-video (T2V) diffusion models can synthesize visually compelling clips, yet they remain brittle at fine-scale structure: even state-of-the-art generators often produce distorted faces and hands, warped backgrounds, and temporally inconsistent motion. Such severe structural artifacts also appear in very low-quality real-world videos. Classical video restoration and super-resolution (VR/VSR) methods, in contrast, are tuned for synthetic degradations such as blur and downsampling and tend to stabilize these artifacts rather than repair them, while diffusion-prior restorers are usually trained on photometric noise and offer little control over the trade-off between perceptual quality and fidelity.

We introduce CreativeVR, a diffusion-prior-guided video restoration framework for AI-generated (AIGC) and real videos with severe structural and temporal artifacts. Our deep-adapter-based method exposes a single precision knob that controls how strongly the model follows the input, smoothly trading off between precise restoration on standard degradations and stronger structure- and motion-corrective behavior on challenging content. Our key novelty is a temporally coherent degradation module used during training, which applies carefully designed transformations that produce realistic structural failures.

To evaluate AIGC-artifact restoration, we propose the AIGC54 benchmark with FIQA, semantic and perceptual metrics, and multi-aspect scoring. CreativeVR achieves state-of-the-art results on videos with severe artifacts and performs competitively on standard video restoration benchmarks, while running at practical throughput (~13 FPS @ 720p on a single 80 GB A100).

Figure 1. Precise Restoration vs. Structural-corrective Video Restoration.

(a) Traditional video restoration represents the precision regime: recover detail under synthetic degradations when geometry and motion are intact.
(b) AIGC clips exemplify the prior-guided spatiotemporal correction regime, where the goal is to fix geometric errors and restore temporal coherence while preserving semantics and identity.
(c) Real-world archival footage lies between these extremes and benefits from both precision and prior-guided correction.

Our method modulates this trade-off via a precision knob, behaving as a precise restorer in case (a) and as a prior-guided corrector in cases (b) and (c).

Method

Figure 2. CreativeVR Overview.

A clean input clip is synthetically degraded via a temporally smooth module and encoded by a frozen VAE. The degraded latents condition a lightweight adapter DiT, which is injected into a frozen T2V backbone via a precision knob γ_ℓ.
The degradation module composes coherent morphing, directional motion blur, and grid-based warping to mimic realistic artifacts, guiding the adapter to learn robust structure- and motion-corrective behavior without temporal flicker.
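The idea of a temporally smooth degradation can be illustrated with a small sketch. It is not the authors' module: it covers only a directional (horizontal) blur, and the function names, the keyframe-interpolation scheme, and all parameter values are assumptions made for illustration. The point it shows is that the degradation strength is sampled at a few keyframes and interpolated across frames, so neighboring frames receive similar corruption instead of independent per-frame noise that would flicker.

```python
import random


def smooth_schedule(num_frames, num_keys, low, high, seed=0):
    """Sample `num_keys` keyframe values in [low, high] and linearly
    interpolate them so the value drifts smoothly across frames.
    (Hypothetical helper; the interpolation scheme is an assumption.)"""
    assert num_keys >= 2, "need at least two keyframes to interpolate"
    rng = random.Random(seed)
    keys = [rng.uniform(low, high) for _ in range(num_keys)]
    if num_frames == 1:
        return [keys[0]]
    schedule = []
    for t in range(num_frames):
        pos = t * (num_keys - 1) / (num_frames - 1)  # position on the keyframe axis
        i = min(int(pos), num_keys - 2)              # index of the left keyframe
        frac = pos - i
        schedule.append((1 - frac) * keys[i] + frac * keys[i + 1])
    return schedule


def horizontal_motion_blur(frame, length):
    """Average each pixel with its horizontal neighbors over `length` taps
    (edges are clamped): a simple stand-in for directional motion blur."""
    width = len(frame[0])
    blurred = []
    for row in frame:
        new_row = []
        for x in range(width):
            taps = [row[min(max(x + k - length // 2, 0), width - 1)]
                    for k in range(length)]
            new_row.append(sum(taps) / length)
        blurred.append(new_row)
    return blurred


def degrade_clip(clip, max_blur=5, num_keys=3, seed=0):
    """Blur every frame with a strength that varies smoothly over time.
    `clip` is a list of frames; each frame is a list of rows of floats."""
    strengths = smooth_schedule(len(clip), num_keys, 1.0, float(max_blur), seed)
    degraded = [horizontal_motion_blur(frame, max(1, round(s)))
                for frame, s in zip(clip, strengths)]
    return degraded, strengths
```

In this sketch the change in blur strength between consecutive frames is bounded by the keyframe spacing, which is the property that keeps the synthetic artifacts coherent in time. The morphing and grid-based warping mentioned in the caption could be driven by the same kind of smooth schedule, but they are not implemented here.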

Qualitative Results

Figure 3. Qualitative Comparison. We compare CreativeVR against a wide range of restoration competitors. Prior methods often fail to remove AIGC or real-world artifacts, leaving blurred facial profiles, smeared hands, and distorted signage. CreativeVR consistently restores clean, geometrically plausible faces, fingers, and fine details while faithfully preserving the original pose, identity, and scene layout.

Controllable Restoration

[Interactive demo: input video shown beside the restored output. Precision knob settings: 0.1, 0.2, 0.4, 0.6, 1.0, ranging from Creative output (low precision) to Faithful output (high precision).]

Figure 4. Inference Precision Knob. The single precision control parameter allows a smooth trade-off between fidelity and structural correction. High precision settings strongly preserve input details for standard restoration, while lower precision enables stronger corrective synthesis to repair severe structural defects, providing flexibility across diverse degradation regimes.
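One way to picture the knob is as a scale on the adapter's contribution to the frozen backbone. The sketch below assumes an additive form, h_ℓ ← h_ℓ + γ_ℓ · a_ℓ, with the same γ_ℓ at every layer; the page does not state the exact injection rule, so this form, the function name `inject_adapter`, and the toy feature values are all hypothetical. The knob values in the loop are the ones shown on the page.

```python
def inject_adapter(backbone_feats, adapter_feats, precision):
    """Blend per-layer adapter features into backbone features.

    Assumed form (not confirmed by the source): h_l <- h_l + gamma_l * a_l,
    with gamma_l = precision at every layer.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision is expected in [0, 1]")
    blended = []
    for h_layer, a_layer in zip(backbone_feats, adapter_feats):
        blended.append([h + precision * a for h, a in zip(h_layer, a_layer)])
    return blended


# Toy features for two layers; real features would be DiT hidden states.
backbone = [[1.0, 2.0], [3.0, 4.0]]
adapter = [[0.5, -0.5], [1.0, 1.0]]

for precision in (0.1, 0.2, 0.4, 0.6, 1.0):
    print(precision, inject_adapter(backbone, adapter, precision))
```

Under this assumption, a precision near 1.0 lets the input-conditioned adapter steer the backbone fully (faithful output), while a small value leaves the backbone's generative prior mostly in charge (creative, more corrective output).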

Interactive Comparisons

[Interactive side-by-side comparison: SeedVR2 vs. CreativeVR (Ours). Video source: AIGC.]

BibTeX

@misc{panambur2025creativevr,
  title        = {CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos},
  author       = {Tejas Panambur and Ishan Rajendrakumar Dave and Chongjian Ge and Ersin Yumer and Xue Bai},
  year         = {2025},
  eprint       = {2512.12060},
  archivePrefix= {arXiv},
  primaryClass = {cs.CV},
  url          = {https://arxiv.org/abs/2512.12060}
}