CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos

🎓 University of Massachusetts, Amherst · Adobe
★ Equal contribution. Work done as an intern at Adobe, USA.
Teaser video comparison: Input vs. CreativeVR (Ours).

Abstract

Modern text-to-video (T2V) diffusion models can synthesize visually compelling clips, yet they remain brittle at fine-scale structure: even state-of-the-art generators often produce distorted faces and hands, warped backgrounds, and temporally inconsistent motion. Such severe structural artifacts also appear in very low-quality real-world videos. Classical video restoration and super-resolution (VR/VSR) methods, in contrast, are tuned for synthetic degradations such as blur and downsampling and tend to stabilize these artifacts rather than repair them, while diffusion-prior restorers are usually trained on photometric noise and offer little control over the trade-off between perceptual quality and fidelity.

We introduce CreativeVR, a diffusion-prior-guided video restoration framework for AI-generated (AIGC) and real videos with severe structural and temporal artifacts. Our deep-adapter-based method exposes a single precision knob that controls how strongly the model follows the input, smoothly trading off between precise restoration on standard degradations and stronger structure- and motion-corrective behavior on challenging content. Our key novelty is a temporally coherent degradation module used during training, which applies carefully designed transformations that produce realistic structural failures.

To evaluate AIGC-artifact restoration, we propose the AIGC54 benchmark with FIQA, semantic and perceptual metrics, and multi-aspect scoring. CreativeVR achieves state-of-the-art results on videos with severe artifacts and performs competitively on standard video restoration benchmarks, while running at practical throughput (~13 FPS @ 720p on a single 80 GB A100).

Figure 1. Precise Restoration vs. Structure-Corrective Video Restoration.
  • (a) Traditional video restoration represents the precision regime: recover detail under synthetic degradations when geometry and motion are intact.
  • (b) AIGC clips exemplify the prior-guided spatiotemporal correction regime, where the goal is to fix geometric errors and restore temporal coherence while preserving semantics and identity.
  • (c) Real-world archival footage lies between these extremes and benefits from both precision and prior-guided correction.
Our method modulates this trade-off via a precision knob, behaving precisely in case (a) and acting as a prior-guided corrector in cases (b) and (c).

Method

Figure 2. CreativeVR Overview.
  • A clean input clip is synthetically degraded via a temporally smooth module and encoded by a frozen VAE. The degraded latents condition a lightweight adapter DiT, which is injected into a frozen T2V backbone via a precision knob γ.
  • The degradation module composes coherent morphing, directional motion blur, and grid-based warping to mimic realistic artifacts, guiding the adapter to learn robust structure- and motion-corrective behavior without temporal flicker (see the sketch below).
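
The page does not spell out the exact transformations inside the degradation module. Below is a minimal PyTorch sketch of one plausible instantiation, assuming coarse random warp fields defined at a few keyframes and linearly interpolated over time (so the warping stays temporally coherent), followed by a single directional motion-blur kernel shared across frames. The function names (degrade_clip, smooth_warp_fields, directional_blur_kernel) and all parameter values are illustrative, not CreativeVR's released implementation.

import math

import torch
import torch.nn.functional as F


def smooth_warp_fields(t, h, w, grid=8, strength=0.05, keyframes=3, device="cpu"):
    """Coarse random warp fields at a few keyframes, linearly interpolated
    over time so the warping is temporally coherent, not per-frame noise."""
    key = strength * torch.randn(keyframes, 2, grid, grid, device=device)
    key = F.interpolate(key, size=(h, w), mode="bilinear", align_corners=True)
    ts = torch.linspace(0, keyframes - 1, t, device=device)
    lo = ts.floor().long().clamp(max=keyframes - 2)
    frac = (ts - lo.float()).view(t, 1, 1, 1)
    return (1 - frac) * key[lo] + frac * key[lo + 1]  # (T, 2, H, W)


def directional_blur_kernel(length=9, angle_deg=0.0, device="cpu"):
    """A simple line kernel approximating directional motion blur."""
    k = torch.zeros(length, length, device=device)
    c, theta = length // 2, math.radians(angle_deg)
    for i in range(length):
        u = int(round(c + (i - c) * math.cos(theta)))
        v = int(round(c + (i - c) * math.sin(theta)))
        if 0 <= u < length and 0 <= v < length:
            k[v, u] = 1.0
    return k / k.sum()


def degrade_clip(clip, warp_strength=0.05, blur_len=9, blur_angle=20.0):
    """clip: (T, C, H, W) tensor in [0, 1]; returns a structurally degraded clip."""
    t, c, h, w = clip.shape
    device = clip.device
    # base sampling grid in [-1, 1], shared by all frames
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=device),
        torch.linspace(-1, 1, w, device=device), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(t, h, w, 2)
    # temporally smooth grid-based warping (stands in for coherent morphing)
    flow = smooth_warp_fields(t, h, w, strength=warp_strength, device=device)
    warped = F.grid_sample(clip, base + flow.permute(0, 2, 3, 1),
                           mode="bilinear", padding_mode="border",
                           align_corners=True)
    # one shared directional motion-blur kernel applied to every frame
    weight = directional_blur_kernel(blur_len, blur_angle, device=device)
    weight = weight.expand(c, 1, blur_len, blur_len).contiguous()
    return F.conv2d(warped, weight, padding=blur_len // 2, groups=c)

Applying the same interpolated warp trajectory and blur kernel to every frame is what keeps the synthetic artifacts free of per-frame flicker, which in turn pushes the adapter toward corrective rather than merely denoising behavior.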

Qualitative Results

Figure 3. Qualitative Comparison. We compare CreativeVR against a wide range of restoration competitors. Prior methods often fail to remove AIGC or real-world artifacts, leaving blurred facial profiles, smeared hands, and distorted signage. CreativeVR consistently restores clean, geometrically plausible faces, fingers, and fine details while faithfully preserving the original pose, identity, and scene layout.

Controllable Restoration

Interactive demo: outputs at precision settings 0.1, 0.2, 0.4, 0.6, and 1.0, sweeping from Creative to 🎯 Faithful restoration.
Figure 4. Inference Precision Knob. The single precision control parameter allows a smooth trade-off between fidelity and structural correction. High precision settings strongly preserve input details for standard restoration, while lower precision enables stronger corrective synthesis to repair severe structural defects, providing flexibility across diverse degradation regimes.
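
How the precision knob enters the network is not detailed on this page beyond "injected into a frozen T2V backbone via a precision knob γ" (Figure 2). A common pattern for such deep adapters is to scale the adapter's residual contribution to each frozen backbone block, which is what the hedged sketch below assumes; AdapterInjectedBackbone and its call signature are hypothetical placeholders rather than the actual CreativeVR API.

import torch.nn as nn


class AdapterInjectedBackbone(nn.Module):
    """Frozen T2V backbone blocks plus a lightweight adapter whose residual
    contribution is scaled by a single precision knob gamma (illustrative)."""

    def __init__(self, backbone_blocks, adapter_blocks):
        super().__init__()
        self.backbone_blocks = nn.ModuleList(backbone_blocks)
        self.adapter_blocks = nn.ModuleList(adapter_blocks)
        for p in self.backbone_blocks.parameters():
            p.requires_grad_(False)  # the generative prior stays frozen; only the adapter is trained

    def forward(self, x, degraded_latents, gamma=1.0):
        # gamma near 1.0: follow the degraded input closely (precise restoration)
        # gamma near 0.1: lean on the T2V prior (structure/motion correction)
        for block, adapter in zip(self.backbone_blocks, self.adapter_blocks):
            x = block(x)
            x = x + gamma * adapter(x, degraded_latents)
        return x

Under this reading, γ = 1.0 corresponds to the 🎯 Faithful end of the slider above, while values around 0.1-0.4 let the frozen T2V prior dominate and rewrite broken structure, matching the Creative end.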

Interactive Comparisons

Side-by-side video comparisons: SeedVR2 vs. CreativeVR (Ours). Video source: AIGC.

BibTeX

@misc{panambur2025creativevr,
  title        = {CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos},
  author       = {Tejas Panambur and Ishan Rajendrakumar Dave and Chongjian Ge and Ersin Yumer and Xue Bai},
  year         = {2025},
  eprint       = {2512.12060},
  archivePrefix= {arXiv},
  primaryClass = {cs.CV},
  url          = {https://arxiv.org/abs/2512.12060}
}