Denoise (DDIM step)
- scheduler.config.prediction_type == "v_prediction" (bake_texture, modified pipeline)
[Consistency] left: modified, right: ctrl-adapter
[Failed] scheduler.config.prediction_type == "epsilon"
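The two prediction types differ only in how the model output is converted to a clean-sample estimate before the DDIM update. A minimal NumPy sketch of that conversion plus the deterministic step (the scheduler interface and `alphas_cumprod` values are stand-ins, not the actual pipeline code):

```python
import numpy as np

def ddim_step(x_t, model_out, alpha_t, alpha_prev, prediction_type):
    """One deterministic DDIM step (eta = 0).

    alpha_t / alpha_prev are cumulative alpha-bar values at the current
    and previous timesteps (i.e. scheduler.alphas_cumprod[t]).
    """
    if prediction_type == "v_prediction":
        # v = sqrt(a)*eps - sqrt(1-a)*x0  =>  recover x0_hat and eps_hat
        pred_x0 = np.sqrt(alpha_t) * x_t - np.sqrt(1 - alpha_t) * model_out
        pred_eps = np.sqrt(alpha_t) * model_out + np.sqrt(1 - alpha_t) * x_t
    elif prediction_type == "epsilon":
        pred_eps = model_out
        pred_x0 = (x_t - np.sqrt(1 - alpha_t) * pred_eps) / np.sqrt(alpha_t)
    else:
        raise ValueError(prediction_type)
    # x_{t-1} = sqrt(a_prev) * x0_hat + sqrt(1 - a_prev) * eps_hat
    return np.sqrt(alpha_prev) * pred_x0 + np.sqrt(1 - alpha_prev) * pred_eps
```

With `alpha_prev = 1.0` the step returns the clean-sample estimate directly, which is a quick sanity check that both branches invert the forward process consistently.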
Denoise Code
Denoise step (rewrite)
First frame guidance
White background
(SyncMVD instead uses a random background, so the background pixels act as shuffled noise in the latent input.)
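Both choices reduce to the same alpha composite of the rendered foreground over a solid or random background; only the background fill differs. A sketch with illustrative names (not the pipeline's actual helpers):

```python
import numpy as np

def composite_background(fg, mask, background="white", rng=None):
    """Composite a rendered foreground over a background.

    fg:   (H, W, 3) float image in [0, 1]
    mask: (H, W) foreground mask in [0, 1]
    background: "white" (ours) or "random" (SyncMVD-style noise pixels)
    """
    if background == "white":
        bg = np.ones_like(fg)
    elif background == "random":
        rng = rng or np.random.default_rng()
        bg = rng.random(fg.shape)  # random pixels behave like noise in latents
    else:
        raise ValueError(background)
    m = mask[..., None]
    return m * fg + (1 - m) * bg
```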
[Debug] Video results w/o bake_texture (single view 32 frames)
(a) modified pipeline, (b) original pipeline, (c) generated background, (d) composited first frame
[Debug] Video results w/o bake_texture (8 views 8 frames, modified pipeline, target_fps=4, guidance_scale=9.0)
More Results
- First row: Ctrl-Adapter.
- Second row: Ctrl-Adapter w/ noise initialized by UV projection.
Code
Path:
./VideoMVD/i2vgen_xl/pipelines/i2vgen_xl_controlnet_adapter_pipeline_latent.py
Debug:
inference_latent.py
Content:
noise init w/ uv projection(self.uvp.prepare_latents): outputs/2024-07-16_05-57-40→2024-07-16_23-35-18
MeshRasterizer.raster_settings.RasterizationSettings.faces_per_pixel: outputs/2024-07-17_09-07-33
self.uvp.load_anim(): DELETED
success_test_case: outputs/2024-07-17_09-25-33→2024-07-17_10-50-18: monster*2,mech_man*2
Integration into our pipeline (multiview video): data/monster/MVD_15Jul2024-175650→
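The UV-projection noise init (`self.uvp.prepare_latents`) samples one noise texture in UV space and looks it up per view through each view's UV coordinates, so pixels that see the same surface point start from the same latent noise across views. A minimal sketch; the UV maps, texture size, and function name are assumptions for illustration:

```python
import numpy as np

def uv_projected_noise(uv_maps, tex_size=64, channels=4, seed=0):
    """Initialize per-view latent noise from a shared UV-space noise texture.

    uv_maps: list of (H, W, 2) arrays of per-view UV coords in [0, 1).
    Returns one (H, W, channels) noise image per view; pixels mapping to
    the same texel share identical noise across views.
    """
    rng = np.random.default_rng(seed)
    tex = rng.standard_normal((tex_size, tex_size, channels))  # shared texture
    views = []
    for uv in uv_maps:
        # nearest-neighbour lookup into the shared noise texture
        iu = (uv[..., 0] * tex_size).astype(int) % tex_size
        iv = (uv[..., 1] * tex_size).astype(int) % tex_size
        views.append(tex[iv, iu])
    return views
```

This is what makes corresponding pixels in different views denoise consistently: they begin from the same noise sample rather than independent draws.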
Results
- Generate the first frame (left) with SDXL.
- Generate the short video (right) from the depth-map sequence (middle) with I2VGen-XL.