AI Toolbox
A curated collection of 759 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

MimicMotion can generate high-quality videos of arbitrary length mimicking specific motion guidance. The method is able to produce videos of up to 10,000 frames with acceptable resource consumption.
AnyControl is a new text-to-image guidance method that can generate images from diverse control signals, such as color, shape, texture, and layout.
Text-Animator can depict the structures of visual text in generated videos. It supports camera control and text refinement to improve the stability of the generated visual text.
BRDF-Uncertainty can estimate the properties of the materials on an object’s surface in seconds given its geometry and a lighting environment.
MotionBooth can generate videos of customized subjects from a few images and a text prompt with precise control over both object and camera movements.
Director3D can generate real-world 3D scenes and adaptive camera trajectories from text prompts. The method generates pixel-aligned 3D Gaussians as an intermediate 3D scene representation for consistent denoising.
MoMo is a new video frame interpolation method that is able to generate intermediate frames with high visual quality and reduced computational demands.
FreeTraj is a tuning-free approach that enables trajectory control in video diffusion models by modifying noise sampling and attention mechanisms.
Portrait3D can generate high-quality 3D heads with accurate geometry and texture from a single in-the-wild portrait image.
MIRReS can reconstruct and optimize the explicit geometry, material, and lighting of objects from multi-view images. The resulting 3D models can be edited and relit in modern graphics engines or CAD software.
LiveScene can identify and control multiple objects in complex scenes. It is able to locate individual objects in different states and enables controlling them with natural language.
MVOC is a training-free multiple video object composition method with diffusion models. The method can be used to composite multiple video objects into a single video while maintaining motion and identity consistency.
Conditional Image Leakage can be used to generate videos with more dynamic and natural motion from image prompts.
Image Conductor can generate video assets from a single image with precise control over camera transitions and object movements.
Mora can enable generalist video generation through a multi-agent framework. It supports text-to-video generation, video editing, and digital world simulation, achieving performance similar to the Sora model.
iCD can be used for zero-shot text-guided image editing with diffusion models. The method is able to encode real images into their latent space in only 3-4 inference steps and can then be used to edit the image with a text prompt.
EvTexture is a video super-resolution upscaling method that utilizes event signals for texture enhancement, enabling more accurate texture and high-resolution detail recovery.
Make It Count can generate images with the exact number of objects specified in the prompt while keeping a natural layout. It guides the diffusion model to accurately count and separate objects during the image generation process.
Glyph-ByT5-v2 is a new SDXL model that can generate high-quality visual layouts with text in 10 different languages.
MeshAnything can convert 3D assets in any 3D representation into meshes. This can be used to enhance various 3D asset production methods and significantly improve storage, rendering, and simulation efficiencies.