AI Toolbox
A curated collection of 758 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

Customizing Motion can learn motion patterns from input videos and generalize them to new and unseen contexts.
MEMO can generate talking videos from images and audio. It keeps the person’s identity consistent and matches lip movements to the audio, producing natural expressions.
MV-Adapter can generate images from multiple views while keeping them consistent across views. It enhances text-to-image models like Stable Diffusion XL, supporting both text and image inputs, and achieves high-resolution outputs at 768x768.
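For intuition: the usual trick behind this kind of cross-view consistency is to let the attention layers see tokens from all views at once. Here's a minimal, generic PyTorch sketch of that idea (not MV-Adapter's actual code):

```python
import torch
import torch.nn as nn

class MultiViewAttention(nn.Module):
    """Self-attention where tokens from all views attend jointly,
    a common way to keep multi-view generations consistent.
    (Generic sketch, not MV-Adapter's implementation.)"""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, views, tokens, dim)
        b, v, n, d = x.shape
        x = x.reshape(b, v * n, d)    # flatten views into one long sequence
        out, _ = self.attn(x, x, x)   # every token sees every view
        return out.reshape(b, v, n, d)

# Example: features for 4 views, 64 tokens each
feats = torch.randn(1, 4, 64, 320)
print(MultiViewAttention(320)(feats).shape)  # torch.Size([1, 4, 64, 320])
```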
CAVIS can do instance segmentation on videos. It tracks objects better and improves instance matching accuracy, resulting in more accurate and stable segmentations.
VideoRepair can improve text-to-video generation by finding and fixing small mismatches between text prompts and videos.
Trellis 3D generates high-quality 3D assets in formats like Radiance Fields, 3D Gaussians, and meshes. It supports text and image conditioning, offering flexible output format selection and local 3D editing capabilities.
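If you want to try it, the project's README shows a pipeline roughly like the one below; treat the exact module, model, and key names as assumptions that may have changed:

```python
# Usage sketch loosely based on the TRELLIS README; exact names may differ.
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline

pipeline = TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")
pipeline.cuda()

image = Image.open("input.png")
outputs = pipeline.run(image, seed=1)

# One run, several output formats to choose from:
gaussians = outputs["gaussian"][0]        # 3D Gaussians
radiance  = outputs["radiance_field"][0]  # Radiance Field
mesh      = outputs["mesh"][0]            # mesh
```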
Anagram-MTL can generate visual anagrams that change appearance with transformations like flipping or rotating.
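Diffusion-based anagram methods (like the Visual Anagrams work this builds on) boil down to averaging noise predictions across the transformed views at every denoising step. A rough sketch, with `model` standing in for any text-conditioned denoiser:

```python
import torch

def anagram_noise_estimate(model, x_t, t, prompts, views):
    """One denoising step for a visual anagram (sketch).

    views: list of (transform, inverse) pairs, e.g. identity and a flip.
    Each prompt describes what the image should show under its view.
    Per-view noise estimates are mapped back to the canonical frame
    and averaged, so a single image satisfies all views at once.
    """
    estimates = []
    for prompt, (tf, inv_tf) in zip(prompts, views):
        eps = model(tf(x_t), t, prompt)   # denoise the transformed image
        estimates.append(inv_tf(eps))     # map the estimate back
    return torch.stack(estimates).mean(dim=0)

# Example views: identity, and flipping the image upside down
views = [
    (lambda x: x, lambda x: x),
    (lambda x: torch.flip(x, dims=[-2]), lambda x: torch.flip(x, dims=[-2])),
]
```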
Dessie can estimate the 3D shape and pose of horses from single images. It also works with other large animals like zebras and cows.
Negative Token Merging can improve image diversity by pushing apart similar features during the reverse diffusion process. It reduces visual similarity with copyrighted content by 34.57% and works well with Stable Diffusion as well as Flux.
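The core idea is easy to sketch: match each token to its most similar counterpart in another image of the batch and extrapolate away from it instead of merging toward it. A rough PyTorch reading of the method (not the official implementation):

```python
import torch
import torch.nn.functional as F

def push_apart_tokens(src, tgt, alpha=0.9):
    """Sketch of negative token merging between two batch items.

    src, tgt: (num_tokens, dim) feature tokens from two images.
    For each source token, find its most similar target token by
    cosine similarity, then push the source feature away from it,
    reducing visual similarity between the two generations.
    """
    src_n = F.normalize(src, dim=-1)
    tgt_n = F.normalize(tgt, dim=-1)
    sim = src_n @ tgt_n.T              # (n_src, n_tgt) cosine similarities
    nearest = sim.argmax(dim=-1)       # most similar target token per source
    matched = tgt[nearest]             # (n_src, dim)
    # Extrapolate away from the match instead of merging toward it.
    return src + alpha * (src - matched)
```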
FlowEdit can edit images using only text prompts with Flux and Stable Diffusion 3.
L4GM is a 4D Large Reconstruction Model that can turn a single-view video into an animated 3D object.
D3GA is the first 3D controllable model for human bodies rendered with Gaussian splats in real-time. This lets us turn ourselves or others with a multi-cam setup into a Gaussian splat which can be animated, and even allows decomposing the avatar into its different clothing layers.
Ever tried to inpaint smaller objects and details into an image? It can be kind of hit or miss. SOEDiff has been trained specifically to handle these cases and does a pretty good job at it.
Material Anything can generate realistic materials for 3D objects, including those without textures. It adapts to different lighting and uses confidence masks to improve material quality, ensuring outputs are ready for UV mapping.
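A confidence mask here is essentially a per-pixel blend weight: keep the prediction where the model is confident and fall back to a refined estimate elsewhere. A toy sketch, assuming hypothetical maps as float arrays in [0, 1]:

```python
import numpy as np

def blend_by_confidence(pred, fallback, confidence):
    """Per-pixel blend of two material maps by a confidence mask.

    pred, fallback: (H, W, C) material maps (e.g. albedo) in [0, 1].
    confidence:     (H, W) mask in [0, 1]; 1 = trust the prediction.
    """
    conf = confidence[..., None]              # broadcast over channels
    return conf * pred + (1.0 - conf) * fallback

# Hypothetical inputs for illustration
albedo_pred = np.random.rand(512, 512, 3)
albedo_ref  = np.random.rand(512, 512, 3)
conf        = np.random.rand(512, 512)
blended = blend_by_confidence(albedo_pred, albedo_ref, conf)
```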
Inverse Painting can generate time-lapse videos of the painting process from a target artwork. It uses a diffusion-based renderer to learn from real artists’ techniques, producing realistic results across different artistic styles.
MegaFusion can extend existing diffusion models for high-resolution image generation. It achieves images up to 2048x2048 with only 40% of the original computational cost by enhancing denoising processes across different resolutions.
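The general recipe for this kind of thing is "truncate and relay": do most of the denoising at the model's native resolution, upsample the partially denoised latent, then finish the remaining steps at the target resolution. A schematic sketch with a hypothetical `denoise_step` callable (not MegaFusion's exact algorithm):

```python
import torch.nn.functional as F

def coarse_to_fine_denoise(denoise_step, x, timesteps, relay_at, scale=2):
    """Schematic coarse-to-fine sampling.

    denoise_step(x, t) -> x: one reverse-diffusion step (hypothetical).
    Most steps run on the small latent; at `relay_at` the latent is
    upsampled and the remaining steps refine it at high resolution,
    which is far cheaper than sampling at full resolution throughout.
    """
    for i, t in enumerate(timesteps):
        if i == relay_at:
            x = F.interpolate(x, scale_factor=scale, mode="bilinear")
        x = denoise_step(x, t)
    return x
```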
CAT4D can create dynamic 4D scenes from single videos. It uses a multi-view video diffusion model to generate videos from different angles, allowing for strong 4D reconstruction and high-quality images.
SuperMat can quickly break down images of materials into three important maps: albedo, metallic, and roughness. It does this in about 3 seconds while keeping high quality, making it efficient for 3D object material estimation.
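Those three maps slot straight into a standard PBR workflow; glTF, for instance, packs occlusion, roughness, and metallic into the R, G, and B channels of a single texture. A small packing sketch:

```python
import numpy as np
from PIL import Image

def pack_orm(roughness, metallic, occlusion=None):
    """Pack grayscale maps into a glTF-style ORM texture:
    R = occlusion, G = roughness, B = metallic."""
    h, w = roughness.shape
    occ = occlusion if occlusion is not None else np.ones((h, w), np.float32)
    orm = np.stack([occ, roughness, metallic], axis=-1)
    return Image.fromarray((orm * 255).astype(np.uint8))

# Hypothetical predicted maps for illustration
rough = np.random.rand(256, 256).astype(np.float32)
metal = np.zeros((256, 256), np.float32)   # a dielectric material
pack_orm(rough, metal).save("material_orm.png")
```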
SelfSplat can create 3D models from multiple images without needing known camera poses. It uses self-supervised methods for depth and pose estimation, resulting in high-quality appearance and geometry from real-world data.
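Self-supervised depth and pose training typically rests on a photometric reprojection loss: warp one frame into the other using the predicted depth and relative pose, then penalize the color difference. A minimal sketch of the standard recipe (not necessarily SelfSplat's exact formulation):

```python
import torch
import torch.nn.functional as F

def photometric_loss(img_src, img_tgt, depth_tgt, pose_tgt_to_src, K):
    """Warp img_src into the target view and compare with img_tgt.

    depth_tgt:       (B, 1, H, W) predicted depth of the target frame.
    pose_tgt_to_src: (B, 4, 4) predicted relative camera pose.
    K:               (B, 3, 3) camera intrinsics.
    """
    b, _, h, w = depth_tgt.shape
    # Pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()  # (3, H, W)
    pix = pix.reshape(1, 3, -1).expand(b, -1, -1)

    # Back-project to 3D, move to the source frame, re-project.
    cam = torch.linalg.inv(K) @ pix * depth_tgt.reshape(b, 1, -1)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w)], dim=1)     # homogeneous
    src = K @ (pose_tgt_to_src @ cam_h)[:, :3]
    uv = src[:, :2] / src[:, 2:].clamp(min=1e-6)

    # Normalize to [-1, 1] for grid_sample and warp the source image.
    grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], -1) * 2 - 1
    warped = F.grid_sample(img_src, grid.reshape(b, h, w, 2), align_corners=True)
    return (warped - img_tgt).abs().mean()
```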
DreamMix is an inpainting method based on the Fooocus model that can add objects from reference images and change their features using text.