AI Toolbox
A curated collection of 759 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

GradeADreamer is yet another text-to-3D method. This one can produce high-quality assets in under 30 minutes on a single RTX 3090 GPU.
HairFastGAN can transfer hairstyles from one image to another in near real-time. It handles different poses and colors well, achieving high quality in under a second on an Nvidia V100.
MM-Diffusion can generate high-quality audio-video pairs using a multi-modal diffusion model with two coupled denoising autoencoders.
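To make the "coupled denoisers" idea concrete, here is a toy sketch in which each modality's denoiser is conditioned on the other's noisy latent. This is not MM-Diffusion's actual architecture (which uses coupled U-Nets with cross-modal attention); the MLPs, latent sizes, and update rule below are all stand-in assumptions.

```python
# Toy sketch of coupled denoisers for audio-video diffusion.
# Each denoiser predicts the noise in its own latent while seeing
# the other modality's noisy latent, so the pair stays in sync.
import torch
import torch.nn as nn

AUDIO_DIM, VIDEO_DIM, HIDDEN = 64, 256, 512  # hypothetical latent sizes

class CoupledDenoiser(nn.Module):
    def __init__(self, own_dim, other_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(own_dim + other_dim + 1, HIDDEN),  # +1 for timestep
            nn.SiLU(),
            nn.Linear(HIDDEN, own_dim),
        )

    def forward(self, own_noisy, other_noisy, t):
        # Predict the noise in `own_noisy`, conditioned on the other
        # modality's noisy latent and the diffusion timestep.
        x = torch.cat([own_noisy, other_noisy, t], dim=-1)
        return self.net(x)

audio_denoiser = CoupledDenoiser(AUDIO_DIM, VIDEO_DIM)
video_denoiser = CoupledDenoiser(VIDEO_DIM, AUDIO_DIM)

# One coupled denoising step on a batch of 4 latent pairs.
a = torch.randn(4, AUDIO_DIM)
v = torch.randn(4, VIDEO_DIM)
t = torch.full((4, 1), 0.5)  # normalized timestep
a = a - audio_denoiser(a, v, t)  # crude update; real samplers
v = v - video_denoiser(v, a, t)  # rescale by a noise schedule
```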
DMD2 is an improved distillation method that can turn diffusion models into efficient one-step image generators.
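For intuition, here is a minimal sketch of distilling a many-step sampler into a one-step generator. Note this uses plain output-matching distillation, not DMD2's distribution-matching loss, and the teacher/student networks are stand-in MLPs.

```python
# Sketch: train a student to reproduce in one forward pass what the
# teacher produces with many denoising steps. All sizes are assumptions.
import torch
import torch.nn as nn

DIM = 128  # hypothetical flattened latent size

teacher = nn.Sequential(nn.Linear(DIM, DIM), nn.SiLU(), nn.Linear(DIM, DIM))
student = nn.Sequential(nn.Linear(DIM, DIM), nn.SiLU(), nn.Linear(DIM, DIM))
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

@torch.no_grad()
def teacher_sample(noise, steps=50):
    # Stand-in for a full reverse-diffusion loop: `steps` teacher calls.
    x = noise
    for _ in range(steps):
        x = x - teacher(x) / steps
    return x

for _ in range(100):  # toy training loop
    noise = torch.randn(16, DIM)
    target = teacher_sample(noise)   # expensive: many teacher calls
    pred = student(noise)            # cheap: a single forward pass
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```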
EditWorld can simulate world dynamics and edit images based on instructions that are grounded in various world scenarios. The method is able to add, replace, delete, and move objects in images, as well as change their attributes and perform other operations.
RectifID is yet another personalization method for diffusion models, working from user-provided reference images of human faces, live subjects, and certain objects.
MagicPose4D can generate 3D objects from text or images and transfer precise motions and trajectories from objects and characters in a video or mesh sequence.
ReVideo can change video content in specific areas while keeping the motion intact. It allows users to customize motion paths and uses a three-stage training method for precise video editing.
Face Adapter is a new face swapping method that can generate facial detail and handle face shape changes with fine-grained control over attributes like identity, pose, and expression.
RemoCap can reconstruct 3D human bodies from motion sequences. It’s able to capture occluded body parts with greater fidelity, resulting in less model penetration and less distorted motion.
NOVA-3D can generate 3D anime characters from non-overlapped front and back views.
Images that Sound can generate spectrograms that look like natural images and produce matching audio when played. It uses pre-trained diffusion models to create these spectrograms based on specific audio and visual prompts.
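The image-to-audio half of that trick is easy to demonstrate: treat a grayscale image as a magnitude spectrogram and recover a waveform with Griffin-Lim phase estimation. The sketch below shows only that conversion, not the diffusion-based generation; the file name, scaling, and sample rate are assumptions.

```python
# Interpret a grayscale image as a magnitude spectrogram and
# reconstruct audio. Griffin-Lim iteratively estimates the phase
# information the image doesn't carry.
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

img = Image.open("spectrogram.png").convert("L")  # hypothetical input
mag = np.asarray(img, dtype=np.float32) / 255.0   # rows = frequency bins
mag = np.flipud(mag)       # image origin is top-left; spectrograms aren't
mag = mag ** 2.0           # crude contrast mapping; an assumption

wav = librosa.griffinlim(mag, n_iter=64, hop_length=256)
sf.write("out.wav", wav, 22050)
```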
Slicedit can edit videos with a simple text prompt, retaining the structure and motion of the original video while adhering to the target text.
ViViD can transfer a clothing item onto the video of a target person. The method is able to capture garment details and human posture, resulting in more coherent and lifelike videos.
FIFO-Diffusion can generate infinitely long videos from text without extra training. It uses a unique method that keeps memory use constant, no matter the video length, and works well on multiple GPUs.
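The constant-memory trick is worth a toy illustration: keep a fixed-length queue of frame latents at staggered noise levels, denoise every queued frame one level per step, emit the cleanest frame, and enqueue fresh noise. The denoiser below is a stub, not an actual video diffusion model, and the queue length is an assumption.

```python
# Toy version of FIFO-style diagonal denoising. Memory stays
# O(queue length) no matter how many frames are generated.
from collections import deque
import numpy as np

STEPS = 8            # diffusion steps == queue length (assumption)
FRAME_SHAPE = (16,)  # hypothetical latent size

def denoise_one_level(latent, level):
    # Stub for one reverse-diffusion step at the given noise level.
    return latent * 0.9

rng = np.random.default_rng(0)
# Front of the queue is the least noisy frame (level 1),
# the back is pure noise (level STEPS).
queue = deque(
    (rng.standard_normal(FRAME_SHAPE), lvl) for lvl in range(1, STEPS + 1)
)

video = []
for _ in range(20):  # generate 20 frames; could run indefinitely
    queue = deque((denoise_one_level(z, lvl), lvl - 1) for z, lvl in queue)
    frame, lvl = queue.popleft()   # fully denoised (lvl == 0)
    video.append(frame)
    queue.append((rng.standard_normal(FRAME_SHAPE), STEPS))
```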
CondMDI can generate precise and diverse motions that conform to flexible user-specified spatial constraints and text descriptions. This enables the creation of high-quality animations from just text prompts and inpainting between keyframes.
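One common way to impose keyframes on a diffusion sampler is replacement-based inpainting: at every denoising step, the observed keyframes are re-imposed at the current noise level so the model fills in only the unconstrained frames. CondMDI's actual conditioning may differ; the denoiser and noise schedule below are stubs.

```python
# Sketch of keyframe-constrained denoising via replacement-based
# inpainting over a motion sequence. All sizes are assumptions.
import numpy as np

T, FRAMES, DIM = 50, 30, 8
rng = np.random.default_rng(1)

keyframes = {0: rng.standard_normal(DIM), 29: rng.standard_normal(DIM)}
mask = np.zeros(FRAMES, dtype=bool)
mask[list(keyframes)] = True
known = np.zeros((FRAMES, DIM))
for i, pose in keyframes.items():
    known[i] = pose

def denoise_step(x, t):
    return x * 0.95  # stub for one reverse-diffusion step

x = rng.standard_normal((FRAMES, DIM))
for t in reversed(range(T)):
    noise_scale = t / T  # stand-in noise schedule
    noised_known = known + noise_scale * rng.standard_normal(known.shape)
    x[mask] = noised_known[mask]  # re-impose keyframes at this noise level
    x = denoise_step(x, t)
# In this toy, x ends up interpolating between the two keyframes.
```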
SignLLM is the first multilingual Sign Language Production (SLP) model. It can generate sign language gestures from input text or prompts and achieve state-of-the-art performance on SLP tasks across eight sign languages.
Toon3D can generate 3D scenes from two or more cartoon drawings. It’s far from perfect, but still pretty cool!
Analogist can enhance images by colorizing, deblurring, denoising, improving low-light quality, and transferring styles using a text-to-image diffusion model. It uses both visual and text prompts without needing extra training, making it a flexible tool for learning with few examples.
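The visual-prompting setup behind few-shot analogy methods like Analogist can be sketched without any model: place the example pair (A, A') and the query B on a 2x2 canvas and leave the last cell for an inpainting model to fill as B'. The paths and cell size below are hypothetical, and the inpainting call itself is omitted.

```python
# Build a 2x2 grid visual prompt plus an inpainting mask.
# Top row: example before/after. Bottom row: query + blank cell.
from PIL import Image

CELL = 256  # hypothetical cell size in pixels

a = Image.open("A.png").resize((CELL, CELL))              # source example
a_prime = Image.open("A_prime.png").resize((CELL, CELL))  # edited example
b = Image.open("B.png").resize((CELL, CELL))              # query image

canvas = Image.new("RGB", (2 * CELL, 2 * CELL), "white")
canvas.paste(a, (0, 0))
canvas.paste(a_prime, (CELL, 0))
canvas.paste(b, (0, CELL))
# Bottom-right cell stays blank; a diffusion inpainting model,
# guided by a text prompt, would fill it with B'.
canvas.save("visual_prompt.png")

mask = Image.new("L", canvas.size, 0)
mask.paste(255, (CELL, CELL, 2 * CELL, 2 * CELL))  # region to inpaint
mask.save("inpaint_mask.png")
```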
Dual3D is yet another text-to-3D method that can generate high-quality 3D assets from text prompts in only 1 minute.