AI Toolbox | AI Art Weekly

Garment3DGen

Garment3DGen can stylize the geometry and textures of garments from a 2D image or a 3D mesh. The resulting garments can be fitted onto parametric bodies and simulated. It could be used for hand-garment interaction in VR or to turn sketches into 3D garments.

27.03.24 · Project Page · Code · Text-to-3D · 3D Object Generation

MonoHair

MonoHair can create high-quality 3D hair from a single video. It uses a two-step process for detailed hair reconstruction and achieves top performance across various hairstyles.

27.03.24 · Project Page · Code · 3D Mesh Generation · Video-to-3D · 3D Hair Generation

Inclusion Matching

Learning Inclusion Matching for Animation Paint Bucket Colorization can colorize line art in animations by allowing artists to colorize just one frame. The algorithm then automatically applies the color to the rest of the frames, using a learning-based inclusion matching pipeline for more accurate results.

27.03.24 · Project Page · Code · Image Colorization

AiOS

AiOS can estimate human poses and shapes in one step, combining body, hand, and facial expression recovery.

26.03.24 · Project Page · Code · 3D Mesh Generation · Motion Capture · 3D Avatar Generation

AID

PAID is a method that enables smooth, highly consistent image interpolation for diffusion models. GANs have been the king in that field so far, but this method shows promising results for diffusion models.

26.03.24 · Project Page · Code · Text-to-Image

TC4D

TC4D can animate 3D scenes generated from text along arbitrary trajectories. I can see this being useful for generating 3D effects for movies or games.

26.03.24 · Project Page · Code · Text-to-4D

TRAM

TRAM can reconstruct human motion and camera movement from videos in dynamic settings. It reduces global motion errors by 60% and uses a video transformer model to accurately track body motion.

26.03.24 · Project Page · Code · Video Analysis

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Attribute Control enables fine-grained control over attributes of specific subjects in text-to-image models. This lets you modify attributes like age, width, makeup, smile and more for each subject independently.

25.03.24 · Project Page · Code · Text-to-Image

FlashFace

FlashFace can personalize photos by using one or a few reference face images and a text prompt. It keeps important details like scars and tattoos while balancing text and image guidance, making it useful for face swapping and turning virtual characters into real people.

25.03.24 · Project Page · Code · Personalized Image Generation · Image Editing

TRIP

TRIP is a new approach to image-to-video generation with better temporal coherence.

25.03.24 · Project Page · Code · Image-to-Video

Make-It-Vivid

Make-It-Vivid generates high-quality texture maps for 3D biped cartoon characters from text instructions, making it possible to dress and animate characters based on prompts.

25.03.24 · Project Page · Code · Text-to-Texture · 3D Object Generation

ThemeStation

ThemeStation can generate a variety of 3D assets that match a specific theme from just a few examples. It uses a two-stage process to improve the quality and diversity of the models, allowing users to create 3D assets based on their own text prompts.

22.03.24 · Project Page · Code · 3D Object Generation · Controllable 3D Generation

Spectral Motion Alignment for Video Motion Transfer using Diffusion Models

Spectral Motion Alignment is a framework that can capture complex and long-range motion patterns within videos and transfer them to video-to-video frameworks like MotionDirector, VMC, Tune-A-Video, and ControlVideo.

22.03.24 · Project Page · Code · Video Analysis

StreamingT2V

StreamingT2V enables long text-to-video generations featuring rich motion dynamics without any stagnation. It ensures temporal consistency throughout the video, aligns closely with the descriptive text, and maintains high frame-level image quality. Videos can be up to 1200 frames, spanning 2 minutes, and can be extended for even longer durations.

21.03.24 · Project Page · Code · Text-to-Video

ReNoise

ReNoise can invert an input image so that it can be reconstructed and then edited using text prompts.

21.03.24 · Project Page · Code · Image Editing

AnyV2V

AnyV2V can edit videos using prompt-based editing and style transfer without fine-tuning. It modifies the first frame of a video and generates the edited video while keeping high visual quality.

21.03.24 · Project Page · Code · Video Editing

FouriScale

FouriScale can generate high-resolution, high-quality images of arbitrary size and aspect ratio from pre-trained diffusion models.

19.03.24 · Code · Text-to-Image · Image Restoration · Image Upscaling

FRESCO

FRESCO combines ControlNet with Ebsynth for zero-shot video translation that focuses on preserving the spatial and temporal consistency of the input frames.

19.03.24 · Project Page · Code · Video Editing

You Only Sample Once

You Only Sample Once can quickly create high-quality images from text in one step. It combines diffusion processes with GANs, allows fine-tuning of pre-trained models, and works well at higher resolutions without extra training.

19.03.24 · Code · Text-to-Image

TexDreamer

TexDreamer can generate high-quality 3D human textures from text and images. It uses a smart fine-tuning method and a unique translator module to create realistic textures quickly while keeping important details intact.

19.03.24 · Project Page · Code · Text-to-3D · Image-to-3D · 3D Object Generation