AI Toolbox
A curated collection of 759 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

ST-AVSR can enhance video resolution to arbitrary scales while keeping details clear and temporally smooth. It draws priors from a pre-trained VGG network to improve both quality and speed over prior approaches.
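As a rough illustration of how a frozen, pre-trained VGG can supply priors for a restoration network, the sketch below taps features at several depths; the layer indices and function names are illustrative assumptions, not ST-AVSR's actual configuration.

```python
# Sketch: multi-scale features from a frozen pre-trained VGG-19 as priors.
# Tap indices (relu1_2, relu2_2, relu3_4) are illustrative, not ST-AVSR's.
import torch
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the prior stays frozen; no fine-tuning

TAP_LAYERS = {3, 8, 17}  # relu1_2, relu2_2, relu3_4 in vgg19.features

def multiscale_vgg_prior(frames: torch.Tensor) -> list[torch.Tensor]:
    """frames: (N, 3, H, W), ImageNet-normalized. Returns fine-to-coarse features."""
    feats, x = [], frames
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in TAP_LAYERS:
            feats.append(x)
    return feats

priors = multiscale_vgg_prior(torch.randn(1, 3, 256, 256))
print([tuple(f.shape) for f in priors])  # shrinking spatial size, growing channels
```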
Live2Diff can translate live video streams with video diffusion models using uni-directional temporal attention. Each frame attends only to preceding frames, which keeps motion smooth, and it reaches 16 frames per second on an RTX 4090 GPU, making it well suited for real-time use.
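The "linking each frame to previous ones" idea amounts to a uni-directional (causal) mask over temporal attention. Here is a minimal sketch of that masking scheme, independent of Live2Diff's actual implementation:

```python
# Sketch: uni-directional temporal attention over per-frame features.
# Each frame may attend only to itself and earlier frames (causal mask);
# this illustrates the masking scheme, not Live2Diff's code.
import torch

def unidirectional_temporal_attention(q, k, v):
    """q, k, v: (batch, heads, frames, dim)."""
    T = q.shape[2]
    allowed = torch.ones(T, T, dtype=torch.bool, device=q.device).tril()
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return scores.softmax(dim=-1) @ v

x = torch.randn(1, 8, 16, 64)  # 16 frames, 8 heads, 64-dim features
print(unidirectional_temporal_attention(x, x, x).shape)  # (1, 8, 16, 64)
```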
WildGaussians is a new 3D Gaussian Splatting method that can handle occlusions and appearance changes. The method is able to achieve real-time rendering speeds and is able to handle in-the-wild data better than other methods.
Stable Audio Open can generate up to 47 seconds of stereo audio at 44.1 kHz from text prompts. It uses a transformer-based diffusion model for high-quality sound, making it useful for artists and researchers.
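Stable Audio Open is available through the diffusers StableAudioPipeline; a minimal generation sketch, with prompt, seed, and step count as illustrative choices:

```python
# Sketch: text-to-audio with Stable Audio Open via diffusers.
import torch
import soundfile as sf
from diffusers import StableAudioPipeline

pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
).to("cuda")

audio = pipe(
    "A cinematic drum loop with deep bass",  # illustrative prompt
    num_inference_steps=100,
    audio_end_in_s=30.0,                     # up to ~47 s is supported
    generator=torch.Generator("cuda").manual_seed(0),
).audios

# (channels, samples) float16 on GPU -> (samples, channels) float32 on CPU
sf.write("drums.wav", audio[0].T.float().cpu().numpy(), pipe.vae.sampling_rate)
```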
ColorPeel can generate objects in images with specific colors and shapes.
HumanRefiner can improve the quality of human hands and limbs in generated images! The method detects and corrects issues with abnormal human poses and limbs.
Tailor3D can create customized 3D assets from text or from single- or dual-sided images. The method also supports editing the inputs through additional text prompts.
GeneFace can generate high-quality 3D talking face videos from any speech audio. It solves the head-torso separation problem and provides better lip synchronization and image quality than earlier methods.
Minutes to Seconds can efficiently fill in missing parts of images with a Denoising Diffusion Probabilistic Model (DDPM) that is about 60x faster than comparable methods. It pairs a lightweight diffusion model with efficient sampling strategies to keep image quality high.
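The core trick behind DDPM inpainting in general is to keep known pixels on the noising trajectory of the original image and let the model denoise only the masked region. Below is a generic sketch of one reverse step in that style (RePaint-like); it assumes a diffusers-style scheduler exposing add_noise() and step(), and is not this paper's lightweight model or its accelerated sampler:

```python
# Sketch: one reverse step of generic diffusion inpainting.
# mask == 1 marks KNOWN pixels; mask == 0 marks pixels to fill in.
import torch

def inpaint_reverse_step(x_t, x_orig, mask, model, scheduler, t):
    eps = model(x_t, t)                                  # predicted noise
    x_unknown = scheduler.step(eps, t, x_t).prev_sample  # model's denoised guess
    noise = torch.randn_like(x_orig)
    x_known = scheduler.add_noise(x_orig, noise, t)      # re-noise the original
    # Stitch: trust the original wherever it is known, the model elsewhere.
    return mask * x_known + (1 - mask) * x_unknown
```

The claimed ~60x speedup comes from the lightweight model and the sampling strategy; the masking recipe above is just the generic backbone such methods build on.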
PartCraft can generate customized and photorealistic virtual creatures by mixing visual parts from existing images. This tool allows users to create unique hybrids and make detailed changes, which is useful for digital asset creation and studying biodiversity.
LivePortrait can animate a single source image with motion from a driving video. The method generates high-quality videos at 60 fps and can retarget the driving motion to other characters.
PicoAudio is a temporally controlled audio generation framework. The model generates audio with precise control over event timestamps and occurrence frequency.
PartGLEE can locate and identify objects and their parts in images. The method uses a unified framework that enables detection, segmentation, and grounding at any granularity.
MIGC++ is a plug-and-play controller that gives Stable Diffusion precise position control while ensuring the correctness of attributes such as color, shape, material, texture, and style. It can also control the number of instances and improve interactions between instances.
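Position control of this kind is typically driven by per-instance bounding boxes paired with attribute phrases. A hypothetical input structure to make that concrete (the field names are illustrative, not MIGC++'s actual API):

```python
# Hypothetical layout spec for position-controlled generation.
# Boxes are [x0, y0, x1, y1] in normalized image coordinates.
layout = {
    "prompt": "a living room",
    "instances": [
        {"caption": "a red leather sofa",   "box": [0.05, 0.55, 0.60, 0.95]},
        {"caption": "a glass coffee table", "box": [0.35, 0.70, 0.75, 0.98]},
    ],
}
```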
AniPortrait can generate high-quality portrait animations driven by audio and a reference portrait image. It also supports face reenactment from a reference video.
DiffIR2VR-Zero is a zero-shot video restoration method that works with any 2D image restoration diffusion model. The method can perform 8x video super-resolution and denoising under severe (high standard deviation) noise.
DIRECTOR can generate complex camera trajectories from text descriptions of the relation and synchronization between the camera and characters.
FoleyCrafter can generate high-quality sound effects for videos! Results aim to be semantically relevant and temporally synchronized with the video. It also supports text prompts for finer control over video-to-audio generation.
Motion Prompting can control video generation using motion paths. It allows for camera control, motion transfer, and drag-based image editing, producing realistic movements and physics.
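Motion prompts of this kind are commonly expressed as sparse point tracks, i.e. per-frame (x, y) coordinates for a handful of tracked points. A tiny illustration of building a drag path (the array layout is a hypothetical format, not the paper's interface):

```python
# Hypothetical motion prompt: one point dragged left-to-right over 24 frames.
import numpy as np

T = 24
track = np.stack([
    np.linspace(0.2, 0.8, T),  # x (normalized image coordinates)
    np.full(T, 0.5),           # y stays fixed at mid-height
], axis=1)                      # shape (T, 2): one point track
motion_prompt = track[None]     # (num_points=1, T, 2)
```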
StyleShot can mimic and transfer diverse styles from a reference image, such as 3D, flat, abstract, or even fine-grained styles, without test-time tuning.