AI Toolbox
A curated collection of 759 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.





SEELE can move objects around within an image. It does so by removing the object, inpainting the occluded region it leaves behind, and harmonizing the appearance of the repositioned object with its new surroundings.
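To make that flow concrete, here is a minimal, hypothetical sketch of the remove → inpaint → harmonize pipeline. The stage functions are placeholders, not SEELE's actual interface.

```python
# Hypothetical sketch of a SEELE-style "remove, inpaint, harmonize" pipeline.
# Every stage below is a stand-in for a learned model in the real system.
import numpy as np

def remove_object(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Cut the masked object out; here we simply zero the pixels as a stand-in."""
    return np.where(mask[..., None], 0, image)

def inpaint_background(image: np.ndarray, hole_mask: np.ndarray) -> np.ndarray:
    """Fill the hole left behind; a real pipeline would call an inpainting model."""
    return image  # placeholder

def harmonize(image: np.ndarray, object_mask: np.ndarray) -> np.ndarray:
    """Blend lighting and colour of the pasted object with its new surroundings."""
    return image  # placeholder

def reposition(image: np.ndarray, mask: np.ndarray, dx: int, dy: int) -> np.ndarray:
    background = inpaint_background(remove_object(image, mask), mask)
    moved_mask = np.roll(mask, shift=(dy, dx), axis=(0, 1))
    moved_pixels = np.roll(np.where(mask[..., None], image, 0), shift=(dy, dx), axis=(0, 1))
    composite = np.where(moved_mask[..., None], moved_pixels, background)
    return harmonize(composite, moved_mask)
```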
Motion-I2V can generate videos from images with clear and controlled motion. It uses a two-stage process with a motion field predictor and temporal attention, allowing for precise control over how things move and enabling video-to-video translation without needing extra training.
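The two-stage idea can be sketched roughly as follows; both stages are stubbed out and this is not Motion-I2V's real API, just an illustration of predicting a per-frame motion field and then rendering frames along it.

```python
# Minimal sketch of a two-stage image-to-video idea: stage 1 predicts a motion
# (flow) field per target frame, stage 2 renders each frame guided by that motion.
import numpy as np

def predict_motion_fields(image: np.ndarray, prompt: str, num_frames: int = 16):
    """Stage 1 placeholder: one (H, W, 2) flow field per target frame."""
    h, w, _ = image.shape
    return [np.zeros((h, w, 2), dtype=np.float32) for _ in range(num_frames)]

def render_frame(image: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Stage 2 placeholder: warp the source image along the predicted flow."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip((ys - flow[..., 1]).round().astype(int), 0, h - 1)
    src_x = np.clip((xs - flow[..., 0]).round().astype(int), 0, w - 1)
    return image[src_y, src_x]

def image_to_video(image: np.ndarray, prompt: str):
    return [render_frame(image, flow) for flow in predict_motion_fields(image, prompt)]
```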
StableIdentity is a method that can generate diverse customized images in various contexts from a single input image. The cool thing about this method is that it can combine the learned identity with ControlNet and even inject it into video (ModelScope) and 3D (LucidDreamer) generation.
pix2gestalt is able to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
GALA can take a single-layer clothed 3D human mesh and decompose it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create new clothed human avatars in any pose.
Depth Anything is a new monocular depth estimation method. The model is trained on 1.5M labeled images and 62M+ unlabeled images, which results in impressive generalization ability.
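Depth Anything checkpoints are commonly run through the Hugging Face depth-estimation pipeline; the snippet below is a hedged usage sketch, and the model id is an assumption that may differ from the official release.

```python
# Hedged usage sketch: run monocular depth estimation on a single photo.
# The model id is assumed and may not match the official Depth Anything release.
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation", model="LiheYoung/depth-anything-large-hf")
result = depth(Image.open("photo.jpg"))
result["depth"].save("photo_depth.png")  # per-pixel relative depth map as an image
```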
Language-Driven Video Inpainting can guide the video inpainting process using natural language instructions, which removes the need for manual mask labeling.
GARField can break down 3D scenes into meaningful groups. It improves the accuracy of object clustering and allows for better extraction of individual objects and their parts.
VideoCrafter2 can generate high-quality videos from text prompts. It uses low-quality video data and high-quality images to improve visual quality and motion, overcoming data limitations of earlier models.
RoHM can reconstruct complete, plausible 3D human motions from monocular videos with support for recognizing occluded joints! So, basically motion tracking on steroids but without the need for an expensive setup.
Motion tracking is one thing, generating motion from text another. STMC is a method that can generate 3D human motion from text with multi-track timeline control. This means that instead of a single text prompt, users can specify a timeline of multiple prompts with defined durations and overlaps to create more complex and precise animations.
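As an illustration of what such a multi-track timeline could look like as data, here is a small sketch; the class and field names are made up for this example and are not STMC's actual interface.

```python
# Sketch of the multi-track timeline idea: several timed, possibly overlapping
# text prompts, each steering a part of the body, instead of one global prompt.
from dataclasses import dataclass

@dataclass
class TimedPrompt:
    text: str                             # what the character should do
    start: float                          # seconds
    end: float                            # seconds
    body_parts: tuple = ("full body",)    # which track/limbs the prompt controls

timeline = [
    TimedPrompt("walk forward", 0.0, 4.0, ("legs",)),
    TimedPrompt("wave with the right hand", 1.5, 3.0, ("right arm",)),
    TimedPrompt("sit down on a chair", 4.0, 6.5, ("full body",)),
]
# motion = generate_motion(timeline)  # hypothetical call to a timeline-conditioned model
```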
Real3D-Portrait is a one-shot 3D talking portrait generation method. This one is able to generate realistic videos with natural torso movement and switchable backgrounds.
InstantID is an ID embedding-based method that can personalize images in various styles from just a single facial image while ensuring high fidelity.
FMA-Net can turn blurry, low-quality videos into clear, high-quality ones. It does so by jointly predicting the degradation and restoration processes while accounting for the movement in the video through learned motion patterns.
MagicDriveDiT can generate high-resolution street scene videos for self-driving cars.
Audio2Photoreal can generate full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, the model is able to output multiple possibilities of gestural motion for an individual, including face, body, and hands. The results are highly photorealistic avatars that can express crucial nuances in gestures such as sneers and smirks.
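Conceptually, the model maps speech features to several candidate pose sequences that you can choose between before rendering the avatar. The sketch below stands in for that sampling step with random placeholders and is not the released model.

```python
# Illustrative sketch: condition on speech audio and sample several plausible
# gesture sequences (face, body, hands). The sampler here is a random stand-in.
import numpy as np

def sample_gestures(audio_features: np.ndarray, num_samples: int = 3, num_frames: int = 90):
    """Return `num_samples` candidate pose sequences; the real model is generative
    and actually conditions on the audio, which this stub ignores."""
    rng = np.random.default_rng(0)
    pose_dim = 104  # assumed size of a per-frame face+body+hand pose vector
    return [rng.normal(size=(num_frames, pose_dim)) for _ in range(num_samples)]

audio_features = np.zeros((90, 80))           # placeholder for e.g. a mel spectrogram
candidates = sample_gestures(audio_features)  # pick one, then render the avatar
```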
MoonShot is a video generation model that can condition on both image and text inputs. The model is also able to integrate with pre-trained image ControlNet modules for geometry visual conditions, making it possible to generate videos with specific visual appearances and structures.
SIGNeRF is a new approach for fast and controllable NeRF scene editing and scene-integrated object generation. The method can insert new objects into an existing NeRF scene or edit existing objects within the scene in a controllable manner, via either proxy object placement or shape selection.
En3D can generate high-quality 3D human avatars from 2D images without needing existing assets.
Auffusion is a Text-to-Audio system that is able to generate audio from natural language prompts. The model is able to control various aspects of the audio, such as acoustic environment, material, pitch, and temporal order. It can also generate audio based on labels or be combined with an LLM model to generate descriptive audio prompts.
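A hedged sketch of how such a system might be driven: the prompt itself encodes environment, material, pitch and temporal order, while the generation call below is a placeholder rather than Auffusion's actual entry point.

```python
# Sketch of prompt-driven text-to-audio generation; `generate_audio` is a
# stand-in for the real inference call and simply returns silence here.
import numpy as np

def generate_audio(prompt: str, duration_s: float = 8.0, sample_rate: int = 16_000) -> np.ndarray:
    """Placeholder for the text-to-audio model; returns a silent waveform."""
    return np.zeros(int(duration_s * sample_rate), dtype=np.float32)

prompt = (
    "A wooden door creaks open in a large, echoing hall, "
    "then high-pitched footsteps on marble, followed by distant thunder."
)
waveform = generate_audio(prompt)  # in practice: decode and save as a .wav file
```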