AI Toolbox
A curated collection of 759 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.

StableMoFusion is a diffusion-based method for human motion generation that eliminates foot skating and produces stable, efficient animations, making it suitable for real-time scenarios such as virtual characters and humanoid robots.
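Foot skating is typically measured as a foot that keeps sliding horizontally while it should be planted on the ground. A minimal sketch of such a check, with made-up joint indices and thresholds rather than anything from the paper:

```python
# Hypothetical foot-skate check on a generated motion clip.
# `joints` is assumed to be a (frames, joints, 3) array in meters, y-up;
# foot joint indices and thresholds are illustrative only.
import numpy as np

def foot_skate_ratio(joints, foot_ids=(10, 11), contact_height=0.05, max_slide=0.01):
    """Fraction of contact frames where a grounded foot still slides horizontally."""
    feet = joints[:, list(foot_ids), :]                 # (T, F, 3)
    in_contact = feet[:-1, :, 1] < contact_height       # foot close to the floor
    step = feet[1:, :, [0, 2]] - feet[:-1, :, [0, 2]]    # horizontal displacement per frame
    slide = np.linalg.norm(step, axis=-1) > max_slide    # moved more than ~1 cm while planted
    skating = in_contact & slide
    return skating.sum() / max(in_contact.sum(), 1)

# A perfectly planted foot yields a ratio of 0.0.
motion = np.zeros((120, 24, 3))
print(foot_skate_ratio(motion))
```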
SwapTalk can transfer the facial features of a user's avatar onto a video while lip-syncing it to chosen audio. It improves video quality and lip-sync accuracy, making the results more consistent than other methods.
An Empty Room is All We Want can remove furniture from indoor panorama images so thoroughly that even Jordan Peterson would be proud. Perfect for seeing what your apartment, or the one you're looking at, would look like without all the clutter.
DreamScene4D can generate dynamic 4D scenes from single videos. It tracks object motion and handles complex movements, allowing for accurate 2D point tracking by converting 3D paths to 2D.
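The 3D-to-2D conversion it relies on is ordinary camera projection. A tiny sketch with assumed pinhole intrinsics and a made-up trajectory (DreamScene4D recovers the actual cameras and motion from the input video):

```python
# Project a tracked 3D point trajectory into image space with a pinhole camera.
import numpy as np

def project_track(points_3d, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """points_3d: (T, 3) camera-space positions -> (T, 2) pixel coordinates."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=-1)

# A point moving sideways 2 m in front of the camera traces a horizontal 2D track.
track_3d = np.stack([np.linspace(-0.5, 0.5, 10),
                     np.zeros(10),
                     np.full(10, 2.0)], axis=-1)
print(project_track(track_3d)[:3])
```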
Pair Customization can customize text-to-image models by learning style differences from a single image pair. It separates style and content into different weight spaces, allowing for effective style application without overfitting to specific images.
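As a rough mental model of the separated weight spaces (full-rank deltas here where the paper uses low-rank updates inside a diffusion model, and every name below is made up):

```python
# Illustrative only: keep "style" and "content" updates in separate deltas that are
# added to a frozen base weight, so the style delta can be applied to new content.
import numpy as np

rng = np.random.default_rng(0)
d = 64

W_base = rng.normal(size=(d, d))                  # frozen pretrained weight
delta_content = rng.normal(size=(d, d)) * 0.01    # learned from the content image
delta_style = rng.normal(size=(d, d)) * 0.01      # learned from the style difference

def customized_weight(style_scale=1.0, use_content=False):
    """Compose the frozen base weight with the separated deltas at inference time."""
    W = W_base.copy()
    if use_content:
        W += delta_content
    W += style_scale * delta_style                # style applied independently of content
    return W

# Apply only the style delta, at reduced strength, to generate new content in that style.
print(customized_weight(style_scale=0.7, use_content=False).shape)
```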
StoryDiffusion can generate long sequences of images and videos that maintain consistent content across the generated frames. The method can convert a text-based story into a video with smooth transitions and consistent subjects.
X-Oscar can generate high-quality 3D avatars from text prompts. It uses a step-by-step process for geometry, texture, and animation, while addressing issues like low quality and oversaturation through advanced techniques.
Invisible Stitch can inpaint missing depth information in a 3D scene, resulting in improved geometric coherence and smoother transitions between frames.
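For a sense of what depth inpainting means, here is a naive nearest-neighbor hole fill on a depth map; Invisible Stitch does this with a learned model rather than a heuristic like the one below:

```python
# Fill missing depth values with the nearest valid depth (naive baseline, not the paper's method).
import numpy as np
from scipy import ndimage

def fill_depth_nearest(depth, missing):
    """depth: (H, W) array; missing: boolean (H, W) mask of invalid pixels."""
    # For each pixel, find the index of the nearest valid (non-missing) pixel.
    _, idx = ndimage.distance_transform_edt(missing, return_indices=True)
    return depth[tuple(idx)]

depth = np.full((64, 64), 2.0, dtype=np.float32)
missing = np.zeros((64, 64), dtype=bool)
depth[20:30, 20:30] = 0.0
missing[20:30, 20:30] = True
print(fill_depth_nearest(depth, missing)[25, 25])   # 2.0, copied from the nearest valid pixel
```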
VimTS can extract text from images and videos, improving generalization across different types of media.
DGE is a Gaussian Splatting method that can be used to edit 3D objects and scenes based on text prompts.
Anywhere can place any object from an input image into suitable and diverse locations in an output image. Perfect for product placement.
Make-it-Real can recognize and describe materials using GPT-4V, helping to build a detailed material library. It aligns materials with 3D object parts and creates SVBRDF materials from albedo maps, improving the realism of 3D assets.
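For context, an SVBRDF material is just a set of per-pixel texture maps that a physically based renderer consumes. A minimal sketch with placeholder roughness and metallic values; Make-it-Real predicts these maps by matching materials from its GPT-4V-built library:

```python
# Assemble an SVBRDF material as a dict of texture maps (placeholder values, illustrative only).
import numpy as np

def svbrdf_from_albedo(albedo, roughness=0.6, metallic=0.0):
    """albedo: (H, W, 3) in [0, 1] -> dict of SVBRDF texture maps."""
    h, w, _ = albedo.shape
    return {
        "albedo":    albedo,
        "roughness": np.full((h, w, 1), roughness, dtype=np.float32),
        "metallic":  np.full((h, w, 1), metallic, dtype=np.float32),
        "normal":    np.tile(np.array([0.5, 0.5, 1.0], dtype=np.float32), (h, w, 1)),  # flat normal map
    }

material = svbrdf_from_albedo(np.ones((256, 256, 3), dtype=np.float32) * 0.8)
print({k: v.shape for k, v in material.items()})
```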
ConsistentID can generate diverse personalized ID images from text prompts using just one reference image. It improves identity preservation with a facial prompt generator and an ID-preservation network, ensuring high quality and variety in the generated images.
On the pose reconstruction front we have TokenHMR, which can extract human pose and shape from a single image.
SVA can generate sound effects and background music for videos based on a single key frame and a text prompt.
MaGGIe can efficiently predict high-quality human instance mattes from coarse binary masks for both image and video input. The method is able to output all instance mattes simultaneously without exploding memory and latency, making it suitable for real-time applications.
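Once you have per-instance alpha mattes, using them is plain alpha compositing. The sketch below uses synthetic arrays, since MaGGIe's contribution is predicting the mattes themselves from coarse masks:

```python
# Composite instance foregrounds onto a background with soft alpha mattes.
import numpy as np

def composite(background, foregrounds, alphas):
    """background: (H, W, 3); foregrounds: list of (H, W, 3); alphas: list of (H, W, 1) in [0, 1]."""
    out = background.astype(np.float32)
    for fg, alpha in zip(foregrounds, alphas):
        out = alpha * fg + (1.0 - alpha) * out   # per-pixel soft blend, instance by instance
    return out

h, w = 120, 160
bg = np.zeros((h, w, 3), dtype=np.float32)
fg = np.ones((h, w, 3), dtype=np.float32)
alpha = np.zeros((h, w, 1), dtype=np.float32)
alpha[40:80, 60:100] = 0.75                      # a soft matte for one instance
print(composite(bg, [fg], [alpha]).max())
```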
Similar to ConsistentID, PuLID is a tuning-free ID customization method for text-to-image generation. This one can also be used to edit images generated by diffusion models by adding or changing the text prompt.
CharacterFactory can generate an endless stream of characters that stay consistent across different images and videos. It uses GANs and word embeddings derived from celebrity names to lock in each identity, making it easy to integrate with other models.
Parts2Whole can generate customized human portraits from multiple reference images, including pose images and various aspects of human appearance. The method is able to generate human images conditioned on selected parts from different humans as control conditions, allowing you to create images with specific combinations of facial features, hair, clothes, etc.
PhysDreamer is a physics-based approach that enables you to poke, push, pull and throw objects in a virtual 3D environment and they will react in a physically plausible manner.