AI Toolbox
A curated collection of 759 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

Matryoshka Diffusion Models can generate high-quality images and videos using a NestedUNet architecture that denoises inputs at multiple resolutions jointly. This enables strong results at resolutions up to 1024x1024 pixels and strong zero-shot generalization.
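For intuition, here is a minimal PyTorch sketch of that nested idea, assuming a toy two-branch model (not the paper's implementation): the low-resolution branch denoises first, and its features are upsampled into the high-resolution branch so both scales are denoised in one forward pass.

```python
# Toy sketch of nested multi-resolution denoising; all module names and
# sizes here are illustrative assumptions, not Matryoshka Diffusion's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNestedDenoiser(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.low = nn.Conv2d(3, ch, 3, padding=1)        # inner (low-res) branch
        self.low_out = nn.Conv2d(ch, 3, 3, padding=1)
        self.high = nn.Conv2d(3 + ch, ch, 3, padding=1)  # outer branch sees low-res features
        self.high_out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x_low: torch.Tensor, x_high: torch.Tensor):
        # Denoise the low-resolution input first.
        h_low = F.silu(self.low(x_low))
        eps_low = self.low_out(h_low)
        # Upsample low-res features and fuse them into the high-res branch,
        # so both resolutions are denoised jointly.
        h_up = F.interpolate(h_low, size=x_high.shape[-2:], mode="nearest")
        h_high = F.silu(self.high(torch.cat([x_high, h_up], dim=1)))
        return eps_low, self.high_out(h_high)

model = TinyNestedDenoiser()
eps64, eps256 = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 256, 256))
print(eps64.shape, eps256.shape)  # noise predictions at both resolutions
```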
DiffComplete can complete 3D shapes from incomplete scans using a diffusion-based method.
Puppet-Master can create realistic motion in videos from a single image using simple drag controls. It uses a fine-tuned video diffusion model with an all-to-first attention mechanism to produce high-quality videos.
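A rough sketch of what an all-to-first attention step could look like, assuming simple (frames, tokens, dim) tensors; this illustrates the idea, not Puppet-Master's actual code:

```python
# Every frame's queries attend only to the first frame's keys/values,
# which keeps later frames anchored to the reference image.
import torch
import torch.nn.functional as F

def all_to_first_attention(q, k, v):
    """q, k, v: (frames, tokens, dim). Each frame attends to frame 0."""
    f, t, d = q.shape
    k0 = k[0].expand(f, t, d)  # broadcast first frame's keys to all frames
    v0 = v[0].expand(f, t, d)  # ... and its values
    return F.scaled_dot_product_attention(q, k0, v0)

q, k, v = (torch.randn(8, 64, 32) for _ in range(3))
print(all_to_first_attention(q, k, v).shape)  # torch.Size([8, 64, 32])
```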
Generative Camera Dolly can regenerate a video from any chosen perspective. Still very early, but imagine being able to change any shot or angle in a video after it’s been recorded!
Sprite-Decompose can break down animated graphics into sprites using videos and box outlines.
MILS can generate captions for images, videos, and audio without any training. It achieves top performance in zero-shot captioning and improves text-to-image generation, allowing for creative uses across different media types.
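The training-free recipe can be pictured as a propose-score-feedback loop. The skeleton below is hypothetical: `propose_captions` and `clip_score` are placeholder names for an LLM call and an off-the-shelf similarity scorer, not real MILS APIs.

```python
# Hedged sketch of a generator-scorer loop in the spirit of MILS:
# an LLM proposes candidate captions, a scorer (e.g. CLIP similarity)
# ranks them against the image, and the best ones are fed back.
def propose_captions(feedback: list[str], n: int = 8) -> list[str]:
    raise NotImplementedError  # call your LLM of choice here

def clip_score(image, caption: str) -> float:
    raise NotImplementedError  # e.g. cosine similarity of CLIP embeddings

def mils_style_caption(image, rounds: int = 5, keep: int = 3) -> str:
    best: list[tuple[float, str]] = []
    for _ in range(rounds):
        candidates = propose_captions([c for _, c in best])
        scored = sorted(((clip_score(image, c), c) for c in candidates), reverse=True)
        best = scored[:keep]  # feed the highest-scoring captions back
    return best[0][1]
```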
IPAdapter-Instruct can efficiently combine natural-image conditioning with “Instruct” prompts! It enables users to switch between various interpretations of the same image, such as style transfer and object extraction.
MeshAvatar can generate high-quality triangular human avatars from multi-view videos. The avatars can be edited, manipulated, and relit.
MeshAnything V2 can generate 3D meshes from point clouds, meshes, images, text and more.
Lumina-mGPT can create photorealistic images from text and handle different visual and language tasks! It uses a multimodal autoregressive transformer, making it possible to control image generation, perform segmentation, estimate depth, and answer visual questions over multiple turns.
Feature Splatting can manipulate both the appearance and the physical properties of objects in a 3D scene using text prompts.
VAR-CLIP can create detailed fantasy images that closely match text descriptions by combining Visual Auto-Regressive modeling with CLIP! It uses CLIP text embeddings to guide image generation and trains on a large image-text dataset to keep results well aligned with prompts.
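As a rough illustration of conditioning an autoregressive image-token model on a text embedding (heavily simplified: VAR itself predicts next scales rather than single next tokens, and every dimension here is a made-up assumption):

```python
# Toy text-conditioned autoregressive model: a projected CLIP text
# embedding is prepended as a prefix "token" so image-token prediction
# is conditioned on the prompt. Not VAR-CLIP's actual architecture.
import torch
import torch.nn as nn

class TextConditionedAR(nn.Module):
    def __init__(self, vocab: int = 1024, dim: int = 64):
        super().__init__()
        self.text_proj = nn.Linear(512, dim)   # 512 = CLIP text embedding size
        self.tok = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)

    def forward(self, text_emb, image_tokens):
        prefix = self.text_proj(text_emb).unsqueeze(1)         # (B, 1, dim)
        x = torch.cat([prefix, self.tok(image_tokens)], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(x.shape[1])
        return self.head(self.body(x, mask=mask))              # next-token logits

model = TextConditionedAR()
logits = model(torch.randn(2, 512), torch.randint(0, 1024, (2, 16)))
print(logits.shape)  # torch.Size([2, 17, 1024])
```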
CityGaussian can render large-scale 3D scenes in real-time using a divide-and-conquer training approach and Level-of-Detail strategy. It achieves high-quality rendering at an average speed of 36 FPS on an A100 GPU.
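The Level-of-Detail idea can be sketched in a few lines: draw far-away scene blocks from a coarser, more compressed Gaussian set. The thresholds and block abstraction below are illustrative assumptions, not CityGaussian's actual values.

```python
# Minimal distance-based LoD pick: nearby blocks get full-detail
# Gaussians, distant blocks a coarser set.
import math

def pick_lod(block_center, camera_pos, thresholds=(50.0, 150.0)):
    """Return 0 (full detail), 1, or 2 (coarsest) from camera distance."""
    d = math.dist(block_center, camera_pos)
    for lod, t in enumerate(thresholds):
        if d < t:
            return lod
    return len(thresholds)

print(pick_lod((0, 0, 0), (10, 0, 0)))   # 0: near block, full detail
print(pick_lod((0, 0, 0), (400, 0, 0)))  # 2: distant block, coarsest set
```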
Perm can generate and manipulate 3D hairstyles. It enables applications such as 3D hair parameterization, hairstyle interpolation, single-view hair reconstruction, and hair-conditioned image generation.
SV4D 2.0 can generate high-quality 4D models and videos from a reference video.
SEG can improve image generation for SDXL by smoothing the self-attention energy landscape! By blurring the attention queries with a Gaussian kernel, it boosts quality without relying on a larger guidance scale, leading to better results with fewer side effects.
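A minimal sketch of the core trick, assuming a 1D Gaussian blur over the query tokens (the paper operates on 2D self-attention; the names and kernel settings here are illustrative):

```python
# Blur the self-attention *queries* with a Gaussian kernel before
# computing attention, which smooths the attention energy landscape.
import torch
import torch.nn.functional as F

def gaussian_kernel1d(size: int = 9, sigma: float = 2.0) -> torch.Tensor:
    x = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    k = torch.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blurred_query_attention(q, k, v, sigma: float = 2.0):
    """q, k, v: (batch, tokens, dim). Queries are blurred along tokens."""
    b, t, d = q.shape
    kern = gaussian_kernel1d(sigma=sigma).to(q).view(1, 1, -1)
    kern = kern.expand(d, 1, -1).contiguous()  # depthwise 1D convolution
    q_blur = F.conv1d(q.transpose(1, 2), kern, padding=kern.shape[-1] // 2, groups=d)
    return F.scaled_dot_product_attention(q_blur.transpose(1, 2), k, v)

q, k, v = (torch.randn(1, 64, 32) for _ in range(3))
print(blurred_query_attention(q, k, v).shape)  # torch.Size([1, 64, 32])
```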
SMooDi can generate stylized motion from text prompts and style motion sequences.
Interactive3D can generate high-quality 3D objects that users can easily modify. It allows for adding and removing parts, dragging objects, and changing shapes.
XHand can generate high-fidelity hand shapes and textures in real-time, enabling expressive hand avatars for virtual environments.
DreamMover can generate high-quality intermediate images and short videos from image pairs with large motion. It uses a flow estimator based on diffusion models to keep details and ensure consistency between frames and input images.
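For intuition, a toy version of flow-based in-betweening looks like this; DreamMover's actual pipeline estimates the flow with a diffusion-model-based estimator, whereas here the flow is simply assumed to be given:

```python
# Warp both endpoint images toward time t along a flow field and blend.
# The warp is an approximate backward-sampling warp; everything here is
# an illustrative sketch, not the paper's pipeline.
import torch
import torch.nn.functional as F

def warp(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """img: (1, C, H, W), flow: (1, 2, H, W) in pixels."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float() + flow[0].permute(1, 2, 0)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1
    return F.grid_sample(img, grid.unsqueeze(0), align_corners=True)

def interpolate_pair(img_a, img_b, flow_a_to_b, t: float = 0.5):
    fwd = warp(img_a, t * flow_a_to_b)        # move A forward by t
    bwd = warp(img_b, (t - 1) * flow_a_to_b)  # move B backward by 1 - t
    return (1 - t) * fwd + t * bwd            # linear blend of the two warps

mid = interpolate_pair(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                       torch.zeros(1, 2, 64, 64))
print(mid.shape)  # torch.Size([1, 3, 64, 64])
```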