AI Toolbox
A curated collection of 758 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

DAWN can generate talking-head videos from a single portrait and an audio clip. It produces lip movements and head poses quickly, making it effective for creating long video sequences.
DimensionX can generate photorealistic 3D and 4D scenes from a single image using controllable video diffusion.
SG-I2V can control object and camera motion in image-to-video generation using bounding boxes and trajectories.
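A sketch of how such control signals might be specified. The `sg_i2v` call and all of its argument names are hypothetical, for illustration only; consult the repository for the actual API.

```python
# Hypothetical usage sketch for SG-I2V-style motion control.
# The module name `sg_i2v` and all argument names below are
# assumptions, not the repo's confirmed interface.
from PIL import Image

image = Image.open("scene.png")

# One bounding box per controlled object, plus a per-frame trajectory
# of box centers describing where the object should move.
object_box = (120, 80, 260, 220)  # (x0, y0, x1, y1) in pixels
trajectory = [(190, 150), (230, 150), (270, 155), (310, 160)]  # per-frame centers

# video = sg_i2v.generate(image=image, boxes=[object_box],
#                         trajectories=[trajectory], num_frames=16)
```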
RayGauss can synthesize realistic novel views of 3D scenes using Gaussian-based ray casting. It renders high-quality images at 25 frames per second and avoids the rendering artifacts common in older methods.
CLoSD can control characters in physics-based simulations using text prompts. It can navigate to goals, strike objects, and switch between sitting and standing, all guided by simple instructions.
GIMM is a video frame interpolation method that uses generalizable implicit motion modelling to predict the motion between two input frames.
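For intuition, here is a generic flow-based interpolation sketch in plain PyTorch: given flows predicted by a motion model for an intermediate time t, both endpoint frames are backward-warped toward t and blended. This illustrates the mechanism that motion-modelling interpolators build on, not GIMM's exact architecture.

```python
import torch
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp a (B,C,H,W) frame by a (B,2,H,W) flow field."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame)   # (2,H,W), x first
    coords = grid.unsqueeze(0) + flow                        # shifted sampling points
    # Normalize coordinates to [-1, 1] for grid_sample.
    coords_x = 2 * coords[:, 0] / (w - 1) - 1
    coords_y = 2 * coords[:, 1] / (h - 1) - 1
    grid_n = torch.stack((coords_x, coords_y), dim=-1)       # (B,H,W,2)
    return F.grid_sample(frame, grid_n, align_corners=True)

def interpolate(f0, f1, flow_t0, flow_t1, t=0.5):
    """Blend both endpoint frames after warping them toward time t;
    the flows would come from a learned motion model such as GIMM's."""
    return (1 - t) * warp(f0, flow_t0) + t * warp(f1, flow_t1)
```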
Regional-Prompting-FLUX adds regional prompting capabilities to diffusion transformers like FLUX. It effectively manages complex prompts and works well with tools like LoRA and ControlNet.
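A hypothetical sketch of supplying per-region prompts; the pipeline call and argument names are assumptions for illustration, not the repo's confirmed API.

```python
# Hypothetical regional-prompting sketch. The `regional_prompts`
# argument and the pipeline call are assumptions, not confirmed API.
regional_prompts = {
    # region given as (x0, y0, x1, y1) fractions of the canvas
    (0.0, 0.0, 0.5, 1.0): "a knight in silver armor, detailed",
    (0.5, 0.0, 1.0, 1.0): "a dragon perched on a cliff, sunset",
}
base_prompt = "fantasy illustration, cinematic lighting"

# image = pipeline(prompt=base_prompt,
#                  regional_prompts=regional_prompts,
#                  height=1024, width=1024).images[0]
```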
AutoVFX can automatically create realistic visual effects in videos from a single video and natural language instructions.
Adaptive Caching can speed up video generation with Diffusion Transformers by caching important calculations. It can achieve up to 4.7 times faster video creation at 720p without losing quality.
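A minimal sketch of the underlying idea in plain PyTorch, not the paper's code: a block caches its residual output and reuses it on the next denoising step when its input has barely changed. The toy MLP block and the tolerance value are illustrative assumptions.

```python
import torch

class CachedBlock(torch.nn.Module):
    """Toy transformer-style block that reuses its last residual output
    when its input barely changed between denoising steps -- the core
    idea behind cache-based DiT acceleration."""

    def __init__(self, dim: int, tol: float = 1e-2):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim))
        self.tol = tol            # illustrative threshold, not the paper's value
        self._last_in = None
        self._last_res = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._last_in is not None:
            change = (x - self._last_in).abs().mean() / (x.abs().mean() + 1e-8)
            if change < self.tol:              # input is nearly unchanged:
                return x + self._last_res      # reuse the cached residual
        res = self.mlp(x)                      # otherwise recompute and cache
        self._last_in, self._last_res = x.detach(), res.detach()
        return x + res
```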
ZIM can generate precise matte masks from segmentation labels, enabling zero-shot image matting.
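ZIM itself maps segmentation labels to mattes end-to-end, but the snippet below illustrates the classic label-to-trimap step that matting pipelines build on; the band width is an illustrative assumption.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def mask_to_trimap(mask: np.ndarray, band: int = 10) -> np.ndarray:
    """Turn a hard segmentation mask into a matting trimap:
    255 = definite foreground, 0 = background, 128 = unknown band
    around the boundary where the soft matte must be estimated."""
    fg = binary_erosion(mask > 0, iterations=band)
    unknown = binary_dilation(mask > 0, iterations=band) & ~fg
    trimap = np.zeros(mask.shape, dtype=np.uint8)
    trimap[fg] = 255
    trimap[unknown] = 128
    return trimap
```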
Face Anon can anonymize faces in images while keeping original facial expressions and head positions. It uses diffusion models to achieve high-quality image results and can also perform face swapping tasks.
CityGaussianV2 can reconstruct large-scale scenes from multi-view RGB images with high accuracy.
Self-Supervised Any-Point Tracking by Contrastive Random Walks can track any point in a video using a self-supervised global matching transformer.
MOFT is a training-free video motion interpreter and controller. It extracts motion information from video diffusion models and guides the motion of generated videos without any retraining.
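A generic sketch of the mechanism that such training-free methods rely on: capturing intermediate activations of a pretrained denoiser with forward hooks. The toy 3D conv stack here stands in for a video diffusion U-Net; MOFT's specific feature selection and motion-channel filtering are described in the paper.

```python
import torch

features = {}

def save_output(name):
    """Forward hook that stashes a module's output for later inspection."""
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

denoiser = torch.nn.Sequential(          # stand-in for a video diffusion U-Net
    torch.nn.Conv3d(4, 8, 3, padding=1),
    torch.nn.SiLU(),
    torch.nn.Conv3d(8, 4, 3, padding=1))
denoiser[1].register_forward_hook(save_output("mid"))

latents = torch.randn(1, 4, 8, 32, 32)   # (B, C, frames, H, W)
denoiser(latents)
motion_features = features["mid"]        # temporal variation lives in here
```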
PF3plat can generate photorealistic images and accurate camera positions from uncalibrated image collections.
ScalingConcept can enhance or suppress existing concepts in images and audio without adding new elements. It can generate poses, enhance object stitching and reduce fuzziness in anime productions.
NoPoSplat can reconstruct 3D Gaussian scenes from unposed multi-view images. It achieves real-time reconstruction and high-quality images, especially when there are few input views.
ControlAR adds spatial controls such as edge maps, depth maps, and segmentation masks to autoregressive image models like LlamaGen.
State-of-the-art diffusion models are typically trained on square images. FiT is a transformer architecture designed for generating images with unrestricted resolutions and aspect ratios (similar to what Sora does). By treating an image as a sequence of variable-length tokens, it enables a flexible training strategy that adapts to diverse aspect ratios during both training and inference, improving resolution generalization and eliminating the biases introduced by image cropping.
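As a concrete illustration of the idea, here is a minimal sketch in plain PyTorch (not FiT's actual code) of patchifying images of mixed aspect ratios into variable-length token sequences and packing them into one padded batch with an attention mask:

```python
import torch

def patchify(img: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Flatten a (C,H,W) image of any aspect ratio into a sequence of
    patch tokens, as a flexible-resolution transformer would consume."""
    c, h, w = img.shape
    assert h % patch == 0 and w % patch == 0
    return (img.unfold(1, patch, patch)            # split height into patches
               .unfold(2, patch, patch)            # split width into patches
               .permute(1, 2, 0, 3, 4)
               .reshape(-1, c * patch * patch))    # (N_tokens, C*p*p)

def pack(seqs, max_len):
    """Pad variable-length token sequences into one batch plus an
    attention mask, so mixed aspect ratios can share a training batch."""
    dim = seqs[0].shape[1]
    batch = torch.zeros(len(seqs), max_len, dim)
    mask = torch.zeros(len(seqs), max_len, dtype=torch.bool)
    for i, s in enumerate(seqs):
        batch[i, : len(s)] = s
        mask[i, : len(s)] = True
    return batch, mask

wide = patchify(torch.randn(3, 128, 256))      # 8 x 16 = 128 tokens
tall = patchify(torch.randn(3, 256, 128))      # 16 x 8 = 128 tokens
square = patchify(torch.randn(3, 224, 224))    # 14 x 14 = 196 tokens
batch, mask = pack([wide, tall, square], max_len=196)
```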
From Text to Pose to Image can generate high-quality images from text prompts by first creating poses and then using them to guide image generation. This method improves control over human poses and enhances image fidelity in diffusion models.
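A hedged sketch of the two-stage idea, using an off-the-shelf OpenPose ControlNet from diffusers for the image stage; `sample_pose_image` is a hypothetical stand-in for the paper's text-to-pose model, which is not sketched here.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Stage 2 of the text -> pose -> image idea, sketched with a standard
# OpenPose ControlNet. `sample_pose_image(prompt)` below is a
# hypothetical stand-in for the paper's stage-1 text-to-pose model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

prompt = "a dancer mid-leap on a rooftop at dusk"
pose_image = sample_pose_image(prompt)   # hypothetical text-to-pose stage
image = pipe(prompt, image=pose_image, num_inference_steps=30).images[0]
```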