AI Toolbox
A curated collection of 759 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

Text2Place can realistically place any human or object into diverse backgrounds. This enables scene hallucination (generating scenes compatible with a given human pose), text-based editing of the subject, and placing multiple persons into a scene.
One-DM can generate handwritten text from a single reference sample, mimicking the style of the input. It captures unique writing patterns and works well across multiple languages.
FlexiClip can generate smooth animations from clipart images while keeping key points in the right place.
Diffusion2GAN is a method to distill a complex multistep diffusion model into a single-step conditional GAN student, dramatically accelerating inference while preserving image quality. This enables one-step 512px/1024px image generation at interactive speeds of 0.09/0.16 seconds, as well as 4K image upscaling!
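Conceptually, the distillation pairs a noise/condition input with the teacher's expensive multistep output and trains a one-step student to match it with a reconstruction plus adversarial loss. The sketch below illustrates that loop with toy placeholder networks and an MSE term standing in for the paper's perceptual loss; it is not the Diffusion2GAN code.

```python
# Minimal sketch of diffusion-to-GAN distillation (toy modules, not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneStepGenerator(nn.Module):  # hypothetical stand-in for the student network
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(dim, 3, 3, padding=1))
    def forward(self, z, cond):
        return self.net(z)  # a real model would also inject the condition

class Discriminator(nn.Module):  # hypothetical stand-in critic
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, dim, 4, 2, 1), nn.LeakyReLU(0.2),
                                 nn.Conv2d(dim, 1, 4, 2, 1))
    def forward(self, x):
        return self.net(x).mean(dim=(1, 2, 3))

def distill_step(student, disc, teacher_sample, opt_g, opt_d, z, cond):
    """One training step: regress to the teacher's multistep output and fool the critic."""
    with torch.no_grad():
        target = teacher_sample(z, cond)  # expensive multistep diffusion sample
    fake = student(z, cond)
    # discriminator update (non-saturating GAN loss)
    opt_d.zero_grad()
    d_loss = F.softplus(-disc(target)).mean() + F.softplus(disc(fake.detach())).mean()
    d_loss.backward(); opt_d.step()
    # generator update: reconstruction (perceptual in the paper) + adversarial term
    opt_g.zero_grad()
    g_loss = F.mse_loss(fake, target) + 0.1 * F.softplus(-disc(fake)).mean()
    g_loss.backward(); opt_g.step()
    return g_loss.item()
```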
LinFusion can generate high-resolution images up to 16K in just one minute using a single GPU. It improves performance on various Stable Diffusion versions and works with pre-trained components like ControlNet and IP-Adapter.
ViewCrafter can generate high-quality 3D views from single or few images using a video diffusion model. It allows for precise camera control and is useful for real-time rendering and turning text into 3D scenes.
CSGO can perform image-driven style transfer and text-driven stylized synthesis. It uses a large dataset with 210k image triplets to improve style control in image generation.
HumanVid can generate videos from a character photo while allowing users to control both human and camera motions. It introduces a large-scale dataset that combines high-quality real-world and synthetic data, achieving state-of-the-art performance in camera-controllable human image animation.
Follow-Your-Canvas can outpaint videos at higher resolutions, from 512x512 to 1152x2048.
LogoMotion can turn logos from layered PDF files into content-aware HTML canvas animations. Very cool!
KEEP can enhance video face super-resolution by maintaining consistency across frames. It uses Kalman filtering to improve facial details, working well on both synthetic and real-world videos.
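The core idea of carrying facial detail across frames with a recursive filter can be illustrated with a constant-gain, Kalman-style update over per-frame features. The sketch below is a simplification for intuition, not KEEP's actual architecture.

```python
# Kalman-style temporal fusion of per-frame features (illustrative, scalar gains).
import torch

def kalman_fuse(frame_feats, process_var=1e-2, obs_var=1e-1):
    """Fuse a sequence of per-frame feature tensors [T, C, H, W] so that each
    frame's estimate borrows information from the previous frames."""
    est = frame_feats[0].clone()
    var = torch.ones_like(est) * obs_var
    fused = [est]
    for obs in frame_feats[1:]:
        var = var + process_var        # predict: uncertainty grows over time
        gain = var / (var + obs_var)   # update: how much to trust the new frame
        est = est + gain * (obs - est)
        var = (1.0 - gain) * var
        fused.append(est)
    return torch.stack(fused)
```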
tps-inbetween can generate high-quality intermediate frames for animation line art. It effectively connects lines and fills in missing details, even during fast movements, using a method that models keypoint relationships between frames.
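Assuming the keypoint relationships drive a thin-plate-spline warp (as the "tps" in the name suggests), a toy inbetweening step might look like the sketch below; the simple linear keypoint interpolation and function names are illustrative, not the paper's pipeline.

```python
# Toy keypoint-driven inbetweening: interpolate matched keypoints to time t,
# then warp frame A toward that configuration with a thin-plate-spline deformation.
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.ndimage import map_coordinates

def inbetween(frame_a, kps_a, kps_b, t=0.5):
    """frame_a: [H, W] line art; kps_a, kps_b: [N, 2] matched (y, x) keypoints."""
    kps_t = (1 - t) * kps_a + t * kps_b  # keypoint positions at the intermediate time
    # TPS that maps intermediate keypoints back to their source locations in frame A
    warp = RBFInterpolator(kps_t, kps_a, kernel='thin_plate_spline')
    h, w = frame_a.shape
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing='ij'), axis=-1)
    src = warp(grid.reshape(-1, 2)).reshape(h, w, 2)
    # sample frame A at the warped coordinates to get the inbetween frame
    return map_coordinates(frame_a, [src[..., 0], src[..., 1]], order=1)
```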
STA-V2A can generate high-quality audio from videos by extracting important features and using text for guidance. It uses a Latent Diffusion Model for audio creation and a new metric called Audio-Audio Align to measure how well the audio matches the video timing.
TVG can create smooth transition videos between two images without needing training. It uses diffusion models and Gaussian Process Regression for high-quality results and adds controls for better timing.
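A minimal sketch of the Gaussian Process Regression piece: fit a GP over time to the two endpoint latents and query intermediate frames. This uses scikit-learn and is illustrative only, not the TVG implementation.

```python
# GPR-based interpolation between two image latents over time (illustrative).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gpr_latent_path(latent_a, latent_b, num_frames=16, length_scale=0.5):
    """latent_a / latent_b: flattened latent vectors of the two endpoint images."""
    ts = np.array([[0.0], [1.0]])                 # endpoint times
    values = np.stack([latent_a, latent_b])       # shape [2, D]
    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=length_scale))
    gpr.fit(ts, values)
    query = np.linspace(0.0, 1.0, num_frames)[:, None]
    return gpr.predict(query)                     # [num_frames, D] latent path
```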
Iterative Object Count Optimization can improve how accurately text-to-image diffusion models generate the number of objects requested in the prompt.
SparseCraft can reconstruct 3D shapes and appearances from just three colored images. It uses a Signed Distance Function (SDF) and a radiance field, achieving fast training times of under 10 minutes without needing pretrained models.
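To give a feel for how an SDF and a radiance field combine during rendering, here is a simplified, NeuS-style ray compositing sketch; SparseCraft's actual formulation and regularization differ.

```python
# Simplified SDF-based volume rendering along one ray (illustrative only).
import torch

def render_ray(sdf_vals, colors, inv_s=64.0):
    """sdf_vals: [N] signed distances at samples along a ray (near to far);
    colors: [N, 3] radiance at those samples."""
    cdf = torch.sigmoid(sdf_vals * inv_s)
    # opacity is high where the SDF crosses zero between consecutive samples
    alpha = ((cdf[:-1] - cdf[1:]) / (cdf[:-1] + 1e-6)).clamp(0.0, 1.0)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha]), dim=0)[:-1]
    weights = trans * alpha
    return (weights[:, None] * colors[:-1]).sum(dim=0)  # composited RGB for the ray
```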
MagicFace can generate high-quality images of people in any style without needing training. It uses special attention methods for precise attribute alignment and feature injection, working for both single and multi-concept customization.
Generative Photomontage can combine parts of multiple AI-generated images using a brush tool. It lets users create new appearance combinations, correct shapes and artifacts, and improve prompt alignment, outperforming existing image blending methods.
Filtered Guided Diffusion shows that image-to-image translation and editing doesn't necessarily require additional training. FGD simply applies a filter to the input of each diffusion step, adapted based on the output of the previous step, which makes the approach easy to implement.
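A rough sketch of what such filter-based guidance can look like in a sampling loop, with a Gaussian blur as the filter and a placeholder one-step denoiser; this illustrates the idea rather than the official FGD implementation.

```python
# Filter-based guidance during diffusion sampling (illustrative, placeholder sampler).
import torch
import torch.nn.functional as F

def lowpass(x, k=9, sigma=3.0):
    """Simple Gaussian blur used here as the guiding filter."""
    ax = torch.arange(k) - k // 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).to(x)
    kern = (g[:, None] * g[None, :]).expand(x.shape[1], 1, k, k).contiguous()
    return F.conv2d(x, kern, padding=k // 2, groups=x.shape[1])

def guided_sample(denoise_step, x_t, reference, timesteps, strength=0.5):
    """denoise_step(x, t) -> (x_prev, x0_pred) is a stand-in for one sampler step."""
    for t in timesteps:
        x_prev, x0_pred = denoise_step(x_t, t)
        # pull the low-frequency structure of the prediction toward the reference,
        # keeping the prediction's own high-frequency detail
        guided_x0 = x0_pred + strength * (lowpass(reference) - lowpass(x0_pred))
        x_t = x_prev + (guided_x0 - x0_pred)  # shift the next step's input accordingly
    return x_t
```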