AI Toolbox
A curated collection of 759 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

TF-GPH can stylistically blend together images with disparate visual elements!
FlowSAM can discover and segment moving objects in videos by combining the Segment Anything Model (SAM) with optical flow. It outperforms previous methods, achieving better object identity and sequence-level segmentation in both single- and multi-object scenarios.
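To make the flow-plus-SAM idea concrete, here is a minimal sketch assuming torchvision's RAFT model and the `segment_anything` package. The checkpoint path and random placeholder frames are stand-ins, and this illustrates the general idea rather than FlowSAM's actual pipeline.

```python
# Rough sketch of flow-driven segmentation: estimate optical flow between
# two frames, render it as an RGB image, and let SAM's automatic mask
# generator segment the coherent motion blobs. Not the official FlowSAM code.
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.utils import flow_to_image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Estimate optical flow between two consecutive frames (values in [-1, 1]).
raft = raft_large(weights=Raft_Large_Weights.DEFAULT).eval().to(device)
frame1 = torch.rand(1, 3, 384, 512, device=device) * 2 - 1  # placeholder frames
frame2 = torch.rand(1, 3, 384, 512, device=device) * 2 - 1
with torch.no_grad():
    flow = raft(frame1, frame2)[-1]  # final flow estimate, shape (1, 2, H, W)

# 2. Render the flow field as an RGB image so SAM can consume it.
flow_rgb = flow_to_image(flow)[0]                  # (3, H, W), uint8
flow_np = flow_rgb.permute(1, 2, 0).cpu().numpy()  # (H, W, 3) for SAM

# 3. Run SAM on the flow image; moving objects appear as coherent color
#    regions, so the resulting masks tend to follow moving objects.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
masks = SamAutomaticMaskGenerator(sam.to(device)).generate(flow_np)
print(f"found {len(masks)} candidate moving-object masks")
```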
DG-Mesh can reconstruct high-quality, time-consistent 3D meshes from a single video. It can also track the mesh vertices over time, which enables texture editing on dynamic objects.
AniClipart can turn static clipart images into high-quality animations. It uses Bézier curves for smooth motion and aligns movements with text prompts, improving how well the animation matches the text and maintains visual style.
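The Bézier part is easy to picture: each animated keypoint follows a smooth parametric curve. Below is a plain cubic Bézier evaluator with made-up control points; the formula is standard, but nothing here is taken from AniClipart's code.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3
    for an array of parameters t in [0, 1]."""
    t = np.asarray(t)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# One keypoint of the clipart glides along a smooth Bézier trajectory;
# the four 2D control points below are invented for illustration.
p0, p1, p2, p3 = map(np.array, ([0, 0], [10, 40], [50, 40], [60, 0]))
trajectory = cubic_bezier(p0, p1, p2, p3, np.linspace(0, 1, 24))  # 24 frames
print(trajectory.shape)  # (24, 2): one (x, y) position per frame
```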
CustomDiffusion360 brings camera viewpoint control to text-to-image models. Only caveat: it requires a 360-degree multi-view dataset of around 50 images per object to work.
StyleBooth is a unified style editing method supporting text-based, exemplar-based and compositional style editing. So basically, you can take an image and change its style by either giving it a text prompt or an example image.
InFusion can inpaint 3D Gaussian point clouds to restore missing 3D points for better visuals. It lets users change textures and add new objects, achieving high quality and efficiency.
IntrinsicAnything can recover object materials from arbitrary images and enables single-view image relighting.
VQ-Diffusion can generate high-quality images from text prompts using a vector quantized variational autoencoder and a conditional denoising diffusion model. It is up to fifteen times faster than traditional methods and handles complex scenes effectively.
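The vector-quantization step is what turns an image into the discrete tokens the diffusion model then denoises. Here is a minimal sketch of that nearest-codebook lookup; the shapes and codebook size are illustrative, not the paper's actual configuration.

```python
import torch

# Minimal VQ-VAE-style quantization: snap each continuous encoder feature
# to its nearest codebook entry, producing discrete token ids.
codebook = torch.randn(1024, 256)      # 1024 code vectors, 256-dim each
features = torch.randn(32 * 32, 256)   # encoder output for one 32x32 grid

dists = torch.cdist(features, codebook)  # (1024, 1024) pairwise distances
tokens = dists.argmin(dim=1)             # (1024,) discrete token ids
quantized = codebook[tokens]             # map ids back to continuous vectors

print(tokens.shape, quantized.shape)     # torch.Size([1024]) torch.Size([1024, 256])
```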
MOWA is a multiple-in-one image warping model that can be used for various tasks such as rectangling panoramic images, correcting rolling-shutter images, rotating images, rectifying fisheye images, and image retargeting.
Speaking of video, more research is being conducted on motion control. Peekaboo lets you precisely control the position, size, and trajectory of an object through bounding boxes.
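Peekaboo does this with masked attention inside the video diffusion model; the sketch below only covers the input side, turning a per-frame bounding-box trajectory into the binary spatial masks such an approach consumes. The trajectory values are invented for illustration.

```python
import torch

def boxes_to_masks(boxes, h, w):
    """Turn per-frame bounding boxes (x0, y0, x1, y1) into binary masks,
    the kind of spatial signal a masked-attention approach can use to pin
    an object's position, size, and trajectory."""
    masks = torch.zeros(len(boxes), h, w)
    for t, (x0, y0, x1, y1) in enumerate(boxes):
        masks[t, y0:y1, x0:x1] = 1.0
    return masks

# A made-up 8-frame trajectory: a 24x24 box slides left to right.
boxes = [(5 * t, 20, 5 * t + 24, 44) for t in range(8)]
masks = boxes_to_masks(boxes, h=64, w=64)
print(masks.shape, masks.sum(dim=(1, 2)))  # (8, 64, 64), 576 pixels per frame
```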
in2IN is a motion generation model that factors in both the overall interaction’s textual description and individual action descriptions of each person involved. This enhances motion diversity and enables better control over each person’s actions while preserving interaction coherence.
Ctrl-Adapter is a new framework that can be used to add diverse controls to any image or video diffusion model, enabling things like video control with sparse frames, multi-condition control, and video editing.
Video2Game can turn real-world videos into interactive game environments. It uses a neural radiance fields (NeRF) module for capturing scenes, a mesh module for faster rendering, and a physics module for realistic object interactions.
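A rough pipeline skeleton matching the three modules named above; every class and method name here is invented for illustration and is not Video2Game's actual API.

```python
# Hypothetical skeleton of a video-to-game pipeline with the three stages
# the blurb describes: NeRF capture, mesh extraction, physics attachment.
class Video2GamePipeline:
    def __init__(self, nerf, mesher, physics):
        self.nerf = nerf        # neural radiance field capturing the scene
        self.mesher = mesher    # bakes the NeRF into a fast-rendering mesh
        self.physics = physics  # assigns collision/physics to scene objects

    def build_environment(self, video_frames):
        scene = self.nerf.fit(video_frames)  # 1. capture scene geometry
        mesh = self.mesher.extract(scene)    # 2. mesh for real-time rendering
        return self.physics.attach(mesh)     # 3. interactive environment
```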
LoopGaussian can convert multi-view images of a stationary scene into authentic 3D cinemagraphs. The 3D cinemagraphs can be rendered from a novel viewpoint to obtain a natural, seamlessly loopable video.
ControlNet++ can improve image generation by ensuring that generated images match the given controls, like segmentation masks and depth maps. It shows better performance than its predecessor, ControlNet, with improvements of 7.9% in mIoU, 13.4% in SSIM, and 7.6% in RMSE.
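The core trick, roughly, is a consistency check: a frozen, pre-trained perception model re-extracts the condition from the generated image, and disagreement with the input control is penalized. Here is a hedged sketch for the segmentation-mask case, with a dummy one-layer stand-in for a real segmenter; none of these names come from the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def control_consistency_loss(generated, cond_mask, seg_model):
    """Sketch of the consistency idea: re-segment the generated image and
    penalize disagreement with the input mask, pushing generations to
    actually obey the control.
    generated: (B, 3, H, W) images; cond_mask: (B, H, W) int64 class ids."""
    logits = seg_model(generated)          # (B, num_classes, H, W)
    return F.cross_entropy(logits, cond_mask)

# Toy usage with a dummy "segmenter" (a single conv) just to show shapes.
seg_model = nn.Conv2d(3, 21, kernel_size=1)  # stand-in for e.g. a DeepLab
generated = torch.rand(2, 3, 64, 64)
cond_mask = torch.randint(0, 21, (2, 64, 64))
print(control_consistency_loss(generated, cond_mask, seg_model))
```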
PanFusion can generate 360-degree panorama images from a text prompt. The model is able to integrate additional constraints like room layout for customized panorama outputs.
MindBridge can reconstruct images from fMRI brain signals using a single model that works for different people. It achieves high accuracy even with limited data, making it effective for new subjects.
GoodDrag can improve the stability and image quality of drag editing with diffusion models. It reduces distortions by alternating between drag and denoising operations and introduces a new dataset, Drag100, for better quality assessment.
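The alternation itself is simple to sketch. The skeleton below interleaves drag and denoising steps as the blurb describes; `drag_step` and `denoise_step` are hypothetical placeholders, not the paper's functions.

```python
# Skeleton of an alternating drag-and-denoise schedule: instead of doing
# all drag edits first and denoising afterwards, drag steps and denoising
# steps are interleaved so drag-induced distortions get repaired early.
def alternating_drag(latent, handle_pts, target_pts, drag_step, denoise_step,
                     n_rounds=10, drags_per_round=3):
    for _ in range(n_rounds):
        for _ in range(drags_per_round):
            # move content near each handle point toward its target point
            latent, handle_pts = drag_step(latent, handle_pts, target_pts)
        # one diffusion denoising pass to clean up dragging artifacts
        latent = denoise_step(latent)
    return latent
```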
InstantMesh can generate high-quality 3D meshes from a single image in under 10 seconds. It uses advanced methods like multiview diffusion and sparse-view reconstruction, and it significantly outperforms other tools in both quality and speed.