AI Art Weekly #90

Hello there, my fellow dreamers, and welcome to issue #90 of AI Art Weekly! πŸ‘‹

Looks like AI research isn’t slowing down anytime soon. I skimmed through 244 papers for you this week and picked the 19 most interesting ones.

With that much research, it's getting hard to keep track of everything, so I'm working on a searchable index of all past papers with the option to filter by category and code availability. Stay tuned for that!

In this issue:

  • 3D: WildGaussians, Tailor3D, MeshAvatar, 3D Gaussian Ray Tracing, RodinHD, PICA
  • Motion: Infinite Motion, CrowdMoGen
  • 4D: 4DiM, Segment Any 4D Gaussians
  • Image: AuraFlow v0.1, ColorPeel, HumanRefiner, Minutes to Seconds, PartCraft, Still-Moving
  • Video: Live2Diff, GIMM
  • Audio: ReWaS, MuseBarControl
  • and more!

Cover Challenge 🎨

Theme: hybrids
42 submissions by 27 artists
AI Art Weekly Cover Art Challenge hybrids submission by onchainsherpa
πŸ† 1st: @onchainsherpa
AI Art Weekly Cover Art Challenge hybrids submission by amorvobiscum
πŸ₯ˆ 2nd: @amorvobiscum
AI Art Weekly Cover Art Challenge hybrids submission by dolma33
πŸ₯‰ 3rd: @dolma33
AI Art Weekly Cover Art Challenge hybrids submission by elfearsfoxsox
🧑 4th: @elfearsfoxsox

News & Papers

3D

WildGaussians: 3D Gaussian Splatting in the Wild

WildGaussians is a new 3D Gaussian Splatting method that can handle occlusions and appearance changes. It achieves real-time rendering speeds and handles in-the-wild data better than prior methods.

WildGaussians example

Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

Tailor3D can create customized 3D assets from text, single images, or dual-side images. The method also supports editing the inputs through additional text prompts.

Tailor3D examples

MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

MeshAvatar can generate high-quality triangular human avatars from multi-view videos. The avatars can be edited, manipulated, and relit.

MeshAvatar applications

3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes

3D Gaussian Ray Tracing brings ray tracing support to 3D Gaussian Splats. The method handles large numbers of semi-transparent particles and is well suited for rendering from highly distorted cameras, making it a great fit for robotics.

3D Gaussian Ray Tracing example

RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

RodinHD can generate high-fidelity 3D avatars from a portrait image. The method captures intricate details such as hairstyles and generalizes to in-the-wild portrait inputs.

RodinHD examples

PICA: Physics-Integrated Clothed Avatar

PICA can generate high-fidelity animatable clothed human avatars with physics-accurate dynamics, even for loose clothing, from multi-view videos.

PICA examples

Motion

Infinite Motion: Extended Motion Generation via Long Text Instructions

Infinite Motion can generate long-duration motion from text instructions of arbitrary length! The model also supports precise editing of local segments within generated sequences, offering fine-grained control and flexibility in motion synthesis.

Infinite Motion examples

CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

CrowdMoGen can generate crowd motions based on a text prompt! The model efficiently synthesizes the required collective motions from holistic plans and can handle a wide range of scenarios and crowd sizes.

CrowdMoGen example

4D

4DiM: Controlling Space and Time with Diffusion Models

Google DeepMind has been researching 4DiM, a cascaded diffusion model for 4D novel view synthesis. It can generate 3D scenes with temporal dynamics from a single image and a set of camera poses and timestamps.

4DiM example

Segment Any 4D Gaussians

SA4D is a framework that can segment anything in the 4D digital world based on 4D Gaussians. The method can remove, recolor, and compose objects, and render high-quality object masks, all within seconds.

Segment Any 4D Gaussians examples

Image

AuraFlow v0.1

fal.ai released AuraFlow v0.1 this week, the first release in a new series of open-source text-to-image foundation models.

With Stability AI becoming unstable, it has been a while since we've seen a new open-source text-to-image model, so this is great news for the open-source community.

You can test the model directly on the fal.ai playground. The weights can be found on HuggingFace.
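If you'd rather run the weights locally, here's a minimal sketch using Hugging Face diffusers. The AuraFlowPipeline class and the "fal/AuraFlow" repo id are assumptions based on the release, so double-check the model card for the exact usage:

    # Minimal sketch: text-to-image with AuraFlow v0.1 via diffusers.
    # Assumes diffusers provides AuraFlowPipeline and that the weights
    # live at "fal/AuraFlow" on HuggingFace -- see the model card.
    import torch
    from diffusers import AuraFlowPipeline

    pipe = AuraFlowPipeline.from_pretrained(
        "fal/AuraFlow",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe(
        prompt="a hybrid of a fox and a koi fish, studio lighting",
        num_inference_steps=50,
        guidance_scale=3.5,
    ).images[0]
    image.save("auraflow_sample.png")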

AuraFlow v0.1 examples

ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement

ColorPeel can generate objects in images with specific colors and shapes.

ColorPeel examples

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

HumanRefiner can improve human hand and limb quality in images! The method detects and corrects issues related to abnormal human poses and limbs.

HumanRefiner examples

Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

M2S is a new DDPM-based image inpainting method that is 60 times faster than RePaint! πŸ”₯

Minutes to Seconds comparison with other methods

PartCraft: Crafting Creative Objects by Parts

PartCraft can generate objects by parts! Perfect for crafting new types of animal, robot and human hybrids πŸ‘Œ

PartCraft examples

Still-Moving: Customized Video Generation without Customized Video Data

Still-Moving can customize video models by combining the spatial prior of a customized text-to-image model with the motion prior of a text-to-video model. This enables personalized, stylized, and conditional video generation.

Still-Moving example

Video

Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models

Live2Diff is the first attempt at bringing uni-directional attention modeling to video diffusion models for live video stream processing. It achieves 16 FPS on an RTX 4090 GPU πŸ”₯

Live2Diff example

Generalizable Implicit Motion Modeling for Video Frame Interpolation

GIMM is a new video frame interpolation method that uses generalizable implicit motion modeling to predict the motion between frames.

GIMM example

Audio

Read, Watch and Scream! Sound Generation from Text and Video

ReWaS can generate sound effects from text and video. The method estimates the structural information of the audio from the video while taking key content cues from a user prompt.

Check the ReWaS project page for examples with sound

MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss

MuseBarControl enables fine-grained control over individual bars in symbolic music generation. This makes it possible to specify the number of notes, pitch range, and harmony for each bar, as well as the overall structure of the composition.

The two-strategy framework MuseBarControl uses to improve the controllability of the network.

Also interesting

β€œGait of Gold” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it πŸ™β€οΈ
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday πŸ˜…)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa