AI Art Weekly #87

Hello there, my fellow dreamers, and welcome to issue #87 of AI Art Weekly! 👋

With CVPR 2024 happening this week, it’s usually a quieter period for new papers. However, I still had to skim through over 120 papers for this issue 😅

Let’s jump in:

  • Highlights: Runway Gen-3 is coming, Advanced Midjourney style blending
  • 3D: MeshAnything, High-Fidelity Facial Albedo Estimation, GaussianSR, GradeADreamer, Holistic-Motion2D
  • 4D: Splatter a Video, 4K4DGen, L4GM, D-NPC
  • Image: iCD, CountGen, Glyph-ByT5-v2
  • Video: EvTexture, CamTrol
  • and more!

Cover Challenge 🎨

Theme: weird
143 submissions by 94 artists
πŸ† 1st: @plasm0
🥈 2nd: @aest_artificial
🥉 3rd: @EternalSunrise7
🧑 4th: @soynando__o

News & Papers

Highlights

Runway Gen-3 Alpha is coming

After last week’s release of Luma AI’s new video generation tool, Runway is now teasing the release of their Gen-3 model.

The new model offers significant improvements in fidelity, consistency, and motion over Gen-2 and is, according to Runway, a step towards building General World Models. And it shows.

The model will support their existing control modes such as Motion Brush and Director Mode as well as upcoming tools for more fine-grained control over structure, style, and motion. Heck, the thing can even do text.

They said access is coming this week, so keep two eyes out for it!

Over-the-shoulder shot of a woman running and watching a rocket in the distance, generated with Gen-3 Alpha

Advanced Midjourney style blending

Midjourney released new advanced options for style references and model personalization blending this week. You can now:

  • Blend multiple --sref codes together (--sref 123 456).
  • Combine style reference image URLs and random codes (--sref 123 url).
  • Assign weights to individual codes or URLs (--sref 123::2 456::1).
  • Blend multiple model personalization codes (--p ab12ad3 cd34gl).
  • Use weighted blending with the same notation (--p ab12ad3::2 cd34gl::1).

summer tale --ar 3:2 --style raw --personalize 8vbarpz 9zk3pun --stylize 1000
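If you blend a lot of codes, it can be handy to assemble the parameter string with a small script instead of typing it by hand. The following is only a rough sketch based on the notation listed above: the helper names `weighted_refs` and `build_prompt` are made up for illustration, the codes are the placeholder ones from the list, and it simply builds text to paste into Midjourney (it does not talk to any Midjourney API).

```python
def weighted_refs(flag, refs):
    """Build e.g. '--sref 123::2 456::1' from (ref, weight) pairs.

    A weight of None leaves the '::weight' suffix off.
    """
    parts = []
    for ref, weight in refs:
        parts.append(ref if weight is None else f"{ref}::{weight}")
    return f"{flag} " + " ".join(parts)


def build_prompt(text, sref=(), personalize=(), extra=""):
    """Join prompt text and flags into one string to paste into Midjourney."""
    pieces = [text]
    if sref:
        pieces.append(weighted_refs("--sref", sref))
    if personalize:
        pieces.append(weighted_refs("--p", personalize))
    if extra:
        pieces.append(extra)
    return " ".join(pieces)


# Placeholder codes taken from the list above, not real style references.
prompt = build_prompt(
    "summer tale",
    sref=[("123", 2), ("456", 1)],
    personalize=[("ab12ad3", None), ("cd34gl", None)],
    extra="--ar 3:2 --style raw",
)
print(prompt)
```

Swapping the weights or adding more `(code, weight)` pairs is then a one-line change, which makes it easier to iterate on blends.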

3D

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

MeshAnything can convert 3D assets in any 3D representation into meshes. This can be used to enhance various 3D asset production methods and significantly improve storage, rendering, and simulation efficiencies.

MeshAnything example

High-Fidelity Facial Albedo Estimation via Texture Quantization

HiFiAlbedo is a method that can recover high-fidelity facial albedo maps from a single image without the need for captured albedo data.

High-Fidelity Facial Albedo Estimation via Texture Quantization example

GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors

GaussianSR can generate high-quality 3D Gaussians from low-resolution images and is able to render them faster than previous methods.

GaussianSR example

GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion

GradeADreamer is yet another text-to-3D method. This one is capable of producing high-quality assets with a total generation time of under 30 minutes using only a single RTX 3090 GPU.

GradeADreamer example

Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

Tender can generate diverse and realistic motions from text prompts in 2D space. The results can be used for pose guidance in video generation or be lifted into 3D for character animation.

Holistic-Motion2D example

4D

Splatter a Video: Video Gaussian Representation for Versatile Processing

Splatter a Video can turn a video into a 3D Gaussian representation, allowing for enhanced video tracking, depth prediction, motion and appearance editing, and stereoscopic video generation.

Splatter a Video example

4K4DGen: Panoramic 4D Generation at 4K Resolution

4K4DGen can turn a single panorama image into an immersive 4D environment with 360-degree views at 4K resolution. The method is able to animate the scene and optimize a set of 4D Gaussians using efficient splatting techniques for real-time exploration.

4K4DGen example

L4GM: Large 4D Gaussian Reconstruction Model

L4GM is a 4D Large Reconstruction Model that can turn a single-view video into an animated 3D object.

L4GM example

D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video

Cyberpunk brain dances are becoming a thing! D-NPC can turn videos into dynamic neural point clouds, aka 4D scenes, which makes it possible to watch a scene from another perspective.

D-NPC example

Image

Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps

iCD can be used for zero-shot text-guided image editing with diffusion models. The method is able to encode real images into their latent space in only 3-4 inference steps and can then be used to edit the image with a text prompt.

Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps example

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Diffusion models can’t count, or can they? CountGen can generate the correct number of objects specified in the input prompt while maintaining a natural layout that aligns with the prompt.

Make It Count example

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

Glyph-ByT5-v2 is a new SDXL model that can generate high-quality visual layouts with text in 10 different languages.

Glyph-ByT5-v2 example

Video

EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

EvTexture is a video super-resolution method that uses event signals for texture enhancement, recovering more accurate textures and high-resolution detail.

EvTexture example

Training-free Camera Control for Video Generation

CamTrol can produce high-dynamic videos with controllable camera moves. No fine-tuning required.

Training-free Camera Control for Video Generation example

Also interesting

“Do you want to live forever?” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa