AI Art Weekly #61
Hello there, my fellow dreamers, and welcome to issue #61 of AI Art Weekly! 👋
This week has been the craziest week for AI-related research since I started writing this newsletter. Usually I skim through roughly 50-80 papers per week to summarize the most interesting advancements for you. This week my paper collection script churned out 186.
On top of that, companies are releasing and announcing new products built on top of that research almost by the day. I could already feel things picking up over the last few weeks, but this is nuts.
We’re truly living through historical times, so it makes me happy that we cracked the 3’000 subscribers milestone this week. Thank you all for subscribing to my little weekly write-up 🧡
The highlights of the week are:
- Stable Diffusion Turbo is here
- Pika 1.0 announced
- Adobe’s DMD generates images in 90ms
- Sketch Video Synthesis turns videos into sketches
- SparseCtrl adds sparse controls to text-to-video models
- Diffusion Motion Transfer can edit videos with a text prompt
- MVControl brings ControlNet to 3D generation
- Material Palette can extract materials from a single image
- 4D-fy and Dream-in-4D can generate 4D videos
- LucidDreamer turns a single image into a 3D scene
- Control4D lets you edit avatars in 4D
- GAIA generates realistic talking people from a single image and speech clip
- A new image upscaler called CoSeR
- LEDITS++ can edit images fast with a text prompt
- and more tutorials, tools and gems!
Happy to announce that I’m part of the new limited AI Surrealism group drop with 60 new artworks on Foundation. My piece is available until the 5th of December 6pm CET. Each purchase helps to support yours truly 🙏
Cover Challenge 🎨
Let’s get weird. For next week’s cover I’m looking for “eggs”, show me what you got! The reward is $50 and a rare role in our Discord community which lets you vote in the finals. The rulebook can be found here and images can be submitted here. I’m looking forward to your submissions 🙏
News & Papers
Stable Diffusion goes Turbo with Adversarial Diffusion Distillation
Stability AI released the 2nd of their 5 major releases this week. Adversarial Diffusion Distillation is the driver behind the new SD Turbo and SDXL Turbo models which are able to generate high-quality 512x512 images in near real-time at ~200ms.
Faster image generation means new possibilities for interactive and creative applications, and people are already exploring different options. If you’re GPU-poor you can give Turbo a try on Clipdrop. If not, you can run it locally using Pinokio and ComfyUI.
Pika 1.0
Pika, one of the most popular video generation tools in the AI art community, announced their 1.0 release this week. Aside from a more accessible web interface, the trailer hints at some upcoming features:
- Video inpainting to edit and customize scenes and subjects in videos
- Video outpainting to adjust video aspect ratios
- Video-to-video to re-style videos similar to Gen-1
Sign up for the waitlist at pika.art to get access when it’s ready.
DMD: One-step Diffusion with Distribution Matching Distillation
Adobe is not sitting idle while Stability ships. DMD is yet another real-time method that generates high-quality images in a single step in only 90ms, with the authors claiming it can generate images at 20 FPS. Unfortunately, it’s not open-source (yet), so there’s no way to confirm or test it.
Sketch Video Synthesis
After LiveSketch last week, Sketch Video Synthesis is another personal favourite of mine. This one can turn subjects of videos into SVG sketches, enabling various rendering techniques, including resizing, color filling, and overlaying doodles on the original background images.
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
SparseCtrl is an image-to-video method with some cool new capabilities. With its RGB, depth and sketch encoders and one or a few input images, it can animate images, interpolate between keyframes, extend videos, and guide video generation with only depth maps or a few sketches. I’m especially in love with how the scene transitions look.
Diffusion Motion Transfer
Diffusion Motion Transfer is able to translate videos with a text prompt while maintaining the input video’s motion and scene layout.
MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation
ControlNet for generative 3D anyone? MVControl is bringing edge & depth map guidance to pre-trained multi-view 2D diffusion models. I can see a pipeline where image-to-edge-map-to-3D might even further improve output.
Material Palette: Extraction of Materials from a Single Image
Material Palette can extract a palette of PBR materials (albedo, normals, and roughness) from a single real-world image. Looks very useful for creating new materials for 3D scenes or even for generating textures for 2D art.
4D-fy & Dream-in-4D
4D-fy and Dream-in-4D are two new methods to generate 4D dynamic content from a text prompt or image. The results from both approaches are a bit noisy. Personally, I like it! The funkiness adds a lot of charm.
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
LucidDreamer can generate navigable 3D Gaussian Splat scenes from a single text prompt or a single image. Text prompts can also be chained for more control over the output. Can’t wait until they can also be animated.
Control4D: Efficient 4D Portrait Editing with Text
Speaking of Gaussian Splatting: Control4D proposes GaussianPlanes, which make Gaussian Splatting more structured and enhance 4D editing of videos, in this case avatars.
GAIA, SyncTalk, Diffusion Avatars, Portrait4D and CosAvatar
And speaking of avatars, there is even more. It’s been a while since we saw any significant updates when it comes to avatar generation, but this week we got not one or two updates, but five 🤯! Crazy times ahead.
- GAIA generates talking heads from a single portrait image and speech clip
- SyncTalk aims to optimize lip and head motion synchronisation
- DiffusionAvatars generates high-fidelity 3D avatars which offer pose and expression control.
- Portrait4D can turn portrait images into photorealistic 4D head avatars
- CosAvatar provides both global style editing and local attribute editing while ensuring strong consistency
CoSeR: Bridging Image and Language for Cognitive Super-Resolution
CoSeR is a Stable Diffusion based upscaling method that comprehends low-resolution images and generates a high-quality reference image to guide the super-resolution process. The results look like the best I’ve seen yet.
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Remember last week? Animate Anyone, just like MagicDance, turns a single image and pose guidance into dancing or moving videos, only already much better.
More papers & gems
- ParaDiffusion: Paragraph-to-Image Generation with Information-Enriched Diffusion Model
- Wired Perspectives: Multi-View Wire Art Embraces Generative AI
- Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
- SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
- GenZI: Zero-Shot 3D Human-Scene Interaction Generation
@pegasus_vfx edited this shot of Dunkirk by Christopher Nolan with Stable Diffusion and a custom LCM model. Love the vibe!
@phqakl created an interactive experience using SDXL Turbo that lets you generate visuals by using hand gestures to direct what happens on the display. Super cool!
@c0nsumption_ showcased how SDXL Turbo can be leveraged for prompt exploration and SD1.5 for adding more details to the results.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
LEDITS++ is an image editing method that can edit images with only words. It supports multiple edits at once and runs in just a few diffusion steps. No fine-tuning required.
GaussianEditor is an interactive high-resolution 3D editor for Gaussian Splats. It only runs on GPUs with 10-20GB of VRAM though.
HumanGaussian is a framework that can create 3D Gaussian Splat humans from text prompts only.
Visual Anagrams can generate multi-view optical illusions and supports flipping, jigsaw, inner circle, color inversion, patch & pixel permutation, skew and three view illusions.
And that my fellow dreamers, concludes yet another AI Art weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa