AI Art Weekly #71
Hello there, my fellow dreamers, and welcome to issue #71 of AI Art Weekly! 👋
I was so close to releasing a preview of Shortie this week, but then OpenAI dropped Sora and I had to cover that instead 😅 So without further ado, let’s jump into this week’s issue. The highlights are:
- OpenAI’s Sora
- Stable Cascade, a faster and more efficient text-to-image model
- Magic-Me generates videos with a specified subject identity
- Continuous 3D Words controls attributes in images
- GALA3D generates complex 3D scenes from text
- HeadStudio generates animatable head avatars
- AudioEditing allows for zero-shot and text-based audio editing
- Sophia-in-Audition uses a robot performer in virtual production
- Interview with artist Grebenshyo
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week’s cover I’m looking for alternate-history-inspired submissions! The reward is again $50 and a rare role in our Discord community which lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
OpenAI’s Sora
OpenAI shook the world again this week. They presented Sora, a generative video AI model that can create realistic and imaginative scenes from text prompts. Just reading that doesn’t sound like anything new, until you see the results. Like holy smokes.
I posted a summary of all its capabilities over on X.
Besides the mind-blowing results, the most interesting aspect to me is that the model learned to simulate some aspects of people, animals, and environments in the physical world without any explicit 3D or object-level supervision. The more data it got fed, the more it learned about the world. It even learned how a player behaves when generating Minecraft videos 🤯 Gonna be interesting to see how this is going to evolve!
Stable Cascade: Stable Diffusion meets Würstchen
Stable Cascade is a new text-to-image model by Stability AI that is built upon the Würstchen architecture. Thanks to its more compressed latent space, it can be trained quicker and generate images faster than models like SDXL. Best of all: all known extensions like finetuning, LoRA, ControlNet, IP-Adapter, and LCM are supported. Seems like a great time to get into Stable Diffusion.
Magic-Me
It’s hard to follow up with video models after Sora, but this is where we are at until the rest of the world catches up.
Magic-Me is a video generation model that is able to generate videos with a specified subject identity defined by a few images. The model is also able to deblur faces and upscale videos for higher resolution.
Continuous 3D Words for Text-to-Image Generation
Continuous 3D Words is a new control method that can modify attributes in images with a slider based approach. This allows for more control over illumination, non-rigid shape changes (like wings), and camera orientation for instance.
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
GALA3D is a text-to-3D method that can generate complex scenes with multiple objects and control their placement and interaction. The method uses large language models to generate initial layout descriptions and then optimizes the 3D scene with conditioned diffusion to make it more realistic.
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
HeadStudio is another text-to-3D avatar model that can generate animatable head avatars. The method is able to produce high-fidelity avatars with smooth expression deformation and real-time rendering.
AudioEditing: Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
AudioEditing introduces two new methods for editing audio. The first technique allows for text-based editing, while the second is an approach for discovering semantically meaningful editing directions without supervision.
Sophia-in-Audition: Virtual Production with a Robot Performer
Sophia-in-Audition is a system that uses the humanoid robot Sophia as a virtual performer inside an UltraStage, a controllable lighting dome coupled with multiple cameras. The result is a virtual actor that can replicate iconic film segments, follow real performers, and perform a variety of motions and expressions, all with controllable lighting and camera movements.
First of all, Sophia creeps me out; second of all, I think that with all the progress in AI motion capture, and now with world models like Sora on the horizon, this is already obsolete again. At least for movie production.
But still fun to learn about.
Also interesting
- ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents
- Big brains like @DrJimFan are putting some interesting thoughts out there regarding Sora and simulation theory.
Interview
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
Unofficial Google Colab notebook for Stable Cascade. There is also a Hugging Face Demo.
A guide on how to use the AnimateDiff LCM model in ComfyUI to create video-to-video style transfers like with Gen-1.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa