AI Art Weekly #74
Hello there, my fellow dreamers, and welcome to issue #74 of AI Art Weekly! 👋
Lots of cool stuff got published this week, so let’s dive right in!
- Stable Diffusion 3 Research Paper
- TripoSR fast image-to-3D
- MagicClay can do mesh editing
- PixArt-Σ supports native 4K text-to-image generation
- ResAdapter enables better multi-resolution support
- PeRFlow speeds up diffusion models
- RealCustom for real-time text-to-image customization
- ViewDiff generates multi-view consistent images
- UniCtrl improves text-to-video models
- Pix2Gif generates GIFs from a single image
- Interview with Chissweetart
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week's cover I'm looking for submissions that you can hear without sound! The reward is again $50 plus a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Stable Diffusion 3 Research Paper
Stability released the Stable Diffusion 3 research paper this week with some additional image output examples. The prompt coherence is pretty cool.
TripoSR: Fast 3D Object Reconstruction from a Single Image
Stability (together with Tripo AI) also released TripoSR this week, a 3D reconstruction model that can generate a 3D mesh from a single image in under 0.5 seconds.
MagicClay: Sculpting Meshes With Generative Neural Fields
While TripoSR can generate meshes from an image, MagicClay can edit them. It’s an artist-friendly tool that allows you to sculpt regions of a mesh with text prompts while keeping other regions untouched.
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
The PixArt model family got a new addition with PixArt-Σ. The model is capable of directly generating images at 4K resolution. Compared to its predecessor, PixArt-α, it offers images of higher fidelity and improved alignment with text prompts.
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Remember the old days when it was a PITA to generate images at any resolution other than 512x512? ResAdapter fixes that. It's a domain-consistent adapter designed for diffusion models to generate images with unrestricted resolutions and aspect ratios. This enables efficient inference of multi-resolution images without repeated denoising steps or complex post-processing operations.
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
ByteDance published a new low-step method called PeRFlow which accelerates diffusion models like Stable Diffusion to generate images faster. PeRFlow is compatible with various fine-tuned stylized SD models as well as SD-based generation/editing pipelines such as ControlNet, Wonder3D and more.
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
RealCustom is yet another image personalization method. This one is able to generate realistic images that consistently adhere to the given text and any subject from a single image in real-time.
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
ViewDiff is a method that can generate high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings from a single text prompt or a single posed image.
UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control
In the video department we had UniCtrl this week: a method that improves the semantic consistency and motion quality of videos generated by text-to-video models without additional training. The method is universally applicable and can be used to enhance various text-to-video models.
Pix2Gif: Motion-Guided Diffusion for GIF Generation
And last but not least, Microsoft published Pix2Gif this week: an image-to-video model that can generate GIFs from a single image and a text prompt. They claim the model is able to understand motion, but we're not talking Sora levels here. Still, it's certainly a step up motion-wise compared to the slow-motion videos we're used to.
Also interesting
- Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
- DATTT: Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
@Norod78 shared an example workflow of bringing a Midjourney created character into AR using TripoSR, MeshLab, Mixamo and Reality Converter.
It’s been a while since I’ve seen AI-enhanced NPCs in a game. Mantella is a Skyrim mod that lets you talk to NPCs using Whisper for speech-to-text, LLMs for text generation, and xVASynth for text-to-speech. Wish this existed when I was a kid.
FaceChain is a deep-learning toolchain for generating digital twins. With as little as one portrait photo, you can create a digital twin of yourself and start generating personal portraits in different settings.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa