AI Art Weekly #72
Hello there, my fellow dreamers, and welcome to issue #72 of AI Art Weekly! 👋
I've been out cold with the flu this week, so I'm keeping it short today. Let's jump right into this week's highlights:
- Stable Diffusion 3 goes into early preview
- SDXL got a Lightning upgrade
- FiT is a new transformer architecture for unrestricted image aspect ratios
- Snap Video is a new video model by Snapchat
- Binary Opacity Grids renders high-quality meshes in real-time
- Argus3D generates 3D meshes from images and text prompts
- FlashTex is a new method for fast mesh texturing
- Visual Style Prompting is a new SOTA for style transfer
- SCG can help you compose and improvise new piano pieces
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do ☕
Cover Challenge 🎨
For next week's cover I'm looking for steampunk-inspired submissions! The reward is again $50 and a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Stable Diffusion 3
Stable Diffusion 3 went into the early preview stage this week. The model is not yet available, but the waitlist for an early preview is open. Stable Diffusion 3 is said to have greatly improved performance in multi-subject prompts, image quality, and spelling abilities.
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
ByteDance (the company behind TikTok) found a way to generate high-quality 1024px images in only a few steps, which they call SDXL-Lightning. There is also a demo on HuggingFace and fastsdxl.ai.
FiT: Flexible Vision Transformer for Diffusion Model
State-of-the-art diffusion models are typically trained on square images. FiT is a new transformer architecture specifically designed for generating images with unrestricted resolutions and aspect ratios (similar to what Sora does). This enables a flexible training strategy that effortlessly adapts to diverse aspect ratios during both training and inference, thus promoting resolution generalization and eliminating biases induced by image cropping.
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
It's hard to follow up video models after Sora. But Snap Video is an interesting one! Snapchat's model addresses redundancy in pixel-based video generation, which leads to videos with substantially higher quality, temporal consistency, and motion complexity compared to other methods. Like FiT, it also utilizes a new transformer architecture that trains 3.31x faster and runs inference ~4.5x faster compared to U-Nets.
Binary Opacity Grids
Binary Opacity Grids is a new method for mesh-based view synthesis that is able to capture fine geometric detail. The resulting meshes can be rendered in real-time on mobile devices and achieve significantly higher quality compared to existing approaches.
Argus3D: Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability
Argus3D is another model that can generate 3D meshes from images and text prompts, as well as unique textures for its generated shapes. Just imagine composing a 3D scene and filling it with objects by pointing at a space and describing in natural language what you want to place there.
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Roblox published FlashTex this week. The method can texture an input 3D mesh given a user-provided text prompt. These generated textures can also be relit properly in different lighting environments.
Visual Style Prompting with Swapping Self-Attention
Visual Style Prompting can generate images with a specific style from a reference image. Compared to other methods like IP-Adapter and LoRAs, Visual Style Prompting is better at retaining the style of the reference image while avoiding style leakage from text prompts.
SCG: Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
SCG can be used by musicians to compose and improvise new piano pieces. It lets musicians guide music generation with rules, such as following a simple I-V chord progression in C major. Pretty cool.
Also interesting
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
- MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single to Sparse-view 3D Object Reconstruction
- Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
- GaussianPro: 3D Gaussian Splatting with Progressive Propagation
Historic Crypto created a 40+ minute reimagined reality TV show that spans 8 episodes and is set in ancient Rome during the tumultuous fall of the Roman Republic.
Vadim Epstein's project THE POEM, based on H.P. Lovecraft's "The Poe-et's Nightmare", is now live! Definitely worth a look!
Indie rock legend J Mascis (Dinosaur Jr.) recently released an AI-generated music video for his song "Old Friends". Thanks for the tip @Peloquin1977.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
โ dreamingtulpa