AI Art Weekly #86
Hello there, my fellow dreamers, and welcome to issue #86 of AI Art Weekly! 👋
What an insane week it has been! People say AI research is slowing down and we’re reaching a capabilities plateau, but I beg to differ. This week I’ve gone through another 180+ papers and projects from the world of computer vision and AI art, so let’s jump right into it!
In this week’s issue:
- Highlights: Luma AI: Dream Machine, Stable Diffusion 3 Medium, Midjourney Personalization
- 3D: M-LRM, Human 3Diffusion, WonderWorld, LE3D, AvatarPopUp, IllumiNeRF, GGHead, StableMaterials
- Image: Eye-for-an-eye, Image Neural Field Diffusion Models, Neural Gaffer, Ctrl-X, FontStudio, Layered Image Vectorization, LlamaGen, MimicBrush, CFG++, AsyncDiff, EMMA
- Video: HOI-Swap, T2S-GPT
- Audio: Action2Sound
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For the next cover I’m looking for weird submissions! Reward is again $50 and a rare role in our Discord community which lets you vote in the finals. Rulebook can be found here and images can be submitted here.
News & Papers
Highlights
Luma AI: Dream Machine
Luma AI had the AI art community buzzing this week with their new video generation model called Dream Machine. Unlike Sora, you can actually use it today, which makes it the most advanced video generation model currently available to the public. It generates 5-second clips with 120 frames in about 120 seconds. I’ve compiled a list of some cool creations from the community over on X.
Stable Diffusion 3 Medium
Stability AI finally released Stable Diffusion 3 weights. Well, kind of: they released a new model called Stable Diffusion 3 Medium, a smaller version of the model they showcased a few weeks back. It brings overall quality improvements in photorealism and prompt understanding, and it can generate text within images. Human anatomy apparently remains an issue, though.
Midjourney Personalization
Midjourney released a new personalization feature that completely changes the way MJ interprets your prompts. For it to work, you have to rank at least 200 images and then add the --p flag to the end of your prompts. I’m already having a ton of fun with it.
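In Discord, usage looks like this (the prompt text is just an example of mine, not an official one):

```
/imagine prompt: a lighthouse in a storm, oil painting --p
```

You can also combine --p with a seed or other parameters as usual; the ranking data you provide is what steers the personalized interpretation.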
3D
M-LRM: Multi-view Large Reconstruction Model
M-LRM is yet another model that can reconstruct high-quality 3D shapes from either a single or multiple images.
Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
Human 3Diffusion can reconstruct realistic avatars from a single RGB image, achieving high-fidelity in both geometry and appearance.
WonderWorld: Interactive 3D Scene Generation from a Single Image
WonderWorld can generate interactive 3D scenes from a single image and a text prompt in less than 10 seconds on a single A6000 GPU.
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis
LE3D can turn noisy RAW images into a Gaussian Splat and perform real-time novel view synthesis, HDR rendering, refocusing, and tone-mapping changes.
Instant 3D Human Avatar Generation using Image Diffusion Models
Google DeepMind’s AvatarPopUp can generate high-quality rigged 3D human avatars from a single image or text prompt in as few as 2 seconds.
IllumiNeRF: 3D Relighting without Inverse Rendering
Also by Google, IllumiNeRF can relight images. The method uses an image diffusion model conditioned on lighting and then reconstructs a NeRF from the relit images, from which it can render novel views under the target lighting.
GGHead: Fast and Generalizable 3D Gaussian Heads
GGHead can generate and render 3D heads at 1K resolution in real-time.
StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning
StableMaterials can generate high-resolution, tileable PBR materials from text prompts or input images in just 4 diffusion steps.
Image
Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models
Eye-for-an-eye makes it possible for diffusion models to transfer the appearance of objects from a reference image to a target image.
Image Neural Field Diffusion Models
Image Neural Field Diffusion Models can be used to train diffusion models on image neural fields, which can be rendered at any resolution. This makes it possible to train diffusion models using mixed-resolution image datasets.
Neural Gaffer: Relighting Any Object via Diffusion
Neural Gaffer can relight any object in an image under any novel environmental lighting condition by simply conditioning an image generator on a target environment map.
Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance
Ctrl-X enables structure and appearance control for text-to-image and text-to-video models with any image as input! This makes it possible to generate images and videos with the structure of one image and the appearance of another.
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
FontStudio can generate text effects for multilingual fonts. The model is able to interpret the given shape of a font and strategically plan pixel distributions within the irregular canvas.
Layered Image Vectorization via Semantic Simplification
Layered Vectorization can turn images into layered vectors that represent the original image from coarse to fine detail levels.
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
LlamaGen is a new family of image generation models based on the same autoregressive approach as LLMs. The largest model has 3.1B parameters and can generate 256x256 images.
Zero-shot Image Editing with Reference Imitation
MimicBrush can edit a region of interest in an image by drawing inspiration from a reference image, capturing the semantic correspondence between the two images in a self-supervised manner.
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
CFG++ fixes CFG’s issues with lower guidance scales, improving text-to-image quality and invertibility.
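For context, vanilla classifier-free guidance extrapolates from the unconditional noise prediction toward the conditional one at each denoising step. A minimal sketch of that standard combination (not of the CFG++ update itself; the function name and toy values are mine):

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    # Standard classifier-free guidance: extrapolate from the
    # unconditional noise prediction toward the conditional one.
    # scale = 1.0 recovers the purely conditional prediction; the
    # high scales needed for strong prompt adherence push samples
    # off the data manifold -- the failure mode CFG++ addresses.
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy two-component "noise predictions":
print(cfg_combine([0.0, 0.0], [1.0, 2.0], 7.5))  # [7.5, 15.0]
```

CFG++ constrains this update so that lower guidance scales still steer generation effectively, which is what improves quality and invertibility.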
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
AsyncDiff brings parallelism to diffusion models, significantly reducing inference latency while minimally impacting generative quality.
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
EMMA is a new image generation model that can generate images from text prompts and additional modalities such as reference images or portraits. It especially shines at preserving individual identities.
Video
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
HOI-Swap can swap objects in videos with a focus on those interacted with by hands, given one user-provided reference object image.
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text
T2S-GPT can generate sign language videos from text and is able to control the speed of the signing.
Audio
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Action2Sound can generate realistic action sounds for human interactions in videos. The model is able to disentangle foreground action sounds from the ambient background sounds and can even generate ambient sounds for silent videos.
Also interesting
- Visual Words: Understanding Visual Concepts Across Models
- CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models
- Weights2Weights: Interpreting the Weight Space of Customized Diffusion Models
- CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion
- AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
- MCM: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
My piece “Stop Breathing and Contemplate Your Insignificance” as part of “Polyptych 1” for the Eternal Peace collection is being exhibited in Basel, Switzerland this weekend 🧡
@spiritform made an AI short using only text prompts, with Luma’s Dream Machine for video and Stable Audio for sound effects.
@juliewdesign_ also created an AI short by using personalized Midjourney v6 images as input.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa