AI Art Weekly #68
Hello there, my fellow dreamers, and welcome to issue #68 of AI Art Weekly!
I've begun sinking my teeth into a new project. Hopefully I can tell you all more about it next week 🤫 But until then, let's check out this week's AI art news!
- Google announced Lumiere – a new video model
- ActAnywhere generates video backgrounds
- 3DHM animates people with 3D camera control
- Depth Anything is a new monocular depth estimation method
- pix2gestalt estimates and inpaints the shape and appearance of occluded objects
- UltrAvatar generates realistic and animatable 3D avatars
- Diffuse to Choose lets you virtually place any item in any setting
- GALA splits single-layer clothed 3D human meshes into multi-layered 3D assets
- CreativeSynth is a new SOTA method for style retention in image generation
- Interview with AI powerhouse AInigma
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge 🎨
We're approaching issue 69, so for next week's cover I'm looking for something spicy! The reward is again $50 and a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Lumiere: A Space-Time Diffusion Model for Video Generation
Lumiere is Google's latest video model and it looks wild! The model was trained on a dataset of 30 million videos, along with their text captions, and can generate 80 frames at 16 fps. It supports text-to-video, image-to-video, video inpainting, and stylization. Unfortunately, Google has a track record of not releasing its models, but one can still hope 🥹
ActAnywhere: Subject-Aware Video Background Generation
Given a subject sequence and a background image, ActAnywhere can generate video backgrounds that match the foreground motions. Pretty cool!
3DHM: Synthesizing Moving People with 3D Control
3DHM can animate people with 3D camera control from a single image and a given target video motion sequence.
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Depth Anything is a new monocular depth estimation method. The model is trained on 1.5M labeled images and 62M+ unlabeled images, which results in impressive generalization ability.
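If you want to play with it yourself, depth estimators like this one are typically exposed as regular Hugging Face checkpoints. Here's a minimal sketch using the `depth-estimation` pipeline; the model id is an assumption on my part, so check the Depth Anything repo for the actual released checkpoints:

```python
# Minimal sketch: monocular depth estimation via the Hugging Face pipeline.
# The model id is an assumption -- check the Depth Anything repo for the
# actual released checkpoints.
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation", model="LiheYoung/depth-anything-large-hf")

image = Image.open("photo.jpg")
result = depth(image)

# The pipeline returns a PIL image of per-pixel relative depth under "depth",
# plus the raw tensor under "predicted_depth".
result["depth"].save("photo_depth.png")
```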
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
pix2gestalt is able to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
UltrAvatar can generate realistic and animatable 3D avatars with PBR textures from a text prompt or a single image. The framework is also capable of texture editing, allowing you to change eye and hair colors, add aging effects, and even add tattoos to your avatars.
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
Shooting product images for e-commerce shops is a time-intensive task, but Diffuse to Choose by Amazon can help with that. The inpainting model lets you virtually place any item in any setting, with detailed, semantically coherent blending and realistic lighting and shadows.
GALA: Generating Animatable Layered Assets from a Single Scan
GALA can decompose a single-layer clothed 3D human mesh into complete multi-layered 3D assets. The outputs can then be combined with other assets to create new clothed human avatars in any pose.
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion
Retaining the style of a reference image when editing and blending images remains a challenge. CreativeSynth outperforms other methods at these tasks. It's not open source yet, but the example images look promising.
Also interesting
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation
- Hourglass Diffusion Transformers: Scalable High-Resolution Pixel-Space Image Synthesis
- RL Diffusion: Large-scale Reinforcement Learning for Diffusion Models
- GenMoStyle: Generative Human Motion Stylization in Latent Space
- EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
@blizaine created an immersive 360-degree video with stereoscopic 3D depth for VR / Spatial Computing.
@christian_brinkmann_ created an interactive brush with StreamDiffusion and LeapMotion inside of TouchDesigner.
Interview
This week we're talking to the AI powerhouse that is AInigma 🔥
Tools & Tutorials
These are some of the most interesting resources Iโve come across this week.
Comfy Textures is an Unreal Engine plugin which integrates the editor with ComfyUI. It allows you to quickly create and refine textures for your scene using generative diffusion models.
SUPIR is a high-fidelity general image restoration model based on a large-scale diffusion generative prior.
A HuggingFace space based on DDColor that lets you colorize old black-and-white photos.
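If you'd rather script it than click through the UI, Gradio spaces can be called from Python. A rough sketch with `gradio_client` below; the space id and endpoint name are hypothetical placeholders, so copy the real values from the space's "Use via API" panel:

```python
# Rough sketch: calling a colorization Space from Python via gradio_client.
# The space id and api_name below are hypothetical -- grab the real values
# from the Space's "Use via API" panel.
from gradio_client import Client

client = Client("piddnad/DDColor")  # hypothetical space id
result = client.predict("old_bw_photo.jpg", api_name="/predict")
print(result)  # typically a local path to the colorized output
```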
moondream1 is a tiny (1.6B parameter) vision language model trained by @vikhyatk that performs on par with models twice its size.
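Trying moondream1 out is straightforward since the weights live on the Hugging Face Hub. A hedged sketch, assuming the helper methods exposed by the model's custom remote code (names taken from the model card, so treat them as assumptions):

```python
# Hedged sketch: asking moondream1 a question about an image.
# encode_image / answer_question come from the model's own remote code
# (trust_remote_code=True), so the exact names are an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream1"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("scene.jpg")
encoded = model.encode_image(image)
print(model.answer_question(encoded, "What is happening in this image?", tokenizer))
```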
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it ❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa