AI Art Weekly #81
Hello there, my fellow dreamers, and welcome to issue #81 of AI Art Weekly!
Progress is unavoidable, so we march into a world where robots assert dominance while delivering packages and learn to handle our daily chores. All the while, the veil of what is real becomes blurrier by the day. Still, we march on.
I'm personally taking a break from marching on. I'll be back in two weeks with the next issue. Until then, enjoy this packed one!
In this issue:
- 3D: PhysDreamer, GScream, NeRF-XL, Interactive3D, Make-it-Real, TELA, TokenHMR
- Image: Midjourney, Hyper-SD, ConsistentID, PuLID, MultiBooth, ID-Aligner, CharacterFactory, TF-GPH, Editable Image Elements, IDM-VTON
- Video: MaGGIe, MotionMaster, SVA
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge
For next week's cover I'm looking for fever dream submissions! The reward is again $100 plus a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
3D
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
PhysDreamer is a physics-based approach that lets you poke, push, pull, and throw objects in a virtual 3D environment and have them react in a physically plausible manner.
GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
GScream is yet another method for object removal in 3D scenes. This one uses Gaussian Splatting to update the radiance field and is able to preserve geometric consistency and texture coherence.
NeRF-XL: NeRF at Any Scale with Multi-GPU
But enough about Gaussians. NeRF-XL by NVIDIA is a new method for distributing NeRFs across multiple GPUs, enabling training and rendering of 3D scenes with arbitrarily large capacity.
Interactive3D: Create What You Want by Interactive 3D Generation
Of course we aren't short of 3D object generation methods this week. Interactive3D allows users to interactively modify and guide the generative process of 3D objects. This includes adding and removing components, deforming and rigid dragging, geometric transformations, and semantic editing.
Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials
AI will make material creation a breeze for 3D artists! Make-it-Real utilizes GPT-4V to recognize and describe materials, allowing the construction of a detailed material library. The model can then precisely identify and align materials with the corresponding components of 3D objects and apply them as references for new SVBRDF material generation, significantly enhancing the objects' visual authenticity.
TELA: Text to Layer-wise 3D Clothed Human Generation
TELA can create 3D models of people wearing clothes based on text descriptions. It allows you to precisely control how the clothes appear on the model, including which layers go on first.
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
And on the pose reconstruction front we have TokenHMR, which can extract human poses and shapes from a single image.
Image
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
ByteDance released Hyper-SD this week, yet another diffusion-aware distillation algorithm that brings high-quality image generation down to one inference step.
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
This week we've been blessed with not one image personalization method but four. We begin with ConsistentID, which can generate diverse personalized ID images from text prompts using only a single reference image.
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Similar to ConsistentID, PuLID is a tuning-free ID customization method for text-to-image generation. This one can also be used to edit images generated by diffusion models by adding or changing the text prompt.
MultiBooth: Towards Generating All Your Concepts in an Image from Text
MultiBooth on the other hand can generate images that include any number of concepts in various styles, contexts, and layout relationships as specified by given text prompts.
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
ID-Aligner is able to improve identity preservation and the visual appeal of generated images and can be applied to both LoRA and Adapter models.
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models
CharacterFactory uses a GAN to sample an endless supply of identity-consistent new characters and is compatible with models across multiple modalities: ControlNet for images, ModelScope for videos, and LucidDreamer for 3D objects.
TF-GPH: Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing
TF-GPH can blend images with disparate visual elements together stylistically!
Editable Image Elements for Controllable Synthesis
Editable Image Elements can edit the location and size of objects in an input image and then generate a new image that respects the modifications. This can be used to resize, rearrange, drag, remove, and create variations of objects in an image, as well as compose multiple images together.
IDM-VTON: A New Baseline for Virtual Try-On
IDM-VTON can generate high-quality images of people wearing clothes that are not only realistic, but also preserve the original design of the garment. The method can be used to create virtual fitting rooms, improve online shopping experiences, and even generate fashion designs.
Video
MaGGIe: Mask Guided Gradual Human Instance Matting
MaGGIe can efficiently predict high-quality human instance mattes from coarse binary masks for both image and video input. The method is able to output all instance mattes simultaneously without exploding memory and latency, making it suitable for real-time applications.
MotionMaster: Training-free Camera Motion Transfer For Video Generation
MotionMaster can extract camera motions from one or more source videos and apply them to new videos. This enables more flexible control over camera motion, resulting in videos with variable-speed zoom, pan left, pan right, dolly zoom in, dolly zoom out, and more.
SVA: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
SVA can generate sound effects and background music for videos based on a single key frame and a text prompt.
Also interesting
- DMesh: A Differentiable Representation for General Meshes
- GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting
Remember the Sora βair headβ video? @fxguidenews published a making-of article from shykids, the creators of the viral video.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you in two weeks!
– dreamingtulpa