AI Art Weekly #70
Hello there, my fellow dreamers, and welcome to issue #70 of AI Art Weekly!
I was extremely busy this week experimenting with detection and tracking models for Shortie, and I found a solution that is fast and accurate enough. If things go well, I'll have an MVP up next week. Wish me luck!
In the meantime, let's see what's new in the world of Generative AI art!
- Video-LaVIT: a multi-modal LLM that can generate images and videos
- ConsistI2V generates image-to-video with more consistency
- Direct-a-Video controls camera movement and object motion for text-to-video
- Boximator generates rich and controllable motions for image-to-video
- ConsiStory maintains subject consistency in text-to-image
- LGM generates high-resolution 3D mesh objects
- Holo-Gen generates PBR material properties for 3D objects
- Stability AI has been working on a text-to-speech model
- EmoSpeaker generates talking-head videos
- Interview with AI artist blanq
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge
For next week's cover I'm looking for cabal-inspired submissions! The reward is again $50 and a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Video-LaVIT is a multi-modal video-language method that can comprehend and generate image and video content and supports long video generation.
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
ConsistI2V is an image-to-video method with enhanced visual consistency. Compared to other methods, this one better maintains the subject, background, and style from the first frame and ensures a fluid, logical progression, while also supporting long video generation and camera motion control.
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
In the controllability department we got Direct-a-Video. The framework can individually or jointly control camera movement and object motion in text-to-video generations. This means you can generate a video and tell the model to move the camera from left to right, zoom in or out, and move objects around in the scene.
Boximator: Generating Rich and Controllable Motions for Video Synthesis
As usual, one paper seldom comes alone. Boximator is a method that can generate rich and controllable motions for image-to-video generations by drawing box constraints and motion paths onto the image.
ConsiStory: Training-Free Consistent Text-to-Image Generation
First InstantID, then StableIdentity, and now ConsiStory, the third paper in 4 weeks that tries to maintain consistent subject identity without fine-tuning. Compared to other methods, ConsiStory is able to successfully follow text prompts while maintaining subject consistency. The model also supports multi-subject scenarios and even enables training-free personalization for common objects.
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
LGM can generate high-resolution 3D mesh objects from text prompts or a single image. The model is able to generate 3D objects within 5 seconds while boosting the training resolution to 512, resulting in high-fidelity and efficient 3D content creation. There is a HuggingFace demo if you want to give it a try. It's still not good enough to turn my PFP into a 3D model though.
Holo-Gen: Collaborative Control for Geometry-Conditioned PBR Image Generation
Now we have meshes, but what if we want to re-texture them? Unity published Holo-Gen this week. The method can generate physically-based rendering (PBR) material properties for 3D objects.
Natural language guidance of high-fidelity text-to-speech models with synthetic annotations
Stability has been researching text-to-speech capabilities that let you control speaker identity and style with natural language text prompts. Their trained model is able to generate high-fidelity speech with a diverse range of accents, prosodic styles, channel conditions, and acoustic conditions. It hasn't been open-sourced yet, but I'm sure it will be at some point.
EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation
EmoSpeaker is yet another talking-head model. This one is able to generate talking-head videos with input audio, emotion, and a source image. It can also generate talking-heads of different emotional intensities by adjusting the fine-grained emotion.
Also interesting
- λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
- Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application
- NerfEmitter: NeRF as Non-Distant Environment Emitter in Physics-based Inverse Rendering
- Denoising Diffusion via Image-Based Rendering
- Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos
- InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
@DaveJWVillalva takes a critical look at the current state of AI video filmmakers. Recommended read.
@InnerRefle11312 has been exploring LCM AnimateDiff, especially its capabilities for video-to-video realism.
Interview
Tools & Tutorials
These are some of the most interesting resources I've come across this week.
DynamiCrafter is a video generation model that can generate videos from images. It also supports motion control using text prompts, looping video generation, and generative frame interpolation. HuggingFace demo.
MetaVoice-1B is an open-source 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech).
RMBG v1.4 is a state-of-the-art background removal model, designed to effectively separate foreground from background across a range of categories and image types. HuggingFace demo.
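Background removal models like this one typically output a soft foreground matte (per-pixel alpha in [0, 1]); dropping the subject onto a new background is then plain alpha blending. A minimal NumPy sketch, using a dummy matte in place of the model's actual prediction:

```python
import numpy as np

def composite(foreground, background, matte):
    """Alpha-blend foreground over background using a soft matte in [0, 1]."""
    alpha = matte[..., None]  # add a channel axis so it broadcasts over RGB
    return (alpha * foreground + (1 - alpha) * background).astype(np.uint8)

# Tiny 2x2 RGB stand-ins: a bright foreground, a black background,
# and a matte that keeps, drops, and half-blends different pixels.
fg = np.full((2, 2, 3), 200, dtype=np.uint8)
bg = np.zeros((2, 2, 3), dtype=np.uint8)
matte = np.array([[1.0, 0.0],
                  [0.5, 1.0]])

out = composite(fg, bg, matte)
# out keeps fg where matte == 1, bg where matte == 0, and blends in between
```

In practice you would feed the model's predicted matte (resized to the image) in place of the dummy array; the blending step itself is model-agnostic.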
An extensive node suite that enables ComfyUI to process 3D inputs (Mesh & UV Texture, etc) using cutting edge algorithms (3DGS, NeRF, etc.)
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa