AI Art Weekly #66
Hello there, my fellow dreamers, and welcome to issue #66 of AI Art Weekly!
This week, OpenAI introduced their GPT Store, featuring an upcoming revenue program for US creators, while Rabbit unveiled the r1 pocket companion, a new mobile device that, with the aid of a Large Action Model (LAM), aims to help you achieve more with fewer apps. Both have been met with considerable hype and skepticism. Meanwhile, reality is shifting, and the line between what is real and fake is becoming increasingly blurred. Let's dive in:
- A new text-to-video model by ByteDance (TikTok)
- ReplaceAnything can, well, replace anything (in images)
- PALP is a new text-to-image fine-tuning approach by Google
- Dubbing for Everyone is a new method for visual dubbing
- FMA-Net can turn blurry, low-quality videos into clear, high-quality ones
- Audio2Photoreal can generate gesturing photorealistic avatars from sound clips
- 3 different 3D NeRF scene editing methods
- SonicVisionLM generates sound effects for silent videos
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge
For the next cover we're doing something different. I'm looking for images of braces spelling "AI", which will be combined into a collage. Selected pieces will be part of the first AI Art Weekly group drop. Thank you @manuW_atx for the idea and @ai_s_a_m for providing the announcement image. The rulebook can be found here and images can be submitted here.
News & Papers
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
ByteDance (the TikTok company) announced a new text-to-video model called MagicVideo-V2. Their model is able to generate videos with up to 94 frames, resulting in a 1048×1048 resolution video that exhibits both high aesthetic quality and temporal smoothness. Definitely interesting to see where ByteDance is going with this, as they have one of the biggest datasets to train video models.
ReplaceAnything as you want: Ultra-high quality content replacement
ReplaceAnything is an "inpainting" framework that can be used for human replacement, clothing replacement, background replacement, and more. The results look crazy good. Code hasn't been released yet, but there is a demo on HuggingFace.
PALP: Prompt Aligned Personalization of Text-to-Image Models
PALP is a new text-to-image fine-tuning approach by Google which focuses on personalization methods for a single prompt. The results compared to other methods look great, and it supports art-inspired, single-image, and multi-subject personalization.
Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors
Dubbing for Everyone is a new method for visual dubbing that generates lip motions of an actor in a video to synchronize with given audio, using as little as 4 seconds of data. The method can dub any video to any audio without further training, while capturing person-specific characteristics and reducing visual artifacts.
FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
FMA-Net can turn blurry, low-quality videos into clear, high-quality ones by jointly predicting the degradation and restoration processes, accounting for movement in the video through learned motion patterns.
Audio2Photoreal: From Audio to Photoreal Embodiment
Audio2Photoreal can generate full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, the model is able to output multiple possibilities of gestural motion for an individual, including face, body, and hands. The results are highly photorealistic avatars that can express crucial nuances in gestures such as sneers and smirks.
InseRF and GO-NeRF: Inserting 3D Objects into Neural Radiance Fields
Even though Gaussian Splats have seen a lot of love, NeRFs haven't been abandoned. This week we got three different NeRF editing papers. The first two tackle object insertion: InseRF and GO-NeRF are both methods to insert 3D objects into NeRF scenes.
FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields
The third is about style transfer. FPRF is able to stylize large-scale 3D NeRF scenes with multiple reference images, without additional optimization, while preserving multi-view appearance consistency.
SonicVisionLM: Playing Sound with Vision Language Models
SonicVisionLM can generate sound effects for videos, but unlike other methods, it uses vision language models (VLMs) to identify events within videos and generate sounds that match the video content.
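Conceptually, the pipeline described above has two stages: caption the visual events, then synthesize matching audio. Here's a minimal sketch of that flow, with stub functions standing in for the actual VLM and text-to-audio models; all names here are illustrative and are not SonicVisionLM's real API.

```python
# Sketch of a SonicVisionLM-style pipeline: VLM captions an event,
# a text-to-audio model turns the caption into a sound effect.
# Both models are replaced with stubs for illustration.

def describe_events(video_segment):
    """Stand-in for a vision-language model that captions on-screen events."""
    # A real VLM would inspect the frames; we return a fixed caption.
    return "a glass shatters on the floor"

def generate_sound(event_text):
    """Stand-in for a text-to-audio model conditioned on the event caption."""
    return f"<audio matching: {event_text}>"

def add_sound_effects(video_segments):
    # For each silent segment, caption the visual event, then pair the
    # segment with a synthesized sound that matches it.
    return [(seg, generate_sound(describe_events(seg))) for seg in video_segments]
```

The appeal of routing through text is that the intermediate captions are human-readable, so you can inspect or edit them before audio generation.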
Also interesting
- Jump Cut Smoothing for Talking Heads
- MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer
@batmatt_ai shared some beautiful Midjourney v6 prompts which you can try out: absurd, Magnetic Resonance Imaging brain photography of a girl deer, in the style of light-pink, for a wallpaper --ar 3:4 --v 6.0 --style raw
Tools & Tutorials
These are some of the most interesting resources Iβve come across this week.
ComfyDeploy by @BennyKokMusic allows you to deploy your ComfyUI workflows as APIs and run them on either local or cloud machines.
The code for DragNUWA (issue 47) got released. It uses Stable Video Diffusion as a backbone to animate an image according to specified paths.
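Those "specified paths" are essentially user-drawn drag trajectories that get turned into per-frame positions for the point being moved. A tiny sketch of that representation, with purely illustrative names (this is not DragNUWA's actual interface):

```python
# Illustrative sketch: a drag path can be given as start/end waypoints
# and linearly interpolated into one (x, y) position per frame,
# which then conditions the video model's motion.

def interpolate_path(start, end, num_frames):
    """Linearly interpolate a drag path from start to end over num_frames."""
    x0, y0 = start
    x1, y1 = end
    return [
        (x0 + (x1 - x0) * t / (num_frames - 1),
         y0 + (y1 - y0) * t / (num_frames - 1))
        for t in range(num_frames)
    ]

# Drag a point from (10, 10) to (50, 50) over 5 frames.
path = interpolate_path((10, 10), (50, 50), 5)
```

Each `(x, y)` pair tells the model where the dragged point should sit in that frame; real interfaces typically support multiple paths and curved trajectories.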
PixArt LCM (or PixArt-Delta) is a text-to-image synthesis framework that integrates the Latent Consistency Model (LCM) and ControlNet into the advanced PixArt-Alpha (issue 53) model.
CCSR is yet another method for image super-resolution that is able to generate more stable and content-consistent results compared to existing diffusion model-based methods like StableSR.
MirrorDiffusion is a method for zero-shot image-to-image translation, aka editing images with text prompts.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it ❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa