AI Toolbox
A curated collection of 759 free cutting edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.





Garment3DGen can stylize the geometry and textures from 2D image and 3D mesh garments! These can be fitted on top of parametric bodies and simulated. Could be used for hand-garment interaction in VR or to turn sketches into 3D garments.
MonoHair can create high-quality 3D hair from a single video. It uses a two-step process for detailed hair reconstruction and achieves top performance across various hairstyles.
Learning Inclusion Matching for Animation Paint Bucket Colorization can colorize line art in animations by allowing artists to colorize just one frame. The algorithm then automatically applies the color to the rest of the frames, using a learning-based inclusion matching pipeline for more accurate results.
AiOS can estimate human poses and shapes in one step, combining body, hand, and facial expression recovery.
PAID is a method that enables smooth high consistency image interpolation for diffusion models. GANs have been the king in that field so far, but this method shows promising results for diffusion models.
TC4D can animate 3D scenes generated from text along arbitrary trajectories. I can see this being useful for generating 3D effects for movies or games.
TRAM can reconstruct human motion and camera movement from videos in dynamic settings. It reduces global motion errors by 60% and uses a video transformer model to accurately track body motion.
Attribute Control enables fine-grained control over attributes of specific subjects in text-to-image models. This lets you modify attributes like age, width, makeup, smile and more for each subject independently.
FlashFace can personalize photos by using one or a few reference face images and a text prompt. It keeps important details like scars and tattoos while balancing text and image guidance, making it useful for face swapping and turning virtual characters into real people.
TRIP is a new approach to image-to-video generation with better temporal coherence.
Make-It-Vivid generates high-quality texture maps for 3D biped cartoon characters from text instructions, making it possible to dress and animate characters based on prompts.
ThemeStation can generate a variety of 3D assets that match a specific theme from just a few examples. It uses a two-stage process to improve the quality and diversity of the models, allowing users to create 3D assets based on their own text prompts.
Spectral Motion Alignment is a framework that can capture complex and long-range motion patterns within videos and transfer them to video-to-video frameworks like MotionDirector, VMC, Tune-A-Video, and ControlVideo.
StreamingT2V enables long text-to-video generations featuring rich motion dynamics without any stagnation. It ensures temporal consistency throughout the video, aligns closely with the descriptive text, and maintains high frame-level image quality. Videos can be up to 1200 frames, spanning 2 minutes, and can be extended for even longer durations.
ReNoise can be used to reconstruct an input image that can be edited using text prompts.
AnyV2V can edit videos using prompt-based editing and style transfer without fine-tuning. It modifies the first frame of a video and generates the edited video while keeping high visual quality.
FouriScale can generate high-resolution images from pre-trained diffusion models with various aspect ratios and achieve an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation.
FRESCO combines ControlNet with Ebsynth for zero-shot video translation that focuses on preserving the spatial and temporal consistency of the input frames.
You Only Sample Once can quickly create high-quality images from text in one step. It combines diffusion processes with GANs, allows fine-tuning of pre-trained models, and works well at higher resolutions without extra training.
TexDreamer can generate high-quality 3D human textures from text and images. It uses a smart fine-tuning method and a unique translator module to create realistic textures quickly while keeping important details intact.