Posts by Tags

Multimodal Generation

Summaries and Thoughts on Multimodal Alignment and Generation

5 minute read

Published: April 27, 2025

This post covers models such as LAFITE, CAFE, ARTIST, and LLaVA-Reward, which aim to improve text-to-image generation through better generalization, stronger multimodal alignment, and enhanced text rendering.

Multimodal Understanding

Summaries on Multimodal LLMs for Text-rich Image Understanding

4 minute read