Summaries on Multimodal LLMs for Text-rich Image Understanding
Published:
We summarize multimodal understanding papers and delve into models like LLaVAR, TRINS, LaRA, LLaVA-Read, and SV-RAG, which focus on enhancing text-rich image comprehension.
Published:
The multimodal generation blog covers models such as LAFITE, CAFE, ARTIST, and LLaVA-Reward, which aim to improve text-to-image generation through better generalization, stronger multimodal alignment, and enhanced text rendering.