Publications
You can also find my full list on my Google Scholar profile.
* indicates Ph.D. students or interns whom I primarily mentored.
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
International Conference on Learning Representations (ICLR) 2025
The first multimodal LLM to handle documents of thousands of pages by using itself for retrieval.
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
arXiv 2024
Customized Multimodal LLMs as Reward Models for Text-to-Image Generation
In submission to International Conference on Computer Vision (ICCV) 2025
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
International Conference on Computational Linguistics (COLING) 2025
TRINS: Towards Multimodal Language Models that Can Read
Conference on Computer Vision and Pattern Recognition (CVPR) 2024
An instruction dataset for text-rich images with human annotations.
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation
arXiv 2024
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
arXiv 2024
LLaVA-Read was the state-of-the-art method on the text-rich image understanding benchmark (OCR-Bench) before July 2024.
Customization Assistant for Text-to-Image Generation
Conference on Computer Vision and Pattern Recognition (CVPR) 2024
ADoPD: A Large-Scale Document Page Decomposition Dataset
International Conference on Learning Representations (ICLR) 2024
AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models
Conference on Language Modeling (COLM) 2024
The first gradient-based adversarial attack to generate readable and malicious prompts.
TextLap: Customizing Language Models for Text-to-Layout Planning
Empirical Methods in Natural Language Processing (EMNLP) 2024
Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints
International Conference on Learning Representations (ICLR) 2024
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Conference on Language Modeling (COLM) 2024
Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2024
Towards Building the Federated GPT: Federated Instruction Tuning
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024
Improve Temporal Awareness of LLMs for Sequential Recommendation
arXiv 2024
ARTIST: Improving Generation of Text-rich Image by Disentanglement
Winter Conference on Applications of Computer Vision (WACV) 2025
LLaVAR: Enhanced Visual Instruction Tuning for Text-rich Image Understanding
Neural Information Processing Systems (NeurIPS) 2023
LLaVAR is the first multimodal LLM for text-rich image understanding.
Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels
Neural Information Processing Systems (NeurIPS) 2023
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
arXiv 2023
LAFITE: Towards Language-Free Training for Text-to-Image Generation
Conference on Computer Vision and Pattern Recognition (CVPR) 2022
LAFITE was the state-of-the-art model at 1% of DALL·E's size. It serves as a baseline in the DALL·E 2 technical report.
TiGAN: Text-Based Interactive Image Generation and Manipulation
AAAI Conference on Artificial Intelligence (AAAI) 2022
Offline Interactive Recommendation with Natural-language Feedback
AAAI Conference on Artificial Intelligence (AAAI) 2022
Dynamics-Aware Adaptation for Reinforcement Learning Based Cross-Domain Interactive Recommendation
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) 2022
Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
arXiv 2024
Robustness of Demonstration-based Learning Under Limited Data Scenario
Empirical Methods in Natural Language Processing (EMNLP) 2022
Information-Theoretic Representation Disentanglement for Zero-Shot Voice Style Transfer
International Conference on Learning Representations (ICLR) 2021
Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes
AAAI Conference on Artificial Intelligence (AAAI) 2021
GenDICE: Generalized Offline Estimation of Stationary Values
International Conference on Learning Representations (ICLR) 2020
The first off-policy policy evaluation algorithm for arbitrary behavior policies, handling both average and discounted reward criteria.
Nested-Wasserstein Self-imitation Learning for Sequence Generation
International Conference on Artificial Intelligence and Statistics (AISTATS) 2020
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
International Conference on Machine Learning (ICML) 2020
Semantic Matching for Sequence-to-Sequence Learning
Empirical Methods in Natural Language Processing (EMNLP) 2020
Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference
Empirical Methods in Natural Language Processing (EMNLP) 2020
Improving Adversarial Text Generation via Modeling Distant Future
Association for Computational Linguistics (ACL) 2020
Text-based Interactive Recommendation via Constraint-Augmented Reinforcement Learning
Neural Information Processing Systems (NeurIPS) 2019
Scalable Thompson Sampling via Optimal Transport
International Conference on Artificial Intelligence and Statistics (AISTATS) 2019
The first neural Thompson sampling algorithm.
Improving Sequence-to-Sequence Learning via Optimal Transport
International Conference on Learning Representations (ICLR) 2019
Policy Optimization as Wasserstein Gradient Flows
International Conference on Machine Learning (ICML) 2018
Topic-Guided Variational Autoencoders for Text Generation
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2019
Knowledge Graph Prompting for Multi-Document Question Answering
AAAI Conference on Artificial Intelligence (AAAI) 2024
Topology-aware Retrieval Augmentation for Text Generation
ACM International Conference on Information and Knowledge Management (CIKM) 2024
Understanding and Accelerating Particle-Based Variational Inference
International Conference on Machine Learning (ICML) 2019
A Unified Particle-Optimization Framework for Scalable Bayesian Sampling
Uncertainty in Artificial Intelligence (UAI) 2018
Figure Captioning with Reasoning and Sequence-Level Training
Winter Conference on Applications of Computer Vision (WACV) 2020
Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions
AAAI Conference on Artificial Intelligence (AAAI) 2020
Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory
International Conference on Artificial Intelligence and Statistics (AISTATS) 2020
Numerical Pruning for Efficient Autoregressive Models
AAAI Conference on Artificial Intelligence (AAAI) 2024
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
arXiv 2024
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-Use
arXiv 2024
Variational Inference and Model Selection with Generalized Evidence Bounds
International Conference on Machine Learning (ICML) 2018