Publications

You can also find my full list on my Google Scholar profile.

* indicates Ph.D. students/interns whom I primarily mentored.

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding

Jian Chen*, Ruiyi Zhang, Yufan Zhou, Tong Yu, Jiuxiang Gu, Ryan A. Rossi, Changyou Chen, Tong Sun
International Conference on Learning Representations (ICLR) 2025
The first multimodal LLM to handle documents of thousands of pages, using itself for retrieval.

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Siwei Han, Peng Xia, Ruiyi Zhang, Tong Sun, Yun Li, Hongtu Zhu, Huaxiu Yao
arXiv 2025

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration

Yue Fan*, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu
arXiv 2024

Customized Multimodal LLMs as Reward Models for Text-to-Image Generation

Shijie Zhou*, Ruiyi Zhang, Branislav Kveton, Yufan Zhou, Jiuxiang Gu, Jian Chen, Changyou Chen
In submission to International Conference on Computer Vision (ICCV) 2025

Towards Visual Text Grounding of Multimodal Large Language Model

Ming Li*, Ruiyi Zhang, Jian Chen, Franck Dernoncourt, Wanrong Zhu, Tianyi Zhou, Tong Sun
arXiv 2025

A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation

Shijie Zhou*, Ruiyi Zhang, Yufan Zhou, Changyou Chen
International Conference on Computational Linguistics (COLING) 2025

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Nanxuan Zhao, Jing Shi, Tong Sun
arXiv 2024

TRINS: Towards Multimodal Language Models that Can Read

Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun
Conference on Computer Vision and Pattern Recognition (CVPR) 2024
An instruction dataset for text-rich images with human annotations.

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun
arXiv 2024

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models

Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun
arXiv 2024
LLaVA-Read was the state-of-the-art method on the text-rich image understanding benchmark OCR-Bench before July 2024.

Customization Assistant for Text-to-Image Generation

Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun
Conference on Computer Vision and Pattern Recognition (CVPR) 2024

ADoPD: A Large-Scale Document Page Decomposition Dataset

Jiuxiang Gu, Xiangxi Shi, Jason Kuen, Lu Qi, Ruiyi Zhang, Anqi Liu, Ani Nenkova, Tong Sun
International Conference on Learning Representations (ICLR) 2024

AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

Sicheng Zhu*, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
Conference on Language Modeling (COLM) 2024
The first gradient-based adversarial attack to generate readable and malicious prompts.

TextLap: Customizing Language Models for Text-to-Layout Planning

Jian Chen*, Ruiyi Zhang, Yufan Zhou, Jiuxiang Gu, Jennifer Healey, Changyou Chen
Empirical Methods in Natural Language Processing (EMNLP) 2024

Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints

Jian Chen*, Ruiyi Zhang, Yufan Zhou, Rajiv Jain, Ryan Rossi, Changyou Chen
International Conference on Learning Representations (ICLR) 2024

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models

Sicheng Zhu*, Bang An*, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Furong Huang
Conference on Language Modeling (COLM) 2024

Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances

Zhendong Chu*, Ruiyi Zhang, Tong Yu, Rajiv Jain, Vlad I Morariu, Jiuxiang Gu, Ani Nenkova
North American Chapter of the Association for Computational Linguistics (NAACL) 2024

Towards Building the FederatedGPT: Federated Instruction Tuning

Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Guoyin Wang, Yiran Chen
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024

Improve Temporal Awareness of LLMs for Sequential Recommendation

Zhendong Chu*, Zichao Wang, Ruiyi Zhang, Yangfeng Ji, Hongning Wang, Tong Sun
arXiv 2024

ARTIST: Improving Generation of Text-rich Image by Disentanglement

Jianyi Zhang*, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang
Winter Conference on Applications of Computer Vision (WACV) 2025

LLaVAR: Enhanced Visual Instruction Tuning for Text-rich Image Understanding

Yanzhe Zhang*, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun
Neural Information Processing Systems (NeurIPS) 2023
LLaVAR is the first multimodal LLM for text-rich image understanding.

Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels

Jian Chen*, Ruiyi Zhang, Tong Yu, Rohan Sharma, Zhiqiang Xu, Tong Sun, Changyou Chen
Neural Information Processing Systems (NeurIPS) 2023

VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding

Yizhou Wang*, Ruiyi Zhang, Haoliang Wang, Uttaran Bhattacharya, Yun Fu, Gang Wu
arXiv 2023

LAFITE: Towards Language-Free Training for Text-to-Image Generation

Yufan Zhou*, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
Conference on Computer Vision and Pattern Recognition (CVPR) 2022
It was the state-of-the-art model at 1% of the size of DALL·E, and it serves as a baseline in the DALL·E 2 technical report.

TiGAN: Text-Based Interactive Image Generation and Manipulation

Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Chris Tensmeyer, Tong Yu, Changyou Chen, Jinhui Xu, Tong Sun
Association for the Advancement of Artificial Intelligence (AAAI) 2022

Offline Interactive Recommendation with Natural-language Feedback

Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin
Association for the Advancement of Artificial Intelligence (AAAI) 2022

Dynamics-Aware Adaptation for Reinforcement Learning Based Cross-Domain Interactive Recommendation

Junda Wu, Zhihui Xie, Tong Yu, Handong Zhao, Ruiyi Zhang, Shuai Li
Special Interest Group on Information Retrieval (SIGIR) 2022

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

Yufan Zhou, Ruiyi Zhang, Tong Sun, Jinhui Xu
arXiv 2024

Robustness of Demonstration-based Learning Under Limited Data Scenario

Hongxin Zhang*, Yanzhe Zhang*, Ruiyi Zhang, Diyi Yang
Empirical Methods in Natural Language Processing (EMNLP) 2022

Information-Theoretic Representation Disentanglement for Zero-Shot Voice Style Transfer

Siyang Yuan*, Pengyu Cheng*, Ruiyi Zhang, Zhe Gan, Weituo Hao, Lawrence Carin
International Conference on Learning Representations (ICLR) 2021

Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes

Xinyuan Zhang, Ruiyi Zhang, Manzil Zaheer, Amr Ahmed
Association for the Advancement of Artificial Intelligence (AAAI) 2021

GenDICE: Generalized Offline Estimation of Stationary Values

Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans
International Conference on Learning Representations (ICLR) 2020
The first off-policy policy evaluation algorithm for arbitrary behavior policies, handling both average and discounted reward criteria.

Nested-Wasserstein Self-imitation Learning for Sequence Generation

Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence Carin
International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
International Conference on Machine Learning (ICML) 2020

Semantic Matching for Sequence-to-Sequence Learning

Ruiyi Zhang, Xinyuan Zhang, Ke Bai, Changyou Chen, Lawrence Carin
Empirical Methods in Natural Language Processing (EMNLP) 2020

Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference

Bang An, Jie Lyu, Zhenyi Wang, Chunyuan Li, Ruiyi Zhang, Changwei Hu, Changyou Chen
Empirical Methods in Natural Language Processing (EMNLP) 2020

Improving Adversarial Text Generation via Modeling Distant Future

Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Liqun Chen, Dinghan Shen, Guoyin Wang, Lawrence Carin
Association for Computational Linguistics (ACL) 2020

Text-based Interactive Recommendation via Constraint-Augmented Reinforcement Learning

Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, Lawrence Carin
Neural Information Processing Systems (NeurIPS) 2019

Scalable Thompson Sampling via Optimal Transport

Ruiyi Zhang, Zheng Wen, Changyou Chen, Chen Fang, Tong Yu, Lawrence Carin
International Conference on Artificial Intelligence and Statistics (AISTATS) 2019
The first neural Thompson sampling algorithm.

Improving Sequence-to-Sequence Learning via Optimal Transport

Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Dinghan Shen, Lawrence Carin
International Conference on Learning Representations (ICLR) 2019

Policy Optimization as Wasserstein Gradient Flows

Ruiyi Zhang, Changyou Chen, Chunyuan Li, Lawrence Carin
International Conference on Machine Learning (ICML) 2018

Topic-Guided Variational Autoencoders for Text Generation

Wenlin Wang, Zhe Gan, Hongteng Xu, Ruiyi Zhang, Guoyin Wang, Dinghan Shen, Changyou Chen, Lawrence Carin
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2019

Knowledge Graph Prompting for Multi-Document Question Answering

Yu Wang, Nedim Lipka, Ryan A. Rossi, Alexa Siu, Ruiyi Zhang, Tyler Derr
AAAI Conference on Artificial Intelligence (AAAI) 2024

Topology-aware Retrieval Augmentation for Text Generation

Yu Wang, Nedim Lipka, Ruiyi Zhang, Alexa Siu, Yuying Zhao, Bo Ni, Xin Wang, Ryan Rossi, Tyler Derr
ACM International Conference on Information and Knowledge Management (CIKM) 2024

Understanding and Accelerating Particle-Based Variational Inference

Chang Liu, Jingwei Zhuo, Pengyu Cheng, Ruiyi Zhang, Jun Zhu
International Conference on Machine Learning (ICML) 2019

A Unified Particle-Optimization Framework for Scalable Bayesian Sampling

Changyou Chen, Ruiyi Zhang, Wenlin Wang, Bai Li, Liqun Chen
Uncertainty in Artificial Intelligence (UAI) 2018

Visual Prompting in Multimodal Large Language Models: A Survey

Junda Wu, Zhehao Zhang, Yu Xia, Xintong Li, Zhaoyang Xia, Aaron Chang, Tong Yu, Sungchul Kim, Ryan A. Rossi, Ruiyi Zhang, Subrata Mitra, Dimitris N. Metaxas, Lina Yao, Jingbo Shang, Julian McAuley
arXiv 2024

Figure Captioning with Reasoning and Sequence-Level Training

Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Tong Yu, Ryan Rossi, Razvan Bunescu
Winter Conference on Applications of Computer Vision (WACV) 2020

Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions

Zhenyi Wang, Ping Yu, Yang Zhao, Ruiyi Zhang, Yufan Zhou, Junsong Yuan, Changyou Chen
Association for the Advancement of Artificial Intelligence (AAAI) 2020

Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory

Jianyi Zhang, Ruiyi Zhang, Lawrence Carin, Changyou Chen
International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
AAAI Conference on Artificial Intelligence (AAAI) 2024

Dynasaur: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, Ryan A Rossi, Handong Zhao, Ruiyi Zhang, Puneet Mathur, Nedim Lipka, Yu Wang, Trung Bui, Franck Dernoncourt, Tianyi Zhou
arXiv 2024

Taipan: Efficient and Expressive State Space Language Models with Selective Attention

Chien Van Nguyen, Huy Huu Nguyen, Thang M. Pham, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Ryan A. Rossi, Trung Bui, Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen
arXiv 2024

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use

Zhehao Zhang, Ryan Rossi, Tong Yu, Franck Dernoncourt, Ruiyi Zhang, Jiuxiang Gu, Sungchul Kim, Xiang Chen, Zichao Wang, Nedim Lipka
arXiv 2024

Bias and Fairness in Large Language Models: A Survey

Isabel O Gallegos, Ryan A Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K Ahmed
arXiv 2024

Variational Inference and Model Selection with Generalized Evidence Bounds

Liqun Chen, Chenyang Tao, Ruiyi Zhang, Ricardo Henao, Lawrence Carin
International Conference on Machine Learning (ICML) 2018