Ying Shen
Publications
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, …
Jiatao Gu, Ying Shen, Shuangfei Zhai, Yizhe Zhang, Navdeep Jaitly, Joshua M. Susskind
Many-to-many Image Generation with Auto-regressive Diffusion Models
Recent advancements in image generation have made significant progress, yet existing models present limitations in perceiving and …
Ying Shen, Yizhe Zhang, Shuangfei Zhai, Lifu Huang, Joshua M. Susskind, Jiatao Gu
InternalInspector I²: Robust Confidence Estimation in LLMs through Internal States
Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing …
Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang
Multimodal Instruction Tuning with Conditional Mixture of LoRA
Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an …
Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist …
Zhiyang Xu, Chao Feng, Rulin Shao, Trevor Ashby, Ying Shen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang
X-EVAL: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and …
Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
Automatically generating scripts (i.e. sequences of key steps described in text) from video demonstrations and reasoning about the …
Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang
MULTIINSTRUCT: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Instruction tuning, a new learning paradigm that fine-tunes pre-trained language models on tasks specified through instructions, has …
Zhiyang Xu, Ying Shen, Lifu Huang
The Art of Socratic Questioning: Recursive Thinking with Large Language Models
Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning problems by generating intermediate steps. …
Jingyuan Qi, Zhiyang Xu, Ying Shen, Minqian Liu, Di Jin, Qifan Wang, Lifu Huang
Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors
Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker …
Yansen Wang, Ying Shen, Zhun Liu, Paul Liang, Amir Zadeh, Louis-Philippe Morency
Efficient Low-rank Multimodal Fusion with Modality-Specific Factors
Multimodal research is an emerging field of artificial intelligence, and one of the main research problems in this field is multimodal …
Zhun Liu, Ying Shen, Varun Bharadhwaj, Paul Liang, Amir Zadeh, Louis-Philippe Morency