Jianshuo Dong (董建硕)

Hi! I'm a PhD student at Tsinghua University, where I explore machine learning security, trustworthy AI systems, and explainable AI. I'm advised by Prof. Han Qiu and collaborate closely with Prof. Tianwei Zhang. I'm always excited to connect with fellow researchers and practitioners—feel free to reach out!

Email: dongjs23(at)mails(dot)tsinghua(dot)edu(dot)cn

Office: Room 205, FIT Building, Tsinghua University, Beijing, China

News

2025-10-17 I became a PhD candidate at Tsinghua University!
2025-08-23 One paper was accepted to the EMNLP 2025 main conference (oral)!
2025-08-10 I launched this website to share my research!

Publications

Conference Papers

[EMNLP'25]
"I've Decided to Leak": Probing Internals Behind Prompt Leakage Intents
Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Ke Xu, Minlie Huang, Chao Zhang, Han Qiu
TL;DR: Probing model internals to understand LLMs' prompt leakage intents.
[ICLR'25]
An Engorgio Prompt Makes Large Language Model Babble on
Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu, Han Qiu
TL;DR: An inference cost attack targeting modern auto-regressive LLMs.
[ICCV'23]
One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training
Jianshuo Dong, Han Qiu, Yiming Li, Tianwei Zhang, Yuanjie Li, Zeqi Lai, Chao Zhang, Shu-Tao Xia
TL;DR: We insert a dormant backdoor during model training and later activate it via a bit-flip attack.

Pre-prints

[arXiv:2512.14754]
Revisiting the Reliability of Language Models in Instruction-Following
Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Chao Zhang, Han Qiu
TL;DR: Investigating the nuance-oriented reliability of LLMs' instruction-following under varied phrasings of the same instruction.
[arXiv:2509.23694]
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu
TL;DR: Proactively discovering potential vulnerabilities of LLM-based search agents through automated red-teaming.
[arXiv:2507.04214]
Can Large Language Models Automate the Refinement of Cellular Network Specifications?
Jianshuo Dong, Tianyi Zhang, Feng Yan, Yuanjie Li, Hewu Li, Han Qiu
TL;DR: Evaluating and improving LLMs' ability to automatically refine cellular network specifications with respect to security and trustworthiness issues.
[arXiv:2506.21571]
Towards Understanding the Cognitive Habits of Large Reasoning Models
Jianshuo Dong, Yujia Fu, Chuanrui Hu, Chao Zhang, Han Qiu
TL;DR: Do reasoning models have human-like cognitive habits?

Education

2023 - 2028 (expected) Tsinghua University, Beijing, China — Ph.D. Student
2019 - 2023 Wuhan University, Wuhan, China — B.E.

Teaching

2025 Spring Trustworthy Machine Learning, Tsinghua University — Teaching Assistant

Services

[ICLR'26] Reviewer
[NeurIPS'25] Reviewer (Top Reviewer)
[ICLR'25] Reviewer (Notable Reviewer)