Jianshuo Dong (董建硕)

I am a third-year Ph.D. student at INSC, Tsinghua University, where I started in 09/2023.
I focus on machine learning security, trustworthy AI systems, and explainable AI.
My advisor is Prof. Han Qiu.


Contact: dongjs23@mails.tsinghua.edu.cn
Office: Room 205, FIT Building, Tsinghua University, Beijing, China

News
10/17/2025 I became a Ph.D. candidate at Tsinghua University!
08/23/2025 One paper was accepted to the EMNLP 2025 main conference (oral)!
08/10/2025 I created a new website for my research!
Publications
Conference Papers
[EMNLP'25] “I’ve Decided to Leak”: Probing Internals Behind Prompt Leakage Intents
Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Ke Xu, Minlie Huang, Chao Zhang, Han Qiu
PDF / Code / Oral
TL;DR: Diving into the internals to understand LLMs' prompt leakage intents.
[ICLR'25] An Engorgio Prompt Makes Large Language Model Babble on
Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu, Han Qiu
PDF / Code
TL;DR: An inference cost attack targeting modern auto-regressive LLMs.
[ICCV'23] One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training
Jianshuo Dong, Han Qiu, Yiming Li, Tianwei Zhang, Yuanjie Li, Zeqi Lai, Chao Zhang, Shu-Tao Xia
PDF / Code
TL;DR: Insert an unactivated backdoor during model training, then activate it with a bit-flip attack.
Pre-prints
[arXiv 2509.23694] SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu
arXiv / Code
TL;DR: Proactively discovering potential vulnerabilities of LLM-based search agents.
[arXiv 2506.21571] Towards Understanding the Cognitive Habits of Large Reasoning Models
Jianshuo Dong, Yujia Fu, Chuanrui Hu, Chao Zhang, Han Qiu
arXiv / Code
TL;DR: Do reasoning models have human-like cognitive habits?
[arXiv 2507.04214] Can Large Language Models Automate the Refinement of Cellular Network Specifications?
Jianshuo Dong, Tianyi Zhang, Feng Yan, Yuanjie Li, Hewu Li, Han Qiu
arXiv
TL;DR: Evaluate and improve how well LLMs automatically refine cellular network specifications with respect to security and trustworthiness issues.
Education
2019-2023 Wuhan University, Wuhan, China
2023-2028 (expected) Tsinghua University, Beijing, China
Teaching
2025 Spring Trustworthy Machine Learning, Tsinghua University, Teaching Assistant
Professional Services
[ICLR'26] Reviewer
[NeurIPS'25] Top Reviewer
[ICLR'25] Notable Reviewer