Jianshuo Dong (董建硕)

Hi! I'm a PhD student at Tsinghua University, where I explore machine learning security, trustworthy AI systems, and explainable AI. I'm advised by Prof. Han Qiu and collaborate closely with Prof. Tianwei Zhang. I'm always excited to connect with fellow researchers and practitioners—feel free to reach out!

Email: dongjs23(at)mails(dot)tsinghua(dot)edu(dot)cn

Office: Room 205, FIT Building, Tsinghua University, Beijing, China

News

2025-10-17 I became a PhD candidate at Tsinghua University!
2025-08-23 One paper was accepted to the EMNLP 2025 main conference (oral)!
2025-08-10 I launched this website to share my research!

Publications

Conference Papers

[EMNLP'25]
"I've Decided to Leak": Probing Internals Behind Prompt Leakage Intents
Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Ke Xu, Minlie Huang, Chao Zhang, Han Qiu
TL;DR: Probing model internals to understand LLMs' prompt leakage intents.
[ICLR'25]
An Engorgio Prompt Makes Large Language Model Babble on
Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu, Han Qiu
TL;DR: An inference cost attack targeting modern auto-regressive LLMs.
[ICCV'23]
One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training
Jianshuo Dong, Han Qiu, Yiming Li, Tianwei Zhang, Yuanjie Li, Zeqi Lai, Chao Zhang, Shu-Tao Xia
TL;DR: We insert a dormant backdoor during model training and later activate it via a bit-flip attack.

Pre-prints

[arXiv:2512.14754]
Revisiting the Reliability of Language Models in Instruction-Following
Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Chao Zhang, Han Qiu
TL;DR: Investigating the nuance-oriented reliability of LLMs' instruction-following under varied phrasings of the same instruction.
[arXiv:2509.23694]
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu
TL;DR: Proactively discovering potential vulnerabilities of LLM-based search agents through automated red-teaming.
[arXiv:2507.04214]
Can Large Language Models Automate the Refinement of Cellular Network Specifications?
Jianshuo Dong, Tianyi Zhang, Feng Yan, Yuanjie Li, Hewu Li, Han Qiu
TL;DR: Evaluating and improving LLMs' ability to automatically refine cellular network specifications with respect to security and trustworthiness issues.
[arXiv:2506.21571]
Towards Understanding the Cognitive Habits of Large Reasoning Models
Jianshuo Dong, Yujia Fu, Chuanrui Hu, Chao Zhang, Han Qiu
TL;DR: Do reasoning models have human-like cognitive habits?

Education

2023 - 2028 (expected) Tsinghua University, Beijing, China — Ph.D. Student
2019 - 2023 Wuhan University, Wuhan, China — B.E.

Teaching

2025 Spring Trustworthy Machine Learning, Tsinghua University — Teaching Assistant

Services

[ICLR'26] Reviewer
[NeurIPS'25] Reviewer (Top Reviewer)
[ICLR'25] Reviewer (Notable Reviewer)