Biography

Hi! I'm Jianshuo Dong, a PhD student at Tsinghua University, advised by Prof. Han Qiu.

My research interests lie in:

  • Safety & Security of Large Model Systems: Deliver trustworthy large model services to downstream users.
  • Autonomous Agentic AI: Develop strong, safe, and efficient autonomous agents.
  • Explainable AI: Reverse-engineer and closely monitor AI models.

Email: dongjs23(at)mails(dot)tsinghua(dot)edu(dot)cn

News

2026-05-01 One paper (SafeSearch) was accepted to ICML 2026 as a regular paper!
2026-04-04 One paper (IFEval++) got accepted to ACL 2026 main (oral)!
2025-10-17 I became a PhD candidate at Tsinghua University!
2025-08-23 One paper (Leakage-intent probing) got accepted to EMNLP 2025 main (oral)!
2025-08-10 I created this website for my research!

Publications

Conference Papers

[ICML'26]
Regular
SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu
TL;DR: Proactively discovering potential vulnerabilities of LLM-based search agents.
[ACL'26]
Oral
Revisiting the Reliability of Language Models in Instruction-Following
Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Chao Zhang, Han Qiu
TL;DR: Investigating nuance-oriented reliability of LLMs in instruction-following with varied phrasings.
[EMNLP'25]
Oral
"I've Decided to Leak": Probing Internals Behind Prompt Leakage Intents
Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Ke Xu, Minlie Huang, Chao Zhang, Han Qiu
TL;DR: Diving into the internals to understand LLMs' prompt leakage intents.
[ICLR'25]
Poster
An Engorgio Prompt Makes Large Language Model Babble on
Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu, Han Qiu
TL;DR: An inference cost attack targeting modern auto-regressive LLMs.
[ICCV'23]
Poster
One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training
Jianshuo Dong, Han Qiu, Yiming Li, Tianwei Zhang, Yuanjie Li, Zeqi Lai, Chao Zhang, Shu-Tao Xia
TL;DR: Insert an unactivated backdoor during model training and activate it with a bit-flip attack.

Pre-prints

[arXiv:2506.21571]
Towards Understanding the Cognitive Habits of Large Reasoning Models
Jianshuo Dong, Yujia Fu, Chuanrui Hu, Chao Zhang, Han Qiu
TL;DR: Do reasoning models have human-like cognitive habits?
[arXiv:2507.04214]
Can Large Language Models Automate the Refinement of Cellular Network Specifications?
Jianshuo Dong, Yuanjie Li, Jun Liu, Hewu Li, Han Qiu
TL;DR: Evaluate and improve the performance of LLMs in refining cellular network specifications concerning security issues.

Education

2023 - 2028 (expected) Tsinghua University, Beijing, China — Ph.D. Student
2019 - 2023 Wuhan University, Wuhan, China — B.E.

Academic Services

Teaching

2025 Spring Trustworthy Machine Learning, Tsinghua University — Teaching Assistant

Reviewing

[ICLR'26] Reviewer
[ICML'26] Reviewer (Gold Reviewer)
[NeurIPS'25] Reviewer (Top Reviewer)
[ICLR'25] Reviewer (Notable Reviewer)