Jianshuo Dong, PhD Student at Tsinghua

Biography

Hi! I'm Jianshuo Dong, a PhD student at Tsinghua University, advised by Prof. Han Qiu.

My research interests lie in:

Safety & Security of Large Model Systems: Delivering trustworthy large model services to downstream users.
Autonomous Agentic AI: Developing strong, safe, and efficient autonomous agents.
Explainable AI: Reverse-engineering and closely monitoring AI models.

Email: dongjs23(at)mails(dot)tsinghua(dot)edu(dot)cn

Updates

News

2026-05-01 One paper (SafeSearch) was accepted to ICML 2026 as a regular paper!

2026-04-04 One paper (IFEval++) got accepted to ACL 2026 main (oral)!

2025-10-17 I became a PhD candidate at Tsinghua University!

2025-08-23 One paper (Leakage-intent probing) got accepted to EMNLP 2025 main (oral)!

2025-08-10 I created this website for my research!

Research

Publications

Conference Papers

[ICML'26]
Regular

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu

arXiv / Code / News (机器之心) / Bib (arXiv)

TL;DR: Proactively discovering potential vulnerabilities in LLM-based search agents.

@article{dong2025safesearch,
  title={SafeSearch: Automated Red-Teaming of LLM-Based Search Agents},
  author={Dong, Jianshuo and Guo, Sheng and Wang, Hao and Chen, Xun and Liu, Zhuotao and Zhang, Tianwei and Xu, Ke and Huang, Minlie and Qiu, Han},
  journal={arXiv preprint arXiv:2509.23694},
  year={2025},
  url={https://arxiv.org/abs/2509.23694}
}

[ACL'26]
Oral

Revisiting the Reliability of Language Models in Instruction-Following

Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Chao Zhang, Han Qiu

arXiv / Code / Bib (arXiv)

TL;DR: Investigating nuance-oriented reliability of LLMs in instruction-following with varied phrasings.

@inproceedings{dong2026revisiting,
  title={Revisiting the Reliability of Language Models in Instruction-Following},
  author={Dong, Jianshuo and Zhang, Yutong and Liu, Yan and Zhong, Zhenyu and Wei, Tao and Zhang, Chao and Qiu, Han},
  booktitle={Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
  year={2026},
  url={https://arxiv.org/abs/2512.14754}
}

[EMNLP'25]
Oral

"I've Decided to Leak": Probing Internals Behind Prompt Leakage Intents

Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Ke Xu, Minlie Huang, Chao Zhang, Han Qiu

PDF / Code / Bib

TL;DR: Diving into the internals to understand LLMs' prompt leakage intents.

@inproceedings{dong-etal-2025-ive,
  title = "``{I}{'}ve Decided to Leak'': Probing Internals Behind Prompt Leakage Intents",
  author = "Dong, Jianshuo and Zhang, Yutong and Liu, Yan and Zhong, Zhenyu and Wei, Tao and Xu, Ke and Huang, Minlie and Zhang, Chao and Qiu, Han",
  booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
  month = nov,
  year = "2025",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.emnlp-main.1082/",
  doi = "10.18653/v1/2025.emnlp-main.1082",
  pages = "21318--21348"
}

[ICLR'25]
Poster

An Engorgio Prompt Makes Large Language Model Babble on

Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu, Han Qiu

PDF / Code / Bib

TL;DR: An inference cost attack targeting modern auto-regressive LLMs.

@inproceedings{dong2025engorgio,
  title={An Engorgio Prompt Makes Large Language Model Babble on},
  author={Jianshuo Dong and Ziyuan Zhang and Qingjie Zhang and Tianwei Zhang and Hao Wang and Hewu Li and Qi Li and Chao Zhang and Ke Xu and Han Qiu},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=m4eXBo0VNc}
}

[ICCV'23]
Poster

One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training

Jianshuo Dong, Han Qiu, Yiming Li, Tianwei Zhang, Yuanjie Li, Zeqi Lai, Chao Zhang, Shu-Tao Xia

PDF / Code / Bib

TL;DR: Insert a dormant backdoor during model training and activate it later with a bit-flip attack.

@inproceedings{dong2023onebit,
  title={One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training},
  author={Dong, Jianshuo and Qiu, Han and Li, Yiming and Zhang, Tianwei and Li, Yuanjie and Lai, Zeqi and Zhang, Chao and Xia, Shu-Tao},
  booktitle={ICCV},
  year={2023}
}

In Progress

Pre-prints

[arXiv:2506.21571]

Towards Understanding the Cognitive Habits of Large Reasoning Models

Jianshuo Dong, Yujia Fu, Chuanrui Hu, Chao Zhang, Han Qiu

arXiv / Code / Bib

TL;DR: Do reasoning models have human-like cognitive habits?

@article{dong2025cognitive,
  title={Towards Understanding the Cognitive Habits of Large Reasoning Models},
  author={Dong, Jianshuo and Fu, Yujia and Hu, Chuanrui and Zhang, Chao and Qiu, Han},
  journal={arXiv preprint arXiv:2506.21571},
  year={2025},
  url={https://arxiv.org/abs/2506.21571}
}

[arXiv:2507.04214]

Can Large Language Models Automate the Refinement of Cellular Network Specifications?

Jianshuo Dong, Yuanjie Li, Jun Liu, Hewu Li, Han Qiu

arXiv / Code / Bib

TL;DR: Evaluate and improve the performance of LLMs in refining cellular network specifications concerning security issues.

@article{dong2025cellular,
  title={Can Large Language Models Automate the Refinement of Cellular Network Specifications?},
  author={Dong, Jianshuo and Li, Yuanjie and Liu, Jun and Li, Hewu and Qiu, Han},
  journal={arXiv preprint arXiv:2507.04214},
  year={2025},
  url={https://arxiv.org/abs/2507.04214}
}

Background

Education

2023 - 2028 (expected) Tsinghua University, Beijing, China — Ph.D. Student

2019 - 2023 Wuhan University, Wuhan, China — B.E.

Community

Academic Services

Teaching

2025 Spring Trustworthy Machine Learning, Tsinghua University — Teaching Assistant

Reviewing

[ICLR'26] Reviewer

[ICML'26] Reviewer (Gold Reviewer)

[NeurIPS'25] Reviewer (Top Reviewer)

[ICLR'25] Reviewer (Notable)