Biography
Hi! I'm Jianshuo Dong, a PhD student at Tsinghua University, advised by Prof. Han Qiu.
My research interests lie in:
- Safety & Security of Large Model Systems: Delivering trustworthy large model services to downstream users.
- Autonomous Agentic AI: Developing strong, safe, and efficient autonomous agents.
- Explainable AI: Reverse-engineering and closely monitoring AI models.
Email: dongjs23(at)mails(dot)tsinghua(dot)edu(dot)cn
Updates
News
2026-05-01
One paper (SafeSearch) was accepted to ICML 2026 as a regular paper!
2026-04-04
One paper (IFEval++) got accepted to ACL 2026 main (oral)!
2025-10-17
I became a PhD candidate at Tsinghua University!
2025-08-23
One paper (Leakage-intent probing) got accepted to EMNLP 2025 main (oral)!
2025-08-10
I created this website for my research!
Research
Publications
Conference Papers
[ICML'26]
Regular
SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
TL;DR: Proactively discovering potential vulnerabilities of LLM-based search agents.
@article{dong2025safesearch,
title={SafeSearch: Automated Red-Teaming of LLM-Based Search Agents},
author={Dong, Jianshuo and Guo, Sheng and Wang, Hao and Chen, Xun and Liu, Zhuotao and Zhang, Tianwei and Xu, Ke and Huang, Minlie and Qiu, Han},
journal={arXiv preprint arXiv:2509.23694},
year={2025},
url={https://arxiv.org/abs/2509.23694}
}
[ACL'26]
Oral
Revisiting the Reliability of Language Models in Instruction-Following
TL;DR: Investigating nuance-oriented reliability of LLMs in instruction-following with varied phrasings.
@inproceedings{dong2026revisiting,
title={Revisiting the Reliability of Language Models in Instruction-Following},
author={Dong, Jianshuo and Zhang, Yutong and Liu, Yan and Zhong, Zhenyu and Wei, Tao and Zhang, Chao and Qiu, Han},
booktitle={Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
year={2026},
url={https://arxiv.org/abs/2512.14754}
}
[EMNLP'25]
Oral
"I've Decided to Leak": Probing Internals Behind Prompt Leakage Intents
TL;DR: Diving into the internals to understand LLMs' prompt leakage intents.
@inproceedings{dong-etal-2025-ive,
title = "``{I}{'}ve Decided to Leak'': Probing Internals Behind Prompt Leakage Intents",
author = "Dong, Jianshuo and Zhang, Yutong and Liu, Yan and Zhong, Zhenyu and Wei, Tao and Xu, Ke and Huang, Minlie and Zhang, Chao and Qiu, Han",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.1082/",
doi = "10.18653/v1/2025.emnlp-main.1082",
pages = "21318--21348"
}
[ICLR'25]
Poster
An Engorgio Prompt Makes Large Language Model Babble on
TL;DR: An inference cost attack targeting modern auto-regressive LLMs.
@inproceedings{dong2025engorgio,
title={An Engorgio Prompt Makes Large Language Model Babble on},
author={Jianshuo Dong and Ziyuan Zhang and Qingjie Zhang and Tianwei Zhang and Hao Wang and Hewu Li and Qi Li and Chao Zhang and Ke Xu and Han Qiu},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=m4eXBo0VNc}
}
[ICCV'23]
Poster
One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training
TL;DR: Injecting a dormant backdoor during model training and activating it later with a bit-flip attack.
@inproceedings{dong2023onebit,
title={One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training},
author={Dong, Jianshuo and Qiu, Han and Li, Yiming and Zhang, Tianwei and Li, Yuanjie and Lai, Zeqi and Zhang, Chao and Xia, Shu-Tao},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2023}
}
In Progress
Pre-prints
[arXiv:2506.21571]
Towards Understanding the Cognitive Habits of Large Reasoning Models
TL;DR: Do reasoning models have human-like cognitive habits?
@article{dong2025cognitive,
title={Towards Understanding the Cognitive Habits of Large Reasoning Models},
author={Dong, Jianshuo and Fu, Yujia and Hu, Chuanrui and Zhang, Chao and Qiu, Han},
journal={arXiv preprint arXiv:2506.21571},
year={2025},
url={https://arxiv.org/abs/2506.21571}
}
[arXiv:2507.04214]
Can Large Language Models Automate the Refinement of Cellular Network Specifications?
TL;DR: Evaluating and improving LLMs' ability to refine cellular network specifications with respect to security issues.
@article{dong2025cellular,
title={Can Large Language Models Automate the Refinement of Cellular Network Specifications?},
author={Dong, Jianshuo and Li, Yuanjie and Liu, Jun and Li, Hewu and Qiu, Han},
journal={arXiv preprint arXiv:2507.04214},
year={2025},
url={https://arxiv.org/abs/2507.04214}
}
Background
Education
2023 - 2028 (expected)
Tsinghua University, Beijing, China — Ph.D. Student
2019 - 2023
Wuhan University, Wuhan, China — B.E.
Community
Academic Services
Teaching
2025 Spring
Trustworthy Machine Learning, Tsinghua University — Teaching Assistant
Reviewing
[ICLR'26]
Reviewer
[ICML'26]
Reviewer Gold Reviewer
[NeurIPS'25]
Reviewer Top Reviewer
[ICLR'25]
Reviewer Notable