Publications

You can find an updated list of articles on my Google Scholar profile.

Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting

Published in Transactions on Machine Learning Research, 2025

Recommended citation: Bhambri, Siddhant, Mudit Verma, and Subbarao Kambhampati. "Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting." Transactions on Machine Learning Research (2025). https://openreview.net/pdf/d49877b783d60e3cf6fdb22185a1a37de0ced704.pdf

Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation

Published in arXiv preprint arXiv:2505.13792, 2025

Recommended citation: Bhambri, Siddhant, Upasana Biswas, and Subbarao Kambhampati. "Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation." arXiv preprint arXiv:2505.13792 (2025). https://arxiv.org/pdf/2505.13792

Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!

Published in arXiv preprint arXiv:2504.09762, 2025

Recommended citation: Kambhampati, Subbarao, et al. "Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!" arXiv preprint arXiv:2504.09762 (2025). https://arxiv.org/pdf/2504.09762

Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks

Published in Forty-first International Conference on Machine Learning (ICML), 2024

Recommended citation: Kambhampati, Subbarao, et al. "Position: LLMs can’t plan, but can help planning in LLM-modulo frameworks." Forty-first International Conference on Machine Learning. 2024. https://openreview.net/pdf?id=Th8JPEmH4z

Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

Published in arXiv preprint arXiv:2405.20625, 2024

Recommended citation: Gundawar, Atharva, et al. "Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning." arXiv preprint arXiv:2405.20625 (2024). https://arxiv.org/pdf/2405.20625

Efficient Reinforcement Learning via Large Language Model-based Search

Published in arXiv preprint arXiv:2405.15194, 2024

Recommended citation: Bhambri, Siddhant, et al. "Efficient Reinforcement Learning via Large Language Model-based Search." arXiv preprint arXiv:2405.15194 (2024). https://arxiv.org/abs/2405.15194

Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming

Published in arXiv preprint arXiv:2502.06976, 2025

Recommended citation: Biswas, Upasana, Siddhant Bhambri, and Subbarao Kambhampati. "Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming." arXiv preprint arXiv:2502.06976 (2025). https://arxiv.org/pdf/2502.06976

Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?

Published in Human-Robot Interaction (HRI), 2024

Recommended citation: Verma, Mudit, Siddhant Bhambri, and Subbarao Kambhampati. "Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?" arXiv preprint arXiv:2401.05302 (2024). https://arxiv.org/pdf/2401.05302

Benchmarking Multi-Agent Preference-based Reinforcement Learning for Human-AI Teaming

Published in arXiv preprint arXiv:2312.14292, 2023

Recommended citation: Bhambri, Siddhant, Mudit Verma, Anil Murthy, and Subbarao Kambhampati. "Benchmarking Multi-Agent Preference-based Reinforcement Learning for Human-AI Teaming." arXiv preprint arXiv:2312.14292 (2023). https://arxiv.org/pdf/2312.14292

Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks

Published in the ICML 2023 Workshop on Theory of Mind in Communicating Agents and the ICML 2023 Workshop on The Many Facets of Preference-Based Learning, 2023

Recommended citation: Verma, Mudit, Siddhant Bhambri, and Subbarao Kambhampati. "Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks." In ICML 2023 Workshop on The Many Facets of Preference-Based Learning. 2023. https://sbhambr1.github.io/files/Preference%20Proxies:%20Evaluating%20Large%20Language%20Models%20in%20capturing%20Human%20Preferences%20in%20Human-AI%20Tasks.pdf

Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach

Published in IEEE Conference on Games (CoG), 2023

Recommended citation: Bhambri, Siddhant, Amrita Bhattacharjee, and Dimitri Bertsekas. "Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach." arXiv preprint arXiv:2211.10298 (2022). https://arxiv.org/abs/2211.10298

Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning

Published in the AAAI Workshop on Representation Learning for Responsible Human-Centric AI (R2HCAI) and the ICML Workshop on The Many Facets of Preference-Based Learning, 2023

Recommended citation: Verma, Mudit, Siddhant Bhambri, and Subbarao Kambhampati. "Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning." arXiv preprint arXiv:2302.08738 (2023). https://arxiv.org/abs/2302.08738

Using Deception in Markov Game to Understand Adversarial Behaviors through a Capture-The-Flag Environment

Published in Decision and Game Theory for Security: 13th International Conference, GameSec, 2022

Recommended citation: Bhambri, Siddhant, Purv Chauhan, Frederico Araujo, Adam Doupé, and Subbarao Kambhampati. "Using Deception in Markov Game to Understand Adversarial Behaviors Through a Capture-The-Flag Environment." In International Conference on Decision and Game Theory for Security, pp. 87-106. Cham: Springer International Publishing, 2022. https://arxiv.org/pdf/2210.15011

Contrastively Learning Visual Attention as Affordance Cues from Demonstrations for Robotic Grasping

Published in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

Recommended citation: Y. Zha, S. Bhambri and L. Guan, "Contrastively Learning Visual Attention as Affordance Cues from Demonstrations for Robotic Grasping," 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 7835-7842, doi: 10.1109/IROS51168.2021.9636760. https://ieeexplore.ieee.org/document/9636760

Multi-objective Reinforcement Learning based approach for User-Centric Power Optimization in Smart Home Environments

Published in IEEE International Conference on Smart Data Services (SMDS), 2020

Recommended citation: S. Gupta, S. Bhambri, K. Dhingra, A. B. Buduru and P. Kumaraguru, "Multi-objective Reinforcement Learning based approach for User-Centric Power Optimization in Smart Home Environments," 2020 IEEE International Conference on Smart Data Services (SMDS), 2020, pp. 89-96, doi: 10.1109/SMDS49396.2020.00018. https://ieeexplore.ieee.org/document/9288505

A Survey of Black-Box Adversarial Attacks on Computer Vision Models

Published in arXiv preprint arXiv:1912.01667, 2020

Recommended citation: Bhambri, S., Muku, S., Tulasi, A., & Buduru, A. B. (2019). A survey of black-box adversarial attacks on computer vision models. arXiv preprint arXiv:1912.01667. https://arxiv.org/abs/1912.01667

Multiple Resource Management and Burst Time Prediction using Deep Reinforcement Learning

Published in Eighth International Conference on Advances in Computing, Communication and Information Technology (CCIT), 2019

Recommended citation: Kumar V, Bhambri S, Shambharkar PG. Multiple resource management and burst time prediction using deep reinforcement learning. In: Eighth International Conference on advances in computing, communication and information technology CCIT, 2019, pp. 51–58. https://www.seekdl.org/conferences/paper/details/10091.html