Abhinav Gupta

Heya! I am a final-year Ph.D. student in the Department of Computer Science and Operations Research at the University of Montreal and part of Mila - Quebec AI Institute, advised by Christopher Pal. My thesis investigates the phenomenon of emergent languages that arise in multi-agent communication systems.

I was fortunate to do research internships at Google DeepMind in London, advised by Grzegorz Swirszcz, Wojciech Czarnecki, and Oriol Vinyals, on opponent behavior modeling with offline AlphaStar; Meta AI in Seattle, advised by Madian Khabsa and Roberta Raileanu, on improving generalization in RL using uncertainty estimates; Microsoft Research NYC, advised by Jordan Ash, on replacing PPO with weighted SFT in RLHF; the Labs team at Google Research in Mountain View, advised by Navneet Potti, on enhancing code correctness in LLMs through RL with execution feedback; and Microsoft Research Cambridge, working with Raluca Georgescu, Sam Devlin, and Katja Hofmann on learning human-like behavior using offline RL and imitation learning.

I received my Master's degree from the Courant Institute of Mathematical Sciences at New York University. I was part of the CILVR Lab, where I worked with Kyunghyun Cho on analyzing compositionality in emergent languages and with Jason Weston on creating a unified model for QA/VQA tasks. I also interned at Adobe Research in San Jose, advised by Scott Cohen and Brian Price, on building an intelligent vision-language assistant. During my undergrad, I spent some time at the Language Technologies Institute at Carnegie Mellon University, advised by Florian Metze (also at Meta AI), on context-aware speech recognition. I also spent a semester at the School of Computing at the National University of Singapore, working with Khe Chai Sim (now at Google Research) on visualizing activations of live-recorded audio using Kaldi.


I'm interested in the intersection of reinforcement learning and language, focusing on post-training, evaluation, and planning. My current interests include augmenting LLMs with multi-turn, tool-use, and self-critique capabilities to improve their mathematical reasoning, where tools could help enhance their factuality and code could be used to provide solutions to such problems.


2018 - 2024

2016 - 2018

2011 - 2016



Fall 2023 - Spring 2024


Fall 2022


Fall 2021, Spring 2023, Summer 2023


Spring - Summer 2021


Fall 2019

Summer 2017

Spring - Summer 2016


Fall 2015


Dynamic population-based meta-learning for multi-agent communication with natural language
Abhinav Gupta, Marc Lanctot, Angeliki Lazaridou
Neural Information Processing Systems (NeurIPS), 2021
Learning to Learn Workshop (ICLR), 2021

Talk / Slides / Twitter / Workshop / Poster

By adopting an iterative mechanism of distillation and expansion, we can obtain a diverse population of agents capable of performing few-shot coordination with unseen partners.

On the interaction between supervision and self-play in emergent communication
Ryan Lowe*, Abhinav Gupta*, Jakob Foerster, Douwe Kiela, Joelle Pineau (*equal contribution)
International Conference on Learning Representations (ICLR), 2020
Beyond Vision and LANguage: inTEgrating Real-world kNowledge - LANTERN Workshop (EMNLP) 2019

Talk / Slides / Twitter / GitHub / Workshop

Training agents via supervised learning on human data followed by self-play outperforms the converse, suggesting that it is not beneficial to emerge languages from scratch.

Capacity, Bandwidth, and Compositionality in Emergent Language Learning
Cinjon Resnick*, Abhinav Gupta*, Jakob Foerster, Andrew M. Dai, Kyunghyun Cho (*equal contribution)
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2020
Representation Learning for NLP - Repl4NLP Workshop (ACL), 2020

Talk / Slides / Twitter / GitHub / Workshop

Investigate the relationship between model capacity and channel bandwidth that induces compositional structure in the resulting language and consequently encourages systematic generalization.

Learning to Learn to Communicate
Abhinav Gupta*, Ryan Lowe*, Jakob Foerster, Douwe Kiela, Joelle Pineau (*equal contribution)
Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2019
(Oral Presentation)
Adaptive and Multitask Learning - AMTL Workshop (ICML), 2019

Slides / Workshop / Poster

Train a meta-learning agent in simulation to interact with populations of pre-trained agents, and then deploy it in unseen human populations to learn human language.

Signalling signalhood in machine learning agents
Edward Hughes, Abhinav Gupta, Ekaterina (Kate) Tolstaya, Thom Scott-Phillips
Behavioral and Brain Sciences, Volume 46, 2023
Machine Learning and the Evolution of Language - ml4evolang Workshop (JCoLE), 2022
(Oral Presentation)

Poster / Journal

Identify a minimal set of socio-cognitive biases, derived from human behavior, that help AI agents learn about pragmatic reasoning and ostension, using an inverse model to interpret a partner's communicative intent in an online manner.

Structural Inductive Biases in Emergent Communication
Agnieszka Słowik*, Abhinav Gupta*, William L. Hamilton, Mateja Jamnik, Sean Holden, Christopher Pal (*equal contribution)
Annual Meeting of the Cognitive Science Society - Member Abstract (CogSci), 2022
Adaptive and Learning Agents - ALA Workshop (AAMAS), 2020 (Short Talk)
Reinforcement Learning in Games - RLG Workshop (AAAI), 2020

Talk / Poster / AAMAS Workshop / AAAI Workshop / RLG Poster

Agents parametrized by graph neural networks develop a more compositional language compared to bag-of-words and sequence models.

Learning Multi-Objective Curricula for Robotic Policy Learning
Jikun Kang, Miao Liu, Abhinav Gupta, Christopher Pal, Xue (Steve) Liu, Jie Fu
Conference on Robot Learning (CoRL), 2022

Talk / GitHub

Introduce a unified automatic curriculum learning framework that creates multi-objective but coherent curricula for improving sample efficiency, using a shared hyper-network parameterized with an RNN.

ArK: Augmented Reality with Knowledge Interactive Emergent Ability
Qiuyuan Huang*, Jae Sung Park*, Abhinav Gupta*, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Christopher Pal, Yejin Choi, Jianfeng Gao (*equal contribution)
arXiv preprint, 2023


Infusing explicit knowledge for factuality and commonsense reasoning using external knowledge bases (RAG) helps in obtaining enhanced prompts that improve cross-modality performance, e.g., for image generation tasks using DALL-E 2.

Visual features for context-aware speech recognition
Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
(Best Student Paper Nominee)


Adapting acoustic and language models to objects and scenes in context-aware Automatic Speech Recognition for video transcription.

Removing and replacing objects in images according to a directed user conversation
Scott Cohen, Brian Price, Abhinav Gupta
US patent, US10613726B2 (active)
China patent, CN109960453B (active)
UK patent, GB2569847B (active)
Australia patent, AU2018247342B2 (active)
Germany patent, DE102018007937A1 (pending)

Built Vera, a vision-enabled replacement assistant that directs a user conversation to obtain an edit query, then removes and replaces objects in an image based on that query.


Website template borrowed from Jon Barron.