My research focuses on visual grounding (symbol grounding) and context modeling (communicative grounding) in multi-modal visual conversational agents. I am interested in building machines that can see and talk. My research interests broadly span multi-modal representation and transfer learning for vision-and-language problems. Previously, I also explored Natural Language Generation for data-to-text tasks and knowledge-grounded multi-modal dialog response generation. I was an integral part of Team Alana, a finalist in Amazon's Alexa Prize Socialbot Grand Challenge 2018. My submissions were also runners-up in the Visual Dialog Challenge 2018 (Pikabot) and the E2E NLG Challenge 2017 (NLE). I am an active blogger and often contribute to open-source repositories for the democratization of AI. I also co-organized the Workshop on Evaluating NLG Evaluation (EvalNLGEval at INLG'20) and the Workshop on Human Evaluation of NLP Systems (HumEval at EACL'21); served on the Program Committee for NeurIPS and ACL workshops; reviewed for NLP/AI conferences (NeurIPS, ICLR, ACL, COLING); and published at top NLP venues such as ACL, EACL, EMNLP, INLG, and SIGDial.