I am a PhD student at MIT, on leave until Fall 2021. I am an avid proponent of reform in machine learning, which allows me to spend time on teaching, mentoring, and alternative proposals for research distribution. I am lucky to be a GAAP mentor and a Machine Learning mentor, both of which are initiatives trying to level the playing field when it comes to machine learning academia.
I am an EECS PhD student at MIT, where I am on leave until in-person instruction resumes. I was lucky to have worked with Max Tegmark, and interned at NVIDIA Research in Seattle from March through October 2020. I did my Masters at Mila under the supervision of Liam Paull and Christopher Pal, with generous support from Mila, Université de Montréal, Duckietown, IVADO, Jane Street + Depth First Learning, and Vraj Youth.
I completed my undergraduate degree at the University of Michigan where I studied Computer Science and Applied Math. At Michigan, my research projects under Satinder Singh and Matthew Johnson-Roberson focused on hierarchical RL as well as robotic reinforcement learning. I spent a semester at the Jet Propulsion Laboratory, working
full-time under Larry H. Matthies in the Computer Vision Group, and spent my last semester helping teach the Introduction to Machine Learning course at Michigan with Sindhu Kutty. I now help teach Duckietown at UdeM, and taught a custom, online class on Stein's Method in Machine Learning as an inaugural Depth First Learning Fellow.
My research interests are concentrated in multi-task learning. The field of multi-task learning has a lot of interesting avenues, such as robotic RL, curriculum learning, optimization of neural networks, information theory, and even differential geometry (all of which I am excitedly working on with collaborators).
In the past, I cofounded an edtech startup named Project Chronicle. We were funded by
the University of Michigan CSE Department and the 1517 Fund. I am now a cofounder of q&ai, a Creative Destruction Lab-backed startup that uses deep learning and recommender systems to tackle some of the same problems.
October 2020: Our paper, A User's Guide to Calibrating Robotics Simulators, was accepted to CoRL 2020!
Research Projects
Symbolic Regression for Interpretable Offline Reinforcement Learning Bhairav Mehta, Max Tegmark [Paper]
While recent years have seen much progress in the reinforcement learning community, many tasks in use today still rely on carefully designed reward functions, many of which are products of constant tweaking and tuning by engineers and scientists. These reward functions, often dense, symbolic functions of state, don't exist in real-world datasets, many of which are labeled by human experimenters, each with their own biases about desired behavior. In this work, we describe a new paradigm for extracting symbolic reward functions from noisy data, called Interpretable Symbolic Reinforcement Learning (ISRL). ISRL allows human experimenters to extract interpretable reward functions solely from data via symbolic regression.
We explore current methods in the space of machine learning system identification, and push these algorithms to their limits along a variety of axes. We explore failure modes of each algorithm, and present a "user's guide" on when and where to use each. To present our results cleanly, we introduce the Simulation Parameter Estimation (SIPE) benchmark, which provides tools to efficiently test and compare past, current, and future algorithms in this space.
We show that bisimulation relations and metrics can be induced by graph neural networks, showing an equivalence between the original formulation of bisimulation on MDPs and the L2 distance induced by a particular type of GNN embedding.
We explore the effects of pretraining on the plasticity of neural networks. We find that different trajectories induce invariances that can be helpful, or harmful, to plasticity of neural networks in multi-task learning scenarios.
Can you learn domain randomization curricula with no rewards? We show that agents trained via self-play in the ADR framework outperform uniform domain randomization by orders of magnitude in both simulated and real-world transfer.
In a follow-up to ADR, we show that adaptive simulators can be learned in the maximum-entropy RL framework, allowing ADR's learned "randomization-distributions" to serve as a strong, meaningful prior in a domain randomization setting.
Can Meta-RL use curriculum learning? In this work, we explore that question and find that curriculum learning stabilizes Meta-RL in complex navigation and locomotion tasks. We also highlight issues with Meta-RL benchmarks by showing failure cases when we vary task distributions.
We tackle the uniform sampling assumption in domain randomization and learn a randomization strategy, looking for the most informative environments. Our method shows significant improvements in agent performance, agent generalization, sample complexity, and interpretability over the traditional domain and dynamics randomization strategies.
In ongoing work, we're exploring why this works in practice. Using tools from optimization and dynamical systems, we're trying to decouple the time dependence between curriculum and neural network optimization.
We present an information retrieval approach to education and provide an end-to-end framework to go from raw text to a system where a student can learn about different topics such as History and Psychology, all while getting immediate feedback and recommendations on what to study from our system.
Past Work and Startups
Home Support Robotic Learning Research under Satinder Singh
Aiming to train robots in simulation, I helped to develop agent environment code and DRL algorithms for a home support robot.
ICLR 2018 Reproducibility Challenge Class Project for EECS 498: Reinforcement Learning
Aiming to reproduce the results in "Parameter Space Noise for Exploration," my team
and I entered the ICLR 2018 Reproducibility Challenge.
Check out our final draft of the results here.
I helped develop the main pipeline that transformed monocular camera images from
the MAV's camera into an elevation height map. To then evaluate
the usefulness and safety of landing sites, we used metrics
like elevation, flatness and clutter to rank landing site
candidates, and then used the quadcopter's motor controllers
to autonomously land the vehicle.
On my main project team at Michigan, I developed software features
such as localization, teleoperation, and odometry using C++
and ROS for an autonomous mining rover, which we utilized
during the Robotic Mining Competition.
Project Chronicle
There is no opportunity as big as education. It is an opportunity
to make life-long learners; to excite students about the
world; and to create explorers, scientists, entrepreneurs,
entertainers, and engineers. But most students dread school.
Many don't find relevance in their classes and many find
their knowledge useless. Our mission at Project Chronicle
is to empower those students with the ownership of stories
and enhance their learning through the power of speech.
Project Chronicle has students record their telling of a prompted
topic. Our platform analyzes the student's response and gives
immediate feedback on both the accuracy and delivery of the
content. The application both challenges the students to
understand the material at a much deeper level than required
by typical homework and leaves them with stories they can
easily remember and share. It gives them confidence. It gives
them comprehension. It gives them a voice.
Teaching
EECS 445: Introduction to Machine Learning Winter 2018