Awarded or Cited by
Google · IBM · Facebook · DocuSign · Amazon · UCSC · UCSD · Berkeley · UCLA · UCSB · NSF · NeuroTechX · MIT · OpenBCI · Capital One · AngelHack · Agora.io · Hedera Hashgraph · Radar.io · MLH
Experience
OpenAI

As a Research Resident on the Training team, I develop architectures that get more intelligent with compute.

Our deep learning research is the foundation for the intelligence of models like GPT-4o and o1.

Reworkd (YC S23)

As Founding Research Engineer, I explored multimodal code generation for extracting web data at scale.

We raised $4M from Paul Graham, General Catalyst, AI Grant (Nat Friedman & Daniel Gross), SV Angel, Y Combinator, and founders of Reddit, Instacart, & Cruise.

Ousia

SEO content writers have to deeply research their topic to know what to write about. Ousia automates that research.

As technical co-founder, I built NLP & LLM solutions to 10x our users' article writing ability. Exited via co-founder buyout.

Carnegie Mellon University — MultiComp

Vision-Language Models drastically fail to represent & align compositional structure (e.g. "mug in grass" vs "grass in mug").

In my Honors Thesis, we explore various vectorial approaches inspired by linguistic theory to address this problem, with papers at NeurIPS, ACL, EACL, and ICCV.
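
The failure is easy to see concretely. Here is a minimal Winoground-style probe with an off-the-shelf CLIP checkpoint (the image path is hypothetical):

```python
# A minimal sketch of the compositionality failure, using Hugging Face's CLIP.
# "mug_in_grass.jpg" is a hypothetical local photo of a mug sitting in grass.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("mug_in_grass.jpg")
captions = ["a mug in grass", "grass in a mug"]  # same bag of words, opposite structure

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image[0]  # image-text match scores, shape (2,)

# A compositionally-aligned model should strongly prefer the correct caption;
# CLIP-style models often score the two captions nearly identically.
for caption, score in zip(captions, scores):
    print(f"{caption}: {score.item():.2f}")
```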

Microsoft AI

The AI Platform group at Microsoft builds infrastructure for enterprise-scale machine learning lifecycles on Azure.

I fine-tuned distilled LLMs to aid annotators in natural language data labeling, saving compute & improving speed.

Carnegie Mellon University — NeuLab

Are large language models just learning co-occurrence statistics, or can they capture compositional relations as encoded by semantic formalisms?

We applied graph algorithms to Abstract Meaning Representation to create a task that probes compositional ability. I presented our work at the 2021 SCS Research Fair.
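
As a sketch of the general recipe (not our exact task format), probe triples can be derived from an AMR graph with the `penman` library:

```python
# A minimal sketch of deriving compositional probes from an AMR graph,
# using the `penman` library; the (head, relation, dependent) format is illustrative.
import penman

# "The boy wants to go" as an Abstract Meaning Representation graph.
amr = penman.decode("(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))")

concepts = {var: concept for var, _, concept in amr.instances()}

# Each labeled edge yields a probe: given the sentence, does a language
# model's representation recover who wants, and who goes?
probes = [(concepts[src], role, concepts[tgt]) for src, role, tgt in amr.edges()]
print(probes)
# [('want-01', ':ARG0', 'boy'), ('want-01', ':ARG1', 'go-02'), ('go-02', ':ARG0', 'boy')]
```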

Vizerto

Vizerto is a digital sales assistant that makes domain-specific knowledge easily available to B2B sellers.

I advised their ML team on novel approaches to information retrieval, graphical knowledge representations, and more.

Language & Dialogue Systems Lab

Our conversational socialbot interacted with thousands of Amazon Alexa users every day, maintaining the top average user rating for 2 months straight against teams from Stanford, USC, and more.

My work on user modeling and entity graphs was included in our paper at EMNLP 2021.

SapientX

SapientX builds white-label intelligent voice assistants for cars, phones, fridges, and stores.

I fine-tuned state-of-the-art models for extractive question answering to give Tele the ability to answer domain-specific user queries from large, unorganized document corpora.
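
In spirit, the pipeline looks like this sketch (the model checkpoint and document text are stand-ins, not SapientX's actual stack):

```python
# A sketch of extractive QA: a fine-tuned reader pulls answer spans straight
# out of document text. Model choice and document content are illustrative.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = (
    "To pair the assistant with your phone, hold the voice button for three "
    "seconds, then say 'pair my phone' and confirm the code on screen."
)
result = qa(question="How do I pair my phone?", context=context)
print(result["answer"], f"(score: {result['score']:.2f})")
```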

Language, Logic, & Cognition Lab

Can deep reinforcement learning model how humans learn to parse syntax trees from experience?

We built a family of cognitively realistic parsing environments to explore how novel neural architectures & RL algorithms could inform psycholinguistic theory. Our work was accepted at NeurIPS 2021 Deep RL workshop.
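
A toy sketch of such an environment under the Gymnasium API (the real environments model far richer parser state, and the reward here is illustrative):

```python
# A toy shift-reduce parsing environment in the spirit described above.
import gymnasium as gym
from gymnasium import spaces

class ShiftReduceEnv(gym.Env):
    SHIFT, REDUCE = 0, 1

    def __init__(self, sentence_len=8):
        self.sentence_len = sentence_len
        self.action_space = spaces.Discrete(2)
        # Observation: (buffer size, stack size), a minimal incremental-parser state.
        self.observation_space = spaces.MultiDiscrete([sentence_len + 1] * 2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.buffer, self.stack = self.sentence_len, 0
        return (self.buffer, self.stack), {}

    def step(self, action):
        if action == self.SHIFT and self.buffer > 0:
            self.buffer -= 1
            self.stack += 1
            reward = 0.0
        elif action == self.REDUCE and self.stack >= 2:
            self.stack -= 1  # combine the top two stack items into one subtree
            reward = 1.0     # illustrative: reward building structure
        else:
            reward = -1.0    # illegal transition
        done = self.buffer == 0 and self.stack == 1  # one complete parse tree
        return (self.buffer, self.stack), reward, done, False, {}
```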

Wordcab

Wordcab summarizes business meetings using the latest in abstractive neural summarization tech.

I worked with Aleks (CEO) to build topic-based summarization, a highly demanded but technologically challenging feature.

Intheon

Intheon builds neural data processing infrastructure used by labs across the world to simplify their brainwave analysis pipelines.

I undertook NSF-funded research to investigate how language models could aid brain-computer interfaces in assisting users.

Applied Machine Learning Lab

AMLL applies novel ML research to social-good problems, primarily in psychology and neuroscience.

Our work used hierarchical document representations to identify mental illness in social media discussions and quantify COVID's diachronic effects.

Bunch Inc

Bunch builds enterprise-grade video & computer vision software while exploring related high-risk, high-reward projects.

I deployed TensorFlow.js pose-detection models client-side for a project virtualizing expensive gym equipment.

Publications
gzip Predicts Data-dependent Scaling Laws
arXiv, 2024
Rohan Pandey

Uncovering Cross-modal Syntax in Vision-Language Models with Causal Intervention
In Progress
Rohan Pandey, Aryaman Arora, Tristan Thrush, Christopher Potts

Multimodal Learning Without Multimodal Data: Guarantees and Applications
Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alexander Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
ICCV Workshops, 2023
Vedant Palit*, Rohan Pandey*, Aryaman Arora, Paul Pu Liang

WinogroundVQA: Zero-shot Reasoning with LLMs for Compositional Visual Question Answering
In Academic Purgatory
Rohan Pandey, Spandan Das, Tristan Thrush, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment
ACL, 2023
Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Syntax-guided Neural Module Distillation to Probe Compositionality in Sentence Embeddings
EACL, 2023
Rohan Pandey

Does Structural Attention Improve Compositional Representations in Vision-Language Models?
Rohan Pandey, Rulin Shao, Paul Pu Liang, Louis-Philippe Morency

Probing Compositional Representations in Neural Language Models with Semantic Graphs
Preprint, 2022
Rohan Pandey, Uri Alon, Frank Xu, Graham Neubig

A Family of Cognitively Realistic Parsing Environments for Deep Reinforcement Learning
NeurIPS Deep RL Workshop, 2021
Adrian Brasoveanu, Rohan Pandey, Maximilian Alfano-Smith

Athena 2.0: Contextualized Dialogue Management for an Alexa Prize SocialBot
EMNLP, 2021
Juraj Juraska, Kevin K. Bowden, Lena Reed, Vrindavan Harrison, Wen Cui, Omkar Patil, Rishi Rajasekaran, Angela Ramirez, Cecilia Li, Eduardo Zamora, Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, Adwait Ratnaparkhi, Marilyn Walker

Transfer Learning for Mental Health Evaluation from Natural Language
Preprint, 2020
Kamil Kisielewicz, Rohan Pandey, Shivansh Rustagi, Narges Norouzi
Research

As of early 2021, I was interested in questions like...

How do humans perform semantic composition and how can we build systems that analyze language compositionally? Transformers have outpaced virtually all other architectures in NLP—is something about the self-attention mechanism inherently effective at expressing semantic composition?
How do humans ground language in their environment and how can we build systems that understand language in relation to the real world? The dominant approach of learning from large text corpora has gone a long way, but it falls into a trap that can only be avoided by grounding language. How do perception & action modalities influence semantic representations?
What is the underlying relationship between symbolic and statistical approaches? Why do some parts of nature seem so perfectly described by symbolic relations while others don't? Is reality fundamentally symbolic or are symbols a formalism that humans apply to our environment?
And a few miscellaneous ones: What is it about human brains in particular, genetically, structurally, and culturally, that makes them so good at manipulating symbols? How does the brain represent non-linguistic thoughts, and is all perception symbolic at some level? How can classical theories from linguistics and philosophy of language aid modern NLP research? Is internality an inherent property of matter?
Projects

LlamaGym

#1 on HN, #2 on r/LocalLLaMA, GitHub Trending, 900+ stars


Fine-tune LLM agents with online reinforcement learning
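
The core loop, paraphrased from the README (method names approximate; "gpt2" is a stand-in base model to keep the sketch small):

```python
# Condensed sketch of the LlamaGym loop: an LLM acts in a Gym environment
# and is fine-tuned online with PPO on the rewards it earns.
import gymnasium as gym
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead
from llamagym import Agent

class BlackjackAgent(Agent):
    def get_system_prompt(self) -> str:
        return "You are playing blackjack. Reply with 'hit' or 'stay'."

    def format_observation(self, observation) -> str:
        return f"Your hand totals {observation[0]}; the dealer shows {observation[1]}."

    def extract_action(self, response: str) -> int:
        return 1 if "hit" in response.lower() else 0

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
agent = BlackjackAgent(model, tokenizer, "cpu")

env = gym.make("Blackjack-v1")
for episode in range(1000):
    observation, info = env.reset()
    done = False
    while not done:
        action = agent.act(observation)  # LLM generates text; an action is parsed out
        observation, reward, terminated, truncated, info = env.step(action)
        agent.assign_reward(reward)      # credit the generation that chose this action
        done = terminated or truncated
    agent.terminate_episode()            # run a PPO update over the finished episode
```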

Tarsier

In Production @ Reworkd
1.4k+ GitHub stars


Vision utilities for web interaction agents

Llama2D

Won 2nd @ AGI House SF "Launch an LLM" Hackathon


2D Positional Embeddings for Web Structure Understanding
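
The idea in one sketch (illustrative, not the repo's actual implementation): map each on-screen element's (x, y) page coordinates to a sinusoidal embedding.

```python
# An illustrative 2D positional embedding: concatenate standard 1D sinusoidal
# embeddings of an element's x and y screen coordinates.
import math
import torch

def pos_embed_2d(x: float, y: float, dim: int) -> torch.Tensor:
    """Embed an (x, y) position; dim must be divisible by 4."""
    def sincos(pos: float, d: int) -> torch.Tensor:
        freqs = torch.exp(-torch.arange(0, d, 2).float() * (math.log(10000.0) / d))
        angles = pos * freqs
        return torch.cat([angles.sin(), angles.cos()])
    return torch.cat([sincos(x, dim // 2), sincos(y, dim // 2)])

# Two buttons at different screen positions get distinct, distance-aware embeddings.
print(pos_embed_2d(120, 40, 64).shape)  # torch.Size([64])
```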

WikiLLM

Helped out with my little sister's first LLM project!


LLMs as Collaboratively Edited Knowledge Bases

fbIRL

Won 1st @ Facebook SF Dev Hackathon 2019


Tomorrow's AR social network (Pre-Meta)

Celery

Won 2nd & FinTech @ UCLA Hacks 2019


Big data forecasting for sustainable businesses

veda.dev

Deployed with active users


Morphology visualizer for Sanskrit literature research & education

sWEep

Won 1st @ SRC Code 2018


Cleaning neighborhoods with computer vision

Latent Space

Won 3rd @ HackMIT 2020


Domain-specific neural audio compression for virtual bands

We & You

Won Google Cloud @ BASEHacks 2018


Peer-to-peer mental health services for teens

Phil

Won Amazon & Blockchain @ CruzHacks 2019


Facilitating blockchain donations with an Alexa skill

Boolepathy

Won 1st in US @ NeuroTechX 2020


Non-invasive synthetic telepathy

Fun Facts