Rohan Pandey

Experience

Reworkd (YC S23)

As Founding Research Engineer, I explore multimodal code-generation for extracting web data at scale.

Backed by Paul Graham, General Catalyst, SV Angel, Y Combinator, and founders of Reddit, Instacart, & Cruise.

Ousia

SEO content writers have to deeply research their topic to know what to write about. Ousia automates research.

As technical co-founder, I built NLP & LLM solutions to 10x our users' article writing ability. Exited via co-founder buyout.

Carnegie Mellon University — MultiComp

Vision-Language Models drastically fail to represent & align compositional structure (e.g. "mug in grass" vs "grass in mug").

In my Honors Thesis, we explore various vectorial approaches inspired by linguistic theory to address this problem, with papers at NeurIPS, ACL, EACL, and ICCV.

Microsoft AI

The AI Platform group at Microsoft builds infrastructure for enterprise-scale machine learning lifecycles on Azure.

I fine-tuned distilled LLMs to aid annotators in natural language data labeling, saving compute & improving speed.

Carnegie Mellon University — NeuLab

Are large language models just learning co-occurence statistics, or can they capture compositional relations as encoded by semantic formalisms?

We applied graph algorithms to Abstract Meaning Representation to create a task that probes compositional ability. I presented our work at the 2021 SCS Research Fair.

Vizerto

Vizerto is a digital sales assistant that makes domain-specific knowledge easily available to B2B sellers.

I advised their ML team on novel approaches to information retrieval, graphical knowledge representations, and more.

Language & Dialogue Systems Lab

Our conversational socialbot interacted with thousands of Amazon Alexa users every day, maintaining the top average user rating for 2 months straight against teams from Stanford, USC, and more.

My work on user modeling and entity graphs was included in our paper at EMNLP 2021.

SapientX

SapientX builds white label intelligent voice assistants for cars, phones, fridges, and stores.

I fine-tuned state-of-the-art models for extractive question answering to give Tele the ability to answer domain-specific user queries from large, unorganized document corpora.

Language, Logic, & Cognition Lab

Can deep reinforcement learning model how humans learn to parse syntax trees from experience?

We built a family of cognitively realistic parsing environments to explore how novel neural architectures & RL algorithms could inform psycholinguistic theory. Our work was accepted at NeurIPS 2021 Deep RL workshop.

Wordcab

Wordcab summarizes business meetings using the latest in abstractive neural summarization tech.

I worked with Aleks (CEO) to build topic-based summarization, a highly-demanded but technologically challenging feature.

Intheon

Intheon builds neural data processing infrastructure used by labs across the world to simplify their brainwave analysis pipelines.

I undertook NSF-funded research to investigate how language models could aid brain-computer interfaces in assisting users.

Applied Machine Learning Lab

The AMLL lab applies novel ML research to social good issues primarily in psychology and neuroscience.

Our work used hierarchical document representations to identify mental illness in social media discussions and quantify COVID's diachronic effects.

Bunch Inc

Bunch builds enterprise-grade video & computer vision software while exploring related high risk-reward projects.

I deployed tensorflow.js pose detection models client-side for a project virtualizing expensive gym equipment.

Publications

Gzip Predicts Data-sensitive Scaling Laws

In Prep for NeurIPS 2024

Rohan Pandey

Uncovering Cross-modal Syntax in Vision-Language Models with Causal Intervention

In Progress

Rohan Pandey, Aryaman Arora, Tristan Thrush, Christopher Potts

Multimodal Learning Without Multimodal Data: Guarantees and Applications

ICLR 2024

Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alexander Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov

Towards Vision-Language Mechanistic Interpretability: a Causal Tracing Tool for BLIP

ICCV 2023 Vision-Language Workshop

Vedant Palit*, Rohan Pandey*, Aryaman Arora, Paul Pu Liang

WinogroundVQA: Zero-shot Reasoning with LLMs for Compositional Visual Question Answering

In Academic Purgatory

Rohan Pandey, Spandan Das, Tristan Thrush, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment

ACL 2023

Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Syntax-guided Neural Module Distillation to Probe Compositionality in Sentence Embeddings

EACL 2023

Rohan Pandey

Does Structural Attention Improve Compositional Representations in Vision-Language Models?

NeurIPS 2022 Self-Supervised Learning Workshop

Rohan Pandey, Rulin Shao, Paul Pu Liang, Louis-Philippe Morency

Probing Compositional Representations in Neural Language Models with Semantic Graphs

Preprint, 2022

Rohan Pandey, Uri Alon, Frank Xu, Graham Neubig

A Family of Cognitively Realistic Parsing Environments for Deep Reinforcement Learning

Architectures and Mechanisms for Language Processing 2022;

NeurIPS 2021 Deep Reinforcement Learning Workshop

Adrian Brasoveanu, Rohan Pandey, Maximilian Alfano-Smith

Athena 2.0: Contextualized Dialogue Management for an Alexa Prize SocialBot

EMNLP 2021 System Demonstrations;

Proceedings of Amazon Alexa Prize Socialbot Grand Challenge 4

Juraj Juraska, Kevin K. Bowden, Lena Reed, Vrindavan Harrison, Wen Cui, Omkar Patil, Rishi Rajasekaran, Angela Ramirez, Cecilia Li, Eduardo Zamora, Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, Adwait Ratnaparkhi, Marilyn Walker

Transfer Learning for Mental Health Evaluation from Natural Language

Preprint, 2020

Kamil Kisielewicz, Rohan Pandey, Shivansh Rustagi, Narges Norouzi

Projects

LlamaGym

#1 HN, #2 r/LocalLlama, Github Trending, 700+ Stars

Fine-tune LLM agents with online reinforcement learning

Tarsier

In Production @ Reworkd
400+ Github Stars

Vision utilities for web interaction agents

Llama2D

Won 2nd @ AGI House SF Launch an LLM Hackathon

2D Positional Embeddings for Web Structure Understanding

WikiLLM

Helped out with my little sister's first LLM project!

LLMs as Collaboratively Edited Knowledge Bases

fbIRL

Won 1st @ Facebook SF Dev Hackathon 2019

Tomorrow's AR social network (Pre-Meta)

Celery

Won 2nd & FinTech @
UCLA Hacks 2019

Big data forecasting for sustainable businesses

veda.dev

Deployed with active users

Morphology visualizer for Sanskrit literature research & education

sWEep

Won 1st @ SRC Code 2018

Cleaning neighborhoods with computer vision

Latent Space

Won 3rd @ HackMIT 2020

Domain-specific neural audio compression for virtual bands

We & You

Won Google Cloud @ BASEHacks 2018

Peer-to-peer mental health services for teens

Phil

Won Amazon & Blockchain @ CruzHacks 2019

Facilitating blockchain donations with Alexa skill art

Boolepathy

Won 1st in US @
NeuroTechX 2020

Non-invasive synthetic telepathy

Rohan Pandey

AI Researcher &
10x Hackathon Winner

Reworkd (YC S23)

Ousia

Carnegie Mellon University — MultiComp

Microsoft AI

Carnegie Mellon University — NeuLab

Vizerto

Language & Dialogue Systems Lab

SapientX

Language, Logic, & Cognition Lab

Wordcab

Intheon

Applied Machine Learning Lab

Bunch Inc

As of early 2021, I was interested in questions like...

LlamaGym

Tarsier

Llama2D

WikiLLM

fbIRL

Celery

veda.dev

sWEep

Latent Space

We & You

Phil

Boolepathy

Rohan Pandey

AI Researcher & 10x Hackathon Winner

Reworkd (YC S23)

Ousia

Carnegie Mellon University — MultiComp

Microsoft AI

Carnegie Mellon University — NeuLab

Vizerto

Language & Dialogue Systems Lab

SapientX

Language, Logic, & Cognition Lab

Wordcab

Intheon

Applied Machine Learning Lab

Bunch Inc

As of early 2021, I was interested in questions like...

LlamaGym

Tarsier

Llama2D

WikiLLM

fbIRL

Celery

veda.dev

sWEep

Latent Space

We & You

Phil

Boolepathy

AI Researcher &
10x Hackathon Winner