Research/Projects
Evaluation & Benchmarking
HumorDB: A Benchmark for Multimodal Reasoning
Created a dataset and benchmark of 3,500+ examples to evaluate complex reasoning in multimodal models; featured at ICCV 2025.
Formal Methods for Trustworthy AI
Certifying Knowledge Comprehension in LLMs
Co-developed a probabilistic framework to formally certify LLM knowledge comprehension using Knowledge Graphs, presented at ICLR 2024.
Other Software & Research Engineering Projects
- ML Kernels: Optimized matrix multiplication and convolution kernels in CUDA and NKI.
- Applied ML: Developed diffusion models for Parkinson’s disease progression, layer-targeted steering defenses for LLMs against priming/prefilling attacks, and curriculum learning methods for embodied agents.
- Full-Stack Application: Engineered a thread-based chat interface for LLMs using Tauri (Rust), React, and SQLite.