Research/Projects

Evaluation & Benchmarking

HumorDB: A Benchmark for Multimodal Reasoning

Created a new dataset and benchmark of more than 3,500 examples for evaluating complex reasoning in multimodal models; featured at ICCV 2025.


Formal Methods for Trustworthy AI

Certifying Knowledge Comprehension in LLMs

Co-developed a probabilistic framework that uses knowledge graphs to formally certify LLM knowledge comprehension; presented at ICLR 2024.


Other Software & Research Engineering Projects

  • ML Kernels: Optimized Matrix Multiplication/Convolution kernels in CUDA/NKI.
  • Applied ML: Developed novel methods in three areas: diffusion models for Parkinson’s disease progression, layer-targeted steering as an adversarial defense for LLMs against priming/prefilling attacks, and curriculum learning for embodied agents.
  • Full-Stack Application: Engineered a thread-based chat interface for LLMs using Tauri (Rust), React, and SQLite.