An Vo
MS student @ Korea Advanced Institute of Science & Technology (KAIST)

I am broadly interested in Large Language Models (LLMs) and Vision Language Models (VLMs), especially in making them more trustworthy and explainable in edge and hard cases. My work has been accepted at top venues, including ICML, AAAI, and GECCO.
I am also the lead author of VLMs are Biased, a paper featured on the front page of Hacker News (top 5), on LinkedIn, in Gary Marcus's article "GPT-5: Overdue, overhyped and underwhelming", and in a tweet by Lucas Beyer. In this work, we introduce VLMBias, a benchmark for evaluating visual counting in VLMs. We show that state-of-the-art models (e.g., o3, o4-mini, Gemini 2.5 Pro, Claude 3.7 Sonnet) achieve 100% accuracy when counting in images of popular subjects (e.g., knowing that the Adidas logo has 3 stripes and a dog has 4 legs), but are only ~17% accurate on counterfactual images (e.g., counting the stripes in a 4-striped Adidas-like logo or the legs of a 5-legged dog).