An Vo
MS student @ Korea Advanced Institute of Science & Technology (KAIST)

I am broadly interested in Large Language Models (LLMs) and Vision Language Models (VLMs), especially in making them more trustworthy and explainable in edge/hard cases. My work has been accepted at top venues, including ICML, AAAI, and GECCO.
I am also the lead author of VLMs are Biased, a paper that was featured on the front page of Hacker News and reached the top 5. In this work, we introduce VLMBias, a benchmark for evaluating visual counting in VLMs. We show that state-of-the-art models (e.g., o3, o4-mini, Gemini 2.5 Pro, Claude 3.7 Sonnet) achieve 100% accuracy when counting in images of popular subjects (e.g., knowing that the Adidas logo has 3 stripes and a dog has 4 legs) but only ~17% accuracy on counterfactual images (e.g., counting the stripes in a 4-striped Adidas-like logo or the legs of a 5-legged dog).
🎓 I am actively seeking PhD opportunities starting Fall 2026 to continue research broadly in (but not limited to) LLMs/VLMs. If you believe I would be a good fit for your research group, please feel free to reach out at an.vo@kaist.ac.kr.