Mahdi Erfanian
Ph.D. Candidate at UIC | Research Intern at Microsoft
5453, Computer Design Research and Learning Center (CDRLC)
850 W Taylor Street,
Chicago, IL 60607
I’m Mahdi Erfanian, a Ph.D. candidate in Computer Science at the University of Illinois Chicago, where I am a member of the IndexLab under the supervision of Dr. Abolfazl Asudeh. I received my M.S. in Computer Science from UIC (awarded en route to Ph.D.) and my B.Sc. in Computer Engineering from Sharif University of Technology. I am currently a Research Intern at Microsoft (CodeAI team), working on mitigating hallucination in LLMs and GenAI code agents. My research spans LLMs, Multimodal Data Management, Generative AI, Algorithmic Fairness, and Foundation Models.
I am particularly interested in employing foundation models, including large language models, to address different challenges in data management—such as mitigating bias in training data and enhancing multi-modal data retrieval through synthetic data generation. My work has been published in top-tier venues including VLDB, ICML, and Data Engineering Bulletin, with systems that outperform state-of-the-art baselines like OpenAI’s CLIP by 200% in mean average precision on complex natural language queries. Additionally, I am passionate about Algorithm Design, with a focus on optimizing both fairness and efficiency in data-driven systems.
news
| Date | Update |
|---|---|
| Apr 27, 2026 | Released BibTeX Verifier, an open-source, in-browser tool that checks .bib entries against CrossRef and Semantic Scholar, which is useful for catching metadata mistakes and AI-hallucinated citations. Everything runs locally; only titles are sent to public APIs. Live app · GitHub |
| Mar 20, 2026 | Serving as PC member and reviewer for top-tier venues! |
| Mar 15, 2026 | Delivered guest lectures at UIC: CS516 (Responsible Data Science) on Generative AI and Fairness, and CS418 (Intro to Data Science) on Generative AI and Multimodal Data Management. |
| Mar 01, 2026 | “NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex NL Queries” has been submitted to VLDB 2026! |
| Feb 01, 2026 | “Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries” has been submitted to KDD 2026! |
| Jan 15, 2026 | Started a Ph.D. Research Internship at Microsoft (CodeAI team)! |
selected publications
- NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex NL Queries. 2026. Manuscript submitted for publication in VLDB 2026.
- Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries. arXiv preprint arXiv:2412.00639, 2025. Manuscript submitted for publication in KDD 2026.
- An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks. In The 2025 International Conference on Machine Learning, 2024. arXiv preprint arXiv:2411.06360.
- Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities. Proceedings of the VLDB Endowment, 2024.
- Coverage-based Data-centric Approaches for Responsible and Trustworthy AI. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2024.