Mahdi Erfanian

Ph.D. Candidate at UIC | Research Intern at Microsoft

5453, Computer Design Research and Learning Center (CDRLC)

850 W Taylor Street,

Chicago, IL 60607

I’m Mahdi Erfanian, a Ph.D. candidate in Computer Science at the University of Illinois Chicago, where I am a member of the IndexLab under the supervision of Dr. Abolfazl Asudeh. I received my M.S. in Computer Science from UIC (awarded en route to Ph.D.) and my B.Sc. in Computer Engineering from Sharif University of Technology. I am currently a Research Intern at Microsoft (CodeAI team), working on mitigating hallucination in LLMs and GenAI code agents. My research spans LLMs, Multimodal Data Management, Generative AI, Algorithmic Fairness, and Foundation Models.

I am particularly interested in employing foundation models, including large language models, to address challenges in data management, such as mitigating bias in training data and enhancing multi-modal data retrieval through synthetic data generation. My work has been published in top-tier venues including VLDB, ICML, and the Data Engineering Bulletin, with systems that outperform state-of-the-art baselines like OpenAI’s CLIP by 200% in mean average precision on complex natural language queries. Additionally, I am passionate about Algorithm Design, with a focus on optimizing both fairness and efficiency in data-driven systems.

news

Apr 27, 2026 Released BibTeX Verifier, an open-source, in-browser tool that checks .bib entries against CrossRef and Semantic Scholar—useful for catching metadata mistakes and AI-hallucinated citations. Everything runs locally; only titles are sent to public APIs. Live app · GitHub
Mar 20, 2026 Serving as PC member and reviewer for top-tier venues! :dart: PC Member for WWW 2026, CIKM 2026/2025/2024, KDD 2026, NeurIPS 2026. Reviewer for ICLR 2026, NeurIPS 2025 (DynaFront), TKDE 2025/2024, PETRA 2024 (ETHER-AI).
Mar 15, 2026 Delivered guest lectures at UIC: CS516 (Responsible Data Science) on Generative AI and Fairness, and CS418 (Intro to Data Science) on Generative AI and Multimodal Data Management. :teacher:
Mar 01, 2026 “NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex NL Queries” has been submitted to VLDB 2026! :sparkles:
Feb 01, 2026 “Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries” has been submitted to KDD 2026! :tada:
Jan 15, 2026 Started a Ph.D. Research Internship at Microsoft (CodeAI team)! :rocket: Working on mitigating hallucination in LLMs and GenAI code agents including Copilot and Codex.
Dec 15, 2025 :mortar_board: Received my M.S. in Computer Science from the University of Illinois Chicago, awarded en route to my Ph.D.!

selected publications

  1. NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex NL Queries
    Mahdi Erfanian and Abolfazl Asudeh
    2026
    Manuscript submitted for publication in VLDB 2026
  2. Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries
    Mahdi Erfanian, Mohsen Dehghankar, and Abolfazl Asudeh
    arXiv preprint arXiv:2412.00639, 2025
    Manuscript submitted for publication in KDD 2026
  3. An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks
    Mohsen Dehghankar, Mahdi Erfanian, and Abolfazl Asudeh
    In The 2025 International Conference on Machine Learning, 2025
    arXiv preprint arXiv:2411.06360
  4. Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities
    Mahdi Erfanian, H. V. Jagadish, and Abolfazl Asudeh
    Proceedings of the VLDB Endowment, 2024
  5. FairEM360: A Suite for Responsible Entity Matching
    Nima Shahbazi, Mahdi Erfanian, Abolfazl Asudeh, and 2 more authors
    Proceedings of the VLDB Endowment, 2024
  6. Coverage-based Data-centric Approaches for Responsible and Trustworthy AI
    Nima Shahbazi, Mahdi Erfanian, and Abolfazl Asudeh
    Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2024