Mahdi Erfanian

Ph.D. Candidate at UIC


5453, Computer Design Research and Learning Center (CDRLC)

850 W Taylor Street,

Chicago, IL 60607

I’m Mahdi Erfanian, a Ph.D. candidate in Computer Science at the University of Illinois Chicago, where I am a member of the IndexLab under the supervision of Dr. Abolfazl Asudeh. I received my B.Sc. in Computer Engineering from Sharif University of Technology. My research spans Multimodal Data Management, Generative AI, Algorithmic Fairness, and Foundation Models.

I am particularly interested in employing foundation models, including large language models, to address challenges in data management, such as mitigating bias in training data and enhancing multimodal data retrieval through synthetic data generation. My work has led to systems that outperform state-of-the-art baselines like OpenAI’s CLIP by 200% in mean average precision on complex natural language queries. I am also passionate about Algorithm Design, with a focus on optimizing both fairness and efficiency in data-driven systems.

I am seeking a 2026 Research Scientist Internship to build scalable AI products in areas including multimodal retrieval, foundation models, and vector databases.

news

Oct 15, 2025 Excited to serve as a reviewer and PC member for top-tier venues in 2024-2025! 🎯 Including ICLR 2025, NeurIPS 2025 DynaFront Workshop, CIKM 2025 (PC Member), TKDE 2025, and more.
Oct 15, 2025 🎓 Successfully passed my Ph.D. preliminary exam! My thesis proposal “Generative AI for Multimodal Data Management” has been approved. Excited to continue advancing research in this cutting-edge area! 🚀
Jun 01, 2025 Our paper “An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks” has been accepted to ICML 2025! 🎉 This work achieves 24x speedup over NumPy baseline and 2.5x improvement on Quantized LLMs.
Nov 01, 2024 Our work on optimized inference for binary and ternary neural networks is now available on arXiv! This groundbreaking research achieves significant speedup improvements for quantized LLMs.
Aug 26, 2024 Chameleon (full research paper) and FairEM360 (demo paper) have been published and presented at VLDB 2024 ✨

selected publications

  1. Task-aware Data Augmentation using Generative AI for Group-distributional Robustness
    Mahdi Erfanian, Boris Glavic, and Abolfazl Asudeh
    arXiv preprint arXiv:TBD, 2025
    Manuscript submitted for publication in SIGMOD 2026
  2. Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries
    Mahdi Erfanian, Mohsen Dehghankar, and Abolfazl Asudeh
    arXiv preprint arXiv:2412.00639, 2025
    Manuscript submitted for publication in ICLR 2025
  3. An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks
    Mohsen Dehghankar, Mahdi Erfanian, and Abolfazl Asudeh
    In The 2025 International Conference on Machine Learning, 2025
    arXiv preprint arXiv:2411.06360
  4. Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities
    Mahdi Erfanian, H. V. Jagadish, and Abolfazl Asudeh
    Proceedings of the VLDB Endowment, 2024
  5. FairEM360: A Suite for Responsible Entity Matching
    Nima Shahbazi, Mahdi Erfanian, Abolfazl Asudeh, and 2 more authors
    Proceedings of the VLDB Endowment, 2024
  6. Coverage-based Data-centric Approaches for Responsible and Trustworthy AI
    Nima Shahbazi, Mahdi Erfanian, and Abolfazl Asudeh
    Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2024