Icml2025_matrix_multiplication

Our paper “An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks” has been accepted to ICML 2025! 🎉 This work achieves a 24x speedup over a NumPy baseline and a 2.5x improvement on quantized LLMs.
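
For readers new to the setting: a ternary weight matrix only contains the values -1, 0, and 1, so multiplying by it needs no real multiplications, only additions and subtractions. The sketch below illustrates that general idea in NumPy. It is not the algorithm from the paper, and the function name `ternary_matmul_naive` is a hypothetical one chosen for this illustration.

```python
import numpy as np

def ternary_matmul_naive(x, w):
    """Compute x @ w for a ternary w (entries in {-1, 0, 1}).

    Illustrative only: this is NOT the paper's algorithm, just a
    demonstration that the product reduces to adds and subtracts.
    """
    x = np.asarray(x)
    w = np.asarray(w)
    out = np.zeros((x.shape[0], w.shape[1]), dtype=x.dtype)
    for j in range(w.shape[1]):
        # Add the columns of x where the weight is +1 ...
        plus = x[:, w[:, j] == 1].sum(axis=1)
        # ... and subtract the columns where the weight is -1.
        minus = x[:, w[:, j] == -1].sum(axis=1)
        out[:, j] = plus - minus
    return out

# Sanity check against NumPy's dense matrix multiplication.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.integers(-1, 2, size=(8, 3))  # random values in {-1, 0, 1}
assert np.allclose(ternary_matmul_naive(x, w), x @ w)
```

A binary network is the special case where the weights are restricted to two values, so the same add-and-subtract view applies.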