Optimized inference

Our work on optimized inference for binary and ternary neural networks is now available on arXiv! The paper presents techniques that deliver substantial inference speedups for quantized LLMs.