Machine learning has made significant progress in recent years, with systems matching human capabilities across a diverse range of tasks. The main hurdle, however, lies not in developing these models but in deploying them efficiently in real-world applications. This is where AI inference becomes crucial, emerging as a primary concern for researchers and industry practitioners alike.