AI Post Transformers

ONNX Ecosystem, Optimization, and Deployment



The provided sources center on the Open Neural Network Exchange (ONNX) format and its inference engine, ONNX Runtime, highlighting their role in enabling high-performance, cross-platform machine learning deployment. Several sources detail the architectural benefits of ONNX Runtime, such as enabling AI inference in Java systems without Python dependencies and facilitating hardware acceleration across chips such as NVIDIA GPUs and Arm processors. One source introduces OODTE, a differential testing tool used to assess the functional correctness of the ONNX Optimizer, revealing multiple bugs and accuracy deviations in optimized models. Finally, a practical example from Firefox AI demonstrates switching from the WebAssembly (WASM) build to the native C++ ONNX Runtime for a significant speed increase in local AI features.

Sources:
https://en.wikipedia.org/wiki/Open_Neural_Network_Exchange
https://github.com/onnx/onnx/blob/main/docs/Overview.md
https://github.com/onnx/optimizer
https://github.com/onnx/onnx/blob/main/docs/IR.md
https://blog.stackademic.com/onnx-open-neural-network-exchange-29f39a84c5f2
https://developer.nvidia.com/blog/end-to-end-ai-for-pcs-onnx-runtime-and-optimization/
https://developer.arm.com/ai/kleidi-libraries
https://newsroom.arm.com/blog/arm-microsoft-kleidiai-onnx-runtime
https://hackernoon.com/mobile-ai-with-onnx-runtime-how-to-build-real-time-noise-suppression-that-works
https://blog.mozilla.org/en/firefox/firefox-ai/speeding-up-firefox-local-ai-runtime/
https://www.infoq.com/articles/onnx-ai-inference-with-java/
https://arxiv.org/pdf/2202.06929
https://arxiv.org/html/2505.01892v1

By mcgrof