
In today’s episode we explore the current state of the art in document AI with Andrej Baranovskij, an active open source contributor and founder of Katana ML. Andrej’s work centers on using open source AI models to extract structured information from documents, including PDFs, image files, and more. In our conversation, we discuss how document AI has advanced with the advent of transformer architectures and the increasing use of multi-modal models that combine image recognition capabilities with language understanding. We also talk about Andrej’s vision for Sparrow, his open source project geared toward helping organizations adopt these models more easily.
Links:
- Sparrow: https://github.com/katanaml/sparrow
Timestamps:
00:00 Document AI: evolution, accessibility, and real-world use
05:23 Enterprise expert finds common sense use case.
07:57 Goal: Successful open source product helping companies.
09:47 Advantages of DOM: free, commercial use allowed, key-value data.
15:19 Donut has limitations due to training data.
17:31 Elements automates invoice processing, reduces manual work.
22:20 Paducera groups receipt data exceptionally well with key value pairs.
24:41 Persist data for retrieval and calculate spending patterns.
29:50 Challenging integration with support, but successful.
34:09 AI accessibility for developers, smaller ML models.
35:54 Trend: running ML models locally instead of cloud.
39:08 Adding LLM support with the FastAPI framework.
Tune in and gain valuable knowledge about the power of data analytics in shaping the future of businesses. Do not forget to rate or review on your favorite platform!