The Python Podcast.__init__

Scaling Knowledge Management For Technical Teams With Knowledge Repo



Summary

One of the most persistent challenges faced by organizations of all sizes is the recording and distribution of institutional knowledge. In technical teams this is exacerbated by the need to incorporate technical review feedback and manage access to data before publishing. When faced with this problem as an early data scientist at Airbnb, Chetan Sharma helped create the Knowledge Repo project as a solution. In this episode he shares the story behind its creation and growth, how and why it was released as open source, and the features that make it a compelling option for your own team’s knowledge management journey.

Announcements
  • Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Your host as usual is Tobias Macey and today I’m interviewing Chetan Sharma about Knowledge Repo, an open source framework for managing documentation for technical users.

Interview
  • Introductions
  • How did you get introduced to Python?
    • EE + CS/AI + Stats degrees
    • Airbnb working on ML models
    • Knowledge Repo itself
  • Can you describe what Knowledge Repo is and the story behind it?
    • We started seeing interviewees use IPython notebooks and thought they were great
    • Wanted to push more people to use notebooks, but they weren’t very shareable or vettable
    • Existing notebook hosting services weren’t very good, and weren’t built for people who aren’t data stakeholders. They were especially poor with images and annoying cell blocks
    • Made a simple post processor to remove cell blocks, push the images to S3, and host on Flask
    • Once we were pushing notebooks into a GitHub repo for hosting on a Flask app, so many things became possible
      • Review cycles
      • Shareability / collaboration features
      • Indexing / searching
    • Concurrently, great work was happening on developing internal R packages / Python libraries to provide consistent, branded aesthetics
  • What are some of the approaches that teams typically take for recording and sharing institutional knowledge?
    • Copy and paste to Google Docs, slides
    • Facebook was using Facebook photo albums
    • Untrustworthy, not discoverable, divorced from the code
  • What are the unique requirements that are introduced when attempting to record and distribute learnings related to data such as A/B experiments, analytical methods, data sets, etc.?
    • Reproducibility is a big one
    • Making sure the learnings are trustworthy (good data? no bugs?)
    • Distributing widely, across the org and across time
    • Experimentation
      • Experimentation is at the end of a research-design-build-measure cycle, strategic analysis is often before
      • Capturing all of the context
  • Can you describe how the Knowledge Repo project is architected?
    • Repositories: a store of posts, most commonly a GitHub repo
    • Markdown as original lingua franca, eventually a KR specific “KR post” concept (which is still basically markdown)
    • Post processors
      • Convert whatever upstream file to markdown / KR post (Jupyter notebook, R Markdown, markdown were the original ones)
      • Handle images and other large assets, usually pushing them to cloud storage
      • Evolved to handle PDFs, Google Docs, keynotes
  • What were the motivating factors for making it available as an open source project?
    • It was such a common problem. Even incredibly sophisticated data teams at Uber, Facebook, etc. were begging us to share the system.
  • What is the workflow for creating, sharing, and discovering information in an installation of Knowledge Repo?
    • Create a GitHub repo for hosting strategic analysis
    • Use the KR script to create a stub/template for whatever format you’re working in
    • Do your work in Jupyter, etc.
    • Instead of using git scripts (git add), use knowledge scripts (knowledge add), which are basically the git scripts with postprocessors
    • Do typical GitHub workflows
    • See the result in the hosted Knowledge Repo app
  • What are some of the options available for extending or customizing an installation of Knowledge Repo?
    • More postprocessors! Google Docs, presentations, UX research, anything can be done in KR with a simple postprocessor to turn it to markdown/images/PDF
    • Tying the system to your internal data tools. For example, an experimentation system like Eppo or whatever you use for marketing campaigns
  • If you were to start over today, what are some of the ways that you might approach the solution to knowledge management differently?
    • Think of it more holistically:
  • What are the most interesting, innovative, or unexpected ways that you have seen Knowledge Repo used?
    • UX research
    • Writing up a guide for acquihiring
    • Demonstrating capabilities, data framework
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Knowledge Repo?
    • Strategic analysis needs to be elevated, this leads to paradigm changes
    • Organization problems are helped by tools like KR, e.g. promotions
    • Meeting people’s tools/workflows where they are is powerful
  • When is Knowledge Repo the wrong choice?
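
The post-processing step described in the notes above (drop the code cells, pull images out for separate hosting, emit markdown) can be sketched roughly as follows. This is an illustrative sketch only, not Knowledge Repo's actual code: the function name `notebook_to_post` and the `images/figure_N.png` naming are hypothetical, and the extracted images are simply returned in a dict where a real setup would upload them to cloud storage such as S3.

```python
import base64


def notebook_to_post(notebook, image_prefix="images"):
    """Turn a parsed Jupyter notebook (nbformat 4 JSON) into markdown plus images.

    Hypothetical helper for illustration. Code cell sources are dropped,
    markdown cells are kept, and PNG outputs are collected as separate assets.
    Returns (markdown_text, {relative_path: image_bytes}).
    """
    parts = []
    images = {}
    for cell in notebook.get("cells", []):
        cell_type = cell.get("cell_type")
        if cell_type == "markdown":
            # "source" may be a string or a list of strings; join handles both.
            parts.append("".join(cell.get("source", [])))
        elif cell_type == "code":
            # Skip the code itself; keep only image outputs.
            for output in cell.get("outputs", []):
                png = output.get("data", {}).get("image/png")
                if png is None:
                    continue
                if isinstance(png, list):
                    png = "".join(png)
                path = f"{image_prefix}/figure_{len(images) + 1}.png"
                images[path] = base64.b64decode(png)
                parts.append(f"![figure]({path})")
    return "\n\n".join(parts), images


# A tiny in-memory notebook with one markdown cell and one plotted figure.
example = {
    "cells": [
        {"cell_type": "markdown",
         "source": ["# Retention analysis\n", "Summary of findings."]},
        {"cell_type": "code",
         "source": ["plot(df)"],
         "outputs": [{
             "output_type": "display_data",
             "data": {"image/png": base64.b64encode(b"fake-png-bytes").decode("ascii")},
         }]},
    ]
}

markdown, images = notebook_to_post(example)
print(markdown)
```

The resulting markdown contains the prose and an image reference but none of the code, which is the form the notes describe as friendlier to readers who are not data stakeholders.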

Keep In Touch
  • LinkedIn
  • @chesharma87

Picks
  • Tobias
    • Learning Guitar
  • Chetan
    • Underrated cooking ingredients: chickpea flour, butter fried kimchi (in grilled cheese, nachos)

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
  • Eppo
    • Data Engineering Podcast Episode
  • Knowledge Repo
  • IPython
  • Jupyter
  • Flask

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
