Thinking Elixir Podcast

31: Crawling the Web using Elixir with Oleg Tarasenko and Tze Yiing


We talk with Oleg Tarasenko and Tze Yiing about crawling the web using Elixir. Oleg created the Crawly project to help solve this problem, and Tze Yiing joined him as a contributor and maintainer. We cover how Elixir is well suited to orchestrating crawling, how to deal with login pages, the legal concerns around scraping, building a codeless scraper, and much more!

Show Notes online - http://podcast.thinkingelixir.com/31

Elixir Community News

  • https://dashbit.co/blog/ten-years-ish-of-elixir – January 9th marked the 10th year since the first commit to the Elixir repository
  • https://github.com/elixir-lang/elixir/commit/337c3f2d569a42ebd5fcab6fef18c5e012f9be5b – First commit on the repository
  • https://twitter.com/josevalim/status/1349010127270129670 – Jose Valim reveals that his secret project is called 'Nx'
  • https://remote.com/blog/welcoming-elixir-creator-jose-valim – Jose Valim joins Remote as a Technical Advisor
  • https://twitter.com/josevalim/status/1347858475267854336 – ExUnit will catch the SIGQUIT signal from CTRL+\ and show the tests that were running
  • https://github.com/elixir-lang/elixir/blob/master/lib/mix/lib/mix/tasks/test.ex#L34 – ExUnit will print how much time the test suite spent on async tests vs sync tests
  • https://twitter.com/fhunleth/status/1348092050487570433 – Nerves support on the M1 is looking good
  • https://www.youtube.com/playlist?list=PLqj39LCvnOWZl_Pb0Y7wGWijKbTvL4gJg – ElixirConf 2020 videos have all been publicly released!
  • Do you have some Elixir news to share? Tell us at @ThinkingElixir or email at [email protected]

Discussion Resources

  • https://oltarasenko.medium.com/web-scraping-with-elixir-and-crawly-extracting-data-behind-authentication-a52584e9cf13
  • https://oltarasenko.medium.com/using-elixir-and-crawly-for-price-monitoring-7364d345fc64 – Using Elixir for price monitoring
  • https://hex.pm/packages/crawly
  • https://github.com/oltarasenko/crawly
  • https://www.erlang-solutions.com/blog/web-scraping-with-elixir.html – Oleg's older web scraping with Elixir article
  • https://www.erlang-solutions.com/blog/how-to-build-a-machine-learning-project-in-elixir.html – Building a machine learning project with Elixir, TensorFlow and Crawly
  • https://oltarasenko.medium.com/what-is-web-scraping-and-why-you-might-want-to-use-it-a0e4b621f6d0 – What is web scraping, and why you might want to use it?
  • https://www.pillowskin.com – Ziinc's project using scraping and aggregation
  • https://www.tensorflow.org/
  • https://oltarasenko.medium.com/the-unofficial-guide-to-extracting-google-search-results-in-2021-with-elixir-7a6ef80d0f5b
  • https://scrapy.org/
  • https://github.com/fredwu/crawler
  • https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-linkedin-protects-scraping-public-data – EFF legal interpretation of LinkedIn vs HiQ scraping case
  • https://github.com/scrapinghub/splash/
  • https://www.joinhoney.com/
  • https://hexdocs.pm/crawly/readme.html#quickstart – Crawly quickstart guide
  • https://hexdocs.pm/crawly/tutorial.html – Crawly tutorial
  • https://github.com/oltarasenko/crawly_ui – Crawly UI project
  • http://crawlyui.com/ – Crawly UI project page
  • Data is the new gold
  • https://t.me/elixir_crawly – Crawly Telegram group

Guest Information

  • https://github.com/oltarasenko – Oleg on GitHub
  • https://oltarasenko.medium.com/ – Oleg's Blog
  • https://twitter.com/tzeyiing – Lee TzeYiing on Twitter
  • https://github.com/Ziinc – Lee TzeYiing on GitHub
  • https://www.tzeyiing.com – Lee TzeYiing's Blog

Find us online

  • Message the show - @ThinkingElixir
  • Email the show - [email protected]
  • Mark Ericksen - @brainlid
  • David Bernheisel - @bernheisel
  • Cade Ward - @cadebward

Thinking Elixir Podcast by ThinkingElixir.com
