We talk with Oleg Tarasenko and Tze Yiing about crawling the web using Elixir. Oleg created the Crawly project to help solve this problem, and Tze Yiing joined him as a contributor and maintainer. We cover how Elixir is well suited to orchestrating crawling, how to deal with login pages, understanding the legal concerns, building a codeless scraper, and much more!
Show Notes online - http://podcast.thinkingelixir.com/31
https://dashbit.co/blog/ten-years-ish-of-elixir – January 9th marked 10 years since the first commit to the Elixir repository
https://github.com/elixir-lang/elixir/commit/337c3f2d569a42ebd5fcab6fef18c5e012f9be5b – First commit on the repository
https://twitter.com/josevalim/status/1349010127270129670 – Jose Valim reveals that his secret project is called 'Nx'
https://remote.com/blog/welcoming-elixir-creator-jose-valim – Jose Valim joins Remote as a Technical Advisor
https://twitter.com/josevalim/status/1347858475267854336 – ExUnit will catch the SIGQUIT message from CTRL+\ and show the tests that were running
https://github.com/elixir-lang/elixir/blob/master/lib/mix/lib/mix/tasks/test.ex#L34 – ExUnit will print how much time the test suite spent on async tests vs sync tests
https://twitter.com/fhunleth/status/1348092050487570433 – Nerves support on the M1 is looking good
https://www.youtube.com/playlist?list=PLqj39LCvnOWZl_Pb0Y7wGWijKbTvL4gJg – Elixir Conf 2020 videos have all been publicly released!
Do you have some Elixir news to share? Tell us at @ThinkingElixir or email at [email protected]
https://oltarasenko.medium.com/web-scraping-with-elixir-and-crawly-extracting-data-behind-authentication-a52584e9cf13
https://oltarasenko.medium.com/using-elixir-and-crawly-for-price-monitoring-7364d345fc64 – Using Elixir for price monitoring
https://hex.pm/packages/crawly
https://github.com/oltarasenko/crawly
https://www.erlang-solutions.com/blog/web-scraping-with-elixir.html – Oleg's older web scraping with Elixir article
https://www.erlang-solutions.com/blog/how-to-build-a-machine-learning-project-in-elixir.html – Building a machine learning project with Elixir, TensorFlow and Crawly
https://oltarasenko.medium.com/what-is-web-scraping-and-why-you-might-want-to-use-it-a0e4b621f6d0 – What is web scraping, and why you might want to use it?
https://www.pillowskin.com – Ziinc's project using scraping and aggregation
https://www.tensorflow.org/
https://oltarasenko.medium.com/the-unofficial-guide-to-extracting-google-search-results-in-2021-with-elixir-7a6ef80d0f5b
https://scrapy.org/
https://github.com/fredwu/crawler
https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-linkedin-protects-scraping-public-data – EFF legal interpretation of the LinkedIn vs HiQ scraping case
https://github.com/scrapinghub/splash/
https://www.joinhoney.com/
https://hexdocs.pm/crawly/readme.html#quickstart – Crawly quickstart guide
https://hexdocs.pm/crawly/tutorial.html – Crawly tutorial
https://github.com/oltarasenko/crawly_ui – Crawly UI project
http://crawlyui.com/ – Crawly UI project page
Data is the new gold
https://t.me/elixir_crawly – Crawly Telegram group
https://github.com/oltarasenko – Oleg on GitHub
https://oltarasenko.medium.com/ – Oleg's Blog
https://twitter.com/tzeyiing – Lee TzeYiing on Twitter
https://github.com/Ziinc – Lee TzeYiing on GitHub
https://www.tzeyiing.com – Lee TzeYiing's Blog
Message the show - @ThinkingElixir
Email the show - [email protected]
Mark Ericksen - @brainlid
David Bernheisel - @bernheisel
Cade Ward - @cadebward