
Sign up to save your podcasts
Or
Welcome back to AI Daily! Today we discuss three great stories, starting with HyenaDNA. The application of the hyena model in DNA sequencing - enabling models to handle a million context length and revolutionizing our understanding of genomics. Secondly, we cover the exciting open-source implementation of StyleDrop - a tool that's making waves in the world of image editing and style replacement. Finally, we delve into the topic of data poisoning - how a small amount of injected data can drastically alter the outcome of an instruction tuning and the implications this has for AI security.
Key Points:
1️⃣ HyenaDNA
* HyenaDNA utilizes sub-quadratic scaling for DNA sequences, enabling a million context length, each a unique nucleotide, trained on 3 trillion tokens.
* HyenaDNA, setting a new state-of-the-art in genomics benchmarks, could predict gene expression changes, elucidating protein creation from genetic polymorphisms.
* It's 160 times faster than previous LLMs, fitting on a single CoLab, showcasing the potential to outperform transformers and attention models.
2️⃣ Open-Source StyleDrop
* An open-source version of Style Drop, an image editing and style replacing tool, has been implemented and made available for public use.
* Style Drop outperforms comparable models and offers comprehensive instructions for setup, allowing users to experiment with stylizing lettering and more.
* Following a pattern set by Dream Booth, Style Drop went from being a Google research paper to being implemented as an open-source project on GitHub.
3️⃣ Data Poisoning
* Two papers discuss data poisoning, a technique where information like ads or SEO can be injected into LLMs, impacting their responses and recommendations.
* Even a small number of examples in a dataset can effectively "poison" it, significantly altering the output of a language model during fine tuning.
* This technique is expected to be used with open-source datasets for fine-tuning, similar to how publishers put fake words in dictionaries to trace usage.
🔗 Episode Links
* HyenaDNA
* StyleDop
* Data Poisoning
* OpenAI
Follow us on Twitter:
* AI Daily
* Farb
* Ethan
* Conner
Subscribe to our Substack:
* Subscribe
4.9
99 ratings
Welcome back to AI Daily! Today we discuss three great stories, starting with HyenaDNA. The application of the hyena model in DNA sequencing - enabling models to handle a million context length and revolutionizing our understanding of genomics. Secondly, we cover the exciting open-source implementation of StyleDrop - a tool that's making waves in the world of image editing and style replacement. Finally, we delve into the topic of data poisoning - how a small amount of injected data can drastically alter the outcome of an instruction tuning and the implications this has for AI security.
Key Points:
1️⃣ HyenaDNA
* HyenaDNA utilizes sub-quadratic scaling for DNA sequences, enabling a million context length, each a unique nucleotide, trained on 3 trillion tokens.
* HyenaDNA, setting a new state-of-the-art in genomics benchmarks, could predict gene expression changes, elucidating protein creation from genetic polymorphisms.
* It's 160 times faster than previous LLMs, fitting on a single CoLab, showcasing the potential to outperform transformers and attention models.
2️⃣ Open-Source StyleDrop
* An open-source version of Style Drop, an image editing and style replacing tool, has been implemented and made available for public use.
* Style Drop outperforms comparable models and offers comprehensive instructions for setup, allowing users to experiment with stylizing lettering and more.
* Following a pattern set by Dream Booth, Style Drop went from being a Google research paper to being implemented as an open-source project on GitHub.
3️⃣ Data Poisoning
* Two papers discuss data poisoning, a technique where information like ads or SEO can be injected into LLMs, impacting their responses and recommendations.
* Even a small number of examples in a dataset can effectively "poison" it, significantly altering the output of a language model during fine tuning.
* This technique is expected to be used with open-source datasets for fine-tuning, similar to how publishers put fake words in dictionaries to trace usage.
🔗 Episode Links
* HyenaDNA
* StyleDop
* Data Poisoning
* OpenAI
Follow us on Twitter:
* AI Daily
* Farb
* Ethan
* Conner
Subscribe to our Substack:
* Subscribe
2,243 Listeners
323 Listeners