Meet the eighth episode of Fwdays Architecture Talks!
Our regular speakers, Oleksandr Savchenko and Yozhef Hisem, together with this episode's guest Dmytro Ovcharenko, AI CTO of the Ministry of Digital Transformation, discuss the following topics:
— Reference Architecture(s), Patterns, and Styles for AI
— The process of building a custom LLM
- Model training
- Key quality attributes (performance, caching, availability, security, ethical aspects)
— Where do AI/GenAI engineers come from, and where can you find them?
Join us at the Highload fwdays'25 conference: https://bit.ly/3DitOVr
Useful links:
— AI Enterprise Architecture:
- https://opea-project.github.io/latest/framework/framework.html
- https://www.nvidia.com/en-us/data-center/products/ai-enterprise/
- FTI (feature/training/inference pipelines), LLM Engineer's Handbook - https://learning.oreilly.com/library/view/llm-engineers-handbook/9781836200079/
— What are Large Language Models (LLMs)? by Databricks - https://www.databricks.com/glossary/large-language-models-llm
— Continuous pre-training vs. fine-tuning - https://www.linkedin.com/pulse/teaching-old-dog-new-tricks-difference-between-lewis-ms-ccrp-ches-3abqc/
— High-load inference and LLMs in production - https://www.youtube.com/watch?v=NJ1jAfWR84k&list=WL&index=11&t=134s
— Book recommended by O. Savchenko: Build a Large Language Model (From Scratch) by Sebastian Raschka - https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167
- Code and videos accompanying the book - https://github.com/rasbt/LLMs-from-scratch
- Author's YouTube channel - https://www.youtube.com/@SebastianRaschka
— Thoughtworks Tech Radar: AI items in the Hold ring - https://www.thoughtworks.com/radar/techniques
— ISO/IEC 42001:2023 - https://www.iso.org/ru/standard/81230.html
— EU Artificial Intelligence Act - https://artificialintelligenceact.eu/
— Custom LLMs - https://er.ucu.edu.ua/server/api/core/bitstreams/1b205c1b-226c-4021-86e7-29a6063a46e6/content
- GPT-NL (Netherlands)
- Modello Italia (Italy, 9B parameters)
- ALLaM (Saudi Arabia, 70B parameters)
- SEA-Lion (Singapore, 7B parameters)
- TAIDE (Taiwan, 13B parameters)
- UAE’s Falcon models
— Ukrainian datasets:
- Malyuk Corpus: https://huggingface.co/datasets/lang-uk/malyuk - A large Ukrainian web corpus useful for pretraining foundation models.
- NER-UK: https://github.com/lang-uk/ner-uk - A dataset for Named Entity Recognition (NER) covering people, organizations, and locations.
- UA-GEC (Grammarly Corpus): https://github.com/grammarly/ua-gec - Focused on grammatical error correction; useful for fine-tuning grammar-correction models.
- БрУК (Brown Ukrainian Corpus): https://github.com/brown-uk/corpus - A balanced Ukrainian text corpus for linguistic analysis and text classification tasks.
- ZNO Corpus: https://huggingface.co/datasets/osyvokon/zno - Extracted from Ukrainian standardized exam texts, beneficial for educational AI applications.
- Eval-UA-tion Benchmark: https://aclanthology.org/2024.unlp-1.13/ - A standardized benchmarking suite for evaluating Ukrainian LLMs across multiple NLP tasks.
- UNLP-2024 Shared Task: https://github.com/unlp-workshop/unlp-2024-shared-task - A leaderboard and challenge dataset for fine-tuning Ukrainian LLMs.
- Awesome Ukrainian NLP: https://github.com/osyvokon/awesome-ukrainian-nlp - A collection of Ukrainian NLP resources, pretrained models, datasets, and tools.
Worth subscribing to:
– More for developers: https://fwdays.com
– Fwdays Telegram channel: https://t.me/fwdays
– Oleksii's Telegram channel: https://t.me/OleksiiTheArchitect
– Oleksii's LinkedIn: https://www.linkedin.com/in/alexhelkar/
– Oleksandr's LinkedIn: https://www.linkedin.com/in/o-savchenko/
– Dmytro's LinkedIn: https://www.linkedin.com/in/dmytroovcharenko/?originalSubdomain=ua