Intro topic: Grills
News/Links:
- You can’t call yourself a senior until you’ve worked on a legacy project
- https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
- Recraft might be the most powerful AI image platform I’ve ever used — here’s why
- https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
- NASA has a list of 10 rules for software development
- https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
- https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre
Book of the Show
- Patrick:
- The Player of Games (Iain M. Banks)
- https://a.co/d/1ZpUhGl (non-affiliate)
- Jason:
- Basic Roleplaying Universal Game Engine
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick:
- Jason:
- Features and Labels (https://fal.ai)
Topic: Reinforcement Learning
- Three types of machine learning
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Online vs Offline RL
- Optimization algorithms
- Value optimization
- Policy optimization
- Policy Gradients
- Actor-Critic
- Proximal Policy Optimization
- Value vs Policy Optimization
- Value optimization is more intuitive (value loss)
- Policy optimization is less intuitive at first (policy gradients)
- Converting values to policies in deep learning is difficult
- Imitation Learning
- Supervised policy learning
- Often used to bootstrap reinforcement learning
- Policy Evaluation
- Propensity scoring versus model-based
- Challenges in training RL models
- Two optimization loops
- Collecting feedback vs updating the model
- Difficult optimization target
- RLHF & GRPO
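
Example: propensity scoring versus model-based policy evaluation

A minimal sketch of the two approaches named under "Policy Evaluation" above, assuming the simplest possible setting: a two-action bandit with no context. The notes do not say which estimators the episode covers, so this uses inverse propensity scoring and a per-action mean reward model as stand-ins. The logged data, the target policy, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical logs from a two-action bandit.
# Each entry: (action taken, observed reward, probability the logging policy chose it).
LOGS = [
    ("a", 1.0, 0.5),
    ("a", 0.0, 0.5),
    ("b", 1.0, 0.5),
    ("b", 1.0, 0.5),
]

# Hypothetical target policy we want to score without deploying it.
TARGET_POLICY = {"a": 0.2, "b": 0.8}


def ips_estimate(logs, target_policy):
    """Propensity scoring: reweight each logged reward by pi_target / pi_logging."""
    total = 0.0
    for action, reward, propensity in logs:
        total += reward * target_policy[action] / propensity
    return total / len(logs)


def model_based_estimate(logs, target_policy):
    """Model-based: fit a reward model (here, a per-action mean),
    then average its predictions under the target policy."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for action, reward, _ in logs:
        sums[action] += reward
        counts[action] += 1
    return sum(
        prob * sums[action] / counts[action]
        for action, prob in target_policy.items()
        if counts[action]  # skip actions never seen in the logs
    )


if __name__ == "__main__":
    print("IPS estimate:        ", ips_estimate(LOGS, TARGET_POLICY))
    print("Model-based estimate:", model_based_estimate(LOGS, TARGET_POLICY))
```

The two estimates agree here only because the toy logs are balanced and the logging policy is uniform. In general, propensity scoring needs accurate logging probabilities and can have high variance, while the model-based estimate is only as good as the reward model.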