Rubberduck FM

#15: Princess Mononoke rages against Image Generation



Masa looked into how various engineers predict GPT-4o image generation works, and we talk through what he found.

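  • Learning Image Generation with Python, Machine Learning Practice Series (Pythonで学ぶ画像生成 機械学習実践シリーズ)
  • Let's Put Types on Everything with dataclass (dataclass で万物に型を付けよう)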
  • Limitless Pendant
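  • The Creative Gene: The MEMEs I Loved (創作する遺伝子 僕が愛したMEMEたち)
  • [Talk] Impulse's Itakura: Top 10 Comedians He Envied! Itakura speaks candidly about the comedians who shattered the various "excuse crystals" he had been carrying (【トーク】インパルス板倉 嫉妬した芸人ベスト10!)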
  • Mickey 17
  • Bong Joon Ho
  • Robert Pattinson
  • Mickey7
  • try! Swift Tokyo Timetable
  • WWDC 2025
  • Apple Park
  • Claude 3.7 Sonnet
  • OpenAI Realtime API
  • TC39
  • SeattleJS
  • Temporal
  • ts-blank-space
  • TypeScript syntax not supported by `ts-blank-space`
  • Oracle justified its JavaScript trademark with Node.js—now it wants that ignored
  • Sun Microsystems
  • Oracle JavaScript Extension Toolkit
  • Princess Mononoke 4K IMAX
  • Introducing 4o Image Generation
  • Autoregressive model
  • Understanding Next Token Prediction
  • Sora: Creating video from text
  • Video generation models as world simulators
  • There was indeed an OpenAI office near the Bay Bridge
  • Golden Gate Bridge
  • San Francisco–Oakland Bay Bridge
  • Prediction 1: 動詞
  • How Are the Image Generation Capabilities of GPT-4o and Gemini-2.0 Built? (GPT-4oとGemini-2.0の画像生成能力はいかにして作られているのか)
  • [2206.10789] Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
  • [2110.04627] Vector-quantized Image Modeling with Improved VQGAN
  • [2309.02591] Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
  • [2206.03605] Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
  • [2402.12226] AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
  • [2404.02905] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
  • A GPT-4o generated image, May 2024
  • Prediction 2: Sangyun Lee
  • [2310.01400] Sequential Data Generation with Groupwise Diffusion Process
  • Prediction 3: Wh
  • [2406.11838] Autoregressive Image Generation without Vector Quantization
  • [2105.01601] MLP-Mixer: An all-MLP Architecture for Vision
  • Conditional probability distribution
  • Prediction 4: K.Ishi
  • [2408.11039] Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  • Prediction 5: Saining Xie
  • [2103.00020] Learning Transferable Visual Models From Natural Language Supervision
  • [2112.10752] High-Resolution Image Synthesis with Latent Diffusion Models
  • Prediction 6: Nayan Saxena
  • OpenAI image gen actually shows just 5 frames
  • [2005.14165] Language Models are Few-Shot Learners
  • 4o Image Generation In-Context Learning

By rubberduckfm