TMTB: Ilya Sutskever on Dwarkesh's Pod — Key Quotes
Full video here
Back to the “age of research” (scaling alone won’t get us there) — generalization is the crux
“Up until 2020 it was the age of research. From 2020 to 2025 it became the age of scaling—people said, ‘This is amazing, just keep scaling.’ But now the scale is so big that it’s fair to ask: if you 100× the scale, does everything transform? I don’t think that’s true. So it’s back to the age of research again, just with big computers… The most fundamental thing is that these models generalize dramatically worse than people. That seems very fundamental.”
Eval scores vs. real-world impact — RL “reward hacking” & brittle generalization
“Models do amazingly well on evals, yet the economic impact lags. You fix a bug, it introduces another; you report the second, it brings back the first. One explanation is that RL teams keep creating new RL environments inspired by the evals—so the model looks great on release tasks, while generalization is inadequate. Combine that with generalization being weak and you can explain a lot of the disconnect between eval performance and reality.”
Human-like continual learning agents (not “finished AGI”)
“A human is not an AGI; we rely on continual learning. Success looks more like a super-intelligent 15-year-old—eager, capable, but with lots to learn, and then it learns on the job. You deploy a learning algorithm the way you’d bring a worker into an organization. The deployment itself is part of the learning process, not ‘drop a finished mind that knows every job.’”
Timelines (5–20 years to human-level learners → superhuman via deployment)