Discussion about this post

Olle Häggström

Good post! I propose to use the term "Schubertian sobriety" for this kind of level-headed analysis.

David Manheim

I think a large part of the gap in expectations is about generalization, or, in other terms, about how the correlations between the metrics and the capabilities degrade. Everyone agrees that the metrics will be overfit and won't directly predict success, as you note. But if the correlation between measurable capabilities and hard-to-measure capabilities doesn't entirely disappear, we'll also see continued significant progress on the hard-to-measure items.
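The claim above can be illustrated with a minimal toy simulation (all names and parameters here are hypothetical, not from the original discussion): if a benchmark score and a hard-to-measure skill are both noisy readouts of the same latent capability, then applying selection pressure to the benchmark still lifts the hard-to-measure skill, as long as the shared correlation has not dropped to zero.

```python
import random

random.seed(0)

def simulate(noise_hard, n=10000):
    # Each "model" has a latent capability; the benchmark score and the
    # hard-to-measure skill are both noisy readouts of that capability.
    models = []
    for _ in range(n):
        latent = random.gauss(0, 1)
        benchmark = latent + random.gauss(0, 0.5)
        hard_skill = latent + random.gauss(0, noise_hard)
        models.append((benchmark, hard_skill))
    # "Optimization pressure": keep the top 10% of models by benchmark score.
    models.sort(reverse=True)
    top = models[: n // 10]
    mean_hard_all = sum(h for _, h in models) / n
    mean_hard_top = sum(h for _, h in top) / len(top)
    return mean_hard_all, mean_hard_top

# Even with substantial noise on the hard-to-measure skill, selecting on
# the measurable benchmark improves it, because the correlation through
# the shared latent capability has not disappeared.
base, selected = simulate(noise_hard=1.0)
print(selected > base)
```

The point of the sketch is only the qualitative behavior: overfitting adds noise, but any surviving correlation means measurable progress transfers.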

And evaluating the two theories' performance to date, we have seen exactly the sort of general progress that broad generalization predicts: lower text-prediction log-loss ended up improving performance on a wide variety of tasks, and improved time-horizon success, aimed primarily at software development, has led to better writing, better qualitative analysis, and better mathematical capabilities. And this isn't back-forecasting; it's effectively exactly what those promoting the scaling hypothesis were betting on, and it has paid off over the last several generations of models.

Of course, it's always possible that the correlation between measured capability and most other tasks drops to zero, or even ends up perversely decreasing performance. But at that point, the generalization argument is that developers will find other (possibly harder-to-measure, but at least temporarily more robust) measures to improve. Critically, though, this retargeting only works as long as we can observe the changes' impact on performance, and that is exactly the case that alignment researchers have been worried about for well over a decade at this point.
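The retargeting move can be sketched the same way (again a hypothetical toy model, with invented metric names): once one proxy metric has decoupled from the latent capability, selecting on it buys nothing, but switching selection pressure to a proxy that still correlates with the capability restores transfer.

```python
import random

random.seed(2)

def top_decile_hard_skill(select_on, n=10000):
    # Toy model: metric_a has decoupled from the latent capability,
    # while metric_b still tracks it. 'select_on' picks which proxy
    # receives the optimization pressure.
    rows = []
    for _ in range(n):
        latent = random.gauss(0, 1)
        metric_a = random.gauss(0, 1)              # correlation dropped to zero
        metric_b = latent + random.gauss(0, 0.5)   # a still-robust measure
        hard_skill = latent + random.gauss(0, 1)
        rows.append({"a": metric_a, "b": metric_b, "hard": hard_skill})
    rows.sort(key=lambda r: r[select_on], reverse=True)
    top = rows[: n // 10]
    return sum(r["hard"] for r in top) / len(top)

# Optimizing the dead metric no longer moves the hard-to-measure skill;
# retargeting onto a measure that still correlates with the latent
# capability restores progress.
print(abs(top_decile_hard_skill("a")) < 0.2)
print(top_decile_hard_skill("b") > 0.5)
```

The catch the comment flags is exactly the part this sketch takes for granted: here we can read off `hard_skill` directly to confirm the retargeting worked, which is the observability that alignment researchers worry may not hold.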

