I'm a former superforecaster, and while working at Metaculus I introduced the practice of recurring quarterly tournaments on short-term questions, which seems to have become a major source for these types of AI benchmarks and forecasting tournaments.
While I'm impressed with LLM forecasting performance in a general way, I don't find the specific headline findings of these benchmarks -- "AI expected to reach human level at forecasting in October" -- to be remotely interesting or important, and I'm mystified why anyone finds them convincing.
In general, I think the nonprofit forecasting space has suffered badly from its takeover by EA and I would not take the topline findings of any of these orgs very seriously on questions that overlap with "EA topics". Even before the takeover, Metaculus turns out to have been founded for futurist advocacy purposes by people with strong pre-existing views who don't know very much about practical forecasting, or really about practical anything. It would be nice if we had one of these that was actually run by people with a neutral "needs of the platform" ethos as opposed to being boosters for some predetermined external cause that they want supportive forecasting for.
On the preference to be served by a robot, I would strongly prefer home assistance (tidying, cleaning) from a robot over a person.
I would think the more menial and personal the task, the more I would prefer to be served by a robot.
Why such selective data on city crime rates? Makes me wonder if this is cherry-picking.
I haven't verified this but there's some skepticism about Pangram's accuracy:
https://x.com/i/status/2010531325401280893