Most of the evidence you have about the world consists of claims from other people, not direct experiment. There seems to be a thought-terminating cliché here on HN, dismissing any claim from employees of large tech companies.
Unlike seemingly most here on HN, I judge people's trustworthiness individually and not solely by the organization they belong to. Noam Brown is a well known researcher in the field and I see no reason to doubt these claims other than a vague distrust of OpenAI or big tech employees generally which I reject.
> I judge people's trustworthiness individually and not solely by the organization they belong to
This is certainly a courageous viewpoint – I imagine this makes it very hard for you to engage in the modern world? Most of us are very much bound by the institutions we operate in!
Over the years my viewpoint has led to great success in predicting the direction and speed of development of many technologies, among other things. As a result, by objective metrics of professional and financial success, I have done very well. I think your imagination is misleading you.
> dismissing any claim from employees of large tech companies
Me: I have a way to turn lead into gold.
You: Show me!!!
Me: NO (and then I spend the rest of my life in poverty).
Cold Fusion (the physics, not the programming language) is the best example of why you "Show your work". This is the Valley we're talking about. It's the thunderdome of technology and companies. If you have a meaningful breakthrough you don't talk about it; you drop it on the public and flex.
I don't think this is a reasonable take. Some people/organizations send signals about things that they're not yet ready to fully drop on the world. Others consider those signals in context (reputation of sender, prior probability of being true, reasons for sender to be honest vs. deceptive, etc.).
When my wife tells me there's a pie in the oven and it's smelling particularly good, I don't demand evidence or disbelieve the existence of the pie. And I start to believe that it'll probably be a particularly good pie.
This is from OpenAI. They've not been so great with public communications in the past, and they have a big incentive in a crowded marketplace to exaggerate claims. On the other hand, it seems like a dumb thing to say unless they're really going to deliver that soon.
> Some people/organizations send signals about things that they're not yet ready to fully drop on the world.
This is called marketing.
> When my wife tells me there's a pie in the oven and it's smelling particularly good, I don't demand evidence
Because you have evidence: it smells.
And if you later ask your wife "where is the pie?" and she says "I sprayed pie scent in the air, I was just signaling," how are you going to feel?
OpenAI spent its "fool us once" card already. Doing things this way does not earn back trust, and neither does failure to deliver (which they have done more than once). See the staff non-disparagement agreements, see the math fiasco, see open weights.
Many signals are marketing, but the purpose of signals is not purely to develop markets. We all have to determine what we think will happen next and how others will act.
> Because you have evidence, it smells.
I think you read that differently than what I intended to write -- she claims it smells good.
> OpenAI spent its "fool us once" card already.
> > This is from OpenAI. They've not been so great with public communications in the past, and they have a big incentive in a crowded marketplace to exaggerate claims.
A thought-terminating cliché? Not at all, certainly not when it comes to claims of technological or scientific breakthroughs. After all, that's partly why we have peer review and an emphasis on reproducibility. Until such a claim has been scrutinised by experts or reproduced by the community at large, it remains an unverified claim.
>> Unlike seemingly most here on HN, I judge people's trustworthiness individually and not solely by the organization they belong to.
That has nothing to do with anything I said. A claim can be false without being fraudulent; in fact, most false claims are probably not fraudulent, though they are still false.
Claims are also very often contested. See e.g. the various claims of quantum supremacy and the debate they have generated.
Science is a debate. If we believe everything anyone says automatically, then there is no debate.
They don't give a lot of details, but they give enough that it would be pretty hard for the claim to be false yet not fraudulent.
Some researchers got a breakthrough and decided to share it right away rather than wait the months it would take for a viable product. It happens; researchers are human after all, and I'm generally glad to take a peek at the actual frontier rather than at what's many months behind.
You can ignore such claims until then, and it's fair to do so, but I think anything more than that is fairly uncharitable given the situation.
It's only a "debacle" if you already assume OpenAI isn't trustworthy, because they said they don't train on the test set. I hope you can see that presenting your belief that they lied about training on the test set as evidence of them being untrustworthy is a circular argument. You're assuming the thing you're trying to prove.
The one OpenAI "scandal" that I did agree with was the thing where they threatened to cancel people's vested equity if they didn't sign a non-disparagement agreement. They did apologize for that one and make changes. But it doesn't have a lot to do with their research claims.
I'm open to actual evidence that OpenAI's research claims are untrustworthy, but again, I also judge people individually, not just by the organization they belong to.
They funded the entire benchmark and didn't disclose their involvement. They then proceeded to make use of the benchmark while pretending they weren't affiliated with EpochAI. That's a huge omission and more than enough reason to distrust their claims.
IMO their involvement is only an issue if it gave them an advantage on the benchmark. If they didn't train on the test set, then the advantage gained is minimal, and I don't see a big problem with it, nor an obligation to disclose. Especially since there is a hold-out set that OpenAI doesn't have access to, which can detect any malfeasance.
It's typically difficult to find direct evidence for bias. That is why rules about conflicts of interest and disclosure are strict in research and academia. Crucially, something is a conflict of interest if it could be perceived as one by someone external: it doesn't matter whether you think you could judge fairly; what matters is whether someone else might doubt that you could.
Not disclosing a conflict of interest is generally considered a significant ethics violation, because it reduces trust in the general scientific/research system. Thus OpenAI has become untrustworthy in many people's view, regardless of whether their involvement in the benchmark's creation affected their results.
There's no way to figure out whether they gained an advantage. We have to trust their claims, which, again, is an issue for me after finding out they already lied.