
Where did you get this from? AFAIK GPT-3 (for example) was trained on a GPU cluster, not TPUs.


Experience, for one. TPUs are dominating MLPerf benchmarks. That kind of performance can't be dismissed so easily.

GPT-2 was trained on TPUs. (There are explicit references to TPUs in the source code: https://github.com/openai/gpt-2/blob/0574c5708b094bfa0b0f6df...)
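For anyone who hasn't worked with TPUs, the setup is mostly boilerplate. Here's a rough sketch of how a TPU gets wired up in modern TensorFlow; this is not the GPT-2 code itself (which targets the TF 1.x APIs), and the TPU address and toy model are placeholders:

    import tensorflow as tf

    # Placeholder TPU endpoint; on a real Cloud TPU you'd pass your TPU's
    # name or gRPC address here.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
        tpu="grpc://10.0.0.2:8470")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)  # TF 2.3+

    # Anything built under the strategy scope is replicated across the
    # cores of the TPU board.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

Past that point the training code looks largely the same as it does on GPUs; XLA handles the TPU-specific compilation.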

GPT-3 was trained on a GPU cluster probably because of Microsoft's billion-dollar Azure cloud credit investment, not because it was the best choice.


I checked the MLPerf website, and it looks like the A100 outperforms TPUv3 and is also more capable (there does not seem to be a working implementation of RL for Go on TPUs).

To be fair, TPUv4 is not out yet, and it might catch up using the latest processes (7nm TSMC or 8nm Samsung).

https://mlperf.org/training-results-0-7


No, they are not. Read the recent MLPerf results more carefully, not Google's blog post. NVIDIA won 8/8 benchmarks for the publicly available SW/HW combination, and 8/8 on per-chip performance. Google only showed better results with a "research" system that isn't available to anyone other than them yet.


This is a weirdly aggressive reply. I don't "read Google's blog post"; I use TPUs daily. As for MLPerf benchmarks, you can see for yourself here: https://mlperf.org/training-results-0-6 TPUs are far ahead of the competition. All of these training results are openly available, and you can run them yourself. (I did.)

For MLPerf 0.7, it's true that Google's software isn't available to the public yet. That's because they're in the middle of transitioning to Jax (and by extension, Pytorch). Once that transition is complete and available to the public, you'll probably be learning TPU programming one way or another, since there's no other practical way to, e.g., train a GAN on millions of photos.
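If you want a feel for what that programming model looks like, here's a minimal JAX sketch of a data-parallel training step on a TPU board (the toy linear model, shapes, and learning rate are made up for illustration; this is not Google's MLPerf code):

    import jax
    import jax.numpy as jnp

    def loss_fn(params, x, y):
        preds = x @ params
        return jnp.mean((preds - y) ** 2)

    def train_step(params, x, y):
        grads = jax.grad(loss_fn)(params, x, y)
        # All-reduce: average gradients across TPU cores.
        grads = jax.lax.pmean(grads, axis_name="devices")
        return params - 0.01 * grads

    # pmap replicates the step across all local cores (8 on a TPU v3 board).
    p_train_step = jax.pmap(train_step, axis_name="devices")

    n = jax.local_device_count()
    params = jnp.zeros((n, 16))   # one replica of the params per core
    x = jnp.ones((n, 32, 16))     # leading axis = one data shard per core
    y = jnp.ones((n, 32))
    params = p_train_step(params, x, y)

On a single board this runs as-is; scaling up is mostly a matter of feeding each host its own shard of the data.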

You'd think people would be happy that there are realistic alternatives to Nvidia's near-monopoly on AI training, rather than rushing to defend it...


transitioning to Jax (and by extension, Pytorch)

Wait, what? Why would transition to Jax imply transition to Pytorch?


You are basing your opinion on last year's MLPerf results and on software that may or may not become available in the future. Google skipped the MLPerf 0.7 "available" category entirely.

Pointing this out is not aggressive.



