I'm not sure whether you're claiming any or all of these:
#1. You can get an arbitrarily high tokens/second number if you get to pick the model size.
#2. Llama 1B is roughly GPT-4 quality.
#3. Given that Llama 1B runs at 100 tokens/sec, and given that quality at a fixed model size has kept improving over the past two years, we can assume there will eventually be a GPT-4-quality model at 1B.
On my end:
#1. Agreed; the first sketch below shows the back-of-envelope arithmetic.
#2. Vehemently disagree.
#3. TL;DR: I don't expect that; at least, the trend line isn't steep enough for me to expect it within the next decade (second sketch below).
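
On #1, the back-of-envelope: single-stream decoding is roughly memory-bandwidth-bound, so the tokens/second ceiling is about (memory bandwidth) / (bytes of weights read per token). A minimal sketch in Python, with illustrative assumptions (~1 TB/s of bandwidth, fp16 weights), not measurements:

    # Decode throughput upper bound: every generated token streams all of
    # the model's weights through memory once, so roughly
    #   tokens/sec <= bandwidth / (params * bytes_per_param).
    BANDWIDTH_BYTES_PER_S = 1e12  # assumed ~1 TB/s memory bandwidth
    BYTES_PER_PARAM = 2           # assumed fp16/bf16 weights

    for params in (1e9, 8e9, 70e9, 405e9):
        weight_bytes = params * BYTES_PER_PARAM
        toks_per_s = BANDWIDTH_BYTES_PER_S / weight_bytes
        print(f"{params / 1e9:>5.0f}B params -> ~{toks_per_s:8.1f} tok/s ceiling")

With those assumed numbers a 1B model tops out around 500 tok/s and a 405B model around 1 tok/s, so any tokens/second figure is achievable by shrinking the model. That's why a throughput number is meaningless without also fixing quality.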
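
On #3, here's the kind of extrapolation I mean by the trend line not being steep enough. Every number below is a made-up assumption for the sake of the arithmetic: a hypothetical parameter count at which GPT-4 quality is available today, and a hypothetical halving time for the size needed at fixed quality.

    import math

    # Naive trend extrapolation, purely illustrative.
    SIZE_NOW_B = 1800  # hypothetical: GPT-4 quality needs ~1.8T params today
    TARGET_B = 1       # target: the same quality at 1B params

    halvings = math.log2(SIZE_NOW_B / TARGET_B)  # ~10.8 halvings needed

    for halving_months in (6, 12, 18):
        years = halvings * halving_months / 12
        print(f"size halves every {halving_months:>2} months -> ~{years:4.1f} years to 1B")

Unless the assumed halving time is well under a year, even this naive extrapolation lands outside the next decade.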