
> 0: p=0.40 1: p=0.60 which suggests that 1 is the next bit and leads to a suboptimal starting point for predicting the bit after that. The error is even more prominent with longer sequences as the joint probability distribution becomes more unfactorizable into marginal distributions (as I would expect any minimal algorithmic description of real-world data to be).

Can someone explain this part a bit more? I'm not seeing the issue. From what I see, if the first token (t1) comes out as 0, then the next token (t2) would have probabilities 0: p=0.90 and 1: p=0.10 (and 0: p=0.50, 1: p=0.50 if t1 is 1).

Mathematically, those line up with the initial distribution, so what's the concern? That's how conditional probability works.
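
To check the arithmetic, here is a minimal sketch that multiplies the numbers above back into a joint distribution over two-bit sequences. It assumes the t1 marginal from the quote and the t2 conditionals from my reading of it; the names `p_t1`, `p_t2_given_t1`, `joint_distribution`, and `greedy_sequence` are made up for illustration. It also compares the most likely full sequence with what you get by always taking the most probable next bit, which may be what the quoted concern is about.

```python
from itertools import product

# Values taken from the comment above (assumed, not from any original model).
p_t1 = {0: 0.40, 1: 0.60}
p_t2_given_t1 = {
    0: {0: 0.90, 1: 0.10},
    1: {0: 0.50, 1: 0.50},
}


def joint_distribution():
    """Chain rule: P(t1, t2) = P(t1) * P(t2 | t1)."""
    return {
        (a, b): p_t1[a] * p_t2_given_t1[a][b]
        for a, b in product((0, 1), repeat=2)
    }


def greedy_sequence():
    """Pick the most probable bit at each step (ties go to the first key)."""
    a = max(p_t1, key=p_t1.get)
    b = max(p_t2_given_t1[a], key=p_t2_given_t1[a].get)
    return (a, b)


joint = joint_distribution()
for seq, p in sorted(joint.items()):
    print(seq, round(p, 2))

best = max(joint, key=joint.get)
greedy = greedy_sequence()
print("most likely sequence:", best, round(joint[best], 2))
print("greedy sequence:", greedy, round(joint[greedy], 2))
```

Under these assumed numbers the joint is 00: 0.36, 01: 0.04, 10: 0.30, 11: 0.30. Sampling bit by bit reproduces that joint exactly, which is the conditional probability point. The difference only shows up if you commit to the most probable bit at each step: that starts with 1 and ends on a sequence with p=0.30, while the single most likely sequence, 00, has p=0.36.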


