I also find this stuff fascinating, but I'm not sure Kolmogorov complexity (and therefore Solomonoff Induction) are useful models of any form of intelligence that matters in the real world. The main issue I see is that Komogorov Complexity is dominated by noise. The majority of bits in the smallest program that generates a given sequence will be devoted to reproducing arbitrary/accidental features of the sequence, rather than its computational structure. For example, if a Solomonoff Induction machine sees a data stream produced by someone flipping a coin, its model of that process will be... the data stream itself. To me, that's not intelligent behaviour. I would like an intelligent agent to "average out" the noise and say "this data stream is produced by sampling from a Bernoulli process with p = 0.5".
This is why I like Crutchfield's Epsilon Machine formalism[1]. He factorizes Kolmogorov Complexity into a noise part (Shannon Entropy) and a "computational part", which measures the structural complexity of the smallest probabilistic finite state machine that is an optimal predictor of the data sequence, under some coarse-graining. If the size of the optimal machine gets bigger and bigger the more fine-grained your measurement, then there's no finite representation of the sequence as a FSM, so you "jump up one level" to a stack machine and try again.
In the coin-flipping example above, the epsilon machine will correctly factor out all of the noise in the sequence, leaving you with a simple probabilistic model of a coin-flip process.
This procedure is also computable, and even practical: you can implement the FSM-only algorithm in a couple hundred lines of code.
This is why I like Crutchfield's Epsilon Machine formalism[1]. He factorizes Kolmogorov Complexity into a noise part (Shannon Entropy) and a "computational part", which measures the structural complexity of the smallest probabilistic finite state machine that is an optimal predictor of the data sequence, under some coarse-graining. If the size of the optimal machine gets bigger and bigger the more fine-grained your measurement, then there's no finite representation of the sequence as a FSM, so you "jump up one level" to a stack machine and try again.
In the coin-flipping example above, the epsilon machine will correctly factor out all of the noise in the sequence, leaving you with a simple probabilistic model of a coin-flip process.
This procedure is also computable, and even practical: you can implement the FSM-only algorithm in a couple hundred lines of code.
[1]: https://csc.ucdavis.edu/~cmg/compmech/pubs/CalcEmergTitlePag...