Oscar Stiffelman

5.2CEApr 15

Oscar Stiffelman

In 1956 John Kelly wrote a paper at Bell Labs describing the relationship between gambling and Information Theory. What came to be known as the Kelly Criterion is both an objective and a closed-form solution to sizing wagers when odds and edge are known. Samuelson argued it was arbitrary and subjective, and successfully kept it out of mainstream economics. Luckily it lived on in computer science, mostly because of Tom Cover's work at Stanford. He showed that it is the uniquely optimal way to invest: it maximizes long-term wealth, minimizes the risk of ruin, and is competitively optimal in a game-theoretic sense, even over the short term. One of Cover's most surprising contributions to portfolio theory was the universal portfolio. Related to universal compression in information theory, it performs asymptotically as well as the best constant-rebalanced portfolio in hindsight. I borrow a trick from that algorithm to show that Kelly's objective, even in the general form, factors the investing problem into three terms: a money term, an entropy term, and a divergence term. The only way to maximize growth is to minimize divergence which measures the difference between our distribution and the true distribution in bits. Investing is, fundamentally, a compression problem. This decomposition also yields new practical results. Because the money and entropy terms are constant across strategies in a given backtest, the difference in log growth between two strategies measures their relative divergence in bits. I also introduce a winner fraction heuristic which allocates capital in proportion to each asset's probability of dominating the candidate set. The growth shortfall of this heuristic relative to the optimal portfolio is bounded by the entropy of the winner fraction distribution. To my knowledge, both the heuristic and the entropy bound are original contributions.

LGApr 3, 2014

The Least Wrong Model Is Not in the Data

Oscar Stiffelman

The true process that generated data cannot be determined when multiple explanations are possible. Prediction requires a model of the probability that a process, chosen randomly from the set of candidate explanations, generates some future observation. The best model includes all of the information contained in the minimal description of the data that is not contained in the data. It is closely related to the Halting Problem and is logarithmic in the size of the data. Prediction is difficult because the ideal model is not computable, and the best computable model is not "findable." However, the error from any approximation can be bounded by the size of the description using the model.

Oscar Stiffelman

2 Papers