Nassim will be a featured presenter at the New England Complex Systems Institute’s five-day certificate program in Complexity and Data Analytics, Risk & Opportunity, and Implications for Strategy and Policy from May 1-5, 2017, in Washington, D.C. Please visit the site for more details.
Nassim discusses “how ‘evidence’ about the risk of terrorism as shown in the NYT and by BS vending journos makes no sense statistically” and “explains the difference between classes using principles of Extreme Value Theory (EVT) without any math.”
Ed Thorp’s memoirs read like a thriller, mixing wearable computers that would have made James Bond proud, shady characters, great scientists, and poisoning attempts (in addition to the sabotage of Ed’s car so he would have an “accident” in the desert). The book reveals a thorough, rigorous, methodical person in search of life, knowledge, financial security, and, not least, fun. Thorp is also known to be a generous man, intellectually speaking, eager to share his discoveries with random strangers (in print but also in person): something you would hope to find in a scientist but usually don’t. But he is humble (he would qualify as the only humble trader on planet Earth), so unless readers can read between the lines, they won’t notice that his contribution is vastly more momentous than he reveals. Why?
Because of its simplicity. Its sheer simplicity.
For it is the straightforward character of his contributions and insights that made them both invisible in academia and useful for practitioners. My purpose here is not to explain or summarize the book. Thorp, not surprisingly, writes in a direct, clear, and engaging way; I am here, as a trader and a practitioner of mathematical finance, to show its importance and put it in context for my community of real-world scientist-traders and risk takers in general.
The context is as follows. Ed Thorp is the first modern mathematician who successfully used quantitative methods for risk taking, and most certainly the first mathematician who met financial success doing it. Since then there has been a cohort, such as the Stony Brook whiz kids, but Thorp is their dean. His main and most colorful predecessor, Girolamo (sometimes Geronimo) Cardano, a sixteenth-century polymath and mathematician who, sort of, wrote the first version of Beat the Dealer, was a compulsive gambler. To put it mildly, he was unsuccessful at it, not least because addicts are bad risk takers; to be convinced, just take a look at the magnificence of Monte Carlo, Las Vegas, and Biarritz, places financed by their compulsion. Cardano’s book, Liber de ludo aleae (“Book on Games of Chance”), was instrumental in the later development of probability but, unlike Thorp’s work, was less of an inspiration for gamblers and more of one for mathematicians. Another mathematician, Abraham de Moivre, a French Protestant refugee in London, a frequenter of gambling joints, and author of The doctrine of chances: or, a method for calculating the probabilities of events in play (1718), could hardly make ends meet. One can count another half a dozen mathematician-gamblers, in a line that includes the great Fermat and Huygens, who were either indifferent to the bottom line or (for those who weren’t) not particularly good at it. Before Ed Thorp, mathematicians of gambling had their love of chance largely unrequited.
Thorp’s method is as follows. He cuts to the chase in identifying a clear edge (that is, something that in the long run puts the odds in his favor). The edge has to be obvious and uncomplicated. For instance, calculating the roulette momentum with the first wearable computer (with no less a co-conspirator than the great Claude Shannon, father of information theory), he estimated a typical edge of roughly 40% per bet. But that part is easy, very easy. It is capturing the edge, converting it into dollars in the bank, restaurant meals, interesting cruises, and Christmas gifts to friends and family: that’s the hard part. It is the dosage of your betting (not too little, not too much) that in the end matters. For that, Ed did great work on his own, before the theoretical refinement that came from a third member of the Information Trio: John Kelly, of the Kelly criterion, which we discuss today because Ed Thorp made it operational.
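To make “dosage” concrete, here is a minimal sketch in the textbook setting Kelly analyzed: a repeated bet paying b-to-1 with win probability p, where you stake a fixed fraction f of current capital. The expected log growth per bet is g(f) = p ln(1 + b f) + (1 - p) ln(1 - f), which peaks at the Kelly fraction f* = p - (1 - p)/b. The function names and the 55% even-money edge are hypothetical; the 40% roulette figure above is not used.

```python
import math

def kelly_fraction(p: float, b: float) -> float:
    """Kelly stake for a bet paying b-to-1 with win probability p (0 if there is no edge)."""
    return max(0.0, p - (1.0 - p) / b)

def log_growth(f: float, p: float, b: float) -> float:
    """Expected log growth of capital per bet when staking fraction f of capital each time."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)

# Hypothetical edge: an even-money bet (b = 1) won 55% of the time.
p, b = 0.55, 1.0
f_star = kelly_fraction(p, b)  # about 0.10 of current capital

for label, f in [("half Kelly", 0.5 * f_star), ("Kelly", f_star),
                 ("double Kelly", 2 * f_star), ("quadruple Kelly", 4 * f_star)]:
    print(f"{label:>15}: f = {f:.2f}, growth per bet = {log_growth(f, p, b):+.5f}")
```

With these numbers half Kelly still grows, double Kelly has roughly zero (here slightly negative) growth, and quadruple Kelly shrinks capital, even though every single bet has a positive edge: too much dosage destroys the edge.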
A bit more about the simplicity before we discuss the dosing. For an academic judged by his colleagues, rather than by the bank manager of his local branch (or his tax accountant), a mountain giving birth to a mouse, after huge labor, is not a very good thing. Academics prefer the mouse to give birth to a mountain; it is the perception of sophistication that matters. The more complicated, the better; the simple doesn’t get you citations, h-indices, or some such metric du jour that brings the respect of university administrators, who can understand that stuff but not the substance of the real work. The only academics who escape the burden of complication-for-complication’s sake are the great mathematicians and physicists (and from what I hear this is becoming harder and harder in today’s funding and ranking environment).
Ed was initially an academic, but he favored learning by doing, with his skin in the game. When you reincarnate as a practitioner, you want the mountain to give birth to the simplest possible strategy, one with the fewest side effects and the minimum possible hidden complications. The genius of Ed is demonstrated in the way he came up with very simple rules in blackjack. Instead of engaging in complicated combinatorics and memory-challenging card counting (something that requires one to be a savant), he crystallized all his sophisticated research into simple rules. Go to a blackjack table. Keep a tally. Start with zero. Add one for some strong cards, subtract one for weak ones, and nothing for others. It is easy to just increment up and down mentally, betting larger when the number is high and smaller when it is low, and such a strategy is immediately applicable by anyone with the ability to tie his shoes or find a casino on a map. Even while using wearable computers at the roulette table, the detection of edge was simple, so simple that one can get it while standing on a balance ball in the gym; the fanciness resides in the implementation and the wiring.
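The tally fits in a few lines. The text does not name a specific system, so this sketch assumes the widely used Hi-Lo assignment, which scores cards as they are dealt: +1 for 2 through 6, 0 for 7 through 9, -1 for tens, face cards, and aces. A high running count therefore means the cards still to come are rich in tens and aces, which favors the player. The sizing rule `bet_units` is hypothetical, not Thorp’s schedule, and the refinement of dividing by the decks remaining (the “true count”) is omitted.

```python
# Hi-Lo tags (an assumption; the text does not specify which cards count as which).
HI_LO = {"2": 1, "3": 1, "4": 1, "5": 1, "6": 1,
         "7": 0, "8": 0, "9": 0,
         "10": -1, "J": -1, "Q": -1, "K": -1, "A": -1}

def running_count(cards_seen: list[str]) -> int:
    """Start at zero and add the tag of every card dealt so far."""
    return sum(HI_LO[card] for card in cards_seen)

def bet_units(count: int, min_units: int = 1, max_units: int = 8) -> int:
    """Hypothetical sizing rule: minimum bet when the count is low, larger bets as it rises."""
    return max(min_units, min(max_units, count))

seen = ["2", "5", "K", "3", "6", "9", "4", "A", "5"]
count = running_count(seen)
print(count, bet_units(count))  # prints: 4 4
```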
As a side plot, Ed discovered what is known today as the Black-Scholes option formula before Black and Scholes (and it is a sign of economics public relations that the formula doesn’t bear his name; I’ve called it Bachelier-Thorp). His derivation was too simple: nobody at the time realized it could be potent.
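For reference, a minimal sketch of the formula in question: the standard Black-Scholes price of a European call on a non-dividend-paying asset, using only the Python standard library. The function names and parameter values are illustrative.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """European call: spot S, strike K, T years to expiry, rate r, volatility sigma."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

print(round(call_price(S=100, K=100, T=1.0, r=0.05, sigma=0.2), 2))  # prints 10.45
```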
Now the money management, something central for those who learn from being exposed to their own profits and losses. Having an “edge” and surviving are two different things: the first requires the second. As Warren Buffett said: “in order to succeed you must first survive.” You need to avoid ruin. At all costs.
And there is a dialectic between you and your P/L: you start betting small (a proportion of initial capital), and your risk control (the dosage) also controls your discovery of the edge. It is like trial and error, by which you revise both your risk appetite and your assessment of your odds one step at a time.
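One hypothetical way to mechanize this back-and-forth, offered as an illustration and not as Thorp’s procedure: treat the win probability of an even-money bet as unknown, update a Beta posterior as outcomes come in, and stake the Kelly fraction computed on a deliberately pessimistic estimate (posterior mean minus `z` posterior standard deviations). The uniform prior, the `z = 1` cushion, and the function name are all assumptions.

```python
import math

def cautious_stake(wins: int, losses: int, b: float = 1.0, z: float = 1.0) -> float:
    """Kelly fraction computed on a pessimistic estimate of the win probability.

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(wins + 1, losses + 1);
    the estimate used is the posterior mean minus z posterior standard deviations.
    """
    a, c = wins + 1, losses + 1
    mean = a / (a + c)
    sd = math.sqrt(a * c / ((a + c) ** 2 * (a + c + 1)))
    p = max(0.0, mean - z * sd)
    return max(0.0, p - (1.0 - p) / b)

# The same 55% win rate, observed over more and more bets.
for n in (20, 200, 2000):
    wins = round(0.55 * n)
    print(n, round(cautious_stake(wins, n - wins), 3))
```

With little evidence the stake is zero; as the record lengthens, the estimate firms up and the dose grows toward the full Kelly fraction of 0.10.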
Finance academics, as Ole Peters and Murray Gell-Mann have recently shown, did not get the point that avoiding ruin, as a general principle, makes your gambling and investment strategy extremely different from the one proposed by their literature. As we saw, they were paid by administrators via colleagues to make life complicated, not simpler. They invented something useless called utility theory (tens of thousands of papers are still waiting for a real reader). And they invented the idea that one could get to know the collective behavior of future prices in infinite detail: things such as correlation, identified today, would never change in the future. More technically, to implement the portfolio construction suggested by modern financial theory, one needs to know the entire joint probability distribution of all assets for the entire future, plus the exact utility function for wealth at all future times. And without errors! (I have shown that estimation errors make the system explode.) We are lucky if we can know what we will eat for lunch tomorrow; how can we figure out the dynamics until the end of time?
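A small illustration of how estimation error propagates, assuming the textbook unconstrained mean-variance rule in which portfolio weights are proportional to the inverse covariance matrix times the vector of expected returns, rescaled to sum to one. The two-asset numbers are hypothetical and this is not the analysis the text refers to: with two highly correlated assets, a one-percentage-point change in one expected return turns a 50/50 portfolio into a leveraged long-short bet.

```python
def mv_weights(mu1: float, mu2: float, sigma1: float, sigma2: float,
               rho: float) -> tuple[float, float]:
    """Weights proportional to inverse(covariance) @ mu, normalized to sum to 1."""
    c = rho * sigma1 * sigma2                    # covariance between the two assets
    det = sigma1**2 * sigma2**2 - c**2           # determinant of the 2x2 covariance matrix
    # Closed-form 2x2 inverse applied to the expected-return vector.
    x1 = (sigma2**2 * mu1 - c * mu2) / det
    x2 = (sigma1**2 * mu2 - c * mu1) / det
    total = x1 + x2
    return x1 / total, x2 / total

print(mv_weights(0.05, 0.05, 0.2, 0.2, 0.95))  # identical estimates
print(mv_weights(0.06, 0.05, 0.2, 0.2, 0.95))  # first estimate one point higher
```

The second call gives roughly 227% long the first asset and 127% short the second. Lower the correlation to 0.5 and the same one-point change moves the weights only to about 64/36: the more redundant the assets, the more the inverse covariance matrix amplifies small input errors.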
The Kelly-Thorp method requires no joint distribution or utility function. In practice one needs the ratio of expected profit to worst-case return, dynamically adjusted (that is, one gamble at a time) to avoid ruin. That’s all.
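A minimal sketch of the “one gamble at a time” adjustment, with hypothetical numbers: an even-money bet won 55% of the time, where the worst case is losing the stake. On one reading of the text, the ratio of expected profit (0.55 - 0.45 = 0.10 per dollar staked) to the worst-case loss (one dollar) gives a stake of 10% of capital, which in this case coincides with the Kelly fraction. The sketch compares recomputing that 10% from current capital before every bet with betting a flat $10 forever (10% of the initial $100).

```python
import random

def final_capital(proportional: bool, rng: random.Random, p: float = 0.55,
                  f: float = 0.10, start: float = 100.0, rounds: int = 500) -> float:
    """Play repeated even-money bets; return final capital (0.0 means ruin)."""
    capital = start
    for _ in range(rounds):
        # Proportional: stake a fraction of *current* capital, recomputed every bet.
        # Flat: stake the same dollar amount (f times *initial* capital) every bet.
        stake = f * capital if proportional else min(f * start, capital)
        capital += stake if rng.random() < p else -stake
        if capital <= 0.0:
            return 0.0
    return capital

rng = random.Random(42)
trials = 500
flat_ruins = sum(final_capital(False, rng) == 0.0 for _ in range(trials))
prop_ruins = sum(final_capital(True, rng) == 0.0 for _ in range(trials))
print(f"flat-stake ruins: {flat_ruins}/{trials}, proportional ruins: {prop_ruins}/{trials}")
```

Because a proportional bettor keeps 90% of capital after any loss, capital can shrink but never reach zero; the flat bettor, with exactly the same edge, is wiped out in a noticeable share of the trials.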
Thorp and Kelly’s ideas were rejected by economists, in spite of their practical appeal, because of the economists’ love of general theories for all asset prices, dynamics of the world, etc. The famous patriarch of modern economics, Paul Samuelson, was supposedly on a vendetta against Thorp. Not a single one of the works of these economists will eventually survive: your strategy to survive isn’t the same as the ability to impress colleagues.
So the world today is divided into two groups, following two methods. The first method is that of the economists, who tend to blow up routinely or get rich collecting fees for managing money, not from direct speculation. Consider that Long-Term Capital Management, which had the crème de la crème of financial economists, blew up spectacularly in 1998, losing a multiple of what they thought their worst-case scenario was.
The second method, that of the information theorists as pioneered by Ed, is practiced by traders and scientist-traders. Every surviving speculator uses the second method, explicitly or implicitly (evidence: Ray Dalio, Paul Tudor Jones, Renaissance Technologies, even Goldman Sachs!). I said every because, as Peters and Gell-Mann have shown, those who don’t will eventually go bust.
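The distinction Peters and Gell-Mann draw is between the ensemble average (across many players at one moment) and the time average (one player across many periods). A commonly used illustration, with numbers chosen for the example: a coin toss that multiplies your wealth by 1.5 on heads and by 0.6 on tails. The expected multiplier per toss is 1.05, a 5% gain, yet the rate at which a single player compounds is the square root of 1.5 × 0.6, about 0.949, so almost every individual path decays toward zero.

```python
import math
import random

UP, DOWN = 1.5, 0.6  # illustrative gamble: wealth x1.5 on heads, x0.6 on tails

# Ensemble view: the expected multiplier of one toss, averaged over many players.
ensemble_factor = 0.5 * UP + 0.5 * DOWN
# Time view: exp of the expected log-return, the rate one player compounds at.
time_avg_factor = math.sqrt(UP * DOWN)

# Follow a single player through many tosses.
rng = random.Random(7)
wealth = 1.0
for _ in range(5000):
    wealth *= UP if rng.random() < 0.5 else DOWN

print(f"ensemble factor per toss: {ensemble_factor:.3f}")
print(f"time-average factor per toss: {time_avg_factor:.3f}")
print(f"one player's wealth after 5000 tosses: {wealth:.3g}")
```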
So say you inherit $82,000 from Uncle Morrie: now you know that, provided you have an edge, there exists a strategy that will allow you to double the inheritance without ever going through bankruptcy.
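How long the doubling takes depends entirely on the size of the edge, which the text leaves open. A rough sketch, assuming hypothetical even-money bets staked at full Kelly: if the expected log growth per bet is g, expected log-wealth rises by ln 2 (wealth doubles) after about ln 2 / g bets. Bankruptcy never occurs along the way because every stake is a fraction below one of current capital.

```python
import math

def bets_to_double(p: float, b: float = 1.0) -> float:
    """Bets needed for expected log-wealth to rise by ln 2 when staking full Kelly."""
    f = p - (1.0 - p) / b
    if f <= 0:
        raise ValueError("no edge: the Kelly stake is zero, so wealth never grows")
    g = p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)
    return math.log(2.0) / g

inheritance = 82_000
for p in (0.51, 0.55, 0.60):
    n = bets_to_double(p)
    print(f"win rate {p:.0%}: about {n:,.0f} even-money bets to reach ${2 * inheritance:,}")
```

A thin edge (51%) needs thousands of bets; a fat one (60%) needs a few dozen. Either way, the strategy trades speed for never being wiped out.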
Some additional wisdom I personally learned from Thorp. Many successful speculators, after their first break in life, get involved in large-scale structures, with multiple offices, morning meetings, coffee, and corporate intrigues, building more wealth while losing control of their lives. Not Ed. After the separation from his partners and the closing of his firm (for reasons that have nothing to do with him), he did not start a new mega-fund. He limited his involvement in managing other people’s money. Most others reintegrate into the comfort of firms and leverage their reputation by raising monstrous amounts of outside money in order to collect large fees. But such restraint requires some intuition, some self-knowledge. It is vastly less stressful to be independent, and one is never independent when involved in a large structure with powerful clients. It is hard enough to deal with the intricacies of probabilities; you need to avoid the vagaries of exposure to human moods. True success is exiting some rat race to modulate one’s activities for one’s peace of mind. Thorp certainly learned a lesson: the most stressful job he ever had was running the math department of the University of California, Irvine. You can detect that the man is in control of his life. This explains why he looked younger the second time I saw him, in 2016, than he did the first time, in 2005.
Let us take the idea of the last chapter [the intransigent minority’s disproportionate influence] one step further, get a bit more technical, and generalize. It will debunk some of the fallacies we hear in psychology, “evolutionary theory”, game theory, behavioral economics, neuroscience, and similar fields not subjected to proper logical (and mathematical) rigor, in spite of the occasional semi-complicated equations. For instance, we will see why behavioral economics will necessarily fail us even if its results were true at the individual level, and why the use of brain science to explain behavior has been no more than great marketing for scientific papers.
Consider the following as a rule. Whenever you have nonlinearity, the average doesn’t matter anymore. Hence:
The more nonlinearity in the response, the less informative the average.
For instance, your benefit from drinking water would be linear if ten glasses of water were ten times as good as one single glass. If that is not the case, then necessarily the average water consumption matters less than something else that we will call “unevenness”, or volatility, or inequality in consumption. Say you need one liter a day, and I give you ten liters on one day and none for the remaining nine days, for an average of one liter a day. Odds are you won’t survive. You want your quantity of water to be as evenly distributed as possible. Within the day, you do not need to consume the same amount of water every minute, but at the scale of the day, you want maximal evenness.
The effect of the nonlinearity in the response on the average, and on the informational value of such an average, is something I’ve explained in some depth in Antifragile, as it was the theme of the book, so I will just assume a summary here is sufficient. From an informational standpoint, someone who tells you “We will supply you with one liter of water per day on average” is not conveying much information at all; there needs to be a second dimension, the variations around such an average. You are quite certain that you will die of thirst if his average comes from a cluster of a hundred liters every hundred days.
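The water example in numbers. This sketch assumes a deliberately crude, hypothetical response: each day, water up to your one-liter need is useful and anything beyond it is wasted. The two ten-day schedules from the text have the same average, but the second dimension (dispersion) separates them completely.

```python
import statistics

def useful_water(daily_liters: list[float], need: float = 1.0) -> float:
    """Hypothetical nonlinear response: water beyond the daily need does nothing."""
    return sum(min(x, need) for x in daily_liters)

even = [1.0] * 10               # one liter every day
uneven = [10.0] + [0.0] * 9     # ten liters once, then nothing

for name, supply in [("even", even), ("uneven", uneven)]:
    print(name,
          "mean:", statistics.mean(supply),
          "std dev:", round(statistics.pstdev(supply), 2),
          "useful liters:", useful_water(supply),
          "dry days:", sum(x == 0 for x in supply))
```

Both schedules report a mean of 1.0, yet the even one delivers 10 useful liters with no dry days, while the uneven one delivers 1 useful liter and nine dry days.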
Note that an average and a sum are mathematically the same thing up to a simple division by a constant, so the fallacy of the average translates into the fallacy of summing, or aggregating, or inferring the behavior of a collective that has many components from the properties of a single unit.
As we saw, complex systems are characterized by the interactions between their components, and by the resulting properties of the ensemble not (easily) seen from the parts.
There is a rich apparatus for studying interactions, originating from what is called the Ising problem, after the physicist Ernst Ising; it arose in the ferromagnetic domain but has been adapted to many other areas. The model consists of discrete variables that represent atoms, each of which can be in one of two states called “spins”, nicknamed “up” or “down” (and represented by +1 or −1). The atoms are arranged in a lattice, allowing each unit to interact with its neighbors. In low dimensions, that is, when each atom interacts along a line (one dimension) with two neighbors, one to its left and one to its right, or on a grid (two dimensions), the Ising model is simple and lends itself to simple solutions.
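A minimal sketch of the one-dimensional model just described: spins of +1 or -1 on a ring (the periodic boundary is our convenience assumption), each interacting only with its left and right neighbors through the energy E = -J Σ s_i s_(i+1), sampled with the standard Metropolis algorithm. Parameter names and values are illustrative.

```python
import math
import random

def metropolis_1d(n: int = 200, beta: float = 1.0, J: float = 1.0,
                  sweeps: int = 2000, seed: int = 0) -> list[int]:
    """Sample a 1D Ising ring at inverse temperature beta; return the final spins."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(sweeps * n):
        i = rng.randrange(n)
        # Flipping spin i changes the energy only through its two neighbors
        # (spins[-1] wraps around to the last spin, closing the ring).
        neighbors = spins[i - 1] + spins[(i + 1) % n]
        delta_e = 2.0 * J * spins[i] * neighbors
        # Metropolis rule: always accept downhill moves, uphill ones with prob exp(-beta*dE).
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            spins[i] = -spins[i]
    return spins

spins = metropolis_1d()
print("magnetization per spin:", sum(spins) / len(spins))
```

In one dimension the sampled magnetization fluctuates around zero at any finite temperature, consistent with the exact solution of the chain, which has no spontaneous magnetization.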
One method in such situations, called “mean field”, is to generalize from the “mean”, that is, the average interaction, and apply it to the ensemble. This is possible if and only if there is no dependence between one interaction and another; the procedure appears to be the opposite of renormalization from the last chapter. And, of course, this type of averaging is not possible if there are nonlinearities in the effect of the interactions.
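The mean-field approximation replaces each spin’s neighbors by their average magnetization m, which yields the self-consistency condition m = tanh(β J z m), where z is the number of neighbors. Solving it by fixed-point iteration for the one-dimensional chain (z = 2) shows the kind of distortion at stake: mean field predicts a spontaneously magnetized state once β J z exceeds 1, whereas the exact solution of the one-dimensional model has zero spontaneous magnetization at every nonzero temperature. The function name and parameter values are illustrative.

```python
import math

def mean_field_magnetization(beta: float, J: float = 1.0, z: int = 2,
                             iterations: int = 1000) -> float:
    """Solve m = tanh(beta * J * z * m) by fixed-point iteration starting from m = 1."""
    m = 1.0
    for _ in range(iterations):
        m = math.tanh(beta * J * z * m)
    return m

# Exact 1D answer at all these temperatures: zero spontaneous magnetization.
for beta in (0.25, 1.0, 2.0):
    print(beta, round(mean_field_magnetization(beta), 3))
```

At β = 0.25 the iteration collapses to zero, but at β = 1 mean field reports a magnetization of about 0.96 for a chain that, exactly solved, has none.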
More generally, the Übererror is to apply the “mean field” technique by looking at the average and applying a function to it, instead of averaging the functions: a violation of Jensen’s inequality [Jensen’s inequality, definition: a function of an average is not an average of a function, and the difference increases with disorder]. Distortions from mean-field techniques will necessarily occur in the presence of nonlinearities.
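Jensen’s inequality can be checked directly. This sketch compares the average of f(x) with f of the average for two hypothetical responses, one linear and one convex (x squared), over inputs that keep the same average while their spread, the “disorder”, grows. For x squared the gap equals the variance of the inputs exactly.

```python
def jensen_gap(f, xs: list[float]) -> float:
    """Average of f(x) minus f(average of x): zero for linear f, nonzero otherwise."""
    mean_x = sum(xs) / len(xs)
    return sum(f(x) for x in xs) / len(xs) - f(mean_x)

def square(x: float) -> float:
    """Convex response (hypothetical)."""
    return x * x

def linear(x: float) -> float:
    """Linear response, for contrast."""
    return 3 * x + 1

for spread in (0, 1, 2, 4):
    xs = [10 - spread, 10 + spread]  # same average (10), increasing disorder
    print(spread, jensen_gap(linear, xs), jensen_gap(square, xs))
```

The linear column stays at zero; the convex column grows as 0, 1, 4, 16, with the square of the spread.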
What I am saying may appear complicated here, but it was not so with the story of the average water consumption. So let us produce equivalent simplifications across things that do not average.
From the last chapter [Minority Rule],
The average dietary preferences of the population will not allow us to understand the dietary preferences of the whole.
Some scientist observing the absence of peanuts in U.S. schools would infer that the average student is allergic to peanuts when only a very small percentage are so.
Or, more bothersome:
The average behavior of the market participant will not allow us to understand the general behavior of the market.
These points appear clear thanks to our discussion about renormalization. They may cancel some stuff you know. But to show how, under complexity, the entire field of social science may fall apart, take one step further:
The psychological experiments on individuals showing “biases” do not allow us to understand aggregates or collective behavior, nor do they enlighten us about the behavior of groups.
Human nature is not defined outside of transactions involving other humans. Remember that we do not live alone but in packs, and almost nothing of relevance concerns a person in isolation, which is what is typically studied in laboratory-style work.
Some “biases” deemed “irrational” by psycholophasters interested in pathologizing humans are not necessarily so if you look at their effect on the collective.
What I just said explains the failure of the so-called field of behavioral economics to give us any more information than orthodox economics (itself rather poor) on how to play the market, understand the economy, or generate policy.
But, going further, there is this thing called (or, as Fat Tony would say, this “ting” called) game theory, which hasn’t done much for us other than produce loads of verbiage. Why?
The average interaction as studied in game theory, insofar as it reveals individual behavior, does not allow us to generalize to the preferences and behavior of groups.
Groups are units on their own. There are qualitative differences between a group of ten and a group of, say, 395,435. Each is a different animal, in the literal sense, as different as a book is from an office building. When we focus on commonalities, we get confused, but, at a certain scale, things become different. Mathematically different. The higher the dimension, in other words the number of possible interactions, the more difficult it is to understand the macro from the micro, the general from the units.
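The number of possible interactions can be counted directly. A minimal sketch, using a simple measure of our own choosing rather than the author’s: the number of distinct pairs and triples that can form within a group, computed with `math.comb`, for the two group sizes mentioned above.

```python
import math

def interactions(n: int, k: int) -> int:
    """Distinct groups of exactly k members that can be formed from n members."""
    return math.comb(n, k)

for n in (10, 395_435):
    print(f"{n:,} members: {interactions(n, 2):,} pairs, {interactions(n, 3):,} triples")
```

Ten people form 45 pairs and 120 triples; 395,435 people form about 78 billion pairs and on the order of ten quadrillion triples, before counting any larger coalitions.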
Or, in spite of the huge excitement about our ability to see into the brain using the so-called field of neuroscience:
Understanding how the subparts of the brain (say, neurons) work will never allow us to understand how the brain works.
So far we have no f***g idea how the brain of the worm C. elegans, which has around three hundred neurons, works. C. elegans was the first multicellular organism to have its genome sequenced. Now consider that the human brain has about one hundred billion neurons, and that going from 300 to 301 neurons may double the complexity. [I have actually found situations where a single additional dimension may more than double some aspect of the complexity; say, going from 1000 to 1001 may cause complexity to be multiplied by a billion.] So the use of “never” here is appropriate. And if you also want to understand why, in spite of the trumpeted “advances” in sequencing DNA, we are largely unable to get information except in small isolated pockets of some diseases:
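One simple way to see why a single extra neuron “may double the complexity”: if each unit were merely on or off (a deliberate oversimplification, and our assumption rather than the author’s model), n units have 2^n joint configurations, so each added unit doubles the count. The function name is hypothetical.

```python
def joint_states(n_units: int) -> int:
    """On/off configurations of n units: a crude, hypothetical complexity measure."""
    return 2 ** n_units

print(joint_states(301) // joint_states(300))  # prints 2: one more unit doubles the count
print(len(str(joint_states(300))))             # prints 91: digits in 2**300
```

Even this crude count for a three-hundred-neuron worm is a 91-digit number, far beyond anything an experiment can enumerate.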
Understanding the genetic make-up of a unit will never allow us to understand the behavior of the unit itself.
A reminder that what I am writing here isn’t an opinion. It is a straightforward mathematical property.
I cannot resist this:
Much of the local research in experimental biology, in spite of its seemingly “scientific” and evidentiary attributes, fails a simple test of mathematical rigor.
This means we need to be careful about what conclusions we can and cannot draw from what we see, no matter how locally robust it seems. It is impossible, because of the curse of dimensionality, to produce information about a complex system through the reductionism of conventional experimental methods in science. Impossible.
My colleague Bar Yam has applied the failure of mean field to the evolutionary theory of the selfish-gene narrative trumpeted by such aggressive journalists as Richard Dawkins and Steven Pinker and other naive celebrities with more mastery of English than of probability theory. He shows that local properties fail, for simple geographical reasons; hence, if there is such a thing as a selfish gene, it may not be the one they are talking about. We have addressed the flaws of the “selfishness” of a gene as shown mathematically by Nowak and his colleagues.
Hayek, who had a deep understanding of the properties of complex systems, promoted the idea of “scientism” to debunk statements that are nonsense dressed up as science, used by their practitioners to get power, money, friends, decorations, invitations to dinner with the Norwegian minister of culture, use of the VIP transit lounge at Kazan Airport, and similar perks. It is easier to take a faker seriously, since science doesn’t look neat and cosmetically appealing. So with the growth of science, we will see a rise of scientism, and my general heuristics are as follows: 1) look for the presence of simple nonlinearity, hence Jensen’s inequality. If there is such nonlinearity, then call Yaneer Bar Yam at the New England Complex Systems Institute for a friendly conversation about the solidity of the results; 2) if the paper writers use anything that remotely looks like a “regression” and “p-values”, ignore the quantitative results.
Nassim, along with his five colleagues in the Real World Risk Institute, offers a Qualitative Mini Certificate in Risk (real-world risk, not risk management) for risk professionals and analysts interested in how what they know applies to the real world; professional risk takers (with some basic familiarity with technical language) willing to gain perspective and understand how to use the research without falling into model error; and other executives and decision makers… literally any risk taker with some technical understanding. The next intense one-week workshop will take place from June 6th to the 10th at the Princeton Club in New York City.
A new Quantitative Mini Certificate in Risk, “the only quantitative program embedded in the real world,” will be offered from August 15th to the 19th in Stony Brook, New York. This program is designed for professional quantitative risk takers and managers interested in depth and links to reality.