Last week, Morningstar published a paper by Paul Kaplan and Maciej Kowara, titled "How Long Can a Good Fund Underperform its Benchmark?"
At the start, the authors explain the aim of the study.
Most relative performance metrics such as alpha, beta, and information ratio take the period of analysis as given, typically 3 or 5 years. To our knowledge, there has been no systematic analysis of how long a period of underperformance an investor may have to bear while waiting for a fund to ultimately outperform its benchmark. Put differently, given that a manager is skilled and has a good chance of beating the benchmark over a set time period, over how long a stretch can that manager be reasonably expected to underperform within that period?
Conversely, there has been no analysis of how long a period of outperformance a fund might enjoy before ultimately underperforming. The purpose of this study is to fill that gap.
We do so by introducing two new performance-related measures: Longest Underperformance Period (LUP) and Longest Outperformance Period (LOP). LUP is the longest subperiod of underperformance within a given period of outperformance, and LOP is the longest subperiod of outperformance within a given period of underperformance. These measures were investigated empirically for a global set of active funds over the 15-year period starting January 2003 and ending December 2017.
Note that LUP and LOP are in units of time and do not measure the magnitude of under- or outperformance, nor do they measure probabilities. However, we estimate their probability distributions with Monte Carlo simulation.
John Rekenthaler, a member of Morningstar's investment research department, wrote a brilliant post about the study for Morningstar.com. It is reproduced below.
Paul and Maciej examined mutual fund slumps from two perspectives.
1) Existing fund histories from the United States, Canada, Europe, and developed Asian markets (excluding Japan and Australia, owing to data availability).
2) Simulations of hypothetical funds, run by hypothetically skilled managers. (Get two Ph.D.s working on a project, and simulations are inevitable.)
Real-World Results
Their paper begins with the fund histories. The authors calculated gross returns over the 15 years from 2003 through 2017 by adding each fund’s expense ratio back to its official results. Gross returns do not matter to investors, who can’t spend what they don’t receive, but they are the correct measure for evaluating manager skill.
Of the 5,500 equity funds that qualified for the study, two-thirds had higher returns over that 15-year period than did their costless benchmarks. That is an impressive showing, but it is affected by survivorship bias, as several thousand funds that existed in 2003 disappeared before 2017 concluded. The true winning percentage was probably close to 50%.
Still, that makes for almost 4,000 funds that beat their relevant indexes over a 15-year stretch. Naturally, they did not do so the whole time. At times, every winner was a loser, raising the question: What was the lengthiest period in which these successful funds were unsuccessful? Specifically, what was the longest time that the fund’s gross returns trailed those of its index?
Dry Spells
Oh, boy. The median length of the longest underperformance period, or LUP, was … a decade! (Technically, one group of funds had a median of 8.5 years and another of 11 years, the details of which are immaterial for the purposes of this column.) Consultants place funds on watchlists if they lag their indexes (or most of their competitors) for three years, and Morningstar assigns its initial star rating after that same span. Yet the typical 15-year winner suffered a 10-year dry spell.
At this point, you may be wondering about the math. How can a fund trail for 10 years out of 15, yet finish ahead? The answer is that during its LUP, the fund barely lags its index. This occurs by construction; if one more month could be added to the LUP, then it would cease to be the LUP! Thus, the LUP measures the time over which a fund’s gross returns almost, but not quite, match those of its benchmark.
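To make the arithmetic concrete, here is a minimal Python sketch of how a LUP could be computed from a series of periodic excess returns (fund minus benchmark). It is an illustration only, not the authors' code: the function names, the use of summed log excess returns, and the toy five-period series are all assumptions.

```python
from itertools import accumulate


def longest_underperformance_period(excess_returns):
    """Length of the longest contiguous window with negative cumulative excess return.

    Assumption: `excess_returns` holds periodic log returns of the fund minus
    those of its benchmark, so a window's cumulative result is a simple sum.
    """
    # prefix[k] is the cumulative excess return after the first k periods.
    prefix = [0.0] + list(accumulate(excess_returns))
    n = len(excess_returns)
    # Try the longest windows first; the first window that trails is the LUP.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            if prefix[start + length] < prefix[start]:
                return length
    return 0  # the fund never trailed over any window


def longest_outperformance_period(excess_returns):
    """Mirror image of the LUP: flip the sign and reuse the same search."""
    return longest_underperformance_period([-r for r in excess_returns])


# Hypothetical excess returns, in percentage points per period.
toy = [3, -1, -1, -1, 2]
print(sum(toy))                              # 2: the fund finishes ahead
print(longest_underperformance_period(toy))  # 4: yet it trails over periods 2-5
```

In the toy series the fund wins over all five periods but loses over a four-period window by a single point, which is the same pattern as a 15-year winner with a 10-year LUP.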
That most successful funds’ LUPs are so prolonged suggests that most of them didn’t beat the index by very much during those 15 years. Neither did most of the unsuccessful funds trail by much. Just as the winners suffered through times when they looked like losers, so too did the losers often appear to be winners. In fact, on average, their longest outperformance periods, or LOPs, were even longer than the slumps for the index-beating funds: funds that trailed the indexes for the full 15 years had 11- to 12-year stretches of outperformance.
Those are daunting statistics. How can one distinguish between skill and luck when the stronger funds can slump for a decade, and the weaker ones can thrive for that long or longer? Perhaps the task is futile. Perhaps that 15-year measurement period is arbitrary, so that its list of good and bad funds is merely an accident of the time period. Perhaps that list would look very different if the authors had evaluated the funds across 20 years.
The Simulations
Which leads us to the second part of Paul and Maciej’s paper: the hypothetical funds, run by hypothetical managers. They might only be figments of a computer program’s imagination, but they are skilled figments. On 75% of occasions, the gross returns for their funds beat the benchmark indexes over a 15-year trial. Their superiority is built into the simulation.
And … pfft. The simulated managers also have prolonged LUPs. On average, in fact, their LUPs are even longer than those of actual fund managers. When evaluating their results, Paul and Maciej considered a logical investor who knew nothing about a hypothetical fund except its LUP, and whose fund's LUP equaled the average LUP for a skilled manager. That investor would conclude on 45% of occasions that these incontestably skilled managers had skill. On 32% of occasions the investor would perceive no skill, and 23% of the time the investor would decide that the manager had negative skill.
Roughly one in four simulations of a manager who was programmed to be skilled would have a LUP that matched that of the typical bad manager! On the bright side, Paul and Maciej did conclude that over a 100-year simulation, one could generally tell the difference between their managers who were programmed to be strong and those who were programmed to be weak.
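The flavor of that experiment can be sketched with a small Monte Carlo run. The column does not describe the paper's return model, so everything below is an assumption: monthly excess log returns drawn independently from a normal distribution, a hypothetical 1.5% monthly tracking error, and a mean chosen so that a fund beats its benchmark over 180 months with probability 75%. The numbers it prints should not be expected to match the paper's.

```python
import random
from itertools import accumulate
from statistics import NormalDist, median

MONTHS = 180             # 15 years of monthly observations
WIN_PROB = 0.75          # from the column: skilled funds win 75% of 15-year trials
TRACKING_ERROR = 0.015   # hypothetical monthly std dev of excess return


def lup(excess_returns):
    """Longest contiguous window (in periods) with negative summed excess return."""
    prefix = [0.0] + list(accumulate(excess_returns))
    n = len(excess_returns)
    for length in range(n, 0, -1):           # longest windows first
        for start in range(n - length + 1):
            if prefix[start + length] < prefix[start]:
                return length
    return 0


def simulate_skilled_funds(n_funds=500, seed=0):
    """Return (share of funds that beat the benchmark, median LUP in years of the winners)."""
    rng = random.Random(seed)
    # Under the i.i.d. normal assumption the 15-year sum is N(MONTHS*mu, MONTHS*sigma^2),
    # so P(sum > 0) = WIN_PROB when mu = z * sigma / sqrt(MONTHS).
    mu = NormalDist().inv_cdf(WIN_PROB) * TRACKING_ERROR / MONTHS ** 0.5
    winner_lups = []
    for _ in range(n_funds):
        path = [rng.gauss(mu, TRACKING_ERROR) for _ in range(MONTHS)]
        if sum(path) > 0:                     # fund finished ahead of its benchmark
            winner_lups.append(lup(path) / 12)
    return len(winner_lups) / n_funds, median(winner_lups)


if __name__ == "__main__":
    win_rate, median_lup_years = simulate_skilled_funds()
    print(f"share of winners: {win_rate:.2f}")
    print(f"median LUP of winners: {median_lup_years:.1f} years")
```

Changing `MONTHS` to 1,200 gives a rough feel for the 100-year horizon mentioned above, though the quadratic window search makes that run noticeably slower.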
I will leave the task to you, dear reader, to determine the disadvantages of investing with a 100-year time horizon. I believe that there are some.
Feel free to access the paper here: How Long Can a Good Fund Underperform its Benchmark?