Factor Investing: From CAPM to the Factor Zoo

The Fama-French factor models did something unusual for academic finance: they became genuinely useful to practitioners. When I set out to build factor portfolios, the models gave me a concrete framework — decompose returns into systematic risk premia, measure exposures, construct portfolios that isolate specific sources of return. This series documents that process: the theoretical foundation I started from, the data I used, and what I found when I implemented and tested it myself.

The Capital Asset Pricing Model

Any factor discussion starts with the Capital Asset Pricing Model. The CAPM asserts that assets are priced based on their risk relative to the overall market:

$R_{i,t} = R_{f,t} + \beta_{i} \times (R_{M,t} - R_{f,t})$

This equation states that the return on any investment $i$ at time $t$ equals the risk-free rate $R_{f,t}$ plus the market’s excess return over the risk-free rate, $(R_{M,t} - R_{f,t})$, scaled by a coefficient $\beta_{i}$ that measures the investment’s risk relative to the market. Strictly, the CAPM describes expected returns; realised returns also include an idiosyncratic error, which is why the model is tested by regression.

The CAPM has theoretical appeal but faces empirical challenges. When the equation is estimated as a regression of an asset’s excess returns on the market’s excess returns, the $R^2$ is often quite low, for individual stocks in particular, indicating limited explanatory power in practice. The model also implies that every asset’s expected excess return is just a multiple of the market’s, so no investor can outperform the market on a risk-adjusted basis. That implication is not well supported by the evidence.
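
To make the regression and its $R^2$ concrete, here is a minimal sketch that estimates $\beta_{i}$ by ordinary least squares. The data is synthetic: the return means, volatilities, and the "true" beta of 1.2 are assumptions chosen for illustration, not estimates from any real asset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 240

# Synthetic monthly data (all parameters are assumed, for illustration only)
rf = np.full(n_months, 0.002)                     # risk-free rate
mkt_excess = rng.normal(0.006, 0.045, n_months)   # R_M - R_f
true_beta = 1.2
noise = rng.normal(0.0, 0.08, n_months)           # large idiosyncratic risk
asset = rf + true_beta * mkt_excess + noise       # asset return R_i

# Regress the asset's excess return on the market's excess return
y = asset - rf
X = np.column_stack([np.ones(n_months), mkt_excess])  # intercept + market
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = coef

# R^2: share of the variance in excess returns explained by the market
resid = y - X @ coef
r_squared = 1.0 - resid.var() / y.var()

print(f"alpha={alpha_hat:.4f}  beta={beta_hat:.2f}  R^2={r_squared:.2f}")
```

Because the idiosyncratic noise is set to be large relative to the market's volatility, the fitted beta lands near its true value while most of the variance stays unexplained, which is the pattern described above.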

Despite these limitations, William Sharpe shared the 1990 Nobel Prize in Economic Sciences for his contributions to the CAPM. It remains a foundational building block, even if the building has since grown several more floors.

The Fama-French 3 and 5 Factor Models

Fama and French extended the CAPM into a multi-factor model. Where the CAPM uses a single factor (market excess return), their Three Factor model adds two more:

$R_{i,t} - R_{f,t} = \beta_{i} \times (R_{M,t} - R_{f,t}) + s_{i} \times SMB_{t} + h_{i} \times HML_{t} + \epsilon_{i,t}$

where:

$R_{i,t}$ is the return of asset $i$ at time $t$
$R_{f,t}$ is the risk-free rate at time $t$
$\beta_{i}$ is the sensitivity of asset $i$ to the market factor
$R_{M,t}$ is the return on the market portfolio at time $t$
$SMB_{t}$ (Small Minus Big) is the return difference between small and large firms at time $t$
$s_{i}$ is the sensitivity of asset $i$ to the SMB factor
$HML_{t}$ (High Minus Low) is the return difference between high and low book-to-market equity firms at time $t$
$h_{i}$ is the sensitivity of asset $i$ to the HML factor
$\epsilon_{i,t}$ is the idiosyncratic error term

The model asserts that the excess return on any investment is a multiple of the market’s excess return, plus multiples of the “size premium” and the “value premium,” plus noise.
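
The loadings $\beta_{i}$, $s_{i}$, and $h_{i}$ are typically estimated by regressing an asset's excess returns on the three factor series. The sketch below does this on synthetic data: the factor means and volatilities, and the hypothetical fund with loadings of 1.0, 0.6, and 0.4, are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_months = 360

# Synthetic monthly factor returns (assumed parameters, not real data)
mkt = rng.normal(0.006, 0.045, n_months)   # R_M - R_f
smb = rng.normal(0.002, 0.030, n_months)   # size factor
hml = rng.normal(0.003, 0.030, n_months)   # value factor
factors = np.column_stack([mkt, smb, hml])

# Hypothetical small-cap value fund: loadings are beta, s, h
true_loadings = np.array([1.0, 0.6, 0.4])
eps = rng.normal(0.0, 0.02, n_months)      # idiosyncratic error term
excess_ret = factors @ true_loadings + eps # R_i - R_f

# OLS with an intercept; the intercept is the unexplained average return
X = np.column_stack([np.ones(n_months), factors])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha_hat, beta_hat, s_hat, h_hat = coef

print(f"alpha={alpha_hat:.4f}  beta={beta_hat:.2f}  s={s_hat:.2f}  h={h_hat:.2f}")
```

The equation above has no intercept, so the regression intercept acts as a check: if the three factors fully explain the asset's average return, it should be close to zero.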

The Size Premium: smaller companies tend to outperform larger ones over the long term. The rationale is straightforward: smaller companies carry more risk (less liquidity, less analyst coverage, more volatile earnings), and investors demand compensation for that. It’s a long-run statistical regularity, not a guarantee, and small caps can lag large caps for extended periods, as I’ll show in the next post.

The Value Premium: companies trading at low price-to-book ratios (equivalently, high book-to-market ratios, the “high” in HML) tend to outperform expensive ones. One explanation is that the market systematically overestimates growth for glamour stocks and underestimates recovery potential for beaten-down ones. This is the anomaly that Fama and French formalised in the three-factor model.

The model was later extended to five factors by adding:

Investment (CMA, Conservative Minus Aggressive): the premium for owning stocks of companies with conservative asset growth over those investing aggressively
Profitability (RMW, Robust Minus Weak): the premium for owning more profitable firms versus less profitable firms
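
Written in the same form as the three-factor equation above, the extended model would read as follows. The loadings $r_{i}$ and $c_{i}$ are notation assumed here for illustration; they denote the sensitivity of asset $i$ to the RMW and CMA factors respectively:

$R_{i,t} - R_{f,t} = \beta_{i} \times (R_{M,t} - R_{f,t}) + s_{i} \times SMB_{t} + h_{i} \times HML_{t} + r_{i} \times RMW_{t} + c_{i} \times CMA_{t} + \epsilon_{i,t}$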

Fama shared the 2013 Nobel Prize in Economic Sciences for his empirical analysis of asset prices, of which the factor models are a central contribution.

The “Factor Zoo”

The Fama-French framework has been instrumental in advancing asset pricing, but it also opened the floodgates. The proliferation of new factors has led to concerns about data mining and overfitting, where researchers identify relationships that don’t hold up out of sample.

While factors like Betting Against Beta, Low Volatility, and Momentum have been widely accepted and implemented in investment strategies, the sheer volume of factor research papers has created what’s colloquially known as the “factor zoo.” It has become as much a race to publish papers on novel factors for citation counts as it is an effort to advance our understanding of asset pricing. Harvey, Liu, and Zhu, in their 2016 Review of Financial Studies paper ‘…and the Cross-Section of Expected Returns,’ catalogued over 300 published factors — a number that has only grown since.

Legitimate research on new factors can provide valuable insights into financial markets. But investors should be cautious: a thorough understanding of the underlying economic rationale and robust out-of-sample testing are prerequisites before implementing any factor in a live portfolio.
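
The data-mining concern can be illustrated with a small simulation. The sketch below generates 300 candidate "factors" that are pure noise by construction (true mean return of zero) and counts how many would still look statistically significant. The sample length, volatility, and the two t-statistic hurdles are assumptions chosen for illustration; the stricter 3.0 hurdle is in the spirit of the higher thresholds Harvey, Liu, and Zhu argue for.

```python
import numpy as np

rng = np.random.default_rng(42)
n_factors, n_months = 300, 600

# Each column is a candidate factor whose true mean return is exactly zero
returns = rng.normal(0.0, 0.03, size=(n_months, n_factors))

# t-statistic of the mean monthly return for each candidate
mean_ret = returns.mean(axis=0)
std_err = returns.std(axis=0, ddof=1) / np.sqrt(n_months)
t_stats = mean_ret / std_err

# Count the false discoveries under a conventional and a stricter hurdle
n_pass_conventional = int((np.abs(t_stats) > 1.96).sum())
n_pass_strict = int((np.abs(t_stats) > 3.0).sum())

print(f"|t| > 1.96: {n_pass_conventional} of {n_factors} noise factors")
print(f"|t| > 3.00: {n_pass_strict} of {n_factors} noise factors")
```

At the conventional 5% level roughly one in twenty noise series clears the bar by chance alone, so a large enough search will always turn up "factors". This is why an economic rationale and out-of-sample evidence matter more than a single in-sample t-statistic.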

How Factor Portfolios Work

Factors are designed to be market neutral. Consider the Value factor: it represents the difference in returns between holding cheap companies and expensive companies. To construct such a portfolio, we create two positions:

Long leg: stocks with favourable Value characteristics + everything else
Short leg: stocks with unfavourable Value characteristics + everything else

The “everything else” (the risk-free rate, the size factor, the market return) cancels out, provided both legs carry similar exposure to it:

$\text{Portfolio} = \text{Cheap stocks} - \text{Expensive stocks}$

In practice, if we go long 10 stocks and short 10 stocks, and size the positions so that the two legs have equal value and similar betas, we get a market-neutral position in which broad market moves are hedged. What remains is, to a first approximation, the pure factor return.
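
The cancellation can be sketched on a toy universe. Everything here is assumed for illustration: 100 hypothetical stocks that all have a market beta of exactly 1, a small return tilt that rises with book-to-market, and equal weighting within each leg. With real stocks the two legs rarely have identical betas, so the hedge is only approximate.

```python
import numpy as np

rng = np.random.default_rng(7)
n_stocks, n_months, n_leg = 100, 120, 10

# Hypothetical universe: every stock has beta 1 to the market (an assumption)
book_to_market = rng.uniform(0.2, 2.0, n_stocks)
market = rng.normal(0.006, 0.045, n_months)
# Assumed value premium: cheaper stocks earn slightly more per month
value_tilt = 0.002 * (book_to_market - book_to_market.mean())
idio = rng.normal(0.0, 0.05, (n_months, n_stocks))
stock_returns = market[:, None] + value_tilt[None, :] + idio

# Sort on the characteristic and take the extremes as the two legs
order = np.argsort(book_to_market)
short_leg = order[:n_leg]    # most expensive: lowest book-to-market
long_leg = order[-n_leg:]    # cheapest: highest book-to-market

# Equal-weighted leg returns and the long-short factor return
long_ret = stock_returns[:, long_leg].mean(axis=1)
short_ret = stock_returns[:, short_leg].mean(axis=1)
factor_ret = long_ret - short_ret

corr_long = np.corrcoef(long_ret, market)[0, 1]
corr_factor = np.corrcoef(factor_ret, market)[0, 1]
print(f"corr(long leg, market)   = {corr_long:.2f}")
print(f"corr(long-short, market) = {corr_factor:.2f}")
```

The long leg on its own moves almost entirely with the market, while the long-short difference contains only the value tilt and stock-specific noise, because the common market term appears in both legs and subtracts out.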

What’s Next?

The next post in this series reviews and backtests the traditional factor strategies using data from Ken French’s data library, with code you can run yourself.