At Rebel Fund, one of our ‘unfair advantages’ when it comes to investing in seed-stage Y Combinator startups is a proprietary machine learning algorithm we developed called Rebel Theorem.

Developed for Rebel by the brilliant MIT-trained computer scientist Ajay Saini, this algorithm uses machine learning techniques to score and screen startups based on their exit probability and forecasted ROI. It’s adapted from a predecessor machine learning model that Ajay developed as his graduate school thesis and ultimately published in the paper “Picking Winners: A Framework for Venture Capital Investment” along with two MIT co-authors.

Since data-driven investing is a hot topic these days, I’d like to share some details on how Rebel Theorem works and the results that it has achieved.

First off, since Rebel invests in startups at a very early stage and is often the first institutional investor into startups after Y Combinator itself, many of the companies we invest in don’t yet have much in terms of operational metrics or ‘signal’ from other investors. Therefore, most of the data we have available as Rebel Theorem inputs are founder-related metrics, such as:

  • Previous companies founded or worked at
  • Financial outcomes of previous companies
  • University attended and/or graduated
  • Degree type, major and completion year
  • Etc.
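
To make the idea of founder-related inputs concrete, here is a minimal sketch of how metrics like the ones above could be turned into numeric features for a model. It is an illustration only, not Rebel Theorem's actual code: the `Founder` fields, the `DEGREE_TYPES` list, and the "any founder has it" aggregation are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Founder:
    # Hypothetical fields standing in for the founder metrics listed above.
    companies_founded: int
    prior_exits: int              # earlier companies that were acquired or went public
    university: str
    degree_type: str              # e.g. "bachelor", "master", "phd"
    major: str
    completion_year: Optional[int]


# Assumed set of degree categories, used for one-hot encoding.
DEGREE_TYPES = ["bachelor", "master", "phd"]


def founder_features(f: Founder) -> Dict[str, float]:
    """Encode one founder as a flat dictionary of numeric features."""
    features = {
        "companies_founded": float(f.companies_founded),
        "has_prior_exit": 1.0 if f.prior_exits > 0 else 0.0,
    }
    for degree in DEGREE_TYPES:
        features[f"degree_{degree}"] = 1.0 if f.degree_type == degree else 0.0
    return features


def team_features(founders: List[Founder]) -> Dict[str, float]:
    """Combine founders by taking the maximum of each feature across the team."""
    if not founders:
        raise ValueError("a startup needs at least one founder")
    rows = [founder_features(f) for f in founders]
    return {key: max(row[key] for row in rows) for key in rows[0]}


team = [
    Founder(2, 1, "MIT", "bachelor", "Computer Science", 2012),
    Founder(0, 0, "Stanford", "phd", "Electrical Engineering", 2015),
]
print(team_features(team))
```

Taking the maximum across founders is just one simple way to get a team-level feature; sums, averages, or per-founder slots would be equally reasonable choices.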

This suits us just fine, because most investment decisions made at this stage of a company’s lifecycle are “team bets” anyway. The key part for us is collecting and compiling thousands of data points on the ~200 startups in each semi-annual Y Combinator batch and processing them in time to inform our investment decisions, typically before YC Demo Day.

While Rebel isn’t a quant fund by any stretch, we do rely on Rebel Theorem scores to help us prioritize startups for further diligence, since we’re only able to meet face-to-face with 15-20% of the companies in each YC batch. However, our investment decisions always come down to what we learn about a company and its founding team when we meet with them ourselves.
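
As a rough sketch of that prioritization step, the snippet below ranks a batch by score and keeps the top slice for meetings. The `shortlist` function, the company names, and the scores are hypothetical; the 20% default simply mirrors the upper end of the 15-20% figure above.

```python
import math
from typing import Dict, List


def shortlist(scores: Dict[str, float], fraction: float = 0.2) -> List[str]:
    """Return the highest-scoring companies, limited to the given fraction of the batch."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    # Always keep at least one company, and round up so small batches are not emptied.
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


# Made-up scores for a ten-company batch.
batch_scores = {f"company_{i}": score for i, score in enumerate(
    [0.12, 0.87, 0.45, 0.33, 0.91, 0.08, 0.52, 0.27, 0.64, 0.19]
)}
print(shortlist(batch_scores))
```

The output is only a meeting queue: as noted above, the investment decision itself still depends on what we learn in person.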

We’re also careful to blend together Rebel Theorem scores with other information about each company in our screening process, partially because algorithm scores can never capture subjective factors like whether we think the company has a good idea, and partially to avoid perpetuating existing Silicon Valley biases (particularly around founder gender and ethnicity) that have worked their way into historical startup outcomes.

The Rebel Theorem algorithm was trained with 10+ years of outcomes data on 1500+ Y Combinator startups, dating back to the first YC batch ever in 2005. Its key endpoints are whether a company achieved an exit for investors (i.e., an acquisition or IPO) and the return on investment (ROI) that seed-stage investors in these startups enjoyed.
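
For readers who think in code, the two endpoints can be pictured as a pair of training targets per startup. The sketch below is an assumption-laden illustration: the `Outcome` fields, the status strings, and the choice to express ROI as a simple multiple of the amount invested are mine for this example, not a description of how Rebel Theorem stores its data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Outcome:
    # Hypothetical record of how one historical startup turned out.
    status: str            # assumed values: "acquired", "ipo", "active", "dead"
    seed_invested: float   # dollars put in by a seed-stage investor
    seed_value: float      # dollars that stake is worth (or returned) today


def training_targets(outcome: Outcome) -> Tuple[int, float]:
    """Return (exit_label, roi_multiple) for one startup."""
    # Endpoint 1: did investors get an exit (acquisition or IPO)?
    exit_label = 1 if outcome.status in {"acquired", "ipo"} else 0
    # Endpoint 2: ROI, expressed here as a multiple of the amount invested.
    roi_multiple = outcome.seed_value / outcome.seed_invested
    return exit_label, roi_multiple


print(training_targets(Outcome("acquired", 100_000, 1_200_000)))
print(training_targets(Outcome("dead", 50_000, 0)))
```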

Interestingly, there are several factors that we found to be highly predictive of startup success. While of course I can’t share the full list of ingredients for our secret sauce in this post, I’m happy to share a few of the factors that we found to be predictive of a high investor ROI:

  • Founder(s) previously started a company that was acquired or had an IPO
  • Founder(s) previously worked at a company that was acquired or had an IPO
  • Founder(s) has attended a top university
  • Founder(s) earned a bachelor’s degree (interestingly, master’s and PhD degrees are correlated with startup success)

We also found that certain factors are negatively or positively correlated with startup success. For example, if the founders went to different universities from one another or earned different degrees, their startup is more likely to be successful than if they all had a similar education.
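
The education example above is easy to express as a pair of team-level features. This is a minimal sketch under my own assumptions: the `education_diversity` name, the (university, degree) tuple format, and the simple case-insensitive string matching are illustrative, and real data would need proper normalization of school and degree names.

```python
from typing import Dict, List, Tuple


def education_diversity(educations: List[Tuple[str, str]]) -> Dict[str, float]:
    """Flag whether co-founders differ in university or degree.

    educations holds one (university, degree) pair per founder.
    """
    universities = {university.strip().lower() for university, _ in educations}
    degrees = {degree.strip().lower() for _, degree in educations}
    return {
        # More than one distinct value means at least two founders differ.
        "different_universities": 1.0 if len(universities) > 1 else 0.0,
        "different_degrees": 1.0 if len(degrees) > 1 else 0.0,
    }


print(education_diversity([("MIT", "BS Computer Science"), ("Stanford", "BS Computer Science")]))
print(education_diversity([("MIT", "BS Physics"), ("mit", "BS Physics")]))
```

A solo founder gets 0.0 for both flags, since there is nobody to differ from.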

In our backtesting of the Rebel Theorem algorithm, we found that applying data science and machine learning to startup investment decisions can produce some pretty spectacular results:

YC startups that the algorithm ranked in the top quartile of their respective YC batch achieved an estimated 40x gross ROI on average for seed-stage investors¹, or 263% higher than the overall YC average. Like all things in venture, these ROI statistics are driven by a few spectacularly high-performing companies in each YC batch, but the algorithm tends to identify them².
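
To show the shape of this kind of comparison, here is a small sketch that takes the top quartile of each batch by model score and compares its average ROI with the overall average. The records are made up purely for illustration and are not Rebel's data, and the function names are hypothetical. The last lines work out the overall average implied by the figures above, assuming "263% higher" means 3.63 times the overall average, which comes to roughly 11x.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (batch, model_score, realized_roi_multiple); every value here is invented.
Record = Tuple[str, float, float]


def top_quartile_vs_overall(records: List[Record]) -> Tuple[float, float]:
    """Return (mean ROI of each batch's top quartile by score, mean ROI of all records)."""
    by_batch: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_batch[record[0]].append(record)

    top_rois: List[float] = []
    for batch_records in by_batch.values():
        ranked = sorted(batch_records, key=lambda r: r[1], reverse=True)
        k = max(1, math.ceil(len(ranked) / 4))  # top quartile, at least one company
        top_rois.extend(r[2] for r in ranked[:k])

    return mean(top_rois), mean(r[2] for r in records)


def percent_higher(value: float, baseline: float) -> float:
    """How much higher value is than baseline, in percent."""
    return (value / baseline - 1) * 100


records = [
    ("A", 0.9, 12.0), ("A", 0.5, 0.0), ("A", 0.4, 2.0), ("A", 0.1, 0.0),
    ("B", 0.8, 8.0), ("B", 0.7, 1.0), ("B", 0.2, 0.0), ("B", 0.3, 1.0),
]
top, overall = top_quartile_vs_overall(records)
print(top, overall, round(percent_higher(top, overall), 1))

# Implied overall YC average if 40x is 263% higher than it (an interpretation, see above).
implied_overall = 40 / (1 + 263 / 100)
print(round(implied_overall, 2))
```

Because a handful of outliers dominate the averages, a real backtest would also want to check how sensitive the result is to any single company.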

While data science will never replace the human intuition and judgement of an experienced investor, we believe strongly that it can put the odds of success in his or her favor.

Techie and investor. Founder at Rebel Fund and previously Pioneer Fund. Chairman of Infosurv/Intengo and CrowdMed (YC W13). Former Bain consultant. Data nerd.
