On Rebel Theorem 2.0

Jared Heyman
Sep 15, 2022

A couple of years ago I published On Rebel Theorem to reveal the proprietary machine-learning algorithm we use at Rebel Fund to maximize our odds of success investing in tomorrow’s top Y Combinator startups.

We’ve recently finished a complete overhaul of the algorithm and used it to influence our Y Combinator Summer 2022 batch investments with great success, so the goal of this post is to share our learnings from developing Rebel Theorem 2.0 with the world.

This new algorithm was trained with over 100k data points sampled from thousands of YC startups from the past 15+ years. Our sample included 250+ top startups valued at over $150M each and 60+ unicorns valued at over $1B each. We compared these top-performing YC startups to their peers to understand which characteristics are predictive of startup valuation growth and top portfolio returns for seed-stage investors like Rebel.

As mentioned in my previous post, we’re not a quant fund by any stretch, but we do rely on Rebel Theorem 2.0 algorithm scores to help us prioritize startups for further diligence, alongside other “signals” like top existing investors, outcomes of previous startups founded by the same team, funding history, and more.

Background

Before digging into our findings, I’ll first explain how Rebel Theorem 2.0 was developed.

For each YC startup in our training database, we collected dozens of data points about the company and its founders. We looked at everything from the company’s sector and geography, to where its founders went to school and worked in the past, to how many words are in its company description or characters in its website URL. Just about any quantifiable characteristic we could find on these startups was fair game. We then divided the companies into 4 outcome categories depending on their latest valuation:

  • <$150M valuation
  • $150M-$499M valuation
  • $500M-$1B valuation
  • over $1B valuation
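The bucketing above can be sketched in a few lines of Python. This is an illustration only: the function name and the handling of the exact boundaries (for instance, which bucket a startup valued at exactly $1B lands in) are assumptions, since the post only lists the ranges.

```python
def outcome_category(valuation_usd: float) -> int:
    """Map a latest valuation in USD to one of four outcome classes (0 to 3).

    Boundary handling is an assumption: the lower bound of each range is
    inclusive, and exactly $1B is placed in the third bucket.
    """
    if valuation_usd < 150e6:
        return 0  # under $150M
    if valuation_usd < 500e6:
        return 1  # $150M up to (but not including) $500M
    if valuation_usd <= 1e9:
        return 2  # $500M to $1B
    return 3      # over $1B


# Toy usage with made-up valuations
labels = [outcome_category(v) for v in (40e6, 200e6, 750e6, 3e9)]
```

Treating the outcome as a small set of ordered classes rather than a raw dollar figure keeps a handful of very large valuations from dominating the training signal.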

Our data scientist then ran multiple machine learning models to identify which startup characteristics were consistently predictive of strong valuation growth (i.e., signal) and which were not (i.e., noise). After looking at hundreds of contenders, he narrowed the field to a few dozen startup “features” that are the most statistically significant predictors of YC startup success.
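To give a feel for what separating signal from noise means, here is a deliberately simple stand-in: a correlation filter that keeps only features whose correlation with a binary success label clears a threshold. This is not the set of models described above; the function names, the threshold, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(rows, outcomes, min_abs_corr=0.3):
    """Keep features with |correlation| >= min_abs_corr, strongest first.

    rows: list of dicts mapping feature name to a numeric value (hypothetical schema)
    outcomes: list of 0/1 success labels, one per row
    """
    scored = {
        name: pearson([float(row[name]) for row in rows], outcomes)
        for name in rows[0]
    }
    kept = [(name, r) for name, r in scored.items() if abs(r) >= min_abs_corr]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)


# Made-up toy data: one informative feature, one uninformative one
toy_rows = [
    {"is_fintech": 1, "url_chars": 8},
    {"is_fintech": 1, "url_chars": 12},
    {"is_fintech": 0, "url_chars": 8},
    {"is_fintech": 0, "url_chars": 12},
]
toy_outcomes = [1, 1, 0, 0]
signal = rank_features(toy_rows, toy_outcomes)
```

A real pipeline would also check that a feature stays predictive across different models and data splits, which is what "consistently predictive" implies; a single correlation pass does not test that.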

Since the top 5% of Y Combinator startups essentially drive all investor returns, knowing which characteristics these top startups tend to exhibit at the seed-stage can make a massive difference in an early-stage investor’s portfolio performance.

In backtesting, we found that the top YC startups according to our algorithm ended up achieving significantly higher valuations than their peers. At Rebel, we typically target the top ~20% of startups by score for further diligence; on average, these achieve a nearly 3x higher valuation than their YC peers. Importantly, avoiding the lowest-scoring startups in each batch also helps us reduce portfolio losses.
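A backtest of this kind boils down to a simple comparison, sketched below: rank startups by score, take the top slice, and compare its average valuation with everyone else's. The function name, the choice to compare against the remaining startups rather than the whole batch, and the toy numbers are assumptions for illustration, not our actual data.

```python
from statistics import mean


def top_fraction_lift(scores, valuations, fraction=0.2):
    """Ratio of mean valuation in the top-scoring fraction to the mean of the rest."""
    if not scores or len(scores) != len(valuations):
        raise ValueError("scores and valuations must be non-empty and equal length")
    ranked = sorted(zip(scores, valuations), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))  # size of the top slice
    top = [valuation for _, valuation in ranked[:k]]
    rest = [valuation for _, valuation in ranked[k:]]
    if not rest:
        raise ValueError("fraction leaves no peers to compare against")
    return mean(top) / mean(rest)


# Made-up toy batch of ten startups, valuations in $M
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_valuations = [500, 300, 100, 100, 100, 100, 100, 100, 100, 100]
lift = top_fraction_lift(toy_scores, toy_valuations)
```

For the comparison to be meaningful, the scores must come from a model that never saw the outcomes of the startups being ranked.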

Our algorithm scores and ranks not only startups but also their founders, since we believe that a company’s leadership is its destiny. The chart below shows the distribution of scores that Rebel Theorem 2.0 assigned to YC founders in our training dataset.

Findings

While I can’t publish the full inner workings of our algorithm, I’m glad to share some of the startup features that we found to be most predictive of startup success.

I’d first like to caution that our algorithm is backwards-looking. In other words, we can only see which characteristics have been predictive of startup success in the past. Though we expect many of these features to remain persistently predictive in the future, others may not be, as the world’s technology and economic environment is constantly evolving.

Our goal was to identify features with a >80% chance of predicting whether a startup will have a >10x outcome. It turns out that, of the hundreds of features we tested, only ~34 were necessary to drive model performance.

Geography

Where a startup is located seems to make a huge difference in terms of its likely outcome.

According to our findings, it’s no coincidence that 59% of the startups on YC’s latest top companies list are headquartered in the San Francisco Bay Area (even if Silicon Valley’s dominance as a startup hub is slowly eroding).

Our algorithm gives extra points to startups headquartered in North America and particularly the SF Bay Area. Interestingly though, we found that a Bay Area headquarters outside of San Francisco proper isn’t as good (sorry, Oakland!).

Other strong geographies for startup performance are Southeast Asia and Northern Europe, but only if the startup’s target market is global.

Industry

The technology subsector that a startup operates in also has major implications for its eventual outcome, though we don’t expect this characteristic to be as persistent since dominant startup industries tend to shift over time.

I’ve said in several previous posts that financial technology (i.e., fintech) is a sector to watch, and our data bears this out, with fintech being the single most predictive industry in terms of startup outcomes. Other historically strong industries for YC startups are supply chain & logistics, security, infrastructure, human resources, and engineering, product & design.

Interestingly, we sometimes see an industry that is difficult for startups overall but contains subindustries that are more conducive to success. For example, we found the healthcare industry negatively correlated with startup success overall, yet healthcare services strongly correlated with success. The same goes for consumer (negatively correlated with success) and consumer retail (strongly correlated with success). The implication for YC founders is that even within a tough industry, there are certain niches you can do well in.

Co-founders

As a fund, we place more emphasis on a company’s co-founders than just about anything else. So, we collect a huge amount of data on founders’ backgrounds to feed into our algorithm.

The first finding I’ll share probably isn’t shocking — founders who have taken a startup all the way to acquisition in the past are much more likely to build a successful startup in the future.

However, we found repeat YC founders to be negatively correlated with success. We suspect there could be some selection bias at play here (i.e., founders who go back to YC perhaps didn’t have a huge amount of success with their first YC startup). There are some notable exceptions to this, like my YC batchmate Parker Conrad who founded the unicorns Zenefits (W13) and then Rippling (W17).

We also found some truth to the old saying “many hands make light work” — the number of co-founders a startup has is positively correlated with success.

Co-founder education

Despite all the media attention on college dropouts like Steve Jobs and Mark Zuckerberg who went on to start successful tech companies, we found that founders who finish their undergraduate degrees are more likely to have a successful startup. Mothers, rejoice!

Not all universities and degrees are created equal though. YC has long held a bias towards technical founders, and with good reason — the word “computer” in a founder’s major is positively correlated with startup success. However, and this one surprised us, the word “engineering” in a founder’s major is negatively correlated with success (we suspect because many types of engineering degrees aren’t that helpful for writing software).
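Keyword features like these are straightforward to derive from a free-text major. The sketch below is a hypothetical illustration; the function and feature names are assumptions, and the actual preprocessing may differ.

```python
def major_keyword_flags(major: str) -> dict:
    """Flag whether a founder's major mentions "computer" or "engineering".

    Case-insensitive substring match; a major such as "Computer Engineering"
    sets both flags, so the two signals can pull in opposite directions.
    """
    text = major.lower()
    return {
        "major_has_computer": "computer" in text,
        "major_has_engineering": "engineering" in text,
    }


# Toy usage
flags = major_keyword_flags("Computer Science")
```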

Silicon Valley’s cultural disdain for “suits” is also well-founded… we discovered that founders with Master of Business Administration (MBA) degrees are less likely to be successful than their non-MBA peers. But the worst type of advanced degree for founders is a law degree: founders with JDs do even worse than MBAs :-)

In case you’re wondering how much higher education is ideal for a founder, the answer is “not too much.” While graduating from college is good, having multiple advanced degrees is a negative signal for founders, I think because great founders have a strong bias towards action.

Random stuff

On our quest to leave no stone unturned, we looked at some seemingly random startup characteristics to see if they’re predictive of success in a statistically reliable way. Here are some of my favorites:

  • Company domain name: Domains ending in .com, .co, and .io are good, though the number of characters in the domain name doesn’t seem to matter.
  • Company description: The fewer words the better, and this was actually quite a strong predictor! Perhaps because great founders are more succinct communicators, or because great startup ideas are just simpler.
  • LinkedIn mutuals: The number of mutual connections a founder has with me is positively correlated with success. This one may have the causation arrow pointed in the wrong direction though, since startup success itself may cause founders to build more Silicon Valley connections.
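The first two characteristics in this list are easy to turn into model inputs. Here is a minimal sketch, assuming a website URL with a scheme and a plain-text description; the function name, feature names, and example values are hypothetical.

```python
from urllib.parse import urlparse

FAVORED_TLDS = {"com", "co", "io"}  # the three endings named in the list above


def extract_simple_features(website_url: str, description: str) -> dict:
    """Derive domain and description features from raw startup fields."""
    # hostname is None when the URL has no scheme, so fall back to ""
    host = urlparse(website_url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    _, _, tld = host.rpartition(".")
    return {
        "tld_favored": tld in FAVORED_TLDS,
        "domain_chars": len(host),
        "description_words": len(description.split()),
    }


# Toy usage with a made-up startup
features = extract_simple_features(
    "https://www.example.io/about", "Payroll for small teams"
)
```

Features like the LinkedIn mutual-connection count depend on data that is not derivable from the startup's own fields, so they are left out of this sketch.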

I could go on and on, but I think this should give you a decent understanding of how our Rebel Theorem 2.0 algorithm works and what startup characteristics any seed-stage investor should keep their eye on. I hope you’ll join Rebel and our venture investing peers in selecting and supporting tomorrow’s YC startup unicorns!

Thanks to our fantastic data scientist Justin Hilliard for reading and editing drafts of this post.

Jared Heyman

Tech guy and investor. Founder at Rebel Fund and previously Pioneer Fund, CrowdMed (YC W13), Infosurv & Intengo (acq. LON: NFC). Ex-Bain consultant. Data nerd.