
On Rebel Theorem 4.0

9 min read · Jun 19, 2025

Last year I shared a post On Rebel Theorem 3.0, which was the world’s most advanced machine-learning (ML) algorithm at the time for predicting Y Combinator startup success. Now a year later, Rebel Fund has released our latest Rebel Theorem 4.0 model, and I’m excited to share how the model works and performs, and what we learned developing it.

In case you’re new here, Rebel is one of the largest investors in the Y Combinator startup ecosystem, now with 250+ YC portfolio companies valued collectively in the tens of billions of dollars. As an extremely data-driven fund, we’ve built the world’s most comprehensive dataset on YC startups and founders outside of YC itself, now encompassing millions of data points across every YC company in history.

We’ve invested millions of dollars into collecting this data and training our internal ML (and now AI) algorithms, which give us a major edge in identifying which YC startups in each batch are most likely to become tomorrow’s unicorns. It’s also helped inspire the dozens of blog posts I’ve published on YC startup trends, outcomes, and optimal investor strategies.

Outcomes

I’ll start by explaining the outcomes our new Rebel Theorem 4.0 model is designed to predict.

As I explained in my popular post On the power law of Y Combinator startups, the top ~6% of YC startups that become $1B+ companies drive the overwhelming majority of early investor returns. So, the obvious thing would be to predict startup valuation growth directly, but because the power law driving YC startup outcomes is so steep, the algorithm would only try to predict the extreme outliers (e.g. DoorDash, Airbnb, Stripe, Coinbase) which would add too much ‘noise’ to be useful.

So instead, as with Rebel Theorem 3.0, we bucketed YC startups into three broad categories for model training purposes, which we can predict accurately and reliably:

“Success” — $60M+ valuation and operating or exited

“Zombie” — Under $60M valuation and still operating

“Dead” — No longer operating nor exited
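The three buckets can be sketched as a small labeling function. This is only an illustration of the definitions above, not Rebel's actual pipeline: the `Startup` record and its field names are hypothetical, and the case of a company that exited below $60M is not covered by the definitions, so the sketch flags it rather than guessing.

```python
from dataclasses import dataclass

SUCCESS_THRESHOLD_USD = 60_000_000  # the $60M cutoff stated in the post


@dataclass
class Startup:
    # Hypothetical record layout, for illustration only.
    name: str
    valuation_usd: float
    operating: bool
    exited: bool


def outcome_bucket(s: Startup) -> str:
    """Map a startup to one of the three training labels described above."""
    if not (s.operating or s.exited):
        return "Dead"  # no longer operating and never exited
    if s.valuation_usd >= SUCCESS_THRESHOLD_USD:
        return "Success"  # $60M+ and operating or exited
    if s.operating:
        return "Zombie"  # under $60M and still operating
    # Exited below $60M: the post does not say how this case is labeled.
    return "Unspecified"
```

For example, `outcome_bucket(Startup("ExampleCo", 80e6, True, False))` returns `"Success"`, while the same company at a $20M valuation would be a `"Zombie"`.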

For batches before 2019, we’ve observed that 1 in 4 companies that achieved $60M+ valuations eventually became unicorns. So, if we can enrich our portfolio with more of these ‘Success’ companies, we’ll end up with a disproportionate share of unicorns in the end, the true driver of portfolio returns.

Performance

Now let’s get to the exciting part — how well our algorithm performs at predicting YC startup success.

The chart below shows the actual outcomes of startups with top-decile Rebel Theorem 4.0 scores in backtesting. As you’ll see, nearly 70% of startups predicted to be a Success actually were one — about 2.5x better than YC averages. These startups were only half as likely to end up Dead as their peers, and only one-third as likely to end up Zombies.
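For readers who want to see what a top-decile comparison like this involves, here is a minimal sketch of the calculation. The function name and the toy data are made up for illustration, and the real backtest surely involves more care (for example, scoring each batch only with data available at the time).

```python
def top_decile_lift(scored):
    """scored: list of (score, outcome) pairs, outcome in {'Success', 'Zombie', 'Dead'}.

    Returns (success rate in the top decile, overall success rate, lift).
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)  # top 10% by model score
    top = ranked[:n_top]
    top_rate = sum(1 for _, outcome in top if outcome == "Success") / len(top)
    base_rate = sum(1 for _, outcome in ranked if outcome == "Success") / len(ranked)
    lift = top_rate / base_rate if base_rate else float("nan")
    return top_rate, base_rate, lift


# Synthetic toy data, not real Rebel scores or outcomes.
toy = [
    (0.90, "Success"), (0.80, "Zombie"), (0.70, "Success"), (0.60, "Dead"),
    (0.50, "Dead"), (0.40, "Zombie"), (0.30, "Dead"), (0.20, "Zombie"),
    (0.10, "Dead"), (0.05, "Dead"),
]
top_rate, base_rate, lift = top_decile_lift(toy)
```

Read the same way, the figures quoted above (nearly 70% success in the top decile, about 2.5x the YC average) imply an overall Success rate of roughly 0.70 / 2.5 ≈ 28%.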

This performance is astonishing if you know how difficult it is to predict startup success at the seed stage. You’ll also see that our v4 algorithm performed much better than our year-old v3 algorithm, thanks to more and higher-quality training data.

If you’re an investor, you’re probably wondering how this outperformance vs YC averages might translate into financial returns. We backtested that as well:

It’s a busy chart, so here are the key findings:

  • Average YC Demo Day investor returns are estimated around a 15–18% gross IRR for mature vintages — already better than an S&P 500 index
  • Y Combinator itself kills it — due to its super-low entry valuation, we estimate YC achieved around a 45–55% gross IRR¹ for mature vintages, placing it amongst the highest performing venture funds of all time
  • Rebel Theorem 4.0 really kills it — had we invested in the top 10% of YC startups per the algorithm, our portfolio would have achieved an estimated 65%+ gross IRR for these mature vintages, even better than YC itself

I know some of these IRR numbers sound fanciful, but in an asset class where a single big winner can return 1,000x or more, they’re entirely possible with great deal selection, access and portfolio strategy.
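To make the power-law arithmetic concrete, here is a simplified sketch of how one outlier can carry a whole portfolio. The portfolio, the 10-year holding period, and the helper names are all assumptions for illustration. It also treats the portfolio as one upfront investment with one payout at the end, which is the only case where IRR reduces to this closed form; real fund IRRs depend on the timing of every cash flow.

```python
def portfolio_multiple(multiples):
    """With equal-sized checks, the portfolio multiple is the mean company multiple."""
    return sum(multiples) / len(multiples)


def annualized_return(multiple: float, years: float) -> float:
    """Annualized return of a single upfront investment paid back once after `years`."""
    return multiple ** (1.0 / years) - 1.0


# Hypothetical portfolio: 100 equal checks, one returns 1,000x, the other 99 return nothing.
multiples = [1000.0] + [0.0] * 99
moic = portfolio_multiple(multiples)   # 10.0x on invested capital
irr_10y = annualized_return(moic, 10)  # roughly 0.26, i.e. about 26% per year
```

Even with 99 total losses, the single 1,000x winner leaves this hypothetical portfolio at a 10x multiple, so adding a few more strong outcomes or shortening the holding period pushes the annualized figure up quickly.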

Top Features

One of the questions I’m asked most often is which characteristics (or ‘features’ in data science speak) are most predictive of YC startup success.

At Rebel, we looked at hundreds of them when training our models, and our latest Rebel Theorem 4.0 model factors 200+ features into its scores. We scoured the internet and various public and private databases for any quantifiable characteristic of YC startups and founders that may have predictive power, ranging from the obvious (previous founder successes, educational and work history, company location, etc.) to the obscure (whether founders have last names associated with certain nationalities, whether they wear glasses in their LinkedIn profile picture, who their YC Group Partner is, etc.).

Here are the top 25 features that our model found most predictive of YC startup success:

To make it more readable, we grouped these top features into 4 color-coded categories:

  • Company — features related to the company itself (company location, age, sector, etc)
  • Education — features related to the founders’ educational history (where they went to school, how long they studied, how long ago they graduated, etc)
  • Employment — features related to the founders’ employment history (years of work experience, years of co-founder experience, cities worked in, etc)
  • Personality — features related to the founders’ personality traits, as inferred by a specialized AI model based on their online footprint (skepticism, dominance, risk aversion, etc)

You’ll notice the vast majority of these top features are related to the founders vs the company itself. This is in part because we simply know more about the founders than the company at the seed stage, and in part because founders really are the main driver of YC startup success. I often say that technically our algorithm is scoring companies, but really it’s scoring founders.

You also may notice that many of the most predictive features are obvious — things like how experienced the founders are (both as co-founders and overall), how old they are (years since they started college or their first job are proxies), or where they went to school (top school rank).

However, there is a long tail of highly predictive features that are less obvious, like certain personality traits, and even how long the founders have lived in San Francisco.

I should note that since this is a non-linear model, you shouldn’t assume that ‘more is better’ of any given feature. The top 25 list above is based on feature frequency in the 100+ decision trees our model uses to score companies, not how ‘good’ or ‘bad’ a given feature is.
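As a rough sketch of what ranking by feature frequency means, the snippet below counts how often each feature is used as a split across a set of trees. The nested-dict tree format, the feature names, and the leaf scores are all hypothetical; the actual model's tree representation is not described in this post.

```python
from collections import Counter


def split_features(node):
    """Yield the feature used at every internal (splitting) node of one tree."""
    if "leaf" in node:
        return  # leaves hold a score and split on nothing
    yield node["feature"]
    yield from split_features(node["left"])
    yield from split_features(node["right"])


def feature_frequency(trees):
    """Count how many splits across all trees use each feature."""
    counts = Counter()
    for tree in trees:
        counts.update(split_features(tree))
    return counts


# Two made-up toy trees; a real ensemble would have 100+ much deeper ones.
trees = [
    {"feature": "years_work_experience",
     "left": {"leaf": 0.2},
     "right": {"feature": "top_school_rank",
               "left": {"leaf": 0.5},
               "right": {"leaf": 0.7}}},
    {"feature": "years_work_experience",
     "left": {"leaf": 0.3},
     "right": {"leaf": 0.6}},
]
ranking = feature_frequency(trees).most_common()
```

Here `ranking` puts `years_work_experience` first with 2 splits and `top_school_rank` second with 1. The count says how often the model consults a feature, not which direction of that feature is favorable.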

For example, while years of founder work experience is the #1 most predictive feature, this could mean that more years are generally better, more years are generally worse, or more likely, that it depends on various other features in the decision tree (e.g., more years of work experience is better for healthcare founders, worse for AI founders, better for fintech founders who have achieved an exit in the past, etc.).
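A toy decision rule shows how the same feature can cut both ways. Everything here is invented for illustration (the 10-year threshold, the sector labels, and the scores); it only demonstrates the kind of interaction described above, not anything the real model has learned.

```python
def toy_score(sector: str, years_experience: float) -> float:
    """Hand-written two-level 'tree': the effect of experience depends on sector."""
    experienced = years_experience >= 10  # made-up threshold
    if sector == "healthcare":
        return 0.7 if experienced else 0.4  # more experience scores higher here
    if sector == "ai":
        return 0.4 if experienced else 0.7  # more experience scores lower here
    return 0.5  # other sectors: experience ignored in this toy rule
```

In this sketch a 15-year veteran outscores a 3-year founder in healthcare and underscores them in AI, so asking whether "more experience is better" has no single answer.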

Non-linear ML models like Rebel Theorem 4.0 are unfathomably complex and difficult to explain, but essentially the model has figured out which of the hundreds of features in its training data are most predictive of YC startup success, in which combinations, for which types of companies. For example, if it’s statistically true that younger, non-skeptical, highly conscientious, top-university-educated founders, who worked at certain prior employers, and co-founded a certain number of companies in the past, tend to build the most successful B2B startups, but not B2C startups, the algorithm would ‘know’ that and score them accordingly.

This is the reason that a sophisticated ML model like Rebel Theorem can achieve investment returns far surpassing any mortal, and why we’ve invested so heavily into its underlying data collection, training and development — the algorithm literally has super-human capabilities².

Other Features

We want to give our algorithm the best chance possible of discovering non-obvious yet statistically-valid predictors of YC startup success, so we’ve been creative with the types of data we’ve included in its training set. Now that we’re armed with sophisticated AI platforms with computer vision, web search, and advanced reasoning, we can be more creative than ever.

I won’t divulge the full list of features Rebel Theorem 4.0 is trained on, but one of my favorite new additions is a series of features based on YC founders’ LinkedIn profile pictures. Here are a few fun examples:

Is the person’s hair neatly and professionally styled?

True examples:

False examples:

Does this image appear to be a staged professional photo rather than a candid photo?

True examples:

False example:

Does this person have a clean-shaven appearance or facial hair?

True example:

False example:

Believe it or not, some of these LinkedIn profile pic features are quite statistically predictive!

Our AI Future

We’ve started to incorporate even more sophisticated features into our model as well, which became possible only once advanced AI reasoning models like OpenAI’s new o3 were released. These features allow us to quantify more subjective aspects of a startup, such as its product quality, the timeliness of its idea, competitive defensibility, and founder-product fit, which until now were only within the grasp of human experts.

In my post On why AI is coming for my job next, I shared the chart below illustrating how OpenAI’s o1 model (the predecessor to o3) made better predictions about the success of Rebel portfolio companies than our own internal partner ratings:

I published that post a couple of months ago and it’s already outdated — the new o3 model does even better, and when o4 or GPT-5 is released, I’m sure it will perform better still.

Our ‘traditional’ ML approach to predicting YC startup success has given us a massive advantage over other investors, and I expect our new advanced AI reasoning features to expand our advantage further. I’m convinced that our latest ML/AI models have already surpassed the capabilities of not only the average Silicon Valley VC, but the very best VCs — and as Sam Altman famously says “this is the dumbest these models will ever be”.

¹We estimate YC fund performance based on publicly announced and algorithmically estimated valuations of YC startups across these vintages. We have no special insight into actual YC fund performance.

²As a thought experiment, for a human investor to effectively compete with a non-linear ML model like Rebel Theorem, she would have to read millions of unique facts on thousands of YC startups and founders, memorize them, hypothesize and test hundreds of fact patterns that predict which startups are most likely to succeed, calculate their statistical significance, validate them in backtesting, memorize those patterns, and apply them to new startups and founders she sees to accurately calculate their odds of success… in a matter of seconds. Good luck!


Written by Jared Heyman

Tech guy and investor. Founder of Rebel Fund and previously Pioneer Fund, CrowdMed (YC W13), Infosurv & Intengo (acq. LON: NFC). Ex-Bain consultant. Data nerd.
