On the Y Combinator Winter 2023 batch (written with GPT-4)

Jared Heyman
May 17, 2023

Rebel Fund is one of the largest and most prolific investors in Y Combinator startups. Each year we build the most comprehensive database of new YC startups that exists, which feeds our proprietary Rebel Theorem 2.0 machine learning algorithm.

Since we now collect and analyze hundreds of thousands of data points on YC startups each year, I like to share some statistics with the broader technology community after each new batch (see my previous post On the YC Summer 2022 batch).

Since artificial intelligence (AI) powered by a new generation of large language models (LLMs) is a huge secular trend right now in the technology landscape, it only seemed fitting to write our post on the latest YC Winter 2023 batch using OpenAI’s newly-released GPT-4 model.

All of the charts and text below were chosen and created by the GPT-4 Code Interpreter plug-in, with my prompting, utilizing a small sample of the hundreds of data points Rebel collects on each YC company and team.

This post will not only give you insights into the W23 batch, but also the incredible (even scary…) capabilities of GPT-4. I want to stress that it made every chart and wrote every single word of this post — I only did some light editing for length. Strange times ahead, my friends. Enjoy!

Company Blurbs

The “Company Blurb” column contains short descriptions of each company. To analyze this text data, we can use Natural Language Processing (NLP) techniques such as word frequency analysis or topic modeling.

The word cloud visually represents the most common words in the companies’ blurbs, with the size of each word indicating its frequency. The bar chart lists the top 20 most common words along with their frequencies.

The word “data” appears most frequently, indicating that a significant number of companies are likely involved in data-related services or products. This aligns with trends in the tech industry where data-driven decision-making and data-based products and services are highly valued.

Other common words include “ai” (artificial intelligence), “platform”, “time”, “use”, and “companies”. These words suggest that many companies are building AI-based platforms, likely offering services to other companies, and focusing on usability and time-related aspects (e.g., saving time, real-time capabilities).
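For readers curious how a word frequency analysis like this can be produced, here is a minimal sketch using only Python's standard library. The blurbs and the stop-word list are hypothetical stand-ins, not Rebel's data, and this is not necessarily the code that generated the charts in this post.

```python
import re
from collections import Counter

# Hypothetical blurbs standing in for the "Company Blurb" column.
blurbs = [
    "AI platform that helps companies clean their data in real time",
    "Data platform for sales teams",
    "We use AI to save developers time",
]

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"that", "their", "in", "for", "we", "to", "the", "a", "of", "and"}

def word_frequencies(texts):
    """Count lowercase word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

freqs = word_frequencies(blurbs)

# The most frequent words are what a bar chart or word cloud would display.
for word, count in freqs.most_common(5):
    print(f"{word}: {count}")
```

The same counts can drive both visuals: a bar chart plots the top entries directly, while a word cloud scales each word's font size by its count.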

The Latent Dirichlet Allocation (LDA) model has identified five topics in the company blurbs. For each topic, we’ve listed the top ten most characteristic words. Please note that the topics are numbered from 0 to 4.

Topic 0: Seems to be about AI and data usage, with a focus on teams, financial aspects, credit, and automating tasks.
Topic 1: Appears to be centered around AI and data again, with an emphasis on business use cases, software creation, and team collaboration.
Topic 2: Highlights the use of AI and data on a platform that assists developers, with an emphasis on machine learning (ML) and customer service.
Topic 3: Features content creation and AI, with a focus on data usage, platform-based solutions, and real-time capabilities.
Topic 4: Primarily about sales and data management on a platform designed to aid businesses, with an emphasis on team coordination.
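To give a feel for what an LDA model does under the hood, here is a small from-scratch sketch that fits LDA with collapsed Gibbs sampling. This is an illustrative technique choice, not the implementation used for this post (library implementations often use variational inference instead), and the toy documents, function names, and hyperparameters `alpha` and `beta` are all assumptions.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample in proportion to P(topic | doc) * P(word | topic).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, topic_total

def top_words(topic_word, n=3):
    """Return up to n of the most frequent words assigned to each topic."""
    return [
        [w for w, c in counts.most_common(n) if c > 0]
        for counts in topic_word
    ]

# Hypothetical, already-tokenized blurbs.
docs = [
    ["ai", "data", "platform", "developers"],
    ["sales", "teams", "crm", "data"],
    ["ai", "model", "developers", "api"],
    ["sales", "pipeline", "teams", "revenue"],
]

topic_word, topic_total = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"Topic {k}: {words}")
```

On a corpus this small the topics are not meaningful; the point is only to show the mechanics. Each topic ends up as a ranked list of words, which a human (or GPT-4) then has to interpret and label, as in the summaries above.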

Geography Analysis

North America is the dominant region, housing a substantial proportion of the companies. This isn’t surprising as North America, specifically the United States, has traditionally been a hub for startups and tech companies. The high concentration of venture capital, world-renowned universities, and tech giants in this region has contributed to a thriving startup ecosystem.

The second most represented region is South Asia, indicating the rising prominence of this region in the global startup landscape. Countries in South Asia, notably India, have seen a surge in entrepreneurship and startup culture in recent years, bolstered by an increasing number of tech-savvy youth, government initiatives, and growing investor interest.

Other regions like East Asia, Europe, Middle East & North Africa, and Latin America also have a presence, albeit smaller. This distribution highlights the global nature of entrepreneurship, with startups sprouting in various parts of the world.

The United States (US) is by far the most represented subregion, which aligns with the trend observed in the YC region data. The US is known for its strong entrepreneurial culture, advanced infrastructure, and access to a vast network of investors, making it an attractive place for startups.

India is the second most represented subregion, signifying the growing importance of this country in the global startup scene. India has experienced a startup boom in recent years due to factors such as a large young population, growing digital infrastructure, and supportive government policies.

Other subregions like Canada, United Kingdom, and Singapore also have a presence in the dataset. This shows that entrepreneurship and startup culture are spreading across different parts of the world.

San Francisco, which sits alongside Silicon Valley in the Bay Area tech ecosystem, has the highest number of companies. This is consistent with its global reputation as a leading hub for tech startups and venture capital. The presence of numerous tech giants, prestigious universities, and a culture of innovation all contribute to its vibrant startup scene.

New York, another major hub for startups in the United States, is also significantly represented. The city offers a diverse industry landscape and access to a vast network of investors, attracting many entrepreneurs.

The chart also shows a diverse range of other cities worldwide, indicating the global nature of startups and entrepreneurship.

Industry Analysis

The chart reveals a substantial dominance of B2B (Business-to-Business) Software & Services. This sector is known for providing software and services to help businesses operate more efficiently or automate internal functions. Its strong representation indicates a prevalent trend of digitization and automation in business operations.

Consumer Goods & Services represents the second most common industry. These companies are typically focused on producing goods and services for the general public or end consumers.

Education, Health & Wellness, Financial Services & Insurance, and Real Estate sectors are also represented, although to a lesser extent. This distribution signifies the diversity of sectors in which startups operate.

The most common tag is “B2B,” which stands for Business-to-Business. This indicates that a large number of startups in this dataset are focused on providing products or services to other businesses rather than directly to consumers. This aligns with the earlier observation that B2B Software & Services is the most common industry among these companies.

“SaaS” (Software as a Service) is another common tag. SaaS companies provide software products on a subscription basis and typically deliver them over the internet. The prominence of this tag suggests that many startups are adopting the SaaS business model, which can provide predictable revenue streams and scalability.

“Artificial Intelligence” is also a frequently occurring tag, reflecting the growing impact of AI in various sectors. Other common tags like “Analytics,” “API,” and “Data” suggest a strong focus on leveraging data and technology to drive business value.
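Because each company can carry several tags, counting them means splitting a multi-valued field before tallying. The sketch below shows one way to do that; the company names, the `tags` field, and its comma-separated format are hypothetical, since the actual layout of Rebel's dataset is not described here.

```python
from collections import Counter

# Hypothetical rows: each company carries a comma-separated list of tags.
companies = [
    {"name": "Acme", "tags": "B2B, SaaS, Artificial Intelligence"},
    {"name": "Birch", "tags": "B2B, Analytics"},
    {"name": "Cedar", "tags": "SaaS, API, B2B"},
]

def tag_counts(rows):
    """Count how many companies carry each tag (each tag once per company)."""
    counts = Counter()
    for row in rows:
        # A set guards against the same tag being listed twice for one company.
        tags = {t.strip() for t in row["tags"].split(",") if t.strip()}
        counts.update(tags)
    return counts

counts = tag_counts(companies)
for tag, n in counts.most_common():
    share = n / len(companies)
    print(f"{tag}: {n} companies ({share:.0%})")
```

Note that the shares can sum to more than 100%, since one company contributes to every tag it carries.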

Previous Founder Acquisitions & Funding

The bar chart displays the distribution of the number of acquisitions by the founders of the companies in the dataset.

Most of the founders have not had a previous company that was acquired, as indicated by the high count for zero acquisitions. This suggests that for many founders, the current company may be their first venture, or they haven’t had a previous company that reached an acquisition exit.

A smaller number of founders have had one or more of their previous companies acquired. These founders bring valuable experience from successful exits, which may benefit their current venture.

However, it’s worth noting that the lack of prior acquisitions does not necessarily reflect on a founder’s abilities or potential for success. There are various paths to success in entrepreneurship, and different founders may have different goals for their companies.

The histogram illustrates the distribution of the total funding raised by the founders in their roles as co-founders of previous companies.

The chart shows that a large number of founders have not raised significant funding in their previous ventures as co-founders, as indicated by the high count near the zero mark. This suggests that for many of these founders, raising substantial venture funding might be a new experience.

However, there are also founders who have raised significant amounts in the past. These founders bring valuable fundraising experience to their current ventures, which can be beneficial when seeking capital.

The descriptive statistics provide further insights:

  • The mean (average) amount raised is approximately $6.04 million, but this figure is heavily skewed by a few founders who have raised exceptionally high amounts.
  • The standard deviation is large, indicating a wide spread in the amounts raised.
  • The median (50th percentile) is zero, meaning that at least half of the founders raised no funding in their previous roles as co-founders.
  • The maximum amount raised is approximately $322.65 million, showing that at least one founder in the dataset has had a very successful fundraising experience in the past.
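The gap between a mean of several million dollars and a median of zero is typical of heavily right-skewed data. The sketch below reproduces that pattern with Python's `statistics` module; the figures are made up for illustration and are not Rebel's data.

```python
import statistics

# Hypothetical prior-funding totals in $ millions (not Rebel's data).
# Most founders raised nothing; a few raised a lot.
prior_funding = [0, 0, 0, 0, 0, 0.5, 2, 10, 40]

mean = statistics.mean(prior_funding)
median = statistics.median(prior_funding)
stdev = statistics.stdev(prior_funding)  # sample standard deviation
maximum = max(prior_funding)

print(f"mean:   ${mean:.2f}M")
print(f"median: ${median:.2f}M")
print(f"stdev:  ${stdev:.2f}M")
print(f"max:    ${maximum:.2f}M")
```

Here a single $40M outlier accounts for most of the mean, while the median stays at zero because more than half of the values are zero. That is why the median is usually the better summary of a "typical" founder in distributions like this one.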

Valuations & Round Sizes

The histogram illustrates the distribution of the Investment SAFE Cap in the dataset, that is, the valuation cap on each company’s SAFE, used here as a proxy for the company’s valuation.

The chart shows a wide range of valuations, with a considerable number of companies valued around $20 million. This suggests that many of these companies are at a stage where they have demonstrated significant potential or growth, leading to higher valuations.

The descriptive statistics provide further insights:

  • The mean (average) valuation is approximately $19.42 million, slightly below the median, which suggests it is pulled down by companies with lower valuations.
  • The standard deviation is $5.97 million, roughly 31% of the mean, indicating a moderate spread in the valuations.
  • The median (50th percentile) is $20 million, meaning that half of the companies have a valuation at or below this amount.
  • The maximum valuation is $40 million, showing that at least one company in the dataset has achieved a high valuation.

It’s important to note that valuations can be influenced by many factors, including the company’s growth prospects, market conditions, competitive landscape, and negotiation dynamics during fundraising.

The round-size chart shows that most companies have raised amounts in the lower range, with a peak around $1.5 million to $2 million. This suggests that a large number of these companies are probably in their early stages, where funding rounds tend to be smaller.

However, there are also companies that have raised larger amounts in their current round, indicating they might be at a later stage or have significant capital requirements.
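A peak like the one described above is found by grouping round sizes into fixed-width bins and counting each bin. Here is a minimal sketch of that binning; the round sizes and the $500,000 bin width are assumptions chosen for illustration, not Rebel's data or the binning used in the chart.

```python
from collections import Counter

# Hypothetical round sizes in dollars (not Rebel's data).
round_sizes = [
    1_200_000, 1_500_000, 1_600_000, 1_750_000,
    1_900_000, 2_000_000, 2_400_000, 4_000_000,
]

BIN_WIDTH = 500_000  # assumed bin width

def histogram(values, bin_width):
    """Map each bin's lower edge to the count of values in [edge, edge + width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

bins = histogram(round_sizes, BIN_WIDTH)
peak = max(bins, key=bins.get)  # lower edge of the fullest bin

# Print a simple text histogram, one row per non-empty bin.
for edge, count in bins.items():
    label = f"${edge / 1e6:.1f}M-${(edge + BIN_WIDTH) / 1e6:.1f}M"
    print(f"{label:>14} {'#' * count}")
print(f"peak bin starts at ${peak / 1e6:.1f}M")
```

The choice of bin width matters: narrower bins can split one peak into several, and wider bins can hide it, so it is worth trying a few widths before reading much into the shape.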

The descriptive statistics provide further insights:

  • The mean (average) round size is approximately $1.93 million, above the median, which indicates it is pulled up by the companies that have raised larger amounts.
  • The standard deviation is $702,412, roughly 36% of the mean, indicating a moderate spread in the round sizes.
  • The median (50th percentile) is $1.69 million, meaning that half of the companies have a current round size at or below this amount.
  • The maximum round size is $4 million, showing that at least one company in the dataset has had a significant fundraising round.

In conclusion, the round sizes vary considerably, reflecting the diverse stages and capital needs of the companies.

Team Size

The most common team size is 2, as indicated by the highest bar in the chart. This suggests that many startups in this dataset have been founded by pairs of co-founders. This aligns with a common pattern in the startup world, where two co-founders often bring complementary skills and perspectives.

Teams of 3 are also relatively common. Larger team sizes (4 and above) are less frequent, suggesting that startups typically start with a small core team.

Very few companies have team sizes of 10 or more. Larger teams could indicate that the company is at a later stage or operates in a sector that requires a larger workforce.

It’s important to note that team size can be influenced by various factors, such as the stage of the company, the nature of the business, and the availability of resources. While having a larger team can bring more skills and resources, it also comes with challenges such as increased communication overhead and higher burn rate.

Founder Education & Work Experience

The pie chart illustrates the distribution of the founders’ advanced degrees across all companies in the dataset, taking into account that each company may have multiple founders with different degrees.

A significant proportion of degrees are non-MBA master’s degrees, as represented by the largest slice of the pie. This indicates that a considerable number of founders have pursued advanced education in specialized fields other than business administration.

The second most common category is MBA degrees. This suggests that a notable number of founders have a broad education in various aspects of business, such as marketing, management, and finance.

PhD degrees make up the third largest slice of the pie. A PhD degree suggests a high level of expertise in a specific field and indicates that a substantial number of founders have engaged in significant research work.

The “Other” category includes all other types of degrees, such as JD, MD, and multiple non-MBA master’s degrees. These degrees suggest a high level of expertise in their respective fields and a breadth of knowledge across different disciplines.

The first bar chart displays the distribution of the average ranks of the founders’ top schools. The rank indicates how predictive the school is of success as a YC startup founder, according to Rebel Fund’s analysis.

The chart reveals a wide range of school ranks among the founders. A significant number of founders have attended schools ranked 1, indicating that these schools have a strong track record of producing successful YC startup founders. Other notable ranks where a considerable number of founders have attended schools are 4, 6, 2, and 14.

However, the distribution is quite dispersed, with founders having attended schools across various ranks. This suggests that while attending a top-ranked school may provide certain advantages, such as a strong network or prestige, it’s not the sole determinant of success as a YC startup founder.

The second bar chart displays the distribution of the average ranks of the founders’ top employers. Similar to the school ranks, the employer rank indicates how predictive the employer is of success as a YC startup founder.

The chart shows that a significant number of founders have worked at employers ranked 9, 5, 6, and 1. This suggests that these employers have a history of employees who went on to become successful YC startup founders.

However, as with school ranks, the distribution is quite dispersed, with founders having worked at employers across various ranks. This indicates that while experience at a top-ranked employer can be beneficial, it’s not the only factor influencing success as a YC startup founder.

Conclusion

In conclusion, the analysis of the Y Combinator Winter 2023 batch provides several valuable insights into the evolving startup landscape. We see a strong emphasis on AI and data-driven solutions across diverse sectors, with many startups leveraging cutting-edge technologies to create innovative products and services. The trend of B2B Software & Services and SaaS business models reflects the ongoing digital transformation in business operations and the increasing demand for scalable, subscription-based software products.

Geographically, North America, particularly the United States, continues to be a major hub for startups. However, the presence of companies from various regions worldwide, notably South Asia, underscores the global nature of entrepreneurship.

The founders of these companies come from diverse educational backgrounds, with a significant number holding advanced degrees in specialized fields. They have attended a wide range of schools and worked at various employers, highlighting that there are many paths to becoming a successful startup founder.

Most of the companies have small core teams, which is typical for startups in their early stages. The size of the funding rounds and the valuations of these companies are also quite diverse, reflecting the different stages and capital needs of these startups.

It’s important to remember that this analysis only provides a snapshot of the YC Winter 2023 batch based on the data points we’ve explored. The startup journey is complex and multifaceted, and success can come in many forms. As always, behind each data point is a team of dedicated founders who are working hard to bring their visions to life. We look forward to seeing how these companies will grow and evolve in the future.

Jared Heyman

Tech guy and investor. Founder at Rebel Fund and previously Pioneer Fund, CrowdMed (YC W13), Infosurv & Intengo (acq. LON: NFC). Ex-Bain consultant. Data nerd.