Part 2 of “Generative AI is … Not Enough?”
Over a decade ago, the phrase “software is eating the world” described how software was rapidly becoming the center of many industries beyond the technology sector. The leading book retailers, video service providers, music companies, entertainment companies, and even movie production companies were essentially software companies.
That trend is still going strong.
It’s often useful to think of AI as an extension of software, one that gives it new and improved capabilities. In that sense, developments in AI are likely to accelerate the rate at which software proliferates. AI also unlocks capabilities that were not previously possible.
As new software capabilities pave the way for new products, it’s reasonable to ask: How does this change the value game? If the proliferation of software tipped the scales from Barnes & Noble to Amazon, and from Blockbuster to Netflix, then what will AI do in the marketplace? Is the value in the models? Is it in the data? Where are the moats in this new regime?
The first article in this series, What's the big deal with Generative AI? Is it the future or the present? (which covers points 1-4), laid out useful perspectives on generative AI. In this article, we share observations on where value lies in the AI technology stack and focus on where some of the technical moats might be.
5) Maps and Landscapes of AI Technology and Value Stacks
By now, a number of generative AI landscape figures have been published by different analysts and investors. They are generally useful for understanding the lay of the land of an up-and-coming industry and how different players compare to each other.
Generative AI Landscape plots from Antler, Sequoia Capital, and NfX that contextualize Generative AI startups and capabilities
Personally, I find more value in breaking companies down by tech stack layer (e.g., application/infrastructure) rather than by data modality (e.g., text/images). These stack plots differentiate between companies that sell directly to users (the application layer) and the platforms they rely on. So, a natural place to start is an AI tech stack with these three layers:
The three layers of Application, Models, and Cloud Platform are a reasonable starting point for the tech stack of an AI product
Split the model layer to differentiate between proprietary and open-source models (and factor in that Midjourney has no API that developers can use to build apps on top of it), and you arrive at the generative AI tech stack by a16z.
The Generative AI Tech Stack plot from Who Owns the Generative AI Platform? provides more detail on the different types of models and how they're served
It’s useful to add a couple more components to this graph.
The first is that models derive their value from the data they’re trained on. So, there’s a need to factor in data and machine learning operations (MLOps) as a layer supporting the models. See this Data and MLOps landscape plot for details on these two areas and their players.
The Models layer relies on Data and MLOps technologies that have their own emerging and evolving business models
This addition makes the landscape inclusive of companies like Scale, Surge, and Snorkel. The data layer is also where Shutterstock would live as a data provider for training DALL-E (before subsequently also becoming an application for distributing DALL-E-created images).
Don’t Forget About The Business Moats
While our figure now captures the major software pieces, it’s essential to consider business factors that can help differentiate or boost the adoption of a product beyond its mere software components. One great example is how Lensa AI's existing distribution base (and engaging influencers) helped usage explode, resulting in a reported $8M of revenue in December 2022. On the text side, Jasper’s growth engine managed to propel it to a reported 2022 revenue of $75 million. Writer points to its domain expertise in style guides and brand tone-of-voice as differentiators against the many AI writing assistants.
6) Enterprises: Plan Not for One, but Thousands of AI Touchpoints in Your Systems
If you’re building an ML strategy for a company, it’s worth thinking of the “Models” layer as not being limited to just one or a few models. Just as software is used across all of a company’s functions (IT, HR, sales, marketing, and so on), count on AI to provide value for the majority of functions where software is used.
A good example of accelerated adoption is this curve from a 2018 Google presentation. It shows the growing number of internal Google projects that use a deep learning model, reaching about seven thousand projects at the end of 2017.
In an AI-first company, AI usage can rapidly proliferate to thousands of use cases in a few years. [source]
Where does this trend stand today? In Google’s Plan to Catch ChatGPT Is to Stuff AI Into Everything, Bloomberg reports: “A new internal directive requires ‘generative artificial intelligence’ to be incorporated into all of its biggest products within months.”
Several forces drive an expectation like this, for example:
- While we tend to think of AI as a standalone component, a more useful perspective is to think of it as simply an extension of software that empowers it to tackle more complex problems. So, wherever software lives, we’ll continue to find areas where AI can improve those systems.
- Your first model will rarely solve the problem completely. There’s always a need to iterate across multiple models until one can properly be used in production.
Notice that AI touchpoints here don’t necessarily mean models. A single model can empower multiple use cases. A text generation model, for example, can tackle different use cases by changing the text prompt. Likewise, a single text embedding model can power neural search as well as text classification and sentiment analysis, as the sketch below illustrates.
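Here’s a minimal sketch of that idea, assuming the open-source sentence-transformers library and a toy dataset; the model name and examples are illustrative placeholders rather than a recommendation of a specific stack.

```python
# A minimal sketch of one embedding model powering two use cases:
# semantic (neural) search and sentiment classification.
# Model name and toy data are illustrative placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # one shared embedding model

# Use case 1: neural search over a small document collection.
docs = ["How to reset my password", "Quarterly revenue report", "VPN setup guide"]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
query_vec = embedder.encode(["I forgot my login credentials"], normalize_embeddings=True)
scores = doc_vecs @ query_vec.T            # cosine similarity (vectors are normalized)
print("Top search result:", docs[int(np.argmax(scores))])

# Use case 2: sentiment classification with a lightweight classifier
# trained on top of the very same embeddings.
train_texts = ["Great service!", "Terrible experience", "Loved it", "Very disappointing"]
train_labels = [1, 0, 1, 0]                # 1 = positive, 0 = negative
clf = LogisticRegression().fit(embedder.encode(train_texts), train_labels)
print("Predicted sentiment:", clf.predict(embedder.encode(["The support team was wonderful"])))
```

The same embedding vectors feed both use cases; only the thin layer on top (a similarity lookup vs. a classifier) changes.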
A company’s tech stack would be completely different if it aimed to utilize ten models vs. a thousand models. Therefore, in the value equation, we need to account for one of the major components of the current deep learning revolution powering generative AI: the fine-tuned custom model.
7) Account for the Many Descendants and Iterations of a Foundation Model
Enter the Fine-Tuned Model
If you were building a text generation model ten years ago, you’d most likely have had to train it from scratch over the course of months. One of the central developments in AI is that we now have pre-trained foundation models that are good at a large number of tasks (say, language tasks) and can then be trained a little bit more (a process known as fine-tuning) on a much smaller dataset to become excellent at one specific task.
Fine-tuning matters to the economic value map because it allows businesses to build proprietary custom models, even if the original model was publicly accessible or even open source.
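As a concrete illustration, here’s a minimal fine-tuning sketch using the open-source Hugging Face Transformers library; the base model, the tiny in-memory dataset, and the sentiment task are stand-in assumptions, not the specific setup of any product mentioned here.

```python
# A minimal sketch of fine-tuning: start from a pre-trained foundation model and
# train it "a little bit more" on a small, task-specific dataset.
# Base model, dataset, and task below are illustrative placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "distilbert-base-uncased"        # general-purpose pre-trained model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# A (toy) proprietary dataset -- in practice this would be your own labeled data.
data = Dataset.from_dict({
    "text": ["I love this product", "This was a waste of money"],
    "label": [1, 0],
})
tokenized = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="custom-model", num_train_epochs=3),
    train_dataset=tokenized,
)
trainer.train()                               # the fine-tuning step
trainer.save_model("custom-model")            # your proprietary custom model
```

The heavy lifting (months of pre-training) is already baked into the base model; the fine-tuning run above only adapts it to the narrower task.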
If You’re on the Application Layer, Consider Sinking Your Claws into the Model Layer with Fine-Tuned Models
If you’re building a product in the application layer, fine-tuned custom models can differentiate your product at the Models layer. A rapid boost can be achieved here by using a managed language model provider that makes fine-tuning a model as easy as uploading a single file (a sketch of what such a file can look like follows below). This setup makes it convenient to experiment with tens or hundreds of custom models.
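To make that concrete, here is a minimal sketch of what such a training file often looks like: one JSON object per line pairing a prompt with its desired completion. The file format, field names, and the final upload call are assumptions for illustration; the exact schema and SDK call depend on the provider.

```python
# A minimal sketch of the "single file" many managed providers accept for fine-tuning:
# one JSON object per line pairing a prompt with its desired completion.
# Field names and the upload call are illustrative assumptions, not a specific provider's API.
import json

examples = [
    {"prompt": "Summarize: The Q3 meeting covered revenue targets and hiring plans...",
     "completion": "Q3 revenue targets were reviewed and hiring plans approved."},
    {"prompt": "Summarize: The support ticket describes a recurring login failure...",
     "completion": "A user reports repeated login failures after the latest update."},
]

with open("finetune_data_v1.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Hypothetical provider call -- replace with your provider's actual SDK method:
# client.finetunes.create(training_file="finetune_data_v1.jsonl", base_model="base-generator")
```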
Fine-tuning factors into the value equation of generative AI when you consider that a product like Lensa AI likely fine-tunes a base Stable Diffusion model for each paying user (speculation). Another example: when AI was used to write the script of an episode of the Stargate sci-fi series, twelve fine-tuned models were needed to capture the tone and style of each character.
Generation, Usage, and Feedback Data are Valuable for Future Versions of the Model
Deploying an AI product is not the final step. On the contrary, it’s merely the first step in a new and vital process: collecting new data to improve the model and the user experience. Read more about this in the People + AI Guidebook pattern called Let users give feedback. In user interfaces, a simple version of this can look like Grammarly’s feedback options attached to each model suggestion.
Collecting feedback data will add to the pool of proprietary data that can differentiate your product.
Another form of feedback is collecting human preference data to optimize models through the process now commonly referred to as RLHF (Reinforcement Learning from Human Feedback). The sketch below shows the shape such preference data often takes.
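For illustration, a single preference record often looks something like this; the field names are generic assumptions rather than any specific framework’s schema.

```python
# A sketch of the shape human preference data often takes for RLHF-style training:
# for a given prompt, annotators indicate which of two generations they prefer.
# Field names are generic illustrations, not a specific framework's schema.
preference_record = {
    "prompt": "Write a friendly reminder email about tomorrow's 10am meeting.",
    "chosen": "Hi team, just a quick reminder that we're meeting tomorrow at 10am. See you there!",
    "rejected": "MEETING TOMORROW AT 10. ATTENDANCE MANDATORY.",
}

# Many such records are aggregated into a dataset used to train a reward model,
# which in turn guides the optimization of the generation model.
preference_dataset = [preference_record]  # ...plus many more collected from users and annotators
```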
Your product’s usage data can yield valuable training data with the help of users and data annotators. One such process is described in the figure below. It’s one version of this model (and data) iteration cycle:
Release a prototype and study its usage
1) Put your application in front of users. Optionally, the application can be powered by a custom model that’s been fine-tuned using v1 of your proprietary data.
2) Collect user interaction with your application.
3) Examine user prompts and source high-quality generations for those prompts.
Sourcing high-quality generations is an entire topic of its own. Both human labelers and models can be used in pipelines to supply those completions. Glossing over this process for now: what happens after getting this data?
Using what you learn, take it to the next level
The following two steps are:
4) Add these new prompts and generations to your dataset to create v2 of the dataset.
5) Create a new model using this new dataset (a sketch of these two steps follows below).
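Here’s a minimal sketch of steps 4 and 5, assuming the JSONL format from the earlier fine-tuning file sketch; the file names and the final fine-tune call are illustrative assumptions.

```python
# A minimal sketch of steps 4 and 5: merge newly curated prompt/generation pairs into
# the existing dataset to produce v2, then launch a new fine-tuning run on it.
# File names and the provider call are illustrative assumptions.
import json

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

dataset_v1 = load_jsonl("finetune_data_v1.jsonl")    # existing proprietary data
curated = load_jsonl("curated_generations.jsonl")     # high-quality pairs sourced from usage

# Step 4: create v2 of the dataset (skipping prompts we already have).
seen_prompts = {example["prompt"] for example in dataset_v1}
dataset_v2 = dataset_v1 + [ex for ex in curated if ex["prompt"] not in seen_prompts]

with open("finetune_data_v2.jsonl", "w") as f:
    for example in dataset_v2:
        f.write(json.dumps(example) + "\n")

# Step 5: fine-tune the next model version on the expanded dataset
# (hypothetical provider call, as in the earlier sketch):
# client.finetunes.create(training_file="finetune_data_v2.jsonl", base_model="base-generator")
```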
Another useful byproduct of a deployed model is its generations themselves, which can be collected and made public to aid other users.
8) Model Usage Datasets Allow Collective Exploration of a Model’s Generative Space
While it may not be the one generative AI moat that everybody is looking for, a public gallery of a model’s previous generations is emerging as an essential part of economic value around image generation models.
Midjourney is a great example of this. The free trial allows users a certain number of generations and has been wildly successful for the company. All the images these users generate are viewable in public Discord chat rooms, as well as on midjourney.com. Even if you pay for the service, both the Basic and Standard plans still make your generated images public on the website. Only the Pro plan allows what the company calls Stealth Mode.
A large and diverse gallery of generated images vastly improves the user experience of these services by allowing a user to quickly zoom in on the kind of results they’re after. A lot of the time, these galleries expose you to ideas you may find even better than the ones you had in mind, letting you quickly develop a concept by looking at different sources of inspiration.
Another example of building a product on public galleries of a model’s generations is Lexica.art, which quickly became one of the leading galleries for images generated by the Stable Diffusion model, along with the prompts used to create them.
Pockets of Moats for Application Layer Players
Let’s now bring all these points together for one final visual of where pockets of competitive advantage may be present for players on the application layer.
Are these pockets the most important for an AI business? Not necessarily. Business moats are often a larger factor than technological moats.
What do you think? We’d love to hear your thoughts on this topic as it’s a rapidly developing field. Join Cohere’s community Discord and follow @CohereAI on Twitter to learn when the next article is published.
Acknowledgments
Thanks to Aaron Brindle, Ivan Zhang, Luis Serrano, Nick Frosst, Rajan Sheth, Ryan Shannon, and Sally Vedros for feedback on earlier versions of this article.