While clinical applications require a rigorous approach to safety, rapid advances against "human benchmarks" (e.g., medical licensing exams) suggest the problem is shifting from science to engineering. We believe these applications, from the administrative to the clinical, are likely to benefit from domain-specific fine-tuning.
Your Personal Seller
Small businesses today list their products across a variety of platforms (e.g., Amazon, eBay, Shopify). A host of consultancies and digital asset managers help them manage their digital presence across all of these channels, and some attempt to help SMBs understand their customers, optimize their advertising spend, and market and price their products more efficiently. There's a lot of potential improvement left on the table in the form of product photography, descriptions, richer metadata generation, or more innovative ways to present products.
Foundation models enable the at-scale generation of alternative product listings, and therefore also A/B testing of variants to improve conversion and potentially optimize for different audiences. With some improvement in technology, we believe it will also be possible to generate videos and other assets that convert better than anything manually generated today. By crawling marketplaces and experimenting with listings, a selling assistant might even be able to provide recommendations for new products to build or untapped audiences.
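To make the A/B testing step concrete, here is a minimal sketch of how a selling assistant might decide whether a generated listing variant really converts better than the original. It uses a standard two-proportion z-test; the function name and the visit and conversion counts are hypothetical, and a real system would also need to handle repeated testing across many variants.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for H0: variants A and B convert at the same rate.

    conv_* are conversion counts and n_* are visitor counts (illustrative inputs).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (every visit converted, or none did): nothing to test.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical traffic: original listing (A) vs. generated listing (B).
z, p = two_proportion_z_test(conv_a=100, n_a=2000, conv_b=130, n_b=2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference in conversion is unlikely to be noise; how small is small enough is a product decision, not something this sketch settles.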
AI Therapist & Coach
One of the emergent use cases with the initial wave of LLMs has been a mix of therapy and personal coaching. What should you say to the friend who just lost a loved one? How do you have a hard conversation with your boss? How should you approach that person you're interested in?
General-purpose LLMs do a passable job at this today, but aren't perfect. Security and privacy are key - you probably want to have a WhatsApp-like commitment to privacy with disappearing messages and expiring history, but you also want the product to have high-level recollection of prior conversations. Safety is even more important for sensitive conversations. And from a brand and positioning perspective, you'd also have to normalize getting advice from a machine vs. a trained human.
A generally useful strategy for building new products is to observe emergent behaviors in horizontal technologies (like chatbots) and then build a specialized experience for those behaviors - this makes us optimistic that there is space for a dedicated solution. The space also likely supports a mix of B2C and B2B2C business models (e.g., employer-provided therapy or coaching services).
Automated Root Cause Analysis
If you've ever had the honor (read: misfortune) of being placed on on-call rotation, you've experienced the dreaded 2 am PagerDuty alert. Still groggy, you pulled up a dashboard with a bunch of failing services, poorly written logs, and angry messages from your manager. If you were lucky, you had a pretty good idea of what the issue was and a previously written set of steps to resolve it, and you could be back in bed within the hour. If you were unlucky, you'd spend hours debugging in the middle of the night.
We think there's an opportunity for automated root cause analysis to substantially improve incident resolution time, and the experience for engineers. A lightweight "agent" with access to logs and metrics can, to start, retrieve relevant information (e.g., service statuses, past error logs, similar prior incidents) and suggest fixes based on what previous resolutions were. After an incident, the same agent could be used to generate a post mortem and provide a "best practice" resolution for future incidents. Long term, agents might even be able to automatically fix common recurring issues.
While agents, broadly, don't (yet) work well, we think this area might be easier to tackle. For one, existing runbooks, which explain steps to resolve common issues, make bootstrapping much easier. More broadly, a lot of debugging is hypothesizing what might have gone wrong and looking for evidence. A basic information-only agent can already help substantially by testing hypotheses; by the time you roll out of bed, your debugging assistant can already tell you "it's not DNS," "all AWS AZs look good," and one day even "it's not a cascading cache invalidation."
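As a rough illustration of the "information-only" agent described above, the sketch below runs a list of hypotheses against a metrics snapshot and reports which ones can already be ruled out. Everything here is an assumption made for illustration: the metric names, the thresholds, and the `Hypothesis` and `triage` names are hypothetical, and a real assistant would pull live data from logging and monitoring systems rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Hypothesis:
    name: str
    # Returns True when the metrics show this cause is NOT the problem.
    is_ruled_out: Callable[[Dict[str, float]], bool]


def triage(metrics: Dict[str, float],
           hypotheses: List[Hypothesis]) -> Tuple[List[str], List[str]]:
    """Split hypotheses into (ruled_out, still_open) given a metrics snapshot."""
    ruled_out, still_open = [], []
    for h in hypotheses:
        (ruled_out if h.is_ruled_out(metrics) else still_open).append(h.name)
    return ruled_out, still_open


# Hypothetical snapshot and checks; thresholds are placeholders, not recommendations.
snapshot = {"dns_error_rate": 0.0, "unhealthy_azs": 0, "cache_miss_rate": 0.92}
checks = [
    Hypothesis("DNS", lambda m: m["dns_error_rate"] < 0.01),
    Hypothesis("AWS AZ outage", lambda m: m["unhealthy_azs"] == 0),
    Hypothesis("cascading cache invalidation", lambda m: m["cache_miss_rate"] < 0.5),
]

ruled_out, still_open = triage(snapshot, checks)
for name in ruled_out:
    print(f"it's not {name}")
print("still worth investigating:", still_open)
```

Existing runbooks are a natural source for the list of hypotheses and their checks, which is part of why bootstrapping seems tractable here.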
End-to-End Legal Outcomes
We were early investors in Harvey, in part because it was clear that despite legal firms historically being minor buyers of technology, LLMs would allow massive automation of such a text-based industry. Continuing down one path from that thesis, we think there's potential to automate certain end-to-end, transactional services provided by law firms.
For example, we think it's possible to build an "Immigration Firm in a Box" -- a system of models (supported by some human help to start out) that ingests your employment & personal data, files your immigration claim, advises you on possible visa options, and can answer questions about status and progress. Clippy for Immigration might even be able to provide more available (and cheaper) opinions than current systems.
Another example might be basic trademark search and filing, or preliminary sales contract markup based on precedent - always frustratingly slow.
We're sure there are lots of variants of this pattern that we're not familiar with yet, but we think this shape of company is really exciting.
High-Consideration Research
There are a bunch of queries that are broken in the current search paradigm. If you want to know what washer/dryer combo to buy or what to do in New York City for three days with kids, you either spend hours reading through dozens of tabs littered with ads and terrible UIs or you just end up on a trusted site like Wirecutter (or worse, you do both).
LLMs are really good at quickly reading and synthesizing hundreds of pages of content on any topic. Is there a good consumer product experience to be built around building a dynamic, interactive, comprehensive query experience for high-intent, high-consideration purchases?
The first challenge will be building a product experience that is enough better than existing search experiences that it develops a cult following. The second will be figuring out distribution. But if you can solve both, this is a very valuable and lucrative problem to solve.
Workforce Tetris
Recruiting and allocating hourly workers across restaurants, retail, field services, and warehousing today often involves a manager posting paper flyers and texting team members about changes, playing human Tetris to fill a shift. Recruiting, employment compliance/admin and logistics require a backend database and scalable workflows, but the next generation of workforce software shouldn’t put that burden on managers or workers. The preferred interface for the field isn’t a clunky mobile workflow app, it’s natural language chat over SMS, and we now have models good enough to engage/screen applicants and (help) beat the game of Tetris.
Next-Generation Autocomplete
Why isn't there a Copilot-like experience for the rest of your computing experience yet? A browser extension that learns your writing style and makes you 10x faster at email and anything else you have to author. The ideal experience would be deeply personalized based on everything you've ever written and could author an entire email with just a couple of words of context.
While it might seem that incumbents have an unassailable distribution advantage here, Grammarly has shown you can build a large, independent business here. Incumbents are also likely to be slow and cautious in launching this, creating space for new entrants.
Speed and privacy will be key, likely necessitating a hybrid local/cloud approach.
The All-Seeing Eye
The physical security monitoring industry remains stuck in the past. Organizations and consumers deploy millions of cameras, but the last generation of companies is still moving storage to the cloud and creating seamless networking gateways (a huge improvement), and penetration of sophisticated computer vision remains minimal.
Hardware and storage should be rethought from the ground up in the age of semantic video understanding, and powerful on-device models. A full-stack security services firm could see more, cost less, and offer a step-function better experience.
Always Pick Up the Phone
Businesses (in particular small businesses) do not answer about half the calls they receive, but inbound calls are often their most important source of leads. Everyone has experienced this.
Use cases range from home services qualification to informational updates, from restaurant reservations to appointment-booking, from order tracking and stock checks to bill collection. These critical customer experiences are widespread, scoped and transactional. Voice generation quality and LLM capability are approaching the ability to handle many transactional calls. What’s missing is the last mile — distribution, customer journey design, guardrails and workflow automation.
Should this be developer infrastructure, horizontal SMB application, or a rethought full stack vertical solution? You tell us.
Developer in a Box
Code generation might be the most obvious area for language models to make a large impact. Beyond being an in-domain problem for AI practitioners, and an obviously valuable mostly-text format, code models also benefit from the rigid structure of code as a language and the ability to leverage compilation & testing checks as a mechanism to provide feedback to models. The work of developers is so valuable (and expensive) that automating or accelerating even small portions of it is incredibly valuable as well.
Empirically, this has been partially true; one of the first AI products to get real traction was GitHub Copilot, and it is still among the most successful today, with over a million developers using it. More recently, ChatGPT has proven to be a useful assistant for writing and editing code. But the list of products with widespread usage roughly stops there; from surveys of engineers in our network, we haven't found any other code development products that have gotten widespread adoption.
A gap that we see in the market that's particularly exciting is the ability to go from a human description of an issue to a draft solution, in code, to the problem. We've, of course, seen some exciting open source projects like AutoPR and GPT Engineer working on this problem, but we believe that there exist some deeper technical challenges that need to be tackled in order to solve this well. Some examples below:
- validation; code generation work at large research labs (e.g., OpenAI, DeepMind) suggests that the ability to validate generations against some set of criteria is incredibly valuable. These validations range from low level (e.g., does this compile?) to file-level context (e.g., do unit tests associated with the changed function still pass?) all the way up to company context (e.g., does this change implement the intended business logic?).
- code base context; determining which sections of the codebase are important for a specific generation remains an unsolved challenge. There have been two primary approaches, each with its own set of deficiencies: a) heuristic-based approaches, which are fast and can capture local task intent but lack codebase & project-level context and b) source graph/AST-based approaches, which can capture a greater degree of code understanding, but are slow, brittle, and still lack project context. In practice, combining the two appears to be a necessity to do global code context understanding well.
We're excited to meet folks who have insights on how to solve these problems well (or believe you don’t need to in order to generate high quality code)!
AI Static Analysis Tools
Over the last five years, an increasingly large slice of security solutions have “shifted left,” born out of a realization that placing security checks at the end of the software development lifecycle results in waste and a larger communication burden. As part of that, tooling that integrates into integration & deployment processes or, ideally, software development itself, has proven to be extremely valuable. Static analysis tools, which automatically look for vulnerabilities and potentially fix them, have been a large part of that.
While extremely useful, the major issue with static tooling so far has been the high rate of false positives. While machines can often flag potential issues, it requires context like code structure, deployment status, and even historical application traffic to determine whether a potential vulnerability is harmless or urgent. In practice, static tooling sometimes has such high false positive rates that engineers tend to ignore its findings entirely.
At larger tech companies (e.g., Google, Microsoft), we've heard of internal tooling that automatically triages and prioritizes issues identified by other systems. We think that language models may generalize well enough to bring this technology to smaller organizations as well. We also believe there are interesting related opportunities, such as building automatic remediation of identified issues and cloud resource provisioning as a result of the independent trend towards infrastructure as code.
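To illustrate what "triage with context" could mean, here is a deliberately simple rule-based stand-in that ranks scanner findings using deployment status and recent traffic. It is not a description of any company's internal tooling; the `Finding` fields, the scoring rule, and the sample data are all hypothetical, and the interesting product question is whether a language model can supply this kind of judgment without hand-written rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    rule: str
    severity: int           # 1 (low) to 5 (critical), as reported by the scanner
    deployed: bool          # is the affected code running in production?
    requests_last_30d: int  # traffic that reached the affected code path


def priority(f: Finding) -> int:
    """Hypothetical score: drop undeployed code, boost paths that see traffic."""
    if not f.deployed:
        return 0
    return f.severity * (2 if f.requests_last_30d > 0 else 1)


def triage_findings(findings: List[Finding]) -> List[Finding]:
    """Return findings worth an engineer's attention, most urgent first."""
    actionable = [f for f in findings if priority(f) > 0]
    return sorted(actionable, key=priority, reverse=True)


findings = [
    Finding("weak-hash", severity=2, deployed=True, requests_last_30d=0),
    Finding("hardcoded-secret", severity=4, deployed=False, requests_last_30d=0),
    Finding("sql-injection", severity=5, deployed=True, requests_last_30d=12000),
]

for f in triage_findings(findings):
    print(priority(f), f.rule)
```

Treating undeployed code as priority zero is only a simplification for this sketch; a real triage system would likely defer such findings rather than discard them.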
AI-Native MMPGs and Social
LLMs can now plan against objectives (poorly) and carry on an engaging conversation - even be Sensible, Specific, Interesting and Factual (SSIF). What would a game world populated by AIs be like? If “The Sims” and the engagement with AI girlfriends, AI celebrities, and AI therapists are any indication, it would be wildly fun.
What if the next generation of entertainment is personalized generations? If one likes to look at pictures of “cats where they shouldn't be,” let's generate them. In an era where one can increasingly produce any media (images, audio, video, memes), mass personalization feels within reach.
AI Video Generation, Editing and Understanding
Video is a major social, informational, educational, and marketing medium, and the fastest growing. Digital video ad spend is projected to rise 17% in 2023 to $55 billion (per IAB). However, production of “commercial” video remains prohibitively difficult and expensive. Short form, simple commercial video can cost $1,000 to $50,000+ to produce from start to finish, and the majority of commercial video is created by agencies and professionals.
Demand dramatically outstrips “supply” of video production. Only ~3,000 brand advertisers globally create video ads, but there are 250M video creation and editing web searches per year in English.
AI will revolutionize and democratize video production, editing, personalization and understanding. Video is a challenging frontier of AI research; it is computationally costly, there's limited input data, we are still figuring out how to ensure temporal consistency, and it deserves new interfaces for control. But the frontier is advancing rapidly, and we're interested in companies that both push that frontier and cleverly leverage these technologies in usable products today: from indexing/semantic understanding, to captioning and translation, to style transfer, to generated backgrounds, avatars, and even product videos from 3D models, there's a treasure trove of technical capability. The product opportunity (to cleverly cross the usefulness chasm with the capabilities we already have) is equally important.
Distributed Team Representation
In every working enterprise organization, there are a number of roles that have infrequent but “large swing” utility. Think of, for example, the role of compliance, legal, or security in most companies. Adding the coordination cost of getting every PRD reviewed by someone on the compliance team often doesn't seem like it's worth the loss of momentum & velocity for the product team, until a months-long effort gets killed late in development by a fundamental compliance failure. In response to similar challenges, as part of the shift left seen in security in the past couple of years, an increasing number of teams have appointed “security champions” whose responsibility is to represent the interests of the security team more broadly.
We think very lightweight “agents” that represent the point of view of organizations, people, or even individual documents can offer a solution to these kinds of challenges. The implementation can range from automatic document editing & commenting (e.g., “consider encryption strategy”), to Slack channels for developers to ask what someone from another organization would think of their approach. These agents would help distribute knowledge across the teams while also freeing up the core legal/compliance/security team to take on larger initiatives and do more focused work.
We think of these agents as the next step from “chat your document” style use cases that capture a more specific enterprise workflow. We're excited to talk to folks working on similar problems or that have unique points of view for where and how to integrate!
Web Content API
Language models benefit a great deal from access to "reliable web data" -- knowledge bases offer explicit checks against hallucination, especially when combined with some research-driven methods of revising (e.g. Gao et al 2022, Peng et al 2023). They also allow for citations to externally verifiable material, which are valuable both to build user trust and also to expand on first answers with reliable source material.
However, current web content APIs lack the flexibility & feature set required to power large scale web applications. Consider, for example, what sets of technology would be required to build a clone of ChatGPT with web browsing. While many startups use SerpAPI (or one of its many competitors), there doesn't exist a web search API that has access to page content, parsed outlinks from the page, or even edit history. This set of features is clearly useful for more expansive language model applications, but, at least at first glance, would also be extremely helpful for many of the personal assistant style applications we can think of.
Another variant of this problem is in systematic crawl and parsing -- today, companies sign one-off contracts to crawl & parse a pre-negotiated set of fields through third party providers. A modern crawl company could offer those sets of data, along with the orchestration to ask arbitrary questions of that dataset (e.g., "on each page that discusses AirPods, what's the sentiment?"). We think this power will be useful not just in e-commerce, but in a wide variety of other use cases, like pharmaceutical companies looking to gather data on side-effect frequency or market research firms assessing the success of a new product launch.
The Tireless (Junior) Financial Analyst
LLMs have the potential to transform financial and accounting software from databases to context-aware, proactive processors. These models could shift the human expert's role from manual “rules engine” to strategic oversight.
The initial success of domain-specific models such as BloombergGPT on financial NLP tasks (such as ConvFinQA), the “code interpreter” approach to increasing accuracy of calculations, as well as early research results of using specialized LLMs for tasks such as transaction classification are all encouraging.
We think this is a technically rich and commercially valuable application area: it requires robust interactions with PDFs and tabular data, increased domain-specific reasoning, and task-specific research and engineering, and there is a definite need for a workflow product beyond the chatbox. From a data perspective, we’re particularly excited that global accounting, tax, financial reporting and compliance standards are all codified in natural language, with corresponding large crawl-able datasets of compliant examples. Some tasks that could be interesting starting points:
- Financial narrative generation
- Semantic search/Q&A against SEC filings, earnings calls, etc.
- Interpretation of documents (invoice line items, terms/conditions, currency conversions, industry-specific nuances) into ledger entries.
- Audit-assist (journal entry testing, financial ratio analysis, anomaly detection of historical fraud patterns)
- Consolidation (automated data extraction from subsidiary ledgers, reconciliation, currency translation)
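As one small example of what "journal entry testing" in the audit-assist item might involve, the sketch below flags journal entries whose debits and credits do not balance. It is a toy illustration under stated assumptions: the `(entry_id, side, amount)` tuple layout and the sample entries are hypothetical, and real ledgers carry accounts, currencies, and dates that this ignores.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def unbalanced_entries(lines: Iterable[Tuple[str, str, str]]) -> Dict[str, Decimal]:
    """Return {entry_id: debits minus credits} for entries that do not net to zero.

    Each line is (entry_id, side, amount) with side "debit" or "credit" and the
    amount given as a string so Decimal can represent it exactly.
    """
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for entry_id, side, amount in lines:
        if side == "debit":
            totals[entry_id] += Decimal(amount)
        elif side == "credit":
            totals[entry_id] -= Decimal(amount)
        else:
            raise ValueError(f"unknown side {side!r} in entry {entry_id}")
    return {eid: total for eid, total in totals.items() if total != 0}


# Hypothetical ledger lines: JE-2 has a transposed credit amount.
ledger = [
    ("JE-1", "debit", "100.00"),
    ("JE-1", "credit", "100.00"),
    ("JE-2", "debit", "250.00"),
    ("JE-2", "credit", "205.00"),
]

print(unbalanced_entries(ledger))
```

Checks like this are the easy, rule-based part; the opportunity described above is in the judgment-heavy steps around them, such as interpreting source documents into entries in the first place.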
Autonomous HR (and IT) Helpdesk
A high volume of HR events leads to end-user communication: new hires, exits, role changes, promotions, location changes, manager changes, and payroll/benefits changes. Large companies have hundreds of folks whose jobs are primarily to notify employees of these events, verify documents, answer questions, and update records in HRIS systems, often under the titles of HR Operations, Talent Support Operations, Talent Systems Coordinators, Employee Support Coordinators, Compliance Coordinators, and HR Service Desk.
Whatever the titles, we think these teams can be 10X more efficient — and deliver a dramatically better, faster employee experience. Over the past decade, companies have built “service catalogs” and “service request forms” to digitize their processes, but these still create too much manual operational burden.
The next “intranet” isn't a portal at all, but is instead a conversational search box that can intelligently retrieve in-context, localized, access-control aware answers from enterprise documentation and systems of record (and then accurately update those records). IT and HR processes are tightly intertwined, but HR is particularly poorly served, and the problem is ever harder for increasingly global/hybrid organizations.
A domain populated with process documentation, ever-changing compliance needs, complex policy application, forms, and natural language communication is ripe for attack by LLMs.
Technical Customer Support
There are many (promising!) startups working on solving customer support problems, beginning with a common set of simpler use cases, like processing returns on e-commerce sites or basic questions about planning on travel sites. We think that this is a large and promising market, but also believe there is a unique and new opportunity to target a more challenging & sophisticated set of "technical customer service" requirements, e.g., issues with MongoDB, Databricks, GitHub, etc.
These issues are currently extremely expensive for companies to deal with, often requiring staffing (multiple!) full time engineers to support or "forward-deployed" roles. And, existing customer support solutions are unlikely to support this workflow; in order to solve for the technical support use case well, we think a startup would likely have to do multiple of the following:
- connect to documentation and keep responses up-to-date with product changes
- integrate with code environment to ingest current customer state
- generate potential code fixes
- run, test, & report on potential code solutions
An early version of this product might serve as a "debugging copilot" for the engineers currently working in that role, and, over time, enable them to spend more of their time actively building and deploying new product as opposed to purely on customer support. We also think it's possible that targeting the top end of this market would lead a startup to build rigorous infra and eval that would enable them to serve the more traditional (i.e. less technical) use cases as well.
We're excited to talk to folks working both on the technical and more general variants of this problem!
Manufacturing Asset Generation
Foundation models have been increasingly multi-modal; we started with text, then image, and now there's a whole host of applications from signal processing to video generation. One class of models that has appeared to be consistently challenging, however, is generated 3D models, specifically with high enough fidelity to be used in precise end applications (e.g., construction, manufacturing).
Beyond simply meeting some set of requirements that a user specifies, these generated 3D assets have demanding precision requirements, sometimes down to the millimeter, must be manufacturable, and often have a complex optimization space. These are often processes that are still difficult for experts, much less AI models that still don't have a real understanding of "physics".
We think this is a problem worth tackling despite this challenge, for a few reasons: first, the vast majority of the revenue of Autodesk, one of the largest CAD players, comes from their engineering & construction division ($1.1 billion in 2021); second, we think it's possible to build assistive tooling that either helps engineers narrow down their design space more efficiently or performs some set of menial tasks for them as a starting point; finally, we're optimistic that combining generative models (e.g., NeRFs, DreamBooth) with work to "clean up", like simulation for validation and other post-processing, can reduce the burden on zero-shot model output.