Hosted inference is expensive. No two ways about it. GPT-4 and Claude 2 are amazing, but if you are a developer using them heavily in production, the bills can stack up quickly. Plenty of startups are trying to help developers manage those costs, but I’ve come to believe that local inference is going to be an important dimension of the coming bloom of AI-powered applications. There are tons of smaller, task-specific open-source models that can get the job done running locally on my MacBook Pro, iPad, and at some point my phone. And the beauty of running inference locally is that it’s free. At some point, the venture dollars and free cloud credits companies rely upon to subsidize users’ inference costs will run out, and when they do, apps are going to (and already have started to) pass those costs along to the end user.
As a consumer, I’ve become accustomed to the vast majority of my applications being free, so how does that persist when expensive inference is at the center of all the new apps I want to use? Well…if I download an app, and whatever relevant LLM it relies upon, to my own device, that’s a way to maintain the costless status quo to which I’ve become accustomed. Oh, and btw, I don’t mind that all the personal data I pass to this app stays on my machine rather than joining the app developer’s training set.
Downloading and configuring LLMs to run locally on my devices is an extreme pain in the butt, and honestly too much to ask of me or any regular user, but I’ve been using an app called Faraday for the past few months which does all of that work for me. With Faraday, I download a desktop application to my MacBook and they handle all the LLM complexity for me. On its surface, Faraday is an app where I can design, discover, and chat with AI-powered characters and assistants. The UX is really good. There’s a super expressive prompting layer where I can define all the attributes of the AI I want to talk to. But then there’s another dimension where I can browse different underlying LLMs that, when paired with the character prompt I create, result in very different conversations and personalities. The same character I create may act and speak very differently if I choose to run it on a 13B-parameter fine-tune called Chronos Hermes vs. a 7B-parameter alternative called Luna.

I’m personally interested enough in the underlying models to tinker with different options, but the cool part is that for those who aren’t, the community does that work for them. People create the perfect character prompts on the perfect model and upload them together, as a package, to a marketplace called the Character Hub for others to download and use. The result is a really diverse set of combinations (model and prompt) that to me feel like mini applications. I browse the Character Hub and I feel like I’m moving through an app store of cool apps that others have designed, and I get to download them and interact without paying a dime.
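To make the packaging idea concrete, here is a minimal sketch of what a "character prompt + underlying model" bundle might look like as a data structure. These names (`CharacterPackage`, `build_prompt`, the model identifiers) are my own illustrative assumptions, not Faraday’s actual code or API:

```python
from dataclasses import dataclass

# Hypothetical sketch only: none of these names come from Faraday itself.
@dataclass
class CharacterPackage:
    name: str
    character_prompt: str  # the expressive prompting layer
    model: str             # identifier for a locally downloaded LLM

    def build_prompt(self, user_message: str) -> str:
        # The full text that would be fed to the local model. The same
        # character_prompt paired with a different `model` can yield a
        # very different conversation.
        return f"{self.character_prompt}\n\nUser: {user_message}\nAssistant:"

# Two "mini applications": identical character, different underlying model.
pirate_hermes = CharacterPackage(
    "Pirate", "You are a gruff pirate captain.", "chronos-hermes-13b")
pirate_luna = CharacterPackage(
    "Pirate", "You are a gruff pirate captain.", "luna-7b")
```

The point of the sketch is just that the prompt and the model travel together as one shareable unit, which is what makes a marketplace of combinations possible.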
Faraday is architecturally very different from most of the AI apps I’ve seen, but I think they are on to something, and I expect more developers to follow their lead…who knows…maybe they’ll even open up and let third-party app developers build on their infrastructure and platform. Admittedly I’m biased as an investor in the company, but there’s something nascent and special happening at faraday.dev if you want to check it out.