Looking for AI models for my SaaS
I'm looking for a fast, lightweight, and free (or very cheap) AI model for the first stage of my SaaS, which is similar to Notion or Obsidian. The AI should quickly format and organize programming notes, making them clean and structured without delay.
I don’t need anything super smart—just something fast, efficient, and low-cost. It should work instantly so developers can take notes without interruption.
I’m open to free, self-hosted models or cheap API alternatives. If there’s a way to define rules for formatting, that would be a bonus.
What are the best budget-friendly options for this?
9 Replies
Gemma 3 4B - API, Providers, Stats
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.
I came across this model
Seems good
I don’t know your exact use case, but you could also check out much smaller models (like R1 1.5B, Llama 3.2 1B, or the Gemma you linked, but in the 1B size). You could even bundle one into your app, the way Craft is doing it.
Imagine you're a dev and you want to document your work, or you're studying something, but you want to write very fast: you can leave spelling errors and write everything in plain text, without worrying about headings or bullet lists. Then, when you're done writing (or whenever you feel like it), you click the AI button, and it reads your whole doc and finds the clearest, most understandable way to organize it: adding headings and bullet lists, wrapping code blocks, maybe even improving how things are explained.
The app will also have some sort of quick-note system, and in the background I want the AI to go through all the notes (and maybe the docs too) and create connections between them, like the Obsidian graph.
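For the connections part, one common approach (not tied to any model mentioned in this thread, just a sketch) is to embed each note and link the pairs whose embeddings are similar enough. A minimal Python sketch, assuming the sentence-transformers package, the all-MiniLM-L6-v2 model, and an arbitrary 0.6 similarity cutoff:

```python
# Sketch: link notes whose embedding similarity passes a threshold.
# The model name and the 0.6 cutoff are placeholders to tune on real notes.
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

notes = {
    "sorting.md": "Quicksort partitions around a pivot, O(n log n) on average.",
    "react-hooks.md": "useEffect runs after render; clean up subscriptions in the return.",
    "big-o.md": "Big-O notation describes how runtime grows with input size.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")
titles = list(notes)
embeddings = model.encode([notes[t] for t in titles], normalize_embeddings=True)

edges = []
for (i, a), (j, b) in combinations(enumerate(titles), 2):
    score = float(util.cos_sim(embeddings[i], embeddings[j]))
    if score > 0.6:  # tune this cutoff for your notes
        edges.append((a, b, score))

print(edges)  # pairs above the cutoff become the edges of the graph view
```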
Now, I know it's possible to give the model certain instructions so that its output stays consistent. But can it also learn from the user's "context", as if it were their personal assistant?
Then, besides the embedding, try some local 1B models and see if they're enough for you. And if people don't want local models, you can still offer e.g. Gemma via an API.
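For the "AI button" itself, here's a minimal sketch of what a call to a local 1B model could look like, assuming Gemma 3 1B served through Ollama and its Python client (the model tag and the rules are placeholders). The formatting rules live in a system prompt that's re-sent on every call; the model doesn't learn the user's context on its own, so any per-user preferences have to be injected into that prompt as well:

```python
# Sketch of the "AI button": send the raw note plus formatting rules to a
# local 1B model served by Ollama. Model tag and rules are placeholders.
import ollama

FORMATTING_RULES = """You reformat programming notes.
- Keep the author's wording; only fix structure and obvious typos.
- Add headings and bullet lists where they help.
- Wrap code snippets in fenced code blocks with a language tag.
- Return Markdown only, no commentary."""

def format_note(raw_note: str, user_context: str = "") -> str:
    # Per-user "context" (preferred style, project names, etc.) is not learned
    # by the model; it has to be included in the prompt on every call.
    response = ollama.chat(
        model="gemma3:1b",
        messages=[
            {"role": "system", "content": FORMATTING_RULES + "\n" + user_context},
            {"role": "user", "content": raw_note},
        ],
    )
    return response["message"]["content"]

print(format_note("quicksort pick pivot partition then recurse left/right avg nlogn"))
```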
After all, as you’re looking for a budget-friendly option, this is probably your best bet as it’s … free 😄
Yeah, I tested Gemma 1B locally and it works pretty well. Maybe I can fine-tune it via Google AI Studio?
I'd also check how e.g. Craft is doing it. I bet there are Obsidian plugins for using models too (I remember something called Copilot).
I don’t think you need to fine-tune anything, tbh.
Hm, Craft seems very cool.
I wouldn't recommend self-hosting. It works, but hosted LLM APIs have gotten pretty cheap recently, so it's usually not worth the effort.
Google's Gemini 2.0 Flash gives you 15 requests per minute for free, which should be enough during development, and it costs $0.40 per 1M tokens, which is equivalent to roughly 8 books' worth of notes. And it does structured output well.
From your explanation it doesn't seem like you need a lot of expensive calls, since you're just sending data and getting formatted JSON output back, so the tokens used don't compound with each message the way they do in chat apps.
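A minimal sketch of what that single call per "format this doc" click could look like, assuming the google-generativeai Python package; the model name, JSON shape, and note text are placeholders rather than anything confirmed in this thread:

```python
# Sketch of the API route: one call per "format this doc" click, JSON out.
# Assumes the google-generativeai package; schema and note text are placeholders.
import json

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-2.0-flash",
    system_instruction=(
        "Reformat the programming note you receive. Respond with JSON: "
        '{"title": str, "markdown": str, "tags": [str]}'
    ),
    generation_config={"response_mime_type": "application/json"},
)

raw_note = "react useeffect cleanup return fn runs on unmount, deps array!!"
result = json.loads(model.generate_content(raw_note).text)
print(result["markdown"])
```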