Hi, using llama2 from a cloudflare worker using the `ai.run` binding, and finding that the responses I get back get cut off after < 300 tokens. Is there a way to set the token limit for a response to something higher than whatever it's set to?
A silly example, to illustrate, where I ask for a recipe for potatoes au gratin with bubble gum syrup, gets cut off midway through the instructions...
If I take that response into a llama token counter:
https://huggingface.co/spaces/Xanthius/llama-token-counter
It's only 259 tokens but cut off, and my understanding is that llama2 is supposed to have a context window of 4096 tokens, so there should be no reason it couldn't have finished the instructions.
When using the OpenAI APIs we can pass a `max_tokens` argument to the chat completions API. I don't see an equivalent in Workers AI. Is this something that you might add?
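For anyone hitting the same wall, here's a minimal sketch of the setup being described; the model name is an assumption, and the commented-out `max_tokens` only marks where an OpenAI-style limit would go if Workers AI exposed one (per this thread, it didn't at the time):

```ts
// Minimal sketch; the model name is an assumption, and max_tokens is
// shown only to mark where an OpenAI-style limit would go if supported.
export interface Env {
  AI: any; // the Workers AI binding configured in wrangler.toml
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run('@cf/meta/llama-2-7b-chat-int8', {
      prompt: 'A recipe for potatoes au gratin with bubble gum syrup, please.',
      // max_tokens: 1024, // hypothetical: the knob this question asks for
    });
    return Response.json(result); // response text arrives truncated around 256 tokens
  },
};
```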
Realized this should probably have been posted in workers-ai-beta, sorry
I see... 256 output tokens is not going to be enough for many use-cases.
I'm sure they'll increase these in time; it's still very early.
Yeah, I read the top part of the limits doc before I started playing around with this, but I didn't get down to the bottom of it. My bad.
Thanks for your help!
By the way, since you've used OpenAI before, how do you feel about the pricing of Cloudflare AI? I'm new to all this, and it seems pretty good to me, but maybe there's a catch.
I haven't really been able to wrap my head fully around it yet 😀 I think a lot depends on how many LLM tokens = 1 neuron, and how much someone really needs the "Fast Twitch Neurons" vs the regular ones, because the fast ones are 12x the price (assuming you're able to control whether you use fast twitch or not???)
You'll be able to control it, but did I misread the blog post? I thought they're only 25% more. The token count is probably the current maximum.
Regular Twitch Neurons (RTN) - running wherever there's capacity at $0.01 / 1k neurons
Fast Twitch Neurons (FTN) - running at nearest user location at $0.125 / 1k neurons
Right, missed that 😅
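For the record, plugging the two quoted rates into a quick calculation confirms the gap; the neuron count below is just an illustrative input, only the $/1k prices come from the post:

```ts
// Sanity check on the quoted pricing; neuron usage is a made-up example.
const RTN_PER_1K = 0.01;  // Regular Twitch Neurons, $ per 1k neurons
const FTN_PER_1K = 0.125; // Fast Twitch Neurons, $ per 1k neurons

const neurons = 100_000;  // hypothetical monthly usage
console.log((neurons / 1000) * RTN_PER_1K); // 1    -> $1.00 on RTN
console.log((neurons / 1000) * FTN_PER_1K); // 12.5 -> $12.50 on FTN
console.log(FTN_PER_1K / RTN_PER_1K);       // 12.5 -> FTN is 12.5x RTN, not 25% more
```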
@ian.b.taylor https://socket.dev/npm/package/llama-tokenizer-js I'm about to personally try it out, but haven't yet
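For reference, the package README suggests usage along these lines; a sketch, not verified in this thread, so check the docs before relying on it:

```ts
// Counting LLaMA tokens client-side with llama-tokenizer-js
// (API as described in the package README; untested here).
import llamaTokenizer from 'llama-tokenizer-js';

const text = 'Potatoes au gratin with bubble gum syrup';
const tokens = llamaTokenizer.encode(text);  // array of LLaMA token ids
console.log(tokens.length);                  // token count for the text
console.log(llamaTokenizer.decode(tokens));  // decodes the ids back to text
```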