Unstable Response Times

I deployed a Remix prototype that uses a SaaS API. I am on a paid plan and using Smart Placement. Below are the response times I see after a fresh deployment; I refreshed the page 26 times. Response time varies between 288ms and 1.10s. The first two, at over 2 seconds, are understandable: the first one had an empty cache and Smart Placement used AMS, and on the second one Smart Placement changed to FRA and settled on FRA for the remaining requests. Why can't we have stable response times? I am sure it is not the SaaS API; on my local machine I get 60ms to 80ms response times. Deployment ID: 30d04e5d-2fd0-4f4d-b866-ca98e9aad893
[image attachment]
81 Replies
st
st6d ago
anyone?
Nob
Nob6d ago
same, I get between 100 and 600ms response times for a nextjs project, depending on the time of day and seemingly at random
st
st6d ago
hmm, at least this behavior is framework agnostic 😄
Nob
Nob6d ago
I think servers running workers are overloaded so sometimes you can get 600ms when it's busy and 100ms when it's free
Chaika
Chaika6d ago
Workers have a flat deployment, the same server which receives your request runs the worker, there's no load balancer or anything in between, so unlikely
Smart Placement is.. interesting and has had some issues in the past. You'd have to look at the cf-placement header and location of your origin to try to tell more, and whatever else your app is doing
Nob
Nob6d ago
yes but why at 5am i get 100ms response time and at 6pm i get 600ms response time and i don't use smart placement
st
st6d ago
as I said, it settles on FRA which is aligned with my SAAS
Nob
Nob6d ago
atm I get 500ms response time while this morning it was around 100ms
Chaika
Chaika6d ago
either the origin you are connecting to, or an issue with nextjs. Nextjs's support on Pages has been an ongoing challenge for a while, Vercel doesn't want to help out unlike most of the other frameworks. If it's not smart placement you should make your own thread though. I would check your functions metrics (cpu time, etc)
Nob
Nob6d ago
[image attachment]
Chaika
Chaika6d ago
If you're in Germany/using DTAG that has its own set of challenges currently with free routing
ouch, look at that p99
Nob
Nob6d ago
i don't really know why, i don't do much things but atm i can't get any request under 400ms response time
st
st6d ago
this is from the same page, all I do is refresh, so it's doing exactly the same thing in every request
Chaika
Chaika6d ago
Can you make your own thread? It seems your issue might be unrelated, and it's hard to keep both stories straight when you're not even using smart placement
Nob
Nob6d ago
okay
Chaika
Chaika6d ago
you're using smart placement, and you're in Germany/using dtag or not?
st
st6d ago
no, i'm in turkiye
Chaika
Chaika6d ago
turkey has its own set of issues with ip blocking but probably not going on here. You said even the slow ~1s requests are smart routed? Can you check what the cf-placement and cf-ray headers on one of those are?
st
st6d ago
cf-placement: local-FRA
cf-ray: 89c922d18c552bf1-FRA
Chaika
Chaika6d ago
looks like smart placement isn't doing anything and you're being routed directly to fra
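The reading of those two headers can be sketched in code. This is a minimal sketch, assuming the `cf-placement` value has the form `local-XXX` or `remote-XXX` and that the `cf-ray` value ends in the code of the data center that received the request; the function names `parseCfRay`, `parseCfPlacement`, and `wasMoved` are hypothetical helpers, not part of any Cloudflare API.

```javascript
// Split a cf-ray value such as "89c922d18c552bf1-FRA" into its ID and the
// code of the data center that received the request.
function parseCfRay(value) {
  const idx = value.lastIndexOf("-");
  if (idx === -1) return { id: value, colo: null };
  return { id: value.slice(0, idx), colo: value.slice(idx + 1) };
}

// Split a cf-placement value such as "local-FRA" or "remote-AMS".
// Assumption: "local" means the Worker ran where the request arrived,
// "remote" means Smart Placement ran it somewhere else.
function parseCfPlacement(value) {
  const [mode, colo = null] = value.split("-");
  return { mode, colo };
}

// True when the placement header says the Worker was run in another location.
function wasMoved(placementValue) {
  return parseCfPlacement(placementValue).mode === "remote";
}

console.log(parseCfRay("89c922d18c552bf1-FRA"));
console.log(parseCfPlacement("local-FRA"), wasMoved("local-FRA"));
```

With the values posted above, the placement mode is `local` and both headers name FRA, which matches the conclusion that the request was served directly in FRA.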
st
st6d ago
what do you mean, if I disable smart placement I'll probably be routed through IST?
Chaika
Chaika6d ago
no, wouldn't change that
also probably wouldn't help latency
curious that you said before you were being routed to ams. Well doesn't matter too much, if you go to your functions metrics, what do you see for request latency? You should see smart routed vs non
st
st6d ago
[image attachment]
st
st6d ago
unfortunately I don't see a Request duration chart
Doc says "The request duration chart is currently only available when your Worker has Smart Placement enabled." this was the main reason I switched to Smart Placement...
Chaika
Chaika6d ago
well it turns out you do have the exact same problem
unrelated to subrequests or smart routing, your cpu time is way too high. Something's taking a ton of time to execute, cpu time does not include network wait time. In the past it's been silly things like huge svgs that have to render or other expensive operations, it's going to depend on your framework/libs a bit
for reference, free gets 10ms of cpu time and it's plenty. I have a nextjs app deployed on Pages and its p99 is ~17.8ms
st
st6d ago
i am on a paid plan because in free I got exceeded errors after 3-4 requests but again
Chaika
Chaika6d ago
you found the wrong solution to the problem
st
st6d ago
this is the same request doing the same api call rendering the same html
Chaika
Chaika6d ago
and they're all pretty long, none under ~300-400ms or so
the way cpu limits work is that each isolate/instance of your worker running on a metal gets a ton of startup time, and if you're going way over you're going to burn through it, which is why it took a few requests to fail
st
st6d ago
i am measuring the time in my loader(this is a remix app), from the entry to the return, it's 12ms for this request,
[image attachment]
st
st6d ago
there are no api/backend calls after loader finishes, just react rendering
Chaika
Chaika6d ago
if you're using performance.now() or Date.now(), the clock doesn't advance while your code runs, it only moves after network i/o
Chaika
Chaika6d ago
Cloudflare Docs
Performance and timers · Cloudflare Workers docs
Measure timing, performance, and timing of subrequests and other operations.
Chaika
Chaika6d ago
ex:
var oldtime = Date.now()
// 500ms of cpu
var newtime = Date.now()
// newtime and oldtime are going to be equal
that's only true for deployed workers though, local wrangler will advance normally
st
st6d ago
i know that, i have at least 2 api calls cached through kv
cache miss loaders take around ~600ms and as far as I know waiting for an api response does not count toward the cpu time limit
Chaika
Chaika6d ago
oh, you were saying before it was only 12ms for a cache hit of everything, and it's 600ms uncached?
st
st6d ago
yes, but that's just the measurement of the loader function, not the overall response
and the pages cache evicts too quickly, i think it's around 60 seconds or so
Chaika
Chaika6d ago
well either way if it's taking ~600ms that's way too long
pages doesn't have cache, which do you mean?
st
st6d ago
cache api
Chaika
Chaika6d ago
oh, you can pass in a cache-control max age but it may still be evicted faster, and cache is per location. Either way it shouldn't be taking ~600ms of cpu time to run it uncached
st
st6d ago
that's not the execution time, it's waiting for the api response, I am not calculating anything for 600ms
Nob
Nob6d ago
@st try stopping requesting the worker for 5 minutes then test again
Chaika
Chaika6d ago
the chart you showed above is ~600ms p90 and ~150ms p75
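For readers unfamiliar with the p75/p90/p99 figures quoted from the metrics chart, here is a minimal sketch of how such percentiles can be computed from a list of timings. It uses the nearest-rank method, which is an assumption; the dashboard may calculate its percentiles differently. The `percentile` function and the sample values are made up for illustration and are not data from this thread.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical timings in milliseconds (illustrative only).
const timingsMs = [100, 120, 150, 300, 600, 110, 130, 140, 160, 1000];

for (const p of [50, 75, 90, 99]) {
  console.log(`p${p}: ${percentile(timingsMs, p)}ms`);
}
```

A p90 far above the p75 means a minority of requests are much slower than the rest, which is the pattern being pointed out here.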
st
st6d ago
yes, because I am the only user/developer
Chaika
Chaika6d ago
no, your site just shouldn't ever be that slow/that much cpu usage, uncached or whatever
well, in my opinion and the way workers are priced at least, that is, it's just too slow
if you think it's the api call that's slowing it down, you could make a worker with smart routing that purely calls it, isolate each component
st
st6d ago
loader is the only place I can measure time
i don't know what the cost of render is on pages, but on my local machine it's around 100ms including the API response
Nob
Nob6d ago
cloudflare workers have too many issues and we can't even know why because there is no logs 😐
st
st6d ago
I don't need to wait 5 minutes, 60 seconds is enough
Chaika
Chaika6d ago
workers are fine, there's request logs too. Under each deployment you can tail it and see all requests
Nob
Nob6d ago
yes but you don't see anything useful to debug performance
you don't even see the cpu time for a single request
Chaika
Chaika6d ago
yea, a big part of that is Spectre and speculative execution mitigations
you can still time each api/kv call and such though, and try calling from a worker, etc
st
st6d ago
so, same page takes 100ms in my local environment, let's add 200ms latency to that, i need to see a constant ~300 ms response time
Chaika
Chaika6d ago
the bigger issue is just frameworks being a black box and doing so many things you don't know of. CF could help in that though if not for the spectre mitigations by giving timings of each function and such
Nob
Nob6d ago
yh but when i time that my api calls are taking 30ms but my request time is 600ms i don't understand :NotLikeThis:
Chaika
Chaika6d ago
it's the cpu usage or other requests (if you have any)
Nob
Nob6d ago
yh cpuTime make no sense
st
st6d ago
but as you see that is not the case, and even workers claim that it takes around a second
Nob
Nob6d ago
maybe nextjs have memory leak
Chaika
Chaika6d ago
memory shouldn't cost that though, it'd error out at going over 128 mb
Nob
Nob6d ago
i had 600ms response time 5 minutes ago now i have 200ms changing nothing just waiting
Chaika
Chaika6d ago
well something's changing internally lol
I think next-on-pages does take advantage of cache api internally
Nob
Nob6d ago
thats the cpu time that changed so i don't understand its same page
Chaika
Chaika6d ago
if it was caching rendering or page chunks
st
st6d ago
yes, but we are not talking about caches, we are talking about the cpu time
Chaika
Chaika6d ago
I just can't help much with the framework specific stuff other than saying add logs around everything you can think of and test locally (keeping in mind cpu differences), you can see from a standalone worker the fetch doesn't take that long to your origin, it's a mix of cpu time + what the fetch actually takes and round-trip time
st
st6d ago
i showed you, 12 ms using cache, but server response is ~1 second
one request later it's again 12ms but response is ~300ms
the SAAS API is algolia, there is no way that their response times will fluctuate that much
Chaika
Chaika6d ago
you mentioned having other net operations like kv in each call -- I would time all of them separately and log them, it'll give you some more insight. You could also use a separate worker to call algolia like I suggested before to see relative latency/how much it should be, and doing so would remove it being maybe cf/a worker from the mix (or point to it)
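Timing each network operation separately could look roughly like the sketch below. It is only an illustration: `timed` is a hypothetical helper, and the labels and the KV binding mentioned afterwards are assumptions rather than anything from the app discussed here. As noted earlier in the thread, clocks in a deployed Worker only advance after I/O, so a wrapper like this captures time spent waiting on the awaited call, not CPU time.

```javascript
// Run an async operation and record how long the awaited call took.
// `now` is injectable so the helper can be exercised with a fake clock.
async function timed(label, fn, timings, now = Date.now) {
  const start = now();
  try {
    return await fn();
  } finally {
    // Record the duration even when the operation throws.
    timings.push({ label, ms: now() - start });
  }
}

// Example with a stand-in operation that resolves after a short delay.
async function demo() {
  const timings = [];
  await timed("fake-api", () => new Promise((r) => setTimeout(r, 20)), timings);
  console.log(timings);
}
demo();
```

In a loader this might be used as `await timed("kv:get", () => env.MY_KV.get(key), timings)`, where `MY_KV` and `key` are placeholders, followed by a single `console.log(JSON.stringify(timings))` so the per-call numbers show up when tailing the deployment.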
st
st6d ago
hmm, i think it is not possible, because I am using their react component which handles the api calls internally
Chaika
Chaika6d ago
well you could at least do the second part to try to debug that and confirm (or not confirm) that it's not workers/smart routing alone causing it
st
st6d ago
i can try but I have no clue
do you mean I should create a naked js function and deploy it to workers?
Chaika
Chaika6d ago
either an actual worker or a plain pages project using nothing but normal functions
workers are really easy to use, just normal fetch interface, ex:
export default {
  async fetch(request, env, ctx) {
    return fetch("https://google.com", {
      headers: {
        "cookie": "cookie"
      }
    })
  },
};
st
st6d ago
i can proxy the algolia requests to a worker i guess, but won't this be the same as Subrequests?
Subrequests: "Requests triggered by calling fetch from within your Functions."
Chaika
Chaika6d ago
I was saying use a worker (or a pages function) to do a subrequest to algolia, same as your actual deployed app would. By doing so, you're isolating the subrequest out, and you can see the latency of just that subrequest and see if it varies or not.
not make your app proxy the request or anything
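One way to picture that isolation step is a small probe that repeats a single subrequest and records how long each attempt takes. This is a minimal sketch: `probe` is a hypothetical helper, the fetch function is passed in so the logic can be tried outside a Worker, and the exact Algolia request to reproduce is not known from this thread and is left to the reader.

```javascript
// Call the same URL `count` times in sequence and record each duration.
// `fetchFn` is any function returning a promise; `now` is an injectable clock.
async function probe(fetchFn, url, count, now = Date.now) {
  const samples = [];
  for (let i = 0; i < count; i++) {
    const start = now();
    const res = await fetchFn(url);
    // Read the body when there is one, so the timing covers the full response.
    if (res && typeof res.arrayBuffer === "function") await res.arrayBuffer();
    samples.push(now() - start);
  }
  return { samples, min: Math.min(...samples), max: Math.max(...samples) };
}

// Stand-in fetch for trying the helper locally; it just waits 10ms.
const fakeFetch = () => new Promise((resolve) => setTimeout(() => resolve(null), 10));
probe(fakeFetch, "https://example.invalid/", 3).then((r) => console.log(r));
```

Inside a Worker's `fetch` handler this could be wired up as `return Response.json(await probe((u) => fetch(u), url, 5))`. If the spread between `min` and `max` stays small there, the variation seen in the full app is more likely coming from another layer.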
st
st6d ago
algolia's react library makes the api calls internally https://www.algolia.com/doc/guides/building-search-ui/what-is-instantsearch/react/
Chaika
Chaika6d ago
and you're ssring it and thus calling that api internally?
st
st6d ago
that's right
Chaika
Chaika6d ago
well to test it purely you'd need to find out the exact request it's doing (if it supports client-side, could maybe use that)
otherwise you've just got too many pieces in play, like trying to figure out if the center of your cake is a lemon without pulling it apart
st
st6d ago
i can do that, but i need ssr for seo, thanks anyway
Chaika
Chaika6d ago
sure, I mean just for testing, figuring out the exact request it makes, reproducing it cleanly in a worker or function (separate from your app), and then you can see the latency of it directly
you figure out if that subrequest is slow or not, and then you look at the rest of the layers. (ex: is the lib making too many requests? is react doing something? etc)
st
st6d ago
by the way, i did deploy the same app using next.js, it's much worse 😄
Nob
Nob6d ago
🥲