Production Build is continuously failing
I'm writing about our deployments. Over the past day they have been failing continuously with exceeded-memory-limit errors, despite my attempts to adjust our build parameters.
It's puzzling that even our most minor CSS changes trigger these failures, especially when everything worked seamlessly just a few days ago. Has there been a change in the underlying framework or system?
If this situation persists, would you suggest using the wrangler tool to manually upload to a preview branch?
I should also mention that previous messages on this subject have gone unanswered; I hope this one does not suffer the same fate.
A prompt explanation would be much appreciated.
Warm regards,
Hi @Logan - there have been some changes with how the container security layer handles memory, but they ought to be reverted now. Do you have a deployment ID I can look at?
@JohnDotAwesome I believe it is 76401000-11b2-4c77-93bd-644552d03350
that is one of many of course.
I have tried tweaking all of the configuration options, like max-old-space-size=3600, which is the lowest we can go without our build failing with HEAP_ALLOC errors.
I have also tried gc_interval=100 and --optimize_for_size
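For reference, here is a minimal sketch of how flags like these are typically passed to a Node-based Pages build; the entry point build.js and the values shown are illustrative assumptions, not the project's actual configuration:
```sh
# --max-old-space-size (MB) is on the NODE_OPTIONS allowlist, so it can be set
# as a build environment variable.
export NODE_OPTIONS="--max-old-space-size=3600"

# V8-only flags such as --gc-interval and --optimize-for-size are generally not
# accepted via NODE_OPTIONS, so they usually have to be passed to node directly
# in the build command (build.js is a hypothetical entry point).
node --gc-interval=100 --optimize-for-size build.js
```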
and another that failed again just a few moments ago - aa0d634a-7933-46f0-9af5-1a8999c14543
We have seen this happen quite often on our preview deployments, to be honest, but never on the production branch. Either way, both need to be resolved or we will be forced to move away from Cloudflare Pages.
Appreciate the info. I'm going to be digging into it today and Monday.
One last thing, has this always happened or did it start recently? My hunch is that it started on the 13th? If so, we have some rolling back to do
Exactly 7 days ago, which is the 13th. 🙂
Roger that. Standby
We did see some of this on preview builds before that, but a couple of retries fixed it. My assumption was that fewer resources were provided for preview vs. production builds
so please note that as well
The resources available are the same, so that would have been a fluke
I see, okay. Thanks for the information. I guess we just never ran into it on the master branch until now. But I can confirm from our build history that the build failures got worse after the 13th, so yes
We also don't have auto-push on the master branch - we always hit "retry" for manual builds
Long story short, we upgraded our build cluster. Most builds just got faster, but it seems a small percentage of builds started seeing memory and network issues
got you
Thanks for the information, that helps. Praying we don't need any emergency pushes today or over the weekend.
I'm going to see about pulling you out of the new cluster to see if that helps
okay great thank you
FWIW, you can still deploy locally or from github actions via wrangler
I know that's not ideal, but I just wanted to make sure you knew it was an option
okay great. Is there a way to test that out on a preview branch or only to production directly via wrangler?
any docs would be helpful as we have never used wrangler on our end.
both preview and production work. One sec I'll get docs
thanks so much
Commands - Wrangler · Cloudflare Workers docs
Wrangler offers a number of commands to manage your Cloudflare Workers.
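For anyone following along, a rough sketch of the manual upload flow with Wrangler (v3-style commands; older versions used `wrangler pages publish`). The project name my-site, output directory ./dist, and branch names below are assumptions:
```sh
# Authenticate once (or set CLOUDFLARE_API_TOKEN in CI instead).
npx wrangler login

# Preview deployment: any non-production branch name creates a preview.
npx wrangler pages deploy ./dist --project-name=my-site --branch=qa

# Production deployment: use the project's production branch name (here, master).
npx wrangler pages deploy ./dist --project-name=my-site --branch=master
```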
awesome.
I will get familiarized quickly with this and start testing on our dev/qa branches so I am prepared just in case
let me know how it goes with moving us back
🙏 It's getting close to the end of my day here, so I may not get back to you until Monday on how moving you off the new cluster is going. Do let me know how using wrangler goes. I can help out there more quickly
Okay, so to clarify: is there any chance we will see it today, or should I just count on Monday?
Count on Monday. I'm looking into it and it may not be easy to bring you back into the old cluster. That being said, let me know if you have any issues with wrangler.
Thank you!
Looking to do this in the next couple of hours ^
Awesome thanks @JohnDotAwesome
Hi there @Logan, I've just reverted your project. You may find that build initialization takes a little longer at first, but that will clear up after a bit
If you can run a test build, that would be wonderful
Sounds good. I will be able to test in a few hours. Will let you know how it goes
Thank you for your patience, and I really appreciate you letting us know about your issue
trying now on a preview branch
preview branch worked! we are not planning a production push just yet, but the minute we do, I will update this thread.
just waiting on our release team for that.
@JohnDotAwesome - did we somehow get switched back to the new cluster? Our preview deployments are constantly failing again.
As of a few days ago, everyone has been rolled back to the old cluster. Let me take a looksie
I'm seeing errors in your build log like:
Is that it?
Oh interesting. I was just seeing the same issue with memory. Let me triple check with my team on that, thank you
John, thanks for that. For some reason the logs that my team was downloading were not reflecting that, but I went in myself, took a look, and it is now showing. Thanks for the visibility.
EDIT: Things are still working as expected
Cooool so just to confirm, y'all are back in a good state? Anything I can do to help if not?
I think things are good @JohnDotAwesome - we are still having memory failures on our preview deployments, but after about 3-4 rebuilds it works - which is exactly the same behavior as before the new cluster.
not ideal, but we have a workaround.
We still need to verify production builds, which I think may happen next week. I will keep you up to date on that
Sounds good. On the memory issues, let me know if I can help. It may just be a matter of tweaking your
NODE_OPTIONS
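For example, a hedged guess at what such a tweak could look like as a Pages build environment variable; the exact value is illustrative and depends on the memory available to the build container:
```sh
# Illustrative only: raise the V8 old-generation heap cap (in MB) for the build.
NODE_OPTIONS="--max-old-space-size=4096"
```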