Production builds are continuously failing

I'm writing about our deployments. Over the past day they have been failing repeatedly, terminated for exceeding the memory limit, despite my attempts to adjust our build parameters. It's puzzling that our most minor CSS changes could cause this, especially when everything worked fine just a few days ago. Has there been a change in the underlying framework or build system? If this situation can't be resolved quickly, would you suggest using wrangler to manually upload to a preview branch? I should also mention that previous messages on this subject have gone unanswered; I hope this one does not suffer the same fate. A prompt explanation would be much appreciated. Warm regards,
JohnDotAwesome · 15mo ago
Hi @Logan - there have been some changes with how the container security layer handles memory, but they ought to be reverted now. Do you have a deployment ID I can look at?
Logan (OP) · 15mo ago
@JohnDotAwesome I believe it is 76401000-11b2-4c77-93bd-644552d03350 - that is one of many, of course. I have tried to mess with all of the configuration options, like max-old-space-size=3600, which is the lowest we can go without our build's HEAP_ALLOC failing. I have also tried gc_interval=100 and --optimize_for_size, and another build failed again just a few moments ago - aa0d634a-7933-46f0-9af5-1a8999c14543. We have seen this happen quite often on our preview deployments, to be honest, but never on the production branch. Either way, they both need to be resolved or we will be forced to move away from Cloudflare Pages.
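For reference, a minimal sketch of how flags like these are typically wired into a build, using the values mentioned above; the npm script name and the ng entry point are assumptions, not this project's actual config:

```bash
# --max-old-space-size is on the NODE_OPTIONS allow-list, so it can be set as an
# environment variable and picked up by every node process the build spawns:
NODE_OPTIONS="--max-old-space-size=3600" npm run build

# --gc-interval and --optimize-for-size are plain V8 flags that NODE_OPTIONS does not
# accept, so they would have to be passed to node directly (build entry point assumed):
node --max-old-space-size=3600 --gc-interval=100 --optimize-for-size \
  ./node_modules/.bin/ng build
```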
JohnDotAwesome · 15mo ago
Appreciate the info. I'm going to be digging into it today and Monday. One last thing, has this always happened or did it start recently? My hunch is that it started on the 13th? If so, we have some rolling back to do
Logan (OP) · 15mo ago
Exactly 7 days ago, which is the 13th. 🙂
JohnDotAwesome · 15mo ago
Roger that. Standby
Logan (OP) · 15mo ago
We did see some of this on preview builds before that, but a couple of retries fixed it. My assumption was that fewer resources were provided for preview vs. production builds, so please note that as well.
JohnDotAwesome · 15mo ago
The resources available are the same, so that would have been a fluke
Logan (OP) · 15mo ago
I see, okay. Thanks for the information. I guess we just never ran into it on the master branch until now, but I can confirm from our build history that the failures got worse after the 13th, so yes. We also don't have auto-push on the master branch - we always "retry" to build manually.
JohnDotAwesome · 15mo ago
Long story short, we upgraded our build cluster. Most builds just got faster, but it seems a small percentage of builds started seeing memory and network issues.
Logan (OP) · 15mo ago
Got you, thanks for the information, that helps. Praying we don't need any emergency pushes today or over the weekend.
JohnDotAwesome · 15mo ago
I'm going to see about pulling you out of the new cluster to see if that helps
Logan (OP) · 15mo ago
okay great thank you
JohnDotAwesome · 15mo ago
FWIW, you can still deploy locally or from GitHub Actions via wrangler. I know that's not ideal, but I just wanted to make sure you knew it was an option.
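For anyone finding this thread later, a rough sketch of what that looks like; the project name and output directory are placeholders, not taken from this project:

```bash
# Build locally or in CI, then upload the static output with wrangler
# (older wrangler versions call this subcommand "pages publish").
# In GitHub Actions you would also set CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID.
npx wrangler pages deploy ./dist --project-name=my-project
```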
Logan (OP) · 15mo ago
Okay, great. Is there a way to test that out on a preview branch, or only against production directly via wrangler? Any docs would be helpful, as we have never used wrangler on our end.
JohnDotAwesome · 15mo ago
Both preview and production work. One sec, I'll get docs.
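Roughly what the two cases look like, as a sketch while waiting on the docs; project and branch names are placeholders:

```bash
# Passing any branch other than the project's production branch makes this a
# preview deployment:
npx wrangler pages deploy ./dist --project-name=my-project --branch=qa

# Passing the production branch name (or omitting --branch on that branch's checkout)
# creates a production deployment instead:
npx wrangler pages deploy ./dist --project-name=my-project --branch=master
```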
Logan (OP) · 15mo ago
thanks so much
Logan (OP) · 15mo ago
Awesome. I will get familiar with this quickly and start testing on our dev/qa branches so I am prepared, just in case. Let me know how it goes with moving us back.
JohnDotAwesome · 15mo ago
🙏 It's getting close to the end of my day here, so I may not get back to you until Monday on how moving you off the new cluster is going. Do let me know how using wrangler goes. I can help out there more quickly
Logan (OP) · 15mo ago
Okay, so to clarify, is there any chance we will see it today, or should I just count on Monday?
JohnDotAwesome · 15mo ago
Count on Monday. I'm looking into it and it may not be easy to bring you back into the old cluster. That being said, let me know if you have any issues with wrangler.
Logan (OP) · 15mo ago
Thank you!
JohnDotAwesome · 15mo ago
Looking to do this in the next couple of hours ^
Logan (OP) · 15mo ago
Awesome thanks @JohnDotAwesome
JohnDotAwesome · 15mo ago
Hi there @Logan, I've just reverted your project. You may find that build initialization is a little longer at first, but that will clear up after a little bit. If you can run a test build, that would be wonderful.
Logan (OP) · 15mo ago
Sounds good. I will be able to test in a few hours. Will let you know how it goes.
JohnDotAwesome · 15mo ago
Thank you for your patience, and I really appreciate you letting us know about your issue.
Logan (OP) · 15mo ago
Trying now on a preview branch. The preview branch worked! We are not planning a production push just yet, but the minute we do, I will update this thread. Just waiting on our release team for that. @JohnDotAwesome - did we somehow get switched back to the new cluster? Our preview deployments are constantly failing again.
JohnDotAwesome · 15mo ago
As of a few days ago, everyone has been rolled back to the old cluster. Let me take a looksie. I'm seeing errors in your build log like:
Error: libs/nxtg-shared/src/profile/components/experience-dialog/experience-dialog.component.html:37:23 - error NG8007: The property and event halves of the two-way binding 'formControl' are not bound to the same target.
Error: libs/nxtg-shared/src/profile/components/experience-dialog/experience-dialog.component.html:37:23 - error NG8007: The property and event halves of the two-way binding 'formControl' are not bound to the same target.
Is that it?
Logan (OP) · 15mo ago
Oh, interesting. I was just seeing the same issue with memory. Let me triple-check with my team on that. Thank you, John, thanks for that. For some reason the logs that were being downloaded by my team were not reflecting that; I went in myself, took a look, and it is now showing that error. Thanks for the visibility. EDIT: Things are still working as expected.
JohnDotAwesome · 15mo ago
Cooool so just to confirm, y'all are back in a good state? Anything I can do to help if not?
Logan (OP) · 15mo ago
I think things are good @JohnDotAwesome - we are still having memory failures on our preview deployments, but after about 3-4 rebuilds it works, which was exactly the same behavior as before the new cluster. Not ideal, but we have a workaround. We still need to verify production builds, which I think may happen next week. I will keep you up to date on that.
JohnDotAwesome · 15mo ago
Sounds good. On the memory issues, let me know if I can help. It may just be a matter of tweaking your NODE_OPTIONS
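A minimal example of that kind of tweak, assuming the heap cap is set as a build environment variable on the Pages project; the 3072 MB figure and the npm script name are illustrative, not known-good values for this project:

```bash
# Illustrative only: cap the Node heap below the build container's memory limit so
# V8 garbage-collects harder (or fails inside Node) instead of the build being OOM-killed.
# Set NODE_OPTIONS as a project environment variable, or inline in the build command:
NODE_OPTIONS="--max-old-space-size=3072" npm run build
```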