high cpu load from signalk on cerbo gx
i am having high cpu load for the signalk node process on my cerbo gx. I am trying to disable plugins one by one, but the cpu load stays high. it's on 2.5.0 - is there any way i can easily debug my cpu load per process inside signalk?
SignalK is a very heavy application. It requires a lot of memory and CPU.
I've seen swap usage increase after a few days of running. There may be a memory leak.
for some reason it has been worse since i upgraded the cerbo gx to firmware 3.20, which includes signalk 2.5. I manually upgraded signalk to the newest version 2.7.1 but the cpu load is still high.
What do you have connected to sk? How busy is it, how many deltas per sec?
As you may have done already, disabling all plugins can rule out one problem area
Another would be discarding some not-relevant pgns that have a high update rate
The bottom of Data Fiddler gives you access to the pgns that sources produce. We could add stats there to help in analysis…
i have nmea over ve.can enabled
ais has a high update rate, let me play with that. i might just disable nmea2000 as a whole to test
i am trying to filter out address 43, which is my ais, in the data connection settings. But it doesn't get filtered out. Am i forgetting anything?
Devices on the n2k bus are a more likely culprit than ais
What id do you see in the data item list, 43 or canname?
Ah your ais is n2k…nevertheless
Have you restarted? Can’t remember if that is required for changes to take effect
Ah thanks, it was likely the restart. The data is not showing in the databrowser anymore, so that works. But the CPU load is not decreasing, maybe because signalk still has to process the input to make the decision to even filter it
If I disable n2k as a whole, I go from 50% cpu load (which is almost 80% on one core if i look at core1 vs core2) to 6%. So it's definitely somewhere in n2k. Wondering what the best option is to tune that down. How to best 'disregard' pgns? should i use the filter? doesn't look like the load is going down though
delta / s is around 60 without ais and 100 with
OK, if i filter out ALL n2k addresses (until deltas are at 0/s), the cpu load still stays about the same - it only reduces slightly, so the hypothesis that filtering won't reduce cpu load seems to be true. If i disable n2k altogether it goes down drastically, which is good, but then critical n2k data like speed and wind are not available either, unfortunately
I can just shut down the AIS hardware, but I like the functionality of being a station for marinetraffic.com, hmmmm. Should i consider going with bigger capacity in terms of hardware? Like adding a raspberry pi next to the cerbo, where the raspberry pi can then be dedicated to signalk?
is something else on the cerbo suffering from sk using cpu?
something we can do is have you capture some of your data by turning data logging on for the n2k connection (and then turn logging off!) and share the log file with me; i can take a look at the actual input data
signalk itself is suffering from the cpu usage. it gets very very slow, like opening node-red takes 1 minute instead of 5 seconds
will turn on data logging!
ok, turned it on for a couple of minutes, then used: find /data/conf/signalk -type f -name "*.log"
which one should i choose? 😉 just the raw one? /data/conf/signalk/skserver-raw_2024-04-10T18.log
the raw log. will have a look over the weekend
i guess it's the AIS, lots of vessels around. The boat is in Amsterdam and I think it already picks up hundreds of boats there
you see a lot of entries with PGN 129038 (AIS class A position report) alone already
around 1200 AIS targets tracked per minute, see here: https://www.marinetraffic.com/ais/details/stations/35031 (it's turned off now but you can see historical data)
SignalK by itself works fine with about 800 AIS targets. I checked. However, that was via NMEA 0183.
FreeboardSK, however, isn't.
have you tried disabling node-RED?
Yes, although not much change. Even with all plugins disabled, the cpu load is still high. The plugin that reduces cpu load the most is the udp nmea 0183 one
That sounds really really strange. Are you sure you restarted for these results?
Udp sender cpu consumption should be practically zero
yeah it's very hard to debug per plugin; watching the cpu load i don't see it changing that much. the 'most' is quite relative here, maybe a few percent.
the problem could also be memory rather than cpu load. Although signalk's cpu load is at 60% and system is at 20%, there is still 20% idle left....
maybe there is a memory leak or something
the performance at the start of signalk is also better than after a couple of hours
FWIW (caveat .. i'm not using a cerbo for signalk) … but on RPi/signalk i had some very similar symptoms; in my case it was a combination of the influxdb plugin… writing/saving a lot of data, and grafana.
.. couldn't see from the thread history if you have the influx plugin operating?
.. all was good after a fresh boot, but over time it bogged down and became really sluggish.
(my solution was upgrading the rpi to an rpi4 with more memory, and whitelisting paths for influx)
no influx running
ok, same problem with can0 on the raspberry pi directly. especially when opening the data browser things go wild, as it is trying to load hundreds of boats besides just the 'self' context
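(side note, just a sketch for custom clients: the signalk streaming api lets a client opt out of other-vessel data with the subscribe query parameter and explicit subscription messages, so it never receives the AIS firehose. The host name and paths below are only examples)
```js
// minimal sketch: subscribe only to own-vessel data instead of the full AIS stream
// assumes the "ws" npm package; host/port and paths are placeholders
const WebSocket = require('ws')

// subscribe=self avoids the initial burst of deltas for every AIS target
const socket = new WebSocket('ws://openplotter.local:3000/signalk/v1/stream?subscribe=self')

socket.on('open', () => {
  // optionally add explicit subscriptions for just the paths the client needs
  socket.send(
    JSON.stringify({
      context: 'vessels.self',
      subscribe: [
        { path: 'navigation.speedOverGround', period: 1000 },
        { path: 'environment.wind.*', period: 1000 }
      ]
    })
  )
})

socket.on('message', (data) => console.log(data.toString()))
```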
To give a sense of the volume:
I think it might be because I upgraded my antenna and coax cable to good quality haha
Try GaladrielMap SignalK Edition, it has no problems with hundreds of AIS targets. And their display can be quickly turned off - just for a case like this.
@Kees i've had the sample data file running now for a while, no evidence of a memory leak or performance degradation, everything working. maybe create a larger file, for a longer period? and maybe share it privately instead of on the channel, where it will stay for posterity...
Thanks Teppo, I found the issue. The ais targets are just updating too much (and/or there are too many). I moved from the cerbo to a raspberry pi 4 and it can handle the nmea stream; however, as soon as i start consuming the data in a client (databrowser, wilhelmsk, kip) the cpu load gets abnormal and breaks the experience. This is because it starts loading too much ais stuff. I need to find a way to reduce the updates / data stream for that. It would be an interesting plugin to 1. cap at max 100 targets, ascending from closest (this is what axiom plotters do), or better 2. tune the update rate of some PGNs, so that e.g. SOG and COG are not updating every second or something.
When i filter out the ais source as a whole in my data connection, at least the clients don't crash, but then i don't have AIS info anymore - so i need to mitigate that with the above suggestions a bit, will figure that out 😉
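(rough sketch of idea 2, not a finished plugin - it assumes the server's registerDeltaInputHandler plugin hook; the plugin id and the 10 s interval are just examples: drop AIS deltas for a target if one for that context was already forwarded recently)
```js
// rough sketch of throttling AIS target updates, not a finished plugin
// assumes the standard SignalK plugin shape and app.registerDeltaInputHandler;
// the id and MIN_INTERVAL_MS are illustrative
module.exports = (app) => {
  const MIN_INTERVAL_MS = 10000 // forward at most one delta per AIS target per 10 s
  const lastForwarded = new Map() // context -> timestamp of last forwarded delta

  return {
    id: 'ais-throttle-sketch',
    name: 'AIS throttle (sketch)',
    schema: {},
    start: () => {
      app.registerDeltaInputHandler((delta, next) => {
        const context = delta.context
        // let own-vessel data (and deltas without a context) through untouched
        if (!context || context === app.selfContext || context === 'vessels.self') {
          next(delta)
          return
        }
        const now = Date.now()
        const last = lastForwarded.get(context) || 0
        if (now - last >= MIN_INTERVAL_MS) {
          lastForwarded.set(context, now)
          next(delta) // forward this target's delta
        }
        // otherwise drop the delta by not calling next()
      })
    },
    // note: a real plugin would also need a way to unregister the handler on stop
    stop: () => {}
  }
}
```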
Gotcha! Makes sense. A larger capture would help. I think this is a scenario that needs addressing, but i need to be able to replicate your problem to make progress
So opening databrowser and vesselpositions cause the problem?
Vesselpositions is not as bad for whatever reason. But the databrowser, as well as streams to KIP and WilhelmSK (once i start loading those clients on my devices, the signalk app/cpu goes wild).
btw, side question: on a raspberry pi 4b with openplotter OS, if signalk (node) has 100% cpu load, the total cpu load of the raspberry is only around 30%. Wondering why this is - is there a setting so node can use all the cpu power? probably an architecture question and i guess there is a reason for it; maybe node is just designed like that, quite a noob here
designed like that, single threaded - node runs the signalk code on one core, so on a 4-core pi a fully busy signalk process shows up as roughly 25% of total cpu
could you create a larger log file? like i said, would be much easier to figure out improvements if i could reproduce your problem
cool, just started logging, how big you want it to be? 😉
ok, i was able to reproduce the Data Browser reconnect loop after having left the large log file running! two issues:
- https://github.com/SignalK/signalk-server/issues/1718 - initial delta burst causes send buffer overflow and webapp reconnect loop: if there are enough cached deltas, a webapp connecting via ws gets its connection killed by the send buffer check mechanism, then reconnects, only to be killed again
- https://github.com/SignalK/signalk-server/issues/1717 - excessive memory consumption of tracking sent metadata: metadata sending is tracked per ws connection, with a marker stored for every context-path combination
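(to illustrate the first issue - this is only a sketch of the idea, not the actual ws.js code; maxSendBufferSize and the function name are made up: the ws library exposes bufferedAmount per connection, and a freshly connected client that gets the whole cached AIS state at once briefly exceeds the limit and was being terminated instead of being given time to drain)
```js
// illustration of the idea behind issue 1718, not the actual server code
function sendDelta(ws, delta, maxSendBufferSize = 512 * 1024) {
  // ws.bufferedAmount: bytes queued on the socket but not yet sent (ws library)
  if (ws.bufferedAmount > maxSendBufferSize) {
    // terminating here immediately is what caused the reconnect loop for clients
    // receiving the initial delta burst; warning and letting the buffer drain is gentler
    console.warn(`outgoing buffer ${ws.bufferedAmount} > max ${maxSendBufferSize}`)
    return false // caller can skip this delta instead of killing the socket
  }
  ws.send(JSON.stringify(delta))
  return true
}
```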
to be continued
Not sure if this is also related to this issue, but thought I'd mention.
https://github.com/SignalK/freeboard-sk/issues/114
@Kees here is a version of lib/interfaces/ws.js that supposedly fixes the ws connections getting severed and going into a reconnect loop (databrowser), as well as reducing memory consumption for ws clients
if you can locate the version that came with your SK install and overwrite it with this (take a backup copy first), you should be able to test drive it
once the server has gathered enough data and you open a ws client like databrowser, you should see a warning message that includes outgoing buffer > max, but not immediately the dreaded terminating connection message
maybe @Scott Bender can give pointers on where to find the installed file?
Gist: websocket code with fixes for too quick send buffer overflow and memory consumption for multiple ws clients - ws.js
on VenusOS it's /usr/lib/node_modules/signalk-server/lib/interfaces/ws.js
would require root access, and you need to run the script to make the root filesystem writable
probably same location on the pi
or could be /usr/local/lib/...
Thanks a lot. Sorry I am away for work this week. I will try later this week!
Works better now, this is tested with the WilhelmSK client. Still spikes every minute, but they are short, so easy to handle without things breaking. The data browser is still challenging though: when opening it, the spikes are longer, like 20 seconds, which then makes the other connected client (like WilhelmSK) break.
Thanks! So there’s more work there to make it play nicer