Weird communication issues out of nowhere
Hey guys, I have an octopus 1.1 and an EBB42 toolboard. Everything was fine until yesterday, but today I wanted to home the machine and the console told me "Communication timeout during homing z".
When I home the axes one by one (first x, then y, then z), it works fine. Only homing them all at once reproducibly leads to this issue.
Any ideas?
Regards!
wise-white OP • 14mo ago
Of course I did. Didn't help. Then I googled and found some other users with the same issue... someone recommended disconnecting the webcam, and this did the trick for me. With the webcam, I always ran into that issue... without it, it was stable.
That gave me the idea that my power supply for the raspi is maybe too weak... swapped it for a stronger one, and since then the problems are gone....
Interesting. You should've gotten undervoltage warnings in a notification in mainsail if the power delivery was insufficient. But it makes sense that would be the cause.
Good you got it sorted
I've got the same issue. I'm in the process of adding an EBB42 toolboard to my printer. It was working last night; then I ran the updates and added a nozzle cam, and tonight it's giving the timeout error when homing Z. Z was getting lower each time I tried, as it drops when you do x and y. I wound it back up manually so it was close, and it managed to home all axes.
Correction: that hasn't fixed it, just by resetting the z offset; after a reboot it's giving the same error homing Z,
but try again and it homes ok
Does it work if you home x, y and z individually?
x and y will home always, z only sometimes
So if you home x and y. Then
G28 Z
it will sometimes fail?
just tried that, no, it gave "Communication timeout during homing z" twice
sorry, it gives this error twice in the console
I am getting under voltage errors , but I always have and its never caused a problem
I'll unplug my nozzle cam and see if it makes a difference.
Yeah i'd fix this first.
Adding a camera just increases the power requirements
If your pi is undervolted, it's unreliable.
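A quick way to check the undervoltage state outside of Mainsail is the Pi's `vcgencmd get_throttled` command, which prints a hex bitmask. The sketch below decodes that bitmask using the bit meanings from the Raspberry Pi documentation. The function names are made up for illustration, and `read_throttled()` only works on a Pi where `vcgencmd` is installed.

```python
import subprocess

# Bit positions as documented for `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected (now)",
    1: "ARM frequency capped (now)",
    2: "currently throttled",
    3: "soft temperature limit active (now)",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output: str) -> list[str]:
    """Decode a line such as 'throttled=0x50005' into readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in THROTTLE_FLAGS.items() if value & (1 << bit)]


def read_throttled() -> list[str]:
    """Run vcgencmd on the Pi itself and decode its answer (hypothetical helper)."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
    )
    return decode_throttled(result.stdout)
```

An empty list means no power or thermal events since boot. Any "since boot" entry means the supply dipped at some point, even if it looks fine right now.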
buck converter its powered from is reading 8v
Problem isn't the buck converter, it's your wiring. Use 2 pairs.
Also put it down to 5 before you power it on again or the pi will blow.
3 home alls all worked
between 5.1 and 6 (IIRC) should be safe.
2 pairs ? what do you mean
5.1 is fine as long as your wires are thick enough (ie, use 2 pairs instead of 1).
2 5v, 2 gnd.
Assuming you're connecting it to the GPIO
no its a usb from the buck converter
Yeah that's problematic. The usb cable power wires are usually only 26AWG, sometimes 24AWG. Single pair.
That isn't enough.
22AWG should work. Or 2 pairs of 26 or 24.
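To see why a single thin pair struggles, here is a rough voltage-drop estimate. It is only a sketch: the resistance values are approximate figures for copper wire at room temperature, and the 0.5 m cable length and 3 A load are assumed numbers for illustration, not measurements from this printer.

```python
# Approximate resistance of copper wire at 20 degrees C, in ohms per metre.
OHMS_PER_METRE = {22: 0.0530, 24: 0.0842, 26: 0.1339}


def voltage_drop(awg: int, length_m: float, current_a: float, pairs: int = 1) -> float:
    """Voltage lost over the 5V and GND conductors combined.

    `pairs` is the number of 5V/GND pairs wired in parallel; doubling it
    halves the resistance of the loop.
    """
    loop_resistance = 2 * length_m * OHMS_PER_METRE[awg] / pairs
    return current_a * loop_resistance


if __name__ == "__main__":
    for awg, pairs in [(26, 1), (26, 2), (24, 2), (22, 1)]:
        drop = voltage_drop(awg, length_m=0.5, current_a=3.0, pairs=pairs)
        print(f"{pairs} x {awg}AWG pair(s): {drop:.2f} V lost")
```

With these assumed numbers a single 26AWG pair loses about 0.4 V, which is enough to pull a 5.1 V supply under the Pi's undervoltage threshold, while two pairs or one 22AWG pair lose roughly half that or less.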
ok so wire the power to the gpio pins, to 4 pins on the gpio or just 2?
4 is guaranteed to work
So do 4
Just assume the SKRat is your buck.
I've always had the warnings about undervoltage and throttling, but it's never caused an issue and I didn't know how to fix it. I guess a toolboard and a camera are now overdoing it
toolboard shouldn't draw any power, it's powered by 24v. The camera will however.
so power the pi from the octopus board?
When you get power errors, it'll work until it doesn't. Ie. it's unreliable.
Sure you can do that, or you can keep using your buck, it's the connections on the pi that's important.
But yeah, the octopus is made to power a pi.
It has an 8A 5v rail, so plenty of juice.
ok, thanks for your help. Once again, you guys are superstars for all the work you put in developing this stuff and the endless hours supporting it
Btw, there's a guide on how to use the octopus to power the pi right here: https://ratrig.dozuki.com/Guide/12.+Wiring+Firmware+&+RatOS/143#s1706
Rat Rig
12. Wiring, Firmware & RatOS
Step by Step Guide on how to assemble and wire the electronics on the V-Core 3.
Ah, I was just looking for where to wire it from. This stuff was not around when I wired my Vcore; the build instructions were great, but you were kinda left to it when it came to the wiring
Yeah they've been busy behind the scenes working on that
by the way can I swap out the pi from a 3b to a 4b, just change it and insert the sd card, is everything saved on the sd card?
Yep!
Great, guess I've got a busy weekend wiring!
The toolboard came with a heatsink, like the ones on the motor driver boards, but I've not been able to find a picture of where it sticks on. I guess it's for the extruder motor driver; where is this on the EBB42?
Yup you stick it on the driver, look for the trinamic logo
wise-white OP • 14mo ago
unfortunately, the issue is back... communication timeout while probing a bed mesh... then I rebooted the raspi, and after that I ran into this:
wise-white OP • 14mo ago
klippy.log says
Last MCU build version: v0.11.0-279-g7bd32994
Last MCU build tools: gcc: (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110 binutils: (GNU Binutils for Raspbian) 2.35.2
Last MCU build config: ADC_MAX=4095 CLOCK_FREQ=50000000 MCU=linux PCA9685_MAX=4096 PWM_MAX=32768 STATS_SUMSQ_BASE=256
No build file /home/pi/klipper/klippy/../out/klipper.elf
mcu 'toolboard': got {'oid': 9, 'next_clock': 2979717251, 'value': 7481, '#name': 'analog_in_state', '#sent_time': 33.912559883, '#receive_time': 34.008338685}
mcu 'toolboard': got {'oid': 10, 'next_clock': 2980357251, 'value': 7876, '#name': 'analog_in_state', '#sent_time': 33.912559883, '#receive_time': 34.0203428}
mcu 'mcu': got {'oid': 17, 'next_clock': 847316416, 'value': 7575, '#name': 'analog_in_state', '#sent_time': 33.985662539, '#receive_time': 34.128321498}
wise-white OP • 14mo ago
Another reboot helped. @miklschmidt do you think I should start a new thread for that?
I think it's related, klipper recently made a bunch of changes to the usb communication, and i'm suspecting that might be the issue here.
You can try resetting klipper to before those changes and reflashing the boards. Here's how:
Then go to ratos.local/configure, click the action dropdown in the top right. Click "Flash all connected mcu's".
@_sebastianm If that fixes the issue we need to get Kevin involved.
(this is only relevant if you updated klipper within the last week)
wise-white OP • 14mo ago
Indeed. This happened after a full update.
If you didn't, then your issue is likely a deteriorating USB cable or a dying toolboard. Could be due to a lack of proper strain relief at the USB connector.
wise-white OP • 14mo ago
Cable and toolboard are brand new
Then i'd definitely give the above a shot
Yeah that really points in the direction of a klipper issue
wise-white OP • 14mo ago
Will do and post an update within the next 90min
Do I need to disconnect 24v and add the jumper on my ebb42 when I reflash it ?
nope it's all automated
It's the same thing that happens each time you update klipper
When klipper is already flashed to the board, you can programmatically instruct it to enter the bootloader and flash it over usb via DFU.
That's how those work
The other thing is only necessary to get klipper onto a board initially
wise-white OP • 14mo ago
homing and probing works now. print started successfully.
seems like that did the trick!
fuck
So i'm gonna have to explain to kevin that something is wrong with his code.
wise-white OP • 14mo ago
lol. I delete that thread for 5 bucks
Ha, i'm not sure that's gonna do any of us any good
I noticed in your log that your octopus was running an older version of klipper than your toolboard, i'm not sure if that could be part of the problem.
Is there a chance i can have you checkout master and reflash the boards again and see if the issues come back?
Would be:
It should flash the boards automatically, but just to be sure go ahead and flash them all from the configurator like last time.
wise-white OP • 14mo ago
Yes. Will try that tonight. will be on vacation for the next 7 days.
Much appreciated
Just wanna cover all my bases before i go insinuating klipper jesus done goofed something
wise-white OP • 14mo ago
just cause I am curious... which role does the octopus play here? the toolboard is attached directly to the raspi....
It's just a hunch, since the change was to usb buffering my thinking is that if one board is using double buffering and the other is not, then the klippy host may get confused.
wise-white OP • 14mo ago
ok. might also explain why disconnecting the webcam helped?
No that one is a mystery
Can't explain that :/
wise-white OP • 14mo ago
lol. ok. maybe the old power supply was another issue that happened in parallel.
That would explain it yes, two separate issues.
wise-white OP • 14mo ago
doing it now..
wise-white OP • 14mo ago
sooo... I did as requested.. everything worked fine.... then I rebooted... did several homings without any problems... thought I'd do a last "stress test" and ran a full bed mesh... And... unfortunately.. on the last point, it failed again
wise-white OP • 14mo ago
Did the reset / downgrade again and ran a full bed mesh without any issues.
Aight this seems pretty convincing now, i'll try and bring it up with Kevin
wise-white OP • 14mo ago
Any idea why I seem to be the only one with that issue as of now?
You're not, @andyc5461 had similar issues and there are another 2 users over here: https://discord.com/channels/582187371529764864/1159249516264951818/1159249516264951818
@miklschmidt I just had a communication error with klipper
v0.11.0-279 on a fly sht 36 running via USB ... looking at the changelog that was before the changes
@Helge Keck ^
yeah -279 is quite old, they happened before this but mostly caused by other factors (undervolted pi, improperly secured connectors, shorts, etc etc. multiple causes), but it seems rather prevalent after the code changes.
I dont have the time right now to see if it gets worse after updating since I run dual toolboards on the idex branch and I am not entirely comfortable with updating that
Fair
Mine is still doing it randomly. I've been running the warmup gantry macro on 500 repeats and it seems to do this ok, but if I print, 50% of the time it fails with the lost connection to mcu error. As I've only just installed the EBB42, I thought it was something I'd done
I've even swapped out my pi 3B for a 4B and it's still the same. The 3B is old, so I thought the usb ports might be dirty or corroded
I am running a 3B+ because my 4B died sudden death
This could just be a loose connection
So I bought a new USB-C cable from Amazon and replaced the 3DO one I was using, and also did a reinstall of RatOS from scratch. I have not updated klipper to the version that was giving the trouble, but everything else bar beacon is updated. With the new cable it's still extremely touchy at the USB-C socket on the toolboard; the slightest touch and it will lose the connection to the toolboard. I finally managed to find a cable-tying position, with the cable tie both clamping the plug and pulling it towards the toolboard into the socket, that seems to work. I'm running the gantry warmup routine on repeat and it seems to be working for now; about to start a small print and I'll see what happens. Are the USB sockets usually this sensitive / dodgy? Maybe I have a dry solder joint on the board around the USB-C socket. If it gives trouble again I'll take it off and examine it under a microscope.
"With the new cable it's still extremely touchy at the USB-C socket on the toolboard; the slightest touch and it will lose the connection to the toolboard."
Sounds like someone didn't do proper strain relief
That's why this section is in the docs
Im using this mount https://www.printables.com/model/553508-ebb42-lgx-lite-mount-for-eva-v3
Printables.com
EBB42 / LGX-Lite mount for EVA v3 by Tom Oinn | Download free STL m...
Compact mount for the EBB42 toolboard and LGX-Lite extruder for use on the EVA v3 tool-head | Download free 3D printable STL models
It has a mount bracket that's directly in line with the socket; Tom designed it this way. It just seems the usb socket is a bit touchy. I've now cable tied the plug so it's pulled into the socket as well as against the bracket
What an odd choice of placement. Should work though, didn't you tie down the plug?
yes good and tight
So the plug couldn't wiggle the USB connector/receptacle loose?
Doesn't take much to fracture those solder connections on the receptacle
but if I placed a finger on the socket it would error, even though the plug was tightly mounted
That almost sounds more like something is shorting.. Really odd.
I checked all the wiring and pins in the plugs onto the toolboard multiple times, and all seem ok. I'm still thinking I might have a dry solder joint on the toolboard usb socket, but it's printed a part so I'm gonna leave it alone! The more I play and modify, the more I break things.....
this arrangement works well for me
https://cdn.discordapp.com/attachments/750774627764142120/1161061980048719882/image.png
wise-white OP • 14mo ago
@miklschmidt is it confirmed now that my / our issues are related to the latest Klipper update?
Unfortunately no, i can't confirm it's the klipper changes :/
wise-white OP • 14mo ago
Oh. That means I have another issue?
There's a good chance of that yes. But i can't say for sure.
wise-white OP • 14mo ago
Weird that up/downgrading toggled that behavior here
Yes, very. Unfortunately it's the only indication i have towards the klipper update being the cause.
wise-white OP • 14mo ago
Anything else I can do / provide?
The only thing i can think of is test a different toolboard, but that's a fair bit of work, especially if you don't have one on hand
I will let you know once i know more
Hmmmmmm could it be that 277 also had the issues already?
After a print is finished I always run into toolboard timer issues
This happens after a print has been finished
Stats 242915.2: gcodein=0 mcu: mcu_awake=0.001 mcu_task_avg=0.000006 mcu_task_stddev=0.000004 bytes_write=202339974 bytes_read=34130105 bytes_retransmit=9 bytes_invalid=0 send_seq=3466763 receive_seq=3466763 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=180004104 toolboard: mcu_awake=0.002 mcu_task_avg=0.000013 mcu_task_stddev=0.000009 bytes_write=87754186 bytes_read=14059693 bytes_retransmit=9 bytes_invalid=0 send_seq=1540672 receive_seq=1540672 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=51630370 adj=-706097937 Octopus: temp=38.5 raspberry_pi: temp=54.0 heater_bed: target=0 temp=75.3 pwm=0.000 beacon: mcu_awake=0.000 mcu_task_avg=0.000000 mcu_task_stddev=0.000000 bytes_write=270123 bytes_read=8279498 bytes_retransmit=0 bytes_invalid=0 send_seq=42101 receive_seq=42101 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=16158589 adj=-883595196 beacon: coil_temp=45.7 toolboard: temp=36.9 Step: temp=31.1 chamber_temp: temp=28.9 chamber_heater: target=0 temp=27.9 pwm=0.000 sysload=20.94 cputime=27986.809 memavail=20372 print_time=38765.236 buffer_time=0.000 print_stall=53 extruder: target=0 temp=84.7 pwm=0.000
Resetting prediction variance 242914.269: freq=51630370 diff=-5557005841 stddev=64000.000
Resetting prediction variance 242915.209: freq=38286808 diff=-2334349439 stddev=64000.000
Resetting prediction variance 242914.135: freq=180004104 diff=-34359730188 stddev=4767.519
Resetting prediction variance 242915.206: freq=83495926 diff=-11007959081 stddev=180000.000
Resetting prediction variance 242916.185: freq=64661881 diff=-6338854173 stddev=180000.000
Resetting prediction variance 242916.185: freq=16158589 diff=-448625164 stddev=32000.000
No, didn't happen until 400+
That doesn't look like an actual error?
Can send you the whole log, there is way more in it that leads to the sudden MCU overload (when idle)
buffer_time=0.000 print_stall=53 extruder: target=0 temp=80.8 pwm=0.000
MCU 'toolboard' shutdown: Rescheduled timer in the past
clocksync state: mcu_freq=64000000 last_clock=2487435339675 clock_est=(242730.722 2481902273919 30853272.528) min_half_rtt=0.000106 min_rtt_time=242314.766 time_avg=242730.722(10480.871) clock_avg=2481902273919.413(323369164128.327) pred_variance=4096000000.000 clock_adj=(45205.313 -404167441.324)
Dumping serial stats: bytes_write=87754244 bytes_read=14061755 bytes_retransmit=9 bytes_invalid=0 send_seq=1540679 receive_seq=1540679 retransmit_seq=2 srtt=0.002 rttvar=0.003 rto=0.025 ready_bytes=0 upcoming_bytes=0
Dumping send queue 100 messages
Sent 0 242681.052071 242681.052071 62: seq: 13, queue_step oid=7 interval=8894 count=26 add=64, queue_step oid=7 interval=10871 count=15 add=275, queue_step oid=7 interval=15640 count=7 add=1146, queue_step oid=7 interval=25811 count=3 add=5432, queue_step oid=7 interval=62089 count=1 add=0, set_next_step_dir oid=7 dir=1, queue_step oid=7 interval=90970 count=2 add=-30996, queue_step oid=7 interval=34671 count=4 add=-4523
Sent 1 242681.052071 242
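When reading a dump like the one above, the serial counters are a quick first check on whether the link itself was dropping data. The sketch below pulls the key=value pairs out of a "Dumping serial stats" line and computes a retransmit ratio. The function names are made up for illustration, and what counts as a "bad" ratio is not defined anywhere in this thread, so treat the number as a relative indicator only.

```python
import re


def parse_serial_stats(line: str) -> dict[str, float]:
    """Extract every numeric key=value pair from a klippy.log stats line."""
    pairs = re.findall(r"(\w+)=(-?\d+(?:\.\d+)?)\b", line)
    return {key: float(value) for key, value in pairs}


def retransmit_ratio(stats: dict[str, float]) -> float:
    """Fraction of written bytes that had to be sent again (0.0 if nothing was written)."""
    written = stats.get("bytes_write", 0.0)
    return stats.get("bytes_retransmit", 0.0) / written if written else 0.0


if __name__ == "__main__":
    line = (
        "Dumping serial stats: bytes_write=87754244 bytes_read=14061755 "
        "bytes_retransmit=9 bytes_invalid=0 send_seq=1540679 receive_seq=1540679 "
        "retransmit_seq=2 srtt=0.002 rttvar=0.003 rto=0.025 ready_bytes=0 upcoming_bytes=0"
    )
    stats = parse_serial_stats(line)
    print(stats["bytes_retransmit"], stats["bytes_invalid"], retransmit_ratio(stats))
```

In the excerpt above only 9 bytes were retransmitted out of roughly 87 million written and no bytes were invalid, which fits the observation that this shutdown is a timing problem rather than a noisy cable.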
Rescheduled timer in the past is an MCU (in this case toolboard) speed issue indeed.
Odd though, i'll admit
All i can suggest for these is take them to klipper discord, especially if you suspect it's because of new code
I guess in any case you could argue that this should never ever happen when idle.
Something must be wonky
Exactly ... during print all works good ... print finished .. kaboom
Anyway, a print is running now with a few things disabled as a test; if that doesn't help, I'll go to a pre-277 version.
I'd update to the most recent version, it might be fixed in master and you can't report it as a bug on an old commit
wise-white OP • 14mo ago
I have been searching the Klipper discord for this issue and there are multiple explanation attempts… some say this behavior comes from deep within the Linux kernel, others say it has to do with the probe not being connected to the same mcu as the z steppers. https://discord.com/channels/431557959978450984/431559228050636820/1150102062474993715
And they say that trsync should be raised to 0.050s (from 0.025s)…. But you (mikl) said here somewhere that this is a bad idea as it reduces the z accuracy
doubling the sync window basically allows for the various boards to deviate. 5 hundredths of a second are a lot if you move at high accelerations with 100mm/sec, which would be slow for the average v-core
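To put numbers on that, here is the arithmetic as a small sketch. It assumes the simplest worst case, where the toolhead keeps moving at constant speed for the whole sync window before the boards notice a problem. The function name and the 300 mm/s figure are illustrative assumptions; 100 mm/s and the 0.025 s / 0.050 s windows come from the discussion above.

```python
def travel_during_window(speed_mm_s: float, window_s: float) -> float:
    """Distance covered at constant speed while the boards are allowed to be out of sync."""
    return speed_mm_s * window_s


if __name__ == "__main__":
    for speed in (100.0, 300.0):
        for window in (0.025, 0.050):
            distance = travel_during_window(speed, window)
            print(f"{speed:.0f} mm/s, {window * 1000:.0f} ms window: up to {distance:.1f} mm")
```

At 100 mm/s the window grows from 2.5 mm to 5 mm of possible travel, which is the concern when the thing being synchronized is a probe or endstop trigger.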
wise-white OP • 14mo ago
Actually I don't know how fast I do the homing (can check tomorrow), but we are talking about a delta of 2.5 hundredths. However, I am just quoting the guys from the Klipper discord and discourse. E.g. https://klipper.discourse.group/t/canbus-communication-timeout-while-homing-z/3741
Klipper
CANBUS Communication timeout while homing Z
Whenever I let me printer heat soak for an hour before printing, the probe always gives a "Communication timeout while homing Z" error. Like it will fail 5 times in a row. Restarting the machine with firmware restart immediately fixes the problem. When I try running another print after that one completes, it fails again. Restart and it works aga...
wise-white OP • 14mo ago
The weird thing is that this communication timeout seems to occur for everyone only while homing/ probing. Never when printing or doing a resonance test, which I consider to be more stressing for the system.
I just had "no probe after full move" until I completely reset the machine, but only during prints, manually triggering a probe via mainsail worked fine. I really don't think this is a "weird Linux issue"
It's just the CAN faction trying to piss off the USB faction.
Get out the Tar and Feathers!!!!!!!!!!!!!!!!!!
hmmm? If I wanted to feel smugly superior, I would convert this machine to RRF and run proper CAN (-:
This isn't anywhere close to an issue on USB. The doubling of the sync timeout is only relevant for CAN really. USB is nearly instant, close to zero latency (not the case with CAN, on the contrary).
I've run a toolboard on my v-core for a year at this point, never had a single problem with this.
Even before that, the old voron's used to run 2x RAMPS, which is exactly the same setup just waaaay slower MCU's, it worked fine.
not everyone. Far from it, in fact the vast majority use this without any issues. It's something else.
Also you gotta remember, the beacon inherently works this way. It's its own usb MCU.
Thousands are running beacons without this issue.
This is also a thread about CANBUS not usb btw
However, there's been a widespread "issue" for at least a year at this point, where klipper will sometimes randomly pause during a bed mesh right after retracting from a probe dive. It didn't cause any problems, as it was a pause during no critical operations, so it was just a small delay. I actually observed this in one of vez's videos ages ago (where he even pointed it out). If that short pause has been turned into an error, that could also explain it.
wise-white OP • 14mo ago
@miklschmidt today that issue happened again - even with the "downgraded" / older klipper version. That means you were right, it seems not to be related to that SW update.
It seems to help to deactivate crowsnest during the probing / homing procedure or just unplug the camera....
Hmmm..
Do you have another pi you could try? (you can just swap the SD cards, so fortunately that part is easy).
wise-white OP • 14mo ago
I don't, but I will try an active usb hub tomorrow. No idea if it helps
generous-apricot • 14mo ago
Similar/same problem for me. Octopus 1.1, EBB42 connected via USB/USB-C. Been working for about 10 months. Since a recent update I get the communication timeout while homing Z. It is only Z; X and Y always home. I have unconfigured and disconnected my camera, no help. I then disconnected my screen, which was USB powered from my Pi 3 (it needs to be powered from the Pi because it provides the HID device back to the Pi), and it now works. Plug in the screen and there's a problem homing Z.
wise-white OP • 14mo ago
I think I have a solution / workaround for this. At least for me.
Yesterday, I bought a raspi4 and swapped it for my old raspi 3b+. At the beginning, nothing changed... I ran into the same issue as before. Then I googled a bit and found this link and tried the described solution
https://github.com/Klipper3d/klipper/issues/4861#issuecomment-961201108
Long story short: I attached the octopus and the ebb42 to the USB 3.0 ports and the webcam to the USB 2.0 ports. Since then it has been running stable. I repeatedly ran a bed mesh (probably over 400 probing points) with no issues so far. Before, I couldn't even complete one full bed mesh.
@_sebastianm that won't work for people with multiple toolboards, but I will move my toolboard with the Z probe to USB 3
wise-white OP • 14mo ago
It does with a usb hub, doesn't it? But for the vast majority this should be doable I guess
I don't have a USB hub. I am not using the HDMI so I will look into disabling that completely
wise-white OP • 14mo ago
🤬
@miklschmidt The connection problems I was having have been solved by a new EBB42 board. Whilst waiting for it to arrive I also did a complete rewire and fitted an electrical panel (it was all screwed to a piece of plywood when first built). I tried to get things running with the old EBB but was still getting disconnect issues; I swapped to the new one now that it's arrived and it all works fine so far.
Sounds good, definitely points to a strain relief issue then. They really should make a toolboard with a JST connection (or something else) for the USB, makes it a bit harder to fuck up.
Are you using endstops?.. in that case if it was USB/communication related you should have the exact same issues for X.
Hyperlapse?... Could you try with a vanilla RatOS configuration / setup (well aside from the necessary changes to make your printer print).
wise-white OP • 14mo ago
You mean disabling the webcam/timelapse? I haven't changed or installed any plugins or extensions .. this is pretty much vanilla ratos.
Hyperlapse is Timelapse without parking the print head
Ah so you basically just enabled the timelapse module? But yeah, try disabling it.
wise-white OP • 13mo ago
Hello everyone, unfortunately, I have to "reopen" this thread. In the last weeks I swapped my raspberry 3b for a new raspi 4, I got a new EBB42 and I bought a new usb cable. But still, I randomly run into these errors.
Now, I have no clue what to do or how to proceed.
I can reproduce this error when I let my vcore do a constant loop of bed meshing... sometimes it fails after 1 hour, sometimes after 4 hours, and very rarely it fails before a print.
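A repeat test like this can be scripted so it stops and reports how many meshes completed before the first failure. The sketch below assumes Moonraker is reachable through the web interface at http://ratos.local and uses its `/printer/gcode/script` endpoint; the host name, the 600 second timeout and both function names are assumptions for illustration. The printer has to be homed before the loop starts.

```python
import json
import urllib.request


def send_gcode(script: str, base_url: str = "http://ratos.local") -> None:
    """POST one G-code script to Moonraker; raises if the request or the G-code fails."""
    request = urllib.request.Request(
        f"{base_url}/printer/gcode/script",
        data=json.dumps({"script": script}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The call blocks until the G-code has finished, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=600) as response:
        response.read()


def mesh_until_failure(max_runs: int, send=send_gcode) -> int:
    """Run BED_MESH_CALIBRATE repeatedly; return how many runs completed before the first error."""
    for completed in range(max_runs):
        try:
            send("BED_MESH_CALIBRATE")
        except Exception as exc:
            print(f"run {completed + 1} failed: {exc}")
            return completed
    return max_runs
```

Calling `mesh_until_failure(50)` before and after a change (camera unplugged, different USB port, different power supply) gives a comparable number instead of a gut feeling, although with failures that take 1 to 4 hours to appear a single run proves little.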
wise-white OP • 12mo ago
update: found this interesting topic
https://github.com/mainsail-crew/crowsnest/issues/202#issuecomment-1809054618
I know the setup from this user is not 100% comparable, but it seems like they narrowed it down to the crowsnest custom-flag "--camera-force_active=1" which has been added as default a few weeks ago in one of their updates....
Will try setting it to "0" and report back here...
GitHub
Crowsnest USB bandwidth requirements causing CAN bus timeouts · Iss...
What happened On my modded Voron 2.4, when having cameras enabled with high resolution (in my case, a chamber and a nozzle camera, each running at 720p), any devices that use CAN bus for communicat...
Good find and please let us know, I've been having the same issues on Manta 8 + CM4 and I am only running one cam at 15 frames rate.
wise-white OP • 12mo ago
Added --camera-force_active=0 to my crowsnest.conf, but not sure if it's "working". Webcam is still active, although the printer has been idling for 30min now....
shouldn't the camera go to "sleep"?
I don't believe there is a sleep mode, at least I have never encountered it in Crowsnest 3 or 4.