Razberry--Strange device communication failures

cdogg76
Posts: 49
Joined: 28 Sep 2014 23:13

Razberry--Strange device communication failures

Post by cdogg76 »

Hello,

I've got a Razberry with 45 Z-Wave devices throughout my house, mostly mains-powered light switches, as well as about 5 outlet plugs. Most are Z-Wave, but some are Z-Wave Plus.

I've recently started having an issue where a large number (~20) of devices show as failed. I fiddle with the system using the Expert UI for a while, and somehow things start working again. However, within a day or so (sometimes much sooner), they fail again. I believe it's the same set of devices that show as failed. Strangely, even when they show as failed, they sometimes still receive commands and update. I'm seeing in the job queue that these devices even get the NoOperation ("Delivered"), but it still logs a job that says the node is failed. This seems very strange to me.

I do have a script that polls many of the switches, as they are old and do not offer instant updates. I made the script max out at 15 jobs in the queue, so it won't continuously flood the network if it gets stalled. Normally, this runs fine. I mention this because I have a hunch that increased network activity may trigger the failure. Most recently, things were running mostly fine, but a couple of nodes were requiring retransmission (though succeeding, or getting marked failed and then soon after marked operating by the automatic NoOp check). I started a network reorganization that worked great for the first 10 or so nodes, then everything started failing and I was back in the bad state.
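
For reference, the queue cap described above could look roughly like the following. This is a minimal Python sketch, not the actual script: `queue_length` and `send_poll` are hypothetical callables standing in for however the script reads the current job queue size and issues a poll to a node.

```python
from collections import deque
from typing import Callable, Iterable, List

MAX_QUEUED_JOBS = 15  # the cap mentioned in the post


def poll_round(nodes: Iterable[int],
               queue_length: Callable[[], int],
               send_poll: Callable[[int], None],
               max_jobs: int = MAX_QUEUED_JOBS) -> List[int]:
    """Poll nodes in order, stopping once the job queue reaches max_jobs.

    Returns the nodes that were skipped so the caller can retry them
    in a later round instead of piling more jobs onto a stalled queue.
    """
    pending = deque(nodes)
    # Re-check the queue before every poll, since each poll adds a job.
    while pending and queue_length() < max_jobs:
        send_poll(pending.popleft())
    return list(pending)
```

Returning the skipped nodes, rather than dropping them, lets the next round start with the switches that were not reached.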

In previous attempts to troubleshoot this, I thought the battery-powered motion sensors I have might be causing issues, so I excluded those. Upon excluding a device, Z-Way seems to do a number of things that somehow restore all functionality to normal. I also got this to happen by re-including existing switches that are near the Razberry. But I'd really like to get to the bottom of the failure, especially since I don't know a reliable way to recover remotely.

Finally, something strange has happened where I can't even turn on node inclusion or exclusion via the Expert UI. Those buttons simply do not do anything (seem to be disabled). I've never experienced that before and I'm not sure why. The state even survives full power cycles of the Raspberry Pi. So, I also need to figure out what is wrong there.

Thanks for any help!
cdogg76
Posts: 49
Joined: 28 Sep 2014 23:13

Re: Razberry--Strange device communication failures

Post by cdogg76 »

To add to this, it's the same set of devices that end up showing as 'failed.' Strangely, in addition to the NoOp being delivered to these but still registering as failed, I can also send commands to these devices and they actually execute, but Z-Way continues to send NoOps afterwards, which get delivered, and it still registers them as failed nodes.

So, things 'work,' but all those NoOps, plus the slow delivery times, make for a very sluggish network.

Any thoughts?

Thanks!
PoltoS
Posts: 7565
Joined: 26 Jan 2011 19:36

Re: Razberry--Strange device communication failures

Post by PoltoS »

We need the log. It is also worth testing the Route Map available in the latest v3.0.0-rc19 and upgrading the firmware of your RaZberry to 5.36. This will give you a good understanding of routes and failures.
cdogg76
Posts: 49
Joined: 28 Sep 2014 23:13

Re: Razberry--Strange device communication failures

Post by cdogg76 »

Hello,

I'll work on getting an appropriate log.


Can you provide info or links on how to upgrade the firmware? I found a page listing out release notes, but only up to 5.32 and no links to actually download them.

Also, how can I upgrade to the 3.0.0-rc19 build?

Thanks!
cdogg76
Posts: 49
Joined: 28 Sep 2014 23:13

Re: Razberry--Strange device communication failures

Post by cdogg76 »

Hello,

Ok, I was running another network reorganization and the problem manifested again. Here's an example of a NoOp that shows delivered, but it still says the node is failed:

[2019-01-16 10:11:18.010] [D] [zway] SENDING (cb 0x11): ( 01 08 00 13 0E 01 00 25 11 DF )
[2019-01-16 10:11:18.018] [D] [zway] RECEIVED ACK
[2019-01-16 10:11:18.817] [D] [zway] RECEIVED: ( 01 04 01 13 01 E8 )
[2019-01-16 10:11:18.817] [D] [zway] SENT ACK
[2019-01-16 10:11:18.817] [D] [zway] Delivered to Z-Wave stack
[2019-01-16 10:11:22.660] [D] [zway] Job 0x13: deleted from queue
[2019-01-16 10:11:22.671] [D] [zway] Job 0x62: deleted from queue
[2019-01-16 10:11:22.914] [D] [zway] RECEIVED: ( 01 05 00 13 11 01 F9 )
[2019-01-16 10:11:22.914] [D] [zway] SENT ACK
[2019-01-16 10:11:22.914] [I] [zway] Job 0x13 (NoOperation): Not delivered to recipient
[2019-01-16 10:11:22.915] [D] [zway] SENDING (cb 0x12): ( 01 08 00 13 0E 01 00 25 12 DC )
[2019-01-16 10:11:22.921] [D] [zway] RECEIVED ACK
[2019-01-16 10:11:23.721] [D] [zway] RECEIVED: ( 01 04 01 13 01 E8 )
[2019-01-16 10:11:23.722] [D] [zway] SENT ACK
[2019-01-16 10:11:23.722] [D] [zway] Delivered to Z-Wave stack
[2019-01-16 10:11:23.996] [D] [zway] Job 0x13: deleted from queue
[2019-01-16 10:11:24.007] [D] [zway] Job 0x62: deleted from queue
[2019-01-16 10:11:24.111] [D] [zway] Job 0x13: deleted from queue
[2019-01-16 10:11:24.120] [D] [zway] RECEIVED: ( 01 05 00 13 12 00 FB )
[2019-01-16 10:11:24.121] [D] [zway] SENT ACK
[2019-01-16 10:11:24.121] [I] [zway] Job 0x13 (NoOperation): Delivered
[2019-01-16 10:11:24.121] [D] [zway] SendData Response with callback 0x12 received: received by recipient
[2019-01-16 10:11:24.121] [D] [zway] SETDATA devices.14.data.lastSendInternal = **********
[2019-01-16 10:11:24.121] [D] [zway] SETDATA devices.14.data.lastSend = 5733 (0x00001665)
[2019-01-16 10:11:24.121] [D] [zway] Job 0x13 (NoOperation): success
[2019-01-16 10:11:24.121] [I] [zway] Adding job: Check if node is failed
[2019-01-16 10:11:24.121] [I] [zway] Removing job: NoOperation
[2019-01-16 10:11:24.121] [D] [zway] Job 0x62: deleted from queue
[2019-01-16 10:11:24.122] [D] [zway] SENDING: ( 01 04 00 62 0E 97 )
[2019-01-16 10:11:24.123] [D] [zway] RECEIVED ACK
[2019-01-16 10:11:24.124] [D] [zway] RECEIVED: ( 01 04 01 62 01 99 )
[2019-01-16 10:11:24.124] [D] [zway] SENT ACK
[2019-01-16 10:11:24.124] [D] [zway] SETDATA devices.14.data.isFailed = True
[2019-01-16 10:11:24.124] [I] [zway] Job 0x62 (Check if node is failed): Node 14 is failed
[2019-01-16 10:11:24.124] [D] [zway] Job 0x62 (Check if node is failed): success
Is there a reference for interpreting the sent/received data?
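
As a starting point for reading the hex dumps above, here is a minimal Python sketch of a frame decoder. It assumes the usual Z-Wave Serial API data frame layout: start byte 0x01, a length byte, a type byte (0x00 request, 0x01 response), a function ID, the payload, and a checksum that is 0xFF XORed with every byte from the length byte through the end of the payload. The names `Frame`, `parse_frame`, and the labels in `FUNCTIONS` are illustrative only; the two function IDs are labelled after the job names that appear next to them in the log.

```python
from dataclasses import dataclass

SOF = 0x01  # start-of-frame marker for data frames
FRAME_TYPES = {0x00: "request", 0x01: "response"}
# Only the two function IDs seen in the log excerpt; labels are informal.
FUNCTIONS = {0x13: "SendData", 0x62: "IsFailedNode"}


@dataclass
class Frame:
    kind: str          # "request" or "response"
    function: str      # label from FUNCTIONS, or the raw ID as hex
    payload: bytes     # bytes between the function ID and the checksum
    checksum_ok: bool  # True if the trailing checksum byte matches


def parse_frame(hex_text: str) -> Frame:
    """Decode one frame given as space-separated hex, e.g. '01 04 01 13 01 E8'."""
    raw = bytes.fromhex(hex_text)
    if len(raw) < 5 or raw[0] != SOF:
        raise ValueError("not a Serial API data frame")
    if raw[1] != len(raw) - 2:
        raise ValueError("length byte does not match frame size")
    checksum = 0xFF
    for byte in raw[1:-1]:  # length byte through last payload byte
        checksum ^= byte
    return Frame(
        kind=FRAME_TYPES.get(raw[2], f"0x{raw[2]:02X}"),
        function=FUNCTIONS.get(raw[3], f"0x{raw[3]:02X}"),
        payload=raw[4:-1],
        checksum_ok=(checksum == raw[-1]),
    )
```

Read this way, `01 08 00 13 0E 01 00 25 11 DF` is a SendData request whose payload starts with 0x0E (node 14) and ends with the callback ID 0x11, and the later `01 05 00 13 11 01 F9` carries that same callback ID followed by a status byte of 0x01, which lines up with the "Not delivered to recipient" entry. The second attempt ends in 0x00 and is logged as "Delivered".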

Thanks!
cdogg76
Posts: 49
Joined: 28 Sep 2014 23:13

Re: Razberry--Strange device communication failures

Post by cdogg76 »

Ok, I was poking through the log and there are quite a few "Got frame from device XX" lines, many showing that the device isn't registered, and others receiving command classes that are not registered or not supported. Might these be causing the issue?

Some of the unregistered device numbers have never been used on my network (i.e., 66, 76, 106, 131, 170). I've never gotten that high. Others are within the range of what I have. Am I getting interference from a neighbor? If so, might that be causing my problems? Or is it radio interference corrupting the frames?
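
To put numbers on this, something like the following Python sketch could tally which unregistered node IDs show up and how often. The regular expression is an assumption based only on the message text quoted above ("Got frame from device XX", with XX taken to be a decimal node ID); the real log line may differ and the pattern would need adjusting. `unknown_senders` and `known_nodes` are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Assumed message format; adjust to match the actual log lines.
FRAME_FROM = re.compile(r"Got frame from device (\d+)")


def unknown_senders(log_lines: Iterable[str], known_nodes: Set[int]) -> Counter:
    """Count frames per sender node ID that is not in known_nodes."""
    counts: Counter = Counter()
    for line in log_lines:
        match = FRAME_FROM.search(line)
        if match is None:
            continue
        node_id = int(match.group(1))
        if node_id not in known_nodes:
            counts[node_id] += 1
    return counts
```

A handful of IDs that each appear only once or twice would point more towards corrupted frames, while a small set of IDs that recur steadily would look more like a neighboring network.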

I've attached a log covering about an hour. I'm not sure what the attachment size limit is, but a longer log covering about 3 hours was rejected as too large, even though it was under 3 MB.

Thanks!
Attachments
z-way-server.zip
(548.78 KiB)
PoltoS
Posts: 7565
Joined: 26 Jan 2011 19:36

Re: Razberry--Strange device communication failures

Post by PoltoS »

I see a lot of packet loss and CRC errors in your network.

I would move the nodes closer and repeat the tests. Please also look at the Network Statistics page, Timing Info, and the Route Map. All of them should show you the problems.
cdogg76
Posts: 49
Joined: 28 Sep 2014 23:13

Re: Razberry--Strange device communication failures

Post by cdogg76 »

I *think* the issue was actually a bad light switch. I had one that wasn't responding and needed to be replaced. Since I replaced it, the network has been stable. Hopefully that solves it!