send queue hanging (z-way-server v2.0.0, v2.0.1rc7)

Discussions about RaZberry - Z-Wave board for Raspberry computer
Mirar
Posts: 113
Joined: 19 Oct 2014 16:54
Location: Stockholm

Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)

Post by Mirar »

The story continues on the gen 5, and on a new pi, and on a new sd card, and on a new power supply.

This morning the gen5 hanged:

Code: Select all

[2015-02-27 07:56:54.586] [D] [zway] SENT ACK
[2015-02-27 07:56:54.587] [D] [zway] SETDATA devices.3.data.lastReceived = 0 (0x00000000)
[2015-02-27 07:56:54.587] [D] [zway] SETDATA devices.3.instances.0.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2015-02-27 07:56:54.587] [D] [zway] SETDATA devices.3.instances.0.commandClasses.49.data.1.scale = 0 (0x00000000)
[2015-02-27 07:56:54.588] [D] [zway] SETDATA devices.3.instances.0.commandClasses.49.data.1.val = 23.400000
[2015-02-27 07:56:54.589] [D] [zway] SETDATA devices.3.instances.0.commandClasses.49.data.1.scaleString = "<C2><B0>C"
[2015-02-27 07:56:54.589] [D] [zway] SETDATA devices.3.instances.0.commandClasses.49.data.1 = Empty
[2015-02-27 07:56:56.123] [D] [zway] RECEIVED: ( 01 0C 00 04 00 0F 06 31 05 01 22 00 E8 01 )
[2015-02-27 07:56:56.123] [D] [zway] SENT ACK
[2015-02-27 07:56:56.123] [D] [zway] SETDATA devices.15.data.lastReceived = 0 (0x00000000)
[2015-02-27 07:56:56.124] [D] [zway] SETDATA devices.15.instances.0.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2015-02-27 07:56:56.124] [D] [zway] SETDATA devices.15.instances.0.commandClasses.49.data.1.scale = 0 (0x00000000)
[2015-02-27 07:56:56.124] [D] [zway] SETDATA devices.15.instances.0.commandClasses.49.data.1.val = 23.200001
[2015-02-27 07:56:56.126] [D] [zway] SETDATA devices.15.instances.0.commandClasses.49.data.1.scaleString = "<C2><B0>C"
[2015-02-27 07:56:56.126] [D] [zway] SETDATA devices.15.instances.0.commandClasses.49.data.1 = Empty
[2015-02-27 07:56:57.304] [D] [zway] RECEIVED: ( 01 09 00 04 00 03 03 80 03 64 15 )
[2015-02-27 07:56:57.304] [D] [zway] SENT ACK
[2015-02-27 07:56:57.304] [D] [zway] SETDATA devices.3.data.lastReceived = 0 (0x00000000)
[2015-02-27 07:56:57.304] [D] [zway] SETDATA devices.3.instances.0.commandClasses.128.data.history.100 = 1425023817 (0x54f02349)
[2015-02-27 07:56:57.304] [D] [zway] SETDATA devices.3.instances.0.commandClasses.128.data.last = 100 (0x00000064)
[2015-02-27 07:56:58.977] [I] [zway] Adding job: Meter Get (v2)
[2015-02-27 07:56:58.978] [I] [zway] Adding job: Meter Get (v2)
Here it crashed.

Code: Select all

[2015-02-27 07:57:04.237] [D] [zway] Job 0x13: deleted from queue
[2015-02-27 07:57:04.278] [D] [zway] Job 0x13: deleted from queue
[2015-02-27 07:57:04.278] [D] [zway] Job 0x13: deleted from queue
[2015-02-27 07:57:04.504] [D] [zway] Job 0x13: deleted from queue
[2015-02-27 07:57:04.504] [D] [zway] Job 0x13: deleted from queue
[2015-02-27 07:57:08.097] [I] [zway] Adding job: SwitchBinary Set
[2015-02-27 07:57:08.097] [D] [zway] Job 0x13 (SwitchBinary Get): removing duplicate
[2015-02-27 07:57:08.979] [D] [zway] Job 0x13 (Meter Get (v2)): removing duplicate
[2015-02-27 07:57:08.979] [D] [zway] Job 0x13 (Meter Get (v2)): removing duplicate
[2015-02-27 07:57:09.531] [D] [zway] Job 0x13: deleted from queue
[2015-02-27 07:57:09.582] [D] [zway] Job 0x13: deleted from queue
[2015-02-27 07:57:13.506] [I] [zway] Job 0x13 (SwitchBinary Set): No callback received before timeout
[2015-02-27 07:57:13.506] [I] [zway] Job 0x13 (SwitchBinary Set): Cancelling
[2015-02-27 07:57:13.506] [I] [zway] Adding job: Abort data send
[2015-02-27 07:57:13.506] [D] [zway] SENDING: ( 01 03 00 16 EA )
[2015-02-27 07:57:13.709] [I] [zway] Job 0x16 (Abort data send): No ACK received before timeout
[2015-02-27 07:57:13.709] [D] [zway] SENDING: ( 01 03 00 16 EA )
[2015-02-27 07:57:13.912] [I] [zway] Job 0x16 (Abort data send): No ACK received before timeout
[2015-02-27 07:57:13.912] [D] [zway] SENDING: ( 01 03 00 16 EA )
[2015-02-27 07:57:14.119] [I] [zway] Job 0x16 (Abort data send): No ACK received before timeout
[2015-02-27 07:57:14.119] [W] [zway] Job 0x16 (Abort data send) dropped: too many resends
[2015-02-27 07:57:14.119] [D] [zway] Job 0x16 (Abort data send): fail
[2015-02-27 07:57:14.119] [I] [zway] Removing job: Abort data send
[2015-02-27 07:57:14.119] [D] [zway] SENDING (cb 0xe3): ( 01 0A 00 13 13 03 25 01 00 25 E3 14 )
[2015-02-27 07:57:14.322] [I] [zway] Job 0x13 (SwitchBinary Set): No ACK received before timeout
[2015-02-27 07:57:14.322] [D] [zway] SENDING (cb 0xe3): ( 01 0A 00 13 13 03 25 01 00 25 E3 14 )
[2015-02-27 07:57:14.528] [I] [zway] Job 0x13 (SwitchBinary Set): No ACK received before timeout
[2015-02-27 07:57:14.529] [W] [zway] Job 0x13 (SwitchBinary Set) dropped: too many resends
[2015-02-27 07:57:14.529] [D] [zway] Job 0x13 (SwitchBinary Set): fail
The razberry crashed so hard I had to power cycle it.

But more interesting things happened: All lights came on, and now not all nodes are functional. I will manually go around and reset them. Most noticable is the Philio Relay saying 0x17.

Could it have been some sort of alarm event from smoke or door detectors crashing everything? The smoke detectors both immediately complained that the z-wave central node was missing. (And woke me up.)
pofs
Posts: 688
Joined: 25 Mar 2011 19:03

Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)

Post by pofs »

This looks more like chip hanging rather than software queue hanging.

Please send us a more complete log fragment for analysis.
Mirar
Posts: 113
Joined: 19 Oct 2014 16:54
Location: Stockholm

Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)

Post by Mirar »

pofs wrote:This looks more like chip hanging rather than software queue hanging.

Please send us a more complete log fragment for analysis.
Yep, I'm pretty sure the gen5 hanged there. There is nothing more interesting in the log, what I could spot, but I did save it. I can upload it if you think there might be something of value, but it seems to have been something internal in the chip that z-way-server never saw... :(

Interestingly, it wasn't the only thing that hanged - three relays also hanged and the Philio also needed a hard reboot. (None of those are gen5.)

I suspect more bugs on the side of Sigma... :/

The chip couldn't be reset with serial communications this time.
User avatar
PoltoS
Posts: 7649
Joined: 26 Jan 2011 19:36

Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)

Post by PoltoS »

Looks really strange. Please give us a list of affected devices and what is the smoke (alarm trigger) device manufacturer.

Philio are all on 4th or 5th gen They share pretty similar code base on the firmware side (3rd gen is very different).`
Mirar
Posts: 113
Joined: 19 Oct 2014 16:54
Location: Stockholm

Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)

Post by Mirar »

They are Fibaro Smoke Detectors.

What's worse is that the razberry chip never seemed to recover. It couldn't build routing tables, the routing map was all red except the left column... the rest of the network worked fine, however (switches etc).

After restoring a backup (I can't tell how much hard resets that does and how much comes from the backup), routing tables and network organisation looks normal again.

I can't really interpret that as something else than that the chip did something really bad.

The Pi 2 with Razberry is behind an UPS, so there's no power fluctuation there. It's only driving the Razberry. The Pi itself went on fine, no hiccup and nothing interesting in kernel logs...
Post Reply