Page 8 of 16

Re: job queue runs full after upgrade to v4.1.0

Posted: 03 Jul 2023 08:48
by johanneslarsson
Finally its in a hanged state, again. I'll try to share the logs later today

Re: job queue runs full after upgrade to v4.1.0

Posted: 03 Jul 2023 10:04
by johanneslarsson
Email sent with logs and enabled remote support access.

I also see this when I run strace, unsure if its helpful. But it could indicate that process is stuck in a sleep?

Code: Select all

root@raspberrypi:~# strace -p 475
strace: Process 475 attached
[ Process PID=475 runs in 32 bit mode. ]
strace: WARNING: Proper structure decoding for this personality is not supported, please consider building strace with mpers support enabled.
restart_syscall(<... resuming interrupted io_setup ...>) = 0
clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=17844176517316214784}, 0xffd16628) = 0
clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=17844176517316214784}, 0xffd16628) = 0
clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=17844176517316214784}, 0xffd16628) = 0
clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=17844176517316214784}, 0xffd16628) = 0
clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=17844176517316214784}, 0xffd16628) = 0

queue screenshot
queue screenshot
Screenshot 2023-07-03 at 09.09.17.png (224.78 KiB) Viewed 6703 times

Re: job queue runs full after upgrade to v4.1.0

Posted: 03 Jul 2023 17:23
by johanneslarsson
The full queue as json

Code: Select all

[[0.20000000298023224, [0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 9, "Battery Get", null, [9, 2, 128, 2, 5], 4030764512, []], [20.100000381469727, [3, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 3, "Meter Get (v2/3)", "Not delivered to recipient\nNot delivered to recipient", [3, 3, 50, 1, 0, 37], 4048551728, []], [0.20000000298023224, [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0], 10, "Time Parameters Set", null, [10, 13, 86, 1, 139, 1, 7, 231, 7, 3, 3, 0, 0, 164, 234, 5], 4048364848, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 2, 0, 0], 13, "SwitchMultilevel Set", null, [13, 8, 108, 1, 139, 4, 38, 1, 255, 255, 37], 4048294280, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 4, "SwitchBinary Set", null, [4, 3, 37, 1, 255, 37], 4048354400, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 4, "SwitchBinary Get", null, [4, 2, 37, 2, 37], 4048421472, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 2, 0, 0], 13, "SwitchMultilevel Set", null, [13, 8, 108, 1, 140, 4, 38, 1, 99, 255, 37], 4030761496, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 2, "SwitchBinary Set", null, [2, 3, 37, 1, 0, 37], 4048462416, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 2, "SwitchBinary Set", null, [2, 3, 37, 1, 255, 37], 4030783320, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 2, "SwitchBinary Get", null, [2, 2, 37, 2, 37], 4048401256, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 2, 0, 0], 13, "SwitchMultilevel Set", null, [13, 8, 108, 1, 141, 4, 38, 1, 99, 255, 37], 4030779528, []], [0.20000000298023224, [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], 0, "Get statistics gathered by the Z-Wave protocol", null, [], 4030842608, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 2, "SensorMultilevel V5 Get", null, [2, 4, 49, 4, 4, 0, 37], 4048439232, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 2, "Meter Get (v2/3)", null, [2, 3, 50, 1, 0, 37], 4048456584, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 2, "Meter Get (v2/3)", null, [2, 3, 50, 1, 16, 37], 4048261632, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 2, "Alarm Get (v3)", null, [2, 5, 113, 4, 0, 8, 0, 37], 4048263672, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 3, "Meter Get (v2/3)", null, [3, 3, 50, 1, 0, 37], 4048265752, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 3, "Meter Get (v2/3)", null, [3, 3, 50, 1, 16, 37], 4048268168, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 3, "Alarm Get (v3)", null, [3, 5, 113, 4, 0, 8, 0, 37], 4048301880, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 4, "SensorMultilevel V5 Get", null, [4, 4, 49, 4, 4, 0, 37], 4048283208, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 4, "Meter Get (v2/3)", null, [4, 3, 50, 1, 0, 37], 4030778984, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 4, "Meter Get (v2/3)", null, [4, 3, 50, 1, 16, 37], 4048281056, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 128, 0, 0], 4, "Alarm Get (v3)", null, [4, 5, 113, 4, 0, 8, 0, 37], 4030814984, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 2, 0, 0], 13, "Meter Get (v2/3)", null, [13, 3, 50, 1, 0, 37], 4030842680, []], [0.20000000298023224, [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 2, 0, 0], 13, "Meter Get (v2/3)", null, [13, 3, 50, 1, 16, 37], 4048255416, []], [0.20000000298023224, [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], 0, "Get background noise level", null, [], 4048310600, []]]

Re: job queue runs full after upgrade to v4.1.0

Posted: 05 Jul 2023 12:41
by thomas@unverzagt.org
PoltoS wrote:
03 Jul 2023 01:24
@thomas please provide your log file and the queue inspector view (screenshot or better save the html page) (click on Jobs on the top of expert UI)
Image

https://1drv.ms/i/s!AtbxelMsNXH-sUdUKzY ... C?e=axQKZC



Device 76 seems to be the bad guy (probably out of range during "set" command)
It became evident when I tried to switch on device 56 (Fibaro Wall Plug).

It is curious because many inbound things still work, like window contacts.

BR
Thomas

Re: job queue runs full after upgrade to v4.1.0

Posted: 06 Jul 2023 00:56
by PoltoS
Good news! The issue was reproduced and located.

Why was it so hard to find and reproduce in our lab? And how it passed our release tests.

The issue happened when a Security S0 encrypted packet was not delivered three times in a row right after receiving a Nonce Report (which is very rare). The clean-up of the undelivered Security S0 packet was incorrect, making the queue stuck. The node became undelivered due to routing failures that happen from time to time in real life.

We are preparing the fix. Thank you all for your support.

Re: job queue runs full after upgrade to v4.1.0

Posted: 06 Jul 2023 03:45
by PoltoS
Please check this version: z-way-4.1.0-27-g589c9812-lws16_armhf.deb

Re: job queue runs full after upgrade to v4.1.0

Posted: 06 Jul 2023 09:12
by johanneslarsson
PoltoS wrote:
06 Jul 2023 03:45
Please check this version: z-way-4.1.0-27-g589c9812-lws16_armhf.deb
Installed it now

Re: job queue runs full after upgrade to v4.1.0

Posted: 06 Jul 2023 12:49
by J.nissen
Installed
PKG=z-way-4.1.0-27-g589c9812-lws16_armhf.deb && wget -O /tmp/$PKG https://storage.z-wave.me/z-way-server/$PKG && sudo dpkg -i /tmp/$PKG

Re: job queue runs full after upgrade to v4.1.0

Posted: 07 Jul 2023 09:07
by J.nissen
Hello PoltoS,

it doesn´t work
2023-07-07 07_59_41-Z-Wave Expert UI – Mozilla Firefox.png
2023-07-07 07_59_41-Z-Wave Expert UI – Mozilla Firefox.png (50.51 KiB) Viewed 6532 times

Code: Select all

[2023-07-07 00:00:00.020] [I] [zway] Job 0x13 (Nonce Get to node 55): Not delivered to recipient
[2023-07-07 00:00:00.021] [D] [zway] SENDING (cb 0xa4): ( 01 09 00 13 37 02 98 40 25 A4 89 )
[2023-07-07 00:00:00.025] [D] [zway] RECEIVED ACK
[2023-07-07 00:00:00.026] [D] [zway] RECEIVED: ( 01 04 01 13 01 E8 )
[2023-07-07 00:00:00.026] [D] [zway] SENT ACK
[2023-07-07 00:00:00.026] [D] [zway] Delivered to Z-Wave stack
[2023-07-07 00:00:00.031] [D] [zway] RECEIVED: ( 01 1D 00 13 A4 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 57 )
[2023-07-07 00:00:00.031] [D] [zway] SENT ACK
[2023-07-07 00:00:00.031] [I] [zway] Job 0x13 (Nonce Get to node 55): Not delivered to recipient
[2023-07-07 00:00:00.032] [D] [zway] SENDING (cb 0xa5): ( 01 09 00 13 37 02 98 40 25 A5 88 )
[2023-07-07 00:00:00.035] [D] [zway] RECEIVED ACK
[2023-07-07 00:00:00.036] [D] [zway] RECEIVED: ( 01 04 01 13 01 E8 )
[2023-07-07 00:00:00.036] [D] [zway] SENT ACK
[2023-07-07 00:00:00.036] [D] [zway] Delivered to Z-Wave stack
[2023-07-07 00:00:00.041] [D] [zway] RECEIVED: ( 01 1D 00 13 A5 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 56 )
[2023-07-07 00:00:00.041] [D] [zway] SENT ACK
[2023-07-07 00:00:00.041] [I] [zway] Job 0x13 (Nonce Get to node 55): Not delivered to recipient
[2023-07-07 00:00:00.042] [I] [zway] Job 0x13 (Nonce Get to node 55): Removing job due to too much retransmitions
[2023-07-07 00:00:00.042] [I] [zway] Adding job: Check if node is failed to node 55
[2023-07-07 00:00:00.042] [D] [zway] Job 0x13 (Nonce Get to node 55): fail
[2023-07-07 00:00:00.042] [I] [zway] Node 55:0 CC Security: sending Nonce Get
[2023-07-07 00:00:00.042] [D] [zway] SETDATA devices.55.instances.0.commandClasses.152.data.rNonceAckWait = **********
[2023-07-07 00:00:00.042] [I] [zway] Adding job: Nonce Get to node 55
[2023-07-07 00:00:00.042] [I] [zway] Removing job: Nonce Get to node 55
[2023-07-07 00:00:00.042] [D] [zway] SENDING: ( 01 04 00 62 37 AE )
[2023-07-07 00:00:00.045] [D] [zway] RECEIVED ACK
[2023-07-07 00:00:00.046] [D] [zway] RECEIVED: ( 01 04 01 62 00 98 )
[2023-07-07 00:00:00.046] [D] [zway] SENT ACK
[2023-07-07 00:00:00.046] [D] [zway] SETDATA devices.55.data.isFailed = False
[2023-07-07 00:00:00.046] [I] [zway] Job 0x62 (Check if node is failed to node 55): Node 55 is operating
[2023-07-07 00:00:00.046] [D] [zway] Job 0x62 (Check if node is failed to node 55): success
[2023-07-07 00:00:00.046] [D] [zway] SETDATA devices.55.data.failureCount = 0 (0x00000000)
[2023-07-07 00:00:00.046] [D] [zway] SETDATA devices.55.data.lastSendInternal = **********
[2023-07-07 00:00:00.046] [I] [zway] Removing job: Check if node is failed to node 55
[2023-07-07 00:00:00.088] [D] [zway] SENDING (cb 0xa6): ( 01 09 00 13 37 02 98 40 25 A6 8B )
[2023-07-07 00:00:00.091] [D] [zway] RECEIVED ACK
[2023-07-07 00:00:00.092] [D] [zway] RECEIVED: ( 01 04 01 13 01 E8 )
[2023-07-07 00:00:00.092] [D] [zway] SENT ACK
[2023-07-07 00:00:00.092] [D] [zway] Delivered to Z-Wave stack
[2023-07-07 00:00:00.097] [D] [zway] RECEIVED: ( 01 1D 00 13 A6 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 )
[2023-07-07 00:00:00.097] [D] [zway] SENT ACK
[2023-07-07 00:00:00.097] [I] [zway] Job 0x13 (Nonce Get to node 55): Not delivered to recipient
[2023-07-07 00:00:00.098] [D] [zway] SENDING (cb 0xa7): ( 01 09 00 13 37 02 98 40 25 A7 8A )
[2023-07-07 00:00:00.102] [D] [zway] RECEIVED ACK
[2023-07-07 00:00:00.103] [D] [zway] RECEIVED: ( 01 04 01 13 01 E8 )
[2023-07-07 00:00:00.103] [D] [zway] SENT ACK
[2023-07-07 00:00:00.103] [D] [zway] Delivered to Z-Wave stack
[2023-07-07 00:00:00.103] [D] [zway] Job 0x13: deleted from queue
[2023-07-07 00:00:00.106] [D] [zway] Job 0x62: deleted from queue
[2023-07-07 00:00:00.107] [D] [zway] RECEIVED: ( 01 1D 00 13 A7 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 54 )
[2023-07-07 00:00:00.107] [D] [zway] SENT ACK
[2023-07-07 00:00:00.108] [I] [zway] Job 0x13 (Nonce Get to node 55): Not delivered to recipient
[2023-07-07 00:00:00.108] [D] [zway] SENDING (cb 0xa8): ( 01 09 00 13 37 02 98 40 25 A8 85 )
[2023-07-07 00:00:00.112] [D] [zway] RECEIVED ACK
[2023-07-07 00:00:00.113] [D] [zway] RECEIVED: ( 01 04 01 13 01 E8 )
[2023-07-07 00:00:00.113] [D] [zway] SENT ACK
[2023-07-07 00:00:00.113] [D] [zway] Delivered to Z-Wave stack
[2023-07-07 00:00:00.118] [D] [zway] RECEIVED: ( 01 1D 00 13 A8 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 5B )
[2023-07-07 00:00:00.118] [D] [zway] SENT ACK
[2023-07-07 00:00:00.118] [I] [zway] Job 0x13 (Nonce Get to node 55): Not delivered to recipient
[2023-07-07 00:00:00.119] [I] [zway] Job 0x13 (Nonce Get to node 55): Removing job due to too much retransmitions
[2023-07-07 00:00:00.119] [I] [zway] Adding job: Check if node is failed to node 55
[2023-07-07 00:00:00.119] [D] [zway] Job 0x13 (Nonce Get to node 55): fail
[2023-07-07 00:00:00.119] [I] [zway] Node 55:0 CC Security: sending Nonce Get
[2023-07-07 00:00:00.119] [D] [zway] SETDATA devices.55.instances.0.commandClasses.152.data.rNonceAckWait = **********

Re: job queue runs full after upgrade to v4.1.0

Posted: 09 Jul 2023 00:38
by AlesKO
I can confirm that installing 'z-way-4.1.0-27-g589c9812-lws16_armhf.deb' solved my problem.
Installed 24 hours ago and Job queue is empty right now :-)

Z-way works just like before upgrading to v4.1.0

Thank you PoltoS !