Page 1 of 5
send queue hanging (z-way-server v2.0.0, v2.0.1rc7)
Posted: 29 Dec 2014 19:43
by Mirar
In the latest software (well, v2.0.0) I have seen random hangings of the
sending queue. At random points in time, the queue no longer empties send-packets.
Packets coming in work fine, I'm still getting things from the sensors on the timers. Querying the sensors does not work, switching does not work. The packets show up in the queue, but get a permanent 0:20 timeout that never changes. Stopping sending doesn't work, I get queues about 48 long that's completely stuck and never times out.
Resetting the
"z-way chip" or "API" doesn't help, but resetting the
"z-way-server" service/process helps. After that all is fine again. So it seems like the problem is in "z-way-server", not the firmware (possibly both).
I haven't been able to catch anything special in the z-way-server logs. After it hangs, all I get is "duplicate request" on sending things, because all packets my system wants to send are already in the queue...
Anyone else seen this?
(I will upgrade to v2.0.1 now but this isn't in the
changelog.)
edit: better part of the forum
Re: send queue hanging (z-way-server v2.0.0)
Posted: 29 Dec 2014 20:12
by pofs
This may happen if some job with higher priority cannot be sent for some reason, blocking all other jobs.
It may be a bug, but it is hard to diagnose without seeing jobs in your queue (screenshot or output of
http://ip:8083/ZWaveAPI/InspectQueue) and log file.
Please provide these next time it hangs.
Re: send queue hanging (z-way-server v2.0.0)
Posted: 29 Dec 2014 21:29
by PoltoS
You can open Queue Inspector in our UI (Controller Info -> Queue) and attach a screenshot.
Re: send queue hanging (z-way-server v2.0.0)
Posted: 30 Dec 2014 00:39
by Mirar
I did save the log and the queue. I didn't see anything suspicious.
Event at about 21:40 where it stops sending (about 25h ago), grepping for "SENDING" gives the indication that it goes down from several a minute to once every few minutes.
Log is too big to attach, so it's
here.
Queue from when I started to debug (today):
Code: Select all
Legend
n = Send count, W = Wait wakeup, S = Wait security, E = Encapsulated, D = Done, U = Urgent
n U W S E D Ack Resp Cbk Timeout NodeId Description Progress Buffer
0 - - - 0.20 6 SwitchBinary Set 6 3 25 1 0 25
0 - - - 0.20 6 SwitchBinary Set 6 3 25 1 ff 25
0 - - - 0.20 7 SwitchBinary Set 7 3 25 1 0 25
0 - - - 0.20 7 SwitchBinary Set 7 3 25 1 ff 25
0 - - - 0.20 11 SwitchBinary Set b 3 25 1 0 25
0 - - - 0.20 2 SensorBinary Get 2 2 30 2 25
0 - - - 0.20 3 SensorBinary Get 3 2 30 2 25
0 - - - 0.20 10 SensorBinary Get a 2 30 2 25
0 W - - - 0.20 12 SensorBinary Get v2 c 3 30 2 8 5
0 W - - - 0.20 12 SensorBinary Get v2 c 3 30 2 a 5
0 W - - - 0.20 12 SensorBinary Get v2 c 3 30 2 c 5
0 - - - 0.20 5 SwitchBinary Set 5 3 25 1 0 25
0 - - - 0.20 4 SwitchBinary Set 4 3 25 1 0 25
0 W - - - 0.20 12 SensorMultilevel V5 Get c 4 31 4 1 0 5
0 W - - - 0.20 12 SensorMultilevel V5 Get c 4 31 4 3 0 5
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 3 8 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 5 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 a 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 c 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1a 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1b 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1d 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 1 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 3 8 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 9 8 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 12 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 13 18 25
0 W - - - 0.20 13 SensorMultilevel V1-4 Get d 6 56 1 31 4 1d a2 5
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 1 0 25
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 3 8 25
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 5 0 25
0 - - - 0.20 11 SwitchBinary Set b 3 25 1 ff 25
0 - - - 0.20 5 SwitchBinary Set 5 3 25 1 ff 25
0 - - - 0.20 4 SwitchBinary Set 4 3 25 1 ff 25
0 - - - 0.20 5 SwitchBinary Get 5 2 25 2 25
0 - - - 0.20 9 SwitchBinary Get 9 2 25 2 25
0 - - - 0.20 16 SwitchBinary Set 10 3 25 1 ff 25
0 - - - 0.20 11 SwitchBinary Get b 2 25 2 25
0 - - - 0.20 16 SwitchBinary Get 10 2 25 2 25
0 - - - 0.20 6 SwitchBinary Get 6 2 25 2 25
0 - - - 0.20 6 Meter Get (v2) 6 3 32 1 0 25
0 - - - 0.20 6 Meter Get (v2) 6 3 32 1 10 25
0 - - - 0.20 7 SwitchBinary Get 7 2 25 2 25
0 - - - 0.20 8 SwitchBinary Get 8 2 25 2 25
0 - - - 0.20 4 SwitchBinary Get 4 2 25 2 25
0 - - - 0.20 16 Meter Get (v2) 10 3 32 1 0 25
0 - - - 0.20 16 Meter Get (v2) 10 3 32 1 10 25
Queue length: 48
The queue never changes, except when it receives a sensor reading (then 2 more entries appear for a few seconds, I didn't catch them).
Thanks.
Re: send queue hanging (z-way-server v2.0.0)
Posted: 30 Dec 2014 00:44
by Mirar
A guess might be the events around 21:25, where this occurs:
Code: Select all
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.284] [D] [zway] Job 0x13: deleted from queue
After that "Adding job" seems to not be sent anymore and there's a lot of "removed duplicate".
Re: send queue hanging (z-way-server v2.0.0)
Posted: 30 Dec 2014 18:36
by pofs
I see so many reports in your log, so they can completely stuck sending queue (it is processed only when there's nothing to receive). Receiving ~10 reports per second is definitely not normal.
What is your usage scenario?
Re: send queue hanging (z-way-server v2.0.0)
Posted: 31 Dec 2014 14:15
by Mirar
pofs wrote:I see so many reports in your log, so they can completely stuck sending queue (it is processed only when there's nothing to receive). Receiving ~10 reports per second is definitely not normal.
What is your usage scenario?
I have three multisensors sending on 3 second interval. Is that amount of data a problem? It seems rather small, even in 9600 bps - the traffic received amounts to an average of 60 bytes/s.
After the event there's plenty of seconds with no traffic at all (but yes, 9000 seconds with more than 10 packets/s).
The network isn't full, other senders have no problem (wall switches doesn't have a problem at all) and the problem is solved by
restarting the z-way-server process.
Almost this amount of data has also worked perfectly without stalling any queue for almost a year - until I upgraded to 2.0.0.
I can easily make a script to detect this problem within seconds and restart the server. It's really easy to detect, since the queue is completely static after the event. But it doesn't seem like the correct solution. I already did hacks like that to get the Aeon Stick to work before I switched to a razberry.
Re: send queue hanging (z-way-server v2.0.0)
Posted: 02 Jan 2015 02:21
by Mirar
Hanged again. Sensor readings coming in fine.
Code: Select all
Legend
n = Send count, W = Wait wakeup, S = Wait security, E = Encapsulated, D = Done, U = Urgent
n U W S E D Ack Resp Cbk Timeout NodeId Description Progress Buffer
0 - - - 0.20 10 SensorBinary Get a 2 30 2 25
0 - - - 0.20 6 SwitchBinary Set 6 3 25 1 0 25
0 - - - 0.20 2 SensorBinary Get 2 2 30 2 25
0 - - - 0.20 7 SwitchBinary Set 7 3 25 1 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 9 8 25
0 W - - - 0.20 12 SensorMultilevel V5 Get c 4 31 4 1 0 5
0 W - - - 0.20 12 SensorMultilevel V5 Get c 4 31 4 3 0 5
0 W - - - 0.20 13 SensorMultilevel V1-4 Get d 6 56 1 31 4 1d a2 5
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 1 0 25
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 3 8 25
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 5 0 25
0 - - - 0.20 11 SwitchBinary Set b 3 25 1 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 3 8 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 5 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 a 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 c 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1a 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1b 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1d 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 1 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 3 8 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 9 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 12 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 13 8 25
0 - - - 0.20 7 SwitchBinary Get 7 2 25 2 25
0 - - - 0.20 8 SwitchBinary Get 8 2 25 2 25
0 - - - 0.20 5 SwitchBinary Get 5 2 25 2 25
0 - - - 0.20 11 SwitchBinary Get b 2 25 2 25
0 - - - 0.20 9 SwitchBinary Get 9 2 25 2 25
0 - - - 0.20 6 SwitchBinary Get 6 2 25 2 25
0 - - - 0.20 16 SwitchBinary Set 10 3 25 1 ff 25
0 - - - 0.20 16 SwitchBinary Get 10 2 25 2 25
0 - - - 0.20 16 Meter Get (v2) 10 3 32 1 0 25
0 - - - 0.20 16 Meter Get (v2) 10 3 32 1 10 25
0 - - - 0.20 17 SwitchBinary Get 11 2 25 2 25
0 - - - 0.20 6 Meter Get (v2) 6 3 32 1 0 25
0 - - - 0.20 6 Meter Get (v2) 6 3 32 1 10 25
Queue length: 38
A little after "service z-way-server restart" queue looks like this:
Code: Select all
Legend
n = Send count, W = Wait wakeup, S = Wait security, E = Encapsulated, D = Done, U = Urgent
n U W S E D Ack Resp Cbk Timeout NodeId Description Progress Buffer
1 D + + + 5.85 6 Meter Get (v2) Delivered 6 3 32 1 0 25
1 D + + + 5.93 6 Meter Get (v2) Delivered 6 3 32 1 10 25
1 D + + + 6.43 9 SwitchBinary Get Delivered 9 2 25 2 25
1 D + + + 11.52 16 SwitchBinary Set Delivered 10 3 25 1 ff 25
1 D + + + 11.61 16 SwitchBinary Get Delivered 10 2 25 2 25
1 D + + + 15.63 6 Meter Get (v2) Delivered 6 3 32 1 0 25
1 D + + + 15.70 6 Meter Get (v2) Delivered 6 3 32 1 10 25
Queue length: 7
Note that "timeout" is stuck at "0:20" on the hanged queue.
Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)
Posted: 03 Jan 2015 23:26
by Mirar
Anything else I can do to debug this?
Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)
Posted: 04 Jan 2015 01:49
by pofs
Are you performing polling, or you configured your devices to send unsolicited reports?
Polling might become an issue when done on timer basis. You shouldn't make a new request while not received the previous answer.