In the latest software (well, v2.0.0) I have seen random hangings of the sending queue. At random points in time, the queue no longer empties send-packets.
Packets coming in work fine, I'm still getting things from the sensors on the timers. Querying the sensors does not work, switching does not work. The packets show up in the queue, but get a permanent 0:20 timeout that never changes. Stopping sending doesn't work, I get queues about 48 long that's completely stuck and never times out.
Resetting the "z-way chip" or "API" doesn't help, but resetting the "z-way-server" service/process helps. After that all is fine again. So it seems like the problem is in "z-way-server", not the firmware (possibly both).
I haven't been able to catch anything special in the z-way-server logs. After it hangs, all I get is "duplicate request" on sending things, because all packets my system wants to send are already in the queue...
Anyone else seen this?
(I will upgrade to v2.0.1 now but this isn't in the changelog.)
edit: better part of the forum
send queue hanging (z-way-server v2.0.0, v2.0.1rc7)
send queue hanging (z-way-server v2.0.0, v2.0.1rc7)
Last edited by Mirar on 02 Jan 2015 12:46, edited 1 time in total.
Re: send queue hanging (z-way-server v2.0.0)
This may happen if some job with higher priority cannot be sent for some reason, blocking all other jobs.
It may be a bug, but it is hard to diagnose without seeing jobs in your queue (screenshot or output of http://ip:8083/ZWaveAPI/InspectQueue) and log file.
Please provide these next time it hangs.
It may be a bug, but it is hard to diagnose without seeing jobs in your queue (screenshot or output of http://ip:8083/ZWaveAPI/InspectQueue) and log file.
Please provide these next time it hangs.
Re: send queue hanging (z-way-server v2.0.0)
You can open Queue Inspector in our UI (Controller Info -> Queue) and attach a screenshot.
Re: send queue hanging (z-way-server v2.0.0)
I did save the log and the queue. I didn't see anything suspicious.
Event at about 21:40 where it stops sending (about 25h ago), grepping for "SENDING" gives the indication that it goes down from several a minute to once every few minutes.
Log is too big to attach, so it's here.
Queue from when I started to debug (today):
The queue never changes, except when it receives a sensor reading (then 2 more entries appear for a few seconds, I didn't catch them).
Thanks.
Event at about 21:40 where it stops sending (about 25h ago), grepping for "SENDING" gives the indication that it goes down from several a minute to once every few minutes.
Log is too big to attach, so it's here.
Queue from when I started to debug (today):
Code: Select all
Legend
n = Send count, W = Wait wakeup, S = Wait security, E = Encapsulated, D = Done, U = Urgent
n U W S E D Ack Resp Cbk Timeout NodeId Description Progress Buffer
0 - - - 0.20 6 SwitchBinary Set 6 3 25 1 0 25
0 - - - 0.20 6 SwitchBinary Set 6 3 25 1 ff 25
0 - - - 0.20 7 SwitchBinary Set 7 3 25 1 0 25
0 - - - 0.20 7 SwitchBinary Set 7 3 25 1 ff 25
0 - - - 0.20 11 SwitchBinary Set b 3 25 1 0 25
0 - - - 0.20 2 SensorBinary Get 2 2 30 2 25
0 - - - 0.20 3 SensorBinary Get 3 2 30 2 25
0 - - - 0.20 10 SensorBinary Get a 2 30 2 25
0 W - - - 0.20 12 SensorBinary Get v2 c 3 30 2 8 5
0 W - - - 0.20 12 SensorBinary Get v2 c 3 30 2 a 5
0 W - - - 0.20 12 SensorBinary Get v2 c 3 30 2 c 5
0 - - - 0.20 5 SwitchBinary Set 5 3 25 1 0 25
0 - - - 0.20 4 SwitchBinary Set 4 3 25 1 0 25
0 W - - - 0.20 12 SensorMultilevel V5 Get c 4 31 4 1 0 5
0 W - - - 0.20 12 SensorMultilevel V5 Get c 4 31 4 3 0 5
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 3 8 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 5 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 a 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 c 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1a 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1b 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1d 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 1 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 3 8 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 9 8 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 12 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 13 18 25
0 W - - - 0.20 13 SensorMultilevel V1-4 Get d 6 56 1 31 4 1d a2 5
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 1 0 25
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 3 8 25
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 5 0 25
0 - - - 0.20 11 SwitchBinary Set b 3 25 1 ff 25
0 - - - 0.20 5 SwitchBinary Set 5 3 25 1 ff 25
0 - - - 0.20 4 SwitchBinary Set 4 3 25 1 ff 25
0 - - - 0.20 5 SwitchBinary Get 5 2 25 2 25
0 - - - 0.20 9 SwitchBinary Get 9 2 25 2 25
0 - - - 0.20 16 SwitchBinary Set 10 3 25 1 ff 25
0 - - - 0.20 11 SwitchBinary Get b 2 25 2 25
0 - - - 0.20 16 SwitchBinary Get 10 2 25 2 25
0 - - - 0.20 6 SwitchBinary Get 6 2 25 2 25
0 - - - 0.20 6 Meter Get (v2) 6 3 32 1 0 25
0 - - - 0.20 6 Meter Get (v2) 6 3 32 1 10 25
0 - - - 0.20 7 SwitchBinary Get 7 2 25 2 25
0 - - - 0.20 8 SwitchBinary Get 8 2 25 2 25
0 - - - 0.20 4 SwitchBinary Get 4 2 25 2 25
0 - - - 0.20 16 Meter Get (v2) 10 3 32 1 0 25
0 - - - 0.20 16 Meter Get (v2) 10 3 32 1 10 25
Queue length: 48
Thanks.
Re: send queue hanging (z-way-server v2.0.0)
A guess might be the events around 21:25, where this occurs:
After that "Adding job" seems to not be sent anymore and there's a lot of "removed duplicate".
Code: Select all
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.281] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.282] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.283] [D] [zway] Job 0x13: deleted from queue
[2014-12-28 21:25:17.284] [D] [zway] Job 0x13: deleted from queue
Re: send queue hanging (z-way-server v2.0.0)
I see so many reports in your log, so they can completely stuck sending queue (it is processed only when there's nothing to receive). Receiving ~10 reports per second is definitely not normal.
What is your usage scenario?
What is your usage scenario?
Re: send queue hanging (z-way-server v2.0.0)
I have three multisensors sending on 3 second interval. Is that amount of data a problem? It seems rather small, even in 9600 bps - the traffic received amounts to an average of 60 bytes/s.pofs wrote:I see so many reports in your log, so they can completely stuck sending queue (it is processed only when there's nothing to receive). Receiving ~10 reports per second is definitely not normal.
What is your usage scenario?
After the event there's plenty of seconds with no traffic at all (but yes, 9000 seconds with more than 10 packets/s).
The network isn't full, other senders have no problem (wall switches doesn't have a problem at all) and the problem is solved by restarting the z-way-server process.
Almost this amount of data has also worked perfectly without stalling any queue for almost a year - until I upgraded to 2.0.0.
I can easily make a script to detect this problem within seconds and restart the server. It's really easy to detect, since the queue is completely static after the event. But it doesn't seem like the correct solution. I already did hacks like that to get the Aeon Stick to work before I switched to a razberry.
Re: send queue hanging (z-way-server v2.0.0)
Hanged again. Sensor readings coming in fine.
A little after "service z-way-server restart" queue looks like this:
Note that "timeout" is stuck at "0:20" on the hanged queue.
Code: Select all
Legend
n = Send count, W = Wait wakeup, S = Wait security, E = Encapsulated, D = Done, U = Urgent
n U W S E D Ack Resp Cbk Timeout NodeId Description Progress Buffer
0 - - - 0.20 10 SensorBinary Get a 2 30 2 25
0 - - - 0.20 6 SwitchBinary Set 6 3 25 1 0 25
0 - - - 0.20 2 SensorBinary Get 2 2 30 2 25
0 - - - 0.20 7 SwitchBinary Set 7 3 25 1 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 9 8 25
0 W - - - 0.20 12 SensorMultilevel V5 Get c 4 31 4 1 0 5
0 W - - - 0.20 12 SensorMultilevel V5 Get c 4 31 4 3 0 5
0 W - - - 0.20 13 SensorMultilevel V1-4 Get d 6 56 1 31 4 1d a2 5
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 1 0 25
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 3 8 25
0 - - - 0.20 2 SensorMultilevel V5 Get 2 4 31 4 5 0 25
0 - - - 0.20 11 SwitchBinary Set b 3 25 1 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 3 8 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 5 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 a 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 c 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1a 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1b 0 25
0 - - - 0.20 3 SensorMultilevel V5 Get 3 4 31 4 1d 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 1 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 3 8 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 9 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 12 0 25
0 - - - 0.20 10 SensorMultilevel V5 Get a 4 31 4 13 8 25
0 - - - 0.20 7 SwitchBinary Get 7 2 25 2 25
0 - - - 0.20 8 SwitchBinary Get 8 2 25 2 25
0 - - - 0.20 5 SwitchBinary Get 5 2 25 2 25
0 - - - 0.20 11 SwitchBinary Get b 2 25 2 25
0 - - - 0.20 9 SwitchBinary Get 9 2 25 2 25
0 - - - 0.20 6 SwitchBinary Get 6 2 25 2 25
0 - - - 0.20 16 SwitchBinary Set 10 3 25 1 ff 25
0 - - - 0.20 16 SwitchBinary Get 10 2 25 2 25
0 - - - 0.20 16 Meter Get (v2) 10 3 32 1 0 25
0 - - - 0.20 16 Meter Get (v2) 10 3 32 1 10 25
0 - - - 0.20 17 SwitchBinary Get 11 2 25 2 25
0 - - - 0.20 6 Meter Get (v2) 6 3 32 1 0 25
0 - - - 0.20 6 Meter Get (v2) 6 3 32 1 10 25
Queue length: 38
Code: Select all
Legend
n = Send count, W = Wait wakeup, S = Wait security, E = Encapsulated, D = Done, U = Urgent
n U W S E D Ack Resp Cbk Timeout NodeId Description Progress Buffer
1 D + + + 5.85 6 Meter Get (v2) Delivered 6 3 32 1 0 25
1 D + + + 5.93 6 Meter Get (v2) Delivered 6 3 32 1 10 25
1 D + + + 6.43 9 SwitchBinary Get Delivered 9 2 25 2 25
1 D + + + 11.52 16 SwitchBinary Set Delivered 10 3 25 1 ff 25
1 D + + + 11.61 16 SwitchBinary Get Delivered 10 2 25 2 25
1 D + + + 15.63 6 Meter Get (v2) Delivered 6 3 32 1 0 25
1 D + + + 15.70 6 Meter Get (v2) Delivered 6 3 32 1 10 25
Queue length: 7
Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)
Anything else I can do to debug this?
Re: send queue hanging (z-way-server v2.0.0, v2.0.1rc7)
Are you performing polling, or you configured your devices to send unsolicited reports?
Polling might become an issue when done on timer basis. You shouldn't make a new request while not received the previous answer.
Polling might become an issue when done on timer basis. You shouldn't make a new request while not received the previous answer.