Blocked Job Queue on Z-Way Server 2.2.2

felix · Post by **felix** » 31 May 2016 15:28

Hello everybody,

I'm experiencing reproducible server hangs with z-way server 2.2.2 and need some support to track down the issue. Since the server revives sometimes after up to 15 minutes, I strongly suspect the job queue to be blocked somehow.

Originally, I tried to track down missing value updates on the server's API by comparing the logged sensor data with the sensor values shown in the server's user interface. Thus, I generated a vast amount of sensor data and observed the value changes in the user interface after the arrival of sensor information is logged to the z-way-server log.

Since I initially suspected broken/buggy devices to be the source of all evil, I tested with various sensors and actors. Finally, I reduced the test setup to a single device which was either a Fibaro Contact Sensor or a Fibaro Motion Detector (FGHMS-001). I did not create a vast amount of data with the motion sensor, but the server hangs nevertheless.

I'm using the following setup:
- Custom Arm 7 box, running Hypriot OS, kernel 3.10
- Sigma UZB stick, bound to /dev/ttyACM0

The system's log (/var/log/messages) do not show anything suspicious. All good related to the UZB device, it is properly identified, bound to /dev/ttyACM0 and is not mentioned further.

Follow these steps to reproduce the issue:
1. Ensure that the device is bound properly (check /var/log/messages, check /dev/ttyACM0)
2. Start the z-way sever, in my environment it's

Code: Select all

service z-way-server start

as root
3. OPTIONAL: Observe the server's work with

Code: Select all

tailf /var/log/z-way-server.log

4. Generate device data until incoming device data is not written to the log anymore. With a contact sensor, a vast amount is easily generated. This may take a long time.

With the Fibaro Contact sensor, the logs show the following just before the hang:

Code: Select all

[2016-05-27 13:35:43.031] [D] [zway] SENT ACK
[2016-05-27 13:35:43.031] [D] [zway] SETDATA devices.9.data.lastReceived = 0 (0x00000000)
[2016-05-27 13:35:43.031] [D] [zway] SETDATA devices.9.instances.0.commandClasses.48.data.1.level = True
[2016-05-27 13:35:43.031] [D] [zway] SETDATA devices.9.instances.0.commandClasses.48.data.1 = Empty

Followed by after server revival:

[2016-05-27 13:50:00.911] [I] [core] ---  ZWayVDev_zway_11-0-49-4 performCommand processing: {"0":"update"}
[2016-05-27 13:50:00.913] [I] [zway] Adding job: SensorMultilevel V1-4 Get
[2016-05-27 13:50:00.915] [I] [core] ---  ZWayVDev_zway_11-0-50-0 performCommand processing: {"0":"update"}
[2016-05-27 13:50:00.917] [D] [zway] SENDING (cb 0x08): ( 01 09 00 13 0B 02 31 04 25 08 F4 )
[2016-05-27 13:50:00.918] [I] [zway] Adding job: Meter Get (v2)
[2016-05-27 13:50:00.918] [I] [core] ---  ZWayVDev_zway_11-0-50-2 performCommand processing: {"0":"update"}
[2016-05-27 13:50:00.918] [D] [zway] RECEIVED ACK
[2016-05-27 13:50:00.919] [I] [zway] Adding job: Meter Get (v2)
[2016-05-27 13:50:00.924] [D] [zway] RECEIVED: ( 01 04 01 13 01 E8 )
[2016-05-27 13:50:00.925] [D] [zway] SENT ACK
[2016-05-27 13:50:00.925] [D] [zway] Delivered to Z-Wave stack
[2016-05-27 13:50:00.928] [D] [zway] RECEIVED: ( 01 0D 00 04 00 09 07 56 01 20 01 FF F4 CA 4F )
[2016-05-27 13:50:00.928] [D] [zway] SENT ACK

or

Code: Select all

[2016-05-30 11:59:24.832] [D] [zway] RECEIVED: ( 01 0D 00 04 00 09 07 56 01 30 03 FF D1 CB 79 )
[2016-05-30 11:59:24.832] [D] [zway] SENT ACK
[2016-05-30 11:59:24.832] [D] [zway] SETDATA devices.9.data.lastReceived = 0 (0x00000000)
[2016-05-30 11:59:24.832] [D] [zway] SETDATA devices.9.instances.0.commandClasses.48.data.1.level = True
[2016-05-30 11:59:24.832] [D] [zway] SETDATA devices.9.instances.0.commandClasses.48.data.1 = Empty

Server never revived. Restarted the sever manually.

The last lines before the hang with the Fibaro motion detector look very similar (but still might be totally unrelated to core issue):

Code: Select all

[2016-05-30 15:35:56.355] [D] [zway] SETDATA devices.12.data.lastReceived = 0 (0x00000000)
[2016-05-30 15:35:56.355] [D] [zway] SETDATA devices.12.instances.0.commandClasses.49.data.3.deviceScale = 1 (0x00000001)
[2016-05-30 15:35:56.355] [D] [zway] SETDATA devices.12.instances.0.commandClasses.49.data.3.scale = 1 (0x00000001)
[2016-05-30 15:35:56.357] [D] [zway] SETDATA devices.12.instances.0.commandClasses.49.data.3.scaleString = "Lux"
[2016-05-30 15:35:56.357] [D] [zway] SETDATA devices.12.instances.0.commandClasses.49.data.3.val = 716.000000
[2016-05-30 15:35:56.358] [D] [zway] SETDATA devices.12.instances.0.commandClasses.49.data.3 = Empty

Server never revived. Restarted the sever manually.

During the latter hang, I also checked the server's jop queue located in the expert user interface and took the following screenshot:

: Blocked Job Queue; 769E702A15D19A34C61ED7BC3DE27082.jpg (69.63 KiB) Viewed 3280 times

During the hang, the job queue always seem to contain some jobs with state Wait wakeup. I could provide additional screenshots, if desired. Since, the user interface is still alive and is able to obtain data, I suspect the job queue to be blocked.

- Is this a well known issue (then I was unable to find the solution in the forum

)?
- Is anybody able to reproduce this issue?
- Does anybody have any suggestions about possible reasons?
- Is there any way to increase the logging output? Maybe by switching the log level to trace?

Thank you in advance,

Felix