How to debug system failure

Discussions about Z-Way software and Z-Wave technology in general
Boxson
Posts: 29
Joined: 21 Sep 2016 20:00

How to debug system failure

Post by Boxson »

after about 30 days of normal operation today the Z-Wave.me web ui was not available nor did any control commands be proccessed by the z-way system. I have a wall-Switch (battery powered) which should activate scenes but it doesn't.
So i first tried to open the web ui at my browser but no connection could be established.
Next step was to connect to raspberry by ssh, which works without any problem.
The system wasnt under high load everything seems to be ok.
I had to restart the raspberry with "sudo reboot" to bring everything back online.

What can i now do to analyse where the problem was?

Regards

BoxSon
User avatar
PoltoS
Posts: 7601
Joined: 26 Jan 2011 19:36

Re: How to debug system failure

Post by PoltoS »

Send us last 20 lines of log from that crash (20 lines before next start of Z-Way). Seems Z-Way crashed. We have released v2.2.4 recently. It fixes few potential crashes. May be yours is fixed already. Anyway show us the log please.
Boxson
Posts: 29
Joined: 21 Sep 2016 20:00

Re: How to debug system failure

Post by Boxson »

I have upgraded to 2.2.4 two days ago and today again the Z-Way host was not reachable by browser.
The last 20 Lines of the log:

Code: Select all

[2016-10-31 10:44:00.708] [D] [zway] SENT ACK
[2016-10-31 10:44:00.708] [D] [zway] SETDATA controller.data.incomingPacket.nodeId = 15 (0x0000000f)
[2016-10-31 10:44:00.708] [D] [zway] SETDATA controller.data.incomingPacket.frameType = "singlecast"
[2016-10-31 10:44:00.708] [D] [zway] SETDATA controller.data.incomingPacket = **********
[2016-10-31 10:44:00.708] [D] [zway] SETDATA devices.15.data.lastReceived = 0 (0x00000000)
[2016-10-31 10:44:00.708] [D] [zway] SETDATA devices.15.instances.0.commandClasses.37.data.level = True
[2016-10-31 10:44:00.722] [I] [core] HK: updated ZWayVDev_zway_15-0-38
[2016-10-31 10:44:00.736] [I] [core] HK: updated ZWayVDev_zway_16-0-37
[2016-10-31 10:44:00.747] [D] [zway] RECEIVED: ( 01 07 00 13 E1 00 00 05 0F )
[2016-10-31 10:44:00.747] [D] [zway] SENT ACK
[2016-10-31 10:44:00.747] [I] [zway] Job 0x13 (SwitchBinary Get): Delivered
[2016-10-31 10:44:00.747] [D] [zway] SETDATA controller.data.outgoingPacket.delivered = True
[2016-10-31 10:44:00.747] [D] [zway] SETDATA controller.data.outgoingPacket.deliveryTime = 50 (0x00000032)
[2016-10-31 10:44:00.747] [D] [zway] SETDATA controller.data.outgoingPacket.packetLength = 5 (0x00000005)
[2016-10-31 10:44:00.747] [D] [zway] SETDATA controller.data.outgoingPacket = **********
[2016-10-31 10:44:00.747] [D] [zway] SendData Response with callback 0xe1 received: received by recipient
[2016-10-31 10:44:00.748] [D] [zway] SETDATA devices.16.data.lastSend = 47213509 (0x02d06bc5)
[2016-10-31 10:44:00.748] [D] [zway] Job 0x13 (SwitchBinary Get): success
[2016-10-31 10:44:00.748] [I] [zway] Removing job: SwitchBinary Get
[2016-10-31 10:44:00.748] [D] [zway] SENDING (cb 0xe2): ( 01 09 00 13 0F 02 26 02 25 E2 0B )
[2016-10-31 10:44:00.753] [I] [core] HK: updated ZWayVDev_zway_16-0-38
I could connect via SSH and a reboot of razberry made everything work again.

Any idea what causes the problem?
Anything i could do to get more detailed information?

Regards

Boxson
Boxson
Posts: 29
Joined: 21 Sep 2016 20:00

Re: How to debug system failure

Post by Boxson »

any help? System stopped working again today.

Regards

BoxSon
gsaw
Posts: 78
Joined: 22 Aug 2016 00:26

Re: How to debug system failure

Post by gsaw »

Boxson wrote:any help? System stopped working again today.

Regards

BoxSon
Try to edit /etc/init.d/z-way-server.

There is command line
start-stop-daemon --start --pidfile $PIDFILE --make-pidfile --background --no-close --chdir $DAEMON_PATH --exec $NAME > /dev/null 2>&1
at the end there is "/dev/null". Replace it with /tmp/zc.log. Maybe, if it crashes, it writes some thing in /tmp/zc.log file. Check your /var/log/messages or /var/log/syslog files, may be you have some thing suspicious there at the time of crash (for time of crash check last record in /var/log/z-way-server.log).

If it stops to response again, do not reboot the raspberry, but try at first to restart z-way server with

Code: Select all

sudo /etc/init.d/z-way-server restart
If restart of server didn't help, then probably you have troubles with raspberry pi, sd card or what ever you have.

check your space if you still have enough

Code: Select all

df -h
If restart of z-way server solves the problem, then you have definitely (maybe) a z-way server problem. If devs do not have any ideas, what you have, then you can try at least to restart z-way server every night per cron. :?

I would try these steps.
Boxson
Posts: 29
Joined: 21 Sep 2016 20:00

Re: How to debug system failure

Post by Boxson »

Hi,

thanks for you detailed information.
I've made the changes to the z-way-server. After restarting there is a line within zc.log:

Code: Select all

Error: 'debug-port' number is out of range. Skipping it.
I'll wait now till the system stops again.

df -h shows that there is plenty of space left:

Code: Select all

Filesystem      Size  Used Avail Use% Mounted on
/dev/root       7.2G  3.9G  3.0G  57% /
devtmpfs        459M     0  459M   0% /dev
tmpfs           463M     0  463M   0% /dev/shm
tmpfs           463M  6.3M  457M   2% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           463M     0  463M   0% /sys/fs/cgroup
/dev/mmcblk0p1   63M   21M   43M  33% /boot
tmpfs            93M     0   93M   0% /run/user/1000
Regards

BoxSon
lajaro
Posts: 9
Joined: 26 Dec 2016 04:11

Re: How to debug system failure

Post by lajaro »

I'm having the same problem, not being able to render the web pages on TCP 8083. This happens daily, and today has happened 3 times.

Environment is RPI2 B+ running Jessie and Z-Way 2.2.5.

Nothing useful in /var/log/z-way-server or other system logs. Disk space isn't a problem. /etc/init.d/z-way-server restart (or stop / start) doesn't work; only thing that works is a reboot just like the OP.
johannes
Posts: 11
Joined: 31 Aug 2016 16:28

Re: How to debug system failure

Post by johannes »

Hi,
I have the same problem. Any news on that?

My configuration:

Hardware
Vendor: RaZberry by Z-Wave.Me
Vendors Product ID: 1024 / 1
Z-Wave Chip: ZW0500
Firmware
Library Type: Static Controller
SDK Version: 6.51.03
Serial API Version: 05.00
Capabillities:
UUID: f7e89785eb05c5d1db83a1753d6d1661
Subvendor: 0x0000
Nodes limit: Unlimited
Capabillities: Slm
Software Information
Version number: v2.2.5
Compile-ID: d58b7617c4f553beca4c6f41705b48fa5a4cf56c
Compile-Date: 2016-11-15 11:42:15 +0300
UI
UI version 1.1.0

My last log entrie(s) before the latest crash yesterday:

Code: Select all

[2017-01-03 06:48:54.682] [D] [zway] RECEIVED: ( 01 18 00 04 00 10 12 60 0D 01 00 32 02 A1 6C 00 00 03 E4 00 00 00 00 00 00 97 )
[2017-01-03 06:48:54.682] [D] [zway] SENT ACK
[2017-01-03 06:48:54.682] [D] [zway] SETDATA controller.data.incomingPacket.nodeId = 16 (0x00000010)
[2017-01-03 06:48:54.682] [D] [zway] SETDATA controller.data.incomingPacket.frameType = "singlecast"
[2017-01-03 06:48:54.683] [D] [zway] SETDATA controller.data.incomingPacket = **********
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.data.lastReceived = 0 (0x00000000)
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5.val = 0.996000
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5.delta = 0 (0x00000000)
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5.ratetype = 1 (0x00000001)
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5.previous = 0.000000
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5 = Empty
[2017-01-03 06:48:54.695] [I] [core] HK: updated ZWayVDev_zway_16-1-50-5
ZWayVDev_zway_16-1-50-5 is a current sensor of my Aeon Labs Home Energy Meter G2.
mrnmukkas
Posts: 10
Joined: 15 Jan 2016 18:16

Re: How to debug system failure

Post by mrnmukkas »

I have the same issue. I've noticed that the log seems to be filled with "Too many open files" errors:

Code: Select all

[2017-01-05 11:10:08.270] [I] [core] Error: can not write back config to storage:  Error: Too many open files
[2017-01-05 11:10:08.318] [I] [core] [BaseModule-44] Set lastLevel to 1.2 for ZWayVDev_zway_7-1-50-2 (was 1.1)
[2017-01-05 11:10:08.334] [I] [core] Error: can not write back config to storage:  Error: Too many open files
[2017-01-05 11:10:08.351] [I] [core] Error: can not write back config to storage:  Error: Too many open files
[2017-01-05 11:15:00.463] [I] [core] ---  Code_Device_sensorMultilevel_30 performCommand processing: {"0":"update"}
[2017-01-05 11:15:00.470] [E] [core] Callback execution error: TypeError: Cannot read property 'vd_err_virtual_dev' of null
    at _.extend.performCommand (automation/classes/VirtualDevice.js:294:66)
    at automation/modules/SensorsPolling/index.js:82:21
    at Array.forEach (native)
    at AutomationController.onPoll (automation/modules/SensorsPolling/index.js:81:29)
    at AutomationController.EventEmitter.emit (automation/lib/eventemitter2.js:317:33)
    at automation/modules/Cron/index.js:133:42
    at Array.forEach (native)
    at automation/modules/Cron/index.js:121:27
    at Array.forEach (native)
    at Function.<anonymous> (automation/modules/Cron/index.js:118:37)
[2017-01-05 11:17:06.891] [E] [core] Callback execution error: Error: Too many open files
    at Error (native)
    at CodeDevice.eval (eval at <anonymous> (automation/modules/CodeDevice/index.js:125:29), <anonymous>:1:1)
    at CodeDevice.update (automation/modules/CodeDevice/index.js:125:24)
    at Function.<anonymous> (automation/modules/CodeDevice/index.js:101:18)
[2017-01-05 11:17:06.966] [E] [core] Callback execution error: Error: Too many open files
    at Error (native)
    at CodeDevice.eval (eval at <anonymous> (automation/modules/CodeDevice/index.js:125:29), <anonymous>:1:1)
    at CodeDevice.update (automation/modules/CodeDevice/index.js:125:24)
    at Function.<anonymous> (automation/modules/CodeDevice/index.js:101:18)
[2017-01-05 11:17:07.141] [E] [core] Callback execution error: Error: Too many open files
    at Error (native)
    at CodeDevice.eval (eval at <anonymous> (automation/modules/CodeDevice/index.js:125:29), <anonymous>:1:1)
    at CodeDevice.update (automation/modules/CodeDevice/index.js:125:24)
    at Function.<anonymous> (automation/modules/CodeDevice/index.js:101:18)
[2017-01-05 11:17:07.144] [E] [core] Callback execution error: Error: Too many open files
    at Error (native)
    at CodeDevice.eval (eval at <anonymous> (automation/modules/CodeDevice/index.js:125:29), <anonymous>:1:1)
    at CodeDevice.update (automation/modules/CodeDevice/index.js:125:24)
    at Function.<anonymous> (automation/modules/CodeDevice/index.js:101:18)
[2017-01-05 11:20:00.815] [I] [core] ---  Code_Device_sensorMultilevel_30 performCommand processing: {"0":"update"}
[2017-01-05 11:20:00.821] [E] [core] Callback execution error: TypeError: Cannot read property 'vd_err_virtual_dev' of null
    at _.extend.performCommand (automation/classes/VirtualDevice.js:294:66)
    at automation/modules/SensorsPolling/index.js:82:21
    at Array.forEach (native)
    at AutomationController.onPoll (automation/modules/SensorsPolling/index.js:81:29)
    at AutomationController.EventEmitter.emit (automation/lib/eventemitter2.js:317:33)
    at automation/modules/Cron/index.js:133:42
    at Array.forEach (native)
    at automation/modules/Cron/index.js:121:27
    at Array.forEach (native)
    at Function.<anonymous> (automation/modules/Cron/index.js:118:37)
mrnmukkas
Posts: 10
Joined: 15 Jan 2016 18:16

Re: How to debug system failure

Post by mrnmukkas »

And after checking lsof after the system once again stopped working, I found this line repeating ~65000 times:

Code: Select all

z-way-ser 26801 root 1026u  IPv4 1063147      0t0    TCP localhost:8083->localhost:39902 (ESTABLISHED)
Post Reply