How to debug system failure
How to debug system failure
after about 30 days of normal operation today the Z-Wave.me web ui was not available nor did any control commands be proccessed by the z-way system. I have a wall-Switch (battery powered) which should activate scenes but it doesn't.
So i first tried to open the web ui at my browser but no connection could be established.
Next step was to connect to raspberry by ssh, which works without any problem.
The system wasnt under high load everything seems to be ok.
I had to restart the raspberry with "sudo reboot" to bring everything back online.
What can i now do to analyse where the problem was?
Regards
BoxSon
So i first tried to open the web ui at my browser but no connection could be established.
Next step was to connect to raspberry by ssh, which works without any problem.
The system wasnt under high load everything seems to be ok.
I had to restart the raspberry with "sudo reboot" to bring everything back online.
What can i now do to analyse where the problem was?
Regards
BoxSon
Re: How to debug system failure
Send us last 20 lines of log from that crash (20 lines before next start of Z-Way). Seems Z-Way crashed. We have released v2.2.4 recently. It fixes few potential crashes. May be yours is fixed already. Anyway show us the log please.
Re: How to debug system failure
I have upgraded to 2.2.4 two days ago and today again the Z-Way host was not reachable by browser.
The last 20 Lines of the log:
I could connect via SSH and a reboot of razberry made everything work again.
Any idea what causes the problem?
Anything i could do to get more detailed information?
Regards
Boxson
The last 20 Lines of the log:
Code: Select all
[2016-10-31 10:44:00.708] [D] [zway] SENT ACK
[2016-10-31 10:44:00.708] [D] [zway] SETDATA controller.data.incomingPacket.nodeId = 15 (0x0000000f)
[2016-10-31 10:44:00.708] [D] [zway] SETDATA controller.data.incomingPacket.frameType = "singlecast"
[2016-10-31 10:44:00.708] [D] [zway] SETDATA controller.data.incomingPacket = **********
[2016-10-31 10:44:00.708] [D] [zway] SETDATA devices.15.data.lastReceived = 0 (0x00000000)
[2016-10-31 10:44:00.708] [D] [zway] SETDATA devices.15.instances.0.commandClasses.37.data.level = True
[2016-10-31 10:44:00.722] [I] [core] HK: updated ZWayVDev_zway_15-0-38
[2016-10-31 10:44:00.736] [I] [core] HK: updated ZWayVDev_zway_16-0-37
[2016-10-31 10:44:00.747] [D] [zway] RECEIVED: ( 01 07 00 13 E1 00 00 05 0F )
[2016-10-31 10:44:00.747] [D] [zway] SENT ACK
[2016-10-31 10:44:00.747] [I] [zway] Job 0x13 (SwitchBinary Get): Delivered
[2016-10-31 10:44:00.747] [D] [zway] SETDATA controller.data.outgoingPacket.delivered = True
[2016-10-31 10:44:00.747] [D] [zway] SETDATA controller.data.outgoingPacket.deliveryTime = 50 (0x00000032)
[2016-10-31 10:44:00.747] [D] [zway] SETDATA controller.data.outgoingPacket.packetLength = 5 (0x00000005)
[2016-10-31 10:44:00.747] [D] [zway] SETDATA controller.data.outgoingPacket = **********
[2016-10-31 10:44:00.747] [D] [zway] SendData Response with callback 0xe1 received: received by recipient
[2016-10-31 10:44:00.748] [D] [zway] SETDATA devices.16.data.lastSend = 47213509 (0x02d06bc5)
[2016-10-31 10:44:00.748] [D] [zway] Job 0x13 (SwitchBinary Get): success
[2016-10-31 10:44:00.748] [I] [zway] Removing job: SwitchBinary Get
[2016-10-31 10:44:00.748] [D] [zway] SENDING (cb 0xe2): ( 01 09 00 13 0F 02 26 02 25 E2 0B )
[2016-10-31 10:44:00.753] [I] [core] HK: updated ZWayVDev_zway_16-0-38
Any idea what causes the problem?
Anything i could do to get more detailed information?
Regards
Boxson
Re: How to debug system failure
any help? System stopped working again today.
Regards
BoxSon
Regards
BoxSon
Re: How to debug system failure
Try to edit /etc/init.d/z-way-server.Boxson wrote:any help? System stopped working again today.
Regards
BoxSon
There is command line
at the end there is "/dev/null". Replace it with /tmp/zc.log. Maybe, if it crashes, it writes some thing in /tmp/zc.log file. Check your /var/log/messages or /var/log/syslog files, may be you have some thing suspicious there at the time of crash (for time of crash check last record in /var/log/z-way-server.log).start-stop-daemon --start --pidfile $PIDFILE --make-pidfile --background --no-close --chdir $DAEMON_PATH --exec $NAME > /dev/null 2>&1
If it stops to response again, do not reboot the raspberry, but try at first to restart z-way server with
Code: Select all
sudo /etc/init.d/z-way-server restart
check your space if you still have enough
Code: Select all
df -h
I would try these steps.
Re: How to debug system failure
Hi,
thanks for you detailed information.
I've made the changes to the z-way-server. After restarting there is a line within zc.log:
I'll wait now till the system stops again.
df -h shows that there is plenty of space left:
Regards
BoxSon
thanks for you detailed information.
I've made the changes to the z-way-server. After restarting there is a line within zc.log:
Code: Select all
Error: 'debug-port' number is out of range. Skipping it.
df -h shows that there is plenty of space left:
Code: Select all
Filesystem Size Used Avail Use% Mounted on
/dev/root 7.2G 3.9G 3.0G 57% /
devtmpfs 459M 0 459M 0% /dev
tmpfs 463M 0 463M 0% /dev/shm
tmpfs 463M 6.3M 457M 2% /run
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 463M 0 463M 0% /sys/fs/cgroup
/dev/mmcblk0p1 63M 21M 43M 33% /boot
tmpfs 93M 0 93M 0% /run/user/1000
BoxSon
Re: How to debug system failure
I'm having the same problem, not being able to render the web pages on TCP 8083. This happens daily, and today has happened 3 times.
Environment is RPI2 B+ running Jessie and Z-Way 2.2.5.
Nothing useful in /var/log/z-way-server or other system logs. Disk space isn't a problem. /etc/init.d/z-way-server restart (or stop / start) doesn't work; only thing that works is a reboot just like the OP.
Environment is RPI2 B+ running Jessie and Z-Way 2.2.5.
Nothing useful in /var/log/z-way-server or other system logs. Disk space isn't a problem. /etc/init.d/z-way-server restart (or stop / start) doesn't work; only thing that works is a reboot just like the OP.
Re: How to debug system failure
Hi,
I have the same problem. Any news on that?
My configuration:
Hardware
Vendor: RaZberry by Z-Wave.Me
Vendors Product ID: 1024 / 1
Z-Wave Chip: ZW0500
Firmware
Library Type: Static Controller
SDK Version: 6.51.03
Serial API Version: 05.00
Capabillities:
UUID: f7e89785eb05c5d1db83a1753d6d1661
Subvendor: 0x0000
Nodes limit: Unlimited
Capabillities: Slm
Software Information
Version number: v2.2.5
Compile-ID: d58b7617c4f553beca4c6f41705b48fa5a4cf56c
Compile-Date: 2016-11-15 11:42:15 +0300
UI
UI version 1.1.0
My last log entrie(s) before the latest crash yesterday:
ZWayVDev_zway_16-1-50-5 is a current sensor of my Aeon Labs Home Energy Meter G2.
I have the same problem. Any news on that?
My configuration:
Hardware
Vendor: RaZberry by Z-Wave.Me
Vendors Product ID: 1024 / 1
Z-Wave Chip: ZW0500
Firmware
Library Type: Static Controller
SDK Version: 6.51.03
Serial API Version: 05.00
Capabillities:
UUID: f7e89785eb05c5d1db83a1753d6d1661
Subvendor: 0x0000
Nodes limit: Unlimited
Capabillities: Slm
Software Information
Version number: v2.2.5
Compile-ID: d58b7617c4f553beca4c6f41705b48fa5a4cf56c
Compile-Date: 2016-11-15 11:42:15 +0300
UI
UI version 1.1.0
My last log entrie(s) before the latest crash yesterday:
Code: Select all
[2017-01-03 06:48:54.682] [D] [zway] RECEIVED: ( 01 18 00 04 00 10 12 60 0D 01 00 32 02 A1 6C 00 00 03 E4 00 00 00 00 00 00 97 )
[2017-01-03 06:48:54.682] [D] [zway] SENT ACK
[2017-01-03 06:48:54.682] [D] [zway] SETDATA controller.data.incomingPacket.nodeId = 16 (0x00000010)
[2017-01-03 06:48:54.682] [D] [zway] SETDATA controller.data.incomingPacket.frameType = "singlecast"
[2017-01-03 06:48:54.683] [D] [zway] SETDATA controller.data.incomingPacket = **********
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.data.lastReceived = 0 (0x00000000)
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5.val = 0.996000
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5.delta = 0 (0x00000000)
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5.ratetype = 1 (0x00000001)
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5.previous = 0.000000
[2017-01-03 06:48:54.683] [D] [zway] SETDATA devices.16.instances.1.commandClasses.50.data.5 = Empty
[2017-01-03 06:48:54.695] [I] [core] HK: updated ZWayVDev_zway_16-1-50-5
Re: How to debug system failure
I have the same issue. I've noticed that the log seems to be filled with "Too many open files" errors:
Code: Select all
[2017-01-05 11:10:08.270] [I] [core] Error: can not write back config to storage: Error: Too many open files
[2017-01-05 11:10:08.318] [I] [core] [BaseModule-44] Set lastLevel to 1.2 for ZWayVDev_zway_7-1-50-2 (was 1.1)
[2017-01-05 11:10:08.334] [I] [core] Error: can not write back config to storage: Error: Too many open files
[2017-01-05 11:10:08.351] [I] [core] Error: can not write back config to storage: Error: Too many open files
[2017-01-05 11:15:00.463] [I] [core] --- Code_Device_sensorMultilevel_30 performCommand processing: {"0":"update"}
[2017-01-05 11:15:00.470] [E] [core] Callback execution error: TypeError: Cannot read property 'vd_err_virtual_dev' of null
at _.extend.performCommand (automation/classes/VirtualDevice.js:294:66)
at automation/modules/SensorsPolling/index.js:82:21
at Array.forEach (native)
at AutomationController.onPoll (automation/modules/SensorsPolling/index.js:81:29)
at AutomationController.EventEmitter.emit (automation/lib/eventemitter2.js:317:33)
at automation/modules/Cron/index.js:133:42
at Array.forEach (native)
at automation/modules/Cron/index.js:121:27
at Array.forEach (native)
at Function.<anonymous> (automation/modules/Cron/index.js:118:37)
[2017-01-05 11:17:06.891] [E] [core] Callback execution error: Error: Too many open files
at Error (native)
at CodeDevice.eval (eval at <anonymous> (automation/modules/CodeDevice/index.js:125:29), <anonymous>:1:1)
at CodeDevice.update (automation/modules/CodeDevice/index.js:125:24)
at Function.<anonymous> (automation/modules/CodeDevice/index.js:101:18)
[2017-01-05 11:17:06.966] [E] [core] Callback execution error: Error: Too many open files
at Error (native)
at CodeDevice.eval (eval at <anonymous> (automation/modules/CodeDevice/index.js:125:29), <anonymous>:1:1)
at CodeDevice.update (automation/modules/CodeDevice/index.js:125:24)
at Function.<anonymous> (automation/modules/CodeDevice/index.js:101:18)
[2017-01-05 11:17:07.141] [E] [core] Callback execution error: Error: Too many open files
at Error (native)
at CodeDevice.eval (eval at <anonymous> (automation/modules/CodeDevice/index.js:125:29), <anonymous>:1:1)
at CodeDevice.update (automation/modules/CodeDevice/index.js:125:24)
at Function.<anonymous> (automation/modules/CodeDevice/index.js:101:18)
[2017-01-05 11:17:07.144] [E] [core] Callback execution error: Error: Too many open files
at Error (native)
at CodeDevice.eval (eval at <anonymous> (automation/modules/CodeDevice/index.js:125:29), <anonymous>:1:1)
at CodeDevice.update (automation/modules/CodeDevice/index.js:125:24)
at Function.<anonymous> (automation/modules/CodeDevice/index.js:101:18)
[2017-01-05 11:20:00.815] [I] [core] --- Code_Device_sensorMultilevel_30 performCommand processing: {"0":"update"}
[2017-01-05 11:20:00.821] [E] [core] Callback execution error: TypeError: Cannot read property 'vd_err_virtual_dev' of null
at _.extend.performCommand (automation/classes/VirtualDevice.js:294:66)
at automation/modules/SensorsPolling/index.js:82:21
at Array.forEach (native)
at AutomationController.onPoll (automation/modules/SensorsPolling/index.js:81:29)
at AutomationController.EventEmitter.emit (automation/lib/eventemitter2.js:317:33)
at automation/modules/Cron/index.js:133:42
at Array.forEach (native)
at automation/modules/Cron/index.js:121:27
at Array.forEach (native)
at Function.<anonymous> (automation/modules/Cron/index.js:118:37)
Re: How to debug system failure
And after checking lsof after the system once again stopped working, I found this line repeating ~65000 times:
Code: Select all
z-way-ser 26801 root 1026u IPv4 1063147 0t0 TCP localhost:8083->localhost:39902 (ESTABLISHED)