Since installing 2.0.1-rc18, my z-way-server becomes unresponsive after some time. I use the sensor values logging module to log changes from my energy meter to a REST web service. At some point these updates stop, and from then on the web UI (http://host:8083/) no longer responds.
The z-way-server log shows that Z-Wave traffic is still being handled: I see regular Z-Wave updates and activity in it. Nothing in the log indicates an error or any problem around the time the data stopped updating.
If I try to restart the service with sudo service z-way-server restart, it reports that z-way-server was shut down and restarted, but it wasn't: the same process (same PID) is running before and after the command.
A plain sudo kill (which sends SIGTERM) on the z-way-server process doesn't make it exit either. Only kill -9 (SIGKILL, which a process cannot catch or ignore) terminates it; after that, the service starts normally and shows up with a new PID. The web server then works again and everything seems fine, at least for a while.
Any tips on diagnosing this or what I should be looking for to troubleshoot this?
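One way to pin down exactly when the hang starts is to poll the web UI and log every change in its state, then line those timestamps up with z-way-server.log. A minimal sketch, assuming Python 3 and the http://host:8083/ address from the post ("host" is a placeholder); the 60-second interval and 10-second timeout are arbitrary choices, not z-way settings:

```python
import http.client
import time
import urllib.error
import urllib.request
from datetime import datetime

# Assumption: the UI address from the post; replace "host" with your Pi's name or IP.
UI_URL = "http://host:8083/"


def is_responsive(url: str, timeout: float = 10.0) -> bool:
    """Return True if the server at `url` answers an HTTP request within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An error status (404, 500, ...) still means the server answered.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, name lookup failure, timeout, or a broken response.
        return False


def watch(url: str = UI_URL, interval: float = 60.0, timeout: float = 10.0) -> None:
    """Poll `url` forever and print a timestamped line whenever its state changes."""
    last_state = None
    while True:
        ok = is_responsive(url, timeout)
        if ok != last_state:
            label = "responding" if ok else "NOT responding"
            print(f"{datetime.now().isoformat(timespec='seconds')} {url} {label}", flush=True)
            last_state = ok
        time.sleep(interval)
```

Call watch() from a long-running Python process (for example on another machine on the LAN) and compare the first "NOT responding" line with the z-way-server.log entries from the same minute.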
Diagnosing crash of z-way-server
Re: Diagnosing crash of z-way-server
I should add that this has happened multiple times and seems to occur after about 2 days of uptime.
Re: Diagnosing crash of z-way-server
This keeps happening to me with just about every rc release of z-way (currently 2.0.1-rc25). When it happens, z-way-server uses 97-99% of the CPU, so something is "stuck". It still seems to process Z-Wave traffic just fine: z-way-server.log shows nothing abnormal, and I see the incoming updates.
The web server doesn't respond on port 8083, but port 8084 works fine. I'm not sure how to diagnose this, but here is the backtrace from all threads (gdb> thread apply all backtrace).
Thread 7 (LWP 5092) is the one using all the CPU.
I hope this provides some helpful information, as this makes z-way unstable and really unusable. Thanks.
Code:
(gdb) thread apply all backtrace
Thread 8 (Thread 0xb5a6f460 (LWP 5091)):
#0 0xb6933770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1 0xb6d28fa4 in v8::internal::Semaphore::Wait() () from /opt/z-way-server/libs/libv8.so
#2 0xb6d9ba80 in v8::internal::SweeperThread::Run() () from /opt/z-way-server/libs/libv8.so
#3 0xb6e40f5c in v8::internal::ThreadEntry(void*) () from /opt/z-way-server/libs/libv8.so
#4 0xb692cc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#5 0xb6418348 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#6 0xb6418348 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 7 (Thread 0xb5a5f460 (LWP 5092)):
#0 0xb657bb28 in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#1 0xb6282bc0 in ?? ()
#2 0xb6282bc0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 6 (Thread 0xb525f460 (LWP 5093)):
#0 0xb69342c8 in __lll_lock_wait () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1 0xb692ed1c in pthread_mutex_lock () from /lib/arm-linux-gnueabihf/libpthread.so.0
#2 0xb6dbb8ac in v8::internal::ThreadManager::Lock() () from /opt/z-way-server/libs/libv8.so
#3 0xb6dbbe48 in v8::Locker::Initialize(v8::Isolate*) () from /opt/z-way-server/libs/libv8.so
#4 0xb5a7d58c in ?? ()
#5 0xb5a7d58c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 5 (Thread 0xb4a5f460 (LWP 5095)):
#0 0xb641199c in select () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0xb6425ad8 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#2 0xb6f5d318 in zio_read () from /opt/z-way-server/libs/libzcommons.so
#3 0xb61aca28 in ?? ()
#4 0xb61aca28 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 4 (Thread 0xb425f460 (LWP 5099)):
#0 0xb63e6700 in nanosleep () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0xb6425ad8 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#2 0xb6412280 in usleep () from /lib/arm-linux-gnueabihf/libc.so.6
#3 0xb6f387c0 in zwjs::Timers::TimersFunc(ZRefCountedPointer<zwjs::Thread>) () from /opt/z-way-server/libs/libzwayjs.so
#4 0xb6f369ac in zwjs::Thread::ThreadFuncNative(void*) () from /opt/z-way-server/libs/libzwayjs.so
#5 0xb692cc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#6 0xb6418348 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#7 0xb6418348 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 3 (Thread 0xb38ff460 (LWP 5100)):
#0 0xb6934840 in read () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1 0xb69340b4 in __pthread_enable_asynccancel () from /lib/arm-linux-gnueabihf/libpthread.so.0
#2 0xb6258900 in ?? () from /usr/lib/arm-linux-gnueabihf/libdns_sd.so.1
#3 0xb6258900 in ?? () from /usr/lib/arm-linux-gnueabihf/libdns_sd.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (Thread 0xb30ff460 (LWP 5101)):
#0 0xb641199c in select () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0xb6425ad8 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#2 0xb628df38 in ?? ()
#3 0xb628df38 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 0xb6f91000 (LWP 5083)):
#0 0xb63e6700 in nanosleep () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0xb6425ad8 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#2 0xb63e64d0 in sleep () from /lib/arm-linux-gnueabihf/libc.so.6
#3 0x0000ae74 in main ()
(gdb)
Code:
Thread 7 (Thread 0xb5a5f460 (LWP 5092)):
#0 0xb657bab4 in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#1 0xb6282bc0 in ?? ()
#2 0xb6282bc0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
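The LWP numbers gdb prints are Linux kernel thread IDs, so the CPU use of each thread can be checked directly, either with top -H -p <pid> or by reading /proc/<pid>/task/<tid>/stat. A minimal sketch of the latter, assuming Linux and Python 3.9+: it samples each thread's utime/stime counters twice and reports its share of one CPU over the interval, so LWP 5092 (Thread 7) should stand out if it is the thread that is spinning:

```python
import os
import time


def parse_stat(line: str) -> tuple[str, int, int]:
    """Return (comm, utime, stime) from a /proc/<pid>/task/<tid>/stat line.

    The comm field sits in parentheses and may contain spaces, so the
    remaining fields are split only after the last ')'.
    """
    comm = line[line.index("(") + 1:line.rindex(")")]
    rest = line[line.rindex(")") + 1:].split()  # rest[0] is field 3 (state)
    utime = int(rest[11])  # field 14: clock ticks spent in user mode
    stime = int(rest[12])  # field 15: clock ticks spent in kernel mode
    return comm, utime, stime


def thread_cpu_ticks(pid: int) -> dict[int, tuple[str, int]]:
    """Map each thread ID (the LWP number gdb prints) to (comm, utime + stime)."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                comm, utime, stime = parse_stat(f.read())
        except FileNotFoundError:
            continue  # the thread exited between listdir() and open()
        ticks[int(tid)] = (comm, utime + stime)
    return ticks


def busiest_threads(pid: int, interval: float = 5.0) -> list[tuple[float, int, str]]:
    """Sample twice, `interval` seconds apart; return (percent of one CPU, tid, comm), busiest first."""
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second used by /proc
    rows = []
    for tid, (comm, total) in after.items():
        prev = before.get(tid, (comm, total))[1]  # threads started mid-sample count as 0
        rows.append((100.0 * (total - prev) / hz / interval, tid, comm))
    return sorted(rows, reverse=True)
```

For the process in the backtrace above, the PID equals the main thread's LWP (5083), so busiest_threads(5083) would list the per-thread CPU shares for that run.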
Re: Diagnosing crash of z-way-server
I should add that I strongly suspect the sensor values logging module I use is the culprit. On every data change from the HEM (energy monitor), it sends an HTTP GET to another server (on the local network, and running properly):
http://xxxx.com/energy/post.php?id=${id}&value=${value}
When the automation UI becomes unresponsive, the data updates to this server also stop.
Disabling the module isn't a good option, as it is a very valuable z-way feature for me.
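To rule the receiving server in or out, the same request can be reproduced outside z-way. A minimal sketch, not the module's own code: it expands the ${id}/${value} template above (Python's string.Template happens to use the same ${name} syntax), percent-encodes both values, and sends the GET with an explicit timeout. The device ID in the usage line is hypothetical.

```python
import urllib.request
from string import Template
from urllib.parse import quote

# Template copied from the post (the host is redacted there as xxxx.com).
URL_TEMPLATE = "http://xxxx.com/energy/post.php?id=${id}&value=${value}"


def build_url(template: str, device_id: str, value) -> str:
    """Fill ${id} and ${value}, percent-encoding both so spaces or '&' cannot break the query."""
    return Template(template).substitute(
        id=quote(str(device_id), safe=""),
        value=quote(str(value), safe=""),
    )


def post_reading(device_id: str, value, timeout: float = 5.0) -> int:
    """Send the GET and return the HTTP status code.

    The timeout applies to each socket operation, so a server that stops
    answering raises an error instead of hanging the caller.
    """
    url = build_url(URL_TEMPLATE, device_id, value)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status
```

For example, post_reading("ZWayVDev_zway_2-0-50-0", 1234.5) with a hypothetical device ID; if this always returns promptly while z-way is hung, the problem is more likely inside z-way than on the receiving server.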
Re: Diagnosing crash of z-way-server
Looks like a hang in the JS code.
Did it happen with previous releases? We need to review recent changes in Home Automation.
Re: Diagnosing crash of z-way-server
It has been happening for some time (at least since rc20), and I would guess further back. The data is logged to a database, so I can see when large gaps started to appear. The first was on Jan 30, 2015, but I only started using the sensor values logging module two days earlier. I would guess the system has frozen 35 times since then.
So basically it has always happened for me, and I have been regularly installing the 2.0.1 rc releases within a few days of their release.
Let me know if there is anything I can do/provide to assist in diagnosing.
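Because every reading lands in the database, each freeze shows up as a gap between consecutive timestamps, and the time from one restart to the next freeze approximates how long each run lasted. A minimal sketch, assuming the timestamps have been exported from the database as a sorted list of datetime values; the 30-minute threshold is an arbitrary assumption and should be larger than the meter's normal reporting interval:

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(minutes=30)):
    """Return (last_before, first_after) pairs where consecutive readings
    are more than `threshold` apart. `timestamps` must be sorted ascending."""
    return [
        (prev, cur)
        for prev, cur in zip(timestamps, timestamps[1:])
        if cur - prev > threshold
    ]


def uptimes_between_gaps(gaps):
    """Time from the end of each gap (service back up) to the start of the
    next gap (next freeze), i.e. roughly how long each run lasted."""
    return [nxt[0] - cur[1] for cur, nxt in zip(gaps, gaps[1:])]
```

If the freezes really follow about 2 days of uptime, most values from uptimes_between_gaps() should cluster around timedelta(days=2).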