These inconsistent z-way-server crashes are an issue we have known about for months, but unfortunately they are hard to reproduce and to debug. I know that's no excuse for your circumstances.
We hoped the latest stable release, v2.3.1, would solve most of them, but that seems not to be the case ...
These inconsistencies also do not appear on every installation. Most installations run without problems.
At the moment we suspect the access to sockets. That is, the registered sockets get lost or stuck after some time, which leads to errors and finally causes the OS to kill the z-way-server process ...
Unfortunately, in most cases z-way-server.log gives us no hint as to why the server crashed, only what happened in z-way beforehand. With the help of the gdb debugger it is possible to see the error, though not its cause (that's what we need to find out).
Here is a short how-to:
It is best to do all of this within one screen session.
- start an SSH connection to your box
- start a screen session (you may have to install screen first, which you can do with: "$ sudo apt-get install screen")
Code: Select all
$ screen
- change the current user to the 'root' user - this avoids erroneous notifications while the server is running
Code: Select all
$ sudo su
- and then start the gdb procedure ... (see below)
You can detach from the session with Ctrl+A followed by D. When you reconnect through SSH, "$ screen -r" brings you back to the point where you left off.
Alternatively, you can keep that window open. To end the session completely when you are done:
Code: Select all
$ exit
Here is more about that: https://wiki.ubuntuusers.de/Screen/
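The screen steps above could be wrapped in a small helper, shown here only as a sketch. The session name "zway-gdb" and the function names are made up for this example; the screen options used (-S, -ls, -d -r) and the STY variable that screen sets inside a session are standard.

```shell
# Hypothetical session name, chosen only for this example.
SESSION="zway-gdb"

# screen sets the STY variable inside a session, so a non-empty
# value means this shell is already running under screen.
inside_screen() {
  [ -n "${STY:-}" ]
}

# Re-attach to the debug session if it exists, otherwise create it.
debug_session() {
  if inside_screen; then
    echo "already inside screen session $STY"
  elif screen -ls | grep -q "[.]$SESSION[[:space:]]"; then
    # -d detaches the session elsewhere first, -r re-attaches it here
    screen -d -r "$SESSION"
  else
    screen -S "$SESSION"
  fi
}
```

Calling "$ debug_session" after each SSH login should then either create the session or bring you back into it.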
GDB:
- start ssh connection to your box
- stop current z-way-server:
Code: Select all
$ sudo /etc/init.d/z-way-server stop
- switch to the z-way-server directory:
Code: Select all
$ cd /opt/z-way-server
- start z-way-server with the gdb debugger (gdb should already be installed):
Code: Select all
$ LD_LIBRARY_PATH=./libs gdb ./z-way-server
- at the first (gdb) prompt, type ‘r’ (run) to start
- confirm once with ‘c’ (continue)
- z-way-server is starting… If it stops later, gdb prints a message like one of the following:
Code: Select all
Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x743ff450 (LWP 22061)]
0x769882f4 in send () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb)
Code: Select all
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x71fff450 (LWP 21411)]
0x75728588 in zwjs::SocketConnection::IsConfigured() const ()
from ./modules/modsockets.so
(gdb)
Code: Select all
Program received signal SIGHUP, Hangup.
0x76403360 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb)
If such a message occurs, please enter 'info thread' and 'bt' to get more information about the error, e.g.:
Code: Select all
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x71fff450 (LWP 21411)]
0x75728588 in zwjs::SocketConnection::IsConfigured() const ()
from ./modules/modsockets.so
(gdb) info thread
Id Target Id Frame
11 Thread 0x71eff450 (LWP 20033) "zway/sockets" 0x7642e964 in select ()
at ../sysdeps/unix/syscall-template.S:81
10 Thread 0x726ff450 (LWP 20032) "zway/timers" 0x76403360 in nanosleep ()
at ../sysdeps/unix/syscall-template.S:81
9 Thread 0x738ff450 (LWP 20031) "zway/core" 0x7642e964 in select ()
at ../sysdeps/unix/syscall-template.S:81
8 Thread 0x742ff450 (LWP 20030) "zway/webserver" 0x7642e964 in select ()
at ../sysdeps/unix/syscall-template.S:81
7 Thread 0x74e7f450 (LWP 20029) "zway/core" 0x76403360 in nanosleep ()
at ../sysdeps/unix/syscall-template.S:81
6 Thread 0x74e8f450 (LWP 20028) "v8:SweeperThrea" 0x76986a40 in do_futex_wait (isem=isem@entry=0x64764) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
5 Thread 0x74e9f450 (LWP 20027) "v8:SweeperThrea" 0x76986a40 in do_futex_wait (isem=isem@entry=0x6465c) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
4 Thread 0x74eaf450 (LWP 20026) "v8:SweeperThrea" 0x76986a40 in do_futex_wait (isem=isem@entry=0x64554) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
3 Thread 0x74ebf450 (LWP 20025) "v8:SweeperThrea" 0x76986a40 in do_futex_wait (isem=isem@entry=0x6444c) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
2 Thread 0x756bf450 (LWP 20024) "OptimizingCompi" 0x76986a40 in do_futex_wait (isem=isem@entry=0x64304) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
* 1 Thread 0x7634b000 (LWP 20021) "z-way-server" 0x76403360 in nanosleep ()
at ../sysdeps/unix/syscall-template.S:81
(gdb) bt
#0 0x76403360 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x76403098 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x0000aeb4 in main ()
(gdb)
After this, you can exit the debugger with ‘q’ (quit) and restart the server with:
Code: Select all
$ sudo /etc/init.d/z-way-server restart
Code: Select all
$ sudo killall -9 z-way-server
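If you prefer not to type the gdb commands by hand, the same steps could be scripted roughly like this. It is only a sketch under some assumptions: the function names are invented for this example, "thread apply all bt" is added on top of the commands from the how-to to get a backtrace of every thread, and gdb's batch mode (-batch -x) stops at the first signal it receives (including SIGPIPE or SIGHUP), prints the information and then ends the server.

```shell
# Write the gdb commands from the how-to into a command file ($1).
write_gdb_commands() {
  cat > "$1" <<'EOF'
set pagination off
run
info threads
bt
thread apply all bt
EOF
}

# Run z-way-server under gdb non-interactively and save the output.
# $1: absolute path of the file that receives a copy of the gdb output.
# Assumes the server was stopped before and that you are root.
run_under_gdb() {
  cmds=$(mktemp) || return 1
  write_gdb_commands "$cmds"
  ( cd /opt/z-way-server &&
    LD_LIBRARY_PATH=./libs gdb -batch -x "$cmds" ./z-way-server 2>&1 | tee "$1" )
  rm -f "$cmds"
}
```

For example, "$ run_under_gdb /root/gdb-crash.txt" (the file name is arbitrary) would leave the gdb output in a file that can be attached to a post.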
We also need some information about your system (http://YOUR_BOX_IP:8083/expert/#/network/controller):
- installed z-way version (Software Information > Version Number)
- type of controller (RaZ / RaZ 2 / UZB) + f/w (Firmware > Serial API Version)
- list of your running apps (Sonos, MQTT, Global Caché and Fibaro API are especially interesting because they use sockets)
- is your system plain and dedicated to z-way-server? That is, are there any libs or services running in addition to z-way-server and the common OS installation? If there are, which ones (and do they use sockets)?
- snippet of your z-way-server.log (the 200 lines before the crash)
- OPTIONAL: also backups of your system can help us to reproduce your issues (local backups from Smarthome/Expert UI)
We'll handle them with care and can also send issue news or requests directly to you.
Otherwise, feel free to support us with information, statistics and debugging. Of course we'll share our new findings with you in this thread.
Hopefully all of this will bring us a big step closer to solving this issue asap.
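To collect the log snippet and the list of socket-using services asked for above, something like the following may help. It is a sketch with assumptions: the log path /var/log/z-way-server.log is a guess at the default location and may differ on your box, the function names are invented for this example, and the tail only covers the crash if you take it right after the crash and before restarting the server.

```shell
# Print the last N lines (default 200) of the z-way-server log.
# $1: log file (assumed default: /var/log/z-way-server.log)
# $2: number of lines
log_snippet() {
  tail -n "${2:-200}" "${1:-/var/log/z-way-server.log}"
}

# List processes with listening TCP/UDP sockets, using whichever
# of the two common tools is installed (run as root to see names).
list_socket_users() {
  if command -v ss >/dev/null 2>&1; then
    ss -tulpn
  elif command -v netstat >/dev/null 2>&1; then
    netstat -tulpn
  else
    echo "neither ss nor netstat found" >&2
    return 1
  fi
}
```

For example: "$ log_snippet > zway-log-snippet.txt" and "$ list_socket_users > socket-users.txt".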

PS:
It seems that the attached screenshots are a bit mixed up ...
- gdb_r.png
- gdb_c1.png
- gdb_running.png