ZWay crashing daily

ronie
Posts: 58
Joined: 29 Jun 2015 09:50

Re: ZWay crashing daily

Post by ronie »

Hi all,

these intermittent z-way-server crashes have been known to us for months, but unfortunately they are hard to reproduce and to debug. I know that's no excuse given your circumstances.

We hoped the latest stable release, v2.3.1, would solve most of them, but that doesn't seem to be the case ...
These inconsistencies also don't appear on every installation. Most installations run without problems.

At the moment we suspect the access to sockets: the registered sockets seem to get lost or stuck after some time, which leads to errors and finally lets the OS kill the z-way-server process ...
Unfortunately, in most cases z-way-server.log gives us no hint as to why it crashes, only what happened in Z-Way beforehand. With the help of the gdb debugger it's possible to check for the error, but not the cause (that's what we need to find out).

Here is a short How To:

It is best to do this within a screen session.
  1. open an SSH connection to your box
  2. start a screen session (you may have to install screen first; you can do this with: "$ sudo apt-get install screen")

    Code: Select all

    $ screen
  3. switch to the 'root' user - this avoids erroneous notifications while the server is running

    Code: Select all

    $ sudo su
  4. and then start the gdb procedure ... (see below)
The advantage of this is that the session stays active in the background.
You can leave the session running by detaching from it (press Ctrl+A, then D). When you reconnect through SSH, "$ screen -r" reattaches you at the point where you left off.
Alternatively, you can keep that window open.

Code: Select all

$ exit 
entered inside the screen session, ends it (if you switched to root, the first exit only leaves the root shell).

Here is more about that: https://wiki.ubuntuusers.de/Screen/

GDB:
  • start ssh connection to your box
  • stop current z-way-server:

    Code: Select all

    $ sudo /etc/init.d/z-way-server stop
  • switch to the z-way-server directory:

    Code: Select all

    $ cd /opt/z-way-server
  • start z-way-server with the gdb debugger (gdb should already be installed):

    Code: Select all

    $ LD_LIBRARY_PATH=./libs gdb ./z-way-server
  • at the first (gdb) prompt, type ‘r’ (run) to start
    continue first break
    gdb_c1.png (10.12 KiB)
  • confirm once with ‘c’ (continue)
    start gdb
    gdb_r.png (26.02 KiB)
  • z-way-server is starting…
    The attachment gdb_running.png is no longer available
When the server crashes, a message similar to one of the following appears:

Code: Select all

Program received signal SIGPIPE, Broken pipe. 
[Switching to Thread 0x743ff450 (LWP 22061)] 
0x769882f4 in send () at ../sysdeps/unix/syscall-template.S:81 
81 ../sysdeps/unix/syscall-template.S: No such file or directory. 
(gdb)

Code: Select all

Program received signal SIGSEGV, Segmentation fault. 
[Switching to Thread 0x71fff450 (LWP 21411)] 
0x75728588 in zwjs::SocketConnection::IsConfigured() const () 
from ./modules/modsockets.so 
(gdb)

Code: Select all

Program received signal SIGHUP, Hangup. 
0x76403360 in nanosleep () at ../sysdeps/unix/syscall-template.S:81 
81 ../sysdeps/unix/syscall-template.S: No such file or directory. 
(gdb)
If you see other ones, please let us know.


If such a message appears, please enter 'info thread' and 'bt' to get more information about the error, e.g.:

Code: Select all

Program received signal SIGSEGV, Segmentation fault. 
[Switching to Thread 0x71fff450 (LWP 21411)] 
0x75728588 in zwjs::SocketConnection::IsConfigured() const () 
from ./modules/modsockets.so 
(gdb) info thread 
Id Target Id Frame 
11 Thread 0x71eff450 (LWP 20033) "zway/sockets" 0x7642e964 in select () 
at ../sysdeps/unix/syscall-template.S:81 
10 Thread 0x726ff450 (LWP 20032) "zway/timers" 0x76403360 in nanosleep () 
at ../sysdeps/unix/syscall-template.S:81 
9 Thread 0x738ff450 (LWP 20031) "zway/core" 0x7642e964 in select () 
at ../sysdeps/unix/syscall-template.S:81 
8 Thread 0x742ff450 (LWP 20030) "zway/webserver" 0x7642e964 in select () 
at ../sysdeps/unix/syscall-template.S:81 
7 Thread 0x74e7f450 (LWP 20029) "zway/core" 0x76403360 in nanosleep () 
at ../sysdeps/unix/syscall-template.S:81 
6 Thread 0x74e8f450 (LWP 20028) "v8:SweeperThrea" 0x76986a40 in do_futex_wait (isem=isem@entry=0x64764) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 
5 Thread 0x74e9f450 (LWP 20027) "v8:SweeperThrea" 0x76986a40 in do_futex_wait (isem=isem@entry=0x6465c) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 
4 Thread 0x74eaf450 (LWP 20026) "v8:SweeperThrea" 0x76986a40 in do_futex_wait (isem=isem@entry=0x64554) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 
3 Thread 0x74ebf450 (LWP 20025) "v8:SweeperThrea" 0x76986a40 in do_futex_wait (isem=isem@entry=0x6444c) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 
2 Thread 0x756bf450 (LWP 20024) "OptimizingCompi" 0x76986a40 in do_futex_wait (isem=isem@entry=0x64304) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 
* 1 Thread 0x7634b000 (LWP 20021) "z-way-server" 0x76403360 in nanosleep () 
at ../sysdeps/unix/syscall-template.S:81 
(gdb) bt 
#0 0x76403360 in nanosleep () at ../sysdeps/unix/syscall-template.S:81 
#1 0x76403098 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137 
#2 0x0000aeb4 in main () 
(gdb)
This tells us what happened and which error caused the crash, but unfortunately not the reason(s) that led to it ...

After this, you can exit the debugger with ‘q’ (quit) and restart the server with:

Code: Select all

$ sudo /etc/init.d/z-way-server restart
If the port 8083 is still occupied, then please stop the Z-Way-Server manually with:

Code: Select all

$ sudo killall -9 z-way-server
Please collect these stack traces over some time (maybe 3-4 days) and report them to us.
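
To keep the traces from several crashes together, something like the following could help. It is only a sketch: the function name save_trace and the file ~/zway-traces.txt are made up for illustration and are not part of Z-Way or gdb. It appends whatever you paste from the gdb window to one file and puts a timestamp in front of each entry.

```shell
#!/bin/sh
# Sketch: append each pasted gdb trace to one file, with a timestamp header.
# TRACE_FILE is a hypothetical location; adjust it as needed.
TRACE_FILE="${TRACE_FILE:-$HOME/zway-traces.txt}"

save_trace() {
    # Optional first argument overrides the target file.
    target="${1:-$TRACE_FILE}"
    {
        printf '=== crash recorded %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        cat            # reads the pasted gdb output from stdin
        printf '\n'
    } >> "$target"
}
```

Call save_trace in a second shell, paste the gdb output (signal message, 'info thread' and 'bt'), and finish with Ctrl+D. After a few days the file holds all traces in order.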

We also need information about your system (http://YOUR_BOX_IP:8083/expert/#/network/controller):
  • installed z-way version (Software Information > Version Number)
  • type of controller (RaZ / RaZ 2 / UZB) + f/w (Firmware > Serial API Version)
  • list of your running apps (especially Sonos, MQTT, Global Caché, Fibaro API are interesting because they're using sockets)
  • is your system plain and dedicated to z-way-server? That is, are no other libs or services running in addition to z-way-server and the standard OS installation? If there are, which ones (and are they using sockets)?
  • snippet of your z-way-server.log (the 200 lines before the crash)
  • OPTIONAL: also backups of your system can help us to reproduce your issues (local backups from Smarthome/Expert UI)
If you don't want to share your private data with all other users, please send it bundled to support@zwaveeurope.com
We'll handle it with care, and we can then also send news about the issue or further requests directly to you.
Otherwise feel free to support us with information, statistics and debugging. Of course we'll share our new findings with you in this thread.
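
For the log snippet requested above, a small helper along these lines may save some copy and paste. This is a sketch under assumptions: the default path /var/log/z-way-server.log is a guess (the list above only names z-way-server.log), and save_log_tail is a hypothetical name.

```shell
#!/bin/sh
# Sketch: copy the last N lines of the server log into a separate file.
# The default log path is an assumption; pass the real one if it differs.
ZWAY_LOG="${ZWAY_LOG:-/var/log/z-way-server.log}"

save_log_tail() {
    # usage: save_log_tail OUTPUT_FILE [LINES] [LOG_FILE]
    out="$1"
    lines="${2:-200}"
    log="${3:-$ZWAY_LOG}"
    tail -n "$lines" "$log" > "$out"
}
```

Run it right after a crash and before restarting the server, e.g. "$ save_log_tail ~/crash-snippet.log", because the restarted server keeps writing to the same log and pushes the interesting lines out of the last 200.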

Hopefully all of this will bring us a big step forward in solving this issue asap :)

PS:
It seems that the attached screenshots are a bit mixed up ...
  • gdb_r.png
  • gdb_c1.png
  • gdb_running.png
is the correct order.
Attachments
gdb running
gdb_running.png (36.65 KiB)
stellavision
Posts: 10
Joined: 25 Feb 2013 21:54

Re: ZWay crashing daily

Post by stellavision »

Backed up everything, reinstalled from scratch, restored configuration and everything back to normal.
10neWulf
Posts: 41
Joined: 21 Nov 2016 14:16
Location: Australia

Re: ZWay crashing daily

Post by 10neWulf »

@ronie I've started the process, and will let you know when it crashes.

The sockets issue could be why this has started happening now, as I have more devices polling the API?

Anyway, it's crashing roughly twice per day now, so I'll let you know after a few crashes.
10neWulf
Posts: 41
Joined: 21 Nov 2016 14:16
Location: Australia

Re: ZWay crashing daily

Post by 10neWulf »

My Server just crashed again, with the following message:

*** glibc detected *** /opt/z-way-server/z-way-server: free(): invalid next size (fast): 0x737d3ab0 ***

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x6c7ff460 (LWP 25850)]
0x763cf8dc in raise () from /lib/arm-linux-gnueabihf/libc.so.6

I've sent an email to the email address above with all the information - hopefully this helps :)
Benny
Posts: 48
Joined: 25 Jan 2017 15:50

Re: ZWay crashing daily

Post by Benny »

After my system broke, I did a fresh installation on my Pi 3 with a RaZ 2 and 80 devices. Newest Jessie, Z-Way v2.3.1. Low level of automation, just a few lamps. It worked fine for two weeks. Then I got two issues.

First one:
There was some kind of routing problem, which happened twice. The first time lamp 1 should have turned on, but lamp 2 did; the second time lamp 5 should have turned on, but blind 1 went down. I haven't seen this issue again.

The second:
After the two weeks Z-Way stopped working/logging. No automated lights, but I was able to log in. The log was empty, no temperature or anything else was transmitted, and no app was working after the "crash". I just had to switch a device in the Smart Home UI to bring it back to life. A restart of Z-Way wasn't necessary. This has happened three times in the last 4 weeks. Every time it's the same procedure: I just have to log in and switch the state of one device to get Z-Way working again.
pimth
Posts: 48
Joined: 09 Jul 2016 18:02

Re: ZWay crashing daily

Post by pimth »

I am testing 2.3.4 ....
pimth
Posts: 48
Joined: 09 Jul 2016 18:02

Re: ZWay crashing daily

Post by pimth »

not better!
ronie
Posts: 58
Joined: 29 Jun 2015 09:50

Re: ZWay crashing daily

Post by ronie »

Please check out this GitHub post:

https://github.com/Z-Wave-Me/z-way-issues/issues/125

It seems that the problem could be related to a very big notification.json.
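
To see whether this applies to your box, you can look at the size of the file first. A rough sketch, with assumptions: the 5 MB limit is an arbitrary value picked for illustration (not a known Z-Way threshold), and check_notifications is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: warn when a notifications file in the storage folder exceeds a limit.
STORAGE_DIR="${STORAGE_DIR:-/opt/z-way-server/automation/storage}"
LIMIT_BYTES="${LIMIT_BYTES:-5000000}"   # arbitrary threshold, not a Z-Way value

check_notifications() {
    # usage: check_notifications [DIR] [LIMIT_BYTES]
    # Returns 0 if all files are within the limit, 1 otherwise.
    dir="${1:-$STORAGE_DIR}"
    limit="${2:-$LIMIT_BYTES}"
    oversized=0
    for f in "$dir"/notifications-*.json; do
        [ -f "$f" ] || continue          # glob matched nothing
        size=$(( $(wc -c < "$f") ))      # arithmetic strips any padding
        if [ "$size" -gt "$limit" ]; then
            echo "WARNING: $f is $size bytes (limit $limit)"
            oversized=1
        fi
    done
    return "$oversized"
}
```

If it prints a warning, the workaround is worth trying.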

As a workaround you can try one of the following options:

stop z-way-server

Code: Select all

$ sudo /etc/init.d/z-way-server stop
a.) remove notification.json from automation/storage:

Code: Select all

$ sudo rm /opt/z-way-server/automation/storage/notifications-f37bd2f66651e7d46f6d38440f2bc5dd.json
b.) overwrite notification.json from automation/storage with []

Code: Select all

$ sudo sh -c 'echo "[]" > /opt/z-way-server/automation/storage/notifications-f37bd2f66651e7d46f6d38440f2bc5dd.json'
c.) or reduce its content with an editor to the last entries and save it

and finally restart the server:

Code: Select all

$ sudo /etc/init.d/z-way-server start
Keep in mind that if you clear it, all notifications of the last week are gone.
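
If you would rather not lose the old notifications completely, option b.) can be combined with a safety copy. The following is just a sketch: the backup location ~/zway-notifications-backup and the function name reset_notifications are assumptions, and the pattern notifications-*.json is used instead of the full file name so the hash does not have to be typed.

```shell
#!/bin/sh
# Sketch of option b.) with a safety copy. Run it while z-way-server is stopped.
STORAGE_DIR="${STORAGE_DIR:-/opt/z-way-server/automation/storage}"
BACKUP_DIR="${BACKUP_DIR:-$HOME/zway-notifications-backup}"   # hypothetical

reset_notifications() {
    # usage: reset_notifications [STORAGE_DIR] [BACKUP_DIR]
    dir="${1:-$STORAGE_DIR}"
    backup="${2:-$BACKUP_DIR}"
    mkdir -p "$backup" || return 1
    for f in "$dir"/notifications-*.json; do
        [ -f "$f" ] || continue              # glob matched nothing
        cp -p "$f" "$backup/" || return 1    # keep the old content outside storage
        echo "[]" > "$f"                     # same content as option b.)
    done
}
```

The files in the storage folder belong to root, so the function has to run as root (for example from a root shell after "$ sudo su"). The copy is placed outside automation/storage on purpose, so the server does not see an extra file there.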

Meanwhile we'll refactor the notification handling and reduce its RAM usage.

The 24 Hours Device History app could also be a cause in a Z-Wave household with a lot of devices. Stopping the app is worth a try. In the near future we'll enhance this app, too.

If you still have problems, please have a look at the debugging steps on page 2. Collect your issues and send them to our support ( support@zwaveeurope.com ), or post them here in the thread or on GitHub.
CudaNet
Posts: 58
Joined: 10 Jul 2016 22:32

Re: ZWay crashing daily

Post by CudaNet »

40+ devices.. Rebooted yesterday.

uptime: 17:25:04 up 1 day, 7:49, 0 users, load average: 0.54, 0.27, 0.15

/opt/z-way-server/automation/storage

total 47704
-rw-r--r-- 1 root root 9374 Mar 29 08:20 admin1490793654441gif-a04e296a4b28afbaa1a336a7ba3dd07a.json
-rw-r--r-- 1 root root 54745 Apr 24 17:22 configjson-06b2d3b23dce96e1619d2b53d6c947ec.json
-rw-r--r-- 1 root root 149 Mar 29 20:48 expertconfigjson-0ef43e77bc4a34ec19a6c355a525b65c.json
-rw-r--r-- 1 root root 224458 Apr 24 17:20 incomingPacketjson-5fa134bd40ea2f6f328252a67a68d93d.json
-rw-r--r-- 1 root root 13 Mar 29 08:03 moduleTokensjson-e29993de748adf5cf0e062ce571f1bc1.json
-rw-r--r-- 1 root root 48336410 Apr 24 17:00 notifications-f37bd2f66651e7d46f6d38440f2bc5dd.json
-rw-r--r-- 1 root root 92396 Apr 24 17:20 originPacketsjson-a9224461330689488046261dbbc78a6d.json
-rw-r--r-- 1 root root 103637 Apr 24 17:20 outgoingPacketjson-5ac52a30a37d9ec108b8b540a6ee42cf.json
-rw-r--r-- 1 root root 178 Apr 23 09:35 storageContent-0c40aa7c27d2121efdfa27fff03c9548.json
-rw-r--r-- 1 root root 737 Apr 23 09:26 userSkinsjson-3b22ba526c58b899e53cfc3217141334.json

             total       used       free     shared    buffers     cached
Mem:        947732     375212     572520       6340      43736     151376
ronie
Posts: 58
Joined: 29 Jun 2015 09:50

Re: ZWay crashing daily

Post by ronie »

-rw-r--r-- 1 root root 48336410 Apr 24 17:00 notifications-f37bd2f66651e7d46f6d38440f2bc5dd.json
gosh ... that seems to be really huge ... usually every entry older than one week should be removed daily at 0:00, but I think that if the file is this big, this won't work - too much data. It's possible that this routine crashes z-way-server after some time each day ...
We need to find out why the routine fails.

What happens if you try one of the steps above? Does it work afterwards?

Of course you can make a backup of the whole folder and restore it if necessary.
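
One possible way to do such a backup from the shell is a tar archive. A minimal sketch, assuming the storage folder from the listing above; the archive name and the two function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: archive the whole storage folder and restore it again if needed.
STORAGE_DIR="${STORAGE_DIR:-/opt/z-way-server/automation/storage}"

backup_storage() {
    # usage: backup_storage [SOURCE_DIR] [DEST_DIR]; prints the archive path
    src="${1:-$STORAGE_DIR}"
    dest="${2:-$HOME}"
    archive="$dest/zway-storage-$(date '+%Y%m%d-%H%M%S').tar.gz"
    # -C keeps only the folder name ("storage") inside the archive
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

restore_storage() {
    # usage: restore_storage ARCHIVE [PARENT_DIR]
    # Unpacks the archived folder below PARENT_DIR, overwriting existing files.
    tar -xzf "$1" -C "${2:-$(dirname "$STORAGE_DIR")}"
}
```

Stop z-way-server before both steps so that no file changes while it is being copied, and run them as root because the files belong to root.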