Strange problem

Discussions about Z-Way software and Z-Wave technology in general
hapination
Posts: 13
Joined: 14 Jan 2013 16:35

Strange problem

Post by hapination »

I'm experiencing a really weird issue. This has probably been oing on for a long time but I have been trying to replicate the issue.

Everything works correctly once I start up the z-agent on my linux raspberry.

For about ten minutes all is fine. But then suddenly all by zwave devices go "offline". If I try to update their statuses I get the following log entries:
2013-04-29 21:36:35,401 Z-Wave.INFO Send to device 4, instance 0, switch binary get: Dropping command: too much resends
2013-04-29 21:36:35,394 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout
2013-04-29 21:36:32,078 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout
2013-04-29 21:36:28,774 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout

If I then restart the zagent software again, everything works again for another 10 minutes.

If I execute the zagent with the debug option I don't see anything suspicious in the log file. Also the server log file doesn't reveal anything even if set to verbose or debug level.

I'm now a bit lost as to where to start checking next. I've tried re-installing the whole linux distro from scratch but the same problem still occurs.
hapination
Posts: 13
Joined: 14 Jan 2013 16:35

... and finally the z-wave.me

Post by hapination »

... and finally the z-wave.me website informs that:
Your Z-Connector is not connected to Z-Cloud server

If I restart the z-agent everything works again, for another 10 minutes.

After some time the website status changes to connected, but still no packets are being sent from/to the devices. Also the check for failed node is reporting the node to be alive.
2013-04-30 11:27:39,529 Z-Wave.INFO Send to device 2, instance 0, meter get (v2): Dropping command: too much resends
2013-04-30 11:27:39,526 Z-Wave.INFO Send to device 2, instance 0, meter get (v2): No ACK received before timeout
2013-04-30 11:27:36,222 Z-Wave.INFO Send to device 2, instance 0, meter get (v2): No ACK received before timeout
2013-04-30 11:27:32,912 Z-Wave.INFO Send to device 2, instance 0, meter get (v2): No ACK received before timeout
2013-04-30 11:20:27,037 Z-Wave.INFO Set new SUC/SIS: Failed
2013-04-30 11:20:26,708 Z-Wave.INFO Enabling SIS: Failed: trying to disable running SUC?
2013-04-30 11:20:24,191 Z-Wave.INFO Check if node is failed: Node 2 is operating
User avatar
PoltoS
Posts: 7571
Joined: 26 Jan 2011 19:36

MAy be you have two z-agents

Post by PoltoS »

MAy be you have two z-agents running?
User avatar
PoltoS
Posts: 7571
Joined: 26 Jan 2011 19:36

This is the log from your

Post by PoltoS »

This is the log from your last three connections:
2013-04-30 11:42:55,698 Serial.ERROR Could not read from socket None:0. [Errno 113] No route to host
2013-04-30 11:42:55,700 Serial.ERROR Could not close socket None:0. [Errno 107] Transport endpoint is not connected
...
2013-04-30 12:25:56,330 Serial.ERROR Could not read from socket None:0. [Errno 110] Connection timed out
2013-04-30 12:25:56,341 Serial.ERROR Could not close socket None:0. [Errno 107] Transport endpoint is not connected
...
2013-04-30 13:21:31,864 Serial.ERROR Could not read from socket None:0. [Errno 110] Connection timed out
2013-04-30 13:21:31,869 Serial.ERROR Could not close socket None:0. [Errno 107] Transport endpoint is not connected
...

So, this is certainly a connection problem on your side.
hapination
Posts: 13
Joined: 14 Jan 2013 16:35

I verified and I'm only

Post by hapination »

I verified and I'm only running one instance of zagent.
I even switched to the Windows version and I'm seeing the same issue there.

Here are the latest test results/timestamp. I kept pressing one of the update buttons forone of my device every few minutes. And as can be seen it worked until suddenly it just stopped working.
I then restarted my zagent software and was able to repeat the issue again.

Even now at 22:54, the website is still reporting that I'm connected. Also the z-connector software is just stating that 'Loop started'.

Also I kept surfing the web at the same time so the internet connection was alive.

I can also see that the tcp port is connected:
46.20.244.23:9087 ESTABLISHED


2013-05-08 22:31:15,911 Z-Wave.INFO Send to device 4, instance 0, switch binary get: Dropping command: too much resends
2013-05-08 22:31:15,908 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout
2013-05-08 22:31:12,605 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout
2013-05-08 22:31:09,296 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout
2013-05-08 22:28:04,377 Rules.INFO Event: from 4:0 to 1:0, Report, value: [37, 255]
2013-05-08 22:28:04,175 Z-Wave.INFO Send to device 4, instance 0, switch binary get: Delivered
2013-05-08 22:27:32,388 Rules.INFO Event: from 4:0 to 1:0, Report, value: [37, 255]
2013-05-08 22:27:32,236 Z-Wave.INFO Send to device 4, instance 0, switch binary get: Delivered
2013-05-08 22:25:40,311 Rules.INFO Event: from 4:0 to 1:0, Report, value: [37, 255]
2013-05-08 22:25:40,161 Z-Wave.INFO Send to device 4, instance 0, switch bi


2013-05-08 22:49:09,231 Z-Wave.INFO Send to device 4, instance 0, switch binary get: Dropping command: too much resends
2013-05-08 22:49:09,229 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout
2013-05-08 22:49:05,926 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout
2013-05-08 22:49:02,610 Z-Wave.INFO Send to device 4, instance 0, switch binary get: No ACK received before timeout
2013-05-08 22:43:41,295 Rules.INFO Event: from 4:0 to 1:0, Report, value: [37, 255]
2013-05-08 22:43:41,190 Z-Wave.INFO Send to device 4, instance 0, switch binary get: Delivered
2013-05-08 22:42:52,384 Rules.INFO Event: from 4:0 to 1:0, Report, value: [37, 255]
2013-05-08 22:42:52,228 Z-Wave.INFO Send to device 4, instance 0, switch binary get: Delivered
2013-05-08 22:41:45,846 Rules.INFO Event: from 4:0 to 1:0, Report

I'm wondering what the logs say on your side for these timstamps. Since I have nothing which would point me in any direction.

hapination
Posts: 13
Joined: 14 Jan 2013 16:35

Now for the first time I got

Post by hapination »

Now for the first time I got the socket gone error dialog (using the Windows client). But this came about 15 minutes after the first occurrences of problems in the ui.

Also I had the 'reconnect if connection lost ' but that did not try to re-establish the connection.

My guess is that the root cause of this issue is a network related issue. However the software probably should notice the socket gone error much earlierand re-establish the connection automatically.

In the unix client I did not see the option of automatically reconnect anywhere.
User avatar
PoltoS
Posts: 7571
Joined: 26 Jan 2011 19:36

In unix it will run the

Post by PoltoS »

In unix it will run the script permanently (use Run_Z-Agent.sh).

It might not see that the socket was lost untill it tries to send something. So, if Z-Cloud tries to communicate, it will understand that there is no connection, but your side will still think it is connected. TCP Keep Alive is about 15 min or 1 hour (depends on OS).

May be it is a f/w issue?
hapination
Posts: 13
Joined: 14 Jan 2013 16:35

The problem very likely lies

Post by hapination »

The problem very likely lies in my internet connection. I'm using 3g. Although the internet connection remains open constantly, there is probably some logic on the operators side which cuts the socket (eg switching cells etc).
The whole setup works correctly at another location with a different internet connection.

What I will need to do is to try to create constant traffic to the socket. I can probably establish this by setting one of my power meter units to deliver it's values with a very short interval. Hopefully this will either keep the socket open or allow the zagent to notice the socket gone and re-establish connections.
hapination
Posts: 13
Joined: 14 Jan 2013 16:35

That did the trick :) I

Post by hapination »

That did the trick :) I changed a power meter to report it's status every 30 seconds. Apparently this creates enough traffic to keep the socket open constantly.

Would it be possible to add an option to the z-agent program to have a keep alive option?
User avatar
PoltoS
Posts: 7571
Joined: 26 Jan 2011 19:36

Great!

Post by PoltoS »

Great!

TCP KeepALive will not help. An easier solution is to poll something as you did.
Post Reply