RaZBerry High CPU Usage

Discussions about RaZberry - Z-Wave board for Raspberry computer
pofs
Posts: 688
Joined: 25 Mar 2011 19:03

Re: RaZBerry High CPU Usage

Post by pofs »

Lumberjack wrote:How can I do that? I did not explicitly install home automation, just ran the normal install shell script.
Normal install script includes HA. Just follow the instruction above to remove it :)
Lumberjack
Posts: 26
Joined: 04 Mar 2014 03:31

Re: RaZBerry High CPU Usage

Post by Lumberjack »

And some charts to go with it. Blue is the razberry with the problems. I am not measuring individual processes, but from looking at 'top' on a regular basis, I can clearly see it is the z-way-server process that causes it. I recently bought new SD cards and did a clean install. Monitoring server is on another system.

On each razberry I poll the Z-way queue every 10 seconds. Will post some info on that a bit later.
Attachment: Screenshot 2014-08-10 09.49.55-small.png (138.82 KiB)
Lumberjack
Posts: 26
Joined: 04 Mar 2014 03:31

Re: RaZBerry High CPU Usage

Post by Lumberjack »

pofs wrote:
Lumberjack wrote:How can I do that? I did not explicitly install home automation, just ran the normal install shell script.
Normal install script includes HA. Just follow the instruction above to remove it :)
Completely overlooked it. Sorry about that. Thanks anyway for the quick reply.
Lumberjack
Posts: 26
Joined: 04 Mar 2014 03:31

Re: RaZBerry High CPU Usage

Post by Lumberjack »

More analysis on my little python app that issues a:

Code:

http://localhost:8083/ZWaveAPI/InspectQueue
for each of the raspberries every 10 seconds. It runs locally on each razberry so network issues cannot influence it. Default timeout for the call is 10 seconds.

When I get a timeout error it reports it in the log as

Code:

2014-08-10 00:21:14,836 - ERROR - Unable to invoke webservice. Exception: HTTPConnectionPool(host='localhost', port=8083): Read timed out. (read timeout=10)
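For anyone who wants to reproduce this kind of check, here is a minimal sketch of such a poller. It is not the original app: the log line above suggests the author used the requests library, while this version sticks to the standard library so it runs without extra packages. Only the URL and the 10-second timeout and interval come from the post; the names `poll_once` and `poll_forever` and the `opener` parameter are made up for illustration.

```python
import logging
import time
import urllib.request

# URL taken from the post; everything else here is an illustrative assumption.
QUEUE_URL = "http://localhost:8083/ZWaveAPI/InspectQueue"

# The default asctime format gives timestamps like "2014-08-10 00:21:14,836".
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(message)s",
                    level=logging.INFO)
log = logging.getLogger("queue_poll")


def poll_once(url=QUEUE_URL, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch the queue once; return the body bytes, or None on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        log.error("Unable to invoke webservice. Exception: %s", exc)
        return None


def poll_forever(interval=10.0):
    """Poll every `interval` seconds until interrupted (Ctrl-C)."""
    while True:
        poll_once()
        time.sleep(interval)
```

Calling `poll_forever()` on the razberry itself keeps network problems out of the picture, as described above. The `opener` argument only exists so the function can be exercised without a running server.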
On razberry 2, 3 and 4 I see these timeouts roughly every 15 minutes, around 3 or 4 times in a row. Example:

Code:

2014-08-10 08:54:08,736 - ERROR - Unable to invoke webservice. Exception: HTTPConnectionPool(host='localhost', port=8083): Read timed out. (read timeout=10)
2014-08-10 09:10:46,879 - ERROR - Unable to invoke webservice. Exception: HTTPConnectionPool(host='localhost', port=8083): Read timed out. (read timeout=10)
2014-08-10 09:11:06,998 - ERROR - Unable to invoke webservice. Exception: HTTPConnectionPool(host='localhost', port=8083): Read timed out. (read timeout=10)
2014-08-10 09:11:27,119 - ERROR - Unable to invoke webservice. Exception: HTTPConnectionPool(host='localhost', port=8083): Read timed out. (read timeout=10)
2014-08-10 09:23:50,391 - ERROR - Unable to invoke webservice. Exception: HTTPConnectionPool(host='localhost', port=8083): Read timed out. (read timeout=10)
On my raspberry 1 this happened much more frequently before I rebooted it. Basically the API was unavailable for 2 minutes, then came back for about 2 minutes, then was unavailable for 2 minutes again, and so on.

I hope the intervals may help the z-way team find what is causing this. Meanwhile I will disable the Home Automation part and see if this has any effect over a couple of days. If you need more info, I will be happy to provide it.
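To make the intervals easier to see, the timeout lines can be grouped into bursts. The following is a rough sketch, assuming the log format shown in this post; the 60-second gap threshold and the function names are arbitrary choices for illustration, and the sample lines are shortened copies of the ones quoted above.

```python
from datetime import datetime, timedelta


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' stamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def group_bursts(lines, max_gap=timedelta(seconds=60)):
    """Group ERROR lines whose timestamps are at most max_gap apart."""
    bursts = []
    for line in lines:
        if " - ERROR - " not in line:
            continue
        stamp = parse_timestamp(line)
        if bursts and stamp - bursts[-1][-1] <= max_gap:
            bursts[-1].append(stamp)   # continues the current burst
        else:
            bursts.append([stamp])     # starts a new burst
    return bursts


# Shortened copies of the log lines from the post (message text truncated).
SAMPLE_LOG = [
    "2014-08-10 08:54:08,736 - ERROR - Unable to invoke webservice.",
    "2014-08-10 09:10:46,879 - ERROR - Unable to invoke webservice.",
    "2014-08-10 09:11:06,998 - ERROR - Unable to invoke webservice.",
    "2014-08-10 09:11:27,119 - ERROR - Unable to invoke webservice.",
    "2014-08-10 09:23:50,391 - ERROR - Unable to invoke webservice.",
]

for burst in group_bursts(SAMPLE_LOG):
    print(burst[0].strftime("%H:%M:%S"), "timeouts in a row:", len(burst))
```

The start times of consecutive bursts then give the spacing between outages, which is the number the z-way team would probably care about.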
Lumberjack
Posts: 26
Joined: 04 Mar 2014 03:31

Re: RaZBerry High CPU Usage

Post by Lumberjack »

Ok, so did:

Code:

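cd /opt/z-way-server
sudo /etc/init.d/z-way-server stop
# keep a copy of the Home Automation files so they can be restored later
mkdir automationbackup
# sudo in case the files are owned by root
sudo mv automation/* automationbackup/
cd automation
# replace the automation code with an empty main.js
touch main.js
sudo /etc/init.d/z-way-server start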
and it has been running fine on all four raspberries for more than an hour. Not a single timeout, and the CPU load of z-way-server is down to 5%, which is good. Looks promising. I will monitor the load over the coming days and post an update on whether the problem can be attributed to the home automation part of the software.
pofs
Posts: 688
Joined: 25 Mar 2011 19:03

Re: RaZBerry High CPU Usage

Post by pofs »

Actually even 5% is quite high for a standby :(
It is quite bothersome to fix in the current version, but the next one will have a redesigned JS core with lower (about 1%) CPU usage.
pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: RaZBerry High CPU Usage

Post by pz1 »

pofs wrote:Actually even 5% is quite high for a standby :(
My setup of 18 devices takes from 3.9-5.5% with only battery polling active. (No calls from OpenRemote)
Since 29-12-2016 I am no longer a moderator for this forum
droll
Posts: 48
Joined: 20 Dec 2013 01:37

Re: RaZBerry High CPU Usage

Post by droll »

I just noticed that this thread discusses some of the same problems as viewtopic.php?f=3422&t=20397&start=20#p52396. Both threads mention HTTP request timeouts, high CPU load, etc.
Mirar
Posts: 113
Joined: 19 Oct 2014 16:54
Location: Stockholm

Re: RaZBerry High CPU Usage

Post by Mirar »

I have this issue right now. I just upgraded to 1.7.2 so the server is quiet at the moment... I'm getting a lot of
"Error 500: Internal Server Error
Code took too long to return result"
when trying to do something on the API as well. strace revealed nothing interesting, and neither did the logs. I'll check the threads when it starts eating CPU again, if it's still interesting.
Mirar
Posts: 113
Joined: 19 Oct 2014 16:54
Location: Stockholm

Re: RaZBerry High CPU Usage

Post by Mirar »

That didn't take long. :p

htop gives:

Code:

 3306 root       20   0  527M 67044  9084 S 15.0 15.0  2h22:00 z-way-server                                                            
 3364 root       20   0  527M 67044  9084 S 11.0 15.0  2h07:40 z-way-server
 3308 root       20   0  527M 67044  9084 S  3.0 15.0  9:40.64 z-way-server
 3365 root       20   0  527M 67044  9084 S  1.0 15.0  2:39.96 z-way-server
 3313 root       20   0  527M 67044  9084 S  0.0 15.0  0:07.15 z-way-server
 3312 root       20   0  527M 67044  9084 S  0.0 15.0  0:18.78 z-way-server
 3316 root       20   0  527M 67044  9084 S  0.0 15.0  1:28.92 z-way-server
There are 56 threads in z-way-server in total. 3306 is the PID.
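With that many threads it can help to read the per-thread CPU counters directly instead of eyeballing htop. Below is a small sketch for Linux that reads `/proc/<pid>/task/<tid>/stat`; the function names are invented for illustration, and the values are cumulative clock ticks (user plus system time), not percentages.

```python
import os


def parse_cpu_ticks(stat_line):
    """Return utime + stime (clock ticks) from one /proc/.../stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' before splitting the other fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]) + int(rest[12])


def thread_cpu_ticks(pid):
    """Map thread id -> accumulated CPU ticks for every thread of `pid`."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as handle:
                ticks[int(tid)] = parse_cpu_ticks(handle.read())
        except OSError:
            continue  # thread exited between listdir() and open()
    return ticks
```

Taking two snapshots of `thread_cpu_ticks(3306)` a few seconds apart and comparing them shows which thread ids are actually burning CPU during that window.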

3386 is stuck in a very tight loop (100% CPU):

Code:

gettimeofday({1413749642, 282848}, NULL) = 0
gettimeofday({1413749642, 286211}, NULL) = 0
open("automation/storage/configjson-06b2d3b23dce96e1619d2b53d6c947ec.json", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11
fstat64(11, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6800000
write(11, "{\"controller\":{},\"vdevInfo\":{\"ZW"..., 12288) = 12288
write(11, "le\":\"Lux\",\"level\":499.0000128,\"i"..., 2793) = 2793
close(11)                               = 0
munmap(0xb6800000, 4096)                = 0
open("automation/storage/schemasjson-161515635bf81aaee9a368d9f07cfc85.json", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11
fstat64(11, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6800000
write(11, "[]", 2)                      = 2
close(11)                               = 0
munmap(0xb6800000, 4096)                = 0
open("automation/storage/configjson-06b2d3b23dce96e1619d2b53d6c947ec.json", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11
fstat64(11, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6800000
write(11, "{\"controller\":{},\"vdevInfo\":{\"ZW"..., 12288) = 12288
write(11, "le\":\"Lux\",\"level\":499.0000128,\"i"..., 2793) = 2793
close(11)                               = 0
munmap(0xb6800000, 4096)                = 0
open("automation/storage/schemasjson-161515635bf81aaee9a368d9f07cfc85.json", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11
fstat64(11, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6800000
write(11, "[]", 2)                      = 2
close(11)                               = 0
munmap(0xb6800000, 4096)                = 0
I will now clear the automation directory since it seems to be the culprit and I don't need it (at the moment).
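The strace excerpt above shows the same two storage files being truncated and rewritten over and over. A quick way to confirm that from a longer trace is to count truncating `open()` calls per path. This is only a sketch: it assumes strace output in the format shown in this post, the function name is made up, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches e.g.: open("some/path", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11
OPEN_RE = re.compile(r'\bopen\("([^"]+)", ([A-Z_|]+)')


def count_truncating_opens(strace_lines):
    """Count open() calls that pass O_TRUNC, keyed by file path."""
    counts = Counter()
    for line in strace_lines:
        match = OPEN_RE.search(line)
        if match and "O_TRUNC" in match.group(2).split("|"):
            counts[match.group(1)] += 1
    return counts


# Lines copied from the strace excerpt in this post.
SAMPLE_TRACE = [
    'open("automation/storage/configjson-06b2d3b23dce96e1619d2b53d6c947ec.json", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11',
    'write(11, "[]", 2)                      = 2',
    'open("automation/storage/schemasjson-161515635bf81aaee9a368d9f07cfc85.json", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11',
    'open("automation/storage/configjson-06b2d3b23dce96e1619d2b53d6c947ec.json", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11',
    'open("automation/storage/schemasjson-161515635bf81aaee9a368d9f07cfc85.json", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11',
]

for path, hits in count_truncating_opens(SAMPLE_TRACE).most_common():
    print(hits, path)
```

If one or two paths dominate the counts in a trace captured over a few seconds, that points at whatever code keeps saving those files.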