Repeated corruption of SDcards

Discussions about RaZberry - Z-Wave board for Raspberry computer
pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: Repeated corruption of SDcards

Post by pz1 » 15 Apr 2016 12:32

nochkin wrote:I know this is a step which you may not be able to do, but I'll try anyway:
When it crashes next time, please try to connect it to a monitor over HDMI to see what's on the screen.
If you reboot, you may see some errors too.
The source of the crash could be somewhere at the higher level (Z-Way bug?).
Yesterday one Pi running Z-Way, with no Z-Way devices other than the UZB1, did crash. I have been doing reliability testing with UPS Pico.
I connected the Pi to my television set, only to see lots of things scrolling past on the screen, too fast to capture the essence. I think many memory blocks were being checked. When it halted I noted that CPUx had stopped; there were two such messages on the last screen. I am pretty sure this meant a severely damaged SD card. I have yet to check this.

nochkin
Posts: 38
Joined: 29 Feb 2016 05:05

Re: Repeated corruption of SDcards

Post by nochkin » 16 Apr 2016 06:03

You did not take a screenshot on your phone or something?

pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: Repeated corruption of SDcards

Post by pz1 » 16 Apr 2016 10:03

Hardly worth the effort, because it scrolls very fast. That makes it difficult to catch the interesting parts.
Besides, the production system with /var/log on the USB stick has been running fine since it was last restarted on 31/3.

DarS
Posts: 12
Joined: 24 Mar 2015 07:48

Re: Repeated corruption of SDcards

Post by DarS » 02 May 2016 08:16

@pz1
Thanks for the hint on moving logs to a USB flash stick! Nice workaround!
But I think we should put pressure on the Z-Way developers to fix this permanently and not leave this crap for users to deal with. I am another victim of an SD card going defective after several months of operation (and to put it in the right context: yes, I DO HAVE two other Raspberry Pi boards which have worked much longer without SD card corruption).

SD card wear-out happens and will happen. Developers must understand this and apply appropriate measures to reduce it to the minimum. I hardly find it appropriate that a tiny Z-Wave event (like RECEIVED: 01 10 00 04 00 03 0A 32 02 21 34 00 00 00 00 00 00 C7 ) generates a 200-400 character log message that is written to the SD card every couple of minutes or seconds, depending on the size of your Z-Wave environment.

Code:

[2016-05-02 07:02:20.459] [D] [zway] SETDATA devices.3.data.lastReceived = 0 (0x00000000)
[2016-05-02 07:02:20.460] [D] [zway] SETDATA devices.3.instances.2.commandClasses.50.data.2.val = 0.000000
[2016-05-02 07:02:20.460] [D] [zway] SETDATA devices.3.instances.2.commandClasses.50.data.2.delta = 0 (0x00000000)
[2016-05-02 07:02:20.460] [D] [zway] SETDATA devices.3.instances.2.commandClasses.50.data.2.ratetype = 1 (0x00000001)
[2016-05-02 07:02:20.461] [D] [zway] SETDATA devices.3.instances.2.commandClasses.50.data.2.previous = 0.000000
[2016-05-02 07:02:20.461] [D] [zway] SETDATA devices.3.instances.2.commandClasses.50.data.2 = Empty
[2016-05-02 07:02:20.877] [D] [zway] RECEIVED: ( 01 10 00 04 00 03 0A 32 02 21 34 00 00 00 00 00 00 C7 )
[2016-05-02 07:02:20.878] [D] [zway] SENT ACK
[2016-05-02 07:02:20.878] [D] [zway] SETDATA devices.3.data.lastReceived = 0 (0x00000000)
[2016-05-02 07:02:20.878] [D] [zway] SETDATA devices.3.instances.0.commandClasses.50.data.2.val = 0.000000
[2016-05-02 07:02:20.879] [D] [zway] SETDATA devices.3.instances.0.commandClasses.50.data.2.delta = 0 (0x00000000)
[2016-05-02 07:02:20.879] [D] [zway] SETDATA devices.3.instances.0.commandClasses.50.data.2.ratetype = 1 (0x00000001)
[2016-05-02 07:02:20.879] [D] [zway] SETDATA devices.3.instances.0.commandClasses.50.data.2 = Empty
[2016-05-02 07:03:08.414] [D] [zway] RECEIVED: ( 01 10 00 04 00 05 0A 32 02 21 34 00 00 00 00 00 00 C1 )
This is suicide and a sign of careless coding.
Give us an option to reduce the level of logging! And include 'SD card wear out' best practices into your development process.
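
To put a number on the log overhead per Z-Wave frame, here is a minimal Python sketch that groups the lines of the excerpt above by RECEIVED frame and counts the lines and characters logged for each. The function name `log_cost_per_frame` and the regular expression are illustrative assumptions based only on the line format visible in the excerpt, not anything taken from Z-Way itself.

```python
import re

# Sample lines copied verbatim from the log excerpt in this post.
SAMPLE_LOG = """\
[2016-05-02 07:02:20.877] [D] [zway] RECEIVED: ( 01 10 00 04 00 03 0A 32 02 21 34 00 00 00 00 00 00 C7 )
[2016-05-02 07:02:20.878] [D] [zway] SENT ACK
[2016-05-02 07:02:20.878] [D] [zway] SETDATA devices.3.data.lastReceived = 0 (0x00000000)
[2016-05-02 07:02:20.878] [D] [zway] SETDATA devices.3.instances.0.commandClasses.50.data.2.val = 0.000000
[2016-05-02 07:02:20.879] [D] [zway] SETDATA devices.3.instances.0.commandClasses.50.data.2.delta = 0 (0x00000000)
[2016-05-02 07:02:20.879] [D] [zway] SETDATA devices.3.instances.0.commandClasses.50.data.2.ratetype = 1 (0x00000001)
[2016-05-02 07:02:20.879] [D] [zway] SETDATA devices.3.instances.0.commandClasses.50.data.2 = Empty
[2016-05-02 07:03:08.414] [D] [zway] RECEIVED: ( 01 10 00 04 00 05 0A 32 02 21 34 00 00 00 00 00 00 C1 )
"""

# Assumed pattern: hex bytes between the parentheses of a RECEIVED line.
FRAME_RE = re.compile(r"RECEIVED: \(([0-9A-F ]+)\)")


def log_cost_per_frame(log_text):
    """Return one (frame_bytes, log_lines, log_chars) tuple per RECEIVED frame."""
    results = []
    current = None
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            # A new frame starts: close the previous group, if any.
            if current is not None:
                results.append(tuple(current))
            current = [len(match.group(1).split()), 0, 0]
        if current is not None:
            current[1] += 1
            current[2] += len(line) + 1  # +1 for the newline written to disk
    if current is not None:
        results.append(tuple(current))
    return results


if __name__ == "__main__":
    for frame_bytes, lines, chars in log_cost_per_frame(SAMPLE_LOG):
        print(f"{frame_bytes}-byte frame -> {lines} log lines, {chars} characters")
```

Run against a longer slice of a real z-way-server.log, the same function would give a rough ratio of characters logged to bytes received.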

Regards,
-DarS

nochkin
Posts: 38
Joined: 29 Feb 2016 05:05

Re: Repeated corruption of SDcards

Post by nochkin » 02 May 2016 08:44

DarS,
I don't think this is the reason for your issue. At least, not completely.
I run it with no issues on my setup. Well, you just mentioned that you have 2 other boards with different results.

As a test, you could try to symlink the log to /dev/null or a ramdisk to see if this is the only thing which is causing the issue.
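
A minimal Python sketch of that symlink test, under a few assumptions: the helper name `redirect_log` is made up for illustration, the real log path would be whatever your config.xml points at (/var/log/z-way-server.log in the config quoted in this thread), and root rights are needed on the real file. A process that already has the log open keeps writing to the old file, so the server would presumably have to be stopped first and restarted afterwards.

```python
import os
import tempfile


def redirect_log(log_path, target=os.devnull):
    """Replace log_path with a symlink to target, keeping any old file as .bak."""
    if os.path.islink(log_path):
        # Already redirected once: drop the old link.
        os.remove(log_path)
    elif os.path.exists(log_path):
        # Keep the existing log so it can still be inspected later.
        os.replace(log_path, log_path + ".bak")
    os.symlink(target, log_path)
    return os.readlink(log_path)


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than on the real /var/log.
    with tempfile.TemporaryDirectory() as tmp:
        demo_log = os.path.join(tmp, "z-way-server.log")
        with open(demo_log, "w") as handle:
            handle.write("old log content\n")
        print("log now points at", redirect_log(demo_log))
```

Passing a path on a tmpfs mount as `target` instead of /dev/null would keep the log readable while still keeping the writes off the SD card.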

pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: Repeated corruption of SDcards

Post by pz1 » 02 May 2016 09:48

DarS wrote:Give us an option to reduce the level of logging!
You can. The official description is here http://razberry.z-wave.me/index.php?id=13
I think that information is outdated. My present config.xml looks like this:

Code:

<config>
    <automation-dir>automation</automation-dir>
    <log-file>/var/log/z-way-server.log</log-file>
    <log-level>0</log-level>
    <debug-port></debug-port>
</config>
As their help page says, 0 is the most verbose and 6 is silent. (Unfortunately that information is not in the Developers Manual.)
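
For anyone who wants to script the change, here is a minimal Python sketch that rewrites the <log-level> element of a config like the one above. The function name `set_log_level` and the example value 4 are assumptions for illustration only; the 0 to 6 range is taken from the help page as described in this post, and the location of config.xml on your install is not something this sketch knows.

```python
import xml.etree.ElementTree as ET

# The config.xml content quoted in this post.
SAMPLE_CONFIG = """<config>
    <automation-dir>automation</automation-dir>
    <log-file>/var/log/z-way-server.log</log-file>
    <log-level>0</log-level>
    <debug-port></debug-port>
</config>"""


def set_log_level(config_xml, level):
    """Return config_xml with <log-level> set to level (0 = most verbose, 6 = silent)."""
    if not 0 <= level <= 6:
        raise ValueError("log level must be between 0 and 6")
    root = ET.fromstring(config_xml)
    node = root.find("log-level")
    if node is None:
        # Element missing: add it rather than fail.
        node = ET.SubElement(root, "log-level")
    node.text = str(level)
    # short_empty_elements=False keeps <debug-port></debug-port> as written.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


if __name__ == "__main__":
    # 4 is an arbitrary example value, not a recommendation from the docs.
    print(set_log_level(SAMPLE_CONFIG, 4))
```

Writing the result back to the real config.xml and restarting the server is left out on purpose, so you can check the output first.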

DarS
Posts: 12
Joined: 24 Mar 2015 07:48

Re: Repeated corruption of SDcards

Post by DarS » 06 May 2016 17:21

@pz1
Thanks for another hint! I WAS searching for a 'log level' switch, but obviously did not manage to locate it in the FAQ section. Good to know :-) So another appeal to the RaZberry manufacturer: please update the main documentation to include this info.
And another question: why do brand new installations of Z-Way (I made two of them recently) have the highest log level enabled by default?

@nochkin
Hmm, I think the impact of SD wear-out is often underestimated. See the example from my logs three posts up: it shows 13 lines of Z-Way log messages per minute. I guess each line (message) was written to the SD card separately. So you have:
- 780 writes/h
- 18 720 writes/day
- 6 832 800 writes/year
Because these writes are so small (a hundred bytes or so), it is very likely that they will keep modifying the same 128kB block of the SD card, despite all the wear-leveling algorithms the SD card controller uses to spread writes evenly across the flash area. Why?
The SD card controller's wear-leveling is optimized for multi-MB pictures (so the 128kB chunks can easily be allocated all around). But it might be completely fooled when you append 200 bytes to the same file every 4 seconds.
Generic statements on SD card life span usually quote ~100 000 writes (per cell, or rather per 128kB block). This is more than enough for multi-MB pictures, but it might not be enough for poorly designed logging that writes very frequently to a single file.

This is why I said that my two other Raspberries have been working flawlessly 24h/day for two years now. They simply don't run Z-Way (although they run other core services like DNS cache, DHCP, web cache and alike).
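
The write-count estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up, and the worst case assumes that every log line rewrites the same block with no OS write caching and no wear-leveling, which is an assumption and not a measured property of any card.

```python
LINES_PER_MINUTE = 13        # from the log excerpt earlier in this thread
ENDURANCE_CYCLES = 100_000   # the ~100 000 writes per block quoted above


def writes_per_period(lines_per_minute):
    """Return (per_hour, per_day, per_year) assuming one write per log line."""
    per_hour = lines_per_minute * 60
    per_day = per_hour * 24
    per_year = per_day * 365
    return per_hour, per_day, per_year


def days_to_exhaust_one_block(lines_per_minute, cycles=ENDURANCE_CYCLES):
    """Worst case: each line rewrites the same block, no caching, no wear-leveling."""
    _, per_day, _ = writes_per_period(lines_per_minute)
    return cycles / per_day


if __name__ == "__main__":
    print(writes_per_period(LINES_PER_MINUTE))  # (780, 18720, 6832800)
    print(round(days_to_exhaust_one_block(LINES_PER_MINUTE), 1))  # 5.3
```

Under those pessimistic assumptions one block would hit the quoted limit in about 5 days, so for a card that lasts several months caching and wear-leveling must be absorbing a large share of the writes.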

Regards,
-DarS

nochkin
Posts: 38
Joined: 29 Feb 2016 05:05

Re: Repeated corruption of SDcards

Post by nochkin » 06 May 2016 22:29

DarS wrote:This is why I said that my two other Raspberries have been working flawlessly 24h/day for two years now. They simply don't run Z-Way (although they run other core services like DNS cache, DHCP, web cache and alike).
But you did not disable logging on all other services for your other Pi boxes, right? Z-Way software is not the only kid on the block.
Regarding the number of writes: the numbers you mentioned don't take the write cache into account. The OS does some caching too, meaning writes don't go to the media immediately.

Of course, I don't have real stats to support my assumptions. But I have my Pi boxes where I run databases and other things which do constant writes. Not to mention logs of various processes.

Yes, wear occurs with any media, but I doubt the rate per cell is that high, and wear would not cause the kind of crashes which can be "fixed" by re-imaging the card.
Normally, cards go into read-only mode when they have used up their write endurance.
