Repeated corruption of SDcards

Discussions about RaZberry - Z-Wave board for Raspberry computer
pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Post by pz1 » 13 Mar 2016 19:50

My main production system (some 20 devices) always ends in a crash after about 2-3 weeks of operation. In the last days before the crash I notice that an increasing number of directories become corrupted. It looks like some process is performing random writes on the SD card.
I checked the failed SD card using a USB card reader on a Raspberry Pi. The results are below.

Any ideas about what may be going wrong? (See the system description in my signature.)

pi@rasppi:~ $ ls -la /dev/sd*

brw-rw---- 1 root disk 8, 0 Mar 13 16:38 /dev/sda
brw------- 1 root root 8, 1 Mar 13 16:38 /dev/sda1
brw------- 1 root root 8, 2 Mar 13 16:38 /dev/sda2
pi@rasppi:~ $ sudo fsck /dev/sda

fsck from util-linux 2.25.2
e2fsck 1.42.12 (29-Aug-2014)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/sda

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>
pi@rasppi:~ $ sudo fsck /dev/sda1

fsck from util-linux 2.25.2
fsck.fat 3.0.27 (2014-11-12)
/dev/sda1: 75 files, 2535/7673 clusters
pi@rasppi:~ $ sudo fsck /dev/sda2

fsck from util-linux 2.25.2
e2fsck 1.42.12 (29-Aug-2014)
/dev/sda2: clean, 37241/455952 files, 287772/1929216 blocks
Since 29-12-2016 I am no longer a moderator for this forum
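
A note on reading the output above: /dev/sda is the whole device, which starts with a partition table rather than a filesystem, so the "Bad magic number" message from fsck /dev/sda is expected and does not by itself point to damage. Also, e2fsck normally skips the full scan when the filesystem is flagged clean, which is what the short "clean" line for /dev/sda2 suggests. A minimal sketch of a forced, read-only check follows. It assumes the card still shows up as /dev/sda1 (FAT boot) and /dev/sda2 (ext4 root) and that neither partition is mounted; the function name check_card and the RUN variable are made up for this example.

```shell
#!/bin/sh
# Sketch: forced, read-only check of both partitions on the card.
# RUN defaults to "echo", so the commands are only printed.
# Set RUN=sudo to actually execute them.
RUN=${RUN:-echo}

check_card() {
    boot_part=$1   # FAT boot partition (assumed /dev/sda1)
    root_part=$2   # ext4 root partition (assumed /dev/sda2)
    # -n: check only, never write to the filesystem
    $RUN fsck.fat -n "$boot_part"
    # -f: force a full check even if the filesystem is marked clean
    # -n: open read-only and answer "no" to every repair question
    $RUN e2fsck -f -n "$root_part"
}

check_card /dev/sda1 /dev/sda2
```

With the default RUN=echo the script only prints the two commands, which makes it safe to try before pointing it at a real card.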

nochkin
Posts: 38
Joined: 29 Feb 2016 05:05

Re: Repeated corruption of SDcards

Post by nochkin » 14 Mar 2016 08:16

I have been running it for weeks with no issues.
Yes, the server writes a log file at least. There could be other writes as well.

Is it possible that you power cycle the Pi from time to time and that this causes the corruption? If not, I would try another SD card to rule out a problem with the card itself.

pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: Repeated corruption of SDcards

Post by pz1 » 14 Mar 2016 11:20

Thanks for the reply. It happens with:
- 3 different brands of SD cards
- 2 Raspberry Pi 2 devices
- 2 UZB1 controllers

I never power cycle the devices without a proper shutdown, and I have rebuilt my system several times.

I do know that other installations do run for weeks.

I have the following apps activated (running):
- IF Then (2x)
- Trap events from Remotes (1x)
- Link other Z-Way controller (1x)
- ScheduledScene (5x)
- Code Device (30x, to send JS zway.devices[11].ThermostatMode.Set(11))
- LightScene (4x)
- Z-Wave (1x)
- OpenRemote Helper (1x)
- Cron (1x)

And a couple of other Apps which are deactivated.

There is some moderate polling from OpenRemote. In daily operation the Z-Way message queue hardly ever exceeds 20 entries, most of them waiting for battery-driven devices to wake up.

In a neighbouring location I have an additional Pi B + RaZberry (first version) that runs for weeks without a problem. It only has 2 Z-Wave devices.

nochkin
Posts: 38
Joined: 29 Feb 2016 05:05

Re: Repeated corruption of SDcards

Post by nochkin » 15 Mar 2016 04:19

It does not seem like the software is causing the corruption.
I wonder what brand and model the SD cards are. I tend to use Sandisk Ultra on my Pi installations.
If the uptime of the Pi looks good at the time of the fs crash, then this is not a sudden reboot issue.

I would check the power supply in this case.
I use an industrial 12 V DC power supply with a good step-down DC-DC converter (based on the MP2307) to bring it down to 5 V DC, which is then soldered directly to the Pi board.

pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: Repeated corruption of SDcards

Post by pz1 » 15 Mar 2016 12:13

Thanks for the continued help. I have used 2 Konig cards and 2 "no-brand" cards. I recently got a plain 16 GB Sandisk. I'll try that when the system crashes again.
I have used a couple of different power supplies rated from 1.5 A to 3 A, with voltages of 5.0-5.2 V. The bare raspbian-jessie system draws roughly 230 mA. At start-up (with the UZB1 dongle) it peaks at around 460 mA according to this simple gauge.
DSCN0340.JPG
Update: in my present experiment I write the log files to a USB stick.
I am considering disabling the swap file in the next round.

nochkin
Posts: 38
Joined: 29 Feb 2016 05:05

Re: Repeated corruption of SDcards

Post by nochkin » 15 Mar 2016 19:42

Before you use the card, please check it with h2testw or a similar tool to verify its real capacity.
A generic SD card may be specified incorrectly, e.g. a "Class 10" card may be Class 4 in reality. The Pi needs Class 10 at least.

Not all power supplies are equal. If one says "1.5A", it may not deliver a true 1.5 A in reality. Generic USB chargers are the worst in this respect.
The Pi board with Z-Wave and WiFi sticks should not consume more than 1 A, but the "quality of power" may affect the stability of your system.

P.S.: I still think the SD card could be the issue in your case, not the power supply.
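
h2testw is a Windows tool. The idea behind it (fill the card with files of known content, then read them back and compare) can be sketched in shell for use directly on the Pi. This is only an illustration under stated assumptions: the function names fill_card and verify_card, the file names and the sizes are made up here, GNU dd and sha256sum are assumed to be available, and the sketch is not a replacement for a dedicated tool. Because Linux may serve the read-back from its page cache, the check is most meaningful if the card is unmounted and re-inserted between the two steps.

```shell
#!/bin/sh
# fill_card DIR COUNT SIZE_MB SUMS
# Writes COUNT files of SIZE_MB MiB of random data into DIR (the mounted card)
# and records their SHA-256 checksums in SUMS (keep SUMS on another disk).
fill_card() {
    dir=$1; count=$2; size_mb=$3; sums=$4
    : > "$sums"
    i=1
    while [ "$i" -le "$count" ]; do
        f="$dir/fill_$i.bin"
        dd if=/dev/urandom of="$f" bs=1M count="$size_mb" 2>/dev/null || return 1
        sha256sum "$f" >> "$sums"
        i=$((i + 1))
    done
    sync   # flush everything to the card before it is removed
}

# verify_card SUMS
# Re-reads the files and compares them with the recorded checksums.
# The exit status is non-zero if any file is missing or differs.
verify_card() {
    sha256sum --quiet -c "$1"
}

# Example (hypothetical mount point, 70 files of 100 MiB each):
#   fill_card /mnt/card 70 100 /home/pi/card.sha256
#   ...unmount, re-insert and mount the card at the same path...
#   verify_card /home/pi/card.sha256
```

To test the full capacity, COUNT times SIZE_MB has to come close to the free space on the card; a card that reports more space than it really has would be expected to fail the second step.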

pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: Repeated corruption of SDcards

Post by pz1 » 17 Mar 2016 13:46

After 'routing' the log files to USB-stick, my filesystem looks like this:

pi@rasppi2:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       7.3G  970M  6.0G  14% /
devtmpfs        459M     0  459M   0% /dev
tmpfs           463M     0  463M   0% /dev/shm
tmpfs           463M  6.2M  457M   2% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           463M     0  463M   0% /sys/fs/cgroup
/dev/mmcblk0p1   60M   20M   41M  34% /boot
/dev/sda1        15G  9.5M   15G   1% /var/log
Please note that all tmpfs settings come from the raspbian-jessie-lite distro.

pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: Repeated corruption of SDcards

Post by pz1 » 17 Mar 2016 14:33

I notice a very high update rate from the 4 temperature sensors of my Fibaro Universal Sensor. In the listing below you will see consecutive readings at 06:25:49.032, 06:25:49.123, 06:25:49.206 and 06:25:49.318.

Note: I reported this as GitHub issue #337.

From my configuration settings (see the image at the bottom), I would only expect log entries every 200 seconds (Parameter 11).
Could this high update rate cause problems with the SDCard?

[2016-03-17 06:25:16.022] [D] [zway] SETDATA devices.31.data.lastReceived = 0 (0x00000000)
[2016-03-17 06:25:16.022] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2016-03-17 06:25:16.025] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.scaleString = "°C"
[2016-03-17 06:25:16.025] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.val = 29.309999
[2016-03-17 06:25:16.025] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1 = Empty
[2016-03-17 06:25:27.026] [D] [zway] RECEIVED: ( 01 12 00 04 00 1F 0C 60 0D 05 05 31 05 01 44 00 00 0A AB 47 )
[2016-03-17 06:25:27.026] [D] [zway] SENT ACK
[2016-03-17 06:25:27.026] [D] [zway] SETDATA devices.31.data.lastReceived = 0 (0x00000000)
[2016-03-17 06:25:27.027] [D] [zway] SETDATA devices.31.instances.5.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2016-03-17 06:25:27.028] [D] [zway] SETDATA devices.31.instances.5.commandClasses.49.data.1.scaleString = "°C"
[2016-03-17 06:25:27.029] [D] [zway] SETDATA devices.31.instances.5.commandClasses.49.data.1.val = 27.309999
[2016-03-17 06:25:27.029] [D] [zway] SETDATA devices.31.instances.5.commandClasses.49.data.1 = Empty
[2016-03-17 06:25:38.023] [D] [zway] RECEIVED: ( 01 12 00 04 00 1F 0C 60 0D 03 03 31 05 01 44 00 00 0B 60 8D )
[2016-03-17 06:25:38.023] [D] [zway] SENT ACK
[2016-03-17 06:25:38.023] [D] [zway] SETDATA devices.31.data.lastReceived = 0 (0x00000000)
[2016-03-17 06:25:38.023] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2016-03-17 06:25:38.026] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.scaleString = "°C"
[2016-03-17 06:25:38.026] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.val = 29.120001
[2016-03-17 06:25:38.026] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1 = Empty
[2016-03-17 06:25:38.277] [D] [zway] RECEIVED: ( 01 12 00 04 00 1F 0C 60 0D 03 03 31 05 01 44 00 00 0B 60 8D )
[2016-03-17 06:25:38.278] [D] [zway] SENT ACK
[2016-03-17 06:25:38.278] [D] [zway] SETDATA devices.31.data.lastReceived = 0 (0x00000000)
[2016-03-17 06:25:38.278] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2016-03-17 06:25:38.280] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.scaleString = "°C"
[2016-03-17 06:25:38.280] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.val = 29.120001
[2016-03-17 06:25:38.280] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1 = Empty
[2016-03-17 06:25:49.031] [D] [zway] RECEIVED: ( 01 12 00 04 00 1F 0C 60 0D 03 03 31 05 01 44 00 00 0B 54 B9 )
[2016-03-17 06:25:49.032] [D] [zway] SENT ACK
[2016-03-17 06:25:49.032] [D] [zway] SETDATA devices.31.data.lastReceived = 0 (0x00000000)
[2016-03-17 06:25:49.032] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2016-03-17 06:25:49.035] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.scaleString = "°C"
[2016-03-17 06:25:49.035] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.val = 29.000000
[2016-03-17 06:25:49.035] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1 = Empty
[2016-03-17 06:25:49.123] [D] [zway] RECEIVED: ( 01 12 00 04 00 1F 0C 60 0D 03 03 31 05 01 44 00 00 0B 54 B9 )
[2016-03-17 06:25:49.123] [D] [zway] SENT ACK
[2016-03-17 06:25:49.123] [D] [zway] SETDATA devices.31.data.lastReceived = 0 (0x00000000)
[2016-03-17 06:25:49.124] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2016-03-17 06:25:49.126] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.scaleString = "°C"
[2016-03-17 06:25:49.126] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.val = 29.000000
[2016-03-17 06:25:49.126] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1 = Empty
[2016-03-17 06:25:49.205] [D] [zway] RECEIVED: ( 01 12 00 04 00 1F 0C 60 0D 03 03 31 05 01 44 00 00 0B 54 B9 )
[2016-03-17 06:25:49.205] [D] [zway] SENT ACK
[2016-03-17 06:25:49.206] [D] [zway] SETDATA devices.31.data.lastReceived = 0 (0x00000000)
[2016-03-17 06:25:49.206] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2016-03-17 06:25:49.208] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.scaleString = "°C"
[2016-03-17 06:25:49.209] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.val = 29.000000
[2016-03-17 06:25:49.209] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1 = Empty
[2016-03-17 06:25:49.317] [D] [zway] RECEIVED: ( 01 12 00 04 00 1F 0C 60 0D 03 03 31 05 01 44 00 00 0B 54 B9 )
[2016-03-17 06:25:49.318] [D] [zway] SENT ACK
[2016-03-17 06:25:49.318] [D] [zway] SETDATA devices.31.data.lastReceived = 0 (0x00000000)
[2016-03-17 06:25:49.318] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.deviceScale = 0 (0x00000000)
[2016-03-17 06:25:49.321] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.scaleString = "°C"
[2016-03-17 06:25:49.321] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1.val = 29.000000
[2016-03-17 06:25:49.321] [D] [zway] SETDATA devices.31.instances.3.commandClasses.49.data.1 = Empty
[2016-03-17 06:26:06.022] [D] [zway] RECEIVED: ( 01 22 00 04 00 0A 1C 8F 01 06 03 80 03 46 06 43 03 01 42 05 78 04 46 08 00 7F 02 81 05 02 46 04 02 84 07 8B )
[2016-03-17 06:26:06.022] [D] [zway] SENT ACK
[2016-03-17 06:26:06.022] [D] [zway] SETDATA devices.10.data.lastReceived = 0 (0x00000000)
FusReadParams.PNG
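
To see how often this happens over a longer period, the log can be scanned for RECEIVED frames that repeat the previous frame byte for byte within a short time. The sketch below assumes the log line format shown in the listing above; the function name find_repeats and the one-second default window are choices made for this example, and the log path in the usage line is an assumption.

```shell
#!/bin/sh
# find_repeats LOGFILE [WINDOW_SECONDS]
# Prints every RECEIVED frame that is byte-for-byte identical to the previous
# RECEIVED frame and arrived less than WINDOW_SECONDS (default 1) after it.
# Timestamps are compared within one day only (no midnight handling).
find_repeats() {
    awk -v window="${2:-1}" '
        /\[zway\] RECEIVED:/ {
            ts = $2                      # e.g. "06:25:49.123]"
            sub(/\]$/, "", ts)           # drop the closing bracket
            split(ts, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            frame = substr($0, index($0, "RECEIVED:"))
            if (frame == last_frame && now - last_time < window)
                printf "%s repeat after %.3f s\n", ts, now - last_time
            last_frame = frame
            last_time = now
        }
    ' "$1"
}

# Usage (log path is an assumption):
#   find_repeats /var/log/z-way-server.log
```

On the excerpt above this would flag the frames at 06:25:38.277, 06:25:49.123, 06:25:49.205 and 06:25:49.317. Since those payloads are identical down to the last byte, they may be retransmissions of one report rather than new measurements, but that is a guess from the log alone.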

nochkin
Posts: 38
Joined: 29 Feb 2016 05:05

Re: Repeated corruption of SDcards

Post by nochkin » 17 Mar 2016 19:14

pz1 wrote: Could this high update rate cause problems with the SDCard?
I'd be surprised if that causes the corruption on the card, unless the card is defective by itself.
I have other tasks running on my Pi, and those tasks usually write/update even more often without any corruption issues at all.

Did you have a chance to try another card and run h2testw on it before using it?
I also wonder if the card you have is a real Class 10 card (it must sustain at least 10 MB/s when writing).

If the card is already installed, you can measure the speed by running this command:

sync && dd if=/dev/zero of=/testfile.bin bs=1024k count=1024 ; rm /testfile.bin
It will create a temporary file and report the write speed.
Of course, that will not test the real capacity like h2testw does, but at least we can guess the Class rating of the card.

pz1
Posts: 2053
Joined: 08 Apr 2012 13:44

Re: Repeated corruption of SDcards

Post by pz1 » 17 Mar 2016 21:22

@nochkin,
I'll keep your suggestions in mind. I haven't run the h2testw test you suggested yet, but I definitely will once I have concluded my present experiment (which may last 2 weeks).

Your last test says:

root@rasppi:/home/pi# sync && dd if=/dev/zero of=/testfile.bin bs=1024k count=1024 ; rm /testfile.bin
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 72.0827 s, 14.9 MB/s
I think this is one of the two 8 GB Konig cards. The second Konig card has 30 MB/s printed on it.
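
The rate dd prints can be reproduced by hand from the byte count and the elapsed time, which makes it easy to compare a run against the 10 MB/s that Class 10 requires for sequential writes. A small sketch follows; the function name throughput is made up for this example, and it uses decimal megabytes as dd does.

```shell
#!/bin/sh
# throughput BYTES SECONDS
# Prints the transfer rate in MB/s (1 MB = 1,000,000 bytes, as dd reports it).
throughput() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

throughput 1073741824 72.0827   # prints 14.9

# Caveat: without a flush, part of the data may still sit in the page cache
# when dd stops its clock. GNU dd can flush before reporting, for example:
#   dd if=/dev/zero of=/testfile.bin bs=1M count=1024 conv=fdatasync
```

By this measure the card above clears the Class 10 minimum, although a single sequential write of zeros says little about small random writes, which are closer to what a running system produces.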
