Eeprom writes and config param changes => board reset

yves

Post by yves »

Hello,
For everyone's information, and maybe for a fix one day...

I think I have found (at last!) the cause of an erratic bug that resets my Z-Uno boards from time to time.
Here is my config:
  • Z-UNO2, Z-Uno bootloader version:3.0.10, Security:none, Frequency:EU Device, included:yes ...
  • Fibaro Home Center 3

It seems that it is not safe to use EEPROM.put() while some configuration parameter changes/activities are occurring.

My test code writes 32 bytes of data to the EEPROM every 5 min, rotating through 16 addresses in the range 0xA00 to 0xBFF.
From the home controller, I also send a new config parameter value every 5 min, asynchronously to those writes.
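The rotating write described above can be sketched as a small address helper. This is only an illustration: it assumes the 16 addresses are contiguous 32-byte slots starting at 0xA00 (16 x 32 = 512 bytes, which ends exactly at 0xBFF), and slotAddress() is a hypothetical name, not part of the Z-Uno API.

```cpp
#include <cstdint>

// Values taken from the post: 16 slots of 32 bytes, starting at 0xA00.
// The contiguous layout is an assumption made for this sketch.
constexpr uint32_t kBaseAddr  = 0xA00;
constexpr uint32_t kSlotSize  = 32;
constexpr uint32_t kSlotCount = 16;

// Hypothetical helper: EEPROM address used by the n-th write.
// The index wraps around after kSlotCount writes.
constexpr uint32_t slotAddress(uint32_t writeIndex) {
    return kBaseAddr + (writeIndex % kSlotCount) * kSlotSize;
}

// Compile-time check that the last slot ends at 0xBFF, as stated in the post.
static_assert(slotAddress(kSlotCount - 1) + kSlotSize - 1 == 0xBFF,
              "16 slots of 32 bytes must end at 0xBFF");
```

In the sketch, the address returned for the current write counter would be the one passed to EEPROM.put().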
Whenever the exchanges between the Z-Uno and the HC3 are not ‘simple’, i.e. the HC3 detects some errors (its console output is quoted here):
  • with messages like: No report was received for parameter xx. The device might not have this parameter.
or
  • duplicated many consecutive times: Received parameter xx report, value = 123456..
I get a board reset during/after the EEPROM.put() if it happens at the same time.
'During/after' means that the console output placed right after EEPROM.put() in the code never prints anything.
(Note that the errors detected by the HC3 are generally false alarms.)

I have found a workaround that is (99%?) effective: delay any user EEPROM.put() until 3 to 5 s after the last call to my config_parameter_hndlr().
I suspect that concurrent accesses to the EEPROM are not sufficiently managed…
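A minimal sketch of that workaround, written as plain C++ so the timing logic can be checked off the board. EepromWriteGate and its methods are hypothetical names for illustration, not part of the Z-Uno API; timestamps are assumed to come from millis().

```cpp
#include <cstdint>

// Hypothetical helper (not a Z-Uno API): blocks user EEPROM writes for a
// quiet period after the most recent configuration-parameter activity.
class EepromWriteGate {
public:
    explicit EepromWriteGate(uint32_t quietMs) : quietMs_(quietMs) {}

    // To be called from the configuration parameter handler.
    void noteConfigActivity(uint32_t nowMs) {
        lastConfigMs_ = nowMs;
        pending_ = true;
    }

    // Returns true when it is considered safe to write.
    // Unsigned subtraction keeps the comparison valid across millis() wrap-around.
    bool mayWrite(uint32_t nowMs) {
        if (!pending_) return true;
        if (static_cast<uint32_t>(nowMs - lastConfigMs_) >= quietMs_) {
            pending_ = false;  // quiet period is over
            return true;
        }
        return false;
    }

private:
    uint32_t quietMs_;
    uint32_t lastConfigMs_ = 0;
    bool pending_ = false;
};
```

In a sketch, noteConfigActivity(millis()) would be called from config_parameter_hndlr(), and the periodic task would postpone its EEPROM.put() for as long as mayWrite(millis()) returns false. A quiet period of 5000 ms corresponds to the upper end of the 3 to 5 s range mentioned above.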

Side note: did you notice that EEPROM.put() always returns 0, even when the data are written correctly? (This is not what the doc states.)

Best Regards,
Yves
p0lyg0n1

Re: Eeprom writes and config param changes => board reset

Post by p0lyg0n1 »

Hi,
thank you for your research. If you don't mind, let's try to dig deeper together. So far, I have several assumptions about what could be the reason for this behavior:
1. On SDK 7.16, heavy use of FLASH memory can start a defragmentation of the file system, and the watchdog may be triggered. When this happens, problems with sending confirmations to the controller are also possible.
2. In addition to the heavy memory use, the device may be in an area with a poor signal, so the HC does not always see its answers and sends the SET command repeatedly. This can also cause the protocol to change routes frequently.
In any case, if the problem is reproducible on your network, it is better to get to the bottom of it. We will be able to understand everything in more detail if you enable debugging output in your sketch and send us the Z-Uno console log from the start to the reboot. To do this:
1. Add to the beginning of the sketch:
ZUNO_ENABLE(LOGGING_DBG);
2. Compile and upload the resulting sketch.
3. Connect the USB-UART adapter to the TX0 Z-Uno pin (Z-Uno TX0->Adapter RX, Z-Uno GND -> Adapter GND).
4. Run any terminal client (we usually use CoolTerm) at 115200 baud.
5. Press the Z-Uno RST button and make sure that the start output has appeared (****************< BOOT ZUNO, etc.)
6. Wait for the reboot
7. Send us the text received in the terminal.

After that, it will be possible to switch to beta 3.12B8 (the latest at the moment). It is already more or less stable and has much more advanced debugging output in critical situations. Perhaps the problem will go away by itself on this version: the strategy for using the file system has been heavily redesigned, and there is additional caching. The only difficulty with newer versions is that you won't be able to go back to the older SDK; updates are now only possible forward. Attention: on 3.12 Beta, the console runs at 921600 baud.
Regarding the value returned by EEPROM.put, we will check; maybe there has been an error there for a long time.
Good luck!
Best regards,
Alex.
p0lyg0n1

Re: Eeprom writes and config param changes => board reset

Post by p0lyg0n1 »

Update. The put/get functions did indeed return incorrect values. Fixed in the code: https://github.com/Z-Wave-Me/Z-Uno-G2-C ... cca40124fb . The changes will appear in the new beta version.

Best regards,
Alex.
yves

Re: Eeprom writes and config param changes => board reset

Post by yves »

Hi Alex,

Thanks for your answers,

On my side, I have made some progress on my workaround(s)…
I had already enabled LOGGING_DBG, and I saw that I may also have been saturating the Z-Wave network “meanwhile”.
So I have:
  • added a ‘break;’ in the ‘for’ loop that scans the channels to be updated toward the HC3. There are no more bursts of zunoSendReport() when many channels have to be updated at the same time. Reports are now separated by at least 330 ms.
  • also increased the time between two reports for ‘1 byte’ values: from 15 to 20 s for ‘refresh in any case’, and at least 5 s for ‘refresh if changed’.
The ‘4 bytes’ values were not changed: they are refreshed every 60 s, and every 30 s in case of changes.
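The "one report per pass" idea above can be sketched as follows. This is an illustration under assumptions, not the actual sketch code: Channel and pickChannelToReport() are hypothetical names, and the spacing between reports is assumed to come from how often the pass itself is run.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-channel state for this sketch.
struct Channel {
    uint32_t refreshMs;   // 'refresh in any case' period
    uint32_t changedMs;   // minimum spacing for 'refresh if changed'
    uint32_t lastSentMs;  // time of the last report
    bool     changed;     // value changed since the last report
};

// Scans the channels and returns the index of the first one that is due,
// or -1 if none is. Returning on the first hit plays the role of the
// 'break;' in the loop: at most one report is issued per pass.
int pickChannelToReport(Channel* channels, size_t count, uint32_t nowMs) {
    for (size_t i = 0; i < count; ++i) {
        Channel& c = channels[i];
        const uint32_t age = nowMs - c.lastSentMs;  // wrap-safe unsigned math
        const bool due = age >= c.refreshMs || (c.changed && age >= c.changedMs);
        if (due) {
            c.lastSentMs = nowMs;
            c.changed = false;
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

On the board, the caller would issue zunoSendReport() for the returned channel and leave the remaining due channels for the following passes.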

For this application I have 5 ‘1 byte’ output channels and 4 ‘4 bytes’ channels.
This leads to around 20 zunoSendReport() calls per minute on average (it was 25 in the previous configuration).
I don’t know whether this is excessive, and I understand that it depends on the controller used (HC3 here)?
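As a rough cross-check of those figures, the baseline rate can be computed from the ‘refresh in any case’ periods alone. This assumes no extra ‘refresh if changed’ reports, so it gives a lower bound; reportsPerMinute() is a hypothetical helper written for this check.

```cpp
// Baseline reports per minute from unconditional refreshes only
// (reports triggered by value changes come on top of this).
constexpr double reportsPerMinute(int oneByteChannels, double oneBytePeriodS,
                                  int fourByteChannels, double fourBytePeriodS) {
    return oneByteChannels * 60.0 / oneBytePeriodS +
           fourByteChannels * 60.0 / fourBytePeriodS;
}

// Current configuration:  5 channels every 20 s + 4 channels every 60 s.
constexpr double kCurrentRate  = reportsPerMinute(5, 20.0, 4, 60.0);
// Previous configuration: 5 channels every 15 s + 4 channels every 60 s.
constexpr double kPreviousRate = reportsPerMinute(5, 15.0, 4, 60.0);
```

The baseline comes out at 19 and 24 reports per minute, which is consistent with the observed averages of about 20 and 25 once a few change-triggered reports are added.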

With these “strategies” I now have fewer and fewer bugs, and the tests take longer and longer ;), so the log files have been thrown away :( …

But I had saved one, and attached you will find a file containing 2 extracts from it, 'around' 2 resets.
  • My setup() code contains a Serial.println("RESTART RESTART RESTART RESTART RESTART") that is printed in the log file when the application restarts.
  • The time is also printed in this file (by the loop function, every 1 s) as <12:24:14>17074, where the first part is the UTC time and the second is the value returned by millis().
    This allows comparisons between the HC3 console and the Z-Uno one, and confirms that the application is still being called by the OS (*)
In both "bug/restart" cases, the last lines of the LOGGING_DBG output were:
  • >>> (1005947) OUTGOING PACKAGE: SRC_ID:0.0 DST_ID:0.0 KEY:3 OPTS:26 DATA:(\n)
    *** PROCESSED:1
    *** CLEANUP:0
If I am reading this correctly, the SRC and DST values do not look right, unless this is supposed to be a broadcast message?

Last: I am not sure that it is meaningful (or even real/reproducible), but it seems that after a 'bug reset' the recovery time is longer than after a simple button-press reset (about 8-10 s more).

Best Regards,
Yves

(*) My application asks to get the CPU back every 100 ms, but individual tasks take less than 1 ms (averaged over 10 calls), so the OS gets a lot of time...
There is one exception: EEPROM.put calls, which take at least 500 ms!
Attachments
LOG1extract.zip
Log extracts of the bug
(846 Bytes)