version 3.0.12

Discussion about Z-Uno product. Visit http://z-uno.z-wave.me for more details.
yves
Posts: 52
Joined: 17 Sep 2021 18:05

Re: version 3.0.12

Post by yves »

Hi Alex,
I am still using 3.0.10... My first post was to ask whether it was wise to move to 3.0.12, given that there had been some changes on GitHub...

Anyway, I have made some progress with 3.0.10 and found some improvements:
As explained above, my toughest 'crash test' is to send two 'SET' commands from my HC3 to the Z-Uno device with a short delay between them (still in a location where RF communication seems difficult).
This morning the crash test passed with zero delay!!!

I can add more details if you want, but the final modifications are:
  • In the ZWQProcess block that starts with "if(p->flags & ZUNO_PACKETFLAGS_GROUP){",
    I have added, at the end:
    processed_indexes[processed_indexes_cnt++] = qi;
    if(processed_indexes_cnt >= MAX_PROCESSED_QUEUE_PKGS) break;

    This reduces ZWave traffic (I am not using groups/associations).
  • I have modified zuno_CommandHandler so that it only copies the content of 'cmd' into a FIFO, which is emptied later (but ASAP) from zuno_CCTimer.
  • Lastly (late yesterday), I modified "case ZUNO_JUMPTBL_CMDHANDLER:" in LLCore.c to call zunoAwakeUsrCode(); AFTER zuno_CommandHandler(...).
    I am not using sleep mode (at least not knowingly), so I have also modified zunoAwakeUsrCode by uncommenting the commented-out test if ((zunoThreadIsRunning(g_zuno_sys->hMainThread)) == false), thus avoiding unnecessary calls to zunoResumeThread().
This last modification lets the crash test pass with a 0 ms delay instead of 250 ms...
So I still feel that it is VERY important that zuno_CommandHandler() is called quickly and uses very little CPU time.
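The guarded wake-up from the last bullet (handle the command first, then resume the user thread only if it is not already running) can be modelled as below. This is a minimal sketch, not the Z-Uno core code: thread_is_running(), resume_thread(), command_handler_stub() and on_command() are hypothetical stand-ins for zunoThreadIsRunning(g_zuno_sys->hMainThread), zunoResumeThread(), zuno_CommandHandler(...) and the ZUNO_JUMPTBL_CMDHANDLER case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the core's thread API (NOT the real Z-Uno calls). */
static bool main_thread_running = false;
static int  resume_calls = 0;

static bool thread_is_running(void) { return main_thread_running; }

static void resume_thread(void) {
    resume_calls++;
    main_thread_running = true;
}

/* Guarded wake-up: only resume when the user thread is not already running,
   so a burst of command callbacks does not issue redundant resume requests. */
static void awake_user_code(void) {
    if (!thread_is_running()) {
        resume_thread();
    }
    assert(thread_is_running());
}

/* Stand-in for zuno_CommandHandler(...). */
static int handled = 0;
static void command_handler_stub(void) { handled++; }

/* Order used in the modified command path: handle first, then wake. */
static void on_command(void) {
    command_handler_stub();
    awake_user_code();
}
```

Three back-to-back commands run the handler three times but resume the thread only once.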

This is with 3.0.10, but from what I have seen and tested, these improvements might also be useful for the next version, most probably implemented in a less brutal way? :D...

Best Regards,
Yves
yves

Re: version 3.0.12

Post by yves »

Hi,

____ 3.12 vs 3.10 ____
I have run some tests with the same application under 3.12 (29/12/2023) and 3.10, with a Fibaro Home Center 3 as the controller.
Here are the results:

3.12 ‘better’ points
  • Far better handling of weak ZWave RF conditions => the device no longer goes into the 'disconnected' state on the HC3
  • As a bonus: faster upload time
  • and many others...
3.12 ‘worse‘ points
  • No management of user configuration parameters (IDs 64 to 95).
    (const ZunoCFGParameter_t CFGPARAM_DEFAULT is missing in ZWCCConfiguration.cpp).
  • There are still unexplained reboots, over 5 times more often than with 3.10
    (see below).
3.12 unsolved issues:
  • Unplugging and replugging the device, pressing the reset button, or uploading code do not lead to the same initial state. This is clearly visible in the ZWave traffic. That is misleading!
  • Under weak RF conditions, sending configuration parameters (from the HC3) is still 'hazardous'; see the question about SYSTEM_PKG_DOMINATION_TIME below.
  • There are 'unexplained' reboots:
    • Higher ZWave input traffic => more reboots
    • But what triggers a reboot does not seem to be directly related to the reception of ZWave inputs (i.e. the reboot occurs a few seconds after the last 'GETTER' call, and the application is still running after this call).
    • LOGGING_DBG shows nothing prior to the reboot.
    • There are no memory leaks.
    • And there is no output from checkSystemCriticalStat() (I mean never).
    • SYSEVENT_HANDLER does not seem to be called.
    • It happens even if my application is an empty loop (only delay(74);) with a few serial.print calls in the getter/setter functions.
I finally found the main culprit(s): EEPROM reads/writes in ZWCCBasic.cpp (these writes were not in 3.10, which explains why 3.12 reboots 5 times more often than 3.10).
  • EEPROM writes are still very good reboot triggers!
    I have noticed that after an EEPROM.put() there is a 500 ms 'CPU hang time'. Most often it happens during the put() execution, but it may happen up to (or even over) 10 s later!
Side question: I did not dig deeply into 'why do they write switch values to EEPROM?', but I hope there is a very good reason: if I am right, one write per minute should wear out the EEPROM in less than 3 months!
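The "less than 3 months" estimate can be checked with simple arithmetic. The sketch below assumes a rated endurance of 100,000 write cycles per cell (a common EEPROM figure, not a Z-Uno datasheet value) and that every write hits the same cell with no wear leveling; the non-volatile memory actually behind EEPROM.put() on the Z-Uno may behave differently. days_to_wear_out() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Days until one location reaches its rated endurance, assuming every write
   lands on the same cell and nothing spreads the wear. */
static double days_to_wear_out(uint32_t endurance_cycles, double writes_per_minute) {
    assert(writes_per_minute > 0.0);
    double writes_per_day = writes_per_minute * 60.0 * 24.0;
    return (double)endurance_cycles / writes_per_day;
}
```

With these assumptions, 100,000 / 1,440 writes per day gives about 69 days, roughly 2.3 months, which is consistent with the estimate; wear leveling would stretch this roughly in proportion to the number of cells the writes are spread across.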

Unfortunately, it is only the main culprit; there are still unexplained reboots. I have noticed that after those reboots the rtcc output is not reset to 0.

Things I do not understand, common to both versions
(maybe I do not need to know, but):
• What is the use of SYSTEM_PKG_DOMINATION_TIME?
• And who is supposed to update g_zuno_sys->rstat_priority_counts[channel]?
Mostly when the ZWave RF link is weak (but not only then), an uninterrupted flow of calls to zuno_CommandHandler() with the same cmd packet appears (around 10/s), so that zunoCheckSystemQueueStatus always returns true.
Another buggy state is when zunoCheckSystemQueueStatus stays true endlessly because
g_zuno_sys->rstat_priority_counts[channel] stays > 0.
(There are many chicken-and-egg issues in these observations; I did not manage to work out which comes first...)

With 3.10 it is truly endless, which explains the 'disconnected' states mentioned above.
3.12 seems better at avoiding this bug, and most of the time it exits the endless loop on its own.

Hope it helps
Have a good day!
niky94547
Posts: 22
Joined: 31 May 2019 10:51

Re: version 3.0.12 // BUG in 3.0.10, solved(?)

Post by niky94547 »

yves wrote:
09 Jan 2024 16:12
Hi everyone (even if this post is mainly for developers)

:D BINGO! :D
I think I have found THE problem.

And, in my opinion, it is not solely due to Fibaro Home Center 3 behavior: it should be reproducible with any kind of ZWave controller.
In fact, my previous post and the proposed solutions aimed to shorten the response time of ZWave exchanges.
They were only 'half working' because they only reduced the number of occurrences of the problem I have now found:

The main issue is that zuno_CommandHandler (in llcore.c) is not re-entrant (*).
So if two 'SET' commands arrive within too short a time frame, 'unpredictable results occur' (up to a reset/reboot).
This seems to happen quite frequently when the Z-Uno board is 'almost out of reach' of the controller, but it may also happen when the board is the target of more than one application in the controller, or when it is associated with 'many' other devices.

zuno_CommandHandler is quite long code (to read and to execute), so I 'replaced' it with a very simple one that just copies the data received from the controller into a FIFO. This FIFO is then emptied from zuno_CCTimer, which now calls the 99% original zuno_CommandHandler (the remaining 1% is that it takes its inputs from the FIFO). As far as I can tell, zuno_CCTimer is not an interrupt handler (or at least it does not have such strong constraints?).
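The deferral described above (a fast callback that only copies, a timer that does the real work) is a classic single-producer/single-consumer FIFO. Below is a minimal C11 sketch under the assumption of exactly one producer (the command callback) and one consumer (the timer callback); fifo_push(), fifo_drain(), full_command_handler(), CMD_MAX_LEN and CMD_FIFO_SIZE are hypothetical names and sizes, not the Z-Uno core API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMD_MAX_LEN   48u   /* hypothetical maximum frame size */
#define CMD_FIFO_SIZE 8u    /* power of two, so free-running indices wrap cleanly */

typedef struct {
    uint8_t len;
    uint8_t data[CMD_MAX_LEN];
} queued_cmd_t;

static queued_cmd_t cmd_fifo[CMD_FIFO_SIZE];
static atomic_uint fifo_head = 0;   /* written only by the producer */
static atomic_uint fifo_tail = 0;   /* written only by the consumer */

/* Fast path (command callback): copy the frame and return immediately. */
static bool fifo_push(const uint8_t *cmd, uint8_t len) {
    assert(cmd != NULL);
    unsigned head = atomic_load_explicit(&fifo_head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&fifo_tail, memory_order_acquire);
    if (head - tail >= CMD_FIFO_SIZE || len > CMD_MAX_LEN)
        return false;                       /* full or oversized: drop the frame */
    queued_cmd_t *slot = &cmd_fifo[head % CMD_FIFO_SIZE];
    memcpy(slot->data, cmd, len);
    slot->len = len;
    /* Publish the slot only after its bytes are written. */
    atomic_store_explicit(&fifo_head, head + 1u, memory_order_release);
    return true;
}

/* Stand-in for the (almost) original, long command handler. */
static int processed = 0;
static void full_command_handler(const uint8_t *cmd, uint8_t len) {
    (void)cmd;
    (void)len;
    processed++;
}

/* Slow path (timer): run the full handler on every queued frame. */
static void fifo_drain(void) {
    unsigned tail = atomic_load_explicit(&fifo_tail, memory_order_relaxed);
    while (tail != atomic_load_explicit(&fifo_head, memory_order_acquire)) {
        queued_cmd_t *slot = &cmd_fifo[tail % CMD_FIFO_SIZE];
        full_command_handler(slot->data, slot->len);
        tail++;
        /* Free the slot only after the handler is done with it. */
        atomic_store_explicit(&fifo_tail, tail, memory_order_release);
    }
}
```

When the FIFO is full, new frames are dropped rather than overwriting frames that have not been processed yet; whether dropping or overwriting is preferable for Z-Wave traffic is a design choice this sketch does not settle.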

Using the test bench described in my previous post, it has now been working for more than 24 h in my "electrically/RF dark/difficult" room without any problem (before, it averaged around 50 disconnections/day).

Note: I have also kept the modifications explained in my previous post, but I am not sure they are still required.
Regards,

(*) Re-entrancy issues were caught using a static flag set to true on entering the handler and to false on exiting it; if the flag is not false on entry => problem...
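The re-entrancy check from the footnote can be sketched as follows. This variant uses a depth counter instead of a plain boolean, so that a nested call returning does not clear the marker while the outer call is still running; command_handler() and simulate_nested are hypothetical, and the nested call only simulates a second command arriving mid-processing.

```c
#include <assert.h>
#include <stddef.h>

/* Depth counter instead of a bool: a nested call's exit cannot clear the
   outer call's marker, so detection does not change handler behaviour. */
static volatile unsigned handler_depth = 0;
static volatile unsigned reentry_count = 0;

/* Hypothetical hook: when set, the handler body calls it once, simulating a
   second command arriving while the first is still being processed. */
static void (*simulate_nested)(void) = NULL;

static void command_handler(void) {
    if (handler_depth != 0)
        reentry_count++;          /* entered again before the previous call returned */
    handler_depth++;

    /* ... long command processing would go here ... */
    if (simulate_nested != NULL) {
        void (*nested)(void) = simulate_nested;
        simulate_nested = NULL;   /* nest only once */
        nested();
    }

    assert(handler_depth > 0);
    handler_depth--;
}
```

A non-zero reentry_count is the "problem" condition from the footnote; in real firmware it could be logged instead of merely counted.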
Hi Yves,

I've been following your posts and detailed issue descriptions on 3.0.10. Really great debugging on your side.

I am currently experiencing the same erratic behavior (mostly reboots). Could you please share your latest mods to zuno_CommandHandler? Did you achieve stable operation in the long term?

Thanks a lot.
Kind Regards,
yves

Re: version 3.0.12

Post by yves »

Hi Niky,

I will send you today all the modifications I have made to the core files in 3.12.
[I have to add some comments to separate what I feel is important from what is 'nice to have' for me :) ].

Yes, I am still having reboot issues with 3.12, around 3/day (!),
compared to fewer than 2/week with my 3.10 (also with some core modifications...).

The fact that the more you write to EEPROM, the more reboots you get may be a lead?

Regards,
Have a nice day!
Yves
yves

Re: version 3.0.12

Post by yves »

Hi Niky,

Attached are my modifications to 3.12 (Dec 2023).

As a foreword:
All modifications are guarded by compilation flags that start with YGN. Currently I have defined:
#define YGNCHECKDIFFMESSAGE //skip identical consecutive received msgs
#define YGNREDUCE_hp_time // together with the define above, reduce rstat_pkgs_hp_time
#define YGNEXTERNVARACCESS //allow access to some FW variables (g_channels_...)
#define YGNSENDONLYONEPKT // faster zwave send data on average
#define YGNRESETBUSYQUEUE // reset queue if it stays 'busy' too long
#define YGNPROTECTEEPROMACCESS // delay ZWave traffic while EEPROM is used & prevent EEPROM writes if queue busy
#define YGNNO_EEPROMUSE //remove ZWCCBasic EEPROM use
and the main undefined ones:
#undef FULL_LOG // if undef, LOGGING_DBG is “simplified”
#undef YGNINCOMINGLOGS //To log also incoming messages
#undef YGNOUTGOINGLOGS //To log also outgoing messages
#undef YGNCHECKSYSCRITIC //add checkSystemCriticalStat stuff to compilation
#undef YGNLOGBUSYQUEUE //log 'stats' on busy queue state
#undef YGNINTERVALSTART //100ms spared after rstat_pkgs_hp_time
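Each flag simply selects code at compile time. A minimal sketch of the pattern, with hypothetical stubs (on_basic_set() and basic_save_set_stub() are not core functions): YGNNO_EEPROMUSE removes the EEPROM write path, and the #undef'd YGNINCOMINGLOGS compiles its logging code out.

```c
#include <assert.h>

/* Two of the flag choices from the list above. */
#define YGNNO_EEPROMUSE
#undef  YGNINCOMINGLOGS

/* Hypothetical counters/stubs standing in for the real core functions. */
int eeprom_writes = 0;
int incoming_logs = 0;

void basic_save_set_stub(int value) { (void)value; eeprom_writes++; }

void on_basic_set(int value) {
#ifndef YGNNO_EEPROMUSE
    basic_save_set_stub(value);   /* original path: persist every switch change */
#else
    (void)value;                  /* YGNNO_EEPROMUSE: skip the EEPROM write */
#endif
#ifdef YGNINCOMINGLOGS
    incoming_logs++;              /* only compiled in when incoming logging is enabled */
#endif
}
```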



In the description that follows, I only explain the modifications whose flag is DEFINED...
In CommandQueue:
  • YGNPROTECTEEPROMACCESS: added a function to be called before EEPROM.put so that zunoCheckSystemQueueStatus() returns 'true' for 1 s. Efficiency: ???
  • YGNRESETBUSYQUEUE: in zunoCheckSystemQueueStatus(), if the queue stays busy for too long (15 s), reset the queue by executing g_zuno_sys->rstat_priority_counts[channel] = 0; g_zuno_sys->rstat_pkgs_hp_time = 0;
    This is probably not fair 😊 but it is really efficient for 3.10.
    With 3.12, I would say the 15 s limit is never reached.
  • YGNSENDONLYONEPKT: added _isToSend() to decide whether ZWQProcess() should send the packet, in order to remove some packets that are sent twice. Here too: really efficient for 3.10 and not harmful for 3.12.
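The two CommandQueue changes (a 1 s 'busy' window before EEPROM writes and a forced reset after 15 s of continuous 'busy') can be modelled as follows. This is a hypothetical sketch: queue_state_t, set_eeprom_busy() and queue_is_busy() are not the core's types or functions, and the real zunoCheckSystemQueueStatus() also depends on rstat_pkgs_hp_time, which this model only clears on reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EEPROM_GUARD_MS      1000u   /* "returns 'true' for 1 s" */
#define BUSY_RESET_AFTER_MS 15000u   /* "busy for too long (15 s)" */

/* Hypothetical model of one channel's queue state (not the core's struct). */
typedef struct {
    uint32_t priority_count;     /* stand-in for rstat_priority_counts[channel] */
    uint32_t hp_time;            /* stand-in for rstat_pkgs_hp_time (only cleared here) */
    uint32_t eeprom_busy_until;  /* end of the EEPROM guard window */
    uint32_t busy_since;         /* start of the current busy period */
    bool     busy_tracking;      /* true while a busy period is being timed */
} queue_state_t;

/* Called before EEPROM.put(): report the queue as busy for the next second. */
static void set_eeprom_busy(queue_state_t *q, uint32_t now_ms) {
    q->eeprom_busy_until = now_ms + EEPROM_GUARD_MS;
}

/* Returns true while new traffic should be held back. */
static bool queue_is_busy(queue_state_t *q, uint32_t now_ms) {
    assert(q != NULL);
    /* Signed difference tolerates wrap-around of the 32-bit ms counter. */
    bool busy = q->priority_count > 0 ||
                (int32_t)(q->eeprom_busy_until - now_ms) > 0;
    if (!busy) {
        q->busy_tracking = false;
        return false;
    }
    if (!q->busy_tracking) {
        q->busy_tracking = true;
        q->busy_since = now_ms;
        return true;
    }
    if (now_ms - q->busy_since >= BUSY_RESET_AFTER_MS) {
        /* Stuck for too long: force the queue back to idle. */
        q->priority_count = 0;
        q->hp_time = 0;
        q->eeprom_busy_until = now_ms;
        q->busy_tracking = false;
        return false;
    }
    return true;
}
```

A stuck priority_count keeps the model busy until 15 s have elapsed, at which point the reset clears it, which mirrors the behavior described for 3.10.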
In LLCore, Sync:
only cosmetic changes related to the LOGGING_DBG flag.

In zuno_time.h:
gives me access to rtcc_micros() so that I can have a 64-bit ms timer (at last!!! :D)
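When only a 32-bit millisecond counter is available, a common way to get a 64-bit one is to count wrap-arounds. This is a different technique from the rtcc_micros()-based approach above (whose signature is not shown in this thread); millis64_from() and ms64_state_t are hypothetical names. It assumes the function is called from a single context at least once per wrap period (about 49.7 days for a 32-bit ms counter).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State needed to extend a free-running 32-bit ms counter to 64 bits. */
typedef struct {
    uint32_t last_low;   /* previous 32-bit reading */
    uint32_t high;       /* number of wraps seen so far */
} ms64_state_t;

static uint64_t millis64_from(ms64_state_t *s, uint32_t now_low) {
    assert(s != NULL);
    if (now_low < s->last_low)
        s->high++;       /* the 32-bit counter wrapped since the last call */
    s->last_low = now_low;
    return ((uint64_t)s->high << 32) | now_low;
}
```

Typical use would be calling millis64_from(&state, millis()) from loop(), assuming millis() returns a 32-bit value on the target.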

In ZWCCBasic:
  • YGNNO_EEPROMUSE: removes zunoBasicSaveInit(...), zunoBasicSaveSet(...) and zunoBasicSaveGet(...) to avoid systematic EEPROM writes when receiving switch info. This is the most effective modification for 3.12!
In ZWCCConfiguration:
  • (without flag...) added const ZunoCFGParameter_t CFGPARAM_DEFAULT and const ZunoCFGParameter_t *zunoCFGParameter(size_t param), copied from 3.10's code.
    This makes it possible to use userConfigParam 64 to 95.
    100% effective.
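The idea behind restoring CFGPARAM_DEFAULT is that the parameter lookup must return a descriptor for user parameters 64 to 95; without one, they are presumably treated as unsupported. A hypothetical sketch: cfg_param_t, CFG_DEFAULT and cfg_param_lookup() only mimic ZunoCFGParameter_t, CFGPARAM_DEFAULT and zunoCFGParameter(), and their real fields and default values may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USER_PARAM_FIRST 64u
#define USER_PARAM_LAST  95u

/* Hypothetical descriptor; the real ZunoCFGParameter_t layout may differ. */
typedef struct {
    int32_t min;
    int32_t max;
    int32_t def;
    uint8_t size;   /* value size in bytes */
} cfg_param_t;

/* One shared descriptor for every user parameter (hypothetical values). */
static const cfg_param_t CFG_DEFAULT = { INT32_MIN, INT32_MAX, 0, 4 };

/* Returns the descriptor for a parameter number, or NULL if unsupported. */
static const cfg_param_t *cfg_param_lookup(size_t param) {
    if (param >= USER_PARAM_FIRST && param <= USER_PARAM_LAST)
        return &CFG_DEFAULT;
    return NULL;
}
```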

In ZWCCTimer:
  • YGNEXTERNVARACCESS: copies g_Usrchannels_data into g_channels_data so that g_Usrchannels_data values can be read from the loop code. It is (was) useful to me...
In ZWSupport:
  • YGNPROTECTEEPROMACCESS: added zunoGetReadyForEEpromWrite(), to be called (by my loop code) before EEPROM.put to check that there is no Z-Uno traffic; if so, it calls setEEpromBusy() to stop ZWave traffic for a while (see above).
    Efficiency: difficult to evaluate; maybe it is only a 'belt and braces' option?
  • YGNCHECKDIFFMESSAGE: added checkIfDifferentCmd(), which is called by zuno_CommandHandler(). If two consecutive messages are identical, it prevents re-execution of zuno_CommandHandler before its main processing is done.
  • YGNREDUCE_hp_time (YGNCHECKDIFFMESSAGE must be defined): receiving consecutive messages keeps g_zuno_sys->rstat_pkgs_hp_time close to the current time, so that zunoCheckSystemQueueStatus always returns true. If two successive messages are the same, this function resets g_zuno_sys->rstat_pkgs_hp_time to its previous value.
    These two modifications are:
    Effective with 3.10 (but not sufficient).
    With 3.12, 'usual HC3' packets never arrive as identical successions, but I had to (re)implement it because of COMMAND_CLASS_CONFIGURATION exchanges, where successions of identical packets occur very often. So, for 3.12, this modification is restricted to if (ZW_CMD_CLASS == COMMAND_CLASS_CONFIGURATION).
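The YGNCHECKDIFFMESSAGE / YGNREDUCE_hp_time combination, as restricted for 3.12, can be modelled like this. Hypothetical sketch: on_receive(), is_repeated_config() and the global hp_time are stand-ins (hp_time for g_zuno_sys->rstat_pkgs_hp_time), and the refresh of hp_time on every reception is an assumption taken from the description above. 0x70 is the Z-Wave Configuration command class identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define COMMAND_CLASS_CONFIGURATION 0x70u  /* Z-Wave Configuration CC */
#define LAST_CMD_MAX 48u                   /* hypothetical max frame size */

static uint8_t  last_cmd[LAST_CMD_MAX];
static uint8_t  last_len = 0;
static uint32_t hp_time = 0;               /* stand-in for rstat_pkgs_hp_time */

/* True if cmd is a Configuration CC frame identical to the previous frame. */
static bool is_repeated_config(const uint8_t *cmd, uint8_t len) {
    return cmd[0] == COMMAND_CLASS_CONFIGURATION &&
           len == last_len && memcmp(cmd, last_cmd, len) == 0;
}

/* Models one reception: returns true if the command should be processed. */
static bool on_receive(const uint8_t *cmd, uint8_t len, uint32_t now_ms) {
    assert(cmd != NULL && len > 0);
    uint32_t saved_hp_time = hp_time;
    hp_time = now_ms;                      /* models the core refreshing hp_time */
    if (is_repeated_config(cmd, len)) {
        hp_time = saved_hp_time;           /* a repeat must not keep the queue busy */
        return false;                      /* and is not executed again */
    }
    if (len <= LAST_CMD_MAX) {
        memcpy(last_cmd, cmd, len);
        last_len = len;
    } else {
        last_len = 0;                      /* too long to remember: never matches */
    }
    return true;
}
```

Other command classes always pass through, matching the 3.12 restriction to COMMAND_CLASS_CONFIGURATION.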
Conclusions:

* 3.12 has far better management of ZWave traffic; 'zuno_CommandHandler()' no longer gets saturated, except during COMMAND_CLASS_CONFIGURATION exchanges. Note: using configuration parameters implies EEPROM operations...

* There are still some reboots with 3.12 that occur even with an empty loop().

* For 3.12, the most useful modifications are:
  • to reduce reboots: those in ZWCCBasic and, more generally, reducing EEPROM.put calls in the loop() code
  • for configuration parameters: those in ZWCCConfiguration and ZWSupport
  • I think that a 'clean' resetBusyQueue in CommandQueue could be a good thing; in weak RF environments, strange things still happen...
  • and an official millis64() would be great!

Regards,
Yves
Attachments
CoreModif_3.12.3.zip
(40.66 KiB)
niky94547

Re: version 3.0.12

Post by niky94547 »

Hi Yves,

Thanks a lot for your support and detailed description. I will look through it and test it on my project as well, then report my observations.

Really appreciated.

Have a nice evening.

Kind Regards,
Niky
niky94547

Re: version 3.0.12

Post by niky94547 »

Tomorrow I will test some of your modifications on 3.0.10.

For now, I am getting stability against the erratic reboots by feeding the watchdog:

https://z-uno.z-wave.me/Reference/WDOG_Feed/

You could check it on your side too. I will continue to test and report.

Br,
Niky
niky94547

Re: version 3.0.12

Post by niky94547 »

Hi Yves,

I had some time to test. You're absolutely right.

My observations:

On 3.0.10
- EEPROM writes had a great impact on stability.
- when they are disabled at runtime, reboots are minimized.
- leaving the device in an "idle" state, without reports and setters, it was stable with no reboots at all over 12 h.

On 3.0.12b21 (the latest), without any mods:
- I think they reworked the EEPROM strategy here; I read an explanation by Polygon in another post.
- so the device is stable with EEPROM writes enabled here.
- but unlike 3.0.10, when left in the idle state we get a reboot within 6-8 h.

I think Z-Wave.Me should step in and debug further, following your detailed explanations.

Thanks again for provided mods, descriptions and support.

Br,
Nik
yves

Re: version 3.0.12

Post by yves »

Hi Nick,

for 3.12:
  • One reboot every 6-8 hours looks high, but in the end it is not so far from what I see (around 3/day)...
  • I also agree that EEPROM.put() works and does not cause a reboot every time.
    BUT, if you keep ZWCCBasic writing to EEPROM whenever a switch value is received, then you greatly increase the number of reboots (roughly proportional to the number of switch changes).
  • As already mentioned: I have noticed that after an EEPROM.put() there is a 500 ms 'CPU hang time'.
    Most often it happens during the put() execution, but it may happen up to (or even over) 10 s later!
    Maybe EEPROM.put() triggers a system task that forces a reboot when something else happens at the same time?
Have a nice day!
Yves
niky94547

Re: version 3.0.12

Post by niky94547 »

Hi Yves,

Yes, on 3.0.12b21, commenting out the EEPROM R/W within zunoBasicSave... in ZWCCBasic.c gives a stable idle state (no Z-Wave set/get/report): over 36 h so far without a reboot.

But that means the issue is still present in the latest 3.12b21, despite the changes the Z-Wave.Me team made to the EEPROM strategy.

After the idle test, I will test stability under heavy network traffic to see whether there is any cause of reboots other than EEPROM calls. I will trim my code, since it uses some interrupts and timers, and then cycle setters and reports... I will first test in close proximity to the controller and then at a long distance, since, as you said, there could be unexpected behavior in that scenario.

Have a great day,
Nik