version 3.0.12

Discussion about Z-Uno product. Visit http://z-uno.z-wave.me for more details.

Post by yves »

Hello,
I am (still) fighting Z-Wave protocol issues with 3.0.10.
You advised me to try 3.0.12 a few months ago.

I was reluctant until now. This morning I was ready to test, but I went to GitHub and saw that 'things' are moving...

Is it still worth trying 3.0.12, or should I wait for the imminent release of the new version BETA_00_04_02_B14?

Thanks!

Post by yves »

Hi again,
Here are some details of my current analysis of my Z-Wave issues with FW 3.0.10 (and Home Center 3), in case it helps.

I have 4 different applications, but I will focus here on the two that run physically close together. The behavior of the other two does not contradict my conclusions.

Both are in my garage, 'far' from the HC3 but surrounded by devices that do not show any problem.
I recently installed external antennas on both, honestly with poor gain.

Looking closer, one application/device is in fact clearly worse than the other.

Both applications essentially send data to the HC3. The one that works 90% of the time only receives two binary values per day. The one that does not work also mostly sends data to the HC3, but it receives SWITCH_MULTILEVEL and SWITCH_BINARY values up to once every minute…
Swapping the antennas does not change the performance of the boards (so it is not an RF problem?)

I have included a test board/application in the HC3:
• If I only send info from HC3 to Z-Uno => OK
• If I only send info from Z-Uno to HC3 => OK
• If I mix send/receive, I quite rapidly get problems…

Problems appear sooner with:
• a higher number of exchanges per hour
• poor Z-Wave reception quality
• (And by the way, I think that if one Z-Uno module is having trouble, it also impacts other Z-Uno modules)

Looking at the LOG:
HIGH PRIORITY PKG DOMINATION appears quite frequently (but not in the worst cases?)
QUEUE CHANNEL is BUSY is the most often observed, but the queue(s) are never full.


Side note: behavior during the first minutes largely depends on how the Z-Uno board was rebooted:
reset switch, application upload and power off/on do not give the same result. The last method is the best (even if the power is off for only 1 s).

Happy new year...

Post by lanbrown »

No idea what Razberry board you are using and if the antenna is good or not. You're expecting people to make a lot of assumptions about your setup.

While many wired devices act as repeaters, depending on the network this can cause stability or congestion issues. Z-Wave is half duplex. It cannot receive and send at the same time. So a repeater has to receive the data and possibly buffer it until the network is clear to send. As devices get more advanced and more chatty, this can lead to congestion issues.

You could use a second system and have one be the master node and the other run a second Z-Way network. You still see all of the devices on the master node; the second system only knows of the devices on it. Even if you couldn't connect them via Ethernet, you could use wireless. At least WiFi has far greater bandwidth, which reduces latency and congestion.

I have multiple Z-Way systems all feeding into a master node that has no physical devices added to it. It does have a Razberry board but only for Z-Way licensing; it has a Gen 5 Razberry board. The others all have Razberry 7 Pro boards. The master node can have devices, I just decided that for my setup I did not want that.

Post by yves »

Hi lanbrown,

My master is a Fibaro Home Center 3; I do not use a Razberry board.
This topic in fact follows previous exchanges: https://forum.z-wave.me/viewtopic.php?p=98750#p98750

Congestion is for sure the subject. The question is how to avoid it while keeping a simple HW environment.

I have around 50 nodes in my Z-Wave network; 2 of them have many "child" sensors (more than 20).
Only my 4 Z-Uno boards behave strangely (with, on average, 12 I/O channels each).

In the end, I have noticed that they are the only ones that are not just a list of sensors reporting to the master but also take input(s) from it.
Statistically, issues seem to appear when 'sends' cross 'receives' (but the debug probes show it is not 100% true).

Yes, Z-Wave is half duplex. In my case there are 46 (bought) devices that work 100%, and 4 (DIY) that are below 90%.

Many bugs sit somewhere between the keyboard and the chair, that is what worries me :D !!

Happy new year!

Post by lanbrown »

Creating multiple threads and expecting others to remember your setup on a forum is a high expectation.

The 3.1.x train is old:

31.08.2020 v3.1.0

A lot has changed since then and you should be asking Fibaro about software updates.

Good luck with YOUR problem.

Post by yves »

Hi lanbrown,
I agree with you that I should not have opened a new thread. Maybe someone will move this one back where it belongs...

I am not using 3.1 but 3.0.10 on Z-Uno2 boards, which is the latest stable version.
And on the Fibaro side I also have the latest version...

YES this is MY problem.
I thought that sharing some solutions I have found was one of the aims of this forum?

Sorry if it is not the case...

Post by yves »

Hi to everyone who may be interested in Z-Uno(2) life with a Fibaro controller…

This is a long (too long) post that describes what could be called a bug in the 3.0.10 firmware. I don't know if this is the right place, but I don't want to open a new thread...

I have 4 Z-Uno controllers included in a Fibaro Home Center 3 network. At least 2 of them show erratic bugs (Z-Wave network disconnection and/or “unexplained” reset). My Z-Wave network includes around 50 more nodes (not Z-Unos) that do not show the same issues (at all).
Looking closer, my ‘faulty’ devices have one distinctive point in common: they both have input AND output interfaces.
Note that I call it a "Z-Uno bug" here, but I do not know whether it is an undocumented HC3 requirement or a requirement the Z-Uno does not fulfil (by the way, the Z-Wave protocol is not on GitHub, as far as I know).

So I have tried to do some ‘reverse engineering’ of the ZWSupport code to solve communication problems between (my) Home Center 3 and the Z-Uno boards. At first glance, problems seem to arise when two SET commands (from the controller to the Z-Uno device) are received within too short a time frame. Noisy environments (i.e. ‘electrically’ far from the controller) certainly make the problems worse (see the test conditions further down).

I was fighting ‘softly’ for 6 months but it has become urgent, as the faultiest of my Z-Uno devices is the one in charge of my heat pump, and it is now wintertime :D For sure, the heat pump requires more exchanges (inputs AND outputs) in winter…

So I have tried to understand how things were working; here is my

Reverse Engineering:

zuno_CommandHandler (llcore.c) is called after a controller message is received by the Z-Uno device.
  • it prepares the answer (if any) by calling a specific handler (specific = dependent on the child device's type)
    for example: for a multilevel switch, it calls zuno_CCSwitchMultilevelHandler, which does what is required depending on the command (GET or SET or ...)
    if it is a SET, it calls ZWCC_BASIC_SETTER_1P (which mainly sets the channel as ‘modified’)
  • and it calls zunoSendReport to report back the value received (a sort of Ack)

zunoSendReport (ZWSupport.c)
  • This function DOES NOT send the report! It just flags the channel as ‘to be reported’ in g_channels_data.report_map.
These two functions are called under an interrupt scheme (asynchronously) when a message is received. The next two are called on a regular basis using a 10ms timer.
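
A minimal sketch of this "flag now, report later" idea, in plain C. It is not taken from the Z-Uno sources: `report_map_t`, `flag_report` and `take_next_report` are hypothetical names, and the 32-channel width is an assumption made only for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for g_channels_data.report_map: one bit per channel. */
typedef struct {
    uint32_t report_map;
} report_map_t;

/* Receive path: only mark the channel as 'to be reported', send nothing. */
void flag_report(report_map_t *m, unsigned channel)
{
    assert(m != NULL);
    if (channel < 32u) {
        m->report_map |= (uint32_t)1u << channel;
    }
}

/* Periodic path: return the lowest flagged channel and clear its bit,
 * or -1 when nothing is pending. */
int take_next_report(report_map_t *m)
{
    assert(m != NULL);
    for (unsigned ch = 0; ch < 32u; ch++) {
        uint32_t bit = (uint32_t)1u << ch;
        if (m->report_map & bit) {
            m->report_map &= ~bit;
            return (int)ch;
        }
    }
    return -1;
}
```

Flagging the same channel twice before the periodic task runs still yields a single report, which is one reason a bitmap like this cannot tell how many SET commands were actually received.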

zunoSendReportHandler is called by zuno_CCTimer (every ticks & ZUNO_REPORTTIME_DIVIDER, around 80ms with the default value)
  • if g_channels_data.report_map != 0, it tries to send a report for every report flagged in this variable (bitwise):
    "Send": no, in fact it just pushes the message, to be sent (later) by ZWQProcess, into a FIFO that is a linked list.
    "It tries": except if zunoCheckSystemQueueStatus() says that « QUEUE IS BUSY » (and that is where things get worse…)
ZWQProcess (in CommandQueue.c) is called by zuno_CCTimer (every 10ms)
  • It physically sends the messages held in the above FIFO to the controller.
  • It uses _ZWQSend, which calls zunoSysCall(ZUNO_SYSFUNC_SENDPACKET, …
zunoCheckSystemQueueStatus()
  • Checks that the queue may be used (in everyday life it is QUEUE_CHANNEL_LLREPORT that is checked)
Queue is declared busy:
  • if currentTime - g_zuno_sys->rstat_pkgs_hp_time < SYSTEM_PKG_DOMINATION_TIME (=2000ms)
    Notice that g_zuno_sys->rstat_pkgs_hp_time is set to the current millis() when a message is received,
    so that after a received SET message no Ack can be sent for 2s !!!!???
    (the important variables named g_zuno_sys->rstat_* cannot be tracked in the source code)
  • or if g_zuno_sys->rstat_priority_counts[channel] > 0.
    This is a more or less mysterious variable that is not equal to the number of messages still to be sent to the controller.
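
The two "busy" conditions above can be written down as a small plain-C sketch. This is only my reading of the description, not the firmware code: `queue_stats_t`, `queue_is_free` and `CHANNEL_COUNT` are hypothetical, and only the 2000ms value comes from the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DOMINATION_TIME_MS 2000u /* value quoted for SYSTEM_PKG_DOMINATION_TIME */
#define CHANNEL_COUNT         4u /* assumption, for illustration only */

/* Hypothetical stand-in for the g_zuno_sys->rstat_* fields. */
typedef struct {
    uint32_t last_hp_rx_ms;                  /* time the last message was received */
    uint8_t  priority_counts[CHANNEL_COUNT]; /* per-channel pending counter */
} queue_stats_t;

/* Returns true when the queue may be used, false for 'QUEUE IS BUSY'. */
bool queue_is_free(const queue_stats_t *s, unsigned channel, uint32_t now_ms)
{
    assert(s != NULL && channel < CHANNEL_COUNT);
    /* Unsigned subtraction stays correct across a millis() wrap-around. */
    if ((uint32_t)(now_ms - s->last_hp_rx_ms) < DOMINATION_TIME_MS) {
        return false; /* still inside the domination window */
    }
    return s->priority_counts[channel] == 0;
}
```

With a message received at t = 1000ms, this check reports busy until t = 3000ms, whatever the counter says, which matches the "no Ack before 2s" behavior described above.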
Bugs/Improvements
At least if the controller is an HC3, there are 2 'required' improvements:
  • 1 In ZWQProcess, the block starting with

    if(p->flags & ZUNO_PACKETFLAGS_GROUP){

    should be terminated by:

    processed_indexes[processed_indexes_cnt++] = qi;
    if(processed_indexes_cnt >= MAX_PROCESSED_QUEUE_PKGS) break;

    Without that, every ‘Ack’ message is sent twice, and “calls” to the user application are also multiplied by 2 (which may explain the unexpected resets?).
    I have tested and validated this modification; it is in fact very efficient. I think there are some cases where these missing lines are not a bug, but my knowledge of Z-Wave is too weak…
  • 2 After receiving a ‘SET’ command, zunoCheckSystemQueueStatus() returns false for 2s (and it may get even worse if more SET commands are received during those 2s). The HC3 controller seems to insist on an Ack answer within a shorter delay than 2s. As proof, whatever happens, after a SET the HC3 asks for a ‘GET’ 2s later…

    I have set up a ‘firewall’ that, after a ‘SET’ command is received, prevents zunoCheckSystemQueueStatus() from answering ‘QUEUE IS BUSY’ during the first 200ms.
    This has also been tested and validated…
PS: at first, I tried to decrease SYSTEM_PKG_DOMINATION_TIME and ZUNO_REPORTTIME_DIVIDER. It works, but not reliably (in fact it just makes the above ‘bugs’ less frequent).
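
For readers who want to picture the ‘firewall’ of improvement 2, here is a rough plain-C sketch. It is a guess at the shape of such a check, not the modified firmware: `link_state_t`, its fields and `queue_is_free_with_grace` are hypothetical names; only the 2000ms and 200ms values come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DOMINATION_TIME_MS 2000u /* SYSTEM_PKG_DOMINATION_TIME as quoted above */
#define SET_GRACE_MS        200u /* grace window described above */

/* Hypothetical state; the field names are illustrative only. */
typedef struct {
    uint32_t last_rx_ms;  /* time any message was last received */
    uint32_t last_set_ms; /* time the last SET command was received */
    bool     set_seen;    /* false until a first SET has arrived */
    uint8_t  pending;     /* stand-in for rstat_priority_counts[channel] */
} link_state_t;

/* Returns true when the queue may be used, false for 'QUEUE IS BUSY'. */
bool queue_is_free_with_grace(const link_state_t *s, uint32_t now_ms)
{
    assert(s != NULL);
    /* Right after a SET, never answer 'busy', so the report can leave quickly. */
    if (s->set_seen && (uint32_t)(now_ms - s->last_set_ms) < SET_GRACE_MS) {
        return true;
    }
    /* Outside the grace window, fall back to the original two rules. */
    if ((uint32_t)(now_ms - s->last_rx_ms) < DOMINATION_TIME_MS) {
        return false;
    }
    return s->pending == 0;
}
```

The grace window is checked first on purpose: the SET itself refreshes the receive time, so without this ordering the domination rule would always win.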

Test conditions:
  • Controller: Fibaro Home Center 3, still managing other Z-Wave devices and its own code. FW version 5.150.18 (latest available)
  • Device: Z-Uno HW version 2, FW 3.0.10, included in the network. The application that runs is very simple and does not use much CPU (1%?) except when writing EEPROM (every 300s). Tasks are sliced and the application tries to get back CPU every 100ms. The application uses a Getter/Setter architecture.
  • The Z-Uno sends to the HC3 one SENSOR value every 30s + 3 others every 500s
  • The Z-Uno receives from the HC3 2 binary switches every 600s and 2 multilevel switches every 120s (i.e. 1 per 60s)
  • On manual request, it is also possible to send 2 SET commands to the Z-Uno board with a 500ms delay between the 2 sends. That is the most difficult test to pass!!
Tests are mainly made in a room where communications with the controller are not ‘top level’ :D .

Bug Description:
When things do not work (i.e. before the above modifications),
LOGGING_DBG shows a long list of consecutive:
  • *** QUEUE CHANNEL is BUSY:
    sometimes because of (interval - GstSaPresse.u32Last_ZWSetTime) < SYSTEM_PKG_DOMINATION_TIME
    but also/mainly because of g_zuno_sys->rstat_priority_counts[channel] > 0
    This state may last a long time (>30s). It is reset by the HC3, which declares the device as ‘disconnected’ and then reconnects it later…
  • The other bug that is clearly seen is that, upon a SET command, there is an almost endless loop of
    >>> INCOMING packet desc
    >>> UNPACKED: packet desc

    with the same "packet desc" every 30ms. It is so fast that nothing else happens (remember that it is an interrupt handler).
    I don’t know where these packet copies come from; the frequency is so high that I doubt they come from the HC3…
Bug frequency is directly related to the quality of communications with the HC3, which is not surprising. But I think it is also related to how busy the HC3/Z-Wave network is. I believe that when one Z-Uno board is occupying the HC3 because of a bug, it also has consequences for the other Z-Unos.

Things that I would still like to understand:
  • What is behind SYSTEM_PKG_DOMINATION_TIME and ZUNO_REPORTTIME_DIVIDER?
    What were they intended for, and what should I expect if I change their values?
  • Who changes the g_zuno_sys->rstat_xxxxxxxxx values, how and where?
    For example: why may g_zuno_sys->rstat_priority_counts[channel] stay > 0 for a long time?
Thanks for your work anyway!!! And Happy New Year

Re: version 3.0.12 // BUG in 3.0.10, solved(?)

Post by yves »

Hi everyone (even if this post is mainly for developers)

:D BINGO! :D
I think I have found THE problem.

And, to me, it is not 100% related to Fibaro Home Center 3 behavior: it should be reproducible with any kind of Z-Wave controller.
In fact, the previous post and its proposed solutions aimed to shorten the response time of Z-Wave exchanges.
That was only ‘half working’ because it only reduced the number of occurrences of the problem I have now found:

The main issue is that zuno_CommandHandler (in llcore.c) is not re-entrant (*).
So, if two ‘SET’ commands occur within too short a time frame, ‘unpredictable results occur’ (up to a reset/reboot).
This seems to happen quite frequently when the Z-Uno board is ‘almost out of reach’ of the controller, but it may also happen when the board is the target of more than one application in the controller or when it is associated with ‘many’ other devices.

zuno_CommandHandler is quite long code (to read & to execute), so I ‘replaced’ it with a very simple one that just copies the data received from the controller into a FIFO. This FIFO is then emptied by zuno_CCTimer, which now calls the 99% original zuno_CommandHandler (the remaining 1% is that it takes its inputs from the FIFO). zuno_CCTimer is not an interrupt handler (or at least one without overly tight constraints?)
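
The "copy into a FIFO, process later" pattern described here could look roughly like the plain-C sketch below. It is not the actual patch: `packet_fifo_t`, `fifo_push`, `fifo_drain`, `example_handler` and both size constants are hypothetical, and a simple ring buffer with one producer and one consumer is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIFO_SLOTS   8u /* assumption: holds up to FIFO_SLOTS - 1 packets */
#define MAX_PKT_LEN 64u /* assumption: illustrative payload limit */

typedef struct {
    uint8_t len;
    uint8_t data[MAX_PKT_LEN];
} packet_t;

typedef struct {
    packet_t slots[FIFO_SLOTS];
    volatile unsigned head; /* written only by the receive side */
    volatile unsigned tail; /* written only by the timer side */
} packet_fifo_t;

/* Receive side: copy the packet and return at once.
 * Returns false (packet dropped) when the FIFO is full or the packet is too long. */
bool fifo_push(packet_fifo_t *f, const uint8_t *data, size_t len)
{
    assert(f != NULL && data != NULL);
    unsigned next = (f->head + 1u) % FIFO_SLOTS;
    if (len > MAX_PKT_LEN || next == f->tail) {
        return false;
    }
    f->slots[f->head].len = (uint8_t)len;
    memcpy(f->slots[f->head].data, data, len);
    f->head = next; /* publish only after the copy is complete */
    return true;
}

/* Timer side: run the (slow) handler on every queued packet, oldest first.
 * Returns the number of packets handled. */
unsigned fifo_drain(packet_fifo_t *f, void (*handler)(const packet_t *))
{
    assert(f != NULL && handler != NULL);
    unsigned handled = 0;
    while (f->tail != f->head) {
        handler(&f->slots[f->tail]);
        f->tail = (f->tail + 1u) % FIFO_SLOTS;
        handled++;
    }
    return handled;
}

/* Hypothetical stand-in for the original command handler: counts payload bytes. */
unsigned g_bytes_seen = 0;
void example_handler(const packet_t *p)
{
    g_bytes_seen += p->len;
}
```

Because the slow handler now only ever runs from the drain loop, it can no longer be entered a second time while a first call is in progress. On real hardware the index updates may additionally need memory barriers or a short interrupt mask; that part is left out of the sketch.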

Following the test bench described in the previous post, it has now been working for more than 24h in my “electrically/RF dark/difficult” room without any problem (before, there were on average around 50 disconnections per day).

Note: I have also kept the modifications explained in the previous post, but I am not sure that they are still required.
Regards,

(*) Re-entrancy issues have been caught using a static flag set to true upon entering the handler and to false upon exiting it; if it is not false upon entry => problem...
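
A minimal plain-C sketch of such a detection flag, for illustration only. `guarded_handler`, `g_in_handler`, `g_reentry_count` and the two demo bodies are hypothetical names; the sketch only counts re-entries and does not try to prevent them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

volatile bool g_in_handler = false;    /* true while a handler call is running */
volatile unsigned g_reentry_count = 0; /* how many re-entries were detected */

/* Wraps the handler body; 'body' is a hypothetical stand-in for the real work. */
void guarded_handler(void (*body)(void))
{
    assert(body != NULL);
    bool was_in_handler = g_in_handler;
    if (was_in_handler) {
        g_reentry_count++; /* entered again before a previous call finished */
    }
    g_in_handler = true;
    body();
    g_in_handler = was_in_handler; /* restore, so an outer call still counts as active */
}

/* Demo bodies: one does nothing, one simulates a second call arriving mid-handler. */
void noop_body(void) {}
void reentering_body(void) { guarded_handler(noop_body); }
```

Calling `guarded_handler(reentering_body)` leaves `g_reentry_count` at 1. A plain flag like this is not atomic, so in a real interrupt context it is a debugging aid rather than a lock.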

Post by yves »

Hi,

About the 'improvement' (1) proposed two posts earlier:
Each time a child's data are sent to the controller, they are repeated for every association group the 'child' belongs to.
I am not using association groups, which is why my proposal is effective at reducing Z-Wave traffic and works without any noticeable drawback.
It might not be the case for everyone/every configuration...

Have a nice day

Post by p0lyg0n1 »

Hi,
It sounds strange, especially in the context of the latest beta (release date December 29, 2023).
I'll try again, but as far as I know, the CommandHandler is always called by the SDK from only one thread, but at the same time it can be called recursively inside this thread (although we removed this point). To verify this, you can print the handle of the current thread (#include <Threading.h>, then call zunoGetCurrentThreadHandle()) in addition to that flag. Please also post the build number where you have the problem (shown when uploading the sketch if detailed logging is selected in Arduino, or obtained manually with zme_make boardInfo -d <yourcomportortty>).

Best regards,
Alex.