Hi to everyone that may be interested in Z-uno(2) life with Fibaro controller…
This is a long (too long) post that describe what could be called a bug in 3.0.10 firmware. I dont kwow if this is the right place but I don't want to open a new one...
I have 4 Z-uno controllers included in a Fibaro Home Center 3 network. At least 2 of them show erratic bugs (zwave network disconnection and/or “unexplained” reset). My zwave network includes around 50 more nodes (not Z-unos) that does not show (at all) same issues.
Looking closer, my ‘faulty‘ devices have a common distinctive point, they have
both inputs
AND outputs interfaces.
Note that I call it here a "Z-uno bug" but I do not know if it is an untold HC3 requirements or Z-uno unfilled reqs (by the way, the zwave protocol is not on github , as far as I known).
So I have tried to do some ‘reverse engineering’ of ZWSupport code to solve communication problems between (my) Home Center 3 and Z-uno boards. At first glance, problems seem to arise when two SET commands (from controller to Z-uno device) are received in a too short frame time. Noisy environments (ie ‘electrically’ far from the controller) increases problems for sure (see tests conditions farther).
I was fighting ‘softly’ for 6 months but it becomes urgent as the faultiest of my Z-uno device is the one in charge of my heat pump, and it is now wintertime
For sure heat pump requires more exchanges (inputs
AND outputs) in winter…
So I have tried to understand how things where working, here is my
Reverse Engineering:
zuno_CommandHandler (llcore.c) is called after controller message is received by Z-uno device.
- it prepares the answer (if any) calling specific handler (specific = child device’ type dependent)
for ex : for multilevel switch, it calls zuno_CCSwitchMultilevelHandler that do what is required, depending on the command (GET or SET or ...)
if it is a SET, calls ZWCC_BASIC_SETTER_1P (that mainly set the channel as ‘modified’)
- and call zunoSendReport to report back the value received (sort of Ack)
zunoSendReport (ZWSupport.c)
- This function DOES not send report! It just flags the channel as ‘to be reported’. In g_channels_data.report_map
These two functions are called under interrupt scheme (asynchronously) when a message is received. The two next one ar called on a regular basis using 10ms timer.
zunoSendReportHandler is called by zuno_CCTimer (every ticks & ZUNO_REPORTTIME_DIVIDER around 80ms as default value)
- if g_channels_data.report_map != 0 ; it tries to send report for every reports required in this var (bitwise) :
"Send": No, in fact it just pushes message to be send (later) by ZWQProcess in a fifo that is a linked list.
"it tries" Except if zunoCheckSystemQueueStatus() say that « QUEUE IS BUSY » ( and that is where things get worse…)
ZWQProcess (in CommandQueue.c) is called par zuno_CCTimer (every 10ms)
- Physically send messages included in above fifo to the controller.
- It uses _ZWQSend that calls zunoSysCall(ZUNO_SYSFUNC_SENDPACKET, …
zunoCheckSystemQueueStatus()
- Check that queue may be used (in every day life it is QUEUE_CHANNEL_LLREPORT that is checked)
Queue is declared busy :
- if currentTime - g_zuno_sys->rstat_pkgs_hp_time < SYSTEM_PKG_DOMINATION_TIME (=2000ms)
Notice that g_zuno_sys->rstat_pkgs_hp_time is set to current millis() when a message is received;
so that after a received SET message no Ack can be send before 2s !!!!???
(important vars named g_zuno_sys->rstat_* cannot be tracked in source code)
- or if g_zuno_sys->rstat_priority_counts[channel] > 0,
This is a more or les misterious var that is not equal to the number of message still to be sent to the controller.
Bugs/Improvements
At least if the controler is an HC3, there are 2 'required' improvement:
- 1 In ZWQProcess the block starting with
Code: Select all
if(p->flags & ZUNO_PACKETFLAGS_GROUP){
Should be terminated by:
Code: Select all
processed_indexes[processed_indexes_cnt++] = qi;
if(processed_indexes_cnt >= MAX_PROCESSED_QUEUE_PKGS) break;
Without that, every ‘Ack’ message is sent twice. And “calls” to the user application are also multiplied by 2 (that may explain unexpected resets?).
I have tested and validated this modification very efficient in fact. I think that there are some cases where these missing lines are not a bug. But my knowledge about ZWave is too weak…
- 2 After receiving a ‘SET’ command, zunoCheckSystemQueueStatus() return false during 2s (and it may get even worse if more set commands are received during those 2s) HC3 controler seems to urge for an Ack answer in a shorter delay that 2s. As a proof, whatever happen, after a SET, HC3 asks for a ‘GET’ 2s later…
If have set up a ‘firewall’ that after receiving a ‘SET’ command prevents zunoCheckSystemQueueStatus() from answering ‘QUEUE IS BUSY’ during the first 200ms.
This have also been tested and validated…
PS: at first, I have tried to decrease SYSTEM_PKG_DOMINATION_TIME and ZUNO_REPORTTIME_DIVIDER, it works, but not reliably (in fact it just makes above ‘bugs’ less pregnant/frequent)
Test conditions:
- Controler: Fibaro Home Center 3 still managing other ZWave devices and its own code. FW version 5.150.18 (last available)
- Device: Z-uno HW version 2 FW 3.0.10 included in network. The application that run is very simple and do not use much CPU (1%?) excepted when writing EEPROM (every 300s). Tasks are sliced and the application try to get back CPU every 100ms. Application use Getter/Setter architecture.
- Zuno sends to HC3 a SENSOR value every 30s + 3 other every 500s
- Zuno receives from HC3 2 binary switches every 600s and 2 multilevel switches every 120 s (ie 1 per 60s)
- On manual request, it is also possible to send 2 SET commands toward the Z-uno board with 500ms delay between the 2 sends. That is the most difficult test to pass!!
Tests are mainly made in a room where communications with the controller are not ‘top level’
.
Bug Description:
When things do not work (ie before above modifications)
LOGGING_DBG shows long list of consecutive:
- *** QUEUE CHANNEL is BUSY:
sometime because of (interval - GstSaPresse.u32Last_ZWSetTime) < SYSTEM_PKG_DOMINATION_TIME
But also/mainly because of g_zuno_sys->rstat_priority_counts[channel] > 0
This state may last for long (>30s). It is reset by HC3 that declares the device as ‘disconnected’ and the reconnect later…
- The other bug that is clearly seen is that upon a SET command there is an almost endless loop of
>>> INCOMING packet desc
>>> UNPACKED: packet desc
With the same "packet desc" every 30ms It is so fast that nothing else happen (remind that it is an interupt handler).
I don’t know from where these packet copies come, frequency is so high that I doubt that It came from HC3…
Bug frequency is directly related to the quality of communications with HC3, that is not surprising. But I think that it is also related to the occupation of HC3/zwave network. I believe that when one Zuno board is occupying the HC3 because of bug it has also consequences on other Z-uno.
Things that I still would like to understand:
- What is behind SYSTEM_PKG_DOMINATION_TIME and ZUNO_REPORTTIME_DIVIDER?
what it was intended for, if I change values what I might expect.
- Who, How, Where g_zuno_sys->rstat_xxxxxxxxx values are changed?
For ex: Why g_zuno_sys->rstat_priority_counts[channel] may stay > 0 for long?
Thank for your work anyway!!! and Happy New Year