Reduction of stack usage

Discussion about Z-Uno product. Visit http://z-uno.z-wave.me for more details.
A.Harrenberg
Posts: 201
Joined: 05 Sep 2016 22:27

Reduction of stack usage

Post by A.Harrenberg »

Hi p0lyg0n1,

I have started changing the code of the first library to reduce its stack usage, and I have a few questions about that...

As a first step, I took a function with several arguments

Code:

unsigned char RFID::MFRC522ToCard(unsigned char command, unsigned char *sendData, unsigned char sendLen, unsigned char *backData, unsigned int *backLen)
Then I created a global struct (outside of the class) to hold all the parameters, and now pass only a pointer to that struct to the function; inside the function, I changed all references to the original parameters to refer to the members of the global struct instead. I kept the original function and changed one call of the function to use the global struct.
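A minimal sketch of this refactoring in plain C, outside the RFID class. The struct name `ToCardParams`, the global `g_toCard`, and both function bodies are hypothetical placeholders; only the parameter list mirrors `MFRC522ToCard`. The idea is that the struct version receives a single pointer argument, while the parameter values themselves live in global RAM instead of the stack frame.

```c
/* Hypothetical parameter block mirroring the arguments of MFRC522ToCard(). */
struct ToCardParams {
    unsigned char command;
    unsigned char *sendData;
    unsigned char sendLen;
    unsigned char *backData;
    unsigned int *backLen;
};

/* One global instance: it occupies global RAM instead of stack space. */
struct ToCardParams g_toCard;

/* Original style: every argument is passed (on the stack for reentrant code). */
unsigned char toCardArgs(unsigned char command, unsigned char *sendData,
                         unsigned char sendLen, unsigned char *backData,
                         unsigned int *backLen)
{
    unsigned char i;
    /* Placeholder body: copy the send buffer into the back buffer. */
    for (i = 0; i < sendLen; i++)
        backData[i] = sendData[i];
    *backLen = sendLen * 8u;   /* length in bits */
    return command;            /* placeholder status */
}

/* Struct style: only one pointer is passed; members replace the parameters. */
unsigned char toCardStruct(struct ToCardParams *p)
{
    unsigned char i;
    for (i = 0; i < p->sendLen; i++)
        p->backData[i] = p->sendData[i];
    *p->backLen = p->sendLen * 8u;
    return p->command;
}
```

At the call site, the members of `g_toCard` are filled in first and then `toCardStruct(&g_toCard)` is called.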

For the original function I found this in the *.rst file:

Code:

1489 ;------------------------------------------------------------
1490 ;Allocation info for local variables in function '__cxx__RFID__method__MFRC522ToCard0505p0505p05p09'
1491 ;------------------------------------------------------------
1492 ;command                   Allocated to stack - _bp -3
1493 ;sendData                  Allocated to stack - _bp -6
1494 ;sendLen                   Allocated to stack - _bp -7
1495 ;backData                  Allocated to stack - _bp -10
1496 ;backLen                   Allocated to stack - _bp -13
1497 ;v__this                   Allocated to registers r5 r6 r7 
1498 ;this                      Allocated to stack - _bp +3
1499 ;status                    Allocated to stack - _bp +11
1500 ;irqEn                     Allocated to stack - _bp +6
1501 ;waitIRq                   Allocated to stack - _bp +7
1502 ;lastBits                  Allocated to registers r6 
1503 ;n                         Allocated to stack - _bp +10
1504 ;i                         Allocated to stack - _bp +8
1505 ;sloc0                     Allocated to stack - _bp +1
1506 ;------------------------------------------------------------
1507 ;	C:\Users\andre\AppData\Local\Temp\build6796448123814796796.tmp\ZUNO_RFID_ucxx.c:642: unsigned char __cxx__RFID__method__MFRC522ToCard0505p0505p05p09(void * v__this, unsigned char command, unsigned char * sendData, unsigned char sendLen, unsigned char * backData, unsigned int * backLen)
From this I understand that there are +11 - (-13) + 2 = 26 bytes of stack used. Is that correct?
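The arithmetic above can be captured in a tiny helper. It only encodes this rule of thumb (largest positive `_bp` offset minus most negative offset, plus 2 bytes assumed to be the return address); it is not an exact frame-size calculation, since it ignores the width of the variables at either end, and the extra 2 bytes are questioned later in the thread. The function name is hypothetical.

```c
/* Rule-of-thumb stack estimate from an SDCC .rst allocation table.
 * maxPos: largest positive _bp offset (locals, e.g. +11)
 * minNeg: most negative _bp offset (parameters, e.g. -13)
 * The extra 2 bytes are assumed to be the return address. */
int estimateFrameBytes(int maxPos, int minNeg)
{
    const int returnAddressBytes = 2;
    return maxPos - minNeg + returnAddressBytes;
}
```

With the offsets from the two listings, `estimateFrameBytes(11, -13)` returns 26 and `estimateFrameBytes(14, -5)` returns 21.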

For the modified function I found this:

Code:

2247 ;------------------------------------------------------------
2248 ;Allocation info for local variables in function '__cxx__RFID__method__MFRC522ToCard_ah01prtoCard'
2249 ;------------------------------------------------------------
2250 ;toCard_p                  Allocated to stack - _bp -5
2251 ;v__this                   Allocated to registers r5 r6 r7 
2252 ;this                      Allocated to stack - _bp +14
2253 ;status                    Allocated to stack - _bp +9
2254 ;irqEn                     Allocated to stack - _bp +8
2255 ;waitIRq                   Allocated to stack - _bp +11
2256 ;lastBits                  Allocated to stack - _bp +1
2257 ;n                         Allocated to stack - _bp +10
2258 ;i                         Allocated to stack - _bp +12
2259 ;sloc0                     Allocated to stack - _bp +4
2260 ;sloc1                     Allocated to stack - _bp +5
2261 ;sloc2                     Allocated to stack - _bp +1
2262 ;------------------------------------------------------------
2263 ;	C:\Users\andre\AppData\Local\Temp\build6796448123814796796.tmp\ZUNO_RFID_ucxx.c:858: unsigned char __cxx__RFID__method__MFRC522ToCard_ah01prtoCard(void * v__this, struct toCard * toCard_p)
2264 ;	-----------------------------------------
Here +14 - (-5) + 2 = 21 bytes of stack are used, so the stack usage is reduced by 5 bytes. Is that correct?

I expected that "removing" the parameters would lead to a larger reduction...

I noticed that the modified function puts additional values on the stack...
"lastBits" was previously held in a register and is now put on the stack. lastBits is used to calculate a value that is then assigned to one of the parameters, i.e. to the corresponding member of the global struct.

Next, there are now two more local values, "sloc1" and "sloc2", which I can't correlate to any part of the code, as I am not very good at reading assembler code...

Can you tell why lastBits is now on the stack and no longer in a register? The part where lastBits is used is here:

Code:

				lastBits = readMFRC522(ControlReg) & 0x07;
				if (lastBits) {   
					*toCard_p->backLen = (n-1)*8 + lastBits;   
				} else {   
					*toCard_p->backLen = n*8;   
				}
In the original code, there was "*backLen" instead of "*toCard_p->backLen".

Next question: is there a way to tell why the compiler created two more local variables, and where they are used, without fully understanding the assembler code?

So far I have not even tested the modified function; I just want to make sure that I understood the approach correctly, that I can use the numbers from the *.rst file to check whether a change really reduces stack usage, and that in this case I would have a reduction from 26 to 21 bytes.

Best regards,
Andreas.
fhem.de - ZWave development support
A.Harrenberg

Re: Reduction of stack usage

Post by A.Harrenberg »

Hi,

I have now changed all calls of this function to the modified version and also turned some local variables into global variables.

The sketch now seems to run: I haven't experienced any resets so far, and I can detect an RFID tag and read out its 5-byte serial number!

So p0lyg0n1 was absolutely right when he suspected this to be a stack usage issue!

The suggested tool/debugging warning for the stack use sounds like a good idea for developers... ;)

Best regards,
Andreas.
p0lyg0n1
Posts: 242
Joined: 04 Aug 2016 07:14

Re: Reduction of stack usage

Post by p0lyg0n1 »

Some short notes about reducing stack usage.
There are 3 main uses of the stack:
1. To pass parameters into your function. These appear with negative offsets in the allocation map, like:

Code:

;command                   Allocated to stack - _bp -3
;backLen                   Allocated to stack - _bp -13
2. To hold local variables, like these:

Code:

;status                    Allocated to stack - _bp +11
;irqEn                     Allocated to stack - _bp +6
3. To return to the previous function:
the return address is pushed onto the stack, which always takes 2 bytes in any function.
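To make the three groups concrete, here is a hypothetical function annotated with where each part would show up, assuming the reentrant (stack-based) calling convention seen in the listings above. Whether a given local actually lands on the stack or in a register is the compiler's decision.

```c
/* Hypothetical example; the comments map each part to the three groups. */

/* Group 1: parameters a and b -> negative _bp offsets in the allocation map. */
unsigned int weightedSum(unsigned char a, unsigned char b)
{
    /* Group 2: local variable -> positive _bp offset (or a register). */
    unsigned int result;

    result = (unsigned int)a * 3u + b;

    /* Group 3: the call instruction (LCALL on the 8051) pushed a 2-byte
       return address; returning (RET) pops it again. */
    return result;
}
```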

You reduce the first group of stack usage when you reduce the number of parameters passed.
But you have a lot of second-group variables; move some of them to the global area...
And there are some strange ones called "sloc".
The compiler creates them automatically when it can't perform a calculation with the currently free set of registers/local variables.
You can't do anything with them directly, but you can reduce their number by simplifying your code.
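A sketch of the "move some of them to the global area" advice, with hypothetical names (`UID_LEN`, `g_uidBuf`, and the two checksum functions). A working buffer that used to be a local array becomes a file-scope `static` variable, so it no longer counts against the stack frame. The trade-off is that the function is no longer reentrant and the buffer permanently occupies RAM, so this only suits functions that are never called recursively or from an interrupt.

```c
#include <string.h>

#define UID_LEN 5

/* Before: the 5-byte buffer lives in the stack frame of every call. */
unsigned char uidChecksumLocal(const unsigned char *uid)
{
    unsigned char buf[UID_LEN];   /* UID_LEN bytes of stack */
    unsigned char i, sum = 0;
    memcpy(buf, uid, UID_LEN);
    for (i = 0; i < UID_LEN; i++)
        sum ^= buf[i];
    return sum;
}

/* After: the buffer is a file-scope static; only i and sum remain local. */
static unsigned char g_uidBuf[UID_LEN];

unsigned char uidChecksumGlobal(const unsigned char *uid)
{
    unsigned char i, sum = 0;
    memcpy(g_uidBuf, uid, UID_LEN);
    for (i = 0; i < UID_LEN; i++)
        sum ^= g_uidBuf[i];
    return sum;
}
```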
Use a RISC style of coding:
instead of

Code:

 a = b * c - 20;
do

Code:

a = b;
a *= c;
a -= 20;
Divide your function into small sub-function blocks, but don't nest sub-functions too deeply, as that increases stack usage too :) So, just keep a balance...
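As an illustration only, here is the RISC-style advice applied to the `backLen` calculation from the first post, with the arithmetic split into single-operation statements. `n`, `lastBits`, and the output pointer are passed as parameters to keep the sketch self-contained, and both function names are hypothetical. Whether SDCC actually emits fewer `sloc` temporaries for the second form has to be checked in the generated .rst file.

```c
/* Compound expressions, as in the original code. */
void setBackLenCompound(unsigned int *backLen, unsigned char n,
                        unsigned char lastBits)
{
    if (lastBits)
        *backLen = (n - 1) * 8 + lastBits;
    else
        *backLen = n * 8;
}

/* RISC style: one operation per statement, a single temporary. */
void setBackLenRisc(unsigned int *backLen, unsigned char n,
                    unsigned char lastBits)
{
    unsigned int len;
    len = n;
    if (lastBits) {
        len -= 1;
        len *= 8;
        len += lastBits;
    } else {
        len *= 8;
    }
    *backLen = len;
}
```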
A.Harrenberg

Re: Reduction of stack usage

Post by A.Harrenberg »

Hi p0lyg0n1,
p0lyg0n1 wrote:The small notes about reduction of stack usage.
1. to pass parameters inside our function. they will be with negative values in assign map, like:
2. to make some local variables:
3. to return to previous function:
Yes, I think I understand this principle.
p0lyg0n1 wrote: But you have a lot of 2-nd group vars - move some of them to global area...
I will, but I need to understand the library code better so as not to break it. I have to analyze the code more closely to minimize the number of global variables; otherwise I would have to create separate global variables for each function and for each function it calls.
Another problem is that changing the library's public calls is not a good idea, as this would break compatibility with the Arduino example code.

At the moment it looks like I will have to rewrite several functions and can't keep compatibility with Arduino, but I will try to...
p0lyg0n1 wrote: and there are some strange ones that called "sloc".
Compiler automatically create them when it can't make some calculations with current free set of registers/local variables.
My main question was how to find out where the compiler needs to create the local sloc variables, so I can see which calculation or assignment I need to modify in order to save stack.
p0lyg0n1 wrote: Divide your function into little sub-function blocks, but don't use very deep subfunctions it will increase stack too :) So, just keep a balance...
I get your point, but keeping the original public function while reducing the stack size at the same time will not be easy.

But I have a starting point now, as it at least compiles. 2.07 will bring some benefit as well, even though I found something strange there...

Best regards,
Andreas.
droll
Posts: 48
Joined: 20 Dec 2013 01:37

Re: Reduction of stack usage

Post by droll »

Concerning:
A.Harrenberg wrote: From this I understand that there are +11 - (-13) + 2 = 26 byte of stack used. Is that correct?
I don't understand where the additional 2 bytes come from. The return address is certainly pushed onto the stack, but I assume that this address is stored at _bp -0 and _bp -1. The stack allocation of these 2 bytes is not explicitly listed in the stack allocation report, but (in my view) it is taken into account. Maybe someone knowledgeable about the 8051 can comment on that...

I am also encountering problems due to the limited stack in my application. To analyze and optimize the stack usage, I am using a Python script that parses all generated asm files, then builds the function call tree and reports the stack usage. It has to be called with the compilation temp folder as its argument:

Code:

CheckStackUsage.py <CompilationDirectory>
The script is attached to this post. For the moment, it doesn't add the 2 additional bytes discussed above.
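The core idea of such an analyzer, stripped of the assembler parsing, is a depth-first walk over the call graph: the worst-case stack depth of a function is its own frame plus the deepest of its callees. This C sketch is not the attached Python script; the node layout and `MAX_CALLEES` are hypothetical, and it assumes the call graph contains no recursion.

```c
#define MAX_CALLEES 4

/* One node of a (hypothetical) static call graph. */
struct FuncNode {
    const char *name;
    int frameBytes;            /* own stack use, e.g. from the .rst table */
    int callees[MAX_CALLEES];  /* indices into the node table, -1 = unused */
};

/* Worst-case stack depth starting at node idx; assumes no recursion. */
int worstCaseStack(const struct FuncNode *nodes, int idx)
{
    int deepest = 0;
    int k;
    for (k = 0; k < MAX_CALLEES; k++) {
        int c = nodes[idx].callees[k];
        if (c >= 0) {
            int d = worstCaseStack(nodes, c);
            if (d > deepest)
                deepest = d;
        }
    }
    return nodes[idx].frameBytes + deepest;
}
```

For example, with a made-up table where `loop` (4 bytes) calls `readTag` (10 bytes, which calls `transfer`, 6 bytes) and `printTag` (6 bytes), the result for `loop` is 4 + 10 + 6 = 20 bytes.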

What I find interesting is that my application uses only 69 bytes of stack in the worst case (starting from the setup and loop functions). Post SPI documentation mentions that there are 120-140 bytes of stack available. So I think my application should have sufficient headroom on the stack side, but I sometimes observe instabilities. I am wondering whether there really are 120-140 bytes of stack available for the user application.
Attachments
CheckStackUsage.zip
Stack usage analyzer
(1013 Bytes)
Last edited by droll on 22 Jan 2017 11:01, edited 1 time in total.
A.Harrenberg

Re: Reduction of stack usage

Post by A.Harrenberg »

Hi droll,

thanks for the script. I would love to use it, but I can't get it up and running... I had never used Python before, so I just installed the latest Python 3.6 on my Windows machine, but the script produces an error whether or not I give the directory as an argument:

Code:

  File "S:\FHEM\Z-Uno\CheckStackUsage\CheckStackUsage.py", line 25
    print 'Analyzed assember files:'
                                   ^
SyntaxError: Missing parentheses in call to 'print'
Does the script depend on the Python version? Should I rather use a 2.x version?

Best regards,
Andreas.
droll

Re: Reduction of stack usage

Post by droll »

Sorry for not having specified the Python version I am using: it is Python 2.7. The print statement syntax is no longer compatible with Python 3.x.
A.Harrenberg

Re: Reduction of stack usage

Post by A.Harrenberg »

Hi droll,
OK, thanks for the info. I will uninstall 3.6, install 2.7, and test it again tomorrow.
BR,
Andreas.
A.Harrenberg

Re: Reduction of stack usage

Post by A.Harrenberg »

Hi,

couldn't wait... Installed 2.7 and it works! Thank you!

Code:

5	119											__cxx__SPIClass__method__beginTransaction01prSPISettings (1x)
2	121												zunoPushByte (4x)
2	121												__cxx__SPISettings__method__getMode00 (1x)
2	121												__cxx__SPISettings__method__getClock00 (1x)
2	121												zunoCall (2x)
2	121												__cxx__SPISettings__method__getBitOrder00 (1x)
6	120											__cxx__SPIClass__method__transfer0105 (2x)
2	122												zunoPushByte (2x)
2	122												zunoPushWord (1x)
2	122												reinterpPOINTER (1x)
2	122												zunoCall (1x)
According to your tool, the code I am trying to port to the Z-Uno currently needs a maximum of 122 bytes of stack...
I am not sure whether these 120-140 bytes of stack are for the user code alone or have to be shared with the lower-level Z-Uno functions as well... But my code is definitely crashing the Z-Uno ,-(

I have two small questions about the script...
First, what is the number in the first column? Sometimes it is also a question mark...
Second, do you also count the "sloc" bytes that are pushed to the stack? I guess yes, but can you confirm?

This is somewhat frustrating... I have already reduced the stack usage quite a lot, and I am still far above a safe limit. And there are more complex functions in the library that are not used by the example I am currently running.

Best regards,
Andreas.
droll

Re: Reduction of stack usage

Post by droll »

The first column contains the number of stack bytes used by the function itself. It is derived from the stack allocation tables that are added as comments to the generated assembler files, so the 'sloc' bytes are taken into account. For system functions or math functions (e.g. mulint, gptrget, 0x002B00), no stack allocation table is available in any assembler file. Their stack usage is therefore unknown, which is indicated with a question mark in the generated report. In these cases the script assumes that 2 bytes are used, but this can be a severe underestimate!

The second column provides the accumulated function stack usage.
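To illustrate how a per-function figure can be derived from the allocation tables quoted at the top of the thread, here is a small C parser for lines of the form `;name Allocated to stack - _bp +11`. This is an assumption-laden sketch, not the script's actual logic (its exact formula isn't given here): it records the smallest and largest `_bp` offsets, takes their difference without adding the 2 return-address bytes, and falls back to 2 bytes when no table is found, mirroring the '?' case. Both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Scan the allocation-table comment lines of one function and report the
 * smallest and largest _bp offsets. Returns the number of stack entries
 * found (0 if none, e.g. for system or math routines). */
int scanAllocation(const char *const *lines, int count, int *minOff, int *maxOff)
{
    const char *marker = "Allocated to stack - _bp ";
    int found = 0;
    int i;
    *minOff = 0;
    *maxOff = 0;
    for (i = 0; i < count; i++) {
        const char *p = strstr(lines[i], marker);
        int off;
        if (p == NULL)
            continue;   /* register-allocated or unrelated line */
        if (sscanf(p + strlen(marker), "%d", &off) != 1)
            continue;   /* %d accepts both "+11" and "-13" */
        if (off < *minOff) *minOff = off;
        if (off > *maxOff) *maxOff = off;
        found++;
    }
    return found;
}

/* Per-function estimate: offset span, or 2 bytes if no table (the "?" case). */
int frameFromTable(const char *const *lines, int count)
{
    int lo, hi;
    if (scanAllocation(lines, count, &lo, &hi) == 0)
        return 2;
    return hi - lo;
}
```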

The previous version of the script did not recognize system calls (e.g. 'LCALL 0x002B00' made by 'zunoCall'). This has been corrected; the updated script is now attached to my previous post.