In the world of microcontrollers, unlike the world of desktop computers, resources are scarce and every optimization counts.
In this article we will cover some of the optimizations that can make the final code loaded onto the microcontroller smaller and potentially faster.
I will focus on the optimizations available on chips of the ATmega / ATtiny family. However, some of these approaches apply to any type of microcontroller.

I will start from a simple example, and we will see that a few changes significantly reduce the size of the code produced by the compiler.

There are 18 recommendations that generally decrease the size of the code:

  1. Compile with size optimization (-Os). The Arduino IDE already has this flag active by default.
  2. Use local variables whenever possible.
  3. Use the smallest applicable data type. Use unsigned variables if possible.
  4. If a non-local variable is only referenced within one function, it should be declared static.
  5. Group non-local variables in structures whenever natural. This increases the possibility of indirect addressing without reloading the pointer.
  6. Use pointers with an offset, or declare structures, to access memory-mapped I/O.
  7. Use for (;;) {} for eternal (infinite) loops.
  8. Use do { } while (expression) if applicable.
  9. Use down counters and pre-decrements if applicable.
  10. Access the I/O memory directly (i.e., do not use pointers).
  11. Declare main as C_task if it is not called from anywhere in the program.
  12. Use macros instead of functions for tasks that generate less than 2-3 lines of assembler code.
  13. Reduce the size of the Interrupt Vector segment (INTVEC) to what is actually used by the application. Alternatively, concatenate all CODE segments into one declaration and this will be done automatically.
  14. Code reuse is intra-modular. Collect several functions in one module (i.e., in one file) to increase the code reuse factor.
  15. In some cases, speed optimization produces smaller code than size optimization. Compile module by module to see which gives the best result.
  16. Optimize C_startup so that it does not initialize unused segments (i.e., IDATA0 or IDATA1 if all variables are small).
  17. If possible, avoid calling functions from within the interrupt routine.
  18. Use the smallest possible memory model.
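
As an illustration of recommendations 3, 8 and 9, here is a minimal sketch of a loop written with a small unsigned counter, do { } while and a pre-decrement. The function name sumBytes is hypothetical and not part of the sketches in this article. Whether this form actually saves bytes depends on the compiler version and flags, so it is worth comparing the sizes reported by the IDE.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical helper: sums `count` bytes starting at `data`.
// Precondition: count >= 1, because a do/while body always runs at least once.
uint16_t sumBytes(const uint8_t *data, uint8_t count) {
  uint16_t total = 0;
  do {
    total += *data++;
  } while (--count);  // pre-decrement; the loop ends when the counter reaches zero
  return total;
}
```

Counting down to zero lets the compiler reuse the zero flag set by the decrement instead of emitting a separate comparison against an upper bound.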

We will make use of some of these recommendations. To test the result of the optimizations we will start from a baseline sketch.

All the sketches are in a ZIP in this link

The following code is a modified version of Blink that flashes two LEDs with a delay that ramps from 0 toward 1000 ms in steps of 100 ms.

int led1 = 13;
int led2 = 12;
int delayTime = 1000;

void setup() {
  pinMode(led1, OUTPUT);
  pinMode(led2, OUTPUT);
}

void loop() {
  for (delayTime = 0; delayTime < 1000 ; delayTime += 100) {
    digitalWrite(led1, HIGH);
    delay(delayTime);
    digitalWrite(led1, LOW);
    delay(delayTime);
    digitalWrite(led2, HIGH);
    delay(delayTime);
    digitalWrite(led2, LOW);
    delay(delayTime);
  }
}
// Sketch uses 1,198 bytes (3%) of program storage space. Maximum is 32,256 bytes.
// Global variables use 15 bytes (0%) of dynamic memory, leaving 2,033 bytes for local variables. Maximum is 2,048 bytes.

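As we can see, the original code occupies 1,198 bytes of flash.

The first optimization targets the space occupied by the variables: we will change their types to the smallest ones that fit. The table below lists the range and size of the common types: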

Type            Range of values                     Number of bytes
char            -128 to 127                         1
byte            0 to 255                            1
int             -32,768 to 32,767                   2*
unsigned int    0 to 65,535                         2*
word            0 to 65,535                         2
long            -2,147,483,648 to 2,147,483,647     4
unsigned long   0 to 4,294,967,295                  4
float           -3.4028235E+38 to 3.4028235E+38     4
double          -3.4028235E+38 to 3.4028235E+38     4*

The values marked with (*) are for the ATmega328P chip; other chips may differ.
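
Because some of these sizes depend on the target, one way to make the intended width explicit is to use the fixed-width types from <stdint.h>. The block below is only an illustration and is not part of the original sketches: the names ledPin, maxDelayMs and fitsInByte are hypothetical.

```cpp
#include <cassert>
#include <stdint.h>

// Fixed-width types have the same size on every platform that provides them.
static_assert(sizeof(uint8_t) == 1, "uint8_t is one byte");
static_assert(sizeof(int16_t) == 2, "int16_t is two bytes");
static_assert(sizeof(uint32_t) == 4, "uint32_t is four bytes");

// Hypothetical values from the Blink example, stored in the smallest type that fits.
const uint8_t ledPin = 13;         // 0 to 255 is enough for a pin number
const uint16_t maxDelayMs = 1000;  // does not fit in 8 bits, so it needs 16

// Returns true when `value` fits in an unsigned 8-bit variable.
bool fitsInByte(long value) {
  return value >= 0 && value <= UINT8_MAX;
}
```

A pin number fits in a single byte, while the 1000 ms delay does not, which is why only the LED declarations can shrink to byte.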

Step 1

In the example, the pin numbers of the LEDs fit in a byte, so their declarations can be changed from int to byte.

They can also be declared const, since they never change (this information is essential for the compiler to optimize the use of these variables).

This step is completely safe and can always be done.

const byte led1 = 13;
const byte led2 = 12;
int delayTime = 1000;

void setup() {
  pinMode(led1, OUTPUT);
  pinMode(led2, OUTPUT);
}

void loop() {
  for (delayTime = 0; delayTime < 1000; delayTime += 100) {
    digitalWrite(led1, HIGH);
    delay(delayTime);
    digitalWrite(led1, LOW);
    delay(delayTime);
    digitalWrite(led2, HIGH);
    delay(delayTime);
    digitalWrite(led2, LOW);
    delay(delayTime);
  }
}
// Sketch uses 1,182 bytes (3%) of program storage space. Maximum is 32,256 bytes.
// Global variables use 11 bytes (0%) of dynamic memory, leaving 2,037 bytes for local variables. Maximum is 2,048 bytes.

We have saved 16 bytes.

Step 2

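Let’s now turn the global variable delayTime into a local variable of the loop function. Note that this listing also changes the loop condition from < 1000 to <= 1000, so the longest delay is now 1000 ms.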

This step is completely safe and can always be done.

const byte led1 = 13;
const byte led2 = 12;

void setup() {
  pinMode(led1, OUTPUT);
  pinMode(led2, OUTPUT);
}

void loop() {

  int delayTime;

  for (delayTime = 0; delayTime <= 1000; delayTime += 100) {
    digitalWrite(led1, HIGH);
    delay(delayTime);
    digitalWrite(led1, LOW);
    delay(delayTime);
    digitalWrite(led2, HIGH);
    delay(delayTime);
    digitalWrite(led2, LOW);
    delay(delayTime);
  }
}
// Sketch uses 1,110 bytes (3%) of program storage space. Maximum is 32,256 bytes.
// Global variables use 11 bytes (0%) of dynamic memory, leaving 2,037 bytes for local variables. Maximum is 2,048 bytes.

This time we’ve saved 72 bytes.

Step 3

In this step we will move repeated sequences of operations into a function.
In this case, a function was created to do the work for one LED.

This step is completely safe and can always be done.

const byte led1 = 13;
const byte led2 = 12;

void setup() {
  pinMode(led1, OUTPUT);
  pinMode(led2, OUTPUT);
}

void go(const byte led, const int delayTime) {
    digitalWrite(led, HIGH);
    delay(delayTime);
    digitalWrite(led, LOW);
    delay(delayTime);
}

void loop() {
  int delayTime;
  for (delayTime = 0; delayTime <= 1000; delayTime += 100) {
    go(led1, delayTime);
    go(led2, delayTime);
  }
}
// Sketch uses 1,102 bytes (3%) of program storage space. Maximum is 32,256 bytes.
// Global variables use 9 bytes (0%) of dynamic memory, leaving 2,039 bytes for local variables. Maximum is 2,048 bytes.

We’ve just saved 8 bytes.

Step 4

We have to take more radical measures. We will replace the pinMode and digitalWrite functions with macros that access the I/O registers directly.

This change breaks certain functionality, namely anything related to PWM: digitalWrite turns off PWM on a pin before writing to it, and the macros skip that step.

The macros also do not check for incorrect values passed to them.
Since we will be working directly with the registers, the code includes a set of macros that allow it to be used with the following microcontrollers:

  • ATmega328P
  • ATmega168
  • ATtiny45/85
  • ATtiny44/84

This code tends to be faster because it does only what is strictly necessary.

Include this code at the top of the sketches:

// AVR-optimize
//
// Direct port access macros. P is an Arduino pin number; no range checking is done.
#if defined(__AVR_ATtiny45__) || defined(__AVR_ATtiny85__)
// ATtiny45/85: all pins are on port B
#define portOfPin(P) (&PORTB)
#define ddrOfPin(P) (&DDRB)
#define pinOfPin(P) (&PINB)
#define pinIndex(P) ((uint8_t)((P)>13?(P)-14:(P)&7))
#elif defined(__AVR_ATtiny44__) || defined(__AVR_ATtiny84__)
// ATtiny44/84: pins 0-7 are on port A, the rest on port B
#define portOfPin(P)\
  (((P)>=0&&(P)<8)?&PORTA:&PORTB)
#define ddrOfPin(P)\
  (((P)>=0&&(P)<8)?&DDRA:&DDRB)
#define pinOfPin(P)\
  (((P)>=0&&(P)<8)?&PINA:&PINB)
#define pinIndex(P) ((uint8_t)((P)>7?(P)-7:(P)&7))
#elif defined(__AVR_ATmega328P__) || defined(__AVR_ATmega168__)
// ATmega328P/168: pins 0-7 on port D, 8-13 on port B, 14 and up on port C
#define portOfPin(P)\
  (((P)>=0&&(P)<8)?&PORTD:(((P)>7&&(P)<14)?&PORTB:&PORTC))
#define ddrOfPin(P)\
  (((P)>=0&&(P)<8)?&DDRD:(((P)>7&&(P)<14)?&DDRB:&DDRC))
#define pinOfPin(P)\
  (((P)>=0&&(P)<8)?&PIND:(((P)>7&&(P)<14)?&PINB:&PINC))
#define pinIndex(P) ((uint8_t)((P)>13?(P)-14:(P)&7))
#endif

// Bit mask of the pin inside its port
#define pinMask(P) ((uint8_t)(1<<pinIndex(P)))

#define pinAsInput(P) (*(ddrOfPin(P))&=~pinMask(P))
// Two statements, wrapped so the macro is safe inside an unbraced if/else
#define pinAsInputPullUp(P) do{pinAsInput(P);digitalHigh(P);}while(0)
#define pinAsOutput(P) (*(ddrOfPin(P))|=pinMask(P))
#define digitalLow(P) (*(portOfPin(P))&=~pinMask(P))
#define digitalHigh(P) (*(portOfPin(P))|=pinMask(P))
#define isHigh(P) ((*(pinOfPin(P))&pinMask(P))>0)
#define isLow(P) ((*(pinOfPin(P))&pinMask(P))==0)
#define digitalState(P) ((uint8_t)isHigh(P))
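
To see what these macros compute, the block below reproduces the ATmega328P mapping on a desktop compiler. It is only a simulation under stated assumptions: fakePORTB, fakePORTC and fakePORTD are ordinary variables standing in for the real registers, and all the sim* names are hypothetical. Unlike the macros, the simulation checks the pin range with an assert.

```cpp
#include <cassert>
#include <stdint.h>

// Stand-ins for the real I/O registers (assumption: desktop simulation only).
uint8_t fakePORTB = 0, fakePORTC = 0, fakePORTD = 0;

// Same selection logic as portOfPin for the ATmega328P:
// pins 0-7 -> port D, pins 8-13 -> port B, pins 14 and up -> port C.
uint8_t *simPortOfPin(uint8_t pin) {
  assert(pin < 20);  // extra check that the real macros do not perform
  if (pin < 8) return &fakePORTD;
  if (pin < 14) return &fakePORTB;
  return &fakePORTC;
}

// Same arithmetic as pinIndex and pinMask.
uint8_t simPinIndex(uint8_t pin) {
  return (uint8_t)(pin > 13 ? pin - 14 : pin & 7);
}
uint8_t simPinMask(uint8_t pin) {
  return (uint8_t)(1 << simPinIndex(pin));
}

// Equivalents of digitalHigh and digitalLow on the simulated registers.
void simDigitalHigh(uint8_t pin) {
  *simPortOfPin(pin) |= simPinMask(pin);
}
void simDigitalLow(uint8_t pin) {
  *simPortOfPin(pin) &= (uint8_t)~simPinMask(pin);
}
```

For pin 13 this selects port B with mask 0x20, that is, bit 5 of port B. Each write becomes a single read-modify-write on one register, which is where the size and speed gains come from.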

New code:

// include AVR-optimize

const byte led1 = 13;
const byte led2 = 12;

void setup() {
  pinAsOutput(led1);
  pinAsOutput(led2);
}

void go(const byte led, const int delayTime) {
    digitalHigh(led);
    delay(delayTime);
    digitalLow(led);
    delay(delayTime);
}

void loop() {
  int delayTime;
  for (delayTime = 0; delayTime <= 1000; delayTime += 100) {
    go(led1, delayTime);
    go(led2, delayTime);
  }
}
// Sketch uses 812 bytes (2%) of program storage space. Maximum is 32,256 bytes.
// Global variables use 9 bytes (0%) of dynamic memory, leaving 2,039 bytes for local variables. Maximum is 2,048 bytes.

Finally we broke the 1 KB barrier.

An empty sketch occupies 450 bytes. Ours is 812 bytes.

This means that our own code occupies 362 bytes.

Step 5

One function has not been replaced yet: delay. In this step we replace it with our own tinyDelay, a minimal busy-wait built on millis().

// include AVR-optimize

const byte led1 = 13;
const byte led2 = 12;

void setup() {
  pinAsOutput(led1);
  pinAsOutput(led2);
}

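// Busy-waits for `time` milliseconds using millis().
// Unsigned subtraction keeps the comparison valid when millis() wraps around.
void tinyDelay(int time) {
  unsigned long initial = millis();
  while ( millis() - initial < (unsigned long)time ) {
    yield();
  }
}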

void go(const byte led, const int delayTime) {
    digitalHigh(led);
    tinyDelay(delayTime);
    digitalLow(led);
    tinyDelay(delayTime);
}

void loop() {
  int delayTime;
  for (delayTime = 0;delayTime <= 1000;delayTime += 100) {
    go(led1, delayTime);
    go(led2, delayTime);
  }
}
// Sketch uses 744 bytes (2%) of program storage space. Maximum is 32,256 bytes.
// Global variables use 9 bytes (0%) of dynamic memory, leaving 2,039 bytes for local variables. Maximum is 2,048 bytes.

In this iteration we got down to 744 bytes, saving another 68 bytes.
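
The loop condition in tinyDelay compares the difference between two millis() readings against the requested time. Because the arithmetic is unsigned, the difference stays correct even if the millisecond counter overflows during the wait. The block below illustrates this on a desktop compiler; elapsedMs and stillWaiting are hypothetical helpers and the timestamps are made-up values.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical helper: milliseconds elapsed between two readings of a
// 32-bit counter. Unsigned subtraction wraps modulo 2^32, so the result
// is still correct if the counter overflowed between `start` and `now`.
uint32_t elapsedMs(uint32_t now, uint32_t start) {
  return now - start;
}

// Same test as the loop condition in tinyDelay: true while the wait is not over.
bool stillWaiting(uint32_t now, uint32_t start, uint32_t waitMs) {
  return elapsedMs(now, start) < waitMs;
}
```

For example, with a start reading 100 ms before the overflow and a current reading 100 ms after it, elapsedMs yields 200, exactly as if no overflow had happened.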

Final Step

In this step we will do some more optimizations:

  • Create our own main
  • Inline the operations that the function go performed back into the loop, eliminating the function completely
  • Delete setup and loop by putting the code directly in main

// include AVR-optimize

const byte led1 = 13;
const byte led2 = 12;

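// Busy-waits for `time` milliseconds using millis().
// Unsigned subtraction keeps the comparison valid when millis() wraps around.
void tinyDelay(int time) {
  unsigned long initial = millis();
  while ( millis() - initial < (unsigned long)time ) {
    yield();
  }
}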

int main(void) {
  init(); // don't forget this! It sets up the timer that millis() relies on
  // SETUP:
  pinAsOutput(led1);
  pinAsOutput(led2);
  for(;;) {
    // LOOP:
    int delayTime;
    for (delayTime = 0; delayTime <= 1000; delayTime += 100) {
      digitalHigh(led1);
      tinyDelay(delayTime);
      digitalLow(led1);
      tinyDelay(delayTime);
      digitalHigh(led2);
      tinyDelay(delayTime);
      digitalLow(led2);
      tinyDelay(delayTime);
    }
  }
}
// Sketch uses 578 bytes (1%) of program storage space. Maximum is 32,256 bytes.
// Global variables use 9 bytes (0%) of dynamic memory, leaving 2,039 bytes for local variables. Maximum is 2,048 bytes.

Conclusion

We started with 1,198 bytes and were able to optimize the code down to 578 bytes, less than half the original size.
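
The sizes reported at each step can be summarized with a little arithmetic. The block below is only a worked calculation over the figures quoted in this article; the names kSizes, savedAtStep and totalReductionPercent are hypothetical.

```cpp
#include <cassert>

// Flash usage in bytes reported by the IDE: baseline, Steps 1 to 5, Final Step.
const int kSizes[] = {1198, 1182, 1110, 1102, 812, 744, 578};
const int kCount = sizeof(kSizes) / sizeof(kSizes[0]);

// Bytes saved by listing `i` (1 = Step 1, ..., 6 = Final Step)
// relative to the listing before it.
int savedAtStep(int i) {
  assert(i >= 1 && i < kCount);
  return kSizes[i - 1] - kSizes[i];
}

// Total reduction as a whole-number percentage of the baseline (rounded down).
int totalReductionPercent() {
  return (kSizes[0] - kSizes[kCount - 1]) * 100 / kSizes[0];
}
```

The largest single saving, 290 bytes, comes from Step 4, and the total reduction is 620 bytes, roughly 52% of the baseline.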

This exercise should be done whenever necessary, weighing the potential problems that some of the optimizations may cause.

We could have gone even further, to plain code without the Arduino libraries, but that is for another article.