This is my collection of interesting tidbits I gathered while playing around with Arduino.
It is hard to get an in-depth understanding of Arduino and the various CPUs involved. The standard IDE involves a lot of 'magic' and there is not much help when you want to get a better understanding of how the magic works.
I had to do a lot of digging to figure out how to easily do assembler programming on the Arduino; below are the results of my information quest.
The Arduino IDE is written in Java and is merely a wrapper around the GNU toolchain for the AVR CPU range. The underlying toolchain supports mixing assembler and C++ source files in one program, but that functionality is not exposed through the IDE.
Only very minor changes to the IDE source code are needed to make it all work. I went through the exercise on Mac OS X and on Linux; I expect things to be similar on Windows.
Before you can perform the procedure below, you need a working Java development environment on your computer. You must be able to call the Java compiler and the Apache Ant build tool (ant) from the command line.
These were already set up on my Mac and Ubuntu machines when I started my quest, so I did not research where they came from. I am not sure whether a standard install provides them. If not, they're fairly easy to come by.
On Mac, I imagine you might need to download and install Xcode from Apple (available on the App Store), or use fink or MacPorts. You might also need to download and install a Java JDK from Oracle.
On Linux, you might need to use a few apt-get or yum commands to fetch the necessary tools.
Step 1: Get the source code for the Arduino IDE. At the time of this writing, the source code is available in a git repository located at:
https://github.com/arduino/Arduino
Download the .zip archive with the IDE source code.
Expand the archive - you should end up with a directory called 'Arduino-master'.
Next, find the file called 'Sketch.java' in
wherever_you_put_it/Arduino-master/app/src/processing/app/Sketch.java
The problem is that this code does not recognize “S” (uppercase S) as a valid file name extension. So, you need to adjust the code to accept “S” in addition to “c”, “h”, and so on.
At the time of this writing, the first area to change is around line 1455 (this is version 1.02 of the Arduino IDE).
We add a lower case “s” here - as far as I can tell, the file name being tested has already been converted to lower case, so the extension to look for is “s” even though the actual file name has an upper case “S”.
```java
...
// 3. then loop over the code[] and save each .java file
for (SketchCode sc : code) {
  if (sc.isExtension("c") || sc.isExtension("cpp") || sc.isExtension("h")) {
    // no pre-processing services necessary for java files
...
```
needs to become:
```java
...
// 3. then loop over the code[] and save each .java file
for (SketchCode sc : code) {
  if (sc.isExtension("c") || sc.isExtension("cpp") || sc.isExtension("h") ||
      sc.isExtension("s")) {
    // no pre-processing services necessary for java files
...
```
The second area to change is around line 1871:
```java
...
/**
 * Returns a String[] array of proper extensions.
 */
public String[] getExtensions() {
  return new String[] { "ino", "pde", "c", "cpp", "h" };
}
...
```
needs to become
```java
...
/**
 * Returns a String[] array of proper extensions.
 */
public String[] getExtensions() {
  return new String[] { "ino", "pde", "c", "cpp", "h", "s" };
}
...
```
Then start a command-line session, navigate into the build directory, and build the patched IDE:
```sh
cd wherever_you_put_it/Arduino-master/build
ant build
ant dist
```
When ant dist asked for a version number, I entered '0102asm'. The source code I got from GitHub was version 1.02, and I added 'asm' to remind me that I am not running a standard IDE.
After that, I ended up with a patched distribution in
wherever_you_put_it/Arduino-master/build/macosx/arduino-0102asm-macosx.zip
Unzipping this gives you Arduino.app. I moved this patched IDE into my /Applications folder in place of the 'official' downloadable version.
On Linux it works pretty much the same way; I haven't tried it on Windows, but I imagine it to be similar too.
The mpide IDE used for the Uno32 (which is Arduino-like, but uses a 32-bit PIC32 processor instead of an 8-bit AVR processor) is based on the Arduino IDE, modified to support the Uno32. You can make similar changes to its source code too.
As a first exercise, I rebuilt the Blink example in assembler.
Here's how to do it: create a new sketch. In the sketch folder, add two text files: one called 'asmtest.h' and another called 'asmtest.S'.
In asmtest.h put:
```cpp
/*
 * Declarations shared by the assembler and C++ parts of the sketch.
 */
#ifdef __ASSEMBLER__
/* Assembler-only stuff */
#else /* !__ASSEMBLER__ */
/* C++-only stuff (the .ino file is compiled as C++) */
#include <stdint.h>
extern "C" uint8_t led(uint8_t);
extern "C" uint8_t asminit(uint8_t);
#endif /* __ASSEMBLER__ */
```
This declares the assembler routines so that they can be called from a C/C++ program. To avoid issues with C++ name mangling, I declared the functions as extern “C”. This tells the C++ compiler that the functions have C linkage, so their names are not mangled. The linker can then match them with the plain labels (led, asminit) defined in the assembler file.
The file asmtest.S contains the assembler code:
```asm
#include "avr/io.h"
#include "asmtest.h"

; Define the function asminit()
.global asminit
asminit:
    sbi 4,5          ; 4 = DDRB (0x24 - 0x20). Bit 5 = pin 13
    ret

; Define the function led()
.global led          ; The assembly function must be declared as global
led:
    cpi r24, 0x00    ; Parameter passed by caller in r24
    breq turnoff
    sbi 5, 5         ; 5 = PORTB (0x25 - 0x20). Bit 5 = pin 13
    ret
turnoff:
    cbi 5, 5         ; 5 = PORTB (0x25 - 0x20). Bit 5 = pin 13
    ret
```
Writing this code took a bit of fishing through the AVR instruction set. PORTB is at data-memory address 0x25. The sbi (set bit in I/O register) and cbi (clear bit in I/O register) instructions, however, address the I/O space, which starts at data address 0x20, so you need to subtract 0x20. Bit 5 of the PORTB byte corresponds to pin 13 of the Arduino board. The same applies to DDRB (0x24), which sets the pin direction.
Finally the main sketch code in the .ino file is:
```cpp
#include "asmtest.h"

void setup() {
  asminit(0);
}

void loop() {
  led(0);
  delay(1000);
  led(1);
  delay(1000);
}
```
In other words, I still define the setup() and loop() functions, and these then call into my assembler functions.
This first test is only 'half-assembler': we still have some C/C++ backbone, but the difference in size is already significant. A standard 'Blink' sketch compiles to 1084 bytes; my assembler version is only 582 bytes, roughly 46% smaller.
Assembler is usually not the right language to do things in. The most expensive resource is often the developer's time, and writing assembler is slower and more error-prone than writing C. However, when memory or CPU time is tight, using assembler can result in substantial space savings and speed improvements.
My next exercise was rebuilding the BlinkWithoutDelay example; my first attempt reduced the code size from 1028 bytes to 580 bytes. The assembler part is the same as in the Blink example.
asmtest.h:
```cpp
/*
 * Declarations shared by the assembler and C++ parts of the sketch.
 */
#ifdef __ASSEMBLER__
/* Assembler-only stuff */
#else /* !__ASSEMBLER__ */
/* C++-only stuff (the .ino file is compiled as C++) */
#include <stdint.h>
extern "C" uint8_t led(uint8_t);
extern "C" uint8_t asminit(uint8_t);
#endif /* __ASSEMBLER__ */
```
asmtest.S:
```asm
#include "avr/io.h"
#include "asmtest.h"

.global asminit
asminit:
    sbi 4,5          ; 4 = DDRB (0x24 - 0x20). Bit 5 = pin 13
    ret

.global led          ; The assembly function must be declared as global
led:
    cpi r24, 0x00    ; Parameter passed by caller in r24
    breq turnoff
    sbi 5, 5         ; 5 = PORTB (0x25 - 0x20). Bit 5 = pin 13
    ret
turnoff:
    cbi 5, 5         ; 5 = PORTB (0x25 - 0x20). Bit 5 = pin 13
    ret
```
sketch.ino:
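```cpp
#include "asmtest.h"

unsigned long lastToggle = 0;  // millis() value at the last LED change
uint8_t on = 0;                // current LED state: 0 = off, 1 = on

void setup() {
  asminit(0);
}

void loop() {
  unsigned long now = millis();          // millis() returns unsigned long
  if (now - lastToggle >= 1000) {        // unsigned math survives millis() wrap-around
    lastToggle = now;
    on = !on;  // toggle 0 <-> 1 ("on = -on" alternates 1/-1, both "on" for led())
    led(on);
  }
}
```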
The next step I tried was to build a version of BlinkWithoutDelay that only uses assembler. This is what I came up with:
Make a new sketch and empty the .ino file completely. You still need to have the file, but it should be empty.
Then also create a file called 'asmtest.S' in the same directory as the (empty) .ino file:
```asm
#include "avr/io.h"

#define yl r28
#define yh r29

.global setup
setup:
    sbi _SFR_IO_ADDR(DDRB), DDB5      ; Bit 5 = pin 13
    ret

// const long delay = 1000;
#define delay 1000 // ms

.global loop
loop:
    push r17                          ; r17, r28 and r29 are call-saved in the avr-gcc ABI
    push yl
    push yh

    call millis                       ; call millis(): 4-byte return value in r25...r22

    // Use Y as a pointer to fetch the next time to switch the LED
    ldi yl, lo8(nextSwitchAfterMillis)
    ldi yh, hi8(nextSwitchAfterMillis)
    ld r18, y+
    ld r19, y+
    ld r20, y+
    ld r21, y+
    ld r17, y    // ledStatus comes right after nextSwitchAfterMillis, so y now points at it

    // Compare nextSwitchAfterMillis with the value returned by millis()
    sub r18, r22
    sbc r19, r23
    sbc r20, r24
    sbc r21, r25
    brcc tooEarly    ; carry is set if r18...r21 (nextSwitchAfterMillis) < r22...r25 (millis())

    // Toggle LED state: 0 -> 1, 1 -> 0
    inc r17
    andi r17, 1

    // Store ledStatus for next time. y still points at its memory location
    st y, r17

    // Set LED state (st leaves the flags alone, so Z still comes from andi)
    brne turnon
    cbi _SFR_IO_ADDR(PORTB), PORTB5   ; Bit 5 = pin 13
    rjmp ledSwitched
turnon:
    sbi _SFR_IO_ADDR(PORTB), PORTB5   ; Bit 5 = pin 13
ledSwitched:

    // Add delay to the result of the call to millis()
    ldi r17, lo8(delay)
    add r22, r17
    ldi r17, hi8(delay)
    adc r23, r17
    ldi r17, hlo8(delay)
    adc r24, r17
    ldi r17, hhi8(delay)
    adc r25, r17

    // Store this as the next point in time when we need to toggle the LED
    st -y, r25
    st -y, r24
    st -y, r23
    st -y, r22

tooEarly:
    pop yh
    pop yl
    pop r17
    ret

.data
nextSwitchAfterMillis:
    .long 0
ledStatus:
    .byte 0
```
A few tidbits:
- I replaced sbi 4,5 and similar instructions with forms like sbi _SFR_IO_ADDR(DDRB), DDB5, using predefined symbols from the “avr/io.h” include file so the assembler code better expresses what it does. _SFR_IO_ADDR() converts a register's data-memory address into the I/O address that sbi and cbi expect, i.e. it performs the subtract-0x20 step. Underneath, the generated code is exactly the same, so there is no cost to doing this, but the code becomes more self-explanatory.
- I defined the setup() and loop() functions in assembler instead of in C. The Arduino 'wrapper' that is automatically compiled and linked in with my code declares both setup() and loop() as extern “C” routines. The Arduino 'runtime' will therefore find them, even though they're defined in assembler instead of C.
- I am calling millis() from assembler. This routine returns a 4-byte unsigned long in r25..r22 (least significant byte in r22), as the avr-gcc calling convention prescribes. The assembler routine uses this value for comparison and for calculations.
- I am using the Y register (composed of r29..r28) as a 'pointer' into memory, using post-increment and pre-decrement to access a sequence of 5 bytes. 4 bytes hold the millis() value at which the LED will next be toggled, and another byte holds the LED's current state in bit 0.
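- I learned the hard way that you need to save and restore the contents of r29..r28 when you clobber them. The avr-gcc calling convention treats r2 to r17 and r28..r29 as call-saved, so a function must restore any of them it modifies; r17, used here for the LED state, falls in the same category. Hence the push and pop instructions at the start and end of the loop routine.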
- This version of the routine takes 576 bytes, so it only saved 4 extra bytes from the previous 'hybrid C++/asm' version.
- I also tried compiling a sketch with an empty loop() and setup() (each consisting of just a 'ret' assembler instruction). Such a sketch takes 466 bytes. The two 'ret' instructions are 4 bytes, so the minimal Arduino 'runtime' weighs in at 462 bytes. That means my all-assembler BlinkWithoutDelay adds about 576 - 466 = 110 bytes on top of an empty sketch.
Interesting links: