smallcpus

Differences

This shows you the differences between two versions of the page.

smallcpus [2012/12/09 18:23]
kris
smallcpus [2015/10/21 19:05] (current)
kris
Line 4: Line 4:
  
 ===== Enabling Assembler Programming on Arduino/​Uno32 ===== ===== Enabling Assembler Programming on Arduino/​Uno32 =====
 +
 +**More recent versions of the Arduino IDE recognize .S files, and no changes are needed. I've tested this with version 1.6.5. If you're using a recent IDE, you can skip ahead to 'The Blink Example In Assembler'​**
 +
 +**Make sure to also check '​Feedback from Readers'​ at the end: it contains some important info for newer versions of the IDE.**
  
 It is hard to get an in-depth understanding of Arduino and the various CPUs involved. The standard IDE involves a lot of '​magic'​ and there is not much help when you want to get a better understanding of how the magic works. It is hard to get an in-depth understanding of Arduino and the various CPUs involved. The standard IDE involves a lot of '​magic'​ and there is not much help when you want to get a better understanding of how the magic works.
Line 379: Line 383:
 http://​www.mindkits.co.nz/​tutorials http://​www.mindkits.co.nz/​tutorials
  
-as I ordered their ready-made tutorial stuff. I haven't progressed very far yet, as I keep getting sidetracked on all kinds of interesting things.+as I ordered their ready-made tutorial stuff. I haven't progressed very far yet, as I keep getting sidetracked on all kinds of interesting things, like assembler-programming tricks.
  
-My next subject is part of Tutorial#0, where you wire up 8 LEDs to pins 2-9 of the Arduino and then make the LEDs light up in sequence, from left to right, then back from right to left, and so on.+My next subject is an assembler version for part of Tutorial#0, where you wire up 8 LEDs to pins 2-9 of the Arduino and then make the LEDs light up in sequence, from left to right, then back from right to left, and so on.
  
 Look for Exercise 0.1 on  Look for Exercise 0.1 on 
Line 392: Line 396:
  
 <​code>​ <​code>​
-/* 
-  Blink 
-  Turns on an LED on for one second, then off for one second, repeatedly. 
-  
-  This example code is in the public domain. 
- */ 
-  
-// Pin 13 has an LED connected on most Arduino boards. 
-// give it a name: 
 #define ledFrom ​ 2 #define ledFrom ​ 2
 #define ledTo 9 #define ledTo 9
Line 442: Line 437:
 </​code>​ </​code>​
  
-Then I set out to do as assembler-only version. In this version, the .ino file remains empty, and all you do is add a file 'knightrider.S' to the sketch on a second tab. You need to use the patched IDE for this - standard IDEs won't recognize the .S file.+Then I wrote an assembler-only version. In this version, the .ino file remains empty, and all you do is add a file 'knightrider.S' to the sketch on a second tab. You need to use the patched IDE for this - standard IDEs won't recognize the .S file. As you'll see further on, this first attempt is far from optimal.
  
 <​code>​ <​code>​
Line 572: Line 567:
 This one weighs in at 630 bytes. ​ This one weighs in at 630 bytes. ​
  
-The trick I used is to have two 8-bit values, where one is being rotated right-to-left,​ and the other is being rotated left-to-right. The ROL and ROR instructions rotate through the carry flag - i.e. whatever bit 'falls out' of the register being rotated falls into the carry flag, and the original contents of the previous carry flag are rotated '​into'​ the register. ​ROL and ROR are effectively 9-bit rotations, where the Carry flag is the 9th bit.+The trick I used is to have two 8-bit values, where one is being rotated right-to-left,​ and the other is being rotated left-to-right. The //​rol// ​and //​ror// ​instructions rotate through the carry flag.  
 + 
 +I.e. whatever bit 'falls out' of the register being rotated falls into the carry flag, and the original contents of the previous carry flag are rotated '​into'​ the register. ​//​rol// ​and //​ror// ​are effectively 9-bit rotations, where the Carry flag is the 9th bit.
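
To make the '9-bit rotation' idea concrete, here is a small C model of what //rol// and //ror// do to a register and the carry flag. This is only a sketch for experimenting on a PC: the names RegWithCarry, rol_through_carry and ror_through_carry are made up for this illustration and are not part of the sketch or of any AVR library.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model: one 8-bit register plus the carry flag (9 bits in total). */
typedef struct {
    uint8_t reg;
    bool carry;
} RegWithCarry;

/* Models "rol": old carry enters bit 0, old bit 7 becomes the new carry. */
RegWithCarry rol_through_carry(RegWithCarry s)
{
    RegWithCarry out;
    out.carry = (s.reg & 0x80) != 0;
    out.reg = (uint8_t)((s.reg << 1) | (s.carry ? 0x01 : 0x00));
    return out;
}

/* Models "ror": old carry enters bit 7, old bit 0 becomes the new carry. */
RegWithCarry ror_through_carry(RegWithCarry s)
{
    RegWithCarry out;
    out.carry = (s.reg & 0x01) != 0;
    out.reg = (uint8_t)((s.reg >> 1) | (s.carry ? 0x80 : 0x00));
    return out;
}
```

Because the carry acts as the 9th bit, applying rol_through_carry nine times in a row brings both the register and the carry back to where they started.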
  
 The two 8-bit values are kept in r17 and r16; looking at the two 8-bit values you'd see successive states like shown below, because of the //rol r17// and //ror r16// instructions:​ The two 8-bit values are kept in r17 and r16; looking at the two 8-bit values you'd see successive states like shown below, because of the //rol r17// and //ror r16// instructions:​
Line 613: Line 610:
 r17:​00000000 r17:​00000000
 r16:​01000000 r16:​01000000
 +
 +Step 11:
 +
 +r17:​00000000
 +r16:​00100000
  
 ... ...
Line 621: Line 623:
 r16:​00000001 r16:​00000001
  
-Step 17: The 1 bit rotates '​out'​ of r16 into the carry, and disappears from view. However, before that happens, the //sbrc r16,0// instruction tests for the '​1'​ bit being in the bit-position 0 of r16, and if it is, executes an //ori r17, 1// - effectively re-instating the bit into r17, ready for another up-down round.+Step 17: The 1 bit rotates '​out'​ of r16 into the carry, and disappears from view. However, before 
 +that happens, the sbrc r16,0 instruction tests for the '​1'​ bit being in the bit-position 0 of r16, 
 +and if it is, executes an ori r17, 1 - effectively re-instating the bit into r17, ready for another 
 +up-down round.
  
 r17:​00000001 r17:​00000001
Line 627: Line 632:
 </​code>​ </​code>​
  
-So, we have this single bit, marching endlessly around the two 8-bit values. Round and round it goes.+So, we now have this single bit, marching endlessly around the two 8-bit values. Round and round it goes.
  
-The second trick is to then make the '​logical or' of these two values. That gives us a single-byte result where the bit seems to go back and forth all the time. That's the value we use to drive our LEDs.+The second trick is to then make the '​logical or' of these two values. That gives us a single-byte ​end-result where the bit seems to go back and forth all the time. That's the value we will use to drive our LEDs.
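
The marching bit and the 'logical or' can be tried out on a PC with a small C model. This is a sketch under assumptions, not the code from the sketch itself: the struct Shifters and the functions shifters_step and led_pattern are hypothetical names, with 'up' standing in for r17 and 'down' for r16, and the carry is assumed to be clear before each update.

```c
#include <stdint.h>

/* Hypothetical stand-ins: up models r17, down models r16. */
typedef struct {
    uint8_t up;
    uint8_t down;
} Shifters;

/* One update, modelled on: rol r17 / sbrc r16,0 / ori r17,1 / ror r16 */
Shifters shifters_step(Shifters s)
{
    uint8_t carry = (uint8_t)((s.up >> 7) & 0x01); /* rol: bit 7 falls into carry */
    s.up = (uint8_t)(s.up << 1);                   /* carry-in assumed to be 0 */

    if (s.down & 0x01) {                           /* sbrc skips the ori when bit 0 is clear */
        s.up |= 0x01;                              /* ori: re-instate the bit in 'up' */
    }

    s.down = (uint8_t)((s.down >> 1) | (carry << 7)); /* ror: carry enters bit 7 */
    return s;
}

/* The value used to drive the LEDs: the logical OR of both shifters. */
uint8_t led_pattern(Shifters s)
{
    return (uint8_t)(s.up | s.down);
}
```

Starting from up = 0x01 and down = 0x00, sixteen calls to shifters_step bring the pair back to its starting state, and led_pattern has exactly one bit set after every call.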
  
-Before we can use this 8-bit value, we need to do some more shifting, because the LED's are driven by two different ports: 6 LEDs are driven by bit 2-7 of PORTD, and 2 LED are driven by bit 0-1 of PORTB. That's what all the //lsl// and //lsr// stuff is about.+Before we can use this 8-bit value, we need to do some more shifting, because the LEDs are driven from two different ports: 6 LEDs are driven by bits 2-7 of PORTD, and 2 LEDs are driven by bits 0-1 of PORTB. That's what all the //lsl// and //lsr// stuff is about.
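
As a rough illustration of why the shifts are needed, here is the same split written in C. It assumes that bit 0 of the combined value belongs to the LED on pin 2 and bit 7 to the LED on pin 9; the names PortValues and split_for_pins_2_to_9 are hypothetical and only exist for this example.

```c
#include <stdint.h>

/* Hypothetical container for the two values to be written to the ports. */
typedef struct {
    uint8_t portd; /* bits 2-7 drive pins 2-7 */
    uint8_t portb; /* bits 0-1 drive pins 8-9 */
} PortValues;

/* Assumption: pattern bit 0 is the LED on pin 2, pattern bit 7 the LED on pin 9. */
PortValues split_for_pins_2_to_9(uint8_t pattern)
{
    PortValues out;
    out.portd = (uint8_t)(pattern << 2); /* like two lsl: bits 0-5 move to bits 2-7 */
    out.portb = (uint8_t)(pattern >> 6); /* like six lsr: bits 6-7 move to bits 0-1 */
    return out;
}
```

For example, a pattern of 0x01 ends up as 0x04 for PORTD (pin 2), and a pattern of 0x80 ends up as 0x02 for PORTB (pin 9).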
  
-The rest of the code is very similar to our previous BlinkWithoutDelay as far as calculating millis and so on.+The rest of the code is very similar to our previous BlinkWithoutDelay as far as calculating millis and so on - so this version returns properly from the '​loop'​ routine each time it is called; it does not get '​stuck'​.
  
 Now, I thought I should be able to do better, so I rewired the LEDs in such a fashion that I could drop those extra shifts. Now, I thought I should be able to do better, so I rewired the LEDs in such a fashion that I could drop those extra shifts.
  
 <​code>​ <​code>​
-Led 1 = pin 8 +Led 1 = pin 8 PORTB bit 0 
-Led 2 = pin 9 +Led 2 = pin 9 PORTB bit 1 
-Led 3 = pin 2 +-- 
-Led 4 = pin 3 +Led 3 = pin 2 PORTD bit 2
-Led 5 = pin 4 +Led 4 = pin 3 PORTD bit 3
-Led 6 = pin 5 +Led 5 = pin 4 PORTD bit 4
-Led 7 = pin 6 +Led 6 = pin 5 PORTD bit 5
-Led 8 = pin 7+Led 7 = pin 6 PORTD bit 6
 +Led 8 = pin 7 PORTD bit 7
 </​code>​ </​code>​
  
-This way, I did not need to shift the bits around any more - the bottommost two bits of PORTB are driven by bits 0 and 1 of my calculated value, and the topmost 6 bits of PORTD can be driven by bits 2-7, without needing any additional shifts after the rol/ror trick.+This way, I do not need to shift the bits around any more - the bottommost two bits of PORTB are driven by bits 0 and 1 of my calculated value, and the topmost 6 bits of PORTD can be driven by bits 2-7, without needing any additional shifts after the rol/ror trick.
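
With the rewired board the split comes down to masking. The C sketch below shows one way to do that while leaving the other bits of each port alone; whether the real sketch preserves those other bits is an assumption on my part, and merge_portd and merge_portb are hypothetical helper names.

```c
#include <stdint.h>

/* Keep the low 2 bits of the current PORTD value, take bits 2-7 from the pattern. */
uint8_t merge_portd(uint8_t current_portd, uint8_t pattern)
{
    return (uint8_t)((current_portd & 0x03) | (pattern & 0xFC));
}

/* Keep the high 6 bits of the current PORTB value, take bits 0-1 from the pattern. */
uint8_t merge_portb(uint8_t current_portb, uint8_t pattern)
{
    return (uint8_t)((current_portb & 0xFC) | (pattern & 0x03));
}
```

No shifting is involved any more: each bit of the pattern already sits at the bit position of the port pin it has to drive.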
  
-Now, I want to get a 'feel' for how much it costs to use C instead of assembly, but in order to have a fair comparison, I decided to rewire my board, then try writing a C version first, using direct port manipulation instead of directWrite. directWrite is a nice abstraction, but there is quite a memory and timing-overhead attached to it.+One of my goals is to get a 'feel' for how much it costs to use C instead of assembly. In order to have a useful comparison, I decided to rewire my board, then try writing a C version first, this time using direct port manipulation instead of //digitalWrite//.
 +
 +//digitalWrite// is a nice abstraction, but there is quite a memory and timing overhead attached to it, and direct port manipulation in C generates much more compact code.
  
 So here is my C version for the rewired setup, no assembler involved. So here is my C version for the rewired setup, no assembler involved.
Line 659: Line 667:
 void setup() void setup()
 { {
-  DDRD = DDRD | 0xFC; +  DDRD = DDRD | 0xFC; // Top 6 bits as outputs 
-  DDRB = DDRB | 0x03;+  DDRB = DDRB | 0x03; // Bottom 2 bits as outputs
 } }
  
 #define delay 50 #define delay 50
 +
 long nextSwitchAfterMillis;​ long nextSwitchAfterMillis;​
 byte shiftUp = 0x01; byte shiftUp = 0x01;
Line 684: Line 693:
 </​code> ​ </​code> ​
  
-This one compiles to 610 bytes. That surprised me - that was way smaller than I expected!+This one compiles to 610 bytes, and that surprised me - that is way smaller than I expected! 
 + 
 +I then did some digging, and checked out the assembly language output of the C-compiler, and as a result, learned a few new tricks.
  
-I then did some digging, and checked out the assembly language output of the C-compiler, ​and learned a few new tricks.+First trick: ​the AVR instruction set has no '​ADCI'​ or '​ADI'​ instruction. My solution was to load the value into a register ​and then use ADC or ADD
  
-First trick: the AVR instruction set has no 'ADCI' or 'ADI' instruction, and my cumbersome solution was to load the value into a register and then use ADC or ADD. The C-compiler had a much better trick up it's sleeve: the AVR instruction set does have a SBCI and SBI instruction, so instead of adding a constant value, they simply subtract the negative of the value. Saves a few bytes.+The C-compiler has a much better trick up its sleeve: the AVR instruction set //does// have SBCI and SUBI instructions, so instead of adding a constant value, it simply subtracts the negative of the value, and no intermediate register is needed to hold the immediate values.
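
The 'subtract the negative' trick can be checked on a PC with a byte-by-byte C model of a subtract followed by subtract-with-carry on the higher bytes. This is only a sketch: add_const_via_sub is a hypothetical name, and the loop stands in for four separate instructions working on four registers.

```c
#include <stdint.h>

/* Adds 'constant' to a 32-bit 'value' by subtracting the bytes of the
 * negated constant, carrying the borrow from one byte to the next. */
uint32_t add_const_via_sub(uint32_t value, uint32_t constant)
{
    uint32_t negated = (uint32_t)0 - constant; /* two's complement of the constant */
    uint32_t result = 0;
    unsigned borrow = 0;

    for (int i = 0; i < 4; i++) {
        unsigned a = (unsigned)((value >> (8 * i)) & 0xFF);
        unsigned k = (unsigned)((negated >> (8 * i)) & 0xFF); /* byte i of -constant */
        unsigned diff = a - k - borrow;   /* plain subtract for byte 0, with borrow after */
        borrow = (a < k + borrow) ? 1u : 0u;
        result |= (uint32_t)(diff & 0xFF) << (8 * i);
    }
    return result;
}
```

For instance, add_const_via_sub(250, 100) gives 350: the low byte becomes 0x5E without a borrow, and the borrow produced in the second byte ripples through the remaining bytes.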
  
-Second trick: as I am only starting my journey into AVR-land, I missed the IN and OUT instructions, and instead was addressing DDRB, DDRD, PORTB, PORTD as memory locations. The C compiler uses IN and OUT, which again saves a few bytes.+I also completely overlooked the IN and OUT instructions, and instead was using memory addressing to access DDRB, DDRD, PORTB and PORTD. The C compiler instead uses IN and OUT, which again saved a few bytes.
  
-Then, while I was looking over the instruction set I noticed a few more instructions that allowed me to save some more bytes: STS and LDS, which allow addressing via the Y register but with an offset applied.+While I was looking over the instruction set I also noticed a few more instructions that allowed me to save some more bytes: I found STD and LDD, which allow addressing via the Y register but with an offset applied.
  
 So, I finally came up with this: So, I finally came up with this:
Line 712: Line 723:
 // Led 8 = pin 7 // Led 8 = pin 7
 // //
 +
 #define yl r28 #define yl r28
 #define yh r29 #define yh r29
  
-// const long delay = 1000;+// const long delay = 100;
 #define delay 100 // ms #define delay 100 // ms
  
Line 759: Line 771:
  
   // Rotate left & right; bit travels up then down through r17 and r16   // Rotate left & right; bit travels up then down through r17 and r16
-  // Carry is already clear, no need for CLC+  // Carry is already clear because brcs was not taken, so no need for CLC
   rol r17   rol r17
   sbrc r16,0   sbrc r16,0
Line 765: Line 777:
   ror r16   ror r16
  
 +  // Update the memory storage with the new rotated values
   std y+4, r17   std y+4, r17
   std y+5, r16   std y+5, r16
  
   // Combine left and right shifter   // Combine left and right shifter
-  or r17,r16+  or r17, r16
   ​   ​
   // 6 high bits of port D   // 6 high bits of port D
Line 780: Line 793:
   out _SFR_IO_ADDR(PORTB),​ r17   out _SFR_IO_ADDR(PORTB),​ r17
   ​   ​
-  // Add delay (subtract negative delay because there is no addi)+  // Add delay (subtract negative delay because there is no addi/adci)
   // to result of call to millis()   // to result of call to millis()
   subi r22, lo8(-delay)   subi r22, lo8(-delay)
Line 808: Line 821:
 </​code>​ </​code>​
  
-This last version is 590 bytes - 20 bytes less than the C compiler. Conclusion: the C compiler is doing a pretty good job, and the extra effort of writing things in assembler is probably rarely worth it. Subtracting the 462 bytes for the runtime, we're looking at saving a little over 10% in size.+This last version is 590 bytes - 20 bytes less than the C compiler. ​ 
 + 
 +Conclusion: the C compiler is doing a pretty good job, and the extra effort of writing things in assembler is probably rarely worth it. Subtracting the 462 bytes for the runtime overhead from both compiled sketch sizes, we're looking at saving a little over 10% in size for this particular exercise.
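
The arithmetic behind that percentage can be written out as a few lines of C, using the sizes quoted above (610 bytes for the C version, 590 for the assembler version, 462 for the runtime). The function name relative_saving_percent is made up for this illustration.

```c
/* Percentage saved, measured against the C version's own code
 * (total size minus the runtime overhead shared by both sketches). */
double relative_saving_percent(int c_size, int asm_size, int runtime_overhead)
{
    int c_payload = c_size - runtime_overhead; /* 610 - 462 = 148 bytes */
    int saved = c_size - asm_size;             /* 610 - 590 = 20 bytes  */
    return 100.0 * (double)saved / (double)c_payload;
}
```

With the numbers from this page, relative_saving_percent(610, 590, 462) works out to 20 / 148, which is roughly 13.5%.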
  
 Nevertheless,​ sometimes 20 bytes can be the difference between 'it fits' and 'it does not fit', and, more importantly,​ I love tinkering, so I'll do some more assembler, just because I can! Nevertheless,​ sometimes 20 bytes can be the difference between 'it fits' and 'it does not fit', and, more importantly,​ I love tinkering, so I'll do some more assembler, just because I can!
 +
 +Now, just for kicks, a tiny change - change the last listing so it ends as follows:
 +
 +<​code>​
 +...
 +shifters:
 +.byte 1
 +.byte 0x80
 +</​code>​
 +
 +Run it again. Is that cool, or what?
 +
 +===== Feedback from readers =====
 +
 +Hi Kris!
 +
 +Thank you very much for your Arduino asm introduction. It helped me a lot getting started.
 +
 +But then I tried to modify version 1.5.2 the same way, because according to the developer of the arduino eclipse plugin, this is the latest version compatible with the plugin. And I found out that you need to modify the file .../​hardware/​arduino/​avr/​platform.txt,​ too. So I thought, maybe you want to mention this in your introduction.
 +
 +in the "AVR compile patterns"​ section, adding the following solved the issue:
 +
 +## Compile S files
 +recipe.S.o.pattern="​{compiler.path}{compiler.c.cmd}"​ {compiler.S.flags} -mmcu={build.mcu} -DF_CPU={build.f_cpu} -D{software}={runtime.ide.version} {build.extra_flags} {includes} "​{source_file}"​ -o "​{object_file}"​
 +
 +Regards,
 +
 +Ralf
  
 ===== Stuff collected on the Internet ===== ===== Stuff collected on the Internet =====
smallcpus.1355030625.txt.gz · Last modified: 2012/12/09 18:23 by kris