Hello reader!
I’m happy to see you here, and I hope you’ll find something for yourself here. This will be a series of posts about making a brand-new programming language and its compiler. I’ve been doing that for quite a while, so a few milestones have passed. But today, I want to share a new achievement.
Today I managed to compile a function call. And not even that, but the function is recursive.
The test program is simple (in the repo)
package main
func main(a int) (r int) {
return fib(a)
}
func fib(n int) (r int) {
if n <= 1 {
return n
}
return fib(n-1) + fib(n-2)
}
It looks like a Go code. It is not a coincidence. I’m coding the compiler in Go, and I use its stdlib parser instead of my own. My priority, for now, is the compiler backend (ir -> asm). The middle part (ast -> ir) goes next. And the frontend (text -> ast) is the last in the queue.
I’ve managed register allocation before. Which was a hard fight, but I’ll tell you about it next time. The algorithm assigns a limited set of registers to subexpressions calculated in the function.
So now I needed to deal with the fact the function call is an unusual expression. It requires fixed registers assigned to function arguments and their results.
After a few failed ideas I decided to reserve argument registers from assigning to any expression which is alive at the function call instruction. Now I could put arguments and take results from the callee. But that wasn’t the end.
The compiled program didn’t succeed on any input more than 1. The problem was that I passed arguments, but the callee function (the same one) was using the same registers. So we loose the intermediate results after called function returned.
I can’t assign the registers and hope the same function called again will not corrupt them. That is where the Stack comes in. Each function down the recursion call needs its own personal storage.
I’m using the most straightforward approach I imagined I could implement. The first eight registers are dedicated to arguments and returning values. And it wasn’t obvious to me that I may not use them all during the function call since the callee function may call any other and corrupt the 5th argument register event if I passed only two arguments. So I reserved all of them in register allocation.
The rest of the registers are callee saved because I can do that only once in the function prolog and restore in the epilogue. After the register allocation step, before code generation, I find the highest used register and save all from the 8th to the highest to the stack. A similar process is done in the epilogue but in reverse.
Now the function works and makes me so much happy.
This is the assembler text my compiler generates.
.global _fib
.align 4
_fib:
STP FP, LR, [SP, #-16]!
MOV FP, SP
STP X8, X9, [SP, #-16]!
// reg 0 args 1 expr 7
MOV X1, #1 // expr 9
CMP X0, X1 // expr 10
B.LE L.3
B L.2.fix.3.1
L.3:
B L.1
L.2.fix.3.1:
// permutate [[9 0]]
MOV X9, X0
B L.2
L.2:
MOV X0, #1 // expr 16
SUB X0, X9, X0 // expr 17
BL _fib // func call 17 (X0) -> 18 (X8)
MOV X8, X0 // func res 0 fix
MOV X0, #2 // expr 19
SUB X0, X9, X0 // expr 20
BL _fib // func call 20 (X0) -> 21 (X0)
ADD X0, X8, X0 // expr 22
B L.1
L.1:
// reg 0 res 0 expr 24
LDP X8, X9, [SP], #16
LDP FP, LR, [SP], #16
RET
Which I then compile by llvm assembler. And this is what the result looks like.
0000000100003f60 <_fib>:
100003f60: fd 7b bf a9 stp x29, x30, [sp, #-16]!
100003f64: fd 03 00 91 mov x29, sp
100003f68: e8 27 bf a9 stp x8, x9, [sp, #-16]!
100003f6c: 21 00 80 d2 mov x1, #1
100003f70: 1f 00 01 eb cmp x0, x1
100003f74: 4d 00 00 54 b.le 0x100003f7c <_fib+0x1c>
100003f78: 02 00 00 14 b 0x100003f80 <_fib+0x20>
100003f7c: 0c 00 00 14 b 0x100003fac <_fib+0x4c>
100003f80: e9 03 00 aa mov x9, x0
100003f84: 01 00 00 14 b 0x100003f88 <_fib+0x28>
100003f88: 20 00 80 d2 mov x0, #1
100003f8c: 20 01 00 cb sub x0, x9, x0
100003f90: f4 ff ff 97 bl 0x100003f60 <_fib>
100003f94: e8 03 00 aa mov x8, x0
100003f98: 40 00 80 d2 mov x0, #2
100003f9c: 20 01 00 cb sub x0, x9, x0
100003fa0: f0 ff ff 97 bl 0x100003f60 <_fib>
100003fa4: 00 01 00 8b add x0, x8, x0
100003fa8: 01 00 00 14 b 0x100003fac <_fib+0x4c>
100003fac: e8 27 c1 a8 ldp x8, x9, [sp], #16
100003fb0: fd 7b c1 a8 ldp x29, x30, [sp], #16
100003fb4: c0 03 5f d6 ret
It’s funny; it’s not much worse than the clang output of the same program written in C.
0000000100003f5c <_fib>:
100003f5c: f4 4f be a9 stp x20, x19, [sp, #-32]!
100003f60: fd 7b 01 a9 stp x29, x30, [sp, #16]
100003f64: fd 43 00 91 add x29, sp, #16
100003f68: 1f 08 00 71 cmp w0, #2
100003f6c: 6a 00 00 54 b.ge 0x100003f78 <_fib+0x1c>
100003f70: 13 00 80 52 mov w19, #0
100003f74: 0b 00 00 14 b 0x100003fa0 <_fib+0x44>
100003f78: 13 00 80 52 mov w19, #0
100003f7c: f4 03 00 aa mov x20, x0
100003f80: 80 06 00 51 sub w0, w20, #1
100003f84: f6 ff ff 97 bl 0x100003f5c <_fib>
100003f88: e8 03 00 aa mov x8, x0
100003f8c: 80 0a 00 51 sub w0, w20, #2
100003f90: 13 01 13 0b add w19, w8, w19
100003f94: 9f 0e 00 71 cmp w20, #3
100003f98: f4 03 00 aa mov x20, x0
100003f9c: 2c ff ff 54 b.gt 0x100003f80 <_fib+0x24>
100003fa0: 00 00 13 0b add w0, w0, w19
100003fa4: fd 7b 41 a9 ldp x29, x30, [sp, #16]
100003fa8: f4 4f c2 a8 ldp x20, x19, [sp], #32
100003fac: c0 03 5f d6 ret
Clang unwinded the recursive tail call, but it uses the same 32 bytes of the stack, and the code size is only one instruction shorter.