SLS Lecture 16 : Caches : Constructing them from first principles
%run -i ../python/common.py
UC_SKIPTERMS=True
%run -i ../python/ln_preamble.py
16. SLS Lecture 16 : Caches : Constructing them from first principles#
# setup for sumit examples
appdir=os.getenv('HOME')
appdir=appdir + "/caches"
#print(movdir)
output=runTermCmd("[[ -d " + appdir + " ]] && rm -rf "+ appdir +
";mkdir " + appdir +
";cp ../src/SOL6502/Makefile ../src/SOL6502/sum.s ../src/SOL6502/sum.txt ../src/SOL6502/SOL6502.cfg " + appdir)
display(Markdown('''
- create a directory `mkdir cache; cd cache`
- copy examples
- add a `Makefile` to automate assembling and linking
- we are going to run the commands by hand this time to highlight the details
'''))
TermShellCmd("ls " + appdir)
- create a directory: `mkdir cache; cd cache`
- copy examples
- add a `Makefile` to automate assembling and linking
- we are going to run the commands by hand this time to highlight the details
16.1. Remember this picture#
16.1.1. What is the physical Reality?#
16.2. Let's build our Computer#
16.3. A nice Simple CPU - Only 6 registers and no MMU#
16.4. Simple Main Memory#
Physical Memory : \(2^{16} = 65536 \ \text{Bytes} = 64 \ \text{KiB}\)
NO Virtual Memory!
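As a minimal sketch (Python is chosen here only because the notebook itself is Python), this flat physical memory can be modeled as a 64 KiB byte array indexed directly by a 16-bit address. The names `MEM_SIZE`, `mem`, `read`, and `write` are illustrative assumptions, not part of the lecture code.

```python
MEM_SIZE = 2 ** 16              # 65536 bytes = 64 KiB of physical memory

# One byte per address. With no MMU, an address is used as-is:
# there is no translation step between the CPU and memory.
mem = bytearray(MEM_SIZE)


def read(addr: int) -> int:
    """Return the byte stored at a 16-bit physical address."""
    return mem[addr & 0xFFFF]   # keep only the low 16 address bits


def write(addr: int, value: int) -> None:
    """Store one byte at a 16-bit physical address."""
    mem[addr & 0xFFFF] = value & 0xFF


write(0xE000, 10)
print(hex(MEM_SIZE - 1), read(0xE000))
```

The highest valid address is `0xFFFF`, which is why two bytes are enough to name any location in this machine.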
16.5. The Code#
Code we have come to love: A 6502 version of sumit!
display(Markdown(FileCodeBox(
file="../src/SOL6502/sum.s",
lang="gas",
title="<b>CODE: 6502 asm - sum.s",
h="100%",
w="107em"
)))
CODE: 6502 asm - sum.s
;; We place things in memory locations by hand
;; Fill memory 0x0000 - 0xE000 with zeros
.repeat $E000
.byte $00
.endrep
;; Put our data at 0xE000
.byte 10 ; 0xE000 Array length
.byte 1 ; Array[0] = 1
.byte 2 ; Array[1] = 2
.byte 3 ; Array[2] = 3
.byte 4 ; Array[3] = 4
.byte 5 ; Array[4] = 5
.byte 6 ; Array[5] = 6
.byte 7 ; Array[6] = 7
.byte 8 ; Array[7] = 8
.byte 9 ; Array[8] = 9
.byte 10 ; Array[9] = 10
;; Fill memory from end of data to 0xF000 with zero
.repeat $1000-11
.byte $00
.endrep
;; Put our code at 0xF000
;; Set address to F000 (this is where our code will live)
.ORG $F000
LDA #0 ; load A register with 0
LDX #0 ; load X register with 0
LOOP:
CPX $E000 ; compare value in X register with value at E000 (length of Array)
BEQ DONE ; if equal then jump to done
ADC $E001,X ; add value in memory at M[0xE001 + X register] : A = A + Array[X]
INX ; X=X+1
JMP LOOP
DONE:
BRK
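For readers more comfortable with a high-level language, the loop in sum.s can be sketched in Python as below. This is only an illustrative rendition: the dictionary `mem` is a hypothetical stand-in for the bytes the program touches, and the carry flag is ignored (it is clear whenever `ADC` runs here, because the preceding `CPX` clears it while X is less than the length).

```python
# Hypothetical stand-in for memory: only the bytes the loop touches.
mem = {0xE000: 10}                      # 0xE000 holds the array length
for i in range(10):
    mem[0xE001 + i] = i + 1             # Array[i] = i + 1

a = 0                                   # LDA #0
x = 0                                   # LDX #0
while x != mem[0xE000]:                 # CPX $E000 / BEQ DONE
    a = (a + mem[0xE001 + x]) & 0xFF    # ADC $E001,X (A is an 8-bit register)
    x = (x + 1) & 0xFF                  # INX
    # JMP LOOP is the jump back to the top of the while loop

print(a)                                # 55
```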
16.6. Our old friends : Assembler and linker#
ca65 - the assembler: different syntax, but the same idea
ld65 - the linker: here we use it only for symbol resolution, since we are placing things in memory ourselves
TermShellCmd("make sum.img", cwd=appdir, prompt='')
display(Markdown(FileCodeBox(
file= appdir + "/sum.o.lst",
lang="gas",
title="<b>CODE: 6502 asm listing file",
h="100%",
w="107em"
)))
CODE: 6502 asm listing file
ca65 V2.18 - Ubuntu 2.18-1
Main file : sum.s
Current file: sum.s
000000r 1 ;; We place things in memory locations by hand
000000r 1 ;; Fill memory 0x0000 - 0xE000 with zeros
000000r 1 00 00 00 00 .repeat $E000
000004r 1 00 00 00 00
000008r 1 00 00 00 00
00E000r 1 .byte $00
00E000r 1 .endrep
00E000r 1
00E000r 1 ;; Put our data at 0xE000
00E000r 1 0A .byte 10 ; 0xE000 Array length
00E001r 1 01 .byte 1 ; Array[0] = 1
00E002r 1 02 .byte 2 ; Array[1] = 2
00E003r 1 03 .byte 3 ; Array[2] = 3
00E004r 1 04 .byte 4 ; Array[3] = 4
00E005r 1 05 .byte 5 ; Array[4] = 5
00E006r 1 06 .byte 6 ; Array[5] = 6
00E007r 1 07 .byte 7 ; Array[6] = 7
00E008r 1 08 .byte 8 ; Array[7] = 8
00E009r 1 09 .byte 9 ; Array[8] = 9
00E00Ar 1 0A .byte 10 ; Array[9] = 10
00E00Br 1
00E00Br 1 ;; Fill memory from end of data to 0xF000 with zero
00E00Br 1 00 00 00 00 .repeat $1000-11
00E00Fr 1 00 00 00 00
00E013r 1 00 00 00 00
00F000r 1 .byte $00
00F000r 1 .endrep
00F000r 1
00F000r 1 ;; Put our code at 0xF000
00F000r 1 ;; Set address to F000 (this is where our code will live)
00F000r 1 .ORG $F000
00F000 1 A9 00 LDA #0 ; load A register with 0
00F002 1 A2 00 LDX #0 ; load X register with 0
00F004 1 LOOP:
00F004 1 EC 00 E0 CPX $E000 ; compare value in X register with value at E000 (length of Array)
00F007 1 F0 07 BEQ DONE ; if equal then jump to done
00F009 1 7D 01 E0 ADC $E001,X ; add value in memory at M[0xE001 + X register] : A = A + Array[X]
00F00C 1 E8 INX ; X=X+1
00F00D 1 4C 04 F0 JMP LOOP
00F010 1 DONE:
00F010 1 00 BRK
00F010 1
16.6.1. The “binary”: A Simple Image file#
The linker produces a simple binary image file that is an exact copy of the memory contents to load.
TermShellCmd("od -Ax -t x1 sum.img", cwd=appdir, prompt='$ ')
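To make the "exact copy of memory" idea concrete, here is a hedged Python sketch that rebuilds the layout described in sum.s: zeros, then the data at 0xE000, then zeros, then the code at 0xF000. The code bytes are copied from the listing file above. The real sum.img is produced by ld65 according to SOL6502.cfg, which is not shown here, so the total size of the real file may differ from this sketch.

```python
image = bytearray()
image += bytes(0xE000)                      # .repeat $E000 of .byte $00
image += bytes([10] + list(range(1, 11)))   # length byte, then Array[0..9]
image += bytes(0x1000 - 11)                 # zero fill up to 0xF000

# Machine code bytes as shown in the ca65 listing (0xF000 .. 0xF010).
code = bytes.fromhex("A9 00 A2 00 EC 00 E0 F0 07 7D 01 E0 E8 4C 04 F0 00")
image += code


def dump(start: int, count: int) -> str:
    """Format `count` bytes starting at offset `start` as hex."""
    return " ".join(f"{b:02x}" for b in image[start:start + count])


print(f"{0xE000:06x}", dump(0xE000, 11))
print(f"{0xF000:06x}", dump(0xF000, len(code)))
```

Because the file offset of each byte equals its memory address, loading the image amounts to copying the file into memory starting at address 0.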
16.6.2. Ok Now What?#
display(Markdown(FileCodeBox(
file="../src/SOL6502/sum.txt",
lang="gas",
title="<b>CODE: 6502 Loop",
h="100%",
w="107em"
)))
CODE: 6502 Loop
Fetch:
  a) Buses : Read Addr = PC -> Value
     IR = Value
Decode:
  a) lookup IR and tell execute what to do
     A9 : LDA IMM
     A2 : LDX IMM
     EC : CPX ABS
     F0 : BEQ PC Rel
     7D : ADC ABS,X
     E8 : INX
     4C : JMP ABS
     00 : BRK
Execute:
  LDA IMM :
    a) Buses : Read Addr = PC + 1 -> Value
    b) A = Value
    c) PC = PC + 2
  LDX IMM :
    a) Buses : Read Addr = PC + 1 -> Value
    b) X = Value
    c) PC = PC + 2
  CPX ABS :
    a) Buses : Read Addr = PC + 1 -> Value
    b) TempAddr Low Byte = Value
    c) Buses : Read Addr = PC + 2 -> Value
    d) TempAddr High Byte = Value
    e) Buses : Read Addr = TempAddr -> Value
    f) Compare : TempVal = X - Value
    g) Set P flags : if TempVal == 0 then P.Z = 1 else P.Z = 0
    h) PC = PC + 3
  BEQ PC Rel :
    a) if P.Z == 1 then
    b)   Buses : Read Addr = PC + 1 -> Value
    c)   PC = PC + 2 + Value (Value is a signed 8-bit offset from the next instruction)
    d) else
    e)   PC = PC + 2
  ADC ABS,X :
    a) Buses : Read Addr = PC + 1 -> Value
    b) TempAddr Low Byte = Value
    c) Buses : Read Addr = PC + 2 -> Value
    d) TempAddr High Byte = Value
    e) Buses : Read Addr = TempAddr + X -> Value
    f) Add : A = A + Value
    g) PC = PC + 3
  INX :
    a) X = X + 1
    b) PC = PC + 1
  JMP ABS :
    a) Buses : Read Addr = PC + 1 -> Value
    b) TempAddr Low Byte = Value
    c) Buses : Read Addr = PC + 2 -> Value
    d) TempAddr High Byte = Value
    e) PC = TempAddr
  BRK :
    a) STOP!!!
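The fetch, decode, and execute steps above can be sketched as a tiny Python interpreter for just these eight opcodes. This is a teaching sketch under stated assumptions, not a 6502 emulator: only the Z flag is modeled, the carry is ignored in `ADC`, `BRK` simply stops, and the branch offset is applied relative to the instruction after `BEQ`, as on a real 6502. The function names `build_memory` and `run` are hypothetical. Every bus read goes through `read`, so the sketch also counts how often the program touches memory.

```python
def build_memory() -> bytearray:
    """Lay out data at 0xE000 and code at 0xF000, as in sum.s."""
    mem = bytearray(2 ** 16)
    mem[0xE000] = 10                              # array length
    for i in range(10):
        mem[0xE001 + i] = i + 1                   # Array[i] = i + 1
    code = bytes.fromhex("A9 00 A2 00 EC 00 E0 F0 07 7D 01 E0 E8 4C 04 F0 00")
    mem[0xF000:0xF000 + len(code)] = code
    return mem


def run(mem: bytearray, pc: int = 0xF000):
    """Run until BRK; return (A, X, instructions executed, memory reads)."""
    a = x = z = 0
    reads = 0
    instrs = 0

    def read(addr: int) -> int:
        nonlocal reads
        reads += 1                                # every bus read is counted
        return mem[addr & 0xFFFF]

    def read_abs(at: int) -> int:
        lo = read(at)                             # TempAddr low byte
        hi = read(at + 1)                         # TempAddr high byte
        return (hi << 8) | lo

    while True:
        ir = read(pc)                             # Fetch: IR = M[PC]
        instrs += 1
        if ir == 0xA9:                            # LDA IMM
            a = read(pc + 1)
            pc += 2
        elif ir == 0xA2:                          # LDX IMM
            x = read(pc + 1)
            pc += 2
        elif ir == 0xEC:                          # CPX ABS
            value = read(read_abs(pc + 1))
            z = 1 if ((x - value) & 0xFF) == 0 else 0
            pc += 3
        elif ir == 0xF0:                          # BEQ PC Rel
            if z == 1:
                offset = read(pc + 1)
                if offset >= 0x80:                # sign-extend the 8-bit offset
                    offset -= 0x100
                pc = pc + 2 + offset
            else:
                pc += 2
        elif ir == 0x7D:                          # ADC ABS,X (carry ignored)
            value = read(read_abs(pc + 1) + x)
            a = (a + value) & 0xFF
            pc += 3
        elif ir == 0xE8:                          # INX
            x = (x + 1) & 0xFF
            pc += 1
        elif ir == 0x4C:                          # JMP ABS
            pc = read_abs(pc + 1)
        elif ir == 0x00:                          # BRK: stop
            break
        else:
            raise ValueError(f"unknown opcode {ir:#04x} at {pc:#06x}")
    return a, x, instrs, reads


a, x, instrs, reads = run(build_memory())
print(f"A={a} X={x} instructions={instrs} memory reads={reads}")
```

Under this model the run ends with A = 55 after 55 instructions and 141 memory reads, so even this tiny program performs more than two memory reads per instruction.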
16.7. Processor Caches#
Modern CPUs are very fast, while memory is “far away” and relatively slow.
Notice that a lot of what a program does is access memory: every instruction must be fetched from memory, and many also read operands from it.
Since memory is “slow”, most of our CPU time is spent “IDLE”, waiting for memory!
Caches are critical to achieving high performance on a modern CPU.
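A rough back-of-the-envelope sketch of the idle-time claim follows. The instruction and read counts (55 and 141) come from walking the sumit program through the step-by-step model in section 16.6.2 by hand. The two latencies are purely hypothetical round numbers chosen for illustration, not measurements of any real machine, and the function name `memory_wait_fraction` is an assumption of this sketch.

```python
def memory_wait_fraction(instructions: int, mem_reads: int,
                         cpu_cycles_per_instr: int,
                         mem_cycles_per_read: int) -> float:
    """Fraction of total cycles spent waiting on memory in a simple serial model."""
    busy = instructions * cpu_cycles_per_instr    # cycles doing CPU work
    waiting = mem_reads * mem_cycles_per_read     # cycles stalled on memory
    return waiting / (busy + waiting)


# Hypothetical latencies: 1 cycle of CPU work per instruction,
# 100 cycles for each read that has to go all the way to memory.
frac = memory_wait_fraction(instructions=55, mem_reads=141,
                            cpu_cycles_per_instr=1, mem_cycles_per_read=100)
print(f"{frac:.1%}")                              # 99.6%
```

With these assumed numbers almost all of the time goes to waiting. Lowering `mem_cycles_per_read` for most accesses is exactly what a cache is meant to achieve.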