%run -i ../python/common.py
UC_SKIPTERMS=True
%run -i ../python/ln_preamble.py

16. SLS Lecture 16 : Caches : Constructing them from first principles

# setup for sumit examples
appdir=os.getenv('HOME')
appdir=appdir + "/caches"
#print(movdir)
output=runTermCmd("[[ -d " + appdir + " ]] &&  rm -rf "+ appdir + 
             ";mkdir " + appdir + 
             ";cp ../src/SOL6502/Makefile ../src/SOL6502/sum.s ../src/SOL6502/sum.txt ../src/SOL6502/SOL6502.cfg " + appdir)

display(Markdown('''
- create a directory `mkdir cache; cd cache`
- copy examples
- add a `Makefile` to automate assembling and linking
    - we are going to run the commands by hand this time to highlight the details
'''))
TermShellCmd("ls " + appdir)
  • create a directory mkdir cache; cd cache

  • copy examples

  • add a Makefile to automate assembling and linking

    • we are going to run the commands by hand this time to highlight the details

16.1. Remember this picture

../_images/ASSEMBLY-VNA-THECPU.017.png

16.1.1. What is the physical reality?

../_images/motherboard.png

16.2. Let's build our computer

../_images/6502mb.png

16.3. A nice Simple CPU - Only 6 registers and no MMU

../_images/6502Registers.png

16.4. Simple Main Memory

  • Physical Memory : \(2^{16} = 65536 \ \text{Bytes} = 64 \ \text{KiB (kibibytes)} \)

  • NO Virtual Memory!
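
A minimal sketch of this memory model, assuming we represent main memory as a flat Python `bytearray`. The names `memory`, `read`, and `write` are hypothetical helpers for illustration and are not part of the course code. With no virtual memory, every 16-bit address the CPU issues is directly an index into this array.

```python
# Hypothetical model of the machine's main memory: one flat array of bytes.
ADDRESS_BITS = 16
MEMORY_SIZE = 2 ** ADDRESS_BITS      # 65536 bytes = 64 KiB

memory = bytearray(MEMORY_SIZE)      # every byte starts as 0x00


def read(addr):
    """Return the byte stored at a 16-bit physical address."""
    return memory[addr & 0xFFFF]


def write(addr, value):
    """Store one byte at a 16-bit physical address."""
    memory[addr & 0xFFFF] = value & 0xFF


write(0xE000, 10)                    # e.g. an array length placed at 0xE000
print(MEMORY_SIZE, MEMORY_SIZE // 1024, read(0xE000))
```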

16.5. The Code

Code we have come to love: A 6502 version of sumit!

display(Markdown(FileCodeBox(
    file="../src/SOL6502/sum.s", 
    lang="gas", 
    title="<b>CODE: 6502 asm - sum.s",
    h="100%", 
    w="107em"
)))

CODE: 6502 asm - sum.s

	;; We place things in memory locations by hand 
	;; Fill memory 0x0000 - 0xE000 with zeros
	.repeat $E000
	.byte $00
	.endrep

	;; Put our data at 0xE000 
	.byte 10      		; 0xE000 Array length
	.byte 1                 ; Array[0] = 1
	.byte 2			; Array[1] = 2
	.byte 3			; Array[2] = 3
	.byte 4			; Array[3] = 4
	.byte 5			; Array[4] = 5
	.byte 6			; Array[5] = 6
	.byte 7			; Array[6] = 7
	.byte 8			; Array[7] = 8
	.byte 9			; Array[8] = 9
	.byte 10		; Array[9] = 10

	;; Fill memory from end of data to 0xF000 with zero
	.repeat $1000-11
	.byte $00
	.endrep
	
	;; Put our code at 0xF000
	;; Set address to F000 (this is where our code will live)
	.ORG $F000    		
	LDA #0        ; load A register with 0
	LDX #0	      ; load X register with 0
LOOP:
	CPX $E000     ; compare value in X register with value at E000 (length of Array)
	BEQ DONE      ; if equal then jump to done
	ADC $E001,X   ; add value in memory  at M[0xE001 + X register] : A = A + Array[X]
	INX           ; X=X+1
	JMP LOOP      
DONE:
	BRK    
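
For comparison, here is a rough Python rendering of what the assembly loop computes. It is only a sketch: the variables `a` and `x` stand in for the A and X registers, and `memory` is an assumed flat byte array holding the same data at 0xE000.

```python
# Same data layout as sum.s: length at 0xE000, array elements from 0xE001.
memory = bytearray(2 ** 16)
memory[0xE000] = 10
memory[0xE001:0xE00B] = bytes(range(1, 11))

a = 0                                    # LDA #0
x = 0                                    # LDX #0
while x != memory[0xE000]:               # CPX $E000 / BEQ DONE
    a = (a + memory[0xE001 + x]) & 0xFF  # ADC $E001,X (A is 8 bits wide)
    x += 1                               # INX, then JMP LOOP
print(a)                                 # prints 55
```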

16.6. Our old friends : Assembler and linker

  • ca65 - the assembler: different syntax, but the same idea as before

  • ld65 - the linker: used here only for symbol resolution, since we take care of placing things in memory ourselves

TermShellCmd("make sum.img", cwd=appdir, prompt='')
display(Markdown(FileCodeBox(
    file= appdir + "/sum.o.lst", 
    lang="gas", 
    title="<b>CODE: 6502 asm listing file",
    h="100%", 
    w="107em"
)))

CODE: 6502 asm listing file

ca65 V2.18 - Ubuntu 2.18-1
Main file   : sum.s
Current file: sum.s

000000r 1               	;; We place things in memory locations by hand
000000r 1               	;; Fill memory 0x0000 - 0xE000 with zeros
000000r 1  00 00 00 00  	.repeat $E000
000004r 1  00 00 00 00  
000008r 1  00 00 00 00  
00E000r 1               	.byte $00
00E000r 1               	.endrep
00E000r 1               
00E000r 1               	;; Put our data at 0xE000
00E000r 1  0A           	.byte 10      		; 0xE000 Array length
00E001r 1  01           	.byte 1                 ; Array[0] = 1
00E002r 1  02           	.byte 2			; Array[1] = 2
00E003r 1  03           	.byte 3			; Array[2] = 3
00E004r 1  04           	.byte 4			; Array[3] = 4
00E005r 1  05           	.byte 5			; Array[4] = 5
00E006r 1  06           	.byte 6			; Array[5] = 6
00E007r 1  07           	.byte 7			; Array[6] = 7
00E008r 1  08           	.byte 8			; Array[7] = 8
00E009r 1  09           	.byte 9			; Array[8] = 9
00E00Ar 1  0A           	.byte 10		; Array[9] = 10
00E00Br 1               
00E00Br 1               	;; Fill memory from end of data to 0xF000 with zero
00E00Br 1  00 00 00 00  	.repeat $1000-11
00E00Fr 1  00 00 00 00  
00E013r 1  00 00 00 00  
00F000r 1               	.byte $00
00F000r 1               	.endrep
00F000r 1               
00F000r 1               	;; Put our code at 0xF000
00F000r 1               	;; Set address to F000 (this is where our code will live)
00F000r 1               	.ORG $F000
00F000  1  A9 00        	LDA #0        ; load A register with 0
00F002  1  A2 00        	LDX #0	      ; load X register with 0
00F004  1               LOOP:
00F004  1  EC 00 E0     	CPX $E000     ; compare value in X register with value at E000 (length of Array)
00F007  1  F0 07        	BEQ DONE      ; if equal then jump to done
00F009  1  7D 01 E0     	ADC $E001,X   ; add value in memory  at M[0xE001 + X register] : A = A + Array[X]
00F00C  1  E8           	INX           ; X=X+1
00F00D  1  4C 04 F0     	JMP LOOP
00F010  1               DONE:
00F010  1  00           	BRK
00F010  1               

16.6.1. The “binary”: A Simple Image file

The linker produces a simple binary image file: an exact, byte-for-byte copy of the memory contents to load.

TermShellCmd("od -Ax -t x1  sum.img", cwd=appdir, prompt='$ ')
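
To make "exact copy of memory" concrete, the sketch below rebuilds the image in Python from the bytes shown in the listing and "loads" it by copying it into memory starting at address 0. The layout code and the `dump` helper are hypothetical. Whether the real `sum.img` is padded beyond the last code byte depends on the linker configuration, which is not shown here, so this sketch assumes no padding.

```python
# Bytes taken from the listing file; everything else is an illustrative sketch.
DATA = bytes([10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])                             # at 0xE000
CODE = bytes.fromhex("A9 00 A2 00 EC 00 E0 F0 07 7D 01 E0 E8 4C 04 F0 00")    # at 0xF000

image = bytearray(0xF000)                    # zero fill for 0x0000-0xEFFF
image[0xE000:0xE000 + len(DATA)] = DATA      # data overwrites part of the fill
image += CODE                                # code follows at offset 0xF000

memory = bytearray(2 ** 16)
memory[:len(image)] = image                  # loading: file offset N -> address N


def dump(buf, start, count):
    """Format bytes loosely in the style of `od -Ax -t x1`: hex offset, then hex bytes."""
    return f"{start:06x} " + " ".join(f"{b:02x}" for b in buf[start:start + count])


print(dump(memory, 0xE000, 11))
print(dump(memory, 0xF000, 17))
```

Because file offsets equal memory addresses, no header parsing or relocation is needed, unlike the object and executable formats used on larger systems.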

16.6.2. Ok Now What?

display(Markdown(FileCodeBox(
    file="../src/SOL6502/sum.txt", 
    lang="gas", 
    title="<b>CODE: 6502 Loop",
    h="100%", 
    w="107em"
)))

CODE: 6502 Loop

Fetch:
a) Buses : Read Addr = PC -> Value
           IR = Value
	   
Decode:
a) lookup IR and tell execute what to do
A9 : LDA IMM
A2 : LDX IMM
EC : CPX ABS
F0 : BEQ PC Rel
7D : ADC ABS,X
E8 : INX
4C : JMP ABS
00 : BRK

Execute:
LDA IMM :
a) Buses : Read Addr = PC + 1 -> Value
b) A = Value
c) PC = PC + 2

LDX IMM :
a) Buses : Read Addr = PC + 1 -> Value
b) X = Value
c) PC = PC + 2

CPX ABS :
a) Buses : Read Addr = PC + 1 -> Value
b) TempAddr Low Byte = Value
c) Buses : Read Addr = PC + 2 -> Value
d) TempAddr High Byte = Value
e) Buses : Read Addr = TempAddr -> Value
f) Compare : TempVal = X - Value
g) Set P flags : if TempVal == 0 then P.Z = 1 else P.Z = 0
h) PC = PC + 3

BEQ PC Rel :
a) if P.Z == 1 then
   b) Buses : Read Addr = PC + 1 -> Value (signed 8-bit offset)
   c) PC = PC + 2 + Value (offset is relative to the next instruction)
d) else
   e) PC = PC + 2

ADC ABS,X :
a) Buses : Read Addr = PC + 1 -> Value
b) TempAddr Low Byte = Value
c) Buses : Read Addr = PC + 2 -> Value
d) TempAddr High Byte = Value
e) Buses : Read Addr = TempAddr + X -> Value
f) Add : A = A + Value
g) PC = PC + 3

INX :        
a) X = X + 1				      
b) PC = PC + 1				      

JMP ABS :
a) Buses : Read Addr = PC + 1 -> Value	      
b) TempAddr Low Byte = Value		      
c) Buses : Read Addr = PC + 2 -> Value	      
d) TempAddr High Byte = Value		      
e) PC = TempAddr			      
                                              
BRK:                                     
a) STOP!!!                                    
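
The fetch, decode, execute outline can be turned into a small interpreter. The sketch below is a hypothetical, simplified model and not a full 6502 emulator. It handles only the eight opcodes used by `sum.s`, models only the Z flag (as the outline does), and ignores the carry flag. The branch offset is applied relative to the next instruction, which matches the listing: `F0 07` at 0xF007 reaches `DONE` at 0xF010. The names `load_memory`, `read_addr`, and `run` are assumptions made for illustration.

```python
# Program bytes copied from the listing file.
DATA = bytes([10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
CODE = bytes.fromhex("A9 00 A2 00 EC 00 E0 F0 07 7D 01 E0 E8 4C 04 F0 00")


def load_memory():
    """Build a 64 KiB memory with data at 0xE000 and code at 0xF000."""
    memory = bytearray(2 ** 16)
    memory[0xE000:0xE000 + len(DATA)] = DATA
    memory[0xF000:0xF000 + len(CODE)] = CODE
    return memory


def read_addr(memory, at):
    """Read a 16-bit address stored low byte first (TempAddr in the outline)."""
    return memory[at] | (memory[at + 1] << 8)


def run(memory, pc=0xF000):
    """Execute until BRK and return the final (A, X) register values."""
    a = x = z = 0
    while True:
        ir = memory[pc]                          # Fetch: IR = M[PC]
        if ir == 0xA9:                           # LDA IMM
            a = memory[pc + 1]
            pc += 2
        elif ir == 0xA2:                         # LDX IMM
            x = memory[pc + 1]
            pc += 2
        elif ir == 0xEC:                         # CPX ABS
            value = memory[read_addr(memory, pc + 1)]
            z = 1 if x == value else 0
            pc += 3
        elif ir == 0xF0:                         # BEQ PC relative
            if z:
                offset = memory[pc + 1]
                if offset >= 0x80:               # interpret as signed 8-bit
                    offset -= 0x100
                pc = pc + 2 + offset
            else:
                pc += 2
        elif ir == 0x7D:                         # ADC ABS,X (carry ignored)
            addr = (read_addr(memory, pc + 1) + x) & 0xFFFF
            a = (a + memory[addr]) & 0xFF
            pc += 3
        elif ir == 0xE8:                         # INX
            x = (x + 1) & 0xFF
            pc += 1
        elif ir == 0x4C:                         # JMP ABS
            pc = read_addr(memory, pc + 1)
        elif ir == 0x00:                         # BRK: stop
            return a, x
        else:
            raise ValueError(f"unsupported opcode {ir:02X} at {pc:04X}")


result = run(load_memory())
print(result)                                    # prints (55, 10)
```

Ignoring carry is safe for this particular program because the running sum never exceeds 255; a faithful emulator would also have to model the remaining status flags.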

16.7. Processor Caches

Modern CPUs are very fast, while memory is “far away” and relatively slow.

  • Notice that a lot of what a program does is access memory: every instruction fetch and every operand is a memory read

  • Since memory is “slow”, much of our CPU time is spent “IDLE”, waiting for memory!

  • Caches are critical to achieving high performance on a modern CPU

https://ark.intel.com/content/www/us/en/ark/products/203908/intel-core-i710700e-processor-16m-cache-up-to-4-50-ghz.html
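
A back-of-the-envelope sketch of why this matters, using the sum program as the workload. The bus-read counts per instruction come from the fetch, decode, execute outline (one read to fetch the opcode plus the reads listed under Execute). The two cycle costs are hypothetical round numbers chosen only for illustration; they are not measurements of any real processor.

```python
# Bus reads per instruction, counted from the fetch/decode/execute outline.
READS = {
    "LDA": 2, "LDX": 2, "CPX": 4, "BEQ_not_taken": 1, "BEQ_taken": 2,
    "ADC": 4, "INX": 1, "JMP": 3, "BRK": 1,
}

N = 10  # array length stored at 0xE000

# How many times each instruction executes when summing N elements.
EXECUTED = {
    "LDA": 1, "LDX": 1, "CPX": N + 1, "BEQ_not_taken": N, "BEQ_taken": 1,
    "ADC": N, "INX": N, "JMP": N, "BRK": 1,
}

instructions = sum(EXECUTED.values())
bus_reads = sum(READS[name] * count for name, count in EXECUTED.items())

# Hypothetical costs, for illustration only.
CPU_CYCLES_PER_INSTRUCTION = 1
MEMORY_CYCLES_PER_READ = 100

busy = instructions * CPU_CYCLES_PER_INSTRUCTION
waiting = bus_reads * MEMORY_CYCLES_PER_READ
idle_fraction = waiting / (busy + waiting)

print(instructions, bus_reads, f"{idle_fraction:.1%}")
```

Under these assumed costs, 55 instructions trigger 141 memory reads, so nearly all of the time goes to waiting on memory. Reducing the cost of those reads is what a cache is for.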