13. SLS Lecture 13 : Program Anatomy IV: The Tree of Bytes and Data Structures#

  • create a directory mkdir datastructs; cd datastructs

  • copy examples

  • add a Makefile to automate assembling and linking

    • we are going run the commands by hand this time to highlight the details

  • add our setup.gdb and tree.gdb to make working in gdb easier

  • normally you would want to track everything in git

13.1. Overview#

REMEMBER A DATA STRUCTURE ENDS UP AS …

  • Bytes in memory

    • At particular locations – Memory Address

    • taking up some number of contiguous bytes

  • Intrinsic types : groups of bytes that the processor has built in “interpretations” for

    • bit vectors of various lengths (1,2,4,8,16,…)

    • signed and unsigned integers (1,2,4,8,16,…)

    • floating point number (IEEE 754)

  • Complex data structures often are broken into pieces

    • each piece connected to others

    • connections formed via Addresses

    • pieces record the address of the pieces it connects to

13.2. Worth carefully examining the assembly for this example#

Understanding this code is a great test of your knowledge

CODE: asm - tree.S

	.intel_syntax noprefix
        # TREE NODE STRUCTURE IN MEMORY
	# A NODE is composed of 3, 8 byte values
	# First 8 bytes is an 8 byte signed value
	# Second 8 bytes is a pointer to the left child (another node)
	# Third 8 bytes is a pointer to the right child (another node)
	# a 0 value pointer means there are no nodes in that direction

	#  node.VAL
	#  node.LEFT
	#  node.RIGHT
	.equ VAL, 0       # tree node value 8 bytes (offset 0)
	.equ LEFT, 8      # pointer to left child 8 bytes (offset 8)
	.equ RIGHT, 16    # pointer to right child 8 bytes (offset 16)

	.section .text
	.global _start
_start:
	mov rax, QWORD PTR [ROOT]  # pointer to root node is in a memory location ROOT
loop:
	mov rbx, QWORD PTR [rax + LEFT]  # rbx = left child location
	mov rdx, QWORD PTR [rax + RIGHT] # rdx = right child location
	cmp QWORD PTR [rax + VAL], 0     # compare node's value to zero
	cmovl  rax, rbx                  # if val < 0 then rax = rbx -- left child
	cmovge rax, rdx                  # if val >= 0 then rax = rdx -- right child
	cmp rax, 0                       # if location of next node is 0 we are done
	jne loop                         # otherwise keep walking the tree

	int3

	.section .data
ROOT:
	.quad N0               # ROOT global variable stores address of
	# N0 node &N0 (eg points to N0)

	# A BUNCH OF NODES CONNECTED TO FORM A TREE LIKE STRUCTURE
N0:
	.quad 8                # N0.VAL   = 8
	.quad N1               # N0.LEFT  = &N1
	.quad N2               # N0.RIGHT = &N2

N1:
	.quad 0                # N1.VAL   = 0
	.quad N3               # N1.LEFT  = &N3
	.quad N4               # N1.RIGHT = &N4

N2:
	.quad -5               # N2.VAL   = -5
	.quad N5               # N2.LEFT  = &N5
	.quad N6               # N2.RIGHT = &N6

N3:
	.quad 6                # N3.VAL   = 6
	.quad 0                # N3.LEFT  = 0
	.quad 0                # N3.RIGHT = 0

N4:
	.quad 90               # N4.VAL   = 90
	.quad N7               # N4.LEFT  = &N7
	.quad 0                # N4.RIGHT = 0

N5:
	.quad -3               # N5.VAL   = -3
	.quad N7               # N5.LEFT  = &N7
	.quad 0                # N5.RIGHT = 0

N6:
	.quad 567              # N6.VAL   = 567
	.quad 0                # N6.LEFT  = 0
	.quad 0                # N6.RIGHT = 0

N7:
	.quad -8               # N7.VAL   = -8
	.quad 0                # N7.LEFT  = 0
	.quad 0                # N7.RIGHT = 0
	

13.2.2. Exploring the tree with gdb#

set pagination off
set disassembly-flavor intel
x/1gx &ROOT
x/16xb 0x402008
x/8xh 0x402008
x/4xw 0x402008
x/2xg 0x402008

x/9i _start

b _start
run
display /x $rax
display /1dg $rax + 0
display /2gx $rax + 8

b loop
c
c
c
c
c




13.3. A more complex example#

This example should help to get your creative juices flowing and get a deeper appreciation for how we use the computer to write the kind of code you are used to.

Note I have not tested this much. I encourage you to try the exercises and test the code out.

13.3.1. The Story: An Array of Players#

  1. Lets assume in our program we have an array of “Players”

  2. Our program will have routines that work on the array and on individual players

  3. In our example we will layout a static version of the array with a few players

Remember to draw things out to ensure you are understanding things

13.3.1.1. A Player#

  • Lets use a chuck of memory to represent a player

  • Each player has:

    1. ID: A binary value that can fit in 8 bytes to uniquely identify a player

    2. Name: A “string” : An array of ascii characters with 0 to mark the end of the string

      • maximum length of the string array is 80

    3. Score: A four byte signed integer value

    4. Age: A single byte unsigned integer value

13.3.1.2. The Array of Players#

  • Lets assume there is one global Array for the players

    • One symbol PLAYER_ARRAY should mark the beginning of the player Array

    • One symbol PLAYER_ARRAY_END should mark the end of the player Array

13.3.2. Our “main” program#

  • The following is our main program that has the “entry point”

    • in this case it will simply call our find_player

  • It also lays out the memory for the static global player array

    • It initializes the players with some hard code players

  • When done exits passing the return value for find player as the process exit code

CODE: asm - playertest.S

	.intel_syntax noprefix
	.section .text

	.global _start
_start:
	mov rdi, OFFSET PLAYER_ARRAY
	mov rsi, (PLAYER_ARRAY_END - PLAYER_ARRAY)/93   #ugly magic number
	
	call find_player

	mov rdi, rax
	mov rax, 60
	syscall 

        .data
PLAYER_ARRAY:
	.quad   7            # id
	.string "The Doctor" # name
	.zero   80 - 11      # fill rest of name array with 0
	.int    42           # score 
	.byte   255          # age

	.quad   37           # id
	.string "Bugs Bunny" # name
	.zero   80 - 11      # fill rest of name array with 0
	.int    -4           # score 
	.byte   9            # age
PLAYER_ARRAY_END:	


13.3.3. find_player#

This routine searches an Array of Players:

  • starting from the beginning of array

  • find the first player with capital ‘B’ in their name

  • Either returns the index of the found player or -1

Arguments: Address of the Array and length of the Array

CODE: asm - findplayer.S

	.intel_syntax noprefix
	# EXAMPLE ASSEMBLY CODE OF SOMETHING A LITTE MORE REALISTIC
        #   NOTE THIS IS BY NO MEANS MEANT TO BE THE MOST EFFICIENT
	#   OR ADVANCE WAY OF WRITING THIS CODE.  RATHER IT IS MEANT
	#   TO BE SIMPLE AND HOPEFULLY CORRECT
	
	# Player Structure
	#   id   : 8 byte id
	#   name : 80 byte ascii encoded name
	#   score: 4  byte score
	#   age  : 1  byte age
	#  total number of bytes for a player is 8 + 80 + 4 + 1 = 93
        .equ PLAYER_STRUCT_SIZE, 93   # size of player structure in bytes
	
	# offsets to start of each field
	.equ PLAYER_ID_OFFSET,0       # offset 8 byte unsigned id 
	.equ PLAYER_NAME_OFFSET,8     # offset 80 byte ascii name
	.equ PLAYER_SCORE_OFFSET, 88  # offset 4 byte score
	.equ PLAYER_AGE_OFFSET, 92    # offset 1 Byte unsigned age

	
	#  Routine to search an array of player structures
	#  to find first player who's name contains a 'B'
	#  We assume the location of the array is passed in %rdi
	#  and %rsi contains the length of the array.
	#  Each element of the array is a player structure
	#  When done the index of the first player found that
	#  has a B in its name shoud be left in %rax.
	#  if not found then %rax should contain -1

	# INPUTS
	# rdi -> array : address of player array
	# rsi -> len   : length of player array
	# OUTPUTS
	# rax -> i : index of player with B in name or -1 if none found

	# REGISTER USED AS TEMPORARIES
	# rdx -> player_ptr : pointer to the ith player structure
	# r8  -> j        : temporary integer used to seach name
	# r9b -> tmpc     : temporary byte used to hold the jth charater of the current
        #                 : player's name
	.global find_player
find_player:
	xor rax, rax                 #  i = 0 

	jmp find_player_loop_condition     
find_player_loop:
	mov  rdx, rax                  # player_ptr = i    
	imul rdx, PLAYER_STRUCT_SIZE   # player_ptr = i * size of player stucture
        add  rdx, rdi                  # player_ptr += array starting address
	                               # rdx now holds the address of the ith player
	xor  r8, r8                    # j=0
name_search_loop:
	# tmpc = player->name[j]
	mov  r9b, BYTE PTR [rdx +  r8 + PLAYER_NAME_OFFSET]
	
        cmp r9b, 'B'                 # compare tmpc to 'B'
	je  find_player_done         # found a 'B' in the ith player name
	cmp r9b, 0                   # is the current character 0 if so end of name
        je  name_search_loop_end     # done searching this player's name exit name loop
	inc r8                     # j++
	jmp name_search_loop          # goto top of name search loop to examine next byte in name 
name_search_loop_end:
	
	inc rax                     # i++
find_player_loop_condition:	
	cmp rax,  rsi                # if i < len
	jl find_player_loop

find_player_notfound:
	mov rax, -1
	
find_player_done:	
	ret

13.3.5. Exercises#

  1. Modify the find routine to taking the search character as a parameter

  2. Add more players

  3. Write a routine to update a player’s score

  4. Replace the Array with a list

    • convert static array with static list (see tree example for inspiration)

    • rewrite find_player to search a list