Information Representation - Native Types
%run -i ../python/common.py
# setup for examples
setupExamples("inforep", "../src/Makefile ../src/empty256.s ../src/oneplusone.s ../src/inforep.gdb ../src/setup.gdb")
18. Information Representation - Native Types
In this chapter we will study the support a typical modern processor has for working with information. It is important to remember that the CPU and memory can only store binary vectors. The data types are just a collection of operations that interpret byte values in a particular way so that we can use the computer to represent and work with information that we care about.
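To make the idea of "interpretation" concrete, here is a minimal sketch that reads the same four bytes three different ways. It uses Python's `struct` module as a stand-in for the CPU's native operations, and it assumes little-endian byte order; the byte values and the helper name `interpret` are chosen purely for illustration.

```python
import struct

# Four raw bytes as they might sit in memory (little-endian order assumed).
raw = bytes([0x00, 0x00, 0x80, 0xBF])

def interpret(raw4):
    """Return the same 4 bytes read as an unsigned int, a signed int, and a float."""
    (as_unsigned,) = struct.unpack("<I", raw4)  # 32-bit unsigned integer
    (as_signed,) = struct.unpack("<i", raw4)    # 32-bit signed integer
    (as_float,) = struct.unpack("<f", raw4)     # 32-bit floating point number
    return as_unsigned, as_signed, as_float

u, s, f = interpret(raw)
print(f"bits     : {u:032b}")
print(f"unsigned : {u}")
print(f"signed   : {s}")
print(f"float    : {f}")
```

The bits never change; only the rule used to read them does.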
Unfortunately, data types do not always perfectly capture all the properties of the human-oriented information we would like to work with. For example, despite the popular belief that computers are good at “math”, they really are not! We will learn that the simple idea of representing numbers in a fixed number of bits can lead the computer to do strange things (situations in which \(x + 1\) results in \(0\)). However, if we know the limits of what the computer can do, we can write software that stays within its capabilities or compensates for its limitations. For example, we can either modify our problem to avoid the cases where \(x+1\) does not produce the right sum, or we can write code that detects this case and uses more operations to get the right result.
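The \(x + 1 = 0\) case can be sketched by simulating a fixed-width register. Python integers never overflow, so the sketch below masks results to an assumed width of 8 bits; the names `add_fixed` and `add_wide` are hypothetical helpers, not anything the CPU or a library provides.

```python
BITS = 8                  # assumed register width for this sketch
MASK = (1 << BITS) - 1    # 0xFF: the only bits an 8-bit register can hold

def add_fixed(x, y):
    """Add as an 8-bit register would; return (result, carry_out)."""
    full = x + y
    return full & MASK, full > MASK

def add_wide(x, y):
    """Compensate for overflow by keeping the carry in a second 8-bit value."""
    low, carry = add_fixed(x, y)
    high = 1 if carry else 0
    return high, low

print(add_fixed(254, 1))  # fits in 8 bits
print(add_fixed(255, 1))  # wraps around to 0, with the carry flagging the problem
print(add_wide(255, 1))   # the extra value preserves the true sum, 256
```

Detecting the carry is the "detect this case" option from the text; `add_wide` is one way to spend more operations to recover the right result.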
A strong knowledge of the CPU’s built-in interpretations, its data types, will not only allow us to avoid problems when a data type does not perfectly match what we want to do, but will also let us creatively construct our own interpretations in software that can be very efficient. For example, instead of using many integers to represent a set of sixty-four boolean attributes, we can pack them into a single 64-bit binary vector. With one instruction we can then determine whether the set is empty, determine whether a particular member or combination of members is present, set the value of one or more attributes, and so on.
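A minimal sketch of the packed attribute set follows. The attribute names (`READABLE`, `WRITABLE`, `HIDDEN`) and helper functions are hypothetical, and the 64-bit mask is an assumption used to mimic a 64-bit register in Python. Each helper corresponds to a single bitwise operation or comparison on the vector.

```python
MASK64 = (1 << 64) - 1    # keep values within 64 bits, like a 64-bit register

# Hypothetical attributes, each assigned one bit position.
READABLE = 1 << 0
WRITABLE = 1 << 1
HIDDEN = 1 << 63

def is_empty(attrs):
    """True if no attribute is set."""
    return attrs == 0

def has_all(attrs, members):
    """True if every attribute in members is set."""
    return (attrs & members) == members

def has_any(attrs, members):
    """True if at least one attribute in members is set."""
    return (attrs & members) != 0

def set_members(attrs, members):
    """Turn on one or more attributes with a single OR."""
    return (attrs | members) & MASK64

def clear_members(attrs, members):
    """Turn off one or more attributes with a single AND-with-complement."""
    return attrs & ~members & MASK64

attrs = set_members(0, READABLE | HIDDEN)
print(f"{attrs:064b}")
print(has_all(attrs, READABLE | HIDDEN), has_any(attrs, WRITABLE))
```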
The following are the common native types that we will cover.
Raw binary bits and bytes
Unsigned integers
Signed integers
Floating point numbers
For each we will present how the data is represented in binary vectors of various lengths, the operations typically provided, common patterns for using the operations, and pitfalls to be aware of. Understanding the data types and how they relate at the binary level lets us mix the use of the interpretations creatively in our code. For example, let's say that you have two numbers in registers \(R_3\) and \(R_4\) and you need to exchange them, but you don’t have any free registers. Knowing their binary representation allows you to do the following “trick” using the binary operation Exclusive OR (\(\widehat{}\)), which is widely supported as a single instruction on CPUs. The sequence of operations is: \(R_3 \leftarrow R_3 \widehat{} R_4\), \(R_4 \leftarrow R_4 \widehat{} R_3\) and \(R_3 \leftarrow R_3 \widehat{} R_4\) (“Exchanging Registers” in Henry S. Warren, Jr.’s book “Hacker’s Delight”). Such tricks are at the heart of how we have gotten computers to do magical things.
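The three-step exchange can be traced with a short sketch. The function name `xor_swap` is hypothetical, and Python variables stand in for the registers \(R_3\) and \(R_4\); the comments track why each step works, writing the original values as `a` and `b`.

```python
def xor_swap(r3, r4):
    """Exchange two values using three XORs and no temporary storage."""
    r3 = r3 ^ r4    # r3 = a ^ b
    r4 = r4 ^ r3    # r4 = b ^ (a ^ b) = a
    r3 = r3 ^ r4    # r3 = (a ^ b) ^ a = b
    return r3, r4

r3, r4 = xor_swap(0b11110000, 0b00001111)
print(format(r3, "08b"), format(r4, "08b"))
```

The trick relies on XOR being its own inverse: XORing with the same value twice restores the original. It assumes two distinct registers, since XORing a register with itself yields zero.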
In addition to looking at these representations from a generic notation perspective, we will also introduce and use GDB syntax and assembly code fragments that let us explore them and solidify our knowledge of how they work. Further, in this chapter we will make extensive use of the notation introduced in the prior chapter, Information Representation - Preliminaries. We will liberally switch between binary and hex, depending on which is more informative for the particular example being discussed.
18.1. Raw binary bits and bytes
Generic register exchange using exclusive OR
Using GDB to explore the register exchange example
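# show both registers in binary before we start
p /t { $al, $bl }
# load the two values to exchange
set $al = 0b11110000
set $bl = 0b00001111
p /t { $al, $bl }
# step 1: al = al ^ bl
set $al = $al ^ $bl
p /t { $al, $bl }
# step 2: bl = al ^ bl (bl now holds the original al)
set $bl = $al ^ $bl
p /t { $al, $bl }
# step 3: al = al ^ bl (al now holds the original bl)
set $al = $al ^ $bl
p /t { $al, $bl }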
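Equivalent Intel assembly fragment
mov al, 0b11110000    # al = 11110000
mov bl, 0b00001111    # bl = 00001111
xor al, bl            # al = al ^ bl
xor bl, al            # bl now holds the original al
xor al, bl            # al now holds the original bl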
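Arm (AArch64) equivalent
movz w0, #0b11110000    // w0 = 11110000
movz w1, #0b00001111    // w1 = 00001111
eor w0, w0, w1          // w0 = w0 ^ w1
eor w1, w1, w0          // w1 now holds the original w0
eor w0, w0, w1          // w0 now holds the original w1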