Introduction: Assembly Programming
12. Introduction: Assembly Programming
Assembly Programming means writing code in the language that the hardware of the computer understands – or at least very close to it. Most programmers are completely unaware of how the computer really works, as the languages that they program in are not directly supported by the hardware. Rather, most languages, such as JavaScript, Java, Python, Ruby, Rust, C#, C++, and even C must be translated into the native code that the computer does understand – machine code.
12.1. Machine Code
The hardware of the computer is built out of electrical components that can store and interpret patterns of bits, i.e. binary digits. A single bit can be thought of as a switch with two positions: “ON” (1) and “OFF” (0). A group of eight bits forms a byte. We will say a lot more about bytes a little later. The main thing to observe is that a byte is easily expressed as an 8-digit binary (base 2) number. Given that there are 8 bits, a byte can take on \(2^8=256\) unique values ranging from \(00000000\) to \(11111111\).
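The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch; the names `BITS_PER_BYTE`, `num_values`, and `to_binary` are made up for this example.

```python
# A byte is 8 bits, so it can hold 2**8 distinct values.
BITS_PER_BYTE = 8
num_values = 2 ** BITS_PER_BYTE


def to_binary(value):
    """Format an integer in the range 0..255 as an 8-digit binary string."""
    if not 0 <= value < num_values:
        raise ValueError("value does not fit in one byte")
    return format(value, "08b")


smallest = to_binary(0)
largest = to_binary(num_values - 1)
print(num_values, smallest, largest)  # 256 00000000 11111111
```

Going the other way, `int("00011000", 2)` turns an 8-digit binary string back into the integer it encodes.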
Machine code is a binary code – meaning that binary values are used to encode the operations and values that make up a program. The following table lists twenty of the machine codes of the MOS 6502 Central Processing Unit (CPU), which has a very simple machine code.
Binary Value | 6502 Operation |
---|---|
\(00000000\) | interrupt - impl: Implied i |
\(00000001\) | or with accumulator - X,ind: Zero Page Indexed Indirect (zp,x) |
\(00000101\) | or with accumulator - zpg: Zero Page zp |
\(00000110\) | arithmetic shift left - zpg: Zero Page zp |
\(00001000\) | push processor status (SR) - impl: Implied i |
\(00001001\) | or with accumulator - #: Immediate # |
\(00001010\) | arithmetic shift left - A: Accumulator A |
\(00001101\) | or with accumulator - abs: Absolute a |
\(00001110\) | arithmetic shift left - abs: Absolute a |
\(00010000\) | branch on plus (negative clear) - rel: Program Counter Relative r |
\(00010001\) | or with accumulator - ind,Y: Zero Page Indirect Indexed with Y (zp),y |
\(00010101\) | or with accumulator - zpg,X: Zero Page Index with X |
\(00010110\) | arithmetic shift left - zpg,X: Zero Page Index with X |
\(00011000\) | clear carry - impl: Implied i |
\(00011001\) | or with accumulator - abs,Y: Absolute Indexed with Y a,y |
\(00011101\) | or with accumulator - abs,X: Absolute Indexed with X a,x |
\(00011110\) | arithmetic shift left - abs,X: Absolute Indexed with X a,x |
\(00100000\) | jump subroutine - abs: Absolute a |
\(00100001\) | and (with accumulator) - X,ind: Zero Page Indexed Indirect (zp,x) |
\(00100100\) | bit test - zpg: Zero Page zp |
The 6502 was designed and first built in 1975, but continues to be used today. It has 151 simple operations, versus the thousands of complex operations supported by a modern Intel x86-64 processor, a kind of processor widely used in computers ranging from laptops to supercomputers. The following article discusses how to calculate the number of operations an Intel x86-64 processor supports and why it is hard to do so: “Enumerating x86-64 – It’s Not as Easy as Counting”. Given the simplicity of the 6502, we will often use it to first get our heads around an idea or mechanism before looking at the same thing on an Intel x86-64 based computer or an ARM AArch64 based computer.
A program written directly in machine code is a sequence of binary values. The following is a small 6502 machine code program that calculates \(1+1\):
00011000, 10101001, 00000001, 01101001, 00000001
To execute this small program you need to place these values in the memory of the computer and properly configure the CPU so that they can be found and executed.
In the von Neumann Architecture chapter we will use the book’s companion 6502 based computer, SOL6502, to load and execute this machine code program by hand.
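To make the five bytes a little less opaque before we get to the real hardware, here is a minimal Python sketch that interprets them. It is not the SOL6502 and not a 6502 emulator: it recognizes only the three opcodes this program uses, tracks only the accumulator and the carry flag, and reads the bytes from a Python list rather than from memory. The names `PROGRAM` and `run` are made up for this example.

```python
# The five bytes of the program, written as binary literals.
PROGRAM = [0b00011000, 0b10101001, 0b00000001, 0b01101001, 0b00000001]

# The three opcodes this program uses (standard 6502 encodings).
CLC = 0b00011000      # clear carry - implied
LDA_IMM = 0b10101001  # load accumulator - immediate
ADC_IMM = 0b01101001  # add with carry - immediate


def run(program):
    """Interpret the bytes in order and return (accumulator, carry)."""
    a = 0      # accumulator
    carry = 0  # carry flag
    pc = 0     # index of the next byte to interpret
    while pc < len(program):
        opcode = program[pc]
        if opcode == CLC:
            carry = 0
            pc += 1
        elif opcode == LDA_IMM:
            a = program[pc + 1]  # the operand is the byte after the opcode
            pc += 2
        elif opcode == ADC_IMM:
            total = a + program[pc + 1] + carry
            a = total & 0xFF                  # keep the low 8 bits
            carry = 1 if total > 0xFF else 0  # a ninth bit becomes the carry
            pc += 2
        else:
            raise ValueError(f"unsupported opcode {opcode:08b}")
    return a, carry


accumulator, carry = run(PROGRAM)
print(accumulator, carry)  # 2 0
```

Notice that the second and fourth bytes are operations while the third and fifth are plain values; only their position in the sequence tells them apart. The result, 2, ends up in the accumulator.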
12.2. Assembly Code
Assembly code is a slightly more generic, human-friendly code that we can use to program a computer. Machine code must be expressed purely in numbers, but assembly code uses structured text that can be easily translated by a set of tools into the equivalent machine code. Each group of machine operations of the CPU that performs the same function is assigned a human-readable text mnemonic, often referred to as an instruction. For example, the following are the 8 different 6502 machine code operations that add two numbers:
Mnemonic | Binary Value | 6502 Operation |
---|---|---|
ADC | \(01100001\) | add with carry - X,ind: Zero Page Indexed Indirect (zp,x) |
ADC | \(01100101\) | add with carry - zpg: Zero Page zp |
ADC | \(01101001\) | add with carry - #: Immediate # |
ADC | \(01101101\) | add with carry - abs: Absolute a |
ADC | \(01110001\) | add with carry - ind,Y: Zero Page Indirect Indexed with Y (zp),y |
ADC | \(01110101\) | add with carry - zpg,X: Zero Page Index with X |
ADC | \(01111001\) | add with carry - abs,Y: Absolute Indexed with Y a,y |
ADC | \(01111101\) | add with carry - abs,X: Absolute Indexed with X a,x |
The mnemonic for all of them is ADC. The thing that distinguishes the different forms of related operations, such as adding two numbers, is where the values to add will come from and where the results will go. These parameters are often called the operands of an instruction. In assembly code we write the program as a sequence of instructions along with syntax that specifies the operands. A tool called the assembler is then used to translate our assembly program into the corresponding machine code. The following is the same 6502 program to add \(1+1\) written in assembly code:
CLC ; Clear the Carry Flag
LDA #1 ; Load the accumulator with the value 1
ADC #1 ; Add the value 1 to the accumulator
Using an editor we could write the assembly program above into a file (e.g. add.s) and then use a 6502 assembler to translate it into 6502 machine code, e.g.
ca65 add.s
This produces a new file named add.o that has the machine code version encoded within it. This is called an object file. The object file contains the binary coded values but does not specify exactly where and how they should go in the memory of the computer, nor does it describe how to set up the CPU. In some sense the object file is just a fragment of an executable program. We use another tool called a linker to convert and combine object files into a complete executable memory “image” that describes the exact location and values that need to be placed in memory and how to configure the CPU to get the program to run. For example,
ld65 add.o -C SOL6502.cfg -o add.img
invokes the linker on our add.o with a description of the SOL6502 computer’s memory layout. This produces a file add.img that is an exact copy of what needs to be loaded into the entire memory of the SOL6502 computer.
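To give a feel for what a memory image is, the following Python sketch builds one by hand. It is not what ld65 does internally, and it does not reflect the actual layout described by SOL6502.cfg, which is not shown here. The load address `0x8000` is a hypothetical choice made only for illustration, and `build_image` is a made-up helper. What the sketch does rely on are two properties of the 6502 itself: addresses are 16 bits wide, so memory has 65536 locations, and after a reset the CPU reads the address of the first instruction from locations $FFFC and $FFFD, low byte first.

```python
MEMORY_SIZE = 0x10000  # 16-bit addresses: 65536 byte-sized locations
LOAD_ADDRESS = 0x8000  # hypothetical placement, chosen only for illustration
RESET_VECTOR = 0xFFFC  # the 6502 reads its start address from $FFFC/$FFFD

# CLC, LDA #1, ADC #1 (the same five bytes as before, written in hexadecimal)
CODE = bytes([0x18, 0xA9, 0x01, 0x69, 0x01])


def build_image(code, load_address):
    """Return a full memory image with the code and reset vector in place."""
    image = bytearray(MEMORY_SIZE)  # every location starts as zero
    image[load_address:load_address + len(code)] = code
    image[RESET_VECTOR] = load_address & 0xFF             # low byte first
    image[RESET_VECTOR + 1] = (load_address >> 8) & 0xFF  # then the high byte
    return image


image = build_image(CODE, LOAD_ADDRESS)
print(len(image), image[LOAD_ADDRESS:LOAD_ADDRESS + 5].hex())  # 65536 18a9016901
```

The image records both pieces of information that the object file lacks: where the bytes live in memory, and where the CPU should begin executing.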
Unlike our previous machine code version, the assembly language version is written using mnemonics. It also has comments, thus making it at least “readable” by a human. However, unlike other programming languages that you might be familiar with, you likely still cannot REALLY read it (except maybe for the comments). For example, you probably cannot tell that it is a simple program to add two numbers, or how and why it works. For that matter, even if you can figure out what it does, how do you get at the result?
To understand assembly language we need to understand what the operations of the CPU are and what they do. To achieve this, we must understand the basic functioning of the CPU and the rest of the hardware that makes up the computer.
12.2.1. Assembly Programming
As we will see later, the assembly language understood by the assembler includes additional syntax for carefully controlling and placing arbitrary values in memory, beyond just opcodes. Assembly programming is all about understanding how the computer works and creating an initial image of memory that, when loaded, will get the computer to do what you want.
The jump from machine code to assembly code illustrates an important pattern that we will see applied over and over again – Program Translation. Rather than needing to understand and remember all of the details of machine code, a programmer can simply learn the assembly language and then rely on the assembler to translate it correctly into machine code. The assembler is a program whose job is to do this translation – in some sense it acts like a machine code programmer to whom we give an assembly code version of our program. In a similar manner, we can design a new programming language and write a translation program, itself written in assembly language, that translates the new language into assembly language, and then use the assembler to translate the result into machine code. A programmer who learns our new language need never know about assembly or machine language!
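The idea of a translation program can be made concrete with a toy translator written in Python. This is a sketch for illustration only, not how ca65 works: it knows just the three instruction forms used in our add program, accepts only decimal immediate operands written as `#n`, and produces a bare list of bytes rather than an object file. The names `OPCODES` and `assemble` are made up for this example.

```python
# (mnemonic, takes an immediate operand?) -> opcode byte
OPCODES = {
    ("CLC", False): 0b00011000,
    ("LDA", True): 0b10101001,
    ("ADC", True): 0b01101001,
}


def assemble(source):
    """Translate the tiny assembly subset above into a list of byte values."""
    output = []
    for line in source.splitlines():
        text = line.split(";", 1)[0].strip()  # drop the comment, if any
        if not text:
            continue  # blank or comment-only line
        parts = text.split()
        mnemonic = parts[0].upper()
        if len(parts) == 1:
            output.append(OPCODES[(mnemonic, False)])
        elif parts[1].startswith("#"):
            value = int(parts[1][1:])  # decimal immediates only
            if not 0 <= value <= 0xFF:
                raise ValueError("immediate value does not fit in one byte")
            output.append(OPCODES[(mnemonic, True)])
            output.append(value)
        else:
            raise ValueError(f"unsupported operand: {parts[1]}")
    return output


SOURCE = """\
CLC      ; Clear the Carry Flag
LDA #1   ; Load the accumulator with the value 1
ADC #1   ; Add the value 1 to the accumulator
"""

machine_code = assemble(SOURCE)
print([format(b, "08b") for b in machine_code])
# ['00011000', '10101001', '00000001', '01101001', '00000001']
```

The output is exactly the five binary values of the hand-written machine code program from the Machine Code section: the translator has done the table lookups for us.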