17. UC-SLS Lecture 17: Into the Light - C Intro
create a directory
mkdir cintro; cd cintro
copy examples
add a Makefile to automate assembling and linking
we are going to run the commands by hand this time to highlight the details
add our setup.gdb to make working in gdb easier
normally you would want to track everything in git
17.1. What are some of the downsides of Assembly Programming?
The human burden of assembly programming, and programmer lock-in
The code we write is locked to a specific CPU
The code we write is locked to a specific OS
17.2. High-level languages
Downsides?
Let's look at Python “Hello World”
How many instructions for our “hello.s”: < 10 instructions
How many for python -c 'print("Hello World")': 10’s, 100’s, 1000’s, 10,000’s, 100,000’s, 1,000,000’s, > 10,000,000’s???
Count them by single-stepping every instruction in gdb (start it with gdb python):
display /1i $pc
starti -c 'print("Hello World")'
while 1
stepi
end
17.3. The ToolChain
17.4. ToolChain in action: A simple C version of sumit
CODE: csumit1.c
long long XARRAY[1024];
long long sumit(void)
{
long long i = 0;
long long sum = 0;
for (i=0; i<10; i++) {
sum += XARRAY[i];
}
return sum;
}
CODE: usecsumit1.S
.intel_syntax noprefix
.global _start
_start:
call sumit
mov rdi, 0 # rdi = 0 = exit return value
mov rax, 60 # rax = 60 = exit syscall num
syscall # exit(0)
17.4.1. Run Assembler on usecsumit1.S
Use the assembler as usual on usecsumit1.S \(\rightarrow\) usecsumit1.o
17.4.2. Run Compiler on csumit1.c
A new step: compile csumit1.c \(\rightarrow\) csumit1.s
CODE: csumit1.c
long long XARRAY[1024];
long long sumit(void)
{
long long i = 0;
long long sum = 0;
for (i=0; i<10; i++) {
sum += XARRAY[i];
}
return sum;
}
CODE: csumit1.s
.file "csumit1.c"
.intel_syntax noprefix
.text
.globl sumit
.type sumit, @function
sumit:
xor r8d, r8d
xor eax, eax
.L2:
add r8, QWORD PTR XARRAY[0+rax*8]
inc rax
cmp rax, 10
jne .L2
mov rax, r8
ret
.size sumit, .-sumit
.comm XARRAY,8192,32
.ident "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
.section .note.GNU-stack,"",@progbits
17.4.3. Run Assembler on csumit1.s
Now it is the same old assembly step for csumit1.s \(\rightarrow\) csumit1.o
17.4.4. Linking it all together
Use the linker to create the executable: usecsumit1.o \(+\) csumit1.o \(\rightarrow\) usecsumit1
17.4.5. Run our executable
17.4.5.1. Exercises
add I/O to usecsumit1
read binary input from stdin into XARRAY
add a sum memory variable
after call to sumit move the result in rax into the sum memory variable
write value of sum memory variable to stdout
17.5. Remember objdump is another useful tool
Lets us work directly with object files, both relocatable objects and executables
Lets us see what’s inside, where it is, and what the loader is supposed to do
Has similar capabilities to a debugger like gdb
but is often easier to use when you just want to look at things and not actually debug/run them.
17.5.1. OF COURSE THE DEBUGGER IS STILL OUR BEST FRIEND
we can do everything we were doing before
examine memory
list assembly source
disassemble
set breakpoints
But now, if we have the compiler and assembler produce debug info (-g),
then we can work at the C source level:
list C source that corresponds to the opcodes
set break points via C source lines
examine C variables with the debugger knowing the correct types
1, 2, 4 or 8 byte types
pointers vs variables
signed versus unsigned
and support for complex heterogeneous types (we will see this later)
Build with debug info:
gcc --static -g -nostartfiles -nostdlib csumit1.c usecsumit1.S -o sum
here we let the compiler driver do all the steps for us
it creates the necessary “intermediary” files (.s and .o files) and removes them when done
it runs the “compiler”, “assembler” and “linker” for us: use the -v flag to see this happen
in our case, since we are not using the C library or “runtime”, we suppress their use (-nostartfiles -nostdlib)
rather, we want to provide our own _start
we are just using the compiler to avoid writing all our code in assembly
we can now
list sumit
set breakpoints on C lines
break 8
disassemble sumit
print and examine C variables
p i
p sum
p XARRAY
p XARRAY[0]
ask what the type of a variable is
whatis XARRAY
In other words, how we use a particular memory location is now clear and explicit
17.6. Before we can really get going
17.6.1. Our particular compiler: GCC
https://gcc.gnu.org/onlinedocs/gcc/index.html#SEC_Contents
Example of one of the “C” standards: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf
GCC is the backbone of the GNU/Linux software environment
however, it is by no means the only C compiler tool chain
LLVM (clang)
xlc
Microsoft’s C compiler
and many more
17.6.2. Controlling the compiler with its command line options
The C compiler is a very sophisticated program and has many options that control the assembly code it creates
we are going to use options that make the code it creates easier to read
normally the code it creates does not target human readability
it usually includes a lot of extra directives to provide the debugger and other tools with more information
and by default it aims to keep compilation fast
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
17.6.2.1. Optimization level -O
by default, if no optimization level is given to gcc (the compiler suite we are using), it produces code that is suited for use and manipulation in the debugger
this code sacrifices efficiency so that it is easy to work at the “C” source level
we will use -Os to set the optimization level to s,
which does many optimizations but tries to keep the code “small”
17.6.2.2. Turning off features that we don’t need
We are also going to turn off instrumentation that uses features of the Intel processor (Control-flow Enforcement Technology) to make the code more secure, e.g. the endbr64 instructions placed at function entry points
-fcf-protection=none
We are going to turn off certain debugging directives (the .cfi_ unwind-table directives)
-fno-asynchronous-unwind-tables
We are going to turn off support for dynamic (load/runtime) relocation and linking
-fno-pic : turn off generation of position independent code
-static : force static linking
17.6.2.3. We are going to force ourselves to write good code by making all warnings errors
-Werror
In general you should never have warnings in your code - they are a sign that you don’t know what you are doing
17.6.2.4. Explicitly produce assembly files
To have the compiler driver only preprocess and compile to produce .s files
-S
17.6.2.5. Generate Intel syntax assembly code
To have it generate intel syntax
-masm=intel
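17.6.2.6. Eg.
putting the options above together to turn csumit1.c into csumit1.s:
gcc -Os -fcf-protection=none -fno-asynchronous-unwind-tables -fno-pic -Werror -masm=intel -S csumit1.c -o csumit1.s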
17.7. Compiler Driver vs Compiler and “Building”
The term compiler is used in multiple ways
the component of the tool chain that translates C into assembly
or whatever language it was built for (e.g. Fortran, etc.)
the master command of the tool chain that
is a meta command that knows how to invoke each of the components of the tool chain
passes a default set of options to the components
you can override these
by default, tries to run all steps of the tool chain to create an executable from the files specified
e.g.
gcc my1.c my2.c myasm.S neuralnet.o -o myexe
if any error occurs, it will stop and report it from whichever component failed
it creates all intermediary files as temporaries and removes them once done
e.g. it creates .s and .o files as it needs to and deletes them when it is done
you can override this behavior:
-E : only pre-process
-S : pre-process and compile to produce .s
-c : pre-process, compile, and assemble to produce .o
The most common use is to separate creating .o files from linking them into an executable, e.g.
Your program is composed of a mixture of source files: my1.c, my2.c, myasm.S
for these you would separately “compile” each to produce a corresponding .o
gcc -c my1.c -o my1.o (runs pre-processor, compiler, and assembler)
gcc -c my2.c -o my2.o (runs pre-processor, compiler, and assembler)
gcc -c myasm.S -o myasm.o (runs pre-processor and assembler)
Your program also uses an existing object file from a library that someone gave you: neuralnet.o
Now, as a separate step, you link all the object files together to produce your binary
gcc my1.o my2.o myasm.o neuralnet.o -o myexe
You would automate all of this with a Makefile so that when you want to “build” your executable, make will
only run the “compile” steps that are necessary, i.e. for the source files that are newer than their corresponding .o
then re-link all the .o files, updated ones and existing ones, to produce a new version of the executable
17.8. Another Link: The C Preprocessor
17.8.1. Preprocessor in action
CODE: misc.h
#ifndef MISC_H
#define MISC_H
#include <stdio.h>   /* VPRINT and NYI expand to fprintf(stderr, ...) */
//#define ENABLE_VERBOSE
//#define ENABLE_TRACE_LOOP
//#define ENABLE_TRACE_MEM
#ifdef ENABLE_VERBOSE
#define VPRINT(fmt, ...) fprintf(stderr, "%s: " fmt, __func__,__VA_ARGS__)
#else
#define VPRINT(...)
#endif
#ifdef ENABLE_TRACE_LOOP
#define TRACE_LOOP(stmt) { stmt; }
#else
#define TRACE_LOOP(stmt)
#endif
#ifdef ENABLE_TRACE_MEM
#define TRACE_MEM(stmt) { stmt; }
#else
#define TRACE_MEM(stmt)
#endif
#define NYI fprintf(stderr, "%s: NYI\n", __func__)
#endif
CODE: loop.c
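#include "misc.h"
struct machine;                      /* full definition lives elsewhere in the project */
int interrupts(struct machine *m);   /* assumed prototypes: defined elsewhere */
void dump_cpu(struct machine *m);
/* stubs: report "not yet implemented" and return an error (assumed -1) so loop() stops */
int fetch(struct machine *m) { NYI; return -1; }
int decode(struct machine *m) { NYI; return -1; }
int execute(struct machine *m) { NYI; return -1; }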
int
loop(int count, struct machine *m)
{
int rc = 1;
unsigned int i = 0;
if (count<0) return rc;
while (1) {
TRACE_LOOP(dump_cpu(m));
rc = interrupts(m);
if (rc) rc = fetch(m);
if (rc) rc = decode(m);
if (rc) rc = execute(m);
i++;
if (rc < 0 || (count && i == count))
break;
}
VPRINT("EXITING: count=%d i=%d\n",count,i);
return rc;
}
In some sense the preprocessor lets us “craft” our code into our own creature that can be customized and controlled
We can now change our code by modifying and defining macros
Edit “misc.h”
or use the ability to define and undefine macros via gcc command line options
-D name
Predefine name as a macro, with definition 1.
-D name=definition
The contents of definition are tokenized and processed as if they appeared during translation phase three in a ‘#define’ directive. In particular, the definition is truncated by embedded newline characters.
If you are invoking the preprocessor from a shell or shell-like program you may need to use the shell’s quoting syntax to protect characters such as spaces that have a meaning in the shell syntax.
If you wish to define a function-like macro on the command line, write its argument list with surrounding parentheses before the equals sign (if any). Parentheses are meaningful to most shells, so you should quote the option. With sh and csh, -D'name(args…)=definition' works.
-D and -U options are processed in the order they are given on the command line. All -imacros file and -include file options are processed after all -D and -U options.
-U name
Cancel any previous definition of name, either built in or provided with a -D option.