UC-SLS Lecture 20: Using LibC to access the OS and escape the confines of our process
Preliminaries
libraries
Standard library: libc.a / libc.so
Address space management:
dynamic memory for data items: malloc and free
more powerful control using mmap and munmap
I/O
low-level, file descriptor based: open, read, write, close
character and line I/O (these work on stdio streams, not raw file descriptors): getc, getchar, gets, putc, putchar, puts (gets is unsafe and was removed in C11)
Formatted and buffered I/O: fopen, fread, fwrite, fclose, fprintf/printf, fscanf/scanf, fgetc, fgets, fputs, fputc, fgetpos, fsetpos, fseek, fflush
create a directory
mkdir libc; cd libc
copy the examples
normally we would add a Makefile to automate assembling and linking, but this time we are going to run the commands by hand to highlight the details
normally you would also want to track everything in git
20.1. Overview
20.2. Preliminaries
20.2.1. SDKs
Developers package and distribute “native” code for a specific computer and OS as a collection of documentation, header files, and libraries. We often call this collection a Software Development Kit (SDK). The functions and types defined in an SDK are often referred to as its Application Programming Interface (API).
20.2.1.1. Documentation
Developers provide documentation that explains the API in terms of the functions and types that their code provides for your use. Traditionally on UNIX systems this comprises a set of man pages.
man complex
man cexp
Documentation tells us:
how to use the functions, macros and types of an SDK
what header files we must include in our source to call particular functions
what libraries we must link against
20.2.1.2. Header files
As we discussed, to generate assembly for a call to a function the compiler must have:
a declaration of the function
definitions of all the types it requires
and possibly preprocessor macros
The header files of a library provide these things so that code calling the library's functions can compile. Remember that we use the preprocessor directive #include <file> to substitute the contents of <file> into our own source.
20.2.1.3. Libraries
Libraries are a new kind of file for us. There are two main types on Linux:
Static library, an “archive” of object files, e.g. libm.a
Dynamic (shared) library, e.g. libm.so
Static linking requires the static library and dynamic linking requires the dynamic one.
20.2.1.3.1. Linker and Libraries
We can add a library when we link our executable by passing the right parameters to the linker,
e.g. -l<name> tells the linker to use objects from lib<name>.a or lib<name>.so if it needs to.
Specifically, a static library file contains a table of contents listing all of its object files and the symbols defined in those objects. If our object files reference a symbol that they do not define, the linker looks for it in the table of contents. If an object file in the library defines the symbol, the linker includes that object file as if you had specified it as one of your own object files. (With a shared library the linker does not copy the code; it records that the executable needs the library, and the dynamic loader maps it in at run time.)
20.2.1.4. Example
C: cexp.c
#include <math.h>     /* for atan; link with -lm */
#include <stdio.h>
#include <complex.h>
#include <stdlib.h>   /* for atof */
int
main(int argc, char **argv)
{
    double y = 1.0;
    if (argc > 1) y = atof(argv[1]);
    double pi = 4 * atan(y);            /* atan(1.0) is pi/4 */
    double complex z = cexp(I * pi);    /* e^(i*pi) = -1 + 0i */
    printf("%f + %f * i\n", creal(z), cimag(z));
    return 0;
}
How do the preprocessor and linker know where to find things?
The compiler driver, in our case gcc, passes parameters to both the preprocessor and the linker
-I <dir> tells the preprocessor to look for header files in <dir>
several standard directories are searched by default, e.g. /usr/include
we can look at these files if we want to see the details
-L <dir> similarly tells the linker to look for libraries in <dir>
several standard directories are searched by default, e.g. /usr/lib/x86_64-linux-gnu
The linker map file (requested through gcc with -Wl,-Map=<file>) lets us see all the .o files that got linked in and where they came from
ar is a tool for working with static libraries; see man ar for details
We can use it to list the table of contents (the member .o files) of an archive
We can even use it to extract a member (just like the linker does when linking)
And, no surprise, the extracted member is an object file like the ones we have been creating
What about _start, atof and printf? Where did they come from?
Let's look for them in the map file
The map file tells us not only which file a symbol came from but also where it ends up being placed in the memory image of our executable
20.2.1.5. Defaults
The C compiler driver ensures that we always link against a set of standard object files and libraries (the startup object that provides _start is one of them)
But the core one is libc, the C standard library!
20.3. C Standard Library libc (-lc)
The C standard library was developed at the same time as the core language
There are standards that a C compiler and C standard library implementation can conform to
The GNU C compiler, gcc, has its associated GNU libc, glibc
20.3.1. Overview
The C standard library is very large and provides many categories of routines
We will only consider a small fraction of its functionality:
1. Dynamic Memory Management
2. Basic overview of IO
20.3.2. Dynamic Memory Management
https://www.gnu.org/software/libc/manual/html_mono/libc.html#Memory-Concepts
https://www.gnu.org/software/libc/manual/html_mono/libc.html#Dynamic-Memory-Allocation
The C language has no built-in support for dynamic memory variables, other than automatics (function local variables and function parameters)
Memory must be obtained from and returned to the OS with system calls
We must use pointers to track it
20.3.2.1. Two categories
LibC provides routines that give a programmer the ability to dynamically allocate memory:
1. The Memory Allocator (https://www.gnu.org/software/libc/manual/html_mono/libc.html#The-GNU-Allocator)
it calls the OS system calls for the programmer
there is a lot of subtlety to implementing a high-performance memory allocator
the basic idea is:
the allocator routines in libc, malloc and free, are called by the application code
these routines allocate large chunks of memory from the OS
they then break these large chunks down, handing out pieces as requested by malloc calls, and coalesce pieces back into the chunks when free is called
if large requests are made, the libc routines call the OS to create separate mappings for them
similarly, when these large allocations are freed, they are immediately returned to the OS
2. Directly calling mmap or brk
20.3.2.2. Main calls
void *malloc (size_t size)
// Allocate a block of size bytes. See Basic Allocation.
void free (void *addr)
// Free a block previously allocated by malloc. See Freeing after Malloc.
void *realloc (void *addr, size_t size)
// Make a block previously allocated by malloc larger or smaller, possibly by copying it to a new location. See Changing Block Size.
void *reallocarray (void *ptr, size_t nmemb, size_t size)
// Change the size of a block previously allocated by malloc to nmemb * size bytes as with realloc. See Changing Block Size.
void *calloc (size_t count, size_t eltsize)
// Allocate a block of count * eltsize bytes using malloc, and set its contents to zero. See Allocating Cleared Space.
void *valloc (size_t size)
// Allocate a block of size bytes, starting on a page boundary. See Aligned Memory Blocks.
void *aligned_alloc (size_t alignment, size_t size)
// Allocate a block of size bytes, starting on an address that is a multiple of alignment. See Aligned Memory Blocks.
int posix_memalign (void **memptr, size_t alignment, size_t size)
// Allocate a block of size bytes, starting on an address that is a multiple of alignment, and store its address in *memptr; returns 0 on success or an error number. See Aligned Memory Blocks.
void *memalign (size_t boundary, size_t size)
// Allocate a block of size bytes, starting on an address that is a multiple of boundary. See Aligned Memory Blocks.
int mallopt (int param, int value)
// Adjust a tunable parameter. See Malloc Tunable Parameters.
int mcheck (void (*abortfn) (enum mcheck_status status))
// Tell malloc to perform occasional consistency checks on dynamically allocated memory, and to call abortfn when an inconsistency is found. See Heap Consistency Checking.
struct mallinfo2 mallinfo2 (void)
// Return information about the current dynamic memory usage. See Statistics of Malloc.
20.3.3. Explore with an example
20.3.3.1. man malloc
man 3 malloc
What does this code do?
What do you think will happen?
C: dynmem.c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
// Use debugger to explore what happens
int main(int argc, char **argv)
{
    char *cptr;
    int n = 4096;
    cptr = malloc(n);
    memset(cptr, 0xaa, n);
    free(cptr);
    return 0;
}
Debug
demonstrate strace ./dynmem: what do you expect to see?
Use the debugger:
use the jump command and set var n=X to execute several malloc calls
explore /proc/<pid>/maps
set breakpoints on malloc, sbrk and brk
disass brk: it is easy to see the syscall
break on the syscall instruction and look at the maps
use where to show call chains
then add the code below and use strace
#include <stdlib.h>
#include <unistd.h>
// Use debugger to explore what happens
int
main(int argc, char **argv)
{
    char *cptr;
    char c;
    read(0, &c, 1);
    int n = 4096;
    cptr = malloc(n);
    read(0, &c, 1);
    n = 1024 * 1024 * 1024;
    cptr = malloc(n);
    read(0, &c, 1);
    n = 1024 * 1024 * 1024;
    cptr = malloc(n);
    read(0, &c, 1);
    free(cptr);
    return 0;
}
20.3.3.2. LibC provides C wrappers for OS system calls and support for making syscalls directly in C
libc provides C wrappers for system calls
the wrappers expose a C function interface to the system calls of the OS
you can look up the C version and simply call it like any other C function
the implementation in libc takes care of all the assembly details for you:
putting parameters in the right registers
filling in the system call number
man 2 syscalls
man 2 brk
glibc also has support for invoking the syscall instruction from C
man 2 syscall
In the following example we will call brk directly rather than letting malloc call it
Do not do this in real code: you can break malloc's bookkeeping. It is only for example purposes
C: dynmemsyscalls.c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h> /* For SYS_xxx definitions */
int
main(int argc, char **argv)
{
    char *cptr;
    int n = 4096;
    cptr = malloc(n);
    memset(cptr, 0xaa, n);
    free(cptr);
    cptr = sbrk(4096);                  // libc wrapper: grow the break by one page, returns the old break
    memset(cptr, 0xaa, 4096);
    cptr = (void *) syscall(12, 0);     // hardcoded syscall number (12 is brk on x86-64); brk(0) returns the current break
    syscall(SYS_brk, cptr + 4096);      // use the constant from the header
    memset(cptr, 0xaa, 4096);
    return 0;
}
20.3.3.3. You can use mmap and munmap in your C programs to manage your own dynamic address space mappings
man mmap
It is very powerful but also has a lot of parameters
20.3.4. Under the covers malloc, free and friends
are calling brk, mmap and munmap
you and your code do not need to deal with the details
instead you get a simpler interface that is consistent across CPU types and OSes:
"get me n bytes of memory and return a pointer to it"
"give back the memory at this address"
20.3.4.1. Important points about dynamic memory
What is wrong with this code?
C: badmsgcode.c
#include <stdlib.h>
#include <assert.h>
// waits for request message to arrive: returns length in bytes and updates integer pointed to
// by idPtr with id of request
int getRequest(int *idPtr);
// read the data of request with id into memory pointed to by buffer
void readRequestData(int id, char *buffer);
// process request with id and data in memory pointed to by buffer, frees buffer when done
void processRequest(int id, char *buffer);
int
main(int argc, char **argv)
{
    int n;
    int *id_ptr;
    char *msg_buffer;
    // my server loop
    while (1) {
        id_ptr = malloc(sizeof(int));
        assert(id_ptr != 0);
        n = getRequest(id_ptr);
        msg_buffer = malloc(n);
        assert(msg_buffer != 0);
        readRequestData(*id_ptr, msg_buffer);
        processRequest(*id_ptr, msg_buffer);
    }
    // should never get here
    exit(0);
}
This is why higher-level languages have a lot of code to hide the details of dynamic memory, e.g.:
reference counting
garbage collectors
20.4. Layers
What we see is a general pattern of layering:
assembly is at the bottom
the OS kernel provides an assembly interface to its routines
x86-64 Linux system calls are invoked via the syscall instruction with the right register values
the lowest layer of libc routines provides wrappers
the other libc routines build on these to add more functionality, e.g. malloc and free
thus programmers and C applications are isolated from the details of OS syscalls
they simply need to learn libc, which is ported to each computer/OS pair
20.5. IO is no different
20.5.1. Lowest layer is wrappers for core UNIX I/O routines
open, close, read, write, dup, mkdir, exec, mmap
20.5.2. A layer up: buffered and formatted I/O
fopen, fclose, fread, fwrite, fprintf/printf, fscanf/scanf