20. UC-SLS Lecture 20 : Using LibC to access the OS and escape the confines of our process#

  • Preliminaries

    • libraries

    • Standard library : libc.a[.so]

  • Address Space management:

    • dynamic memory for data items: malloc and free

    • more powerful control using mmap and munmap

  • I/O

    • low-level, file-descriptor based: open, read, write, close

    • Formatted and buffered I/O: fopen, fread, fwrite, fclose, fprintf/printf, fscanf/scanf, fgetc, fgets, fputs, fputc, getc, getchar, gets, putc, putchar, puts, fgetpos, fsetpos, fseek, fflush

  • create a directory: mkdir libc; cd libc

  • copy the examples

  • add a Makefile to automate assembling and linking

    • we are going to run the commands by hand this time to highlight the details

  • normally you would want to track everything in git

20.1. Overview#

20.2. Preliminaries#

20.2.1. SDKs#

Developers package and distribute “native” code for a specific computer and OS as a collection of documentation, header files, and libraries. We often call this collection a Software Development Kit (SDK). The functions and types defined in an SDK are often referred to as an Application Programming Interface (API).

20.2.1.1. Documentation#

Developers provide documentation that explains the API in terms of the functions and types that their code provides for your use. Traditionally, on UNIX systems this comprises a set of man pages.

man complex

man cexp

Documentation tells us:

  1. How to use the functions, macros, and types of an SDK

  2. Which header files we must include in our source to call particular functions

  3. Which libraries we must link against

20.2.1.2. Header files#

As we discussed, to generate assembly for a call to a function the compiler must have:

  1. A declaration for the function

  2. Definitions for all the types it requires

  3. Possibly, preprocessor macros

The header files of a library provide these things so that code calling the library's functions can compile. Remember that we use the preprocessor directive #include <file> to substitute the contents of <file> into our own source.

20.2.1.3. Libraries#

Libraries are a new kind of file for us. They are “archives” of object files.

There are two main types on Linux:

  1. Static archives, e.g., libm.a

  2. Dynamic (shared) libraries, e.g., libm.so

Static linking requires a static library, and dynamic linking requires a shared library.

20.2.1.3.1. Linker and Libraries#

We can add a library when we link our executable by passing the right parameters to the linker.

e.g., -l<name> tells the linker to use objects from lib<name>.a or lib<name>.so if it needs to

Specifically, a static library contains a table of contents listing all of its object files and the symbols defined in those objects. If your object files reference a symbol that they do not define, the linker looks for it in the table of contents. If an object file in the library defines the symbol, the linker includes that object file as if you had specified it as one of your own object files.

20.2.1.4. Example#

C: cexp.c

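// build: gcc -o cexp cexp.c -lm   (atan and cexp live in libm)
#include <math.h>    /* for atan */
#include <stdio.h>
#include <complex.h> /* for cexp, creal, cimag and I */
#include <stdlib.h>  /* for atof */

int
main(int argc, char **argv)
{
  double y=1.0;
  if (argc>1) y=atof(argv[1]);
  double pi = 4 * atan(y);          // atan(1) = pi/4
  double complex z = cexp(I * pi);  // e^(i*pi)
  printf("%f + %f * i\n", creal(z), cimag(z));
  return 0;
}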
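A worked check of what cexp.c computes: with no argument y = 1, and since atan(1) = π/4 the program evaluates e^{iπ}. Euler's formula gives the expected value:

$$
e^{i\pi} = \cos\pi + i\,\sin\pi = -1 + 0\,i
$$

So running ./cexp with no argument should print -1.000000 + 0.000000 * i. The computed imaginary part is not exactly zero (sin of the double closest to π is about 1.2e-16), but %f rounds it to 0.000000.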

How do the preprocessor and linker know where to find things?

The compiler driver, in our case gcc, passes parameters to both the preprocessor and the linker:

  • -I <dir> tells the preprocessor to look for header files in <dir>

    • several standard directories are specified by default

      • e.g., -I/usr/include

  • We can look at these files if we want to see the details

  • -L <dir> similarly tells the linker to look for libraries in <dir>

    • several standard directories are specified by default

      • e.g., -L/usr/lib/x86_64-linux-gnu

The linker map file (requested with gcc -Wl,-Map=<file>) lets us see all the .o’s that got linked in and where they came from

ar is a tool for working with static libraries; see man ar for details

We can use it to list the table of contents (the .o members) of the archive with ar t

We can even use it to extract a member with ar x (much as the linker pulls a member in when linking)

And, no surprise, the extracted member is an object file like the ones we have been creating

What about _start, atof and printf?

  • Where did they come from?

    • Let's look for them in the map file

      • the map file tells us not only which file a symbol came from but also where it ends up in the memory image of our executable

20.2.1.5. Defaults#

  • The C compiler driver ensures that we always link against a set of standard object files and libraries

  • The core one is libc, the C standard library!

20.3. C Standard Library libc (-lc)#

The C standard library was developed at the same time as the core language

20.3.1. Overview#

  • The C Standard library is very large and provides many categories of routines

  • We will only consider a small fraction of its functionality:

    1. Dynamic memory management

    2. A basic overview of I/O

20.3.2. Dynamic Memory Management#

20.3.2.1. Two categories#

LibC provides routines that give a programmer the ability to allocate memory dynamically

  • The Memory Allocator (https://www.gnu.org/software/libc/manual/html_mono/libc.html#The-GNU-Allocator)

    • It makes the OS system calls on the programmer's behalf

      • There is a lot of subtlety to implementing a high-performance memory allocator

    • The basic idea is:

      1. The application code calls the libc allocator routines malloc and free

      2. These routines allocate large chunks of memory from the OS

      3. They then break these large chunks down, handing out pieces as requested by malloc calls

      4. They coalesce pieces back into the chunks when free is called

      5. If a large request is made, the libc routines ask the OS to create a separate mapping for it

      6. Similarly, when such a large block is freed, it is returned to the OS immediately

  • Directly calling mmap or brk

20.3.2.2. Main calls#

void *malloc (size_t size)
// Allocate a block of size bytes. See Basic Allocation.

void free (void *addr)
// Free a block previously allocated by malloc. See Freeing after Malloc.

void *realloc (void *addr, size_t size)
// Make a block previously allocated by malloc larger or smaller, possibly by copying it to a new location. See Changing Block Size.

void *reallocarray (void *ptr, size_t nmemb, size_t size)
// Change the size of a block previously allocated by malloc to nmemb * size bytes as with realloc. See Changing Block Size.

void *calloc (size_t count, size_t eltsize)
// Allocate a block of count * eltsize bytes using malloc, and set its contents to zero. See Allocating Cleared Space.

void *valloc (size_t size)
// Allocate a block of size bytes, starting on a page boundary. See Aligned Memory Blocks.

void *aligned_alloc (size_t alignment, size_t size)
// Allocate a block of size bytes, starting on an address that is a multiple of alignment. See Aligned Memory Blocks.

int posix_memalign (void **memptr, size_t alignment, size_t size)
// Allocate a block of size bytes, starting on an address that is a multiple of alignment. See Aligned Memory Blocks.

void *memalign (size_t boundary, size_t size)
// Allocate a block of size bytes, starting on an address that is a multiple of boundary. See Aligned Memory Blocks.

int mallopt (int param, int value)
// Adjust a tunable parameter. See Malloc Tunable Parameters.

int mcheck (void (*abortfn) (void))
// Tell malloc to perform occasional consistency checks on dynamically allocated memory, and to call abortfn when an inconsistency is found. See Heap Consistency Checking.

struct mallinfo2 mallinfo2 (void)
// Return information about the current dynamic memory usage. See Statistics of Malloc.

20.3.3. Explore with an example#

20.3.3.1. man malloc#

man 3 malloc

What does this code do?

  • What do you think will happen?

C: dynmem.c

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
// Use debugger to explore what happens
int main(int argc, char **argv)
{
  char *cptr;
  int n = 4096;

  cptr = malloc(n);
  memset(cptr, 0xaa, n);
  free(cptr);
  return 0;
}

Debug

  • demonstrate strace ./dynmem

    • what do you expect to see?

  • Use the debugger

    • use the jump command and set var n=X to execute several malloc calls

    • explore /proc//maps

    • set breakpoints on malloc, sbrk, and brk

      • disass brk: it is easy to see the syscall instruction

      • break on the syscall instruction and look at the maps

      • use where to show the call chain

  • then add the code below and use strace

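#include <stdlib.h>
#include <unistd.h>

// Use strace and the debugger to explore what happens.
// Each read(0, ...) pauses until you press Enter, giving you time to
// inspect the process's memory map between allocations.

int
main(int argc, char **argv)
{
  char *cptr;
  char c;

  read(0, &c, 1);

  int n = 4096;              // small request: expect it to come from the brk heap
  cptr = malloc(n);
  read(0, &c, 1);

  n = 1024 * 1024  * 1024;   // 1 GiB: expect a separate mmap
  cptr = malloc(n);          // the previous block is leaked (kept simple for the demo)
  read(0, &c, 1);

  n = 1024 * 1024  * 1024;
  cptr = malloc(n);
  read(0, &c, 1);

  free(cptr);                // only the last block is freed: expect a munmap
  return 0;
}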

20.3.3.2. LibC typically provides C wrappers for OS system calls and support for making system calls directly in C#

libc provides C wrappers for system calls

  • Wrappers expose C function interface for system calls of the OS

  • You can look up the C version and call it like any other C function

    • the implementation in libc takes care of all the assembly details for you

      • putting parameters in the right registers

      • filling in the system call number

man 2 syscalls

man 2 brk

glibc also has support for using the syscall instruction in C

man 2 syscall

In the following example we call brk directly rather than letting malloc call it

Do not do this in real code: it can corrupt malloc's view of the heap. It is shown here only for example purposes

C: dynmemsyscalls.c

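#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>   /* For SYS_xxx definitions */

int
main(int argc, char **argv)
{
  char *cptr;
  int n = 4096;

  // 1. let malloc manage the heap for us
  cptr = malloc(n);
  memset(cptr, 0xaa, n);
  free(cptr);

  // 2. sbrk: libc function built on brk; moves the break and returns the old one
  cptr = sbrk(4096);
  memset(cptr, 0xaa, 4096);

  // 3. raw brk system call: brk(0) returns the current break
  cptr = (void *) syscall(12, 0); // hardcoded syscall number (12 is brk on x86-64 Linux only)
  syscall(SYS_brk, cptr + 4096);  // constant from the header works on any architecture
  memset(cptr, 0xaa, 4096);       // the 4096 bytes above the old break are now usable

  return 0;
}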

20.3.3.3. You can use mmap and munmap in your C programs to manage your own dynamic address-space mappings#

man mmap

It is very powerful, but it also has a lot of parameters

20.3.4. Under the covers, malloc, free and friends#

  • are calling brk, mmap and munmap

  • neither you nor your code needs to deal with the details

    • instead you get a simpler interface that is consistent across CPU types and OSes

      • get me n bytes of memory and return a pointer to it

      • give back the memory at this address

20.3.4.1. Important points about dynamic memory#

What is wrong with this code?

C: badmsgcode.c

#include <stdlib.h>
#include <assert.h>

// waits for request message to arrive: returns length in bytes and updates integer pointed to
// by idPtr with id of request
int getRequest(int *idPtr);
// read the data of request with id into memory pointed to by buffer
void readRequestData(int id, char *buffer);
// process request with id and data in memory pointed to by buffer, frees buffer when done
void processRequest(int id, char *buffer);

int
main(int argc, char **argv)
{
  int n;
  int *id_ptr;
  char *msg_buffer;

  // my server loop
  while (1) {
    id_ptr = malloc(sizeof(int));
    assert(id_ptr != 0);
    n = getRequest(id_ptr);
    msg_buffer = malloc(n);
    assert(msg_buffer != 0);
    readRequestData(*id_ptr, msg_buffer);
    processRequest(*id_ptr, msg_buffer);
  }
  // should never get here
  exit(0);
}

  • bugs like this are why higher-level languages have a lot of code to hide the details of dynamic memory

    • reference counting

    • garbage collectors

20.4. Layers#

What we see is a general pattern of layering

  • Assembly is at the bottom

  • The OS kernel provides an assembly interface to its routines

    • on x86-64 Linux, system calls are invoked with the syscall instruction after loading the right values into registers

  • The lowest layer of libc routines provides C wrappers for these

  • Other libc routines build on these to add more functionality, e.g., malloc and free

    • thus programmers and C applications are isolated from the details of OS system calls

      • they simply need to learn libc, which is ported to each computer/OS pair

20.5. I/O is no different#

20.5.1. The lowest layer: wrappers for the core UNIX I/O routines#

  • open, close, read, write, dup, mkdir, exec, mmap

20.5.2. One layer up: buffered and formatted I/O#

  • fopen, fclose, fread, fwrite

  • fprintf/printf, fscanf/scanf