In [None]:
%run -i ../python/common.py
UC_SKIPTERMS=True
%run -i ../python/ln_preamble.py

# UC-SLS Lecture 20 : Using LibC to access the OS and escape the confines of our process
- Preliminaries
  - libraries
  - Standard library : `libc.a[.so]`
- Address Space management:
  - dynamic memory for data items: `malloc` and `free`
  - more powerful control using `mmap` and `munmap`
- I/O
  - low-level file descriptor based: `open`, `read`, `write`, `close`
  - Formatted and Buffered IO: `fopen`, `fread`, `fwrite`, `fclose`, `fprintf`/`printf`,
  `fscanf`/`scanf`, `fgetc`/`getc`/`getchar`, `fgets`/`gets`, `fputc`/`putc`/`putchar`, `fputs`/`puts`, `fgetpos`, `fsetpos`, `fseek`, `fflush`

In [None]:
# setup for libc examples
appdir=os.getenv('HOME')
appdir=appdir + "/libc"
#print(movdir)
output=runTermCmd("[[ -d " + appdir + " ]] &&  rm -rf "+ appdir + 
             ";mkdir " + appdir + 
             ";cp ../src/Makefile ../src/cexp.c ../src/badmsgcode.c ../src/dynmem.c ../src/dynmemsyscalls.c " + appdir)

display(Markdown('''
- create a directory `mkdir libc; cd libc`
- copy examples
- add a `Makefile` to automate assembling and linking
    - we are going to run the commands by hand this time to highlight the details
- normally you would want to track everything in git
'''))
TermShellCmd("ls " + appdir)

## Overview

<center>
<img src="../images/LibC-001.png" >
</center>

<center>
<img src="../images/LibC-002.png" >
</center>

<center>
<img src="../images/LibC-003.png" >
</center>

## Preliminaries 
### SDKs 

Developers package and distribute "native" code for a specific computer and OS as a collection of documentation, header files, and libraries.  We often call this collection a Software Development Kit (SDK).  The functions and types defined in an SDK are often referred to as an Application Programming Interface (API).  

#### Documentation

Developers provide documentation that explains the API in terms of the functions and types that their code provides for your use.  Traditionally on UNIX systems this comprises a set of man pages.  

**man complex**

In [None]:
TermShellCmd("man complex", noposttext=True, markdown=False)

**man cexp**

In [None]:
TermShellCmd("man cexp", noposttext=True, markdown=False)

**Documentation tells us**

1. How to use the functions, macros and types of an SDK
2. What header files we must include in our source to call particular functions
3. What libraries we must include when we link

#### Header files

As we discussed, to generate assembly for a call to a function the compiler must have
1. A declaration for the function 
2. Definitions for all types it requires
3. and possibly preprocessor macros 

The header files of a library provide these things so that your code can compile with calls to the library's functions. Remember we use the preprocessor directive `#include <file>` to substitute the contents of `<file>` into our own source.

#### Libraries

Libraries are a new kind of file for us. They are "archives" of object files.

Two main types
1. Static library (archive) eg. Linux: `libm.a` 
2. Dynamic (shared) library eg. Linux: `libm.so` 

Static linking requires the static library, and dynamic linking requires the dynamic library.

##### Linker and Libraries
 
We can add a library when we link our executable by passing the right parameters to the linker.  

eg.  `-l<name>` tells the linker to use objects from `lib<name>.a` or `lib<name>.so` if it needs to

Specifically, the library file contains a table of contents listing all its object files and the symbols defined in those objects.  If your object files reference a symbol that you do not define, the linker looks for it in the table of contents.  If an object file in the library defines the symbol, the linker includes that object file as if you had specified it as one of your own object files.

#### Example

In [None]:
display(Markdown('<font size="6rem">' + FileCodeBox(
    file=appdir + "/cexp.c", 
    lang="c", 
    title="<b>C: cexp.c",
    h="100%", 
    w="100%"
) + '</font>'))
TermShellCmd("[[ -a cexp ]] && rm cexp;make cexp", cwd=appdir, prompt='', noposttext=True)
TermShellCmd("./cexp", cwd=appdir, noposttext=True)

**How do the preprocessor and linker know where to find things?**

The compiler driver, in our case `gcc`, passes parameters to both the preprocessor and the linker
- `-I <dir>` tells preprocessor to look for header files in `<dir>` 
   - several standard directories are specified by default
     - eg. `-I/usr/include`

In [None]:
TermShellCmd("ls -la /usr/include/math.h /usr/include/stdio.h /usr/include/complex.h", noposttext=True)

- We can look at these files if we want to see the details 

- `-L <dir>` similarly tells the linker to look for libraries in `<dir>`
  - several standard directories are specified by default
       - eg. `-L/usr/lib/x86_64-linux-gnu`

**The linker map file lets us see all the .o's that got linked in and where they came from**


In [None]:
TermShellCmd("head -20 cexp.map ", cwd=appdir, noposttext=True)

**`ar` is a tool for working with static libraries**
see `man ar` for details

We can use it to list the table of contents: the `.o` files in the archive

In [None]:
TermShellCmd("ar t /usr/lib/x86_64-linux-gnu/libm-2.31.a | head", cwd=appdir, noposttext=True)

We can even use it to extract a member (just like the linker does when linking)

In [None]:
TermShellCmd("ar x /usr/lib/x86_64-linux-gnu/libm-2.31.a s_cexp.o; ls -l s_cexp.o", cwd=appdir, noposttext=True)

**And no surprise it is an object file like the kind we have been creating**

In [None]:
TermShellCmd("objdump -d -Mintel s_cexp.o | head -20", cwd=appdir, noposttext=True)

**What about `_start`, `atof` and `fprintf`**

- Where did they come from?
   - Let's look for them in the map file
      - the map file tells us not only which file a symbol came from but also where it ends up in the memory image of our executable

In [None]:
TermShellCmd("grep -B 1 ' _start$' cexp.map", cwd=appdir, noposttext=True)

In [None]:
TermShellCmd("grep -B 1 ' atof$' cexp.map", cwd=appdir, noposttext=True)

In [None]:
TermShellCmd("grep -B 1 ' fprintf$' cexp.map", cwd=appdir, noposttext=True)

#### Defaults

- The C compiler driver ensures that we always link against a set of standard object files and libraries
- The core one is `libc` -- the C standard library!


## C Standard Library `libc` (`-lc`)

The C standard library was developed at the same time as the core language
- There are standards a C compiler and C standard library implementation can conform to
  - eg. https://www.iso.org/standard/17782.html
- The gnu C compiler `gcc` has its associated gnu libc `glibc`
  - https://www.gnu.org/software/libc/manual/html_node/index.html
  - These are the standards it conforms to
    - https://www.gnu.org/software/libc/manual/html_node/Standards-and-Portability.html#Standards-and-Portability

### Overview

- The C Standard library is very large and provides many categories of routines

- We will only consider a small fraction of its functionality
  1. Dynamic Memory Management 
  2. Basic overview of IO

### Dynamic Memory Management
- https://www.gnu.org/software/libc/manual/html_mono/libc.html#Memory-Concepts
- https://www.gnu.org/software/libc/manual/html_mono/libc.html#Dynamic-Memory-Allocation
- The C language has no built-in support for dynamic memory variables
  - other than automatics (function local variables and function parameters)
  - we must use system calls to add memory to and remove memory from the process
  - we must use pointers to track it

#### Two categories

LibC provides routines that give a programmer the ability to dynamically allocate memory 
- The Memory Allocator (https://www.gnu.org/software/libc/manual/html_mono/libc.html#The-GNU-Allocator)
  - It calls the OS system calls for the programmer 
    - There is a lot of subtlety to implementing a high performance memory allocator
  - Basic idea is 
    1. The allocator routines in libc, `malloc` and `free`, are called by the application code
    2. These routines allocate large chunks of memory from the OS
    3. They then break these large chunks down, handing out pieces as requested by `malloc` calls
    4. And coalesce pieces back into the chunks when `free` is called
    5. If large requests are made, the libc routines call the OS to create separate mappings for them
    6. Similarly, when these large requests are freed, the routines immediately return them to the OS
- Directly calling `mmap` or `brk`

#### Main calls
- https://www.gnu.org/software/libc/manual/html_node/Summary-of-Malloc.html#Summary-of-Malloc

```
void *malloc (size_t size)
// Allocate a block of size bytes. See Basic Allocation.

void free (void *addr)
// Free a block previously allocated by malloc. See Freeing after Malloc.

void *realloc (void *addr, size_t size)
// Make a block previously allocated by malloc larger or smaller, possibly by copying it to a new location. See Changing Block Size.

void *reallocarray (void *ptr, size_t nmemb, size_t size)
// Change the size of a block previously allocated by malloc to nmemb * size bytes as with realloc. See Changing Block Size.

void *calloc (size_t count, size_t eltsize)
//Allocate a block of count * eltsize bytes using malloc, and set its contents to zero. See Allocating Cleared Space.

void *valloc (size_t size)
// Allocate a block of size bytes, starting on a page boundary. See Aligned Memory Blocks.

void *aligned_alloc (size_t alignment, size_t size)
// Allocate a block of size bytes, starting on an address that is a multiple of alignment. See Aligned Memory Blocks.

int posix_memalign (void **memptr, size_t alignment, size_t size)
// Allocate a block of size bytes, starting on an address that is a multiple of alignment. See Aligned Memory Blocks.

void *memalign (size_t boundary, size_t size)
//Allocate a block of size bytes, starting on an address that is a multiple of boundary. See Aligned Memory Blocks.

int mallopt (int param, int value)
// Adjust a tunable parameter. See Malloc Tunable Parameters.

int mcheck (void (*abortfn) (enum mcheck_status status))
// Tell malloc to perform occasional consistency checks on dynamically allocated memory, and to call abortfn when an inconsistency is found. See Heap Consistency Checking.

struct mallinfo2 mallinfo2 (void)
// Return information about the current dynamic memory usage. See Statistics of Malloc.
```

### Explore with an example

#### man malloc

In [None]:
TermShellCmd("man 3 malloc", noposttext=True, markdown=False)

**What does this code do?**
- What do you think will happen?

In [None]:
display(Markdown("<font size='6.5em'>" + FileCodeBox(
    file=appdir + "/dynmem.c", 
    lang="c", 
    title="<b>C: dynmem.c",
    h="100%", 
    w="100%"
)+"</font>"))
TermShellCmd("[[ -a dynmem ]] && rm dynmem;make dynmem", cwd=appdir, prompt='', noposttext=True)

In [None]:
display(showDT())

- demonstrate `strace ./dynmem`
  - what do you expect to see?
- use the debugger 
  - use the `jump` command and `set var n=X` to execute several malloc calls
  - explore `/proc/<pid>/maps`
  - set breakpoints on `malloc`, `sbrk`, `brk`
    - `disass` - in `brk` it is easy to see the `syscall`
    - break on the `syscall` instruction - look at maps
    - use `where` to show call chains
- then add the code below and use `strace`

```c
#include <stdlib.h>
#include <unistd.h>

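// Use a debugger (or strace) to explore what happens at each step.
// Each read() blocks until a byte arrives on stdin, which gives us
// time to inspect /proc/<pid>/maps between allocations.

int
main(int argc, char **argv)
{
  char *cptr;
  char c;

  read(0, &c, 1);               // pause before any allocation

  int n = 4096;                 // small request
  cptr = malloc(n);
  read(0, &c, 1);

  n = 1024 * 1024 * 1024;       // large request: 1 GiB
  cptr = malloc(n);
  read(0, &c, 1);

  n = 1024 * 1024 * 1024;       // a second 1 GiB request
  cptr = malloc(n);
  read(0, &c, 1);

  // Only the last block is freed; the earlier pointers were overwritten.
  free(cptr);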
  return 0;
}
```

#### LibC typically provides C wrappers for OS system calls and support for making syscalls directly in C


**libc provides C wrappers for system calls**

- Wrappers expose a C function interface for the system calls of the OS
- You can look up the C version and simply call it like any other C function
  - implementation in libc takes care of all the assembly stuff for you
    - putting parameters in the right registers
    - filling in the system call number

**man 2 syscalls**

In [None]:
TermShellCmd("man 2 syscalls|head -80", noposttext=True, markdown=False)

**man 2 brk**

In [None]:
TermShellCmd("man 2 brk", noposttext=True, markdown=False)

**glibc also provides a generic `syscall` function for invoking any system call by number from C**

**man 2 syscall**

In [None]:
TermShellCmd("man 2 syscall", noposttext=True, markdown=False)

**In the following example we will call brk directly rather than letting malloc call it**

Do not do this in real code -- you can corrupt malloc's bookkeeping. It is only for example purposes.


In [None]:
display(Markdown('<font size="6rem">' + FileCodeBox(
    file=appdir + "/dynmemsyscalls.c", 
    lang="c", 
    title="<b>C: dynmemsyscalls.c",
    h="100%", 
    w="107em"
) + "</font>"))
TermShellCmd("[[ -a dynmemsyscalls ]] && rm dynmemsyscalls;make dynmemsyscalls", cwd=appdir, prompt='', noposttext=True)

#### You can use `mmap` and `munmap` in your C programs to manage your own dynamic address space mappings


**man mmap**

It is very powerful but also has a lot of parameters 

In [None]:
TermShellCmd("man 2 mmap|head -40", noposttext=True, markdown=False)

### Under the covers `malloc`, `free` and friends
- are calling `brk`, `mmap` and `munmap` 
- you and your code do not need to deal with the details
  - instead you get a simpler interface that is consistent across CPU types and OSes
    - get me `n` bytes of memory and return a pointer to it 
    - give back memory at this address

#### Important points about dynamic memory

What is wrong with this code?

In [None]:
display(Markdown('<font size="6rem">' + FileCodeBox(
    file=appdir + "/badmsgcode.c", 
    lang="c", 
    title="<b>C: badmsgcode.c",
    h="100%", 
    w="100%"
) + "</font>"))
TermShellCmd("[[ -a badmsgcode.o ]] && rm badmsgcode.o;make badmsgcode.o", cwd=appdir, prompt='', noposttext=True)

- this is why higher level languages have a lot of code to hide the details of dynamic memory
  - reference counting
  - garbage collectors

## Layers

What we see is a general pattern of layering

- Assembly is at the bottom
- OS kernel provides an assembly interface to its routines 
  - x86-64 Linux system calls are invoked via the `syscall` instruction with the right register values
- lowest layer of libc routines provides wrappers
- The other libc routines build on these to add more functionality eg. `malloc` and `free`
  - thus programmers and C applications are isolated from the details of OS syscalls
    - they simply need to learn libc, which is ported to each computer/OS pair

## IO is no different

### Lowest layer is wrappers for core UNIX I/O routines

- `open`, `close`, `read`, `write`, `dup`, `mkdir`, `exec`, `mmap`

### A layer up buffered and formatted I/O

- `fopen`, `fclose`, `fread`, `fwrite`
- `fprintf`/`printf`, `fscanf`/`scanf`
