Thursday, August 27, 2009

Static and Dynamic Linker

Linker: A program that combines one or more object (relocatable) files into a binary executable. Linking is done in two forms.

Static linking:
The executable is built together with its libraries at build time, so a new executable has to be created for each platform.

Dynamic linking:
The libraries are linked at run time. This makes it easier to design, change and update code. The library files are first loaded into memory and then linked. Dynamic code is shared between multiple applications.

An important library for C programming is libc.

Library files are created with the extension .a or .so.
In a UNIX environment a file with the .a extension is a static library and a file with the .so extension is a dynamic (shared) library.

In a Windows environment a dynamic library has the .dll extension and a static library has the .lib extension.

Creating a static and a dynamic library in Linux:
The tool for creating a static library is ar (the archiver).

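Create source files such as first.c and second.c.
Create relocatable object files and archive them:
gcc -c first.c -o first.o
gcc -c second.c -o second.o
ar rcs libmine.a first.o second.o

Note: in rcs, r replaces (or inserts) the object files in the archive, c creates the archive if it does not exist, and s writes a symbol index (function names, variable names etc.) into it.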

A relocatable binary is created in one of two forms:
position dependent.
position independent.

The static library created above is position dependent (unless the compiler is configured to produce position-independent code by default).
Position-independent code can be created using
gcc -c -fpic first.c -o first.o
gcc -c -fpic second.c -o second.o

gcc -shared -o libmine.so first.o second.o

Files that ls shows in green have the execute permission set; gcc creates the shared library with this permission.

/usr/lib is a default library repository.
/usr/include is a default header repository.

To use a static or dynamic library we have to create a header file that contains the function prototypes.
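As a minimal sketch of that layout, assume a hypothetical library with two functions, first_sum() and second_max(). These names, the header name mine.h and the function bodies are illustrative and do not come from the original sources. Everything is shown in one listing so that it compiles as a single file; the comments mark the file each part would live in.

```c
#include <assert.h>
#include <stddef.h>

/* ---- mine.h: prototypes shared by the library and the application ---- */
int first_sum(const int *values, size_t count);
int second_max(const int *values, size_t count);

/* ---- first.c: would be compiled into first.o ---- */
int first_sum(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* ---- second.c: would be compiled into second.o ---- */
int second_max(const int *values, size_t count)
{
    assert(values != NULL && count > 0);   /* an empty list has no maximum */
    int best = values[0];
    for (size_t i = 1; i < count; i++)
        if (values[i] > best)
            best = values[i];
    return best;
}

/* ---- app.c: would include mine.h and call into the library ---- */
int app_report(void)
{
    static const int samples[] = { 3, 9, 4 };
    return first_sum(samples, 3) + second_max(samples, 3);
}
```

In a real project app.c would only contain #include "mine.h" and the calls; the definitions would come from the library at link time.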

gcc -I./ -c app.c -o app.o

-I./ makes gcc search for include files in the current directory.
To link with the library:
gcc app.o -o app -lmine
-lmine looks for libmine.so or libmine.a in the default directories such as /usr/lib.
To use the library in the current directory:
gcc app.o -o app ./libmine.a

Here we can see that the executable contains the static library but is still linked dynamically with other libraries such as libc.

To link everything statically, give the -static option when creating the binary executable.
The object dump gives more information about how the executable was linked.

objdump -D app | more

The linker creates a procedure linkage table (PLT). It contains an entry for each dynamically linked function, through which the address of the function is reached.

Dynamic or shared libraries can be used by an application as load-time libraries or as run-time libraries.

Load-time libraries are loaded during process initialization and remain resident throughout the execution of the process.

Libraries are called run-time libraries when they are loaded on demand, at the point where their functionality is needed.
This avoids a direct (link-time) reference to the library.

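For example, the source is in mandl.c:

#include <dlfcn.h>

int main(void)
{
    void (*ptr)(void);
    void *libptr = dlopen("./mylib.so", RTLD_NOW);   /* load the library now */
    ptr = (void (*)(void))dlsym(libptr, "func");     /* look up func by name */
    ptr();
    dlclose(libptr);
    return 0;
}

gcc mandl.c -o mandl -ldl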

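RTLD_NOW resolves all symbols of the library when dlopen is called. RTLD_LAZY is used when that would take too much time: each function is then resolved only when it is first called.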
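The dlopen and dlsym calls above can fail, for instance when the library file is missing. The following is a hedged sketch of the same run-time loading with error checks. The helper name call_from_library is hypothetical, and the usage note assumes a glibc-based Linux system where the maths library is installed as libm.so.6.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Loads a shared library at run time, looks up a double(double) function
 * and calls it. Returns 0 on success and stores the result in *result;
 * returns -1 and prints the dlerror() text on failure. */
int call_from_library(const char *library, const char *symbol,
                      double argument, double *result)
{
    assert(library != NULL && symbol != NULL && result != NULL);

    void *handle = dlopen(library, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();                                /* clear any stale error */
    double (*function)(double);
    *(void **)(&function) = dlsym(handle, symbol);
    const char *error = dlerror();
    if (error != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", error);
        dlclose(handle);
        return -1;
    }

    *result = function(argument);
    dlclose(handle);
    return 0;
}
```

A call such as call_from_library("libm.so.6", "cos", 0.0, &value) would load the maths library and call cos through the returned pointer. On older glibc versions the program has to be linked with -ldl, as in the gcc command above.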

Binary Image:
Binary images are of 3 types.

1) Linkable (relocatable).
2) Loadable (shared library with position-independent code).
3) Executable (functionality + runtime).

Functionality is further divided into instructions and data.
Each part of the binary image is loaded into its own block of memory (stack, BSS, code and data).
To read the header information of a binary file, objdump or readelf is used.
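To illustrate what readelf reports, here is a small sketch that reads the type field of the ELF header directly and maps it onto the three kinds of image listed above. The function names elf_image_type and elf_image_kind are hypothetical. The sketch assumes a Linux system that provides <elf.h> and a file with the same byte order as the machine running the code.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns the e_type field of an ELF file (ET_REL, ET_EXEC, ET_DYN, ...),
 * or -1 if the file cannot be read or is not an ELF image. */
int elf_image_type(const char *path)
{
    assert(path != NULL);

    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return -1;

    /* e_ident and e_type sit at the same offsets in 32- and 64-bit ELF. */
    unsigned char ident[EI_NIDENT];
    Elf64_Half type;
    int ok = fread(ident, 1, sizeof ident, file) == sizeof ident
          && fread(&type, sizeof type, 1, file) == 1;
    fclose(file);

    if (!ok || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return -1;
    return type;
}

/* Maps e_type onto the kinds of binary image named in the text. */
const char *elf_image_kind(int type)
{
    switch (type) {
    case ET_REL:  return "linkable (relocatable)";
    case ET_DYN:  return "loadable (shared object or position-independent executable)";
    case ET_EXEC: return "executable";
    default:      return "other or not ELF";
    }
}
```

Calling elf_image_kind(elf_image_type("first.o")) on an object file is expected to give the linkable case. Note that many current distributions build ordinary programs as position-independent executables, which carry the ET_DYN type as well.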

.text contains all instructions.
.data contains initialized global and static variables.
.bss contains uninitialized global data.
.rodata contains read-only data. No write operation is performed on it.
Other sections are linker specific.
Each .o file has a section header table.
The linker merges the sections of all .o files that have matching flags into segments, which are described by a single program header table.
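As a small illustration of the sections described above, the following sketch marks where a typical GCC build on Linux places each object. The variable and function names are made up for the example, and the exact placement can vary with the compiler and its options.

```c
#include <assert.h>

int initialized_global = 42;          /* initialized data: usually .data   */
int uninitialized_global;             /* zero-filled at load time: .bss    */
static int call_count;                /* static, no initializer: .bss      */
const char greeting[] = "hello";      /* read-only data: usually .rodata   */

/* The machine instructions of both functions go into .text. */
int next_call_number(void)
{
    call_count++;
    return call_count;
}

char greeting_char(int index)
{
    assert(index >= 0 && index < 5);  /* stay inside "hello" */
    return greeting[index];
}
```

After compiling this with gcc -c, objdump -t or readelf -s on the object file lists each symbol together with the section it was placed in.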

The header table in an executable binary is called the program header table.
The loader mainly uses this header table to load the image into the process address space.

Various section types:

PROGBITS: code,data,debug information.
SYMTAB and DYNSYM: Symbol table.
STRTAB: String table.
etc.

How a binary executable is loaded into memory:
  • Suppose we want to execute program1. To execute it we type ./program1
  • The shell receives this request to run a program and checks that the file exists.
  • It verifies the Application Binary Interface (ABI). To see the ABI type, use readelf -a
  • If everything is fine it calls the loader to load the program.
  • The loader reads the program header table, which is located through the ELF header.
  • Using the kernel memory allocator it allocates memory blocks in user space and maps the segments of the executable image into them.
  • It calls the kernel process manager to carry out process registration.
  • The kernel allocates structures to keep track of the process, i.e. the TCB (task control block), named task_struct.
  • A valid process id is assigned.
  • The process control block is enqueued on a run queue (which run queue is used depends on the scheduler).
  • The scheduler selects a PCB from the run queue according to its policy.
  • At run time a stack segment is added.
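
From user space, the steps above are triggered by a fork followed by an exec call. The following is a rough sketch of what a shell does for ./program1, not the code of any particular shell; the function name run_program is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a program the way a shell does: fork a child, replace the child's
 * image with the program, wait for it. Returns the program's exit status,
 * or -1 if the child could not be created or did not exit normally. */
int run_program(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t child = fork();              /* kernel creates a new process and pid */
    if (child < 0)
        return -1;

    if (child == 0) {
        execvp(argv[0], argv);         /* kernel loads the new binary image */
        _exit(127);                    /* reached only if exec failed */
    }

    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The exit status 127 for a failed exec follows the convention shells use for "command not found".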

Application Booting up Procedure:
  • The start-up of an application begins with an init routine. The initializer is responsible for initializing the stack segment and for loading shared objects (libraries). The link loader is responsible for loading all shared libraries.
  • Before loading, it resolves all symbols the program refers to. While the shared libraries are loaded, the symbol entries in the PLT are updated with absolute addresses.
  • After initialization it calls the main program.
  • Process detach is responsible for detaching the shared libraries from the process. It also releases the stack segment assigned to it.

Wednesday, August 26, 2009

Stages of Compilation in Linux

A source file is compiled and linked to form an executable binary file for execution on an architecture. Understanding the various stages of compilation helps in cross compilation of code.
For example, the steps below show the compilation process using the GCC compiler.

Source file:
It contains the source program in text format. It can be in any language: C, C++ etc.
For example, first.c is a C source file.

Preprocessing:
It helps in creating fast and efficient code.
It reads the header files to create a preprocessed source file.
All macros and constant symbols are replaced by their definitions.
All conditional preprocessor directives are processed by the preprocessor.
This provides conditional compilation.

gcc -E first.c -o first.i

To see the steps of preprocessing on the console, compile using
gcc -v -E first.c -o first.i
v stands for verbose.
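
As an example of what the preprocessor works on, here is a small sketch of a source file that uses a header, a constant symbol, a macro and a conditional directive. The names BUFFER_SIZE, SQUARE and LOG_LEVEL are invented for the illustration.

```c
#include <assert.h>                    /* header text is pasted in by the preprocessor */

#define BUFFER_SIZE 64                 /* constant symbol */
#define SQUARE(x) ((x) * (x))          /* function-like macro */

#ifdef DEBUG
#define LOG_LEVEL 2                    /* kept only when built with -DDEBUG */
#else
#define LOG_LEVEL 0
#endif

int squared_buffer_size(void)
{
    int size = SQUARE(BUFFER_SIZE);    /* becomes ((64) * (64)) after gcc -E */
    assert(size > 0);                  /* assert is itself a macro from <assert.h> */
    return size;
}

int log_level(void)
{
    return LOG_LEVEL;
}
```

Running gcc -E on such a file shows the expanded text; adding -DDEBUG to the command line selects the other branch of the conditional.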

Assembly:
The compiler takes the preprocessed file and creates a .s file, called the assembly file.
This stage is where the code is optimized (for speed and space).
For example: gcc -v -S first.i -o first.s

Relocatable Binary:

gcc -c first.s -o first.o
It contains offset addresses for the assembly code, assigned at compile time.
The object dump of first.o shows the offset addresses.
For example, relocatable code contains call 19 <>. Its position depends on the position of main.
This file contains the machine code for the source; calls to library routines are left as references for the linker to resolve.


Executable Binary:
gcc first.o creates a.out by default; another name can be given with gcc first.o -o first

This loadable file contains load addresses in the form of segment and offset, called absolute addresses. Entries for function calls are present in the PLT (procedure linkage table).
The executable file also contains some run-time library code. This file is created by the linker, which is OS dependent.

The load addresses in the executable binary can be seen in its disassembly, produced by
objdump -D first
To view it page by page: objdump -D first | more

File Format: To know the format of a file, the file command is used.

For example:
file first.c
It is reported as a text file.

file first.i
It is reported as a text file.

file first.s
It is reported as an assembly file.

file first.o
It is reported as a binary file.

This set of tools is called a tool chain. Cross compilers are mainly required for building code that executes on a different architecture. The object dump of a binary executable contains the load addresses.

The steps from .c to .o are the same on every architecture, although the .o file itself contains machine code for the target architecture.
The binary executable (.exe on Windows) is platform and architecture specific.

The executable has three flags:
suid: if this flag is set, the process runs with the user id (uid) of the file's owner.
sgid: if this flag is set, the process runs with the group id (gid) of the file's group.
sticky: if this flag is set, it asks the kernel to keep the program in memory after execution (current Linux kernels ignore it for regular files).

During execution the executable binary is stored in the process address space.
The process address space contains different segments of memory such as data, code, stack etc.

During runtime three key functions are performed.
--init
Resource allocation is done here. It calls the resource allocation routines, which in turn make the specific kernel system calls.

--start
Makes a call to main and hands control over to the functionality.

--fini
Releases all resources allocated by init.
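
A program can hook into the init and fini phases itself. The sketch below uses the constructor and destructor function attributes, which are GCC and Clang extensions rather than standard C; the names runtime_init, runtime_fini and runtime_is_ready are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

static char *scratch_buffer;           /* resource owned by the hooks below */
static int init_calls;

/* Runs before main(), like the init phase. */
__attribute__((constructor))
static void runtime_init(void)
{
    scratch_buffer = malloc(256);
    init_calls++;
}

/* Runs after main() returns or exit() is called, like the fini phase.
 * It is skipped if the process is killed by a signal or calls _exit(). */
__attribute__((destructor))
static void runtime_fini(void)
{
    free(scratch_buffer);
    scratch_buffer = NULL;
}

/* Lets the program check that the init phase has already happened. */
int runtime_is_ready(void)
{
    assert(init_calls == 1);
    return scratch_buffer != NULL;
}
```

Any main() linked with this file can call runtime_is_ready() on its first line and find the buffer already allocated, because the constructor has run before control reached main.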

If the process is terminated abnormally, for example because of a bug, the --fini routine is not called and the process does not release its resources itself. The kernel then reclaims them, and the terminated process remains a zombie until its parent collects its exit status.

Relocatable code is made executable on various platforms through an interface called the runtime.
(Relocatable code, .o file) functionality -> UNIX runtime -> binary (ELF format with no extension) -> UNIX OS.
(Relocatable code, .obj file) functionality -> Windows runtime -> binary (COFF format with .exe extension) -> Windows OS.

The runtime layer is also called the Application Binary Interface.