Saturday, January 14, 2017

Keikaku Prelude Part 1: C Programming Crash Course

This is the first of a three-part introduction to C programming. This is targeted at people that want to follow along with keikaku projects, but have no experience in C. This is important since C99 will be the main language for the foundation portion of keikaku.


In this part, we'll do a crash course in C where you'll learn to make projects while having a sense of what's happening under the hood.

Let's get the ball rolling.

Installing a C Compiler


The all-important step 1 is to install the gcc compiler so we can compile the programs we'll be seeing. This is trivial if you run Linux, but if you run Windows there are several options. The best one is just to install Minimalist GNU for Windows, aka MinGW. Get the setup file here, which by default will install to C:\MinGW. This installs mingw-get, which lets you select components to install. Check mingw-developer-tools, mingw32-base, and mingw32-gcc-g++, then hit "apply changes" from the window's drop-down menu. When that finishes, append C:\MinGW\bin to your PATH environment variable so you can conveniently use gcc from the Windows command line.

Who is This "Executable"

To understand C, we need to understand what programs are.

Code that can run directly on your machine is called an executable (duh, right?). It's just a set of instructions. This is what it looks like in memory when you run it. Just try to remember the names for now.


The parts are as such.

text:
    The "text" portion of an executable is the instructions. It's the architecture's assembly instructions.
data:
    This is where initialized data lives, like initialized global variables.
bss:
    Contains uninitialized global variables. That is, it is known that these variables will exist and will be global throughout the lifetime of the program, but they won't be given a meaningful value until the program executes (i.e. at run-time). Until then they simply start out as zero.

Stack and heap are for variables created at run-time (often called dynamic memory). They are not part of the executable.

To deal with the variables created at run-time, we separate all our dynamic memory into a stack and a heap portion. The stack holds short-lived, temporary variables, whereas the heap is what we use for storing longer-term data. We grab memory from the heap when we call malloc() or calloc().






The text, data, and bss parts are non-dynamic because they compose the executable itself. Everything about them is known when you compile your C program (often called compile-time), but the dynamic part is only known at run-time.

Let's look at a simple C program as an example.

//ex1.c
//"standard io" library lets us use printf
#include <stdio.h>

int main(){
    int i = 420;
    char *my_string = "this program just prints this string dude";
    printf("%s\n", my_string);  //"printf = print-format", %s means "put a string here"
    return 0;
}

Put this in a file called ex1.c, and use this command to compile it.

    gcc ex1.c -o ex1.exe

The "-o" specifies the name of the output executable. Run the program to see its output.

Remember the three parts of an executable? We can actually look at those with a command.

size ex1.exe
The first three columns are the sizes of text, data, and bss. The 4th and 5th columns are their sum in decimal and hex notation.
Furthermore, using the -A option gives us "SysV format", which lets us see the individually named sections of the executable.
size -A ex1.exe


Finally, let's take a look at the assembly in our main() with the following
objdump -M intel -D ex1.exe | grep main.: -A16
 

The format for intel-style assembly is operation <destination>, <source> if you wanted to stare at that screen, but don't force yourself to worry about it; we just want the gist of this level.

That wraps up what C gets compiled into, now let's go into the language.

Bonus: To get a sense of how big a full assembly program is, see all the compiler-generated output by running
objdump -M intel -D ex1.exe
and then realize that this is a small executable file.

C Language and Features


Functions

C is a language of functions. The entry point of a C program, main(), is a function like any other; your compiler just sets it as the entry point by default.

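//ex2.c
#include <stdio.h>

//prints one command-line argument along with its position
void printstr(char* in, int i){
    printf("argument %d :  %s\n", i, in);
}

int main(int argc, char** argv){
    int i=0;
    while (i<argc){
        printstr(argv[i], i);
        i++;
    }
    //sizeof doesn't give back an int, so cast it to something printf can print
    printf("%lu\n", (unsigned long)sizeof(char));
    return 0;
}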

The environment variables and command-line input are passed as arguments to the program, and live right above where the entire stack starts. "But what's a stack?" you ask? It's a data structure with an add and a remove operation, where the last thing you added is the first thing to be removed. The function stack works the same way as the data structure.

When we call a function, we move a register (which you can think of as a variable) called the frame pointer to the beginning of the new frame.

When we call printstr(), we add a new frame on the stack, aka a stack-frame. We first push our arguments onto the stack, then, before jumping to the function's code, we put the return address (i.e. where we are) onto the stack. Then we set the frame pointer and jump to the function's code. Local variables are stored relative to the frame pointer, and our stack pointer now points to the new bottom of the stack. This way, we always store our return address for when we finish computation.

Here's an illustrated example.

FP=Frame Pointer, SP=Stack Pointer

This works for any number of function-calls-inside-other-function-calls, as long as our stack has space for it, otherwise we get a, uh, well, you know.

Let's look at dealing with functions in multiple files. Take the following two blocks and put them into ex3-main.c and ex3-func.c, respectively. Then run gcc ex3-main.c ex3-func.c -o ex3.exe to get a runnable program.

//ex3-main.c
#include <stdio.h>

int space_function(int in);

int main(){
    printf("Calling external function\n");
    space_function(2);
    return 0;
}

//ex3-func.c
#include <stdio.h>

int space_function(int in){
    while (in>0){
        printf("SPACE\n"); in--;
    }
    return 420;
}

Why does this work? When you compile the program, each function name becomes a symbol. A declaration (the part that isn't the body) tells the compiler the symbol exists, and the definition (the body) provides it. You can declare a symbol in one file and define it in another, and a program called a linker will try to resolve missing symbols during the linking phase of the build. Defining a symbol multiple times will cause the linker to fail.

Types

Assembly doesn't have types (for the most part), just memory and operations on memory. The operations themselves expect their input to be of a certain type, but act more like maps between input and output. C has literally four basic types.

Types: char, int, float, double
Size:    1,     4,    4,      8  bytes

These are almost always the type sizes, but they can change on different platforms due to register size or compiler implementation. The standard itself only guarantees an int to be at least 16 bits.

Furthermore, there are three main modifiers

short: can be applied to int. "short int" guarantees at least 16 bits.
long: can be applied to int or double. "long int" guarantees at least 32 bits, "long long int" guarantees at least 64 bits. "long double" is at least as precise as double (with gcc on x86 it's usually 80-bit extended precision).
unsigned: Removes the negative range of your variable, but doubles the positive range. Only supported by integral types.

Anyway, there are many different annoying ways to combine these modifiers, but it's usually clear what they mean.


Language Constructs and Scope

Look, all basic constructs for language structure and scope are the same as other languages you've used, but for completeness I'll include them here.
#include <stdio.h>


int main(){
    int i=0, j=2;
    {  //these braces define a new lexical scope, so this i hides the outer i
        int i=3;
        printf("i=%d j=%d\n", i, j);  //prints i=3 j=2
    }
    return 0;
}

Let's add modifiers and use a larger range by making it unsigned.
#include <stdio.h>
#include <stdlib.h>  //malloc, free
#include <limits.h>  //ULLONG_MAX
//this is a preprocessor directive, we'll cover that in a bit
//but this one just replaces that name with the text 10 in your code
#define ARRAY_NEEDS_COMPILE_TIME_SIZE 10
int main(){
    int oh_boy_an_array[ARRAY_NEEDS_COMPILE_TIME_SIZE];
    int *i_am_also_an_array = malloc(10*sizeof(int));  //defines an array but at run-time via a pointer
    oh_boy_an_array[0] = 0;  //just so the array gets used
    for (long long unsigned int i=ULLONG_MAX-2; i<ULLONG_MAX; i++){
        printf("what value is this? %llu\n", i);
    }
    free(i_am_also_an_array);  //heap memory should be handed back when we're done
    return 0;
}
Finally, pointers and arrays are almost exactly the same. You can use array-style indexing for a pointer, and it'll figure out the memory offset using its type. Similarly, an array name will just give you the address of the memory it's pointing to.
#include <stdio.h>
#include <stdlib.h>  //malloc, free
int main(){
    int a[1000];
    int *b = malloc(1000*sizeof(int));
    if (b == NULL){  //malloc can fail
        return 1;
    }
    for (int i=0; i<1000; i++){
        a[i] = i;
        b[i] = i;
    }
    printf("Value at 489 - a:%d, b:%d\n", a[489], b[489]);  //will print the same thing
    free(b);
    return 0;
}

You'll learn more constructs through later examples, but you should mainly think of the language as a slick wrapper for assembly.

Managing Multiple Files


Writing out a function declaration every time you want to import something is quite exhausting and prone to error. This is why we include headers, but what does that actually do? Well, it has to do with the stages of compiling a C program, which are as follows.

$$\text{1) Pre-processing} \;\Rightarrow\; \text{2) Compilation} \;\Rightarrow\; \text{3) Assembly} \;\Rightarrow\; \text{4) Linking}$$

The pre-processor expands the macros and directives in your code, compilation turns your code into assembly instructions, assembly turns those into machine instructions, and the linker connects the result to the system's libraries.

Any line starting with # is a directive to the pre-processor, which modifies the source file before compiling it. This process doesn't do any complex analysis of types or C code; it literally just adds and replaces text in your code.

The #include <target> directive, in particular, takes the target file and literally pastes its contents in its place. You can include another C file, or any text file for that matter. However, by placing common function declarations in header files we can include those functions without redefining them, whereas if we included the .c file, we'd have multiple definitions of the same symbols!

Let's look at a two-file example of this.

//ex4-main.c
#include <stdio.h> //the angle brackets tell the compiler to look in "include paths" for the headers
#include "ex4.h"  //quotes tell the compiler to look in the local directory for the header first


int main(){
    printf("Calling external function\n");
    space_function(2);
    return 0;
}

//ex4.h
//headers are for interfaces, so you can just read the functions you need

//function prints space to the degree of its input
int space_function(long unsigned int in);

//ex4-implementation.c
#include <stdio.h>
#include "ex4.h"

//supports a lot of space
int space_function(long unsigned int in){
    while (in>0){
        printf("SPACE\n"); in--;
    }
    return 420;
}

To build this, you only need to pass the .c files to the compiler, and it'll find the headers they include if they're in the search path. Compile all these files with gcc ex4*c -o ex4.exe, and the shell will substitute in all files that start with "ex4" and end with "c".

Finally, let's look at how global variables can be shared between files with the "extern" keyword, or hidden with the "static" keyword. The "extern" keyword, as applied to variables, tells that compilation unit (i.e. file) to look for the definition of the variable elsewhere. This code is also starting to look more like real code, which almost always relies on the non-standard interface of some library.

//ex5-main.c
#include <stdio.h>
#include "ex5-header.h"

extern char* ENTRANCE_STATEMENT;

int main(){
    //welcome the user
    printf("%s", ENTRANCE_STATEMENT);

    instantiate_library();
    draw_disc();
    set_disc_size(17,17); //high-resolution disc
    draw_disc();

    return 0;
}

The header file will usually look something like this, if you don't just read the library docs directly.

//--------------------------------------------------
// ex5-header.h
// Disclib - The library that draws discs
// Author : Sergey Ivanov
// Copyright vapor-waves c.1997
//--------------------------------------------------

//instantiates library so you can use its functions
void instantiate_library();
//draws disc 
void draw_disc();
//resizes disc proportions for next draw
void set_disc_size(int width, int height);

//ex5-globals.c
int    DISC_HEIGHT    = 5;
int    DISC_WIDTH     = 5;
char * ENTRANCE_STATEMENT = "welcome to the world of discs\n";

//ex5-library.c
//--------------------------------------------------
// Disclib - The library that draws discs
// Author : Sergey Ivanov
// Copyright vapor-waves c.1997
//--------------------------------------------------

#include "ex5-header.h"
#include <stdio.h>
#include <math.h>

static char* ENTRANCE_STATEMENT = "welcome... to discworld\n"; //restrict scope to this file

extern int DISC_HEIGHT;
extern int DISC_WIDTH;

//helper functions
//static limits their visibility to this file during compilation, in case an extern function with the same name is declared
static float axis_x(float x);
static float axis_y(float y);

//proprietary functionality -- DO NOT LOOK
void instantiate_library(){
    printf("%s", ENTRANCE_STATEMENT);
}

//using this function adds a couple hundred to the base license cost
void set_disc_size(int width, int height){
    DISC_HEIGHT = height;
    DISC_WIDTH  = width;
}

//draws a fucking disc
void draw_disc(){
    float x, y;
    for (x=0; x< DISC_WIDTH; x++){
        for (y=0; y < DISC_HEIGHT; y++){
            double x_n = axis_x(x);
            double y_n = axis_y(y);
            double len=sqrt(x_n*x_n + y_n*y_n);
            if (len > 1){
                putchar('#');
            } else {
                putchar(' ');
            }

        }
        putchar('\n');
    }

    putchar('\n');
}

float axis_x(float x){
    float midpoint = DISC_WIDTH/2;
    float result = (x-midpoint) / midpoint;
    return result;
}

float axis_y(float y){
    float midpoint = DISC_HEIGHT/2;
    float result = (y-midpoint) / midpoint;
    return result;
}

Build all the c files with gcc ex5*c -lm
The -lm option links the math library, though this is actually included by default with the mingw compiler.

This is a good example of how to hide functionality, since the end-user never needs to look at the implementation of the library. In practice, they often won't be able to anyway, because they'll have to include a pre-compiled object-file that implements the desired functionality. The linker will do the job of finding the symbols in the object-file that match the header's interface, the same as if its source were included in the compilation step. That gets more into building libraries than programs.

The discs of our labor

There are quite a few topics that are left untouched, but this post should have given you more than enough to make useful programs in C. We've covered assembly formats, basic C syntax, run-time vs compile-time memory, and project management with the preprocessor. With the bottom-up approach, you now have a solid foundation for understanding any new C info thrown at you.

Exercises for the Reader:

1)    Expand example 2 to capitalize the user's input
2)    Improve the code from (1) by putting all functions and variables into a separate file(s)
3)    Tell me the difference between a disc and a disk

all sources can be found on this github page
