Monday, January 16, 2017

Keikaku Prelude Part 2: C Debugging Crash Course

This is the second of a three-part introduction to C programming. If you haven't read part one, check it out here. It is targeted at people who want to follow along with keikaku projects but have no experience in C. This matters because C99 will be the main language for the foundation portion of keikaku.


In this episode of our C-saga, we'll cover the basics of debugging a program with gdb. If you haven't installed gdb on your computer and you run Windows, follow the MinGW instructions from part one here. Otherwise, if you're running Linux, you know how to install a package.

Basic Errors


What is a program failure? It's a tricky problem to define, because programs are just instructions that run. If you had a program that incremented an integer forever, it would run just fine until the integer overflows, and even then wrapping around or raising an error could each be considered acceptable behavior.

The point is that the concept of a program error only makes sense in the context of an operating system or platform. Since our platform is gcc on some x86 architecture, our list of error types becomes fairly easy to understand.

Here are the core causes of almost all run-time errors when programming in C:
  • reading/writing memory your process doesn't own
  • accessing a device or file you don't have permission to
  • giving garbage values to system calls (your own functions may or may not care)
  • running assembly instructions with garbage input, e.g. dividing by zero. This varies by architecture
  • generally, trying to do something you don't have permissions to

What actually happens in a lot of these cases is that the CPU raises an exception (a kind of interrupt) that the operating system has to deal with. It does this for things like dereferencing a null pointer. The operating system then sends a signal to the running instance of your program (this is called a process), which then has to handle it or die. You can actually structure a C program around handling these signals rather than having to sanitize input.

Of course, most errors won't be from code trying to do any of these low-level things. Those are all topics to be covered later, anyway. When your program crashes, the reported errors are all symptoms of an initial programmatic error. What happens most of the time is that your code messes up a bit due to human error, some garbage value gets passed into some function or other, that function can't handle it, and your operating system tells you to get your act together.

Anyway, let's start with the most basic of errors for a newbie to C.

Errors Involving the Usage of gcc

All the sample code will be in the same prelude C repository, in part2.

The most basic errors are missing function definitions and missing libraries. This means your compiler couldn't resolve those symbols we talked about in part 1.

Here's the most basic example of forgetting a function definition.
//example for causing missing function definitions
#include <stdio.h>

//declared here, but defined nowhere
int this_function_was_never_defined();

int main(){
    printf("you won't see this until you fix this");
    this_function_was_never_defined();
}



And here's forgetting to link a library. This code uses the ncurses library, but the compiler command doesn't link it.

#include <ncurses.h>

//declaration only = promising this function will be defined somewhere
int draw_the_rest(); 

//basic ncurses program
int main(){
    initscr(); //initializes the screen
    draw_the_rest();
}





Resolving these means installing the appropriate library, and if you think you already installed it, you'll have to check your search path for headers and libraries. This stuff is how the compiler finds all your favorite <stdio.h>'s and their implementations.

We check our library search path with ld --verbose | grep SEARCH_DIR


and our include search path with echo | gcc -E -Wp,-v -


The compiler will also detect trivial errors related to types, but since C is very flexible about types, it'll rarely be a problem.

Once you resolve compiler errors, you'll encounter code errors.

A segfault is a "segmentation fault", which means attempting to access a memory segment the process does not own.

Here's an example of a program that causes a segmentation fault very close to 100% of the time.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(){
    srand(time(NULL));  //NULL is basically a fancy alias for 0
    int * uninitialized_pointer = rand();
    printf("%s", uninitialized_pointer);
    return 0;
}







The operating system keeps track of the memory we've explicitly allocated via malloc(), calloc(), and the memory we've implicitly allocated on the stack via functions. In almost all cases, we can't write to anything else.

In the real world we won't have these obvious examples, so let's debug a program with a non-obvious problem. Here's a C program that uses the curses API to make balls bounce around the screen, but it has one key issue: the balls eventually fall off the screen!


#include <ncurses.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

#define DELAY 30000
#define NUMBALLS 10
#define MAX_X_VEL 1.2
#define MAX_Y_VEL 1.2

//----UTILITY FUNCTIONS----
//returns within [0,1]
float rfloat(){
    return ((float)rand() / (float)RAND_MAX);
} //C doesn't have a built-in random float func

int roundfl(float in){
    return (in + 0.5);
}

//----MAIN SECTION STARTS HERE----
//
//our central type
typedef struct{
    float x_pos, y_pos;
    float x_vel, y_vel;
} ball_t;

//global object pool
ball_t* ball_pool[NUMBALLS] = {0};

//used to initialize
ball_t* spawn_random_ball(int max_x, int max_y){
    ball_t* raw_ball = malloc(sizeof(ball_t));

    raw_ball->x_pos = rfloat() * max_x;
    raw_ball->y_pos = rfloat() * max_y;
    raw_ball->x_vel = rfloat() * MAX_X_VEL;
    raw_ball->y_vel = rfloat() * MAX_Y_VEL;

    return raw_ball;
}

//----LOOP LOGIC----
void draw_balls(){
    int i, x_round=0, y_round=0;
    for (i=0; i<NUMBALLS;i++){
        x_round = roundfl(ball_pool[i]->x_pos);
        y_round = roundfl(ball_pool[i]->y_pos);
        mvprintw(y_round, x_round, "o");
    }
}

void check_collisions(float max_x, float max_y){
    //will I hit the wall?
    int i;
    for (i=0; i<NUMBALLS;i++){
        float next_x=0, next_y=0;
        next_x = ball_pool[i]->x_pos + ball_pool[i]->x_vel;
        next_y = ball_pool[i]->y_pos + ball_pool[i]->y_vel;

        //collide with the wall and add the remaining distance,
        //then move it a bit back for the update() step
        if (next_x < 0){
            ball_pool[i]->x_vel *= -1;
            ball_pool[i]->x_pos = next_x*-1.0 - ball_pool[i]->x_vel;
        }
        else if (next_x > max_x){
            ball_pool[i]->x_vel *= -1;
            ball_pool[i]->x_pos = max_x - (next_x - max_x) - ball_pool[i]->x_vel;
        }
        if (next_y < 0){
            ball_pool[i]->y_vel *= -1;
            ball_pool[i]->y_pos = next_y*-1.0 - ball_pool[i]->y_vel;
        }
        else if (next_x > max_x){
            ball_pool[i]->y_vel *= -1;
            ball_pool[i]->y_pos = max_y - (next_y - max_y) - ball_pool[i]->y_vel;
        }
    }
}

void step(){
    int i;
    for (i=0; i<NUMBALLS; i++){
        ball_pool[i]->x_pos += ball_pool[i]->x_vel;
        ball_pool[i]->y_pos += ball_pool[i]->y_vel;
    }
}

//----MAIN PROGRAM----
int main(int argc, char *argv[]) {
    //init random seed
    srand(time(NULL));

    //ncurses related stuff
    int max_y=0, max_x=0;
    initscr(); //initialize screen
    noecho();  //don't echo input
    curs_set(FALSE); //don't display cursor
    //get rows and columns, init global "standard screen" var
    getmaxyx(stdscr, max_y, max_x); 

    //create balls
    int i;
    for (i=0; i<NUMBALLS; i++){
        ball_pool[i] = spawn_random_ball(max_x,max_y);
    }
    //game loop
    while(1) {
        clear();
        draw_balls();
        refresh();
        usleep(DELAY);
        check_collisions(max_x, max_y);
        step();
    }

    endwin();
}


To compile this you need the ncurses development library. If you're on Linux, just install the latest ncurses dev library and include -lncurses when compiling. On Windows, you can use a public domain version called pdcurses. Download the pdc34dllw.zip file from the latest branch. The w at the end denotes the Windows version. Then put the files in MinGW's install folder as follows: .h files in the "include" folder, .lib files in the "lib" folder, and pdcurses.dll in the "bin" folder. Then add -lpdcurses to the end of your compilation command. This way gcc knows to link against its libraries.

Debugging Errors with GDB

In order to debug this code properly, we need to compile it with debugging enabled. We do this by adding the -g flag to our compilation command. This includes a lot of meta-information in the executable, such as a link to the sources, which gdb takes advantage of to make debugging much easier.

Go ahead, run gcc -g big_hunt_example.c -lncurses (or -lpdcurses on Windows), and then start it with gdb by running gdb a.out.

So, a thing you'd want to do is to stop the program at some point and print out the states of variables. We can do this. Suspend a process in gdb with ctrl+z. gdb nicely handles the signal by suspending the child process.

You can start and restart the program at any time by typing run, then ctrl+z to suspend execution.

Suspending an ncurses app might look weird, but it's fine.

Alright, we're in. But wait, where are we?

bt = backtrace
The oldest trick in gdb's book is the stack trace. This relates to the program stack that we learned about in part one. We type backtrace (aliased by bt) to list the functions that are currently on the stack, which also tells us what function we're currently in. We can select any frame and examine it by typing frame n, where n is that frame number. However, we won't be doing that right now.

l = list

You can list the code you're in by typing list, which is also aliased by just typing l. It also accepts a function or line number, so you can look code up directly and then decide whether you want to set a breakpoint there.



"Wait, breakpoint?", you ask. Breakpoints are just places in the code where the debugger will stop execution and let you examine local variables. That's right, let's reset our program by typing run again. This time let's break at main() with break main.

Since it's not detecting the wall, let's examine the collision detection function:

break check_collisions

Hit run again and...




Oh no, that's called every time we draw a frame! Well, while we're here we can check out the local variables with info locals. You can check their values using print var_name. We only care about this function when one of our balls has gone out of bounds, though. Instead of having to keep stepping inside check_collisions for 10*(velocity of the fastest dude) number of times, we can set a conditional breakpoint.

break check_collisions if ball_pool[i]->y_pos > 120
run 

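gdb evaluates the condition every time execution reaches the breakpoint and only stops when it is true. Note that at the entry of check_collisions the loop variable i hasn't been set yet, so if the condition misbehaves, put the breakpoint on a line inside the loop instead, with break line_number if condition. From here we can step through and see where our y bound isn't being checked. Give it 5 minutes of reading the code in that section, and you should figure out the issue.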

While we're here, let's use print to evaluate arbitrary expressions, including assignments:

print max_x
print max_x=2
print max_x   // yep, we messed with a variable



At this point it's pretty obvious what the problem in our code is, so let's save a checkpoint so we don't have to rerun our program every time with the same breakpoint to get here. Just run checkpoint, and afterwards gdb will tell you the checkpoint number so you can go back to it by typing restart checkpoint_number.



Ah, that's the line! We just copied it from the x position check but forgot to change it. How silly!

That's about it for this tutorial. Here are some features you should be aware of so you can look them up later:
  • reverse-continue: continue debugging but run in reverse
  • gdb program core: analyses the core dump file created by program
  • define: define your own gdb commands, useful to have some in your .gdbinit
  • disassemble f: see the assembly of function f
  • x <address> (short for examine): lets you inspect memory directly. Has a lot of options for interpreting memory as different types, and even instructions!

It's a lot to take in at once, and proficiency with gdb is built more with muscle memory than with reading. However, it should all make sense within the context of C programming and its innards. Hopefully this will give you enough knowledge to debug your code effectively and efficiently. And by your code, I really mean mine, tyvm, gg no re
