Mixing Assembly and C
The purpose of this guide is to give the reader an understanding of how to begin using C and Assembly to create an operating system for an x86 computer. This guide is only designed to get you started with mixing C and Assembly to produce flat binary files. Before reading through this guide you should already have an understanding of C or C++ and also be familiar with assembly language. If you're not familiar with these languages or just need to brush up, check out Related Sites for some sites that can help.

WARNING: If you are using any Linux or possibly other forms of Unix, please read the "Additional Information" section at the end of this page before trying any of the sample code.

The code in this guide requires:
So you want to write an Operating System? "Good luck" was the sarcastic reply I received once after telling someone the exact same thing. It took me a long time just to find enough information to figure out where to even start. But I believe I've found the best starting point for most people. Of course, this begins with what the heck you're going to use to write the thing. It turns out that the best tools (in my opinion) for OS writing are freely downloadable from the web. For C/C++ we'll be using GCC or the DOS port called DJGPP, and for assembly we'll be using NASM.

One of the major pains of writing an OS is that you can't just write a program and have it be an OS. To even load an OS you typically have to write a bootsector followed by a bootstrap loader that sets up your protected mode environment and then loads your kernel. On top of this, even if you normally program with the compiler and assembler that we are using, there are special considerations you have to make when writing an OS. To make your life easier I'm going to be providing a couple of small utilities so that you only have to learn one step at a time.

The first utility you're going to be using is a kernel loading program that works under DOS. Since DOS is a real mode OS, it's pretty easy to use it to read a kernel file from disk and then have the loading program start up protected mode. You can download the loader here. The download contains two files: the first is the loader and the second is a sample kernel to make sure you have the loader working with your system.

The loader needs to be run from DOS with no memory managers in memory. You can make sure that no memory managers are loaded by either booting from a DOS/WIN9x boot disk with no config.sys and autoexec.bat, or by pressing F8 during boot on WIN9x and choosing "Safe mode command prompt only" from your boot menu. If you don't have a DOS boot disk or a WIN9x system, you can download a free DOS clone called FreeDOS.
To load the test kernel just start DOS, change to the directory that the kernel32.bin file is in and run the loader. The message "Hi" should appear in the bottom right corner of your screen and you should be returned to a DOS prompt. Hopefully everything worked OK on your system and we can move right along to writing your own kernel.

GCC, which stands for GNU Compiler Collection, is the C/C++ compiler that comes with almost every type of Unix. If you've been doing any programming under any Unix, I'm going to assume you at least know how to use GCC for normal purposes. Luckily for all the people running WIN9x and NT there is also a port of GCC for download at www.delorie.com/djgpp. For the people that will be trying DJGPP for the first time, you should know that DJGPP is a command line compiler. This means that you write your code in any program that can produce text files and then compile from a DOS box.

Once you have your compiler working there are a few things you need to know before we continue. First is that you can't include any header files that you didn't write yourself. Most header files have dependencies on the OS for which they were written. Take printf and cout, for example: when a program that uses either one is run, an OS service is called to display text. Because we are writing the OS, the only portions of C/C++ that we can use are what we will call the core language. The core language includes only the reserved keywords and expressions that are available when no header files are included. If you're wondering how the "Hi" was displayed with the sample kernel, that is where assembly language comes into play.

Since we can't really display anything until the assembly portion of this guide, for now we're going to make a C kernel that does nothing. As easy as that sounds, it's a pain in the rear to figure out for the first time.
First create a text file that contains the following code:

    int main(void)
    {
    repeat:
        goto repeat;
    }

Save as "kernel32.c". From the same location as the file enter the command

    gcc -ffreestanding -c -o kernel32.o kernel32.c

If you received no errors then enter the command

    ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o

You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000".

The gcc command compiles our C source into an object file. The ld command links the object file into a binary file that gets loaded at 0x100000 (the 1 MB mark, which is where the loader copies the kernel). If you're not used to running ld you may think it's a little extra work, but it actually runs any time you create an executable with GCC or DJGPP: the compiler calls ld for you when an executable is created with only a gcc command. The reason we used both commands instead of just gcc is that we're going to be linking multiple object files together and you need to learn the correct command syntax.

Here is a list of what each of the parameters we'll be using means for gcc:

    -ffreestanding   compile for a freestanding environment, where the standard library may be absent and the program does not have to start at main
    -c               compile and assemble the source, but do not link
    -o <file>        write the output to <file>

And here is a list of what each of the parameters we'll be using means for ld:

    -Ttext 0x100000    set the starting address of the .text (code) section to 0x100000
    --oformat binary   write a raw (flat) binary file instead of an executable with headers
    -o <file>          write the output to <file>

For complete descriptions of all command options you can look at the online manuals for gcc and ld.

Rename the sample kernel and copy the new kernel32.bin over to the same location as the old kernel32.bin. Try loading your new kernel just like the old one. When you run the loader your system should hang and never return to a prompt. If you used a floppy, the drive will stay spinning. Since the new kernel is running in an endless loop and ignoring everything else (interrupts are disabled), this is exactly what is supposed to happen.

NASM, which stands for Netwide Assembler, is "a free portable assembler for the Intel 80x86 microprocessor". You can download NASM at kernel.org; the NASM website seems to be dead right now. If you're running a type of Unix, look in the package collection of your distribution for a copy of NASM. NASM, just like DJGPP, works via the command line. This means that you write your code in any program that can produce text files and then assemble from a prompt. If you downloaded the Windows version you may want to rename "nasmw.exe" to "nasm.exe".

The first thing we'll be doing is to make an assembly code kernel that does the same thing as our C kernel: nothing. Create a text file that contains the following code:

    [BITS 32]
    repeat:
        jmp repeat

Save as "kernel32.asm". From the same location as the file enter the command

    nasm -f coff -o kernel32.o kernel32.asm
If you received no errors then enter the command

    ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o

You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000". Copy the new kernel32.bin over to the same location as the old kernel32.bin. Try loading your new kernel just like the old one. When you run the loader your system should hang just like it did with the C kernel.

Testing our kernels will be a lot easier if we can just return to DOS after running them. After all, having to hit reset after testing each kernel gets old very fast. To accomplish this you need to know a little about how the loader works. The loader reads "kernel32.bin" into memory and places it at the first megabyte of memory. Then the loader sets up all selectors to access the first four megabytes of memory and executes a far call to the first instruction at 0x100000. So to return to the loader from the kernel, all we have to do is execute a far return. The loader then re-enables interrupts, frees any memory it used, and returns to DOS.

Create a text file that contains the following code:

    [BITS 32]
    retf

Save as "kernel32.asm". From the same location as the file enter the command

    nasm -f coff -o kernel32.o kernel32.asm
If you received no errors then enter the command

    ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o

You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000". Try loading your new kernel just like the old one. When you run the loader you should be returned to a DOS prompt. Be sure not to mess up the stack in your kernels, otherwise the far return won't work and anything could happen.

Here is a list of what each of the parameters we'll be using means for nasm:

    -f coff     produce a COFF object file, the object format used by DJGPP
    -o <file>   write the output to <file>

To display a "Hi" message just like the sample kernel, make a kernel that contains the following code:

    [BITS 32]
    mov byte [es:0xb8f9c],'H'
    mov byte [es:0xb8f9e],'i'
    retf

Since es points to a selector whose base address is zero and the color text area starts at 0xb8000, the letters H and i are displayed in the last two character cells of a standard 80x25 text display. We'll discuss display adapters in a video article (hopefully); for now all you need to know is that to write a character to the display you just copy its ASCII value to 0xb8000 to get it to show up in the upper left corner. To display a character in any other location, just add 2 to 0xb8000 for every place to the right. Text wraps down to the start of the next row when you reach the end of a row. For example, the last two cells of the 25th row are at 0xb8000 + (24*80 + 78)*2 = 0xb8f9c and 0xb8000 + (24*80 + 79)*2 = 0xb8f9e, which are the addresses used above.

Mixing C and Assembly

(Most of the following text is taken directly from the nasm docs)

External Symbol Names

Most 32-bit C compilers share the convention used by 16-bit compilers, that the names of all global symbols (functions or data) they define are formed by prefixing an underscore to the name as it appears in the C program. However, not all of them do: the ELF specification states that C symbols do not have a leading underscore on their assembly-language names.

Function Definitions and Function Calls

The C calling convention in 32-bit programs is as follows. In the following description, the words caller and callee are used to denote the function doing the calling and the function which gets called.
Thus, you would define a function in C style in the following way:

    global _myfunc

    _myfunc:
            push ebp
            mov ebp,esp
            sub esp,0x40        ; 64 bytes of local stack space
            mov ebx,[ebp+8]     ; first parameter to function
            ; some more code
            leave               ; mov esp,ebp / pop ebp
            ret

At the other end of the process, to call a C function from your assembly code, you would do something like this:

    extern _printf

            ; and then, further down...
            push dword [myint]      ; one of my integer variables
            push dword mystring     ; pointer into my data segment
            call _printf
            add esp,byte 8          ; `byte' saves space

            ; then those data items...
    segment _DATA

    myint       dd 1234
    mystring    db 'This number -> %d <- should be 1234',10,0

This piece of code is the assembly equivalent of the C code
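
    int myint = 1234;
    printf("This number -> %d <- should be 1234\n", myint);

Accessing Data Items

To get at the contents of C variables, or to declare variables which C can access, you need only declare the names as GLOBAL or EXTERN. (Again, the names require leading underscores, as described under External Symbol Names.) Thus, a C variable declared as int i can be accessed from assembler as

    extern _i
    mov eax,[_i]

And to declare your own integer variable which C programs can access as extern int j, you do this:

    global _j
    _j dd 0

To access a C array, you need to know the size of the components of the array. For example, int variables are four bytes long, so if a C program declares an array as int a[10], you can access a[3] by coding mov eax,[_a+12]. (The byte offset 12 is obtained by multiplying the desired array index, 3, by the size of the array element, 4.)

To access a C data structure, you need to know the offset from the base of the structure to the field you are interested in. You can either do this by converting the C structure definition into a NASM structure definition (using STRUC), or by calculating the one offset and using just that.

To do either of these, you should read your C compiler's manual to find out how it organises data structures. NASM gives no special alignment to structure members in its own STRUC macro, so you have to specify alignment yourself if the C compiler generates it. Typically, you might find that a structure like

    struct {
        char c;
        int i;
    } foo;

might be eight bytes long rather than five, since the int field would be aligned to a four-byte boundary. However, this sort of feature is sometimes a configurable option in the C compiler, so you have to find out how your own compiler does it.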
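Helper Macros for the 32-bit C Interface

If you find the underscores inconvenient, you can define macros to replace the GLOBAL and EXTERN directives as follows:

    %macro cglobal 1
        global _%1
        %define %1 _%1
    %endmacro

    %macro cextern 1
        extern _%1
        %define %1 _%1
    %endmacro

(These forms of the macros only take one argument at a time; a %rep construct could solve this.)

If you then declare an external like this:

    cextern printf

then the macro will expand it as

    extern _printf
    %define printf _printf

Thereafter, you can reference printf as if it was a symbol, and the preprocessor will put the leading underscore on where necessary. The cglobal macro works similarly.

Included in the NASM archives, in the misc directory, is a file c32.mac of macros. It defines three macros: proc, arg and endproc. These are intended to be used for C-style procedure definitions, and they automate a lot of the work involved in keeping track of the calling convention.

An example of an assembly function using the macro set is given here:

    proc _proc32
    %$i arg
    %$j arg
            mov eax,[ebp + %$i]
            mov ebx,[ebp + %$j]
            add eax,[ebx]
    endproc

This defines _proc32 to be a procedure taking two arguments, the first (i) an integer and the second (j) a pointer to an integer. It returns i + *j.

Note that the arg macro has an EQU as the first line of its expansion, and since the label before the macro call gets prepended to the first line of the expanded macro, the EQU works, defining %$i to be an offset from EBP. A context-local variable is used, local to the context pushed by the proc macro and popped by the endproc macro, so that the same argument name can be used in later procedures.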
Our first mixed kernel

Create a text file that contains the following code:

    extern void sayhi(void);
    extern void quit(void);

    int main(void)
    {
        sayhi();
        quit();
    }

Save as "mix_c.c". Create another text file that contains the following code:

    [BITS 32]

    GLOBAL _sayhi
    GLOBAL _quit

    SECTION .text

    _sayhi:
        mov byte [es:0xb8f9c],'H'
        mov byte [es:0xb8f9e],'i'
        ret

    _quit:
        mov esp,ebp     ; unwind main's stack frame (assumes main set up a standard ebp frame)
        pop ebp         ; restore the ebp that main saved
        retf            ; far return to the loader

Save as "mix_asm.asm". From the same location as the files enter the commands

    gcc -ffreestanding -c -o mix_c.o mix_c.c
    nasm -f coff -o mix_asm.o mix_asm.asm
If you received no errors then enter the command

    ld -Ttext 0x100000 --oformat binary -o kernel32.bin mix_c.o mix_asm.o

You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000". Copy the new kernel32.bin over to the same location as the old kernel32.bin. Try loading your new kernel just like the old one. When you run the loader your system should display "Hi" in the bottom right corner of your screen and you should be returned to the prompt.

When linking your object files, your code will appear inside your output file in the order of the input files. Also, when you use constants in your C code, such as myfunc("Hello");, gcc-based compilers will put the constants in the code segment before the beginning of the function in which they are used. Because the loader starts executing at the first byte of the binary, a constant that lands ahead of your first function would be executed as if it were code.
When jumping to or calling C code that was output as a flat binary, you have three options to avoid this problem. You can create a function at the beginning of your C code, without constants, that calls or jumps to the next function.
You can also link another file (assembly or C) before your C code that is just there to call your C code. And your last option is to use the gcc option -fwritable-strings (available in older GCC releases; it was removed in GCC 4.0).

Additional Information

Linux Warning: There is a problem with ld on Linux. The problem is that the "ld" that comes with Linux distros lists support for the coff object format, but apparently you have to rebuild binutils from gnu.org to get it working. I found two possible solutions: recompile ld, or edit your assembly files to remove all the leading underscores and then use the -f aout option instead of coff when you assemble with nasm. I've tested the second method briefly and it works.

The loader in this lesson makes a small GDT with selectors for the first 4 megabytes of memory and puts them in the segment registers before calling the kernel. It also leaves all interrupts disabled while the kernel runs. Don't try to enable interrupts in your kernel with this loader, because a protected mode IDT is never set up. Different lessons will be using different loaders, so don't assume that you don't need to download the loader for whatever lesson you're on. If you want to take a look, the source for the loader is here.

Related Sites

http://directory.google.com/Top/Computers/Programming/Languages/C/Tutorials/
http://directory.google.com/Top/Computers/Programming/Languages/Assembly
Fuzzy Logic DJGPP+NASM Tutorial example code (local mirror)