User:Zesterer/Bare Bones

From OSDev Wiki

Difficulty level
Beginner


This is an introductory tutorial for kernel development on the x86 architecture. It is a minimal example only, and does not show you how to structure a serious project. However, completing this project is a good first step for aspiring kernel developers.

By the end of this tutorial, you will have the following:

A simple kernel written in C and x86 Assembly capable of displaying text on the screen
An ISO disk image containing your kernel that can be run from an emulator or on real hardware.
A rudimentary understanding of the x86 and x86 assembly.

What This Is Not

A make-your-own operating system tutorial. Developing a complete operating system requires years of work and, in most cases, decades of experience.
A one-size-fits-all introduction. There are many languages, architectures and approaches to operating system development. However, this tends to be a common route taken by beginners.
A proper and full kernel implementation. Real operating system kernels are vastly more complicated than the code shown here. This setup makes many assumptions about the system and ignores many things that will become important as you develop your kernel further.

Requirements

Before proceeding, make sure you have the basics ready.

Required Knowledge

It is important to make sure that you have enough basic knowledge. Kernel development has an extremely steep learning curve.

A sense of realism. You are not going to be making the next Linux or Windows any time soon. In fact, it'll take a lot of hard work to even get your system displaying text and responding to user input.
A basic understanding of the C programming language. You should be comfortable with functions, pointers, casting and flow control.
A basic understanding of computer architecture. You should know the difference between ROM and RAM, you should know what CPU registers are, and what machine code & executables are.
A basic understanding of computer logic and mathematics. You should understand hexadecimal & binary, as well as bitwise operators and the purpose of a stack.
Some basic command-line skills. You should understand how to run simple commands, navigate your filesystem and manipulate files.

If after reading this you feel like you have a few holes in your knowledge, I highly recommend reading Beginner Mistakes and related OS theory first.

Required Tools

You will also require some software tools to develop this project.

A UNIX-like operating system that supports operating system development well, such as Linux, or a UNIX-like environment on Windows (MinGW or Cygwin).
A text editor of some sort, preferably with syntax highlighting (it will make your life easier). I recommend the cross-platform Atom editor.
A copy of QEMU for the i386 architecture. This is only necessary if you want to test your kernel without real hardware.

Some Background Knowledge

First, a little background information.

We will be developing our kernel for the x86 architecture. The x86 is a family of computer architectures first introduced in the 1970s. Most modern PCs remain backwards-compatible with it, so we should have no problem running our finished kernel on real hardware.

The x86 is a CISC (Complex Instruction Set Computer) architecture. This means that it has a large instruction set, in which a single instruction can often perform several low-level operations at once.

We will be compiling our kernel with our current operating system, with the intention of it running free-standing (i.e: independent of any other OS). For this reason, we will be using a cross-compiler to create our final executable. Cross-compilation is the first stage of writing a new operating system. After a huge amount of work, it is possible to make an operating system 'self-hosting'. This means that it is capable of compiling itself rather than relying on an existing OS. However, that is a long way down the road. For the foreseeable future, you will be cross-compiling your operating system.

We'll be writing the kernel in x86 assembly and the C programming language. When the x86 first loads up our kernel, it won't yet be in a fit state to run C code. This is why we must use assembly to first set up a basic C environment. Once this is done, we can write (most) of the rest of the kernel in C.

The x86 is a complex architecture with various different CPU states and modes. To avoid having to deal with them right now, we'll be using the GRUB bootloader to load our kernel into memory and set up a stable 32-bit 'protected-mode' environment.

To test our kernel, we'll be running it in QEMU. QEMU is an emulator that will allow us to test our kernel without rebooting real hardware to test every change we make.

Building A Cross-Compiler

Main article: GCC Cross-Compiler, Why do I need a Cross-Compiler?

The first thing we need to do is to build ourselves a cross-compiler toolchain to let us compile and link code for the x86. Until we have that, we can't do anything. The C compiler that ships with your system won't do: it is configured to produce programs for your current operating system, not the freestanding machine code we need for this project.

You can find instructions for building a GCC cross-compiler for the i686-elf target here: GCC Cross-Compiler

Don't attempt the rest of the tutorial without getting yourself the correct compiler. I can assure you: it won't work.

The Code

So we know the theory, and we have our tools ready. What now? Code? Well... before putting finger to keyboard, it's a good idea for us to take a step back and think about what we have to work with.

Freestanding - What Does This Mean?

I've already mentioned that we're running our code 'freestanding'. But what does this mean, and how will it affect us? It's important you know. Normally when you write C code in a hosted environment, you have a plethora of interfaces available to you. You can read from files, you can output messages, you can get user input... All with just a few lines of code. Sadly, we don't have that. Those things are provided by an operating system. And right now, we don't have one, since we ARE the operating system.

What we DO have access to however is a few useful headers GCC automatically provides us with (they give us things like fixed-width integers) and the hardware of the x86. We'd like to output text, so for that we're going to write ourselves a very simple driver that interacts with the x86's VGA buffer and allows us to display text on the screen. In doing this, we'll have to avoid using anything like the C standard library, because it simply isn't available when we're compiling for a freestanding target like we are now.
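To make this concrete, here is a minimal sketch of the kind of helper we end up writing ourselves. It relies only on <stddef.h>, one of the headers GCC still provides in freestanding mode. The name kstrlen is a hypothetical one chosen for illustration and is not used elsewhere in this tutorial.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the tutorial's kernel.c): a stand-in for
 * strlen(), since <string.h> is not available on a freestanding target. */
size_t kstrlen(const char* str)
{
	size_t len = 0;

	// Count characters until we reach the null terminator
	while (str[len] != '\0')
		len++;

	return len;
}
```

Calling kstrlen("Hello") would give 5, just as strlen would in a hosted program.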

So without further ado... Let's start programming!

The Project Structure

Please remember, this is a minimal setup. A more advanced kernel project will have a more complex code structure, as well as an automated build system.

For now, we'll be creating 3 files in our project directory. They are:

start.s     - This file will contain our x86 assembly code that starts our kernel and sets up the x86
kernel.c    - This file will contain the majority of our kernel, written in C
linker.ld   - This file will tell the linker how it should construct our kernel executable by linking the previous files together

start.s

First, we create start.s. To best understand the code, I recommend typing it out by hand so that you can understand each part in detail.

// We declare the 'kernel_main' label as being external to this file.
// That's because it's the name of the main C function in 'kernel.c'.
.extern kernel_main
// We declare the 'start' label as global (accessible from outside this file), since the linker will need to know where it is.
// In a bit, we'll actually take a look at the code that defines this label.
.global start
// Our bootloader, GRUB, needs to know some basic information about our kernel before it can boot it.
// We give GRUB this information using a standard known as 'Multiboot'.
// To define a valid 'Multiboot header' that will be recognised by GRUB, we need to hard code some
// constants into the executable. The following code calculates those constants.
.set MB_MAGIC, 0x1BADB002 // This is a 'magic' constant that GRUB will use to detect our kernel's location.
.set MB_FLAGS, (1 << 0) | (1 << 1) // This tells GRUB to 1: load modules on page boundaries and 2: provide a memory map (this is useful later in development)
// Finally, we calculate a checksum that includes all the previous values
.set MB_CHECKSUM, (0 - (MB_MAGIC + MB_FLAGS))
// We now start the section of the executable that will contain our Multiboot header
.section .multiboot
.align 4 // Make sure the following data is aligned on a multiple of 4 bytes
// Emit the previously calculated constants as data in the header (this is not executable code)
.long MB_MAGIC
.long MB_FLAGS
// Use the checksum we calculated earlier
.long MB_CHECKSUM
// This section contains data initialised to zeroes when the kernel is loaded
.section .bss
// Our C code will need a stack to run. Here, we allocate 4096 bytes (or 4 Kilobytes) for our stack.
// We can expand this later if we want a larger stack. For now, it will be perfectly adequate.
.align 16
stack_bottom:
.skip 4096 // Reserve a 4096-byte (4K) stack
stack_top:
// This section contains our actual assembly code to be run when our kernel loads
.section .text
// Here is the 'start' label we mentioned before. This is the first code that gets run in our kernel.
start:
// First things first: we want to set up an environment that's ready to run C code.
// C is very relaxed in its requirements: All we need to do is to set up the stack.
// Please note that on x86, the stack grows DOWNWARD. This is why we start at the top.
mov $stack_top, %esp // Set the stack pointer to the top of the stack
// Now we have a C-worthy (haha!) environment ready to run the rest of our kernel.
// At this point, we can call our main C function.
call kernel_main
// If, by some mysterious circumstances, the kernel's C code ever returns, all we want to do is to hang the CPU
hang:
cli // Disable CPU interrupts
hlt // Halt the CPU
jmp hang // If that didn't work, loop around and try again.
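
The checksum arithmetic in the Multiboot header can be easier to follow in C. The sketch below mirrors the MB_MAGIC, MB_FLAGS and MB_CHECKSUM calculation from start.s and checks the property that Multiboot relies on: the three fields must add up to zero in 32-bit unsigned arithmetic. The function names are hypothetical, and this is code you could run on your host machine rather than part of the kernel.

```c
#include <stdint.h>

// Same values as in start.s
#define MB_MAGIC 0x1BADB002u
#define MB_FLAGS ((1u << 0) | (1u << 1))

// Hypothetical helper: the same formula as MB_CHECKSUM, with 32-bit wrap-around
uint32_t mb_checksum(uint32_t magic, uint32_t flags)
{
	return (uint32_t)(0u - (magic + flags));
}

// Hypothetical helper: returns 1 if the three header fields sum to zero (mod 2^32)
int mb_header_valid(uint32_t magic, uint32_t flags, uint32_t checksum)
{
	return (uint32_t)(magic + flags + checksum) == 0;
}
```

With the values above, mb_checksum(MB_MAGIC, MB_FLAGS) works out to 0xE4524FFB, and mb_header_valid accepts the resulting header.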

kernel.c

'kernel.c' contains our main kernel code. Specifically, it contains code for displaying text on the screen using the VGA textmode buffer.

// GCC provides these header files automatically
// They give us access to useful things like fixed-width types
#include <stddef.h>
#include <stdint.h>
 
// First, let's do some basic checks to make sure we are using our x86-elf cross-compiler correctly
#if defined(__linux__)
	#error "This code must be compiled with a cross-compiler"
#elif !defined(__i386__)
	#error "This code must be compiled with an x86-elf compiler"
#endif
 
// This is the x86's VGA textmode buffer. To display text, we write data to this memory location
volatile uint16_t* vga_buffer = (uint16_t*)0xB8000;
// By default, the VGA textmode buffer has a size of 80x25 characters
const int VGA_COLS = 80;
const int VGA_ROWS = 25;
 
// We start displaying text in the top-left of the screen (column = 0, row = 0)
int term_col = 0;
int term_row = 0;
uint8_t term_color = 0x0F; // Black background, White foreground
 
// This function initiates the terminal by clearing it
void term_init()
{
	// Clear the textmode buffer
	for (int col = 0; col < VGA_COLS; col ++)
	{
		for (int row = 0; row < VGA_ROWS; row ++)
		{
			// The VGA textmode buffer has size (VGA_COLS * VGA_ROWS).
			// Given this, we find an index into the buffer for our character
			const size_t index = (VGA_COLS * row) + col;
			// Entries in the VGA buffer take the binary form BBBBFFFFCCCCCCCC, where:
			// - B is the background color
			// - F is the foreground color
			// - C is the ASCII character
			vga_buffer[index] = ((uint16_t)term_color << 8) | ' '; // Set the character to blank (a space character)
		}
	}
}
 
// This function places a single character onto the screen
void term_putc(char c)
{
	// Remember - we don't want to display ALL characters!
	switch (c)
	{
	case '\n': // Newline characters should return the column to 0, and increment the row
		{
			term_col = 0;
			term_row ++;
			break;
		}
 
	default: // Normal characters just get displayed and then increment the column
		{
			const size_t index = (VGA_COLS * term_row) + term_col; // Like before, calculate the buffer index
			vga_buffer[index] = ((uint16_t)term_color << 8) | (uint8_t)c; // Cast so that a negative char cannot overwrite the color bits
			term_col ++;
			break;
		}
	}
 
	// What happens if we get past the last column? We need to reset the column to 0, and increment the row to get to a new line
	if (term_col >= VGA_COLS)
	{
		term_col = 0;
		term_row ++;
	}
 
	// What happens if we get past the last row? We need to reset both column and row to 0 in order to loop back to the top of the screen
	if (term_row >= VGA_ROWS)
	{
		term_col = 0;
		term_row = 0;
	}
}
 
// This function prints an entire string onto the screen
void term_print(const char* str)
{
	for (size_t i = 0; str[i] != '\0'; i ++) // Keep placing characters until we hit the null-terminating character ('\0')
		term_putc(str[i]);
}
 
 
 
// This is our kernel's main function
void kernel_main()
{
	// We're here! Let's initiate the terminal and display a message to show we got here.
 
	// Initiate terminal
	term_init();
 
	// Display some messages
	term_print("Hello, World!\n");
	term_print("Welcome to the kernel.\n");
}
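
If the bit-packing in the code above feels opaque, the following sketch isolates the two calculations so they can be tried out on your host machine. The helper names vga_entry and vga_index are hypothetical and do not appear in kernel.c; the sketch assumes the same 80-column layout and BBBBFFFFCCCCCCCC entry format described in the comments above.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: pack a color byte and a character into one VGA entry
uint16_t vga_entry(char c, uint8_t color)
{
	// The color occupies the high byte, the character the low byte
	return (uint16_t)(((uint16_t)color << 8) | (uint8_t)c);
}

// Hypothetical helper: find the buffer index of a (column, row) position,
// assuming an 80-column screen as in kernel.c
size_t vga_index(size_t col, size_t row)
{
	return (80 * row) + col;
}
```

For example, a white-on-black 'A' (color 0x0F, ASCII 0x41) is stored as 0x0F41, and the first character of the second row lives at index 80.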

linker.ld

This file is a link script. It is used to define how bits of the final kernel executable will be stitched together. It allows us to specify alignment, address offset, and other such useful properties of parts of the code that we need when our kernel gets linked together into an executable. You don't normally do this when you're writing C code - the compiler will do this for you. But this is an operating system kernel, so we need to explicitly define a lot of these things.

/* The bootloader will start execution at the symbol designated as the entry point. In this case, that's 'start' (defined in start.s) */
ENTRY(start)
 
/* Tell the linker part of the compiler where the various sections of the kernel will be put in the final kernel executable. */
SECTIONS
{
	/* Begin putting sections at 1 Megabyte (1M), a good place for kernels to be loaded at by the bootloader. */
	/* This is because memory below 1 Megabyte is reserved for other x86-related things, so we can't use it */
	. = 1M;
 
	/* We align all sections in the executable at multiples of 4 Kilobytes (4K). This will become useful later in development when we add paging */
 
	/* First put the multiboot header, as it's required to be near the start of the executable otherwise the bootloader won't find it */
	/* The Multiboot header is Read-Only data, so we can put it in a '.rodata' section. */
	.rodata BLOCK(4K) : ALIGN(4K)
	{
		*(.multiboot)
	}
 
	/* Executable code */
	.text BLOCK(4K) : ALIGN(4K)
	{
		*(.text)
	}
 
	/* Read-only data. */
	.rodata BLOCK(4K) : ALIGN(4K)
	{
		*(.rodata)
	}
 
	/* Read-write data (initialized) */
	.data BLOCK(4K) : ALIGN(4K)
	{
		*(.data)
	}
 
	/* Read-write data (uninitialized) and stack */
	.bss BLOCK(4K) : ALIGN(4K)
	{
		*(COMMON)
		*(.bss)
	}
}
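
To get a feel for what 4K alignment means, here is a rough C sketch of the rounding involved: an address is moved up to the next multiple of the alignment unless it already sits on one. The function align_up is a hypothetical illustration for the host machine, not something the linker script or kernel uses, and it assumes the alignment is a power of two.

```c
#include <stdint.h>

// Hypothetical helper: round 'addr' up to the next multiple of 'align'.
// Assumes 'align' is a power of two (such as 4096).
uint32_t align_up(uint32_t addr, uint32_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

An address of 0x100000 (1M) is already aligned and stays put, whereas 0x100001 would be moved up to 0x101000.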

Compiling And Linking

Now comes the magic. We'll be compiling the code we've just written into object files (if you don't know what these are, you can think of them as flat-pack shelves from IKEA. They are blobs of compiled code packaged along with instructions explaining how they should be linked into a larger executable).

Once we've compiled the object files, we'll be using our linker (part of our cross-compiler toolchain) to link the object files together into the final kernel executable!

Compiling

To compile our code, we'll need to run the following commands:

i686-elf-gcc -std=gnu99 -ffreestanding -g -c start.s -o start.o
i686-elf-gcc -std=gnu99 -ffreestanding -g -c kernel.c -o kernel.o

This will create two object files named start.o and kernel.o ready for linking.

There are several parts to the above commands:

-std=gnu99 tells the compiler to adhere to the GNU dialect of the C99 standard. This gives us all of the abilities of C99, plus a bunch of useful extra things that the GNU developers added in for us.
-ffreestanding tells the compiler to generate free-standing code (i.e: does not rely on an existing operating system to run).
-g tells the compiler to add debugging symbols to the compiled code. As the kernel grows, it'll be increasingly useful to have a good way of debugging problems. It's best to start early.
-c tells the compiler to generate just object files rather than compiled and linked executables.

Linking

Finally, to link the objects together into the final executable, we'll run the following command:

i686-elf-gcc -ffreestanding -nostdlib -g -T linker.ld start.o kernel.o -o mykernel.elf -lgcc

This command also has several unexplained parts:

-nostdlib is used to specify that we aren't linking against a C standard library. This should be obvious, since we're running freestanding.
-T <link-script> is used to specify our linker script, 'linker.ld'.
-lgcc tells the linker to link against libgcc, which is the built-in support library that GCC provides for us to deal with simple code that GCC can implicitly generate calls to (i.e: moving and manipulating data, low-level maths operations, etc.)
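
As an illustration of why libgcc matters, consider the small function below. The name ticks_to_seconds is hypothetical and not part of this tutorial's kernel. When compiling for 32-bit x86, GCC typically cannot perform a 64-bit division with a single instruction, so it emits a call to a helper routine (such as __udivdi3) that lives in libgcc. Without -lgcc, linking code like this would likely fail with an undefined reference.

```c
#include <stdint.h>

// Hypothetical example: a 64-bit division. On a 32-bit target, GCC usually
// implements the '/' below as a call into libgcc rather than inline code.
uint64_t ticks_to_seconds(uint64_t ticks, uint64_t ticks_per_second)
{
	return ticks / ticks_per_second;
}
```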

Running The Kernel With QEMU

After all that, the linker command should have produced a file named mykernel.elf. This is our kernel image. Congratulations! I would tell you to give yourself a pat on the back, but it's wise to wait until we've seen it actually working before doing that.

Do you remember how I said that we'd need to use the GRUB bootloader? We will need to for real hardware. But luckily for us, QEMU has the ability to read Multiboot kernels built-in, so we don't need to go through the hassle of attaching our kernel to GRUB.

To run your kernel with QEMU, you can use the following command (replace mykernel.elf with the name of your kernel file)

qemu-system-i386 -kernel mykernel.elf

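If everything went to plan, a QEMU window should open showing the two messages printed by kernel_main, "Hello, World!" and "Welcome to the kernel.", on the first two lines of an otherwise blank screen.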

If this isn't what you see, then I recommend going back through the code and checking you typed everything up correctly, ran the correct commands, and that you have a working cross-compiler. Please only ask for help on the forums or via IRC if you've double-checked your method.

Running The Kernel With GRUB And Real Hardware

To attach your kernel to GRUB and test it on real hardware, you'll have to first have GRUB installed. We'll be using GRUB2, so make sure you have the latest version on your system.

We'll need to start by creating a new directory tree for our GRUB ISO build. I call it 'isoroot', but you can choose whatever name you want. Inside that directory, you'll need to create a directory called 'boot' and a directory below that called 'grub'. Create a GRUB configuration file called grub.cfg in 'isoroot/boot/grub'. You'll want to put the following in it, and then adjust the contents according to your directory structure and kernel name.

grub.cfg

menuentry "My Kernel" {
	multiboot /boot/mykernel.elf
	boot
}

Once you have done this, copy your final kernel .elf executable that we linked previously into the 'isoroot/boot' directory.

Your final file structure should look roughly like this:

├── isoroot
│   └── boot
│       ├── grub
│       │   └── grub.cfg
│       └── mykernel.elf

This last command will take your GRUB directory, read it, and produce a final ISO at the end. Make sure you replace 'isoroot' with whatever your GRUB directory name is.

grub-mkrescue isoroot -o mykernel.iso

Congratulations! If all went to plan, you've just produced your first ever working x86 kernel ISO disk image. This disk image will happily run on emulators like VMware, QEMU, VirtualBox and Microsoft Virtual PC. If you want to run it on real hardware, simply write it to a USB stick or burn it to a disc like you would a Linux distribution LiveCD. For a good tutorial on how to do this, take a look at [Ubuntu: Burning ISO File Tutorial].

Note: The x86 VGA textmode device we rely on in this tutorial is dependent on the existence of the BIOS. This ISO can only be booted properly from a UEFI system if it is in a BIOS compatibility mode (often referred to as 'legacy mode').

What Now?

At this point, you may be wondering what else we can do. I'm sure you have grand visions of a working filesystem, a GUI (Graphical User Interface), or even your system running some of your favourite games. But now is not the time for grand visions: You would end up disappointed anyway. The path to writing a complete operating system is a long and confusing one that involves years of dedicated work. Keep your goals realistic, but feel free to play around with the code and test out new ideas.

With that in mind, here is a list of things you can probably achieve within at most a few days with a little work.

Add Color Support To Your Terminal

Color will improve the look of your kernel a lot. You'll need to find a sensible way of defining available colors in code (The VGA textmode buffer provides support for 16 colors in both the background and foreground of your text) and then changing them in the code. Color support will also be useful for displaying colorful error messages as your kernel grows in size (trust me: you'll spend a lot of time debugging).
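
One possible starting point is sketched below: an enum naming the 16 standard VGA textmode colors, plus a helper that combines a foreground and background into the single color byte that term_color already uses. The names vga_color and vga_make_color are hypothetical, and this is only one of many sensible ways to structure it.

```c
#include <stdint.h>

// The 16 colors of the standard VGA textmode palette
enum vga_color
{
	VGA_BLACK = 0, VGA_BLUE = 1, VGA_GREEN = 2, VGA_CYAN = 3,
	VGA_RED = 4, VGA_MAGENTA = 5, VGA_BROWN = 6, VGA_LIGHT_GREY = 7,
	VGA_DARK_GREY = 8, VGA_LIGHT_BLUE = 9, VGA_LIGHT_GREEN = 10, VGA_LIGHT_CYAN = 11,
	VGA_LIGHT_RED = 12, VGA_LIGHT_MAGENTA = 13, VGA_LIGHT_BROWN = 14, VGA_WHITE = 15,
};

// Hypothetical helper: pack foreground and background into one color byte.
// The background goes in the high 4 bits, the foreground in the low 4 bits.
uint8_t vga_make_color(enum vga_color fg, enum vga_color bg)
{
	return (uint8_t)(((uint8_t)bg << 4) | ((uint8_t)fg & 0x0F));
}
```

With this, the existing default of 0x0F is vga_make_color(VGA_WHITE, VGA_BLACK). Be aware that on some configurations the top bit of the background is treated as a 'blink' flag instead of a brightness bit, so the lighter background colors may not behave as expected.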

Add ANSI Support To Your Terminal

ANSI (American National Standards Institute) has a standardised system of escape codes that modern computer terminals should understand. These escape codes allow a computer program to manipulate terminal color, text effects, clear the screen and move the text cursor around. You can find more details about ANSI here: [ANSI Escape Codes - Wikipedia].
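
A full escape code parser is beyond the scope of this page, but one small piece of it can be sketched here: translating the ANSI foreground color codes 30 to 37 into VGA color indices. The two schemes order their colors differently (ANSI uses black, red, green, yellow, blue, magenta, cyan, white), so a lookup table is needed. The function name ansi_fg_to_vga is hypothetical.

```c
#include <stdint.h>

// Hypothetical helper: convert an ANSI foreground color code (30-37) into
// the matching VGA color index. Returns -1 for codes outside that range.
int ansi_fg_to_vga(int sgr_code)
{
	// Indexed by (code - 30): black, red, green, yellow, blue, magenta, cyan, white
	static const uint8_t table[8] = { 0, 4, 2, 6, 1, 5, 3, 7 };

	if (sgr_code < 30 || sgr_code > 37)
		return -1;

	return table[sgr_code - 30];
}
```

ANSI yellow is mapped to VGA color 6 here, which many displays render as brown; that is a design choice you may want to revisit.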

Add Scrolling To Your Terminal

Scrolling is a common feature in CLIs (Command-Line Interfaces). When text runs off the bottom of the screen, we can scroll the rest of the text upwards to fit a new line below rather than simply loop back to the top of the screen as we do now. To achieve this, you'll need to shift all the characters in the VGA buffer upwards by one row.
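
The shifting step might look something like the sketch below. The function term_scroll is hypothetical and takes the buffer and its dimensions as parameters so that it can be tried out on an ordinary array first; in the kernel you would pass vga_buffer, VGA_COLS and VGA_ROWS, and a blank entry built from term_color and a space.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: move every row up by one and blank the last row
void term_scroll(volatile uint16_t* buffer, size_t cols, size_t rows, uint16_t blank)
{
	// Copy each row (starting from the second) into the row above it
	for (size_t row = 1; row < rows; row ++)
	{
		for (size_t col = 0; col < cols; col ++)
			buffer[((row - 1) * cols) + col] = buffer[(row * cols) + col];
	}

	// Fill the now-free bottom row with blank entries
	for (size_t col = 0; col < cols; col ++)
		buffer[((rows - 1) * cols) + col] = blank;
}
```

After scrolling, term_putc would need to set term_row to the last row instead of wrapping back to 0.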

Create A Build System For Your Kernel

Main article: Meaty Skeleton

At the moment, we compile and link our kernel by manually running several commands. As your kernel grows, this will no longer be a realistic way of doing things. You'll need to design and implement a build system that can build your kernel for you. Common kernel build systems use 'make', 'CMake', 'autoconf', 'SCons' or 'Tup'.

Call Global Constructors

Main article: Calling Global Constructors

In C and particularly C++, the compiler will often generate code that is meant to be run when the program starts. Currently, we don't have any means to run such code. As our kernel grows, we'll need to add support for this.
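
To see what kind of code this refers to, here is a small sketch using a GCC extension, the 'constructor' function attribute. The names early_setup and ctor_run_count are hypothetical. In a normal hosted program, the C runtime calls early_setup automatically before main begins. In our kernel as it stands, nothing would call it, so the counter would stay at 0 until we add the support described in the main article.

```c
// Counts how many times the constructor below has run
static int ctor_runs = 0;

// GCC extension: ask for this function to be run at program start-up.
// Something still has to actually invoke it; a hosted C runtime does, our kernel doesn't yet.
__attribute__((constructor))
static void early_setup(void)
{
	ctor_runs ++;
}

// Hypothetical accessor so other code can check whether the constructor ran
int ctor_run_count(void)
{
	return ctor_runs;
}
```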

Going Further On x86

Main article: Going Further on x86

There is a whole world of interesting things to explore through operating system development. To improve our kernel further, we will need to better understand the x86 and what hardware we have available.