Monday, January 23, 2012

How do hardware interrupts work?

Recently, a friend asked me why we need device drivers in kernel mode, since the BIOS already has code to deal with some situations (accessing a disk (int 0x13), dealing with VGA (int 0x10)...). While the question seems simple, it got me thinking a bit.
The first answer, which is quite instinctive, is performance. But what is the technical explanation? I'm not an expert, so what follows should be taken with caution, but this is more or less how I understand it.

On boot, the processor is in real mode. In this mode the interrupt table (strictly speaking the interrupt vector table, or IVT, which is the real-mode counterpart of the IDT and maps interrupt numbers to code addresses, a bit like a hash map) sits in the first kilobyte of memory, from 0x0000 to 0x03ff, and the BIOS fills it with pointers to its own routines. Firmware vendors have to write the code that handles these interrupts, and this enables the BIOS to perform some basic tasks (reading the first 512 bytes of a disk, the MBR, for example).
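To make the "a bit like a hash map" comparison concrete, here is a small Python simulation of a real-mode table lookup. It is only a sketch: the `memory` bytearray stands in for the first kilobyte of RAM, and the handler address F000:E3FE is made up for the example, not taken from any real BIOS.

```python
import struct

IVT_BASE = 0x0000       # the real-mode table starts at address 0
IVT_ENTRY_SIZE = 4      # each entry: 16-bit offset, then 16-bit segment
IVT_ENTRIES = 256       # 256 * 4 = 1024 bytes, i.e. 0x0000 to 0x03ff


def ivt_entry_address(vector):
    """Return the memory address of the table entry for an interrupt number."""
    if not 0 <= vector < IVT_ENTRIES:
        raise ValueError("interrupt vector must be in 0..255")
    return IVT_BASE + vector * IVT_ENTRY_SIZE


def install_handler(memory, vector, segment, offset):
    """Write a segment:offset handler pointer into the simulated table."""
    struct.pack_into("<HH", memory, ivt_entry_address(vector), offset, segment)


def lookup_handler(memory, vector):
    """Read back the (segment, offset) pair the CPU would jump to."""
    offset, segment = struct.unpack_from("<HH", memory, ivt_entry_address(vector))
    return segment, offset


# Simulated first kilobyte of memory (assumption: all entries start empty).
memory = bytearray(IVT_ENTRIES * IVT_ENTRY_SIZE)

# Hypothetical disk-service handler address, chosen only for illustration.
install_handler(memory, 0x13, segment=0xF000, offset=0xE3FE)

segment, offset = lookup_handler(memory, 0x13)
linear = (segment << 4) + offset  # real-mode address translation
print(f"int 0x13 entry lives at {ivt_entry_address(0x13):#06x}")
print(f"handler {segment:04X}:{offset:04X} -> linear {linear:#07x}")
```

The lookup is nothing more than "multiply the interrupt number by 4 and read two 16-bit values", which is why whoever writes to that table decides what code runs on an interrupt.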
So why can't we leverage these interrupts in protected mode? Well, in protected mode the kernel owns the IDT. It builds its own table and puts the table's base address and limit in the processor's IDTR register (with the LIDT instruction on i386; SIDT reads the register back).
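As an illustration of what the kernel hands to the processor, the sketch below packs the 6-byte value loaded into IDTR on i386 (16-bit limit, 32-bit base) and one 8-byte interrupt gate. It only builds the bytes in Python and never touches the real register. The base address, handler address, and selector are hypothetical values picked for the example.

```python
import struct

GATE_SIZE = 8            # one protected-mode IDT entry is 8 bytes on i386
INTERRUPT_GATE_32 = 0x8E  # present, ring 0, 32-bit interrupt gate


def pack_idtr(base, entry_count):
    """Build the 6-byte operand for LIDT: limit (size - 1), then base."""
    limit = entry_count * GATE_SIZE - 1
    return struct.pack("<HI", limit, base)


def pack_interrupt_gate(handler, selector, type_attr=INTERRUPT_GATE_32):
    """Build one IDT entry: the handler address is split into two halves."""
    return struct.pack(
        "<HHBBH",
        handler & 0xFFFF,          # handler offset, bits 0..15
        selector,                  # code segment selector
        0,                         # reserved byte, always zero
        type_attr,                 # gate type and privilege bits
        (handler >> 16) & 0xFFFF,  # handler offset, bits 16..31
    )


def unpack_gate_handler(gate):
    """Recover the full 32-bit handler address from a packed gate."""
    low, _selector, _zero, _type_attr, high = struct.unpack("<HHBBH", gate)
    return (high << 16) | low


# Hypothetical values: a 256-entry table at 0x00106000, a disk interrupt
# handler at 0xC0101234, and a kernel code selector of 0x08.
idtr = pack_idtr(base=0x00106000, entry_count=256)
gate = pack_interrupt_gate(handler=0xC0101234, selector=0x08)

print("IDTR bytes:", idtr.hex())
print("gate bytes:", gate.hex())
print(f"handler decoded from gate: {unpack_gate_handler(gate):#010x}")
```

Compared with the real-mode table, each entry carries a full 32-bit handler address plus privilege information, and the table can live anywhere the kernel chooses.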
Because the kernel owns the IDT, it controls the mapping from interrupts to code. This also gives the kernel a way to optimize and control hardware access. For example, it can buffer accesses to physical devices to reduce the performance bottleneck. It can also implement file system abstractions (VFS) on top of the basic block devices. But why doesn't it switch back to the int 0x13 BIOS interrupt to access the drive?
Well, for two main reasons, I think:
  1. It is simply not practical. Real-mode code uses 16-bit segment:offset addresses and can only reach roughly the first megabyte of memory, so a BIOS routine cannot see buffers that the kernel keeps higher up. Switching the processor back to real mode is technically possible on i386, but the BIOS routines expect the real-mode interrupt table and memory layout, which the kernel has replaced with its own.
  2. Even if 1. were not a problem, imagine the mode and context switches which would be required (they are expensive). When a user-space application wants to write to disk (setting aside buffering mechanisms), it calls fwrite, which in turn triggers a write system call. This causes a switch to kernel mode, the arguments on the user-land stack are validated and copied to the kernel stack, and the kernel handles the system call. If it used BIOS interrupts, it would then have to switch to real mode and issue an int 0x13 interrupt, which would be really slow, and each call could only move a limited amount of data through a buffer in low memory. This would be a huge bottleneck in disk access.
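The addressing limit from the first point can be checked with a few lines of arithmetic. This is a minimal sketch of real-mode address translation (segment times 16 plus offset). The 2 MiB kernel buffer address is an assumption made up for the example, and the code assumes the A20 line is enabled so addresses just above 1 MiB do not wrap around.

```python
# Highest linear address a real-mode segment:offset pair can form (FFFF:FFFF),
# assuming the A20 line is enabled.
REAL_MODE_MAX = (0xFFFF << 4) + 0xFFFF


def real_mode_linear(segment, offset):
    """Translate a real-mode segment:offset pair to a linear address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


def reachable_from_real_mode(linear, length=1):
    """Check whether a whole buffer lies inside the real-mode address range."""
    return linear + length - 1 <= REAL_MODE_MAX


# The BIOS loads the boot sector at 0000:7C00 (equivalently 07C0:0000).
boot_sector = real_mode_linear(0x07C0, 0x0000)

# Hypothetical kernel I/O buffer placed at 2 MiB, well above the limit.
kernel_buffer = 0x00200000

print(f"real-mode ceiling: {REAL_MODE_MAX:#x}")
print("boot sector reachable:", reachable_from_real_mode(boot_sector, 512))
print("kernel buffer reachable:", reachable_from_real_mode(kernel_buffer, 512))
```

Any data the BIOS reads or writes would therefore have to be copied through a buffer below that ceiling before the kernel could use it, on top of the cost of the mode switch itself.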
That's my 2 cents (and I guess naive 2 cents) on why we can't just switch to real mode to deal with disk or VGA access.