Setting up a GDT

We’re so close! We’re currently in long mode, but not ‘real’ long mode. We need to go from this ‘compatibility mode’ to honest-to-goodness long mode. To do this, we need to set up a ‘global descriptor table’.

This table, also known as a GDT, is kind of vestigial. The GDT is used for a style of memory handling called ‘segmentation’, which is in contrast to the paging model that we just set up. Even though we’re not using segmentation, however, we’re still required to have a valid GDT. Such is life.

So let’s set up a minimal GDT. Our GDT will have three entries:

  • a ‘zero entry’
  • a ‘code segment’
  • a ‘data segment’

If we were going to be using the GDT for real stuff, it could have a number of code and data segment entries. But we need at least one of each to have a minimum viable table, so let’s get to it!

The Zero entry

The first entry in the GDT is special: it needs to be a zero value. Add this to the bottom of boot.asm:

section .rodata
gdt64:
    dq 0

We have a new section: rodata. This stands for ‘read only data’, and since we’re not going to modify our GDT, having it be read-only is a good idea.

Next, we have a label: gdt64. We’ll use this label later, to tell the hardware where our GDT is located.

Finally, dq 0. This is ‘define quad-word’, in other words, a 64-bit value. Given that it’s a zero entry, it shouldn’t be too surprising that the value of this entry is zero!
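For reference, dq is one of a family of ‘define’ directives in NASM, one per size. Here's a small sketch; the values are arbitrary and only there to show the widths:

```nasm
section .rodata
    db 0x11                  ; 'define byte': 1 byte (8 bits)
    dw 0x2222                ; 'define word': 2 bytes (16 bits)
    dd 0x33333333            ; 'define double-word': 4 bytes (32 bits)
    dq 0x4444444444444444    ; 'define quad-word': 8 bytes (64 bits)
```

Every GDT entry is eight bytes wide, which is why we reach for dq here.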

That’s all there is to it.

Setting up a code segment

Next, we need a code segment. Add this below the dq 0:

.code: equ $ - gdt64
    dq (1<<44) | (1<<47) | (1<<41) | (1<<43) | (1<<53)

Let's talk about the dq line first. If you recall from the last section, 1<<44 means ‘left shift one 44 places’, which sets bit 44. But what about |? This means bitwise or. So, if we or a bunch of these values together, we’ll end up with a value that has bits 44, 47, 41, 43, and 53 set.

Why | and not or, like before? Well, here, we’re not running assembly instructions: we’re defining some data. There’s no instruction for the CPU to execute; the assembler works out the value itself while building the file, and its expression syntax spells or as |.
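To make the difference concrete, here's a sketch of both forms side by side. The register and the bit numbers are arbitrary, chosen only for illustration:

```nasm
    ; An instruction: the CPU performs the or when this code runs.
    mov eax, 0
    or eax, 1 << 3          ; eax now holds 0x8

    ; Data: NASM computes the or while assembling,
    ; so only the finished constant ends up in the file.
    dq (1 << 3) | (1 << 1)  ; same as dq 0xA
```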

Finally, why these bits? Well, as we’ve seen with other table entries, each bit has a meaning. Here’s a summary:

  • 44: ‘descriptor type’: This has to be 1 for code and data segments
  • 47: ‘present’: This is set to 1 if the entry is valid
  • 41: ‘read/write’: If this is a code segment, 1 means that it’s readable
  • 43: ‘executable’: Set to 1 for code segments
  • 53: ‘64-bit’: Set to 1 to mark this as a 64-bit code segment

That’s all we need for a valid code segment!
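If the bare bit numbers are hard to keep straight, one option is to give them names with equ. This is only a sketch: the constant names below are made up for illustration and aren't used anywhere else in our code.

```nasm
; Hypothetical names for the bits described above.
DESCRIPTOR_TYPE equ 1 << 44    ; 1 for code and data segments
PRESENT         equ 1 << 47    ; the entry is valid
READ_WRITE      equ 1 << 41    ; readable, for a code segment
EXECUTABLE      equ 1 << 43    ; this is a code segment
LONG_MODE       equ 1 << 53    ; 64-bit code segment

section .rodata
gdt64:
    dq 0
.code: equ $ - gdt64
    dq DESCRIPTOR_TYPE | PRESENT | READ_WRITE | EXECUTABLE | LONG_MODE
    ; the dq above produces the same eight bytes as
    ; dq 0x00209A0000000000
```

Either spelling gives the same entry; the named version just documents itself.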

Oh, but let's not forget about the other line:

.code: equ $ - gdt64

What's up with this? So, in a bit, we'll need to reference this entry somehow. But we don't reference the entry by its address, we reference it by its offset from the start of the table. If we needed just an address, a plain code: label would do. But we need an offset, so we need more. Also, note the period at the start: it's .code:, not code:. This tells the assembler to scope this label under the last non-local label that appeared, so we'll say gdt64.code rather than just code. Some nice encapsulation.

So that's what's up with the label, but we still have this equ $ - gdt64 bit. $ is the current position. So we're subtracting the address of gdt64 from the current position. Conveniently, that's the offset number we need for later: how far this segment is past the start of the GDT. The equ sets the value of the label; in other words, this line is saying "set the .code label's value to the current address minus the address of gdt64". Got it?
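Here's a sketch of what the assembler works out. The zero entry takes up eight bytes, so by the time NASM reaches the .code line, $ is eight bytes past gdt64. The .code_address label is hypothetical, added only for contrast:

```nasm
section .rodata
gdt64:
    dq 0                    ; bytes 0 through 7 of the table
.code_address:              ; hypothetical plain label: its value is an address
.code: equ $ - gdt64        ; equ label: its value is the number 8
    dq (1<<44) | (1<<47) | (1<<41) | (1<<43) | (1<<53)
```

So gdt64.code_address depends on where the table ends up in memory, while gdt64.code is 8 no matter what.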

Setting up a data segment

Below the code segment, add this for a data segment:

.data: equ $ - gdt64
    dq (1<<44) | (1<<47) | (1<<41)

We need fewer bits set for a data segment. But they’re ones we covered before. The only difference is bit 41; for data segments, a 1 means that it’s writable.

We also use the same trick again with the labels, calculating the offset with equ.
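Working the numbers through, as a sketch: bits 44, 47, and 41 together give the constant 0x0000920000000000, and because two eight-byte entries come before it, gdt64.data ends up as 16. This hard-coded version should be equivalent to the table so far, though the shifted form is easier to read:

```nasm
section .rodata
gdt64:
    dq 0                        ; zero entry, offset 0
.code: equ 8                    ; same value as $ - gdt64 at this point
    dq 0x00209A0000000000       ; bits 41, 43, 44, 47, and 53
.data: equ 16                   ; same value as $ - gdt64 at this point
    dq 0x0000920000000000       ; bits 41, 44, and 47
```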

Putting it all together

Here’s our whole GDT:

section .rodata
gdt64:
    dq 0
.code: equ $ - gdt64
    dq (1<<44) | (1<<47) | (1<<41) | (1<<43) | (1<<53)
.data: equ $ - gdt64
    dq (1<<44) | (1<<47) | (1<<41)

We’re so close! Now, to tell the hardware about our GDT. There’s a special assembly instruction for this: lgdt. But it doesn’t take the GDT itself; it takes a special structure: two bytes for the length, and eight bytes for the address. So we have to set that up.

Below these dqs, add this:

.pointer:
    dw .pointer - gdt64 - 1
    dq gdt64

To calculate the length, we take the value of this new label, pointer, and subtract the value of gdt64, and then subtract one more, because the hardware expects this field to hold the size of the table in bytes minus one. We could calculate this length manually, but this way, if we add another GDT entry for some reason, the length will automatically correct itself, which is nice.

The dq here has the address of our table. Straightforward.
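To see what the automatic calculation produces, here's a sketch of the pointer with the arithmetic done by hand. With three eight-byte entries, .pointer sits 24 bytes past gdt64, so the length field is 24 - 1 = 23:

```nasm
section .rodata
gdt64:
    dq 0
.code: equ $ - gdt64
    dq (1<<44) | (1<<47) | (1<<41) | (1<<43) | (1<<53)
.data: equ $ - gdt64
    dq (1<<44) | (1<<47) | (1<<41)
.pointer:
    dw 23                   ; same value as .pointer - gdt64 - 1
    dq gdt64                ; address of the table
```

A fourth entry would make the hand-written 23 wrong, which is why we let the assembler do the subtraction instead.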

Load the GDT

So! We’re finally ready to tell the hardware about our GDT. Add this line after all of the paging stuff we did in the last chapter:

    lgdt [gdt64.pointer]

We pass lgdt the address of our pointer label, and it reads the length and address stored there. lgdt stands for ‘load global descriptor table’. That’s it!

We have all of the prerequisites done! In the next section, we will complete our transition by jumping to long mode.