Building a custom, third-party OS for Intel’s Xeon Phi (Part 2: The Design)

Last time I gave a brief overview of the history leading up to the Xeon Phi and some of the motivations for its creation. While most of this was based on research and first-hand experience, I’ll admit that some of it—particularly the pieces related to the transition from Larrabee to the Xeon Phi—involved a little speculation. However, having had a good chat at HPDC with someone who was, at the time, in a ranking position at Intel, I can say that those speculations were accurate.

In order to elucidate some of the issues that one may run into when building an OS for the Phi, I think it’s worth spending at least a small amount of time discussing the Phi’s hardware design. That’s what I’ll be doing in this post. If you’re already familiar with the basics and have perused the relevant manuals, then you might want to skip ahead to the next post (when it’s written).

Given my focus on OS issues, I’m not going to go into too many details, especially for the microarchitecture. I’d instead like to focus on the hardware design from an OS developer’s perspective. Fortunately, if you need a primer on the hardware, there is plenty of material out there.

Here’s a good brief overview from Intel and here’s a detailed document on the core microarchitecture. This ebook is also a pretty good resource detailing the architecture from an app developer’s viewpoint.

Platform Hardware

A good deal of the platform hardware that you’re used to dealing with on a typical PC is not included in the Phi. Here are some of the more important differences you should be aware of:

  • no Northbridge/Southbridge
  • no 8259a PIC
  • no 8254 PIT (programmable interval timer)
  • no HPET (high-precision timer)
  • no framebuffer

There is, however, an IOAPIC, programmed through memory-mapped registers. This IOAPIC does not hook up to the standard legacy ISA or PCI interrupts, but rather receives interrupts routed from the SBOX, which is Intel lingo for the card’s PCIe interface. From the perspective of an OS developer, you can think of the SBOX like a Northbridge, in that it’s the Phi’s arbiter to the rest of the system.
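To make the memory-mapped programming model concrete, here is a minimal sketch of IOAPIC register access in C. It assumes the Phi’s IOAPIC keeps the conventional indirect interface of a standard I/O APIC (a register-select word at offset 0x00 and a data window at offset 0x10); I have not confirmed that against the card’s documentation here. The function names are made up for illustration, and the base address, which sits somewhere in the SBOX MMIO range, is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional I/O APIC indirect-access layout (ASSUMED to hold on the Phi). */
#define IOAPIC_IOREGSEL 0x00u /* byte offset of the register-select word */
#define IOAPIC_IOWIN    0x10u /* byte offset of the data window */
#define IOAPIC_REDTBL0  0x10u /* register index of redirection entry 0 (low half) */

static inline volatile uint32_t *ioapic_reg(volatile void *base, uint32_t off)
{
    return (volatile uint32_t *)((volatile uint8_t *)base + off);
}

/* Select a register by index, then write it through the window. */
static inline void ioapic_write(volatile void *base, uint8_t index, uint32_t val)
{
    *ioapic_reg(base, IOAPIC_IOREGSEL) = index;
    *ioapic_reg(base, IOAPIC_IOWIN) = val;
}

/* Select a register by index, then read it through the window. */
static inline uint32_t ioapic_read(volatile void *base, uint8_t index)
{
    *ioapic_reg(base, IOAPIC_IOREGSEL) = index;
    return *ioapic_reg(base, IOAPIC_IOWIN);
}

/* A redirection entry is 64 bits wide, exposed as two 32-bit registers.
 * The high half goes first so the low half (which holds the mask bit on a
 * standard I/O APIC) is the last thing written. */
static inline void ioapic_write_redir(volatile void *base, unsigned pin, uint64_t entry)
{
    assert(IOAPIC_REDTBL0 + 2u * pin + 1u <= 0xffu); /* index must fit in 8 bits */
    uint8_t lo = (uint8_t)(IOAPIC_REDTBL0 + 2u * pin);
    ioapic_write(base, (uint8_t)(lo + 1u), (uint32_t)(entry >> 32));
    ioapic_write(base, lo, (uint32_t)entry);
}
```

The vector and destination you place in each entry depend on how the SBOX routes its interrupt sources, which this sketch deliberately leaves out.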

The lack of support for these devices isn’t really a big deal. You still get an APIC timer, so all is not lost. The interrupt logic is all controlled through the APICs (below) and the IOAPIC. There’s another timer on the card called the Elapsed Time Counter (ETC) which is slow, but accurate across frequency domains.

APIC

Just like any modern AMD or Intel processor, each core on the Xeon Phi has an advanced programmable interrupt controller (APIC). There is very little difference here (it’s even mapped at the usual local APIC base address, 0xfee00000), but in order to accommodate the large core count, Intel has modified some of the hardware registers in the APIC. They’ve done this to expand the APIC ID from 8 bits to 16 bits. The registers you should care about are the APIC ID register, the Logical Destination Register (LDR), and the Interrupt Command Register (ICR).

Figure: APIC ID Register

Figure: Logical Destination Register (LDR)

Figure: High-order 32 bits in the Interrupt Command Register (ICR)

If you’re writing an OS, this most likely means adjusting the masks and shifts in your APIC-specific macros. Because the Phi’s APIC is programmed through memory-mapped registers, start from your xAPIC code, not from the MSR-based x2APIC interface.
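As a rough illustration of what that macro change can look like, the sketch below describes the ID field by a shift and a width instead of hard-coding them. The xAPIC values (an 8-bit field in bits 31:24) are the standard ones. The Phi values are an assumption on my part, a 16-bit field in bits 31:16, so check them against Intel’s register diagrams before using them. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Where an ID/destination field sits inside a 32-bit APIC register. */
struct apic_id_layout {
    unsigned shift; /* bit position of the field's least significant bit */
    unsigned width; /* number of bits in the field */
};

/* Standard xAPIC: 8-bit field in bits 31:24 (APIC ID, LDR, ICR high). */
static const struct apic_id_layout XAPIC_LAYOUT = { 24, 8 };

/* ASSUMPTION: 16-bit field in bits 31:16 on the Phi. Verify before use. */
static const struct apic_id_layout PHI_LAYOUT_ASSUMED = { 16, 16 };

static inline uint32_t apic_id_mask(struct apic_id_layout l)
{
    return (uint32_t)((1ull << l.width) - 1);
}

/* Extract the ID from a raw register value. */
static inline uint32_t apic_get_id(struct apic_id_layout l, uint32_t reg)
{
    return (reg >> l.shift) & apic_id_mask(l);
}

/* Build the value for the high 32 bits of the ICR for a given destination. */
static inline uint32_t apic_icr_hi_dest(struct apic_id_layout l, uint32_t dest)
{
    assert(dest <= apic_id_mask(l)); /* destination must fit in the field */
    return dest << l.shift;
}
```

Keeping the layout in one place means the rest of the IPI and ID-lookup code can stay identical between a regular x86 build and a Phi build.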

Instruction Support

Below is a list of instructions that aren’t supported on the Phi. You can see more detail in the developer’s guide, Section 4.2.

  • monitor/mwait
  • mfence/sfence/lfence
  • cmov
  • in/out
  • sysenter/sysexit
  • pause
  • prefetch
  • VMX related instructions

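Note that the Phi’s vector instructions operate on 512-bit registers. They are the Phi’s own vector extension rather than standard AVX; the card does not support MMX, SSE, or AVX. The maximum supported CPUID leaf on the Phi is leaf 4.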
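Existing kernel code tends to use a few of the missing instructions in its lowest-level primitives, fences and spin loops in particular. The sketch below shows one way to isolate them behind a build flag. `TARGET_XEON_PHI` is a hypothetical flag of my own, and the substitutes are assumptions rather than Intel’s recommendations: a lock-prefixed add is the classic x86 stand-in for a full fence, and a plain compiler barrier replaces `pause`.

```c
#include <assert.h>
#include <stddef.h>

/* TARGET_XEON_PHI is a hypothetical build flag, not defined by any toolchain. */
#if defined(__x86_64__) && defined(TARGET_XEON_PHI)
/* No mfence: use a lock-prefixed read-modify-write as a full barrier. */
static inline void full_fence(void)
{
    __asm__ volatile("lock; addl $0,(%%rsp)" ::: "memory", "cc");
}
/* No pause: a compiler barrier still forces the loop condition to be re-read. */
static inline void cpu_relax(void)
{
    __asm__ volatile("" ::: "memory");
}
#elif defined(__x86_64__)
static inline void full_fence(void)
{
    __asm__ volatile("mfence" ::: "memory");
}
static inline void cpu_relax(void)
{
    __asm__ volatile("pause" ::: "memory");
}
#else
/* Non-x86 fallback so the sketch still builds elsewhere. */
static inline void full_fence(void)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}
static inline void cpu_relax(void)
{
    __asm__ volatile("" ::: "memory");
}
#endif

/* Bounded spin-wait built on the two primitives; returns 1 if the flag was seen set. */
static inline int spin_until_set(volatile int *flag, unsigned max_spins)
{
    assert(flag != NULL);
    for (unsigned i = 0; i < max_spins; i++) {
        if (*flag) {
            full_fence(); /* order later accesses after observing the flag */
            return 1;
        }
        cpu_relax();
    }
    return 0;
}
```

With the primitives wrapped like this, the rest of the kernel never names `mfence` or `pause` directly, so the Phi port only touches this one file.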

OS interface

Probably the biggest part of working with the Phi—from a system software point of view—is reading from and writing to the SBOX MMIO registers. These registers implement a good deal of functionality for the card, functionality that, on a typical chip, you would find implemented in MSRs.

Some of the more important ones are:

  • bootstrap
  • DVFS (both core and memory)
  • the card’s IOAPIC
  • thermal control
  • machine check exceptions
  • card status (fans, temperature, etc.)
  • Elapsed Time Counter (ETC)
  • flash programming
  • various scratch registers for software
  • interrupt logic
  • PCIe control
  • System Memory Page Tables (SMPT) — for accessing host memory from the card
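Since all of the functionality listed above sits behind SBOX MMIO registers, most of the card-specific code ends up going through a small set of accessors. Here is a minimal sketch of what those might look like. The `struct sbox` handle and the function names are hypothetical, the sketch assumes 32-bit registers at 4-byte-aligned offsets, and it deliberately names no real register offsets; take those from Intel’s documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handle for the mapped SBOX register window. */
struct sbox {
    volatile uint8_t *base; /* virtual address where the OS mapped the MMIO range */
    uint32_t size;          /* size of the mapping in bytes */
};

/* An offset is valid if it is 4-byte aligned and the register fits in the window. */
static inline int sbox_off_ok(const struct sbox *sb, uint32_t off)
{
    return (off & 3u) == 0 && sb->size >= 4u && off <= sb->size - 4u;
}

static inline uint32_t sbox_read(const struct sbox *sb, uint32_t off)
{
    assert(sbox_off_ok(sb, off));
    return *(volatile uint32_t *)(sb->base + off);
}

static inline void sbox_write(const struct sbox *sb, uint32_t off, uint32_t val)
{
    assert(sbox_off_ok(sb, off));
    *(volatile uint32_t *)(sb->base + off) = val;
}

/* Read-modify-write: clear the bits in `clear`, then set the bits in `set`. */
static inline void sbox_update(const struct sbox *sb, uint32_t off,
                               uint32_t clear, uint32_t set)
{
    sbox_write(sb, off, (sbox_read(sb, off) & ~clear) | set);
}
```

Routing every access through `volatile` loads and stores keeps the compiler from caching or reordering register accesses, and the bounds check catches a mistyped offset early during bring-up.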

I’ll come back to some of these in later posts, where I will dive deeper into OS implementation for the Phi. Next time, I’ll talk a bit about the existing OSes that run on the Phi and how we used them as a starting point.

