Run Cores

During execution, the runcore is the heart of Parrot. The runcore is responsible for calling the various opcodes with the correct data and for making sure that program flow moves properly. Some runcores, such as the precomputed C goto runcore, are optimized for speed and do little beyond finding and dispatching opcodes. Others, such as the GC debug, debug, and profiling runcores, help with typical software maintenance and analysis tasks. We'll talk about all of these throughout the chapter.

Runcores must pass execution to each opcode in the incoming bytecode stream. This is called dispatching the opcodes. Because the different runcores are structured in different ways, the opcodes themselves must be formatted differently. The opcode compiler compiles opcodes into a number of separate formats, depending on which runcores are included in the compiled Parrot. Because of this, understanding opcodes first requires an understanding of the Parrot runcores.

Parrot has multiple runcores. Some are useful for particular maintenance tasks, some are only available as optimizations with certain compilers, some are intended for general use, and some are just interesting flights of fancy with no practical benefits. Here we list the various runcores, their uses, and their benefits.

Slow Core

The slow core is a basic runcore design that treats each opcode as a separate function at the C level. Each function is called and returns the address of the next opcode to be called by the core. The slow core performs bounds checking to ensure that the next opcode to be called lies within the bytecode and not somewhere random in memory. Because of this modular approach, where opcodes are treated as separate executable entities, many other runcores, especially diagnostic and maintenance cores, are based on this design. The program counter pc is the current index into the bytecode stream. Here is a pseudocode representation of how the slow core works:

  while(1) {
      pc = NEXT_OPCODE;
      if(pc < LOW_BOUND || pc > HIGH_BOUND)
          throw exception;
      DISPATCH_OPCODE(pc);
      UPDATE_INTERPRETER();
  }
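The pseudocode above can be turned into a small runnable C sketch. Every name here (toy_interp, op_add, run_slow_core, and the three-op instruction set) is hypothetical and far simpler than Parrot's real slow core: each op is its own function that returns the index of the next op, and the core checks pc and the opcode number before every dispatch, then updates some interpreter bookkeeping.

```c
#include <stdio.h>

/* Toy types for illustration only; not Parrot's real opcode_t or Interp. */
typedef long opcode_t;
typedef struct {
    long acc;        /* single accumulator register */
    long ops_run;    /* bookkeeping updated after every dispatch */
    int  halted;     /* set by the END op */
} toy_interp;

/* Each op is a separate C function that returns the index of the next op. */
typedef long (*op_func)(const opcode_t *code, long pc, toy_interp *in);

enum { OP_END, OP_ADD, OP_JMP, N_OPS };

static long op_end(const opcode_t *code, long pc, toy_interp *in) {
    (void)code;
    in->halted = 1;
    return pc;
}
static long op_add(const opcode_t *code, long pc, toy_interp *in) {
    in->acc += code[pc + 1];   /* one immediate argument */
    return pc + 2;
}
static long op_jmp(const opcode_t *code, long pc, toy_interp *in) {
    (void)in;
    return pc + code[pc + 1];  /* relative jump */
}

static const op_func op_table[N_OPS] = { op_end, op_add, op_jmp };

/* Returns 0 on normal termination, -1 if execution strays out of bounds. */
static int run_slow_core(const opcode_t *code, long len, toy_interp *in) {
    long pc = 0;
    while (!in->halted) {
        /* Bounds checks before every dispatch: pc and the opcode number. */
        if (pc < 0 || pc >= len || code[pc] < 0 || code[pc] >= N_OPS) {
            fprintf(stderr, "slow core: bad pc %ld\n", pc);
            return -1;
        }
        pc = op_table[code[pc]](code, pc, in);  /* dispatch */
        in->ops_run++;                          /* update interpreter state */
    }
    return 0;
}
```

Only the opcode slot is checked here; an op's immediate argument is read without a check, which a production core would also need to guard.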

Fast Core

The fast core is a bare-bones core that doesn't do any of the bounds checking or context updating that the slow core does. The fast core is the way Parrot should run when the bytecode is known to be correct; the bounds-checked slow core is what you use to find and debug places where execution strays outside its normal bounds. In pseudocode, the fast core is very much like the slow core, except that it doesn't check bounds between instructions and doesn't update the interpreter's current context on each dispatch.

  while(1) {
      pc = NEXT_OPCODE;
      DISPATCH_OPCODE(pc);
  }
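For comparison, here is a minimal fast-core sketch, again with hypothetical names and a toy instruction set rather than Parrot's real ops. The loop simply calls whatever function the current opcode names and trusts the bytecode completely, so a bad opcode number or a missing END op leads to undefined behavior instead of a clean error.

```c
#include <stddef.h>

/* Toy types for illustration; not Parrot's real opcode_t or ops. */
typedef long opcode_t;
typedef struct { long acc; } fast_interp;

/* Each op returns a pointer to the next op, or NULL to stop. */
typedef const opcode_t *(*fast_op)(const opcode_t *pc, fast_interp *in);

enum { F_END, F_ADD, F_MUL };

static const opcode_t *f_end(const opcode_t *pc, fast_interp *in) {
    (void)pc;
    (void)in;
    return NULL;
}
static const opcode_t *f_add(const opcode_t *pc, fast_interp *in) {
    in->acc += pc[1];
    return pc + 2;
}
static const opcode_t *f_mul(const opcode_t *pc, fast_interp *in) {
    in->acc *= pc[1];
    return pc + 2;
}

static const fast_op fast_table[] = { f_end, f_add, f_mul };

/* No bounds checks and no per-op bookkeeping: the bytecode is trusted. */
static void run_fast_core(const opcode_t *pc, fast_interp *in) {
    while (pc)
        pc = fast_table[*pc](pc, in);
}
```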

Switch Core

As its name implies, the switch core uses a gigantic C switch / case structure to execute opcodes. Here's a brief example of how this architecture works:

  for ( ; ; current_opcode++) {
      switch (*current_opcode) {
          case opcode_1:
              ...
              break;
          case opcode_2:
              ...
              break;
          case opcode_3:
              ...
              break;
      }
  }

This is quite a fast architecture for dispatching opcodes because it all happens within a single function. The only operations performed between opcodes are a jump back to the top of the loop, incrementing the opcode pointer, dereferencing it, and a jump to the case for the next opcode.
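A runnable version of the switch structure might look like the sketch below. The opcodes (S_END, S_ADD, S_SUB, S_JNZ) and the run_switch_core function are made up for illustration. Because these toy ops carry inline arguments, each case advances pc by its own width instead of relying on a fixed increment in the loop header.

```c
/* Toy types and opcodes for illustration; not Parrot's real ones. */
typedef long opcode_t;
enum { S_END, S_ADD, S_SUB, S_JNZ };

/* One function, one loop, one big switch. Stores the accumulator in
   *result and returns 0, or returns -1 on an unknown opcode. */
static int run_switch_core(const opcode_t *pc, long *result) {
    long acc = 0;
    for (;;) {
        switch (*pc) {
        case S_END:
            *result = acc;
            return 0;
        case S_ADD:                      /* acc += immediate */
            acc += pc[1];
            pc += 2;
            break;
        case S_SUB:                      /* acc -= immediate */
            acc -= pc[1];
            pc += 2;
            break;
        case S_JNZ:                      /* relative jump if acc != 0 */
            pc += (acc != 0) ? pc[1] : 2;
            break;
        default:
            return -1;
        }
    }
}
```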

Computed Goto Core

Computed goto is a feature of some C compilers in which a label is treated as a piece of data that can be stored in a void * pointer. Each opcode becomes simply a label in a very large function, and pointers to the labels are stored in a large array. Executing an opcode is as easy as using that opcode's number as an index into the label array and jumping to the associated label. Sound complicated? It is a little, especially to C programmers who are not used to using labels, much less treating them as first-class data items.

Note that computed goto is available only in some compilers, such as GCC, so this core will not be present in every build of Parrot; whether it is depends on which compiler was used to build it.

Even so, it's an interesting topic to study, so we will look at it briefly here. In compilers that support it, the address of a label is taken with the unary && operator and has type void *. In the computed goto core every label represents a different opcode, so the label addresses are stored in an array:

  void *my_labels[] = {
      &&label1,
      &&label2,
      &&label3
  };

  label1:
      ...
  label2:
      ...
  label3:
      ...

Jumping to one of these labels is done with a command like this:

  goto *my_labels[opcode_number];

Actually, opcodes are accessed through an opcode_t * pointer, and all opcodes are stored sequentially in memory, so the actual jump in the computed goto core must first advance the pointer and then jump to the label for the opcode it now points to. In C it looks something like this:

  goto *my_labels[*(current_opcode += 1)];

Each opcode number is an index into the array of labels. At the end of each opcode, an instruction like the one above moves execution to the next opcode in sequence. Opcodes that alter control flow instead move the pointer to a non-sequential location:

  goto *my_labels[*(current_opcode = destination)];

These are simplifications of what really happens in this core, because the actual code has been optimized quite a bit from what has been presented here. However, as we shall see with the precomputed goto core, it isn't optimized as aggressively as possible.
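The pieces above fit together into a complete function like this minimal sketch. It relies on the GCC/Clang labels-as-values extension (&&label and goto *expr), so it will not compile with compilers that lack it. The opcode names, the accumulator, and run_cgoto_core are hypothetical.

```c
/* Requires the GCC/Clang "labels as values" extension. */
typedef long opcode_t;
enum { CG_END, CG_ADD, CG_DOUBLE };

/* Runs bytecode starting at pc and returns the accumulator. The bytecode
   is trusted: it must end with CG_END and contain only known opcodes. */
static long run_cgoto_core(const opcode_t *pc) {
    /* One label address per opcode, indexed by opcode number. */
    static void *const labels[] = { &&op_end, &&op_add, &&op_double };
    long acc = 0;

    goto *labels[*pc];               /* dispatch the first op */

op_add:                              /* CG_ADD takes one immediate argument */
    acc += pc[1];
    goto *labels[*(pc += 2)];        /* advance, index the table, jump */
op_double:
    acc *= 2;
    goto *labels[*(pc += 1)];
op_end:
    return acc;
}
```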

Precomputed Goto Core

The precomputed goto core is an amazingly fast, optimized core that uses the same computed goto feature but performs the array lookups before the core even starts. The compiled bytecode is fed into a preprocessor that converts the bytecode instruction numbers into label pointer values. In the computed goto core, you have this operation to move to the next opcode:

  goto *my_labels[*(current_opcode += 1)];

This single line of code is deceptively complex. A number of machine code operations must be performed to complete this step: the value of current_opcode must be incremented, and the new location must be dereferenced to find the opcode number. Because the array my_labels decays to a pointer to its first element, the opcode number is then used as an offset from that pointer to find the stored label address. That address is loaded from memory, and the jump is performed.

That's a lot of steps to execute before we can jump to the next opcode. What if each opcode value were replaced with the address of its jump label beforehand? If current_opcode points directly to a label address, we don't need the extra lookup in the array at all. We can replace that entire mess above with this line:

  goto **(current_opcode += 1);

That's far fewer machine instructions to execute before we can move to the next opcode, which means faster throughput. Remember that the dispatch mechanism runs after every single opcode, and some large programs may execute millions of opcodes! Every machine instruction that can be cut out of the dispatch mechanism can increase the execution speed of Parrot in a significant and noticeable way. The dispatch mechanism is hardly the largest performance bottleneck in Parrot, but we like to use faster cores to shave every little bit of speed out of the system.
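Here is a minimal sketch of the predereferencing idea, assuming GCC or Clang and a hypothetical argument-free instruction set (PG_END, PG_INC, PG_DOUBLE). A preprocessing pass replaces every opcode number with the address of its label, so each dispatch is just goto **(pc += 1) with no table lookup. Parrot's real implementation also handles op arguments and is considerably more involved.

```c
/* Requires the GCC/Clang "labels as values" extension. */
typedef long opcode_t;
enum { PG_END, PG_INC, PG_DOUBLE };

#define PG_MAX 64   /* toy limit on program length */

/* Returns the final accumulator, or -1 if the program is too long or uses
   an unknown opcode. The program must end with PG_END. */
static long run_prederef_core(const opcode_t *code, int len) {
    static void *const labels[] = { &&op_end, &&op_inc, &&op_double };
    void *prog[PG_MAX];
    void **pc = prog;
    long acc = 0;

    if (len < 1 || len > PG_MAX)
        return -1;
    /* Preprocessing pass, done once before execution starts:
       replace every opcode number with the address of its label. */
    for (int i = 0; i < len; i++) {
        if (code[i] < PG_END || code[i] > PG_DOUBLE)
            return -1;
        prog[i] = labels[code[i]];
    }

    goto **pc;                     /* dispatch the first op */
op_inc:
    acc += 1;
    goto **(pc += 1);              /* no table lookup: the slot holds the label */
op_double:
    acc *= 2;
    goto **(pc += 1);
op_end:
    return acc;
}
```

Because &&label can only name labels in the enclosing function, this sketch does the preprocessing pass inside the same function as the dispatch code.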

The caveat, of course, is that the precomputed goto core is only available with compilers that support computed goto, such as GCC. Parrot will not have access to this core if it is built with a different compiler.

Tracing Core

Profiling Core

The profiling core analyzes the performance of Parrot and helps to determine where the bottlenecks and trouble spots are in the programs that run on top of Parrot. When Parrot calls a PIR subroutine, it sets up the environment, allocates storage for the passed parameters and the return values, passes the parameters, and calls a new runcore to execute it. To calculate the amount of time that each subroutine takes, we need to measure the time spent in each runcore, from when the core starts to when it exits. The profiling core does exactly this: it acts very much like the slow core but also measures how long the core takes to complete. The profiling core also keeps track of a few additional values, including the number of GC cycles run while in the subroutine, the number of times each opcode is called, and the number of calls made to each subroutine. All this information is printed to STDERR for later analysis.
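A minimal sketch of the measuring side of a profiling core follows, with hypothetical names (profile_data, run_profile_core) and a toy instruction set. It counts how often each opcode is dispatched, times the whole core invocation with the standard clock() function, and reports to stderr. It does not track GC cycles or subroutine calls, which the profiling core described above also records.

```c
#include <stdio.h>
#include <time.h>

/* Toy types and opcodes for illustration; not Parrot's real ones. */
typedef long opcode_t;
enum { PR_END, PR_ADD, PR_NOP, PR_NUM_OPS };

typedef struct {
    long   op_counts[PR_NUM_OPS];  /* times each opcode was dispatched */
    double seconds;                /* CPU time spent inside the core */
} profile_data;

/* Slow-core-style loop that also counts ops and times the whole run,
   then reports to stderr. Returns the accumulator. */
static long run_profile_core(const opcode_t *code, long len, profile_data *prof) {
    clock_t start = clock();
    long acc = 0, pc = 0;

    while (pc >= 0 && pc < len) {          /* bounds check, as in the slow core */
        opcode_t op = code[pc];
        if (op < 0 || op >= PR_NUM_OPS)
            break;                         /* unknown opcode: stop */
        prof->op_counts[op]++;
        if (op == PR_END)
            break;
        if (op == PR_ADD) {
            if (pc + 1 >= len)
                break;                     /* missing argument */
            acc += code[pc + 1];
            pc += 2;
        } else {
            pc += 1;                       /* PR_NOP */
        }
    }

    prof->seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    fprintf(stderr, "core ran for %.6f s\n", prof->seconds);
    for (int i = 0; i < PR_NUM_OPS; i++)
        fprintf(stderr, "  op %d: %ld dispatches\n", i, prof->op_counts[i]);
    return acc;
}
```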

GC Debug Core

Parrot's garbage collector has been known as a weakness in the system for several years. In fact, the garbage collector and memory management subsystem were among the last systems to be improved and rewritten before the release of version 1.0. It's not that garbage collection wasn't important; it was simply very hard to do well early in the project.

Early on, when the GC was such a weakness, and later, when the GC was under active development, it was useful to have an operational mode that would really exercise the GC and find bugs that could otherwise hide by sheer chance. The GC debug runcore was this tool. The core runs a complete collection before every single opcode. Its throughput is terrible, but that's not the point: it's almost guaranteed to find problems in the memory system if they exist.
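Structurally, the GC debug core is just a dispatch loop with a full collection inserted before every op. In this sketch, full_gc_run is a hypothetical placeholder that only counts how often it is called; in Parrot that call would be a complete collection run. The other names and opcodes are also hypothetical.

```c
/* Toy types for illustration; not Parrot's real interpreter or GC. */
typedef long opcode_t;
enum { GD_END, GD_ADD };

typedef struct {
    long acc;
    long gc_runs;        /* number of full collections performed */
} gcdebug_interp;

/* Hypothetical placeholder for a full collection run; it only counts. */
static void full_gc_run(gcdebug_interp *in) {
    in->gc_runs++;
}

/* Dispatch loop with a full collection before every op. */
static void run_gc_debug_core(const opcode_t *pc, gcdebug_interp *in) {
    for (;;) {
        full_gc_run(in);             /* the expensive part, on purpose */
        switch (*pc) {
        case GD_ADD:
            in->acc += pc[1];
            pc += 2;
            break;
        default:                     /* GD_END or an unknown op: stop */
            return;
        }
    }
}
```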

Debug Core

The debug core works like a normal software debugger, such as GDB. The debug core executes each opcode, and then prompts the user to enter a command. These commands can be used to continue execution, step to the next opcode, or examine and manipulate data from the executing program.
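The control loop of a debug core can be sketched as below. Everything is hypothetical: the toy ops, the run_debug_core function, and the command names step, print, reset, and continue. To keep the sketch testable, commands come from an array rather than from a terminal prompt, and running out of commands behaves like continue.

```c
#include <stdio.h>
#include <string.h>

/* Toy opcodes for illustration; not Parrot's real ones. */
typedef long opcode_t;
enum { DB_END, DB_ADD };

/* Runs one op, then reads commands until told to move on. Returns the
   accumulator, or -1 on an unknown opcode. */
static long run_debug_core(const opcode_t *pc, const char **cmds, int ncmds) {
    long acc = 0;
    int next_cmd = 0, stepping = 1;

    while (*pc != DB_END) {
        if (*pc != DB_ADD)
            return -1;
        acc += pc[1];                          /* execute DB_ADD */
        pc += 2;
        while (stepping) {                     /* the "prompt" */
            const char *cmd = next_cmd < ncmds ? cmds[next_cmd++] : "continue";
            if (strcmp(cmd, "step") == 0)
                break;                         /* run one op, prompt again */
            else if (strcmp(cmd, "continue") == 0)
                stepping = 0;                  /* stop prompting */
            else if (strcmp(cmd, "print") == 0)
                printf("acc = %ld\n", acc);    /* examine data */
            else if (strcmp(cmd, "reset") == 0)
                acc = 0;                       /* manipulate data */
            /* unknown commands are ignored */
        }
    }
    return acc;
}
```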

Functions

opcode_t *runops_fast_core
Runs the Parrot operations starting at pc until there are no more operations. This performs no bounds checking, profiling, or tracing.
opcode_t *runops_cgoto_core
Runs the Parrot operations starting at pc until there are no more operations, using the computed goto core, performing no bounds checking, profiling, or tracing. If computed goto is not available, then Parrot exits with exit code 1.
static opcode_t *runops_trace_core
Runs the Parrot operations starting at pc until there are no more operations, using the tracing interpreter.
opcode_t *runops_slow_core
Runs the Parrot operations starting at pc until there are no more operations, with tracing and bounds checking enabled.
opcode_t *runops_gc_debug_core
Runs the Parrot operations starting at pc until there are no more operations, performing a full GC run before each op. This is very slow, but it's also a very quick way to find GC problems.
opcode_t *runops_profile_core
Runs the Parrot operations starting at pc until there are no more operations, with tracing, bounds checking, and profiling enabled.
opcode_t *runops_debugger_core
Used by the debugger; under construction.
