Parrot Subroutines


This document describes how to define, call, and return from Parrot subroutine objects and other invokables.


Parrot comes with several subroutine and related classes, which implement CPS (Continuation Passing Style) and PCC (Parrot Calling Conventions; see docs/pdds/pdd03_calling_conventions.pod).

Class Tree

These are all of the built-in classes that are directly callable, or "invokable":


By "invokable" we mean that they can be supplied as the first argument to the invoke, invokecc, or tailcall instructions. Generally speaking, invokable objects are divided into two subtypes: Sub and classes that are built on it create a new context when invoked, and Continuation classes return control to an existing context that was captured when the Continuation was created.

There are (of course) two classes that straddle this distinction:

  1. Invoking a Closure object creates a new context for the sub it refers to directly, but it also captures an "outer" context that provides bindings for the immediately enclosing lexical scope (and, if that context is itself for a Closure, the subsequent scopes working outwards).

     [add a newclosure example? -- rgr, 6-Apr-08.]

  2. A Coroutine acts like a normal sub when called initially, and can also return normally, but acts like a continuation when exited via the yield instruction and re-entered by re-invoking. When all yield states are exhausted, the coroutine dies, and invoking it again raises an error. You can, however, reset a Coroutine.

         .sub 'mycoro'
             .yield (1)
             .yield (2)
             .return (4)
         .end

         .sub 'main' :main
             $I0 = mycoro()
             say $I0
             $I0 = mycoro()
             say $I0
             $I0 = mycoro()
             say $I0
             $I0 = mycoro()    # fourth call: the coroutine is already dead
         .end
         # =>
         # 1
         # 2
         # 4
         # Cannot resume dead coroutine.
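
The Closure case can be sketched in the same way. This is a minimal illustration rather than code taken from the Parrot sources: the sub names make_counter and increment and the lexical name count are hypothetical. The inner sub is tied to its enclosing sub with :outer, and newclosure captures the current context of make_counter so that the lexical remains reachable after make_counter has returned.

    .sub 'main' :main
        .local pmc counter
        counter = make_counter()
        $I0 = counter()
        say $I0
        $I0 = counter()
        say $I0
    .end

    # Hypothetical helper: creates a lexical and returns a closure over it.
    .sub 'make_counter'
        $P0 = new 'Integer'
        $P0 = 0
        .lex 'count', $P0
        .const 'Sub' inner = 'increment'
        # Capture the current context as the "outer" context of the closure.
        $P1 = newclosure inner
        .return ($P1)
    .end

    # Looks up the captured lexical in the outer context and increments it.
    .sub 'increment' :outer('make_counter')
        $P0 = find_lex 'count'
        inc $P0
        $I0 = $P0
        .return ($I0)
    .end

Each call to counter() creates a new context for increment, while the captured context of make_counter keeps the value of count alive between calls.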


Creating subs

Subs are created by IMCC (the PIR compiler) via the .sub directive. Unless the :anon pragma is included, they are stored in the constant table associated with the bytecode and can be fetched with the get_hll_global and get_root_global opcodes. Within the PIR source, they can also be put in registers with a .const 'Sub' declaration:

    .const 'Sub' rsub = 'random_sub'

This uses find_sub_not_null under the hood to look up the sub named "random_sub".

Here's an example of fetching a sub from another namespace:

    .sub main :main
        get_hll_global $P0, ['Other'; 'Namespace'], "the_sub"
        # Call the sub we just fetched.
        $P0()
        print "back\n"
    .end

    .namespace ['Other'; 'Namespace']

    .sub the_sub
        print "in sub\n"
    .end

Note that the_sub could be defined in a different bytecode or PIR source file from main.

Program entry point

One subroutine in the first executed source or bytecode file may be flagged as the "main" subroutine, where execution starts.

    .sub the_main_event :main
       # ...
    .end

In the absence of a :main entry Parrot starts execution at the first statement. Any :main directives in a subsequent PIR or bytecode file that are loaded under program control are ignored.

Note that if the first executed source or bytecode file contains more than one sub flagged as :main, Parrot currently picks the last such sub to start execution. This is arguably a bug, so users should not depend upon it.

Load-time initialization

If a subroutine is marked as :load, it is run before the load_bytecode opcode that loads its file returns.


    .sub main :main
       print "in main\n"
       load_bytecode "library_code.pir"
       print "back to main\n"
    .end

    # library_code.pir

    .sub _my_lib_init :load
       print "initializing library\n"
    .end

If a subroutine is marked as :init, it is run before the :main sub (or the first subroutine in the source file) runs. Unlike :main subs, :init subs are also run when compiling from memory. :load subs, by contrast, are run only in source or bytecode files that are loaded subsequently.

These markers are called "pragmas", and are defined fully in docs/pdds/pdd19_pir.pod. The following table summarizes the behavior of the five pragmas that cause Parrot to run a sub implicitly:

                  ------ Executed when --------
                  compiling to    -- loading --
    Sub Pragma    disk  memory    first   after
    ==========    ====  ======    =====   =====
     :immediate   yes   yes       no      no
     :postcomp    yes   no        no      no
     :load        no    no        no      yes
     :init        no    yes       yes     no
     :main        no    no        yes     no

The same load-time behavior applies regardless of whether the loaded file is PIR source or bytecode. Note that it is possible to mark a sub with both :load and :init.
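
As a small sketch of the last point, the library initializer from the earlier example could carry both pragmas. According to the table, it would then run whether the file is the first one executed (:init) or is loaded later via load_bytecode (:load). The sub name is reused from the example above purely for illustration.

    # library_code.pir

    .sub _my_lib_init :load :init
       print "initializing library\n"
    .end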

Defining subs

A sub is defined by a block of code starting with .sub and ending with .end. Parameters which the sub can be called with are defined by .param:

    .sub do_something
      .param pmc a_pmc
      .param string some_string
      #do something
    .end

The set of .param directives is converted to a single get_params instruction. The compiler decides which registers to use.

    get_params '(0,0)', $P0, $S0

A parameter can be declared optional with the :optional flag. If an optional parameter is followed by a parameter declared :opt_flag, that parameter will hold an integer indicating whether the optional parameter was supplied.

    .param string maybe :optional
    .param int has_maybe :opt_flag
    unless has_maybe goto no_maybe
    #do something with maybe
  no_maybe:
    #don't use maybe

A sub can accept an arbitrary number of parameters by declaring a :slurpy parameter. This creates a PMC holding an array of all the arguments passed to the sub, which can be accessed like so:

    .param pmc all_params :slurpy

    $P0 = all_params[0]
    $S0 = all_params[1]

A slurpy parameter can also be defined after a set of positional parameters, in which case it will only hold any additional parameters passed.
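
A minimal sketch of this combination follows. The sub name show and the parameter names are hypothetical. The first argument binds to the positional parameter, and the remaining ones are collected in the slurpy array, whose size is read here with the elements opcode.

    .sub 'main' :main
        show('a', 'b', 'c')
    .end

    .sub 'show'
        .param string first
        .param pmc others :slurpy
        # 'first' holds the first argument.
        say first
        # 'others' holds only the arguments after the first one.
        $I0 = elements others
        say $I0
    .end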

A parameter may also be declared :named, giving it a string that callers can use to pass the argument explicitly by name, regardless of position.

    .param int counter :named("counter")

This can be combined with :optional as well as :opt_flag, so that the parameter need only be passed when necessary.
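
The following sketch shows one way the three flags might be combined. The sub name greet and the names who and has_who are hypothetical. The first call omits the named argument, so the flag is false and a default is used. The second call supplies it by name.

    .sub 'main' :main
        greet()
        greet('World' :named('who'))
    .end

    .sub 'greet'
        .param string who     :named('who') :optional
        .param int    has_who :opt_flag
        if has_who goto have_who
        # No 'who' argument was passed, so fall back to a default.
        who = 'stranger'
      have_who:
        print 'Hello, '
        say who
    .end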

If a parameter is declared with :slurpy and :named (with no string), it creates an associative array containing all named parameters which can be accessed like so:

    .param pmc all_params :slurpy :named
    $S0 = all_params['name']
    $I0 = all_params['counter']

Calling the sub

PIR sub invocation syntax is similar to HLL syntax:

    $P0 = do_something($P1, $S3)

This is syntactic sugar for the following four bytecode instructions:

    # Establish arguments.
    set_args '(0,0)', $P1, $S3
    # Find the sub.
    $P8 = find_sub_not_null "do_something"
    # Establish return values.
    get_results '(0)', $P0
    # Call the sub in $P8, implicitly creating a return continuation.
    invokecc $P8

The sub name could be replaced with a PMC register, in which case the find_sub_not_null instruction would not be needed. If the return values from the sub were ignored (by dropping the $P0 = part), the get_results instruction would be omitted. However, set_args is emitted even in the case of a call without arguments.

The first operands to the set_args and get_results instructions are actually placeholders for an integer array that describes the register types. For example, the '(0,0)' for set_args is replaced internally with [2, 1], which means "two arguments, of type PMC and string". Note that return values get the same register type coercion as sub parameters. This is all described in much more detail in docs/pdds/pdd03_calling_conventions.pod.

Named arguments can be passed explicitly in one of two ways:

    $P5 = do_something($I6 :named("counter"), $S4 :named("name"))
    #or equivalently
    $P5 = do_something("counter" => $I6, "name" => $S4)

To receive multiple values, put the register names in parentheses:

    ($P10, $P11) = do_something($P1, $S3)

To test whether a value was returned, declare it :optional, and follow it with an integer register declared :opt_flag:

    ($P10 :optional, $I10 :opt_flag) = do_something($P1, $S3)

A :slurpy value can be declared, as in parameter declarations, to catch an arbitrary number of return values:

    ($P12, $P13 :slurpy) = do_something($P1, $S3)

Note that the values stored in a :slurpy or :slurpy :named array can be passed as arguments to another call using the :flat declaration:

    ($P14, $P15) = do_something($P13 :flat)

Subs may also return :named values, which can be accessed explicitly in the same way as named parameters:

    ($I11 :named("counter"), $S4 :named("name")) = do_something($P1, $S3)

All of these affect only the signature provided via get_results.

[not sure what this is for, leaving it alone for now -aninhumer]

    # Call the sub in $P8, with continuation (created earlier) in $P9.
    invoke $P8, $P9

Returning from a sub

PIR supports a convenient syntax for returning any number of values from a sub or closure:

    .sub main
      .return ($P0, $I1, $S3)
    .end

Integer, float, and string constants are also accepted. The .return statement above is translated to:

    set_returns '(0,0,0)', $P0, $I1, $S3
    returncc    # return by calling the current continuation

As for set_args, the '(0,0,0)' is actually a placeholder for an integer array that describes the register types; it is replaced internally with [2, 0, 1], which means "three arguments, of type PMC, integer, and string".

All of the declarations allowed for arguments in a call to a sub (:named, :flat) can also be used with return values.

Another way to return from a sub is to use tail-calling, which calls a new sub with the current continuation, so that the new sub returns directly to the caller of the old sub (i.e. without first returning to the old sub). This passes the three values to another_sub via tail-calling:

    .sub main
      .tailcall another_sub($P0, $I1, $S3)
    .end

This is translated into a set_args instruction for the call, but with tailcall instead of invokecc:

    set_args '(0,0,0)', $P0, $I1, $S3
    $P8 = find_sub_not_null "another_sub"
    tailcall $P8

As for calling, the sub name could be replaced with a PMC register, in which case the find_sub_not_null instruction would not be needed.

If needed, the current continuation can be extracted and called explicitly as follows:

    ## This is what defines .INTERPINFO_CURRENT_CONT.
    .include 'interpinfo.pasm'
    ## Store our return continuation as exit_cont.
    .local pmc exit_cont
    exit_cont = interpinfo .INTERPINFO_CURRENT_CONT
    ## Invoke it explicitly:
    invokecc exit_cont
    ## ... or equivalently:
    tailcall exit_cont

To return values, use set_args as before.

All together now

The following complete example illustrates the typical call/return pattern:

    .sub main :main
        print "in main\n"
        the_sub()
        print "back to main\n"
    .end

    .sub the_sub
        print "in sub\n"
    .end

Notice that we are not passing or returning values here.

[example of passing values. this could get pretty elaborate; look for other examples first. -- rgr, 6-Apr-08.]
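
As a minimal sketch of passing and returning values, the example below calls a sub with two integer arguments and receives one integer result. The sub name add_two and the variable names are hypothetical, chosen only for illustration.

    .sub main :main
        .local int total
        # Pass two integer constants and receive one integer back.
        total = add_two(2, 3)
        print "total is "
        say total
    .end

    .sub add_two
        .param int a
        .param int b
        $I0 = a + b
        .return ($I0)
    .end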

If a short subroutine is called several times, for instance inside a loop, the creation of the return continuation can be done outside the loop:

    .sub main :main
        ## Initialize the sub and the return cont.
        .local pmc cont
        cont = new 'Continuation'
        set_addr cont, ret_label
        .const 'Sub' rsub = 'random_sub'
        ## Loop initialization.
        .local int loop_max, i
        loop_max = 1000000
        i = 0

        ## Main loop.
      again:
        set_args '(0)', i
        invoke rsub, cont
      ret_label:
        ## This is where "cont" returns.
        inc i
        if i < loop_max goto again
    .end

    .sub random_sub
        .param int foo
        ## do_something
    .end

If the sub returns values, the get_results must be after ret_label in order to receive them.

Since this is much more obscure than the PIR calling syntax, it should only be done if there is a measurable performance advantage. Even in this trivial example, calling "rsub(i)" is only about a third slower on x86.


Files

src/pmc/sub.pmc, src/pmc/closure.pmc, src/pmc/continuation.pmc, src/pmc/coroutine.pmc, src/sub.c, t/pmc/sub.t

See also

docs/pdds/pdd03_calling_conventions.pod, docs/pdds/pdd19_pir.pod

Author

Leopold Toetsch