PDD 25: Concurrency

Abstract

This document defines the requirements and implementation strategy for Parrot's concurrency models.

Description

- Parrot supports multiple concurrency models, including green threads, native Windows and POSIX threads, event-based programming, and asynchronous I/O.
- A concurrency scheduler manages all concurrent tasks. A certain number of tasks are initialized at startup, a "thread-pool", so that task switching scales linearly up to the number of physical CPU's.
- Each interpreter has its own concurrency scheduler.
- Concurrency schedulers for different interpreters communicate and can share tasks.
- A concurrency scheduler may link to other schedulers as a parent, a child, or an equal.
- A task is a concurrent unit of work.
- All tasks support a standard interface used by the concurrency scheduler, but otherwise have a great deal of flexibility in their implementation.
- Tasks can share PMC variables.

Concurrency is a parallel execution of units of code (on multiprocessor machines), or a flexible ordering of serial units of code (on single processor machines). For certain problem spaces, concurrency offers significant speed gains by parceling out processor-intensive activity or by ensuring that a wait for input or system resources doesn't hold up the entire application.

A task is a unit of code that can be executed concurrently.

Implementation

Rather than defining a single canonical threading model, Parrot defines an infrastructure that supports multiple concurrency models and provides for interaction between the various models. Parrot already uses multiple concurrency models for events, threads, async I/O, and exceptions, a trend that will only continue as we support multiple HLLs and external threading libraries. Designing for multiple concurrency models also gives Parrot more room to grow as future models are researched and developed.

To avoid conflicts between concurrency models, Parrot provides a single central concurrency scheduler for each interpreter instance. Each concurrency model defines a Task PMC that supports a standard minimal interface. The scheduler can interact with tasks from different models without direct access to the details of each model.

On multiprocessor systems, the scheduler is responsible for allocating tasks to processors by delegating that allocation to native OS threads.

For the most part, when we talk about concurrency, we mean concurrency across an interpreter pool. An interpreter pool is a set of interpreter instances that share common resources: the memory pools, arenas, and global namespace--pretty much everything except what's in the interpreter structure itself. They're essentially threads in the OS sense and implemented as OS threads.

Another form of concurrency is between completely independent interpreter instances, each with their own memory pools, arenas, namespaces, etc. Independent interpreters may run as separate processes on the same machine, or even as separate processes on different machines (in a clustering environment, for example). The concerns of shared-interpreter concurrency and independent-interpreter concurrency are similar, and in Parrot both use the same central concurrency scheduler. This PDD doesn't directly address independent-interpreter concurrency, but does include occasional notes on how it integrates with shared-interpreter concurrency.

Supported Concurrency Models

The following are a few of the concurrency models Parrot intends to support. The biggest differences between them are in how they handle variables shared across concurrent tasks. But the design is such that each of the different models can run simultaneously, coordinated through the central concurrency scheduler.

Mutex/Lock Concurrency

In this model, before performing an operation on a shared variable, you must first acquire a lock on it. Once a lock is acquired, any other concurrent tasks that attempt to acquire a lock on that variable will block until the existing lock is released.

A mutex is a thing that can be locked. They are not directly exposed to users. They're non-recursive, non-read/write, exclusive things. When a concurrent task gets a mutex, any other attempt to get that mutex will block until the owning task releases the mutex. Mutexes are implemented using the platform-native lock construct.

The first thing that any vtable function of a shared PMC must do is to acquire the mutex of the PMCs in its parameter list (in ascending address order). In this model only PMCs can be shared.

STM Concurrency

Older Parrot's preferred a model of concurrency based on Software Transactional Memory. In this model, rather than locking a shared variable while performing a series of operations on it, the changes are bundled into a transaction that acts as an atomic unit.

Within the transaction, STM creates a "hypothetical" copy of the variable, logs the changes made to that copy, and at the end of the transaction performs some validation steps to decide whether to save the hypothetical value back to the real variable (a commit) or discard the hypothetical value (a roll back). One common validation step is to check whether the value of the real variable was changed during the execution of the transaction (possibly by another concurrent task).

STM tasks can read/write shared variables from mutex/lock tasks, as they appear to the mutex/lock task as a single atomic operation. Mutex/lock tasks can read shared variables from STM tasks, but they cannot write them, as the STM tasks will not respect the lock and may commit a new value in the middle of a complex operation that requires the lock. As a safety mode, STM tasks may be configured to fail validation on any transaction attempting to commit to a variable locked by a mutex/lock task.

Intel Threading Building Blocks

Threading Building Blocks (TBB) is a library of tools for data-parallel programming, dividing large data sets into small pieces so that operations on those data-sets can be parallelized across multiple processors.

Parrot might provide two levels of integration with TBB: an interface for TBB's scheduling to interact with the central concurrency scheduler, and an interface for developers to access the TBB routines from within PIR/PASM.

Like Parrot, TBB is task-based. Since TBB performs its own scheduling, TBB tasks in Parrot will be given a lightweight scheduler that only has the responsibility of passing messages, events, etc, back and forth between the TBB task and the central scheduler. TBB tasks will not share variables with any other types of concurrent tasks in Parrot.

Note that since TBB is a C++ library, it is only available when Parrot is compiled with a C++ compiler.

POSIX Concurrency

This is the POSIX "share-everything" style of threading, such as is used in Perl 5's "pthread" model, as well as the thread models for Ruby and Python. [Recommended reading: "Programming with POSIX Threads" by Dave Butenhof.]

Process-type Concurrency

This is the Perl 5 "iThreads" threading model. In this model no data is shared implicitly, and all sharing must be done on purpose and explicitly. It resembles the Unix fork process with shared memory segment model, not a surprise as it was originally developed with emulation of Unix's fork system in mind.

Hybrid-thread Concurrency (current)

Lightweight "green" threads (i.e. Task) are used as messages in a system where reading shared variables is allowed but only the one owner thread may write to it.

With our current hybrid threads model all data sharing is implemented through Proxy PMCs. Reading foreign data is transparent, writing must be done via prioritized writer subs scheduled in the owners Task. To reduce latency, the task is flagged to run immediately. The data-owning interpreter will preempt the currently running task and process the new write task. Thus proxies allow a nearly lock-free multithreading implementation.

Each task is assigned a fixed amount of execution time. After this time is up a timer callback sets a flag which is checked at execution of every branch operation. Since the interpreter's state is well defined at this point, its internal consistency is guaranteed. The same holds for the GC. Since task preemption is only done while executing user-level code, the GC can do its work undisturbed and without the need for measures like locking. Since user-level code is allowed to disable the scheduler, it can be guaranteed to run undisturbed through critical sections.

Independent Concurrency

Independent tasks have no contact with the internal data of any other task in the current process. These are implemented as STM concurrency but only use transactions for the shared interpreter globals.

Note that independent tasks may still communicate back and forth by passing either atomic things (ints, floats, and pointers) or static buffers that can become the property of the destination thread.

Concurrency Scheduler API

The concurrency scheduler has two parts, a Scheduler PMC, which has an instance stored in the interpreter struct, and a set of core routines in src/scheduler.c.

An instance of the Scheduler PMC has 5 internal attributes, which are:

An unique ID for the scheduler
The current highest assigned task ID
The task list
The task priority index
The list of handlers

The unique ID of the scheduler is used by other schedulers to pass messages. With a small set of identifying information (including process ID, interpreter ID, scheduler ID, and possibly a URL/hostname) a scheduler can address other schedulers, both local to the current interpreter and remote.

The task list is a simple unordered integer indexed data structure, currently implemented as a hash. Each task in the list has an integer ID assigned when it is first inserted into the list. A task retains the same ID throughout its lifetime, and the ID is not reused once a task is finalized. The data structure is currently implemented as pre-allocated array, the number of CPU's plus one, overridable by --numthreads N.

Currently a task is implemented as OS thread so ranking is done by the OS. Prioritization is done with the interpreter method schedule_proxied. Previous versions used a task rank index, calculated based on the type, priority rating, age of the tasks in the task list. The index is a simple array, and in general the top (zeroth) element in the array is the next one to receive attention. The index is recalculated regularly as new tasks are inserted into the task list, existing tasks are modified or completed, and as time progresses so the age of some tasks pushes them to a higher priority. Because of the regular recalculation, the rank index may cache some frequently-accessed and rarely changing data from the tasks (though it is not required to do so).

The list of handlers is a simple stack of handler PMCs currently waiting for an appropriate task (event, exception). See PDD 24 on Events for more details on event handlers.

Flags (old)

PMC flags 0-7 are reserved for private use by a PMC. The scheduler uses flag 0 to indicate whether the priority index is currently valid or needs to be recalculated before the next use.

Vtable Functions

push_pmc
pop_pmc

Methods

add_handler
find_handler

Task PMC API

The interface of the Task PMC is also the minimum required interface for all subclasses, extensions, and alternate implementations of a task.

An instance of the Task PMC has 7 internal attributes, which are:

The task ID
The type of the task
The subtype of the task
The priority of the task
The birthtime stamp of the task
The status of the task
The code block of the task (optional)
An interpreter structure for the task (optional)

Types of tasks include 'event', 'exception', 'io', and 'code'. The subtype of a task is used by events and exceptions to identify appropriate handlers. Possible status values for tasks include 'created', 'invoked', 'inprocess', and 'completed'. The final state of a task is 'destroyed', but is never marked (the task PMC is removed from the task list and at some later point destroyed by GC). The priority of a task is an integer value between 0 and 100, with 0 as the lowest priority.

The birthtime stamp is the point at which the task was inserted into the task list, and is used for calculating the age of tasks.

The code block is optional and only for tasks that are associated with a simple code block. The interpreter structure is also optional and only used for thread-like tasks that maintain their own interpreter state.