Portable OpenCL Out-of-Order Execution Framework for Heterogeneous Platforms
Permanent address of the item is
Heterogeneous computing has become a viable option in seeking computing performance, to the side of conventional homogeneous multi-/single-processor approaches. The advantage of heterogeneity is the possibility to choose the best device on the platform for different distinct workloads in the application to gain performance and/or to lower power consumption. The drawback of heterogeneity is the increased complexity of applications all the way from the programming models to different instruction sets and architectures of the devices. OpenCL is the first standard for programming heterogeneous platforms. OpenCL offers a uniform Application Program Interface (API) and device platform abstraction that allows all different types of devices to be programmed in the same platform portable way. OpenCL has been widely adopted by major software and chip manufacturers and is increasing in its popularity. OpenCL requires an implementation for the standard in order to be used. One such implementation is the POrtable Computing Language (pocl) open source project launched in Tampere University of Technology. The aim for pocl is in easy portability on different devices. One goal of pocl is improved performance portability using a kernel compiler that is able to adopt to different parallel hardware resources on the devices. This thesis describes an out-of-order execution framework for pocl. This work offers a flexible and simple API for efficient offloading of computation to the devices, and for synchronising computation between the main application and other devices on the platform. The focus in this thesis is set on the task level operation, with fast task launching and efficient exploitation of available task level parallelism. An interest has emerged for using OpenCL as a middleware for other parallel programming models. The programming models might be highly task parallel and the size of the tasks might be much smaller than in nominal OpenCL use cases that tend to focus on data parallelism. In the runtime implementation of the proposed framework the focus was in minimising overheads in task scheduling in order to improve scalability for said programming models.