The Libra 3.0 Heterogeneous Cloud Computing SDK has recently been released by GPU Systems. It supports PC, Tablet and Mobile Devices and includes a new virtualizing function for cloud compute services of local and remote CPUs and GPUs. C/C++, Java, C# and Matlab are supported. Read the full press release here.
The rise of multi- and many-core architectures also gave birth to a plethora of new parallel programming models. Among these, the open industry standard OpenCL addresses this heterogeneity of programming environments by providing a uniﬁed programming framework. The price to pay, however, is that OpenCL requires additional low-level boilerplate code, when compared to vendor-speciﬁc solutions, even if only simple operations are to be performed. Also, the uniﬁed programming framework does not automatically provide any guarantees on performance portability of a particular implementation. Thus, device-speciﬁc compute kernels are still required for obtaining good performance across different hardware architectures.
We address both, the issue of programmability and portable performance, in this work: On the one hand, a high-level programming interface for linear algebra routines allows for the convenient speciﬁcation of the operations of interest without having to go into the details of the underlying hardware. On the other hand, we discuss the underlying generator for device-speciﬁc OpenCL kernels at runtime, which is supplemented by an auto-tuning framework for portable performance as well as with work partitioning and task scheduling for multiple devices. Our benchmark results show portable performance across hardware from major vendors. In all cases, at least 75 percent of the respective vendor tuned library was obtained, while in some cases we even outperformed the reference. We further demonstrate the convenient and efficient use of our high-level interface in a multi-device setting with good scalability.
(Philippe Tillet, Karl Rupp, Siegfried Selberherr, Chin-Teng Lin: “Towards Performance-Portable, Scalable, and Convenient Linear Algebra”. 5th USENIX Workshop on Hot Topics in Parallelism (HotPar’) 2013 [PDF].)
From a recent press release:
Amdahl Software, a leading supplier of development tools for multi-core software, after extensive beta testing by evaluators over a dozen countries and numerous end-user application markets, today announced the production release of OpenCL CodeBench. OpenCL CodeBench is an OpenCL Code Creation tool. It simplifies parallel software development, enabling developers to rapidly generate and optimize OpenCL applications. Engineering productivity is increased through the automation of overhead tasks. The tools suite enables engineers to work at higher levels of abstraction, accelerating the code development process. OpenCL CodeBench benefits both expert and novice engineers through a choice of command line or guided, wizard-driven development methodologies. Close cooperation with IP, SOC and platform vendors will enable future releases of OpenCL CodeBench to more tightly optimize software for specific end user platforms and development environments.
OpenCL CodeBench is available for trial or purchase. For additional information, please visit www.amdahlsoftware.com.
rCUDA (remote CUDA) v4.0 has just been released. It provides full binary compatibility with CUDA applications (no need to modify the application source code or recompile your program), native InfiniBand support, enhanced data transfers, and CUDA 5.0 API support (excluding graphics interoperability). This new release of rCUDA allows to execute existing GPU-accelerated applications by leveraging remote GPUs within a cluster (both via sharing and/or aggregating GPUs) with a negligible overhead. The new version is available free of charge ar www.rCUDA.net, along with examples, manuals and additional information.
The Computing Language Utility (CLU) is a lightweight API designed to help programmers explore, learn, and rapidly prototype programs with OpenCL. This API reduces the complexity associated with initializing OpenCL devices, contexts, kernels and parameters, etc. while preserving the ability to drop down to the lower level OpenCL API at will when programmers wants to get their hands dirty. The CLU release includes an open source implementation along with documentation and samples that demonstrate how to use CLU in real applications. It has been tested on Windows 7 with Visual Studio.
The MOSIX group announces the release of the Virtual OpenCL (VCL) cluster platform version 1.14. This version includes the SuperCL extension that allows micro OpenCL programs to run efficiently on devices of remote nodes. VCL provides an OpenCL platform in which all the cluster devices are seen as if they are located in the hosting-node. This platform benefits OpenCL applications that can use many devices concurrently. Applications written for VCL benefit from the reduced programming complexity of a single computer, the availability of shared-memory, multi-threads and lower granularity parallelism, as well as concurrent access to devices in many nodes. With SuperCL, a programmable sequence of kernels and/or memory operations can be sent to remote devices in cluster nodes, usually with just a single network round-trip. SuperCL also offers asynchronous communication with the host, to avoid the round-trip waiting time, as well as direct access to distributed file-systems. The VCL package can be downloaded from mosix.org.
Version 3.0 of the MC# programming system has been released. MC# is an universal parallel programming language aimed to any parallel architecture - multicore processors, systems with GPU, or clusters. It is an extension of C# language and supports high-level parallel programming style.
Libra Platform is a GPGPU-Heterogeneous Compute API and runtime environment available on Windows, Mac and Linux. Libra Compute API offers performance portability and direct compute access via standard programming environments C/C++, Java, C# and Matlab to execute math operations on top of current and future compute architectures, including the latest GPUs, x86/x64 CPUs and with broad support for compute devices compatible with low level specific APIs – OpenCL, CUDA, OpenGL and standard x86/x64 compute APIs.
Read more in the full announcement.
The rCUDA Team is proud to announce a new version of the rCUDA framework which will include many new functionalities as well as boosted performance. This new version, cooked for over a year, will incorporate pipelined transfers, full multi-thread and multi-node capabilities, CUDA 4.1 support, global scheduler integration, support for CUDA C extensions, and native InfiniBand support. A closed beta teting program has been started. See the complete text at http://www.rcuda.net/index.php/news/19-new-revolutionary-version-of-rcuda-to-be-launched.html.
Libra SDK is a sophisticated runtime including API, sample programs and documentation for massively accelerating software computations. This introduction tutorial provides an overview and usage examples of the powerful Libra API & math libraries executing on x86/x64, OpenCL, OpenGL and CUDA technology. Libra API enables generic and portable CPU/GPU computing within software development without the need to create multiple, specific and optimized code paths to support x86, OpenCL, OpenGL or CUDA devices. Link to PDF: www.gpusystems.com/doc/LibraGenericComputing.pdf