The Computing Language Utility (CLU) is a lightweight API designed to help programmers explore, learn, and rapidly prototype OpenCL programs. The API reduces the complexity of initializing OpenCL devices, contexts, kernels, and parameters, while preserving the ability to drop down to the lower-level OpenCL API whenever programmers want to get their hands dirty. The CLU release includes an open source implementation along with documentation and samples that demonstrate how to use CLU in real applications. It has been tested on Windows 7 with Visual Studio.
The MOSIX group announces the release of the Virtual OpenCL (VCL) cluster platform version 1.14. This version includes the SuperCL extension, which allows micro OpenCL programs to run efficiently on devices of remote nodes. VCL provides an OpenCL platform in which all the cluster's devices appear as if they were located in the hosting node. This platform benefits OpenCL applications that can use many devices concurrently. Applications written for VCL enjoy the reduced programming complexity of a single computer; the availability of shared memory, multi-threading, and finer-grained parallelism; and concurrent access to devices in many nodes. With SuperCL, a programmable sequence of kernels and/or memory operations can be sent to remote devices in cluster nodes, usually with just a single network round-trip. SuperCL also offers asynchronous communication with the host, to avoid round-trip waiting time, as well as direct access to distributed file systems. The VCL package can be downloaded from mosix.org.
Graphics Core Next Architecture Overview
Designed not only to push the boundaries of DirectX® 11 gaming, the GCN Architecture is also AMD's first design specifically engineered for general computing. Equipped with up to 32 compute units (2048 stream processors), each containing a scalar coprocessor, AMD's 28nm GPUs are more than capable of handling workloads, and programming languages, traditionally exclusive to the CPU. Coupled with the dramatic rise of GPU-aware programming languages like C++ AMP and OpenCL™, the GCN Architecture is truly the right architecture at the right time. Participate in this webinar to learn how you can take advantage of this new architecture in your GPGPU programs (North America: August 14, 2012, 10:00 AM Pacific Daylight Time; India: August 21, 2012, 5:30 PM India Standard Time).
Performance Evaluation of AMD APARAPI Using Real World Applications
The fall schedule for Acceleware’s training courses is now available.
- OpenCL: August 21-24, 2012, Houston, TX
- CUDA: October 2-5, 2012, San Jose, CA
- OpenCL: October 16-19, 2012, Calgary, AB
- CUDA: November 6-9, 2012, Houston, TX
- CUDA: December 4-7, 2012, New York, NY – Finance Focus
- AMP: December 11-14, 2012, Chicago, IL
More information: http://www.acceleware.com/training
PGI Release 12.6 is now out. New in this release:
- PGI Accelerator compilers — first release of the Fortran and C compilers to include comprehensive support for the OpenACC 1.0 specification including the acc cache construct and the entire OpenACC API library. See the PGI Accelerator page for a complete list of supported features.
- CUDA Toolkit — PGI Accelerator compilers and CUDA Fortran now include support for CUDA Toolkit version 4.2; version 4.1 is now the default.
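For readers new to OpenACC, the directive style these compilers implement looks like the minimal vector-add sketch below (an illustrative example, not taken from the release notes). A compiler without OpenACC support simply ignores the pragma and runs the loop serially, so the same source works with or without an accelerator:

```cpp
// Minimal OpenACC vector-add sketch. The copyin/copyout clauses
// describe data movement between the host and the accelerator; on a
// compiler without OpenACC support the pragma is ignored and the loop
// runs serially with identical results.
void vec_add(const float *a, const float *b, float *c, int n) {
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

The `acc cache` construct mentioned above is a further directive for staging frequently reused array sections into fast on-chip memory inside such loops.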
Download a free trial from the PGI website at http://www.pgroup.com/support/download_pgi2012.php?view=current. An upcoming PGI webinar with Michael Wolfe, sponsored by NVIDIA, takes place July 31st at 9:00 AM PDT: “Using OpenACC Directives with the PGI Accelerator Compilers”. Register at http://www.pgroup.com/webinar212.htm?clicksource=gpgpu712.
Version 3.0 of the MC# programming system has been released. MC# is a universal parallel programming language aimed at any parallel architecture: multicore processors, systems with GPUs, or clusters. It is an extension of the C# language and supports a high-level parallel programming style.
SnuCL is an OpenCL framework and freely available, open-source software developed at Seoul National University. It naturally extends the original OpenCL semantics to the heterogeneous cluster environment. The target cluster consists of a single host node and multiple compute nodes, connected by an interconnection network such as Gigabit Ethernet or InfiniBand switches. The host node contains multiple CPU cores, and each compute node consists of multiple CPU cores and multiple GPUs. For such clusters, SnuCL provides the illusion of a single heterogeneous system for the programmer. A GPU or a set of CPU cores becomes an OpenCL compute device. SnuCL allows the application to utilize compute devices in a compute node as if they were in the host node. Thus, with SnuCL, OpenCL applications written for a single heterogeneous system with multiple OpenCL compute devices can run on the cluster without any modifications. SnuCL achieves both high performance and ease of programming in a heterogeneous cluster environment.
SnuCL consists of SnuCL runtime and compiler. The SnuCL compiler is based on the OpenCL C compiler in SNU-SAMSUNG OpenCL framework. Currently, the SnuCL compiler supports x86, ARM, and PowerPC CPUs, AMD GPUs, and NVIDIA GPUs.
Register now for a webinar series on how to use the Intel® SDK for OpenCL Applications to best utilize the CPU and Intel® HD Graphics of 3rd Gen Intel® Core™ processors when developing OpenCL applications:
- July 11 – Getting Started with Intel® SDK for OpenCL Applications
- July 18 – Writing Efficient Code for OpenCL Applications
- July 25 – Creating and Optimizing OpenCL Applications
VexCL is a vector expression template library for OpenCL developed by the Supercomputer Center of the Russian Academy of Sciences. It was created to ease C++-based OpenCL development. Multi-device (and multi-platform) computations are supported. The code is publicly available under the MIT license. Its main features:
- Selection and initialization of compute devices according to extensible set of device filters.
- Transparent allocation of device vectors spanning multiple devices.
- Convenient notation for vector arithmetic, sparse matrix-vector multiplication, reductions. All computations are performed in parallel on all selected devices.
- Appropriate kernels for vector expressions are generated automatically the first time an expression is used.
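As a rough illustration of the expression-template technique behind the last two points, here is a minimal, self-contained C++ sketch (all types and names are invented for illustration; no OpenCL is involved). Where this sketch runs a serial loop on assignment, VexCL instead generates and launches a fused OpenCL kernel across the selected devices:

```cpp
#include <cstddef>
#include <vector>

// Expression-template sketch: operator+ builds a lightweight expression
// object instead of a temporary vector, and the whole expression is
// evaluated in a single pass on assignment. VexCL applies the same
// idea, but emits a fused OpenCL kernel at that point.
template <class L, class R>
struct Sum {
    const L &l;
    const R &r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0) : data(n, v) {}
    double  operator[](std::size_t i) const { return data[i]; }
    double &operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression type: this serial loop is where a
    // library like VexCL would generate and launch the device kernel.
    template <class E>
    Vec &operator=(const E &e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Unconstrained for brevity; a real library restricts this operator to
// its own vector and expression types.
template <class L, class R>
Sum<L, R> operator+(const L &l, const R &r) { return {l, r}; }
```

With these definitions, `c = a + b + a;` traverses the data once, with no intermediate vectors; VexCL's contribution is performing that single fused pass as one automatically generated OpenCL kernel on all selected devices.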
Doxygen-generated documentation is available at http://ddemidov.github.com/vexcl/index.html.
C++ Accelerated Massive Parallelism (C++ AMP) is a new open-specification heterogeneous programming model that builds on the established C++ language. Developed for heterogeneous platforms, C++ AMP is designed to accelerate the execution of C++ code by taking advantage of data-parallel hardware such as a GPU. These courses are aimed at programmers who are looking to develop comprehensive skills in writing and optimizing applications using C++ AMP.