GeoSpin has released the first version of CLyther for beta testing. Please visit the CLyther SourceForge website for more information. CLyther enables developers to write GPGPU code entirely in Python with no additional syntax. CLyther’s core driver contains a compiler that converts Python functions and types to OpenCL at runtime.
CLyther currently only supports a subset of the Python language definition but adds many new features to OpenCL such as:
- OpenCL interface similar to PyOpenCL
- Dynamic compilation of OpenCL code at runtime
- Fast prototyping of OpenCL code
- Create OpenCL code using the Python language definition
- Passing functions as arguments to OpenCL kernels
- Pure Python emulation mode of kernel functions
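To make the last two list items more concrete, here is a minimal sketch in plain Python of what emulating a kernel on the host, and passing a function to it as an argument, can look like. It does not use CLyther and does not reflect its actual API; the names `emulate_kernel`, `apply_elementwise` and `square` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List


def emulate_kernel(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Hypothetical emulator: call `kernel` once per work-item index, sequentially on the host."""
    for gid in range(global_size):
        kernel(gid, *args)


def apply_elementwise(gid: int, func: Callable[[float], float],
                      src: List[float], dst: List[float]) -> None:
    """Kernel body: each work-item transforms one element using the function it was given."""
    dst[gid] = func(src[gid])


def square(x: float) -> float:
    """Device-style helper function that is passed to the kernel as an argument."""
    return x * x


src = [1.0, 2.0, 3.0, 4.0]
dst = [0.0] * len(src)

# One emulated work-item per input element.
emulate_kernel(apply_elementwise, len(src), square, src, dst)
print(dst)
```

Running a kernel this way trades speed for convenience: the body can be stepped through with an ordinary Python debugger before it is ever compiled for a device.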
NVIDIA has released version 3.0 of the CUDA Toolkit, providing developers with tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include:
- Support for the new Fermi architecture, with:
  - Native 64-bit GPU support
  - Multiple Copy Engine support
  - ECC reporting
  - Concurrent Kernel Execution
  - Fermi HW debugging support in cuda-gdb
  - Fermi HW profiling support for CUDA C and OpenCL in Visual Profiler
- C++ Class Inheritance and Template Inheritance support for increased programmer productivity
- A new unified interoperability API for Direct3D and OpenGL, with support for:
  - OpenGL texture interop
  - Direct3D 11 interop support
- CUDA Driver / Runtime Buffer Interoperability, which allows applications using the CUDA Driver API to also use libraries implemented using the CUDA C Runtime such as CUFFT and CUBLAS.
CLyther is a Python tool for OpenCL, currently under development, that is similar to what Cython is for C. It is a Python language extension intended to make writing OpenCL code as easy as writing Python itself. CLyther currently supports only a subset of the Python language definition but adds many new features for OpenCL.
CLyther exposes both the OpenCL C library and language to Python. Its features include:
- Fast prototyping of OpenCL code.
- OpenCL kernel function creation using the Python language definition.
- Strong OOP programming in OpenCL code.
- Passing functions as arguments to kernel functions.
- Python emulation mode for OpenCL code.
- Fancy indexing of arrays.
- Dynamic compilation at runtime.
Occasionally, we receive news submissions pointing us to interesting older papers that somehow slipped by without our notice. This post collects a few of those. If you want your work to be posted on GPGPU.org in a timely manner, please remember to use the news submission form.
- Joshua A. Anderson, Chris D. Lorenz and Alex Travesset present and discuss molecular dynamics simulations and compare a single GPU against a 36-CPU cluster (General purpose molecular dynamics simulations fully implemented on graphics processing units, Journal of Computational Physics 227(10), May 2008, DOI 10.1016/j.jcp.2008.01.047).
- Wen-mei Hwu et al. derive and discuss goals and concepts of programming models for fine-grained parallel architectures, from the point of view of both a programmer and a hardware/compiler designer, and analyze CUDA as one current representative (Implicitly parallel programming models for thousand-core microprocessors, Proceedings of DAC’07, June 2007, DOI 10.1145/1278480.1278669).
- Jeremy Sugerman et al. present GRAMPS, a prototype implementation of future graphics hardware that allows pipelines to be specified as graphs in software (GRAMPS: A Programming Model for Graphics Pipelines, ACM Transactions on Graphics 28(1), January 2009, DOI 10.1145/1477926.1477930).
- William R. Mark discusses concepts of future graphics architectures in this contribution to the 2008 ACM Queue special issue on GPUs (Future graphics architectures, ACM Queue 6(2), March/April 2008, DOI 10.1145/1365490.1365501).
- BSGP by Qiming Hou et al. is a new programming language for general purpose GPU computing that achieves the same efficiency as well-tuned CUDA programs but makes code much easier to read, develop and maintain (BSGP: bulk-synchronous GPU programming, ACM Siggraph 2008, August 2008, DOI 10.1145/1399504.1360618).
- Finally, Che et al. and Garland et al. survey the field of GPU computing and discuss many different application domains. These articles are, in addition to the ones we have collected on the developer pages, recommended to GPGPU newcomers.
GPULib provides a library of mathematical functions that lets engineers, scientists, analysts, and other technical professionals use the high-performance computing resources of modern graphics processing units (GPUs) with minimal modification to their existing programs. The library executes vectorized mathematical functions on NVIDIA GPUs, bringing high-performance numerical operations to everyday desktop computers. By providing bindings for a number of Very High Level Languages (VHLLs) including MATLAB and IDL from ITT Visual Information Solutions, GPULib can accelerate new applications or be incorporated into existing applications with minimal effort. No knowledge of GPU programming and memory management is required. For more information regarding GPULib, please visit http://GPULib.txcorp.com.
High-performance scientific computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), and PyCUDA, an open-source toolkit that supports this technique.
In introducing PyCUDA, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. It is further observed that, compared to competing techniques, the effort required to create codes using run-time code generation with PyCUDA grows more gently in response to growing needs. The concept of RTCG is simple and easily implemented using existing, robust tools. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.
Preprint at arXiv
(Andreas Klöckner, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, Ahmed Fasih. PyCUDA: GPU Run-Time Code Generation for High-Performance Computing, submitted. http://arxiv.org/abs/0911.3456)
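The run-time code generation idea described in the abstract above can be sketched in a few lines of Python: the host program builds CUDA C source text while it is running, specialized for the data type and operation it currently needs. This is a minimal illustration, not code from the paper; `KERNEL_TEMPLATE` and `generate_kernel_source` are hypothetical names. On a machine with PyCUDA and a CUDA-capable GPU, the resulting string could be handed to `pycuda.compiler.SourceModule` for compilation; that step is left out here so the sketch runs without a GPU.

```python
from string import Template

# CUDA C source with placeholders that are filled in at run time.
KERNEL_TEMPLATE = Template("""
__global__ void ${name}(${ctype} *dest, const ${ctype} *a, const ${ctype} *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dest[i] = a[i] ${op} b[i];
    }
}
""")

ALLOWED_OPS = {"+", "-", "*", "/"}


def generate_kernel_source(name: str, ctype: str, op: str) -> str:
    """Return CUDA C source for an element-wise binary kernel (hypothetical helper)."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    return KERNEL_TEMPLATE.substitute(name=name, ctype=ctype, op=op)


# Specialize the kernel for single-precision multiplication at run time.
source = generate_kernel_source("multiply_float", "float", "*")
print(source)
```

Because the kernel text is ordinary data in the host language, the same helper can emit a `double` addition kernel or any other variant on demand, which is the flexibility the abstract attributes to the two-tiered approach.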
A public beta release of the CUDA-enabled Fortran Compiler from PGI enables programmers to write code in Fortran for NVIDIA CUDA GPUs. From a press release:
What: NVIDIA today announced that a public beta release of the PGI® CUDA-enabled Fortran compiler is now available. Developed in collaboration with The Portland Group®, it is the first Fortran compiler compatible with NVIDIA® CUDA™-enabled graphics processing units (GPUs).
A compiler is a software tool that translates applications from the high-level programming languages used by software developers into a binary form a computer can execute.
Why: GPU computing with the CUDA C-compiler has gained significant momentum in the High-Performance Computing (HPC) space as it enables developers to get transformative increases in performance with minimal coding required.
Fortran is particularly well suited to numeric computation and scientific computing and remains widely used in a wide range of applications such as weather modeling, computational fluid dynamics and seismic processing.
Where can I get it?: Read the rest of this entry »
Intel has acquired RapidMind, the company behind the RapidMind (formerly Sh) programming environment targeting multicore CPUs, AMD and NVIDIA GPUs and the Cell processor. The RapidMind Platform continues to be available, including support. In the medium term RapidMind’s technology and products will be integrated with Intel’s data-parallel products, in particular Intel’s Ct technology.
This blog entry by James Reinders from Intel describes the acquisition and future plans in more detail.
LabVIEW GPU Computing unleashes the computing power of NVIDIA GPUs via the CUDA interface from within a LabVIEW application. Code that calls the GPU for computation is integrated into the native parallel execution system of LabVIEW as if it were any other multi-threaded external library function call.
LabVIEW is a graphical programming environment used by millions of engineers and scientists to develop sophisticated measurement, test, and control systems using intuitive graphical icons and wires that resemble a flowchart. LabVIEW offers unrivaled integration with thousands of hardware devices and provides hundreds of built-in libraries for advanced analysis and data visualization. The LabVIEW platform is scalable across multiple targets and operating systems, and since its introduction in 1986 has become an industry leader.
Brook+, AMD’s extension of the BrookGPU programming environment, has been released in full source code to SourceForge. Brook+ supports an ATI CAL and x86 CPU backend, and allows developers to program GPUs in a C-like stream computing language.