March 23rd, 2010
Abstract:
We present our effort in developing an open-source GPU (graphics processing unit) code library for the MATLAB Image Processing Toolbox (IPT). We ported a dozen representative functions from IPT and, based on their inherent characteristics, grouped them into four categories: data independent, data sharing, algorithm dependent, and data dependent. For each category, we present a detailed case study that reveals insights into how to optimize the code efficiently for GPUs and highlights performance-critical hardware features, some of which have not been well explored in the existing literature. Our results show drastic speedups for the functions in the data-independent and data-sharing categories, achieved by leveraging hardware support judiciously, and moderate speedups for those in the algorithm-dependent category, achieved through careful algorithm selection and parallelization. For the functions in the last category, fine-grained synchronization and data-dependency requirements are the main obstacles to an efficient implementation on GPUs.
(J. Kong et al., “Accelerating MATLAB Image Processing Toolbox Functions on GPUs”, Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3), Pittsburgh, PA, Apr. 2010. Source code is available here.)
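To make the difference between the first and last categories concrete, here is a small Python sketch. It is not code from the authors' library, and the function names `threshold` and `error_diffuse_1d` are hypothetical. The threshold is data independent: each output pixel depends only on the matching input pixel, so every pixel could be handled by its own GPU thread. The one-dimensional error diffusion is data dependent: each pixel needs the quantization error left over from the previous one, which is the kind of loop-carried dependency that calls for fine-grained synchronization on a GPU.

```python
from typing import List


def threshold(pixels: List[int], level: int) -> List[int]:
    """Data independent (hypothetical example): output i depends only on input i,
    so the iterations can run in any order, e.g. one GPU thread per pixel."""
    return [255 if p > level else 0 for p in pixels]


def error_diffuse_1d(pixels: List[int], level: int = 128) -> List[int]:
    """Data dependent (hypothetical example): 1-D error diffusion, where each
    pixel's result depends on the error carried over from the previous pixel."""
    out: List[int] = []
    err = 0
    for p in pixels:
        v = p + err
        q = 255 if v >= level else 0
        out.append(q)
        err = v - q  # loop-carried dependency: pixel i+1 must wait for pixel i
    return out


print(threshold([10, 200, 127, 128, 0, 255], 127))
print(error_diffuse_1d([100, 100, 100, 100]))
```

Running it prints `[0, 255, 0, 255, 0, 255]` and `[0, 255, 0, 255]`. Note that in the second case identical input pixels produce different outputs, depending only on what came before them.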
Posted in Developer Resources, Research | Tags: ATI Stream, Image Processing, MATLAB, NVIDIA CUDA, OpenCL, Papers
March 20th, 2010
NVIDIA has released version 3.0 of the CUDA Toolkit, providing developers with tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include:
- Support for the new Fermi architecture, with:
  - Native 64-bit GPU support
  - Multiple Copy Engine support
  - ECC reporting
  - Concurrent Kernel Execution
  - Fermi HW debugging support in cuda-gdb
  - Fermi HW profiling support for CUDA C and OpenCL in Visual Profiler
- C++ Class Inheritance and Template Inheritance support for increased programmer productivity
- A new unified interoperability API for Direct3D and OpenGL, with support for:
  - OpenGL texture interop
  - Direct3D 11 interop support
- CUDA Driver / Runtime Buffer Interoperability, which allows applications using the CUDA Driver API to also use libraries implemented using the CUDA C Runtime, such as CUFFT and CUBLAS.
Posted in Developer Resources | Tags: APIs, Debugging, NVIDIA CUDA, OpenCL, Programming Languages, Tools
March 20th, 2010
Abstract:
Heterogeneous systems, that is, systems with multiple processors tailored for specialized tasks, are challenging programming environments. While domain experts may be able to optimize a high-performance application for a very specific, well-documented system, the same application may perform poorly, or not function at all, on a different system. Developers with less experience in either the application domain or the system architecture may devote significant effort merely to writing a program that functions correctly. We believe that a comprehensive analysis and modeling framework is necessary to ease application development and automate program optimization on heterogeneous platforms.
This paper reports on an empirical evaluation of 25 CUDA applications on four GPUs and three CPUs, leveraging the Ocelot dynamic compiler infrastructure, which can execute and instrument the same CUDA applications on either type of processor. Using a combination of instrumentation and statistical analysis, we record 37 different metrics for each application and use them to derive relationships between program behavior and performance on heterogeneous processors. These relationships are then fed into a modeling framework that attempts to predict the performance of similar classes of applications on different processors. Most significantly, this study identifies several non-intuitive relationships between program characteristics and demonstrates that it is possible to accurately model CUDA kernel performance using only metrics that are available before a kernel is executed.
(Andrew Kerr, Gregory Diamos and Sudhakar Yalamanchili: “Modeling GPU-CPU Workloads and Systems”. Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3), Pittsburgh, PA, Apr. 2010. PDF Link.)
Posted in Developer Resources, Research | Tags: NVIDIA CUDA, Papers, Performance Modeling, Profiling
March 9th, 2010
Yellow Dog Enterprise Linux for CUDA (YDEL for CUDA) is an open-source Linux operating system built for faster, easier, and more reliable GPU computing. YDEL for CUDA, released and supported by Fixstars, goes beyond the basic Linux OS and integrates support for GPUs, NVIDIA CUDA, and GPU development tools.
From the YDEL for CUDA website:
Key benefits of Yellow Dog Enterprise Linux for CUDA:
- YDEL for CUDA users can experience up to a 9% performance improvement in some applications.
- Comprehensive support is offered to paid subscribers, with our skilled team able to assist you with both Linux and CUDA.
- YDEL’s unparalleled integration means everything you need to write and run CUDA applications is included and configured.
- YDEL includes multiple versions of CUDA and can switch between them easily via a setting in a configuration file or an environment variable.
- Never worry about updates affecting your system: Fixstars offers YDEL users greater reliability with our rigorous test procedures that validate GPU computing functionality and performance.
For more information, visit the YDEL for CUDA website.
Posted in Business, Developer Resources | Tags: Linux, NVIDIA CUDA, Open Source
March 9th, 2010
CLyther is a Python tool for OpenCL, still under development, that is similar in spirit to Cython for C. CLyther is a Python language extension intended to make writing OpenCL code as easy as writing Python itself. It currently supports only a subset of the Python language definition but adds many new features for OpenCL.
CLyther exposes both the OpenCL C library and the OpenCL C language to Python. Its features include:
- Fast prototyping of OpenCL code.
- OpenCL kernel function creation using the Python language definition.
- Strong object-oriented programming (OOP) support in OpenCL code.
- Passing functions as arguments to kernel functions.
- Python emulation mode for OpenCL code.
- Fancy indexing of arrays.
- Dynamic compilation at runtime.
Posted in Developer Resources | Tags: Open Source, OpenCL, Programming Languages, Python
March 9th, 2010
Swan is a small tool that aids the reversible conversion of existing CUDA codebases to OpenCL. Its main features are the translation of CUDA kernel source-code to OpenCL, and a common API that abstracts both CUDA and OpenCL runtimes. Swan preserves the convenience of the CUDA <<< grid, block >>> kernel launch syntax by generating C source-code for kernel entry-point functions. Possible uses include:
- Evaluating OpenCL performance of an existing CUDA code
- Maintaining a dual-target OpenCL and CUDA code
- Reducing dependence on NVCC when compiling host code
- Supporting multiple CUDA compute capabilities in a single binary
Swan is developed by the MultiscaleLab, Barcelona, and is available under the GPLv2 license.
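Swan's generated entry-point code is not reproduced here, but the arithmetic behind translating a `<<< grid, block >>>` launch into OpenCL is standard. OpenCL's global work size counts work-items rather than work-groups, so each dimension is the CUDA grid size times the block size, and the local work size equals the block size. The following minimal Python sketch uses hypothetical helper names (`cuda_launch_to_opencl`, `global_id`) that are not part of Swan.

```python
from typing import Tuple

Dim3 = Tuple[int, int, int]  # (x, y, z), mirroring CUDA's dim3


def cuda_launch_to_opencl(grid: Dim3, block: Dim3) -> Tuple[Dim3, Dim3]:
    """Convert CUDA <<<grid, block>>> sizes into OpenCL (global, local) work sizes.

    Hypothetical helper for illustration only; not Swan's API.
    """
    gx, gy, gz = grid
    bx, by, bz = block
    global_size = (gx * bx, gy * by, gz * bz)  # OpenCL counts work-items, not groups
    local_size = (bx, by, bz)                  # one work-group per CUDA thread block
    return global_size, local_size


def global_id(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """CUDA's blockIdx.x * blockDim.x + threadIdx.x, which matches OpenCL's
    get_global_id(0) when the global work offset is zero."""
    return block_idx * block_dim + thread_idx


print(cuda_launch_to_opencl((64, 1, 1), (256, 1, 1)))
print(global_id(2, 256, 5))
```

For example, a CUDA launch of 64 blocks of 256 threads maps to an OpenCL global size of 16384 and a local size of 256 in the first dimension, so the script prints `((16384, 1, 1), (256, 1, 1))` and then `517`.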
Posted in Developer Resources | Tags: NVIDIA CUDA, OpenCL, Tools
March 1st, 2010
GMAC (Global Memory for ACcelerators) is a user-level library that implements an Asymmetric Distributed Shared Memory (ADSM) model for use by CUDA programs. The ADSM model allows CPU code to access data hosted in accelerator (GPU) memory. In this model, a single pointer is used for data structures accessed by both the CPU and the GPU, and the library handles data coherency transparently. Moreover, data allocated with GMAC can be accessed by all host threads of the program, which makes your code simpler and cleaner. GMAC currently supports programs written in CUDA; OpenCL support is planned.
A paper describing the Asymmetric Distributed Shared Memory model and its implementation in GMAC has been accepted at the ASPLOS XV conference. GMAC is being developed by the Operating System Group at the Universitat Politècnica de Catalunya and the IMPACT Research Group at the University of Illinois. Precompiled binary packages, the source code, documentation and examples are available at the project website.
(Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro and Wen-mei Hwu, “An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems”, accepted in: Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2010), March 2010.)
Posted in Developer Resources, Research | Tags: Libraries, NVIDIA CUDA, Open Source, Papers, Tools
February 21st, 2010
AccelerEyes has recently launched a number of resources to assist the GPU computing community in general and MATLAB users in particular:
- In collaboration with Dr. Torben Larsen at Aalborg University in Denmark, AccelerEyes has launched Torben’s Corner, which offers a wide variety of tips and tricks for application development and performance benchmarking of GPUs.
- The entire team at AccelerEyes is contributing to a weekly blog on GPU computing with MATLAB. Some recent posts include:
  - Using Parallel For Loops (parfor) with MATLAB and Jacket
  - Lazy Execution in MATLAB GPU computing
Join the AccelerEyes GPU computing blog for weekly insights into maximizing productivity with GPUs.
Posted in Developer Resources | Tags: MATLAB
February 21st, 2010
Graphic Remedy is proud to announce the release of gDEBugger Version 5.5 for Windows, Linux, Mac OS X and iPhone.
This version introduces a powerful integration of AMD GPU performance counters, displaying AMD graphics hardware and driver performance counters in gDEBugger’s Performance Graph and Performance Dashboard views and allowing developers to optimize their applications for AMD (ATI) graphics hardware.
AMD performance counters are available on Windows when using an ATI Radeon (TM) HD 2000 series or newer GPU with Catalyst (TM) 9.12 or newer.
This version also includes a large number of bug fixes and stability improvements.
Posted in Developer Resources | Tags: AMD, Debugging, gDEBugger, Tools
February 14th, 2010
OpenNL (Open Numerical Library) is a library for solving sparse linear systems, designed especially for the computer graphics community. The goal of OpenNL is to be as small as possible while offering the subset of functionality required by this application field. OpenNL's Makefiles can generate a single .c file and a single .h file, which makes the library very easy to integrate into other projects. The distribution includes an implementation of the Least Squares Conformal Maps parameterization method. The new version 3.0 of OpenNL includes support for CUDA (with the Concurrent Number Cruncher and CUSP ELL formats).
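For readers unfamiliar with the ELL (ELLPACK) sparse format mentioned above, the Python sketch below illustrates the idea. It is not OpenNL's or CUSP's code; the names `to_ell` and `ell_spmv` and the padding value `PAD` are assumptions for illustration. Every row is padded to the same number of slots K (the largest number of nonzeros in any row), and the slots are stored slot-major, so when a GPU assigns one thread per row, neighboring threads read neighboring memory locations.

```python
from typing import List, Tuple

PAD = -1  # column index marking an unused (padding) slot in this sketch


def to_ell(dense: List[List[float]]) -> Tuple[int, int, List[int], List[float]]:
    """Convert a dense matrix to ELL: K = max nonzeros per row, slot-major layout."""
    n = len(dense)
    rows = [[(j, v) for j, v in enumerate(r) if v != 0] for r in dense]
    k = max((len(r) for r in rows), default=0)
    indices = [PAD] * (n * k)
    data = [0.0] * (n * k)
    for i, entries in enumerate(rows):
        for s, (j, v) in enumerate(entries):
            indices[s * n + i] = j  # slot s of row i is stored at s * n + i
            data[s * n + i] = v
    return n, k, indices, data


def ell_spmv(n: int, k: int, indices: List[int], data: List[float],
             x: List[float]) -> List[float]:
    """Compute y = A x; a GPU version would give each row i its own thread."""
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for s in range(k):
            j = indices[s * n + i]
            if j != PAD:  # skip padding slots
                acc += data[s * n + i] * x[j]
        y[i] = acc
    return y


A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
n, k, idx, vals = to_ell(A)
print(ell_spmv(n, k, idx, vals, [1, 2, 3]))
```

This prints `[7.0, 6.0, 17.0]`. The design trades padding for regular memory access: ELL works well when row lengths are similar, and wastes space when a few rows are much longer than the rest.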
Posted in Developer Resources, Research | Tags: Libraries, Numerical Algorithms, NVIDIA CUDA, Open Source, Sparse Linear Systems