MATLAB Adds GPU Support

October 13th, 2010

Michael Feldman of HPCWire writes:

MATLAB users with a taste for GPU computing now have a perfect reason to move up to the latest version. Release R2010b adds native GPGPU support that allows users to harness NVIDIA graphics processors for engineering and scientific computing. The new capability is provided within the Parallel Computing Toolbox and Distributed Computing Server.

Full details of MATLAB Release R2010b are available on the MathWorks site. Information on other numerical packages accelerated using NVIDIA CUDA is available on NVIDIA’s site.

[Editor's Note: as pointed out in the comments by John Melonakos (from AccelerEyes), it may be worth checking out how MATLAB 2010b GPU support currently compares to AccelerEyes Jacket.]

Introducing the OpenCL™ Programming Webinar Series

October 12th, 2010

This webinar series is designed to help advance your OpenCL programming knowledge. Experts from AMD will cover both beginning and advanced topics starting with the basics of parallel and heterogeneous computing and an introduction to OpenCL, then progressing to more advanced topics such as performance optimization techniques and real world case studies.

This webinar describes how heterogeneous computing fits into the parallel computing paradigm, what problems it solves and what opportunities it presents.

A Fast GEMM Implementation on a Cypress GPU

October 12th, 2010


We present benchmark results of optimized dense matrix multiplication kernels for a Cypress GPU. We write general matrix multiply (GEMM) kernels for single (SP), double (DP) and double-double (DDP) precision. Our SGEMM and DGEMM kernels achieve 73% and 87% of the theoretical performance of the GPU, respectively. To our knowledge, our SGEMM and DGEMM kernels are currently the fastest on a single GPU chip. Furthermore, our matrix multiply kernel in DDP achieves 31 Gflop/s, which is more than 200 times faster than DDP on a single core of a recent CPU (with mpack version 0.6.5). We describe our GEMM kernels with a main focus on the SGEMM implementation, since all GEMM kernels share common programming and optimization techniques. While the conventional wisdom of GPU programming recommends heavy use of shared memory on GPUs, we show that the texture cache is very effective on the Cypress architecture.
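For readers unfamiliar with the terms, the following is a minimal reference sketch of what a GEMM computes and how a Gflop/s figure is conventionally derived from a timing. It is plain Python for illustration only and is not the paper's GPU kernel; the function names `gemm` and `gemm_gflops` are hypothetical, and the 2·m·n·k operation count is the usual convention rather than something stated in the abstract.

```python
def gemm(alpha, A, B, beta, C):
    """Reference GEMM: return alpha * (A x B) + beta * C.

    A is m x k, B is k x n, C is m x n, all stored as lists of rows.
    """
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    if len(C) != m or any(len(row) != n for row in C):
        raise ValueError("C must be m x n")
    return [
        [
            alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def gemm_gflops(m, n, k, seconds):
    """Rate in Gflop/s, counting one multiply and one add per inner-loop step."""
    return 2.0 * m * n * k / seconds / 1e9


# Small worked example (illustrative values only).
product = gemm(1.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.0, [[0, 0], [0, 0]])
rate = gemm_gflops(1000, 1000, 1000, 1.0)
```

With this convention, a 1000 x 1000 x 1000 multiply that takes one second corresponds to 2 Gflop/s, which gives a sense of scale for the figures quoted above.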

(N. Nakasato: “A Fast GEMM Implementation on a Cypress GPU”, 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10) November 2010. A sample program is available at

HOOMD-blue 0.9.1 release

October 12th, 2010

HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and crystallization of metals.

HOOMD-blue 0.9.1 adds many new features. Highlights include:

  • 10 to 50 percent faster performance over 0.9.0
  • DPD (Dissipative Particle Dynamics) capability
  • EAM (Embedded Atom Method) capability
  • Removed limitation on number of exclusions
  • Support for compute 2.1 devices (such as the GTX 460)
  • Support for CUDA 3.1
  • and more

HOOMD-blue 0.9.1 is available for download under an open source license. Check out the quick start tutorial to get started, or check out the full documentation to see everything it can do.

Thrust v1.3 release

October 7th, 2010

Thrust v1.3, an open-source template library for CUDA applications, has been released. Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing.

Version 1.3 adds several new features, including:

  • a state-of-the-art sorting implementation, recently featured on Slashdot.
  • performance improvements to stream compaction and reduction
  • robust error reporting and failure detection
  • support for CUDA 3.2 and gf104-based GPUs
  • search algorithms
  • and more!

Get started with Thrust today! First download Thrust v1.3 and then follow the online quick-start guide. Refer to the online documentation for a complete list of features. Many concrete examples and a set of introductory slides are also available.
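Thrust itself is a C++ template library, so the sketch below is not Thrust code. It only illustrates, with the Python standard library, what the primitives named in the feature list compute: stream compaction, reduction, sorting, and searching. The corresponding Thrust algorithms (`thrust::copy_if`, `thrust::reduce`, `thrust::sort`, `thrust::lower_bound`) are mentioned in comments for orientation; the sample data is made up.

```python
from bisect import bisect_left
from functools import reduce
import operator

values = [7, -2, 5, 0, -9, 3]

# Stream compaction: keep only the elements that satisfy a predicate,
# preserving their order (the role thrust::copy_if plays on the GPU).
positives = [v for v in values if v > 0]

# Reduction: combine all elements with an associative operator
# (the role of thrust::reduce).
total = reduce(operator.add, values, 0)

# Sorting (thrust::sort), then a binary search on the sorted range
# (thrust::lower_bound returns the first position not less than the key).
ordered = sorted(values)
position_of_three = bisect_left(ordered, 3)
```

On a GPU these operations run in parallel over large arrays; the sequential version here is only meant to pin down their semantics.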

Insilicos Awarded NIH Grant Applying GPU Computing to Human Disease

October 7th, 2010

Seattle, WA, 4 October, 2010 – Insilicos today announced the company has received a grant applying GPU computing to the role of epistasis in human disease. Funding comes from the National Human Genome Research Institute, part of the National Institutes of Health.

Epistasis refers to the interaction of two or more genes and is thought to play a major role in the genetics of susceptibility to disease. One way to detect epistasis is through computationally-intensive statistical algorithms, such as those employed in data mining. Insilicos plans to exploit the concurrency inherent in these algorithms by using commodity graphics processors.
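To see where the concurrency comes from, consider a minimal sketch of an exhaustive pairwise scan. This is an assumption for illustration, not Insilicos's method: it scores every pair of hypothetical genetic markers with a chi-square statistic on joint genotype versus case/control status, and the names `pair_score` and `scan_pairs` as well as the toy data are invented. Each pair is scored independently of every other pair, which is what makes the workload a natural fit for many parallel threads; Python threads here only stand in for that independence and are not expected to give a speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pair_score(marker_a, marker_b, status):
    """Chi-square statistic of the joint genotype (a, b) versus 0/1 status."""
    n = len(status)
    cases = sum(status)
    joint = list(zip(marker_a, marker_b))
    totals = Counter(joint)
    case_counts = Counter(g for g, s in zip(joint, status) if s == 1)
    stat = 0.0
    for genotype, total in totals.items():
        cells = (
            (case_counts[genotype], total * cases / n),
            (total - case_counts[genotype], total * (n - cases) / n),
        )
        for observed, expected in cells:
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def scan_pairs(markers, status, workers=4):
    """Score every unordered marker pair; tasks share read-only input."""
    pairs = list(combinations(range(len(markers)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(
            pool.map(lambda p: pair_score(markers[p[0]], markers[p[1]], status), pairs)
        )
    return dict(zip(pairs, scores))


# Toy data: status depends on markers 0 and 1 jointly (an XOR pattern),
# while no single marker is informative on its own.
markers = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
status = [0, 1, 1, 0, 0, 1, 1, 0]
scores = scan_pairs(markers, status)
best_pair = max(scores, key=scores.get)
```

The number of pairs grows quadratically with the number of markers, so the scan becomes expensive quickly even though each individual test is cheap.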

Free CUDA Beginners Workshop in Cologne

October 7th, 2010

In cooperation with NVIDIA, empulse GmbH is hosting a free CUDA workshop on October 26th in Cologne. An introduction to CUDA programming will be presented, including real examples and scenarios. The main objective of the workshop is to provide an understanding of the basic programming paradigms for the creation of CUDA applications, as well as the underlying hardware architecture.

The seminar is geared towards beginners in CUDA programming. A solid, basic knowledge of C/C++ programming is required. The workshop will be presented in English. Details are available at the empulse website.

A GPGPU transparent virtualization component for high performance computing clouds

October 4th, 2010


The promise of exascale computing power is reinforced by many-core technology, which involves general-purpose CPUs and specialized computing devices such as FPGAs, DSPs and GPUs. GPUs in particular, due also to their wide market footprint, currently achieve one of the best core/cost ratios in that category. Relying on APIs provided by GPU vendors, the use of GPUs as general-purpose massively parallel computing devices (GPGPUs) is now routine in the scientific community. The increasing number of CPU cores per chip has driven the development and spread of cloud computing, which leverages consolidated technologies such as, but not limited to, grid computing and virtualization. In recent years the use of grid computing for demanding high-performance applications in e-science has become common. The elastic computing power and storage provided by a cloud infrastructure may be attractive, but it is still limited by poor communication performance and a lack of support for using GPGPUs within a virtual machine instance. The GPU Virtualization Service (gVirtuS) presented in this work tries to fill the gap between in-house hosted computing clusters equipped with GPGPU devices and pay-for-use high-performance virtual clusters deployed via public or private computing clouds. gVirtuS allows an instanced virtual machine to access GPGPUs in a transparent way, with only slightly more overhead than a real machine/GPGPU setup. gVirtuS is hypervisor independent and, even though it currently virtualizes NVIDIA CUDA-based GPUs, it is not limited to a specific brand's technology. The performance of the components of gVirtuS is assessed through a suite of tests in different deployment scenarios, such as providing GPGPU power to cloud computing based HPC clusters and sharing remotely hosted GPGPUs among HPC nodes.

(Giunta G., R. Montella, G. Agrillo, and G. Coviello: “A GPGPU transparent virtualization component for high performance computing clouds”. In P. D’Ambra, M. Guarracino, and D. Talia, editors, Euro-Par 2010 – Parallel Processing, volume 6271 of Lecture Notes in Computer Science, chapter 37, pages 379-391. Springer Berlin / Heidelberg, 2010. DOI. Link to project webpage with source code.)

CfP: New Frontiers in High-performance and Hardware-aware Computing (HipHaC’11)

September 30th, 2010

The Second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC’11) is to be held in conjunction with the 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA-17), colocated with the 16th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2011), February 13, 2011, San Antonio, Texas, USA.

This workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Topics of interest for workshop submissions include (but are not limited to): Read the rest of this entry »

PyCULA: Python Bindings for CULA GPGPU LAPACK

September 30th, 2010

PyCULA, by Louis Theran and Garrett Wright of Temple University, is a module providing transparent PyCUDA- and ctypes-based Python bindings for CULAtools LAPACK. It provides support for mixing PyCUDA-style kernel code with CULA device functions and also has a complete set of ctypes wrappers for CULA.

Key Features Include:

  • Reduce Memory Leaks by using Automatic Memory Management (via PyCUDA)
  • Utilize both simple NumPy-style and GPUArray manual device-style interfaces.
  • Mix LAPACK via CULA with your Custom Kernels.
  • Combine seamlessly with handy Python modules like SQL, gzip, SciPy, R, etc.
  • Develop, Debug, Optimize, and Get Help right at the interactive command line.

The PyCULA0.9a4 alpha release is available. PyCULA was developed as part of the ASU/Temple Zeolite Project, which is supported by CDI-I grant DMR 0835586 to Igor Rivin and M. M. J. Treacy.
