OpenCL Studio 1.0 has been released

March 21st, 2011

OpenCL Studio combines OpenCL and OpenGL into a single integrated development environment for high performance computing. The feature-rich editor, interactive scripting language and extensible plug-in architecture support the rapid development of complex parallel algorithms and accompanying visualization. The first production version of OpenCL Studio, including instructional videos and demo applications, is available at

Jacket v1.7 for Faster MATLAB® Code

March 21st, 2011

AccelerEyes has released version 1.7 of Jacket for GPU computing with MATLAB®. Version 1.7 delivers even more speed to MATLAB with a new Sparse Linear Algebra Library, a new Signal Processing Library, a big boost to convolution functions, and more.

Jacket is the premier GPU software plugin for MATLAB. It enables rapid prototyping and problem solving across a range of government, manufacturing, energy, media, biomedical, financial, and scientific research applications. Jacket accelerates common arithmetic and linear algebra functionality using the complete line of CUDA-capable GPUs from NVIDIA, including top-of-the-line Tesla GPUs as well as Quadro visualization GPUs and GeForce gaming GPUs.

Some of the new features available with Jacket 1.7 include:

Read the rest of this entry »

Exact string matching algorithms in CUDA

March 21st, 2011

Exact string matching algorithms are heavily used in applications such as antivirus engines, DNA sequencing, and text editors. This project provides CUDA implementations of the naive, Horspool, and Quick Search algorithms, including a performance comparison against CPU versions:
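For readers unfamiliar with the algorithms named above, here is a minimal sequential sketch of Horspool matching in Python. It is not taken from the project and is not its CUDA code; the function name `horspool_search` is made up for this illustration. It only shows the bad-character shift idea that a GPU version would apply in parallel over chunks of the text.

```python
def horspool_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Illustrative CPU-only sketch; the name and interface are assumptions,
    not part of the project announced above.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Bad-character table: for each character in pattern[:-1], the distance
    # from its last occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the entry for the text character aligned with the
        # pattern's last position; unknown characters shift by m.
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))
```

The naive algorithm is the same loop with a fixed shift of 1. Quick Search differs in that it indexes the shift table with the character just past the current window.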

CUDA 4.0 Release Aims to Make Parallel Programming Easier

March 1st, 2011

Today NVIDIA announced the upcoming 4.0 release of CUDA.  While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn’t mean there aren’t a lot of new features.  With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with new features including easier multi-GPU programming, a unified virtual memory address space, the powerful Thrust C++ template library, and automatic performance analysis in the Visual Profiler tool.  Full details follow in the quoted press release below.

Read the rest of this entry »

PFAC: A library for string matching on NVIDIA GPUs

February 28th, 2011

PFAC, the Parallel Failureless Aho-Corasick algorithm, is a variant of the well-known Aho-Corasick (AC) algorithm with all failure transitions removed. The purpose of PFAC is to match all longest patterns in a given input stream against patterns pre-defined by users. The data-parallel nature of PFAC makes it perform well on GPUs, especially NVIDIA Fermi-based GPUs. The PFAC library, implemented in CUDA, provides a C-level API that is easy to use. Users need not know CUDA programming. The user guide provides simple examples to make it easy to use PFAC for content searches or virus detection on the GPU.
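The failureless idea can be sketched in a few lines of sequential Python. This is an illustration only and does not use the PFAC library or its API; `build_trie` and `pfac_match` are hypothetical names. It assumes non-empty patterns and assumes that each start position is handled independently (on a GPU, one thread per input position): a walk through the pattern trie simply stops when no transition exists, instead of following a failure link.

```python
def build_trie(patterns):
    """Build a plain trie (Aho-Corasick goto function without failure links)."""
    goto = [{}]      # goto[state][char] -> next state
    final = [None]   # final[state] -> index of the pattern ending at that state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                goto.append({})
                final.append(None)
                nxt = len(goto) - 1
                goto[state][ch] = nxt
            state = nxt
        final[state] = idx
    return goto, final


def pfac_match(text, patterns):
    """For each start position, return the index of the longest pattern
    beginning there, or None if no pattern starts at that position."""
    goto, final = build_trie(patterns)
    result = [None] * len(text)
    for start in range(len(text)):       # one GPU thread per start position
        state = 0
        for i in range(start, len(text)):
            state = goto[state].get(text[i])
            if state is None:
                break                    # no failure transition: just stop
            if final[state] is not None:
                result[start] = final[state]   # later hits are longer matches
    return result


if __name__ == "__main__":
    print(pfac_match("abcd", ["ab", "abc", "bc"]))
```

Because every start position is independent, the outer loop maps directly onto parallel threads, which is the property the announcement refers to as data-parallel.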

The PFAC library does not use multiple GPUs intrinsically, but users can combine the PFAC library with OpenMP or Pthreads to perform string matching on multiple GPUs. The PFAC release includes OpenMP and Pthreads examples. Download and further information:

New GPGPU Meetup Groups: NYC, Boston, Chicago, Tokyo and More

February 28th, 2011

Following in the footsteps of the highly successful GPU Users meetup groups in Brisbane, Sydney, Perth and Melbourne, Australia, new GPU meetup groups are popping up around the USA and other countries. Professional “meetup” groups have now formed in New York City, Silicon Valley, Boston, Chicago, Albuquerque and Tokyo, bringing practitioners together to discuss the applications, methods, and technical challenges of using GPUs for algorithm acceleration. The events are free to attend. More information can be found at

Check out our User Groups page for more.

HIPHAC’11 Proceedings Available

February 20th, 2011

Proceedings from the 2nd International Workshop on High Performance and Hardware-Aware Computing (HIPHAC 2011) are now available from KIT Scientific Publishing. Individual copies can be ordered here, and the electronic proceedings are available free of charge.

CfP: The Second International Workshop on Frontier of GPU Computing (FGC 2011)

February 20th, 2011

FGC 2011, the Second International Workshop on Frontier of GPU Computing, will be held in conjunction with CSE 2011 in Dalian, China, 24–26 August 2011. More information can be found at

Call for Papers: CACHES-2011

February 13th, 2011

The First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (co-located with ICS, June 4, 2011) is intended to provide evaluations of the characteristics of computational kernels and applications, and how different software stacks impact them, to guide future accelerator-based HPC system designs.

We solicit papers on all aspects of HPC application studies, especially those that involve accelerators such as GPUs, FPGAs, etc. The topics include (but are not limited to):

  • Categorizing and characterizing HPC applications and kernels with respect to patterns in computation structure, communication, cache accesses, memory, I/O, and file accesses.
  • Evaluating the importance of individual kernels within an entire application.
  • Modeling for applications running on accelerator-based heterogeneous HPC systems.
  • Implications of workload characterization for heterogeneous system design.
  • Benchmarking of applications, kernels or software stacks and tools supporting applications.

The call for papers and more details about the workshop may be found on the website.

GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method

February 13th, 2011


The paper discusses a fast implementation of the conjugate gradient iterative method with an E-field multilevel preconditioner, applied to solving real, symmetric, sparse systems obtained with the vector finite element method. To accelerate computations, a graphics processing unit (GPU) was used, and a significant speed-up (2.61-fold) was achieved compared to a central processing unit (CPU) based approach. These results indicate that the performance of electromagnetic simulations can be significantly improved, thereby enabling full-wave optimization of microwave components in a more manageable time.

(A. Dziekonski, A. Lamecki and M. Mrozowski: “GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method”, IEEE Microwave and Wireless Components Letters 21(1) pp.1-3, Jan. 2011. [DOI])
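As background for the abstract above, the preconditioned conjugate gradient iteration can be sketched in plain Python. This is not the authors' implementation: a simple Jacobi (diagonal) preconditioner stands in for the E-field multilevel preconditioner, the matrix is stored densely rather than sparsely, and it is assumed to be symmetric positive definite. The function name `pcg` and its parameters are chosen for this illustration only.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with Jacobi-preconditioned conjugate gradients.

    Assumes A is a symmetric positive definite matrix given as a list of rows.
    The Jacobi preconditioner is a stand-in, not the paper's multilevel one.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # M^{-1} for Jacobi

    x = [0.0] * n
    r = list(b)                                    # residual b - A x, with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The matrix-vector product and the vector updates dominate the cost of each iteration, and they are the natural candidates for offloading to a GPU.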
