PFAC: A library for string matching on NVIDIA GPUs

February 28th, 2011

PFAC, the Parallel Failureless Aho-Corasick algorithm, is a variant of the well-known Aho-Corasick (AC) algorithm with all failure transitions removed. The purpose of PFAC is to find all longest matches in a given input stream against a set of user-defined patterns. The data-parallel nature of PFAC makes it perform well on GPUs, especially NVIDIA Fermi-based GPUs. The PFAC library, implemented in CUDA, provides a C-level API that is easy to use; users need not know CUDA programming. The user guide provides a simple example that shows how to use PFAC for content searches or virus detection on the GPU.
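To make the idea of removing failure transitions concrete, here is a minimal CPU-side sketch in Python. It does not use the PFAC library or its API; `build_trie` and `pfac_match` are hypothetical names chosen for illustration. It assumes that each starting position in the input is matched independently (the work a single GPU thread would presumably do), walking a goto-only trie until no transition exists and remembering the longest pattern seen.

```python
def build_trie(patterns):
    """Build a goto-only trie (no failure links) from the patterns."""
    goto = [{}]        # goto[state][char] -> next state
    match = [None]     # match[state] -> index of the pattern ending here
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                match.append(None)
                goto[state][ch] = nxt
            state = nxt
        match[state] = idx
    return goto, match


def pfac_match(text, patterns):
    """For every start position, return the index of the longest pattern
    that begins there, or None if no pattern begins there."""
    goto, match = build_trie(patterns)
    result = [None] * len(text)
    # Assumption: on a GPU, each iteration of this loop would be one thread.
    for start in range(len(text)):
        state = 0
        for pos in range(start, len(text)):
            state = goto[state].get(text[pos])
            if state is None:
                break                          # no transition: stop, no fallback
            if match[state] is not None:
                result[start] = match[state]   # a longer match overwrites
    return result


patterns = ["ab", "abc", "bc", "c"]
print(pfac_match("xabcab", patterns))
```

Because no position depends on the result of another, the outer loop has no shared state, which is what makes the approach a natural fit for data-parallel hardware.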

The PFAC library does not use multiple GPUs itself, but users can combine it with OpenMP or PThreads to perform string matching on multiple GPUs. The PFAC release includes OpenMP and PThreads examples. Download and further information:

New GPGPU meetup Groups: NYC, Boston, Chicago, Tokyo and More

February 28th, 2011

Following in the footsteps of the highly successful GPU Users meetup groups in Brisbane, Sydney, Perth and Melbourne, Australia, new GPU meetup groups are popping up around the USA and in other countries. Professional “meetup” groups have now formed in New York City, Silicon Valley, Boston, Chicago, Albuquerque and Tokyo, bringing practitioners together to discuss the applications, methods, and technical challenges of using GPUs for algorithm acceleration. The events are free to attend. More information can be found at

Check out our User Groups page for more.

HIPHAC’11 Proceedings Available

February 20th, 2011

Proceedings from the 2nd International Workshop on High Performance and Hardware-Aware Computing (HIPHAC 2011) are now available from KIT Scientific Publishing. Individual copies can be ordered here, and the electronic proceedings are available free of charge.

CfP: The Second International Workshop on Frontier of GPU Computing (FGC 2011)

February 20th, 2011

FGC 2011, the Second International Workshop on Frontier of GPU Computing, will be held in conjunction with CSE 2011 in Dalian, China, 24–26 August 2011. More information can be found at

Call for Papers: CACHES-2011

February 13th, 2011

The First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (co-located with ICS, June 4, 2011) is intended to provide evaluations of the characteristics of computational kernels and applications, and how different software stacks impact them, to guide future accelerator-based HPC system designs.

We solicit papers on all aspects of HPC application studies, especially those that involve accelerators such as GPUs, FPGAs, etc. The topics include (but are not limited to):

  • Categorizing and characterizing HPC applications and kernels with respect to patterns in computation structure, communication, cache accesses, memory, I/O, and file accesses.
  • Evaluating the importance of individual kernels within an entire application.
  • Modeling for applications running on accelerator-based heterogeneous HPC systems.
  • Implications of workload characterization for heterogeneous system design.
  • Benchmarking of applications, kernels or software stacks and tools supporting applications.

The call for papers and more details about the workshop may be found on the website.

GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method

February 13th, 2011


The paper discusses a fast implementation of the conjugate gradient iterative method with an E-field multilevel preconditioner, applied to solving the real, symmetric, sparse systems obtained with the vector finite element method. To accelerate computations, a graphics processing unit (GPU) was used, and a significant speed-up (2.61-fold) was achieved compared to a central processing unit (CPU) based approach. These results indicate that the performance of electromagnetic simulations can be significantly improved, thereby enabling full-wave optimization of microwave components in more manageable time.
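For readers unfamiliar with the method, the following is a minimal sketch of a preconditioned conjugate gradient iteration in plain Python. It is not the authors' implementation: a simple Jacobi (diagonal) preconditioner stands in for the paper's E-field multilevel preconditioner, the matrix is stored densely for brevity, and the sketch assumes the system is symmetric positive definite. The function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.
    Uses a Jacobi preconditioner (an assumption of this sketch,
    standing in for the multilevel preconditioner of the paper)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg(A, b)
print(x, iterations)
```

Each iteration is dominated by one matrix-vector product plus a few vector updates and dot products, which are the kinds of operations that map well to a GPU.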

(A. Dziekonski, A. Lamecki and M. Mrozowski: “GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method”, IEEE Microwave and Wireless Components Letters 21(1) pp.1-3, Jan. 2011. [DOI])

OpenCLcc: Offline OpenCL Compilation

February 10th, 2011

A simple tool for offline compilation of OpenCL kernel code, called “OpenCLcc”, is now available at

OpenCLcc takes a text file with the OpenCL kernel code as input and calls the OpenCL run-time to compile it, echoing errors to the console.

A GPU-accelerated bioinformatics application for large-scale protein networks

February 10th, 2011


Proteins, nucleic acids, and small molecules form a dense network of molecular interactions in a cell. The architecture of molecular networks can reveal important principles of cellular organization and function, similarly to the way that protein structure tells us about the function and organization of a protein. Protein complexes are groups of proteins that interact with each other at the same time and place, forming a single multimolecular machine. Functional modules, in contrast, consist of proteins that participate in a particular cellular process while binding each other at a different time and place.

A protein-protein interaction network is represented as a graph in which proteins are nodes and interactions between proteins are edges. Protein complexes and functional modules can be identified as highly interconnected subgraphs, and computational methods are now indispensable for detecting them in protein interaction data. In addition, high-throughput screening techniques such as yeast two-hybrid screening enable the identification of detailed protein-protein interaction maps in multiple species. As the interaction datasets grow, the scale of the interconnected protein networks increases exponentially, and the growing complexity makes the networks computationally challenging to analyze.
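The graph representation described above can be sketched in a few lines of Python. This is only an illustration, not the application's code: the protein names and the helper functions `build_network` and `density` are hypothetical, and edge density is used here as one simple way to quantify how "highly interconnected" a candidate subgraph is.

```python
from itertools import combinations


def build_network(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = {}
    for a, b in interactions:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def density(adj, proteins):
    """Fraction of possible protein pairs in the group that interact."""
    proteins = list(proteins)
    if len(proteins) < 2:
        return 0.0
    possible = len(proteins) * (len(proteins) - 1) // 2
    present = sum(1 for u, v in combinations(proteins, 2)
                  if v in adj.get(u, ()))
    return present / possible


# Hypothetical interaction data: P1, P2 and P3 all bind each other.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
network = build_network(interactions)
print(density(network, ["P1", "P2", "P3"]))
print(density(network, ["P1", "P2", "P3", "P4"]))
```

A fully connected group has density 1.0, so groups with density close to 1.0 are candidates for complexes or modules; checking all groups this way quickly becomes expensive as the network grows.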

GMAC 0.0.20 Released

February 10th, 2011

GMAC is a user-level library that implements an Asymmetric Distributed Shared Memory (ADSM) model for use by CUDA programs. An ADSM model builds a global memory space that allows CPU code to transparently access data hosted in accelerators’ (GPUs’) memories. Moreover, the coherency of the data is handled automatically by the library. This removes the need for manual memory transfers (cudaMemcpy) between host and GPU memories. Furthermore, GMAC assigns a different “virtual GPU” to each host thread, and the virtual GPUs are evenly mapped to physical GPUs. This is especially useful for multi-GPU programs, since each host thread can access the memory of all GPUs and GPU-to-GPU transfers can be performed with plain memcpy calls.

PEER 1 Hosting: Large-Scale Hosted NVIDIA GPU Cloud

February 10th, 2011

Press release (submitted to very late…):

LOS ANGELES, CA – July 26, 2010 – PEER 1 Hosting (TSX:PIX), a global online IT hosting provider, today announced the availability of the industry’s first large-scale, hosted graphics processing unit (GPU) Cloud at the 37th Annual Siggraph International Conference.

The system runs the RealityServer® 3D web application service platform, developed by mental images, a wholly owned subsidiary of NVIDIA. The RealityServer platform is a powerful combination of NVIDIA Tesla GPUs and 3D web services software. It delivers interactive and photorealistic applications over the web using the iray® renderer, which enables animators, product designers, architects and consumers to easily visualize 3D scenes with remarkable realism.
