Exact string matching algorithms are widely used in applications such as antivirus engines, DNA sequencing, and text editors. This project provides CUDA implementations of the naive, Horspool, and Quick Search algorithms, including a performance comparison against CPU versions: http://code.google.com/p/exactstrmatchgpu.
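For readers unfamiliar with the Horspool algorithm mentioned above, here is a minimal sequential sketch in Python. It is not taken from the project (which is written in CUDA); the function name `horspool_search` is an illustrative choice. The idea is to precompute, for each character of the pattern except the last, how far the search window can safely jump, and to use the text character aligned with the end of the window to pick the jump.

```python
def horspool_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Shift table: distance from the last occurrence of each character
    # (ignoring the final pattern character) to the end of the pattern.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the whole window against the pattern.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Jump according to the text character under the window's last
        # position; characters not in the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return matches


print(horspool_search("abracadabra", "abra"))
```

A GPU version would typically split the text into chunks and let each thread run this loop on its own chunk, with a small overlap between chunks so that matches crossing a boundary are not lost; how the linked project does this is not described here.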
Today NVIDIA announced the upcoming 4.0 release of CUDA. While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn’t mean there aren’t a lot of new features. With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with new features including easier multi-GPU programming, a unified virtual memory address space, the powerful Thrust C++ template library, and automatic performance analysis in the Visual Profiler tool. Full details follow in the quoted press release below.
PFAC, the Parallel Failureless Aho-Corasick algorithm, is a variant of the well-known Aho-Corasick (AC) algorithm with all failure transitions removed. Given a set of user-defined patterns, PFAC reports the longest pattern that matches at each position of an input stream. The data-parallel nature of PFAC makes it perform well on GPUs, especially NVIDIA Fermi-based GPUs. The PFAC library, implemented in CUDA, provides a C-level API that is easy to use, so users need not know CUDA programming. The user guide provides simple examples that show how to use PFAC for content searches or virus detection on the GPU.
The PFAC library does not use multiple GPUs by itself, but users can combine it with OpenMP or PThreads to perform string matching on multiple GPUs. The PFAC release includes OpenMP and PThreads examples. Download and further information: http://code.google.com/p/pfac/
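The failureless idea can be illustrated with a short sequential sketch in Python. This is not the PFAC library's API or its CUDA code; the names `build_trie` and `pfac_match` are hypothetical. Each iteration of the outer loop plays the role of one GPU thread: it starts at its own input position, follows trie transitions, and simply stops when no transition exists, which is why no failure links are needed.

```python
def build_trie(patterns):
    """Build a plain trie (goto function only, no failure links)."""
    goto = [{}]      # goto[state] maps a character to the next state
    output = [-1]    # output[state] is the id of the pattern ending there
    for pid, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                output.append(-1)
            state = nxt
        output[state] = pid
    return goto, output


def pfac_match(text, patterns):
    """For each start position, return the id of the longest pattern
    beginning there, or -1 if none does."""
    goto, output = build_trie(patterns)
    result = [-1] * len(text)
    for start in range(len(text)):        # one "thread" per position
        state = 0
        for pos in range(start, len(text)):
            state = goto[state].get(text[pos])
            if state is None:             # no transition: thread exits
                break
            if output[state] != -1:       # longer matches overwrite shorter
                result[start] = output[state]
    return result


print(pfac_match("xabcd", ["ab", "abc", "bc"]))
```

Because every position is processed independently, the outer loop maps naturally onto thousands of GPU threads, and most threads terminate after only a few characters.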
Following in the footsteps of the highly successful GPU Users meetup groups in Brisbane, Sydney, Perth and Melbourne, Australia, new GPU meetup groups are popping up around the USA and other countries. Professional “meetup” groups have now formed in New York City, Silicon Valley, Boston, Chicago, Albuquerque and Tokyo, bringing practitioners together to discuss the applications, methods, and technical challenges of using GPUs for algorithm acceleration. The events are free to attend. More information can be found at http://gpu.meetup.com/.
Check out our User Groups page for more.
Proceedings from the 2nd International Workshop on High Performance and Hardware-Aware Computing (HIPHAC 2011) are now available from KIT Scientific Publishing. Individual copies can be ordered here, and the electronic proceedings are available free of charge.
FGC 2011, the Second International Workshop on Frontier of GPU Computing, will be held in conjunction with CSE 2011 in Dalian, China, 24-26 August 2011. More information can be found at http://www.comp.hkbu.edu.hk/~chxw/fgc2011/index.php.
The First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (co-located with ICS, June 4, 2011) is intended to provide evaluations of the characteristics of computational kernels and applications, and how different software stacks impact them, to guide future accelerator-based HPC system designs.
We solicit papers on all aspects of HPC application studies, especially those that involve accelerators such as GPUs, FPGAs, etc. The topics include (but are not limited to):
- Categorizing and characterizing HPC applications and kernels with respect to patterns in computation structure, communication, cache accesses, memory, I/O, and file accesses.
- Evaluating the importance of individual kernels within an entire application.
- Modeling for applications running on accelerator-based heterogeneous HPC systems.
- Implications of workload characterization for heterogeneous system design.
- Benchmarking of applications, kernels or software stacks and tools supporting applications.
The call for papers and more details about the workshop may be found on the website.
GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method
February 13th, 2011
The paper discusses a fast implementation of the conjugate gradient iterative method with an E-field multilevel preconditioner, applied to solving the real, symmetric, sparse systems obtained with the vector finite element method. To accelerate the computations, a graphics processing unit (GPU) was used, and a 2.61-fold speed-up was achieved compared to a central processing unit (CPU) based approach. These results indicate that the performance of electromagnetic simulations can be significantly improved, thereby enabling full-wave optimization of microwave components in more manageable time.
(A. Dziekonski, A. Lamecki and M. Mrozowski: “GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method”, IEEE Microwave and Wireless Components Letters 21(1) pp.1-3, Jan. 2011. [DOI])
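As background for the method named in the abstract, the following is a minimal sketch of the preconditioned conjugate gradient iteration in plain Python. It is not the authors' implementation: a simple Jacobi (diagonal) preconditioner stands in for the paper's E-field multilevel preconditioner, the matrix is dense for brevity, and the function name `pcg` is hypothetical. The sketch assumes a symmetric positive definite matrix with a nonzero diagonal.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b by conjugate gradients with a Jacobi preconditioner."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Jacobi preconditioner: M^{-1} is the inverse of the diagonal of A.
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                                  # residual b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)                                  # first search direction
    rz = dot(r, z)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:               # converged
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                  # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                       # keeps directions A-conjugate
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The matrix-vector product and the dot products dominate the cost of each iteration, and these are the operations that a GPU implementation would typically offload.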
A simple tool for off-line compilation of OpenCL kernel code, called “OpenCLcc”, is now available at
OpenCLcc takes a text file with the OpenCL kernel code as input and calls the OpenCL run-time to compile it, echoing errors to the console.
Proteins, nucleic acids, and small molecules form a dense network of molecular interactions in a cell. The architecture of molecular networks can reveal important principles of cellular organization and function, similarly to the way that protein structure tells us about the function and organization of a protein. Protein complexes are groups of proteins that interact with each other at the same time and place, forming a single multimolecular machine. Functional modules, in contrast, consist of proteins that participate in a particular cellular process while binding each other at a different time and place.
A protein-protein interaction network is represented as a graph in which proteins are nodes and interactions between proteins are edges. Protein complexes and functional modules can be identified as highly interconnected subgraphs, and computational methods are now indispensable for detecting them in protein interaction data. In addition, high-throughput screening techniques such as yeast two-hybrid screening enable the identification of detailed protein-protein interaction maps in multiple species. As the interaction datasets grow, the scale of the interconnected protein networks increases exponentially, and the growing complexity makes analyzing the networks computationally challenging.
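The graph representation described above can be sketched in a few lines of Python. This is an illustration only, not the method of the work being summarized: edge density is used here as one simple measure of how interconnected a candidate group is, and the protein names and the helper names `build_network` and `subgraph_density` are hypothetical.

```python
from itertools import combinations


def build_network(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = {}
    for a, b in interactions:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def subgraph_density(adj, nodes):
    """Fraction of possible edges among `nodes` that are present (0 to 1)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(
        1 for u, v in combinations(nodes, 2) if v in adj.get(u, set())
    )
    return present / possible


# Hypothetical interaction data: P1, P2 and P3 all bind each other.
network = build_network([("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")])
print(subgraph_density(network, ["P1", "P2", "P3"]))
print(subgraph_density(network, ["P1", "P2", "P3", "P4"]))
```

A group with density close to 1 is a candidate complex or module. Scoring many candidate groups is independent work per group, which is the kind of workload that suits parallel hardware.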