July 4th, 2010
Abstract:
In the ongoing arms race against malware, antivirus software is at the forefront, as one of the most important defense tools in our arsenal. Antivirus software is flexible enough to be deployed anywhere from regular users' desktops to corporate e-mail proxies and file servers. Unfortunately, the signatures necessary to detect incoming malware number in the tens of thousands. To make matters worse, antivirus signatures are a lot longer than signatures in network intrusion detection systems. This leads to extremely high computation costs necessary to perform matching of suspicious data against those signatures. In this paper, we present GrAVity, a massively parallel antivirus engine. Our engine utilizes the compute power of modern graphics processors, which contain hundreds of hardware microprocessors. We have modified ClamAV, the most popular open source antivirus software, to utilize our engine. Our prototype implementation has achieved end-to-end throughput in the order of 20 Gbits/s, 100 times the performance of the CPU-only ClamAV, while almost completely offloading the CPU, leaving it free to complete other tasks. Our micro-benchmarks have measured our engine to be able to sustain throughput in the order of 40 Gbits/s. The results suggest that modern graphics cards can be used effectively to perform heavy-duty anti-malware operations at speeds that cannot be matched by traditional CPU-based techniques.
(Giorgos Vasiliadis and Sotiris Ioannidis. “GrAVity: A Massively Parallel Antivirus Engine”. In Proceedings of the 13th International Symposium On Recent Advances In Intrusion Detection (RAID). September 2010, Ottawa, Canada. Link to PDF.)
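To make the cost of matching data against many signatures at once more concrete, here is a minimal CPU-side sketch of multi-pattern matching with an Aho-Corasick automaton. It is not taken from the paper and is not claimed to be GrAVity's or ClamAV's implementation; the function names and the sample byte strings in the usage note are made up for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto transitions, failure links and outputs for byte patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # indices of the patterns that end at each state
    for idx, pat in enumerate(patterns):
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(idx)

    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def scan(data, patterns, automaton):
    """Return (start_offset, pattern_index) for every match in data."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for idx in out[state]:
            hits.append((pos - len(patterns[idx]) + 1, idx))
    return hits
```

With the hypothetical patterns `[b"he", b"she", b"his", b"hers"]`, calling `scan(b"ushers", patterns, build_automaton(patterns))` reports "she" at offset 1 and both "he" and "hers" at offset 2. The input is read once regardless of how many patterns there are, which is why automaton-based matching is a natural fit for large signature sets.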
Posted in Research | Tags: Papers, Pattern Matching, Virus Detection | 5 Comments
July 4th, 2010
Abstract:
The expressive power of regular expressions has often been exploited in network intrusion detection systems, virus scanners, and spam filtering applications. However, the flexible pattern matching functionality of regular expressions in these systems comes with significant overheads in terms of both memory and CPU cycles, since every byte of the inspected input needs to be processed and compared against a large set of regular expressions.
In this paper we present the design, implementation and evaluation of a regular expression matching engine running on graphics processing units (GPUs). The significant spare computational power and data parallelism capabilities of modern GPUs permit the efficient matching of multiple inputs at the same time against a large set of regular expressions. Our evaluation shows that regular expression matching on graphics hardware can result in a 48 times speedup over traditional CPU implementations and up to 16 Gbit/s in processing throughput. We demonstrate the feasibility of GPU regular expression matching by implementing it in the popular Snort intrusion detection system, which results in a 60% increase in the packet processing throughput.
(Giorgos Vasiliadis, Michalis Polychronakis, Spiros Antonatos, Evangelos P. Markatos and Sotiris Ioannidis: “Regular Expression Matching on Graphics Hardware for Intrusion Detection”. In Proceedings of the 12th International Symposium On Recent Advances In Intrusion Detection (RAID). September 2009, Saint-Malo, France. Link to PDF.)
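As a rough illustration of why this workload parallelizes well, the sketch below runs a table-driven DFA over a batch of independent inputs. It is an assumption-laden toy, not the paper's engine: the DFA is hand-built for the single expression `ab+c` (searched anywhere in the input), and the names `make_table`, `matches` and `match_batch` are hypothetical.

```python
ACCEPT = 3  # state reached once "ab+c" has been seen


def make_table():
    """Transition table: 4 states x 256 byte values, default back to state 0."""
    table = [[0] * 256 for _ in range(4)]
    for state in (0, 1, 2):
        table[state][ord("a")] = 1   # an 'a' always starts a new attempt
    table[1][ord("b")] = 2           # "a" followed by the first 'b'
    table[2][ord("b")] = 2           # further 'b's keep us in "ab+"
    table[2][ord("c")] = ACCEPT      # closing 'c' completes the match
    table[ACCEPT] = [ACCEPT] * 256   # accepting state is absorbing
    return table


def matches(table, payload):
    """One table lookup per input byte, no backtracking."""
    state = 0
    for byte in payload:
        state = table[state][byte]
    return state == ACCEPT


def match_batch(table, payloads):
    """Each payload is independent, so on a GPU each could go to its own thread."""
    return [matches(table, p) for p in payloads]
```

For example, `match_batch(make_table(), [b"xxabbbc", b"abac"])` accepts the first payload and rejects the second. Because every input only reads the shared table and keeps a single state variable of its own, many inputs can be processed at the same time.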
Posted in Research | Tags: Intrusion Detection, Papers, Pattern Matching | Write a comment
July 4th, 2010
SagivTech plans to offer a three-day course on image processing with CUDA in the USA this September. This is an advanced course intended for experienced CUDA developers looking for optimization methods for image processing applications implemented on NVIDIA GPUs.
The course will be held in the San Francisco area on September 27-29, from 9am to 5pm.
Posted in Business, Developer Resources, Events | Tags: Image Processing, NVIDIA CUDA, Tutorials & Courses | Write a comment
July 4th, 2010
Petapath, NVIDIA and Supermicro would like to invite researchers, students and industrial users to a series of free seminars and workshops dedicated to the Bio Workbench. The seminars will principally cover the use of AMBER 11's CUDA-accelerated PMEMD (Particle Mesh Ewald Molecular Dynamics) tool but will be of interest to anyone using other molecular dynamics packages covered by the Bio Workbench.
Guest speakers include Ross Walker (SDSC) and Ian Gould (UCL). Two events are currently scheduled in the UK: the 8th of July at Imperial College London and the 16th of July at The University of Manchester. Please visit www.petapath.com/nvidia to register.
Posted in Events | Tags: Molecular Dynamics, NVIDIA CUDA, Tutorials & Courses | Write a comment
June 23rd, 2010
Abstract:
We implement a high-order finite-element application, which performs the numerical simulation of seismic wave propagation resulting for instance from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large cluster of NVIDIA Tesla graphics cards using the CUDA programming environment and non-blocking message passing based on MPI. Contrary to many finite-element implementations, ours is implemented successfully in single precision, maximizing the performance of current generation GPUs. We discuss the implementation and optimization of the code and compare it to an existing highly optimized implementation in C and MPI on a classical cluster of CPU nodes. We use mesh coloring to efficiently handle summation operations over degrees of freedom on an unstructured mesh, and non-blocking MPI messages in order to overlap the communications across the network and the data transfer to and from the device via PCIe with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements and, depending on how the problem is mapped to the reference CPU cluster, obtain a speedup of 20x or 12x.
(Dimitri Komatitsch, Gordon Erlebacher, Dominik Göddeke and David Michéa: “High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster”, accepted for publication in: Journal of Computational Physics, Jun. 2010. PDF preprint. DOI link.)
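The mesh coloring idea mentioned in the abstract can be sketched as follows. This is a simplified greedy coloring written for illustration, not the authors' code: elements that share a node receive different colors, so all elements of one color can add their contributions to the global array without write conflicts. The function names and the tiny 1D mesh in the usage note are hypothetical.

```python
def color_elements(elements):
    """Greedy coloring: two elements that share a node never get the same color."""
    node_to_elems = {}
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems.setdefault(n, []).append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        used = {colors[o] for n in nodes for o in node_to_elems[n] if colors[o] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble(elements, local_values, num_nodes):
    """Sum per-element contributions into a global per-node array, color by color."""
    colors = color_elements(elements)
    total = [0.0] * num_nodes
    for color in range(max(colors, default=-1) + 1):
        # Within one color no two elements touch the same node, so on a GPU
        # this group could be processed by one kernel launch without atomics.
        for e, nodes in enumerate(elements):
            if colors[e] == color:
                for k, n in enumerate(nodes):
                    total[n] += local_values[e][k]
    return total
```

For a chain of three two-node elements `[(0, 1), (1, 2), (2, 3)]`, the greedy pass assigns colors `[0, 1, 0]`, and assembling a contribution of 1.0 per local node yields `[1.0, 2.0, 2.0, 1.0]`. The trade-off is one pass per color instead of a single pass with atomic updates.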
Posted in Research | Tags: Clusters, Finite Element Methods, High-Performance Computing, NVIDIA CUDA, Papers, Scientific Computing | Write a comment
June 18th, 2010
The OpenCL 1.1 specification, including header files and documentation, has been released. It includes significant new functionality:
- Host-thread safety, enabling OpenCL commands to be enqueued from multiple host threads
- Sub-buffer objects to distribute regions of a buffer across multiple OpenCL devices
- User events to enable enqueued OpenCL commands to wait on external events
- Event callbacks that can be used to enqueue new OpenCL commands based on event state changes in a non-blocking manner
- 3-component vector data types
- Global work-offsets, which enable kernels to operate on different portions of the NDRange
- Memory object destructor callback
- Read, write and copy a 1D, 2D or 3D rectangular region of a buffer object
- Mirrored repeat addressing mode and additional image formats
- New OpenCL C built-in functions such as integer clamp, shuffle and asynchronous strided copies
- Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL event objects to OpenGL fence sync objects
- Optional features in OpenCL 1.0 have been brought into core OpenCL 1.1 including: writes to a pointer of bytes or shorts from a kernel, and conversion of atomics to 32-bit integers in local or global memory
Posted in Developer Resources | Tags: OpenCL | Write a comment
June 18th, 2010
In response to the large number of requests from the community, the organizing committee of HiBi 2010 has extended the deadline for paper and abstract submission from Monday June 21 to Thursday July 1, 2010.
The HiBi workshop establishes a forum to link researchers in the areas of parallel computing and computational systems biology. One of the main limitations in managing models of biological systems comes from the fundamental difference between the high parallelism evident in biochemical reactions and the sequential environments employed for the analysis of these reactions. Such limitations affect all varieties of continuous, deterministic, discrete and stochastic models, undermining the applicability of simulation techniques and analysis of biological models. The goal of HiBi is therefore to bring together researchers in the fields of high performance computing and computational systems biology. Experts from around the world will present their current work and discuss profound challenges, new ideas, results, applications and their experience relating to key aspects of high performance computing in biology.
Posted in Events, Research | Tags: Call for Papers, Computational Biology, Conferences, Workshops | Write a comment
June 18th, 2010

The Theoretical and Computational Biophysics Group, NIH Resource for Macromolecular Modeling and Bioinformatics (www.ks.uiuc.edu) at the University of Illinois at Urbana-Champaign, presents a Workshop on GPU Programming for Molecular Modeling to be held August 6-8, 2010, at the Beckman Institute for Advanced Science and Technology, on the University of Illinois campus in Urbana, Illinois, USA. Application, selection, and notification of participants are ongoing through July 29, 2010.
Note: Participants are encouraged to attend the multi-site “Proven Algorithmic Techniques for Many-core Processors” workshop the preceding week (August 2-6) at the location of their choice. Registration for this workshop is required for participants without equivalent GPU-programming training or experience.
Posted in Developer Resources, Events | Tags: Molecular Dynamics, Multicore, Tutorials & Courses, Workshops | Write a comment
June 18th, 2010
OpenCurrent version 1.1.0 has been released. OpenCurrent is a library for solving certain types of PDEs over 3D Cartesian grids. It supports single and double precision, and includes solvers for Poisson equations, diffusion, and incompressible Navier-Stokes.
New features:
- Multi-GPU communication library
- Multi-GPU versions of Multigrid solver, Incompressible Navier-Stokes solver, and more
- NetCDF support now optional
- Support for Fermi/CUDA 3.0
- Numerous bug fixes and enhancements
Get it here: http://code.google.com/p/opencurrent/downloads/list
Posted in Developer Resources | Tags: Fluid Simulation, Numerical Algorithms, NVIDIA CUDA, Programming Environments, Tools | Write a comment
June 15th, 2010
The Vienna Computing Library (ViennaCL) is a scientific computing library written in C++ and based on OpenCL. It allows simple, high-level access to the vast computing resources available on parallel architectures such as GPUs and is primarily focused on common linear algebra operations (BLAS levels 1 and 2) and the solution of large systems of equations by means of iterative methods. The following iterative solvers are implemented:
- Conjugate Gradient (CG)
- Stabilized BiConjugate Gradient (BiCGStab)
- Generalized Minimum Residual (GMRES)
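For readers unfamiliar with these methods, here is a minimal sketch of the first one, Conjugate Gradient, for a symmetric positive definite system. It uses plain Python lists on the CPU and is not ViennaCL code or its API; the function names, the tolerance and the iteration cap are illustrative assumptions.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A (dense, row lists)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x starting at zero
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual norm small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration consists of one matrix-vector product plus a few dot products and vector updates, which are the BLAS level 1 and 2 operations the library focuses on. As a quick check, solving with `A = [[4.0, 1.0], [1.0, 3.0]]` and `b = [1.0, 2.0]` converges to approximately `[1/11, 7/11]`.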
Posted in Developer Resources | Tags: Linear Algebra, Numerical Algorithms, OpenCL | Write a comment