CUDA finance course Dec 2-5, 2014, New York

October 22nd, 2014

Developed in partnership with NVIDIA, this hands-on four-day course will teach you how to write and optimize applications that fully leverage the multi-core processing capabilities of the GPU. This course will have a finance focus. Commonly used algorithms such as random number generation and Monte Carlo simulations will be used and profiled in examples. A background in finance is not necessary. For more information, please visit: http://acceleware.com/training/988

Cf4ocl Brings Object-Oriented API to OpenCL C API

October 22nd, 2014

The Cf4ocl project is a GPLv3/LGPLv3 initiative to provide an object-oriented interface to the OpenCL C API with integrated profiling, promoting the rapid development of OpenCL host programs and avoiding boilerplate code. Its main goal is to allow developers to focus on OpenCL device code. After two alpha releases, the first beta is out and can be tested on Linux, Windows and OS X. The framework is independent of the OpenCL platform version and vendor, and includes utilities to simplify the analysis of the OpenCL environment and of kernel requirements. While the project is making progress, it does not yet offer OpenGL/DirectX interoperability or support for sub-devices, pipes and SVM.

Cf4ocl can be downloaded from http://fakenmc.github.io/cf4ocl/.

Release of OpenCLIPP 2.0: an OpenCL library for computer vision and image processing

October 16th, 2014

Version 2.0 of OpenCLIPP, an Open Source OpenCL library for computer vision and image processing primitives, has been released. For more information about the library, for programming contributions and for download, please refer to the OpenCLIPP Website.

Approximate TF–IDF based on topic extraction from massive message stream using the GPU

October 16th, 2014

Abstract:

The Web is a constantly expanding global information space that includes disparate types of data and resources. Recent trends demonstrate the urgent need to manage large data streams, especially in specific application domains such as critical infrastructure systems, sensor networks, log file analysis, search engines and, more recently, social networks. All of these applications involve large-scale data-intensive tasks, often subject to time constraints and space complexity. Algorithms, data management and data retrieval techniques must be able to process data streams, i.e., process data as it becomes available and provide an accurate response based solely on the part of the stream that has already been provided. Data retrieval techniques often require a traditional data storage and processing approach, i.e., all data must be available in the storage space in order to be processed. For instance, a widely used relevance measure is Term Frequency–Inverse Document Frequency (TF–IDF), which evaluates how important a word is in a collection of documents and requires the whole dataset to be known a priori.
To address this problem, we propose an approximate version of the TF–IDF measure suitable for continuous data streams (such as exchanges of messages, tweets and sensor-based log files). The algorithm for the calculation of this measure makes two assumptions: a fast response is required, and memory is both limited and infinitely smaller than the size of the data stream. In addition, to provide the great computational power required to process massive data streams, we also present a parallel implementation of the approximate TF–IDF calculation using Graphics Processing Units (GPUs).
This implementation of the algorithm was tested on generated and real data streams and was able to capture the most frequent terms. Our results demonstrate that the approximate version of the TF–IDF measure performs at a level that is comparable to the solution of the precise TF–IDF measure.

(Ugo Erra, Sabrina Senatore, Fernando Minnella and Giuseppe Caggianese: “Approximate TF-IDF based on topic extraction from massive message stream using the GPU”, Information Sciences 292, pp.141-163, Feb. 2015. [DOI])
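
For readers unfamiliar with the measure, here is a minimal Python sketch of the exact, batch TF–IDF that the abstract contrasts with its streaming approximation. It is not the authors' algorithm or their GPU implementation. The function name tf_idf, the whitespace tokenization and the particular weighting (relative term frequency times the natural log of N/df) are assumptions chosen for illustration; several other TF–IDF variants are in common use. Note that the document-frequency table is built from the whole collection before any score is computed, which is exactly the requirement that does not hold for a stream.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (exact, batch TF-IDF).

    Illustrative variant only: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency needs a pass over the WHOLE collection first.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = ["gpu gpu cuda", "gpu opencl"]
scores = tf_idf(docs)
```

In this toy collection "gpu" appears in every document, so its score is zero everywhere, while "cuda" and "opencl" each receive a positive score in the one document that contains them.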

CfP: International Workshop on Data (Co-)Processing on Heterogeneous Hardware DAPHNE 2015

October 8th, 2014

The goal of this one-day workshop is to investigate challenges and opportunities for data processing on existing and upcoming heterogeneous hardware architectures. The workshop is co-located with EDBT/ICDT 2015, March 23-27, Brussels, Belgium, and more information is available at http://daphne.uk.to.

Increased heterogeneity is one of the major current challenges in data processing on modern hardware. With multi-core CPUs, graphics cards, massively parallel accelerator cards (e.g. Intel Xeon Phi), heterogeneous mobile processors (e.g. ARM big.LITTLE) and FPGAs, we already face a huge variety of available processing devices with different capabilities, strengths and weaknesses. This trend is expected to accelerate in the near future, and tomorrow’s database systems will need to exploit and embrace this increased heterogeneity in order to keep up with the performance requirements of the modern information society.

Workshop: Directives and Tools for Accelerators: A Seismic Programming Shift

October 8th, 2014

High-level, directive-based programming models have been rapidly gaining traction as a portable, productive means to develop application code for multicore platforms and accelerators. Due to their usability and portability, programming APIs such as OpenMP and OpenACC are increasingly being adopted as an alternative to lower-level APIs such as CUDA and OpenCL. This workshop focuses on the use of directives to program accelerators, such as NVIDIA/AMD GPUs and coprocessors such as Intel’s Xeon Phi. It will provide a forum for HPC application developers and technical managers to learn more about these high-level, directive-based programming strategies and their usage, and an opportunity for users to discuss their experiences and express their needs with respect to such a programming interface. More information: http://www.cacds.uh.edu/?q=ONGWorkshop

GPU Analytic SQL Database

September 12th, 2014

From a recent product announcement:

DeepCloud Whirlwind is an analytics-only SQL database using modern GPUs for accelerated SQL processing. We see over a 700x performance increase over a “well known” database on the same machine. Features include:

  • column based storage
  • vector processing
  • SSD optimized
  • smart compression – Ultra fast compression and decompression on the GPU
  • MySQL-like API – works with many MySQL client tools
  • Oracle subset dialect
  • data skipping
  • zone maps
  • fast schema-light data loading

Use Whirlwind database technology to get maximum database performance from significantly cheaper hardware or go all out with a state-of-the-art system built from modern components. Beta available now under the GPL at: http://deepcloud.co

Webinar Sep. 17: An Introduction to OpenCL using AMD GPUs

September 12th, 2014

This tutorial will begin with a brief overview of OpenCL and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, OpenCL syntax and work-item hierarchy. For more information and to register visit: http://acceleware.com/event/introduction-opencl-using-amd-gpus

CfP: 23rd High Performance Computing Symposium

September 5th, 2014

The 23rd High Performance Computing Symposium (April 12-15, 2015 in Alexandria, VA, USA) is devoted to the impact of high performance computing and communications on computer simulations. Topics of interest include:

  • GPU for general purpose computations (GPGPU)
  • Hybrid system modeling and simulation
  • Tools and environments for coupling parallel codes
  • Parallel algorithms and architectures
  • High performance software tools

Submission deadline for full papers: November 22, 2014. More information can be found at http://hosting.cs.vt.edu/hpc2015.

CUDPP 2.2 released

September 4th, 2014

CUDPP release 2.2 is a feature release that adds a new parallel primitive and improves some existing primitives. We have added cudppSuffixArray, a parallel skew algorithm (SA) implementation that computes the suffix array of a string. This suffix array primitive is now used in burrowsWheelerTransform, delivering better performance than CUDPP 2.1’s use of cudppStringSort. The new BWT is further used in cudppCompress, which is now faster than the original parallel compression and supports compression of text containing all possible unsigned char values. Some bugs in cudppMoveToFrontTransform and cudppStringSort have also been fixed. OS X users might also be interested in how we supported the use of OS X’s clang compiler in OS X Mavericks (10.9).
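
To illustrate why a suffix array is a natural building block for the Burrows-Wheeler transform, here is a small sequential Python sketch. It does not use CUDPP's API and is not the parallel skew algorithm: the suffix array is built with a naive sort, and the function names suffix_array and bwt_from_suffix_array as well as the "$" sentinel are assumptions made for illustration only.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting suffix strings; fine for a demo,
    far slower than a skew-style algorithm on long inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sentinel="$"):
    """Burrows-Wheeler transform of text, derived from its suffix array.

    Assumption: sentinel does not occur in text and sorts before
    every character that does.
    """
    s = text + sentinel
    sa = suffix_array(s)
    # For each suffix in sorted order, emit the character just before it;
    # index -1 wraps around to the sentinel for the suffix starting at 0.
    return "".join(s[i - 1] for i in sa)


sa = suffix_array("banana")
bwt = bwt_from_suffix_array("banana")
```

Once the suffix array is available, the transform is a single gather over it, which is why a faster suffix array primitive translates directly into a faster BWT.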
