rCUDA now available for the ARM architecture

July 26th, 2013

The rCUDA team is glad to announce that its remote GPU virtualization technology now supports the ARM processor architecture. The new release of rCUDA for this low-power processor has been developed for the Ubuntu 11.04 and Ubuntu 12.04 ARM linux distributions. With this new rCUDA release, it is also possible to leverage hybrid platforms where the application uses ARM CPUs while requesting acceleration services provided by remote GPUs installed in x86 nodes. The opposite is also possible: an application running in an x86 computer can access remote GPUs attached to ARM systems. Please visit rCUDA website for more information or for requesting a free copy of the rCUDA middleware.

Back Testing of HFT Strategies with Xcelerit and GPUs

July 26th, 2013

Algorithmic trading has become ever more popular in recent years – accounting for approximately half of all European and American stock trades placed in 2012. The trading strategies need to be back-tested regularly using historical market data for calibration and to check the expected return and risk. This is a computationally demanding process that can take hours to complete. However, back-testing the strategies frequently intra-day can significantly increase the profits for the trading institution.

Read the rest of this entry »

OpenCV and CUDA webinar, July 30th

July 23rd, 2013

Anatoly Baksheev, OpenCV GPU Module Team Leader at Itseez will demonstrate how to obtain and build OpenCV, its GPU module, and the sample programs. You will learn how to use the OpenCV GPU module and create your own custom GPU functions for OpenCV. Register for the July 30th webinar: http://goo.gl/5V3eA

Heterogeneous compute event during Siggraph 2013

July 19th, 2013

The HSA Foundation will be hosting a Birds of a Feather session on heterogeneous computing on July 24 from 1-2 p.m., at the Anaheim Convention Center, Room 202B. For more info: http://slidesha.re/16JSqK7

GPU Technology Conference 2014 Call for Submissions

July 14th, 2013

GPU Technology Conference (GTC) is NVIDIA’s annual developer event and consistently attracts the world’s best and brightest GPU developers, creating opportunities for connection and learning through technical sessions and in-depth tutorials in science, professional graphics, game development, mobile computing, cloud computing and automotive applications, as well as first-hand interactions with peers, luminaries, and emerging and established companies.

If you are doing innovative work using GPU, please submit a proposal at https://gtc2014.consenseus.com/

The deadline is Friday, September 27.

Acceleware Training

July 14th, 2013

Acceleware recently announced a couple of courses:

  • CUDA for Finance: December 10 – 13, 2013, New York, NY [Details]
  • OpenCL: October 22 – 25, 2013, Houston, TX [details]
  • CUDA: September 24-27, [Details]
  • C++ AMP: September 10-13, [Details]

 

Acceleration of iterative Navier-Stokes solvers on graphics processing units

July 14th, 2013

Abstract:

While new power-efficient computer architectures exhibit spectacular theoretical peak performance, they require specific conditions to operate efficiently, which makes porting complex algorithms a challenge. Here, we report results of the semi-implicit method for pressure linked equations (SIMPLE) and the pressure implicit with operator splitting (PISO) methods implemented on the graphics processing unit (GPU). We examine the advantages and disadvantages of the full porting over a partial acceleration of these algorithms run on unstructured meshes. We found that the full-port strategy requires adjusting the internal data structures to the new hardware and proposed a convenient format for storing internal data structures on GPUs. Our implementation is validated on standard steady and unsteady problems and its computational efficiency is checked by comparing its results and run times with those of some standard software (OpenFOAM) run on central processing unit (CPU). The results show that a server-class GPU outperforms a server-class dual-socket multi-core CPU system running essentially the same algorithm by up to a factor of 4.

See also supplementary materials and the follow up at http://vratis.com/blog/?p=7.

(Tadeusz Tomczak, Katarzyna Zadarnowska, Zbigniew Koza, Maciej Matyka and Łukasz Mirosław: “Acceleration of iterative Navier-Stokes solvers on graphics processing units”, International Journal of Computational Fluid Dynamics, accepted, July 2013. [DOI])

AMD Releases APP SDK 2.8.1 with support for Bolt C++ Template Library, OpenCV, and GCN

July 14th, 2013

From a recent press release:

AMD’s APP SDK is an essential resource for developers who wish to leverage the processing power of heterogeneous computing. OpenCL™ is the primary mechanism for achieving this today, but AMD’s goal is to enable developers to accelerate applications with the programming paradigm of their choice. Toward that end, AMD has added support for heterogeneous libraries such as the newly released Bolt open source C++ template library and OpenCV computer vision library which now includes heterogeneous acceleration.

New to APP SDK 2.8.1:

Bolt: With the recent launch of Bolt 1.0, AMD has added several samples to the APP SDK to demonstrate Bolt 1.0 features. These showcase the usage of Bolt APIs such as scan, sort, reduce and transform. Other new samples highlight the ease of porting from STL and the performance benefits achieved over equivalent STL implementations. We’ve also included samples to demonstrate the different fallback options available in Bolt 1.0 when no GPU is available which ensure your code runs correctly on any platform.

OpenCV: AMD has been working closely with the OpenCV open source community to add heterogeneous acceleration capability to the world’s most popular computer vision library. These changes are already integrated into OpenCV and are readily available for developers who want to improve performance and efficiency of their computer vision applications. AMD has included samples to illustrate these improvements and highlight how simple it is to include them in your app.

GCN: AMD recently launched its new Graphics Core Next (GCN) architecture on several AMD products. GCN is based on a scalar architecture vs. the VLIW vector architecture of prior generations, so hand-tuned vectorization to optimize hardware utilization is no longer needed. We’ve modified several samples in AMD APP SDK 2.8.1 to show the ease of writing scalar code as compared to vectorization.

For more information, see developer.amd.com.

Thrust v1.7 Released

July 4th, 2013

The Thrust team is pleased to announce the release of Thrust v1.7, an open-source C++ library for developing high-performance parallel applications. Modeled after the C++ Standard Template Library, Thrust brings a familiar abstraction layer to the realm of parallel computing

Thrust 1.7.0 introduces a new interface for controlling algorithm execution as well as several new algorithms and performance improvements. With this new interface, users may directly control how algorithms execute as well as details such as the allocation of temporary storage. Key/value versions of thrust::merge and the set operation algorithms have been added, as well stencil versions of partitioning algorithms. For 32b types, new CUDA merge and set operations provide 2-15x faster performance while a new CUDA comparison sort provides 1.3-4x faster performance.

Thrust is open-source software distributed under the OSI-approved Apache License 2.0.

LibBi: Bayesian State-Space Modelling and Inference on GPU

June 24th, 2013

Abstract:

LibBi is a software package for state-space modelling and Bayesian inference on modern computer hardware, including multi-core central processing units (CPUs), many-core graphics processing units (GPUs) and distributed-memory clusters of such devices. The software parses a domain-specific language for model specification, then optimises, generates, compiles and runs code for the given model, inference method and hardware platform. In presenting the software, this work serves as an introduction to state-space models and the specialised methods developed for Bayesian inference with them. The focus is on sequential Monte Carlo (SMC) methods such as the particle filter for state estimation, and the particle Markov chain Monte Carlo (PMCMC) and SMC^2 methods for parameter estimation. All are well-suited to current computer hardware. Two examples are given and developed throughout, one a linear three-element windkessel model of the human arterial system, the other a nonlinear Lorenz ’96 model. These are specified in the prescribed modelling language, and LibBi demonstrated by performing inference with them. Empirical results are presented, including a performance comparison of the software with different hardware configurations.

(Lamwrence M. Murray: “Bayesian state-space modelling on high-performance hardware using LibBi”, Preprint, June 2013. [arXiv])

Page 11 of 109« First...910111213...203040...Last »