Boost.Compute is an open-source, header-only C++ library for GPGPU and parallel computing based on OpenCL. It provides a low-level C++ wrapper over OpenCL and a high-level, STL-like API with containers and algorithms for the GPU. Boost.Compute is available on GitHub, and its documentation is available online. The full announcement of the 0.4 release is available here: http://kylelutz.blogspot.com/2014/12/boost-compute-0.4-released.html
The vast majority of current state-of-the-art compressed GPU volume renderers are based on block-transform coding, which is susceptible to blocking artifacts, particularly at low bit rates. In this paper, the authors address this problem for the first time by introducing a specialized deferred filtering architecture that works on block-compressed data and includes a novel deblocking algorithm. The architecture efficiently performs high-quality shading of massive datasets by closely coordinating visibility- and resolution-aware adaptive data loading with GPU-accelerated per-frame data decompression, deblocking, and rendering. A thorough evaluation with quantitative and qualitative measures demonstrates the performance of the approach on large static and dynamic datasets, including a massive 512^4 turbulence simulation (256 GB) that is aggressively compressed to less than 2 GB so that it fits entirely on the graphics board and can be explored in real time during animation.
(Fabio Marton, José Antonio Iglesias Guitián, Jose Díaz and Enrico Gobbetti: “Real-time deblocked GPU rendering of compressed volumes”. Proc. 19th International Workshop on Vision, Modeling and Visualization (VMV), pp. 167-174, Oct. 2014. [WWW])
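The dataset sizes quoted above can be checked with a little arithmetic. This sketch assumes the 512^4 dataset stores one 4-byte single-precision value per sample (an inference, since 512^4 samples times 4 bytes is exactly 2^38 bytes, i.e. 256 GiB) and reads "GB" as binary gigabytes. Under those assumptions a 2 GiB budget corresponds to a 128:1 ratio, so "less than 2 GB" means a compression ratio above 128:1. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kGiB = 1ULL << 30;  // binary gigabyte (assumed reading of "GB")

// Raw size in bytes of a scalar field sampled on a res^dims grid.
// Assumption: one 4-byte single-precision value per sample (inferred, not stated).
std::uint64_t raw_size_bytes(std::uint64_t res, int dims, std::uint64_t bytes_per_sample) {
    assert(dims > 0);
    std::uint64_t samples = 1;
    for (int d = 0; d < dims; ++d) samples *= res;  // 512^4 = 2^36 samples
    return samples * bytes_per_sample;
}

// Compression ratio implied by a raw size and a compressed size.
double compression_ratio(std::uint64_t raw_bytes, std::uint64_t compressed_bytes) {
    assert(compressed_bytes > 0);
    return static_cast<double>(raw_bytes) / static_cast<double>(compressed_bytes);
}
```

Calling `compression_ratio(raw_size_bytes(512, 4, 4), 2 * kGiB)` yields 128, the lower bound on the ratio implied by the paper's figures.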
The 23rd High Performance Computing Symposium (HPC'15) will be held in conjunction with the SCS Spring Simulation Multiconference (SpringSim'15), April 12-15, 2015, in Alexandria, VA, USA.
Topics of interest include:
- High performance/large scale application case studies
- GPU for general purpose computations (GPGPU)
- Multicore and many-core computing
- Power aware computing
- Cloud, distributed, and grid computing
- Asynchronous numerical methods and programming
- Hybrid system modeling and simulation
- Large scale visualization and data management
- Tools and environments for coupling parallel codes
- Parallel algorithms and architectures
- High performance software tools
- Resilience at the simulation level
- Component technologies for high performance computing
More information: http://hosting.cs.vt.edu/hpc2015.
PARALUTION is a library of sparse iterative methods that can run on various parallel devices, including multi-core CPUs, GPUs (CUDA and OpenCL) and Intel Xeon Phi. The new 0.8.0 release adds the following features:
- Complex support
- TNS, Variable preconditioner
- BiCGStab(l), QMRCGStab, FCG solvers
- RS and PairWise AMG
- SIRA eigenvalue solver
- Replace/Extract column/row functions
- Stencil computation
For details, visit http://www.paralution.com.
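As a rough illustration of what a sparse Krylov solver does, here is a minimal, unpreconditioned conjugate gradient (CG) iteration on a symmetric positive-definite matrix stored in compressed sparse row (CSR) format. This is a plain sequential CPU sketch, not PARALUTION's API; the names `CsrMatrix`, `spmv` and `conjugate_gradient` are hypothetical. The FCG solver listed above is a flexible variant of CG intended to tolerate preconditioners that change between iterations, which this sketch does not model.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR storage for a square sparse matrix.
struct CsrMatrix {
    std::size_t n;                     // number of rows (and columns)
    std::vector<std::size_t> row_ptr;  // size n + 1; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index of each non-zero
    std::vector<double> val;           // value of each non-zero
};

// Sparse matrix-vector product y = A * x.
void spmv(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.val[k] * x[A.col_idx[k]];
        y[i] = sum;
    }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned CG: solves A x = b in place, starting from the given x.
// Stops when the residual 2-norm drops below tol. Returns the iteration count.
int conjugate_gradient(const CsrMatrix& A, const std::vector<double>& b,
                       std::vector<double>& x, double tol, int max_iter) {
    assert(b.size() == A.n && x.size() == A.n);
    std::vector<double> r(A.n), Ap(A.n);
    spmv(A, x, Ap);
    for (std::size_t i = 0; i < A.n; ++i) r[i] = b[i] - Ap[i];  // initial residual
    std::vector<double> p = r;                                  // first search direction
    double rr = dot(r, r);
    int it = 0;
    while (it < max_iter && std::sqrt(rr) > tol) {
        spmv(A, p, Ap);
        const double alpha = rr / dot(p, Ap);  // step length along p
        for (std::size_t i = 0; i < A.n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;  // keeps directions A-conjugate
        for (std::size_t i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    return it;
}
```

For the 3×3 one-dimensional Laplacian (2 on the diagonal, -1 off it) with right-hand side (1, 0, 1), the exact solution is (1, 1, 1), and CG reaches it in at most three iterations in exact arithmetic.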
Massive exploration of perturbed conditions of the blood coagulation cascade through GPU parallelization
November 3rd, 2014
The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need to perform large numbers of in silico analyses to study the behavior of biological systems under different conditions, which requires computing power that usually exceeds the capabilities of standard desktop computers. In this work, we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and two-dimensional parameter sweep analyses, and show that GPU-accelerated parallel simulations of this model achieve up to a 181× speedup compared to the corresponding sequential simulations.
(Cazzaniga P., Nobile M.S., Besozzi D., Bellini M., Mauri G.: “Massive exploration of perturbed conditions of the blood coagulation cascade through GPU parallelization”. BioMed Research International, vol. 2014. [DOI])
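To illustrate the one-dimensional parameter sweep idea, here is a sketch on a hypothetical one-species toy model, not the BCC model: production is activated through a Hill function and degradation follows first-order mass-action kinetics. It swaps LSODA for a fixed-step fourth-order Runge-Kutta integrator and runs sequentially on the CPU. Each sweep point is an independent simulation, which is what makes such sweeps a natural fit for running one simulation per GPU thread. All names here are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hill function: fraction of maximal activation at concentration s.
double hill(double s, double K, double n) {
    return std::pow(s, n) / (std::pow(K, n) + std::pow(s, n));
}

// Hypothetical toy model (not the BCC model):
//   dx/dt = k_prod * hill(s, K, n) - k_deg * x
struct ToyModel {
    double k_prod, k_deg, s, K, n;
    double rhs(double x) const { return k_prod * hill(s, K, n) - k_deg * x; }
};

// Fixed-step classical Runge-Kutta (RK4). LSODA, by contrast, adapts its step
// size and switches between stiff and non-stiff methods automatically.
double integrate_rk4(const ToyModel& m, double x0, double t_end, double dt) {
    assert(dt > 0.0);
    double x = x0;
    const int steps = static_cast<int>(std::round(t_end / dt));
    for (int i = 0; i < steps; ++i) {
        const double k1 = m.rhs(x);
        const double k2 = m.rhs(x + 0.5 * dt * k1);
        const double k3 = m.rhs(x + 0.5 * dt * k2);
        const double k4 = m.rhs(x + dt * k3);
        x += dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
    }
    return x;
}

// One-dimensional sweep over the degradation rate: one independent run per value.
std::vector<double> sweep_k_deg(ToyModel base, const std::vector<double>& k_degs,
                                double t_end, double dt) {
    std::vector<double> finals;
    finals.reserve(k_degs.size());
    for (double k : k_degs) {
        base.k_deg = k;
        finals.push_back(integrate_rk4(base, 0.0, t_end, dt));
    }
    return finals;
}
```

For this toy model the steady state is k_prod * hill(s, K, n) / k_deg, which gives a simple check on each run of the sweep.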
This webinar provides an overview of the improved performance analysis tools available in CUDA 6.0 and of key optimization strategies for compute-, latency- and memory-bound problems. It covers techniques for achieving peak utilization of CUDA cores, improving branching efficiency, using intrinsic functions, and loop unrolling. Optimal access patterns for global and shared memory are presented, including a comparison between the Fermi and Kepler architectures. To view the webinar go to: http://acceleware.com/blog/webinar-essential-cuda-optimization-techniques
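Much of the global-memory access-pattern discussion comes down to coalescing: how many memory transactions one warp needs for a single load instruction. A simplified model is to count the distinct aligned segments touched by the 32 threads of a warp when thread t reads element t * stride of a float array. This sketch assumes 4-byte elements and 128-byte segments; actual transaction sizes differ between Fermi and Kepler and depend on the caching mode, so the numbers are only a qualitative guide. `segments_per_warp` is a hypothetical helper, not a CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified coalescing model: number of distinct aligned segments touched
// when each thread t of a warp reads element (t * stride) of an array.
// Assumptions: 32-thread warps, 4-byte elements, 128-byte segments.
std::size_t segments_per_warp(std::size_t stride, std::size_t warp_size = 32,
                              std::size_t elem_bytes = 4, std::size_t segment_bytes = 128) {
    assert(stride > 0 && segment_bytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < warp_size; ++t) {
        const std::size_t byte_addr = t * stride * elem_bytes;  // address read by thread t
        segments.insert(byte_addr / segment_bytes);              // segment containing it
    }
    return segments.size();
}
```

Under this model, unit stride needs a single segment per warp, stride 2 needs two, and stride 32 spreads the warp across 32 segments, which is why strided global-memory access wastes bandwidth.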
Since 2011, the most powerful supercomputer systems in the Top500 list have been hybrid systems composed of thousands of nodes that include CPUs and accelerators such as Xeon Phi and GPUs. Programming and deploying applications on these systems is still a challenge due to the complexity of the systems and the need to mix several programming interfaces (MPI, CUDA, Intel Xeon Phi) in the same application. This special issue of the International Journal of Computers & Electrical Engineering aims to explore the state of the art in developing applications for accelerated massive HPC architectures, including practical issues of hybrid usage models with MPI, OpenMP, and other accelerator programming models. The idea is to publish novel work on the use of available programming interfaces (MPI, CUDA, Intel Xeon Phi) and tools for code development, application performance optimization, and application deployment on accelerated systems, as well as on the advantages and limitations of accelerated HPC systems. Experiences with real-world applications, including scientific computing, numerical simulations, healthcare, energy, data analysis, etc., are also encouraged.
The goal of this workshop is to provide a forum to discuss new and emerging general-purpose programming environments and platforms, as well as to evaluate applications that have been able to harness the horsepower provided by these platforms. This year's workshop is particularly interested in new heterogeneous GPU platforms, new forms of concurrency, and novel/irregular applications that can leverage these platforms. Papers are being sought on many aspects of GPUs.
Developed in partnership with NVIDIA, this hands-on four-day course will teach you how to write and optimize applications that fully leverage the multi-core processing capabilities of the GPU. The course has a finance focus: examples will use and profile commonly used algorithms such as random number generation and Monte Carlo simulation. A background in finance is not necessary. For more information please visit: http://acceleware.com/training/988
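Monte Carlo option pricing is a typical example of the finance workloads mentioned above. This sketch prices a European call under the Black-Scholes model by sampling terminal prices of geometric Brownian motion and averaging the discounted payoffs, then compares the estimate with the closed-form price. It is a sequential C++ sketch under those modeling assumptions, not course material; on a GPU, each thread would typically simulate its own paths using a parallel random number generator such as cuRAND. The function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Monte Carlo estimate of a European call price under geometric Brownian motion:
// S_T = S_0 * exp((r - sigma^2 / 2) * T + sigma * sqrt(T) * Z), Z ~ N(0, 1).
double mc_european_call(double s0, double strike, double r, double sigma, double t,
                        std::size_t n_paths, unsigned seed) {
    assert(n_paths > 0);
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> z(0.0, 1.0);
    const double drift = (r - 0.5 * sigma * sigma) * t;
    const double vol = sigma * std::sqrt(t);
    double payoff_sum = 0.0;
    for (std::size_t i = 0; i < n_paths; ++i) {
        const double st = s0 * std::exp(drift + vol * z(rng));  // simulated terminal price
        payoff_sum += std::max(st - strike, 0.0);                // call payoff
    }
    return std::exp(-r * t) * payoff_sum / static_cast<double>(n_paths);  // discounted mean
}

// Closed-form Black-Scholes call price, used only to check the estimate.
double bs_european_call(double s0, double strike, double r, double sigma, double t) {
    auto ncdf = [](double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); };
    const double d1 = (std::log(s0 / strike) + (r + 0.5 * sigma * sigma) * t) / (sigma * std::sqrt(t));
    const double d2 = d1 - sigma * std::sqrt(t);
    return s0 * ncdf(d1) - strike * std::exp(-r * t) * ncdf(d2);
}
```

With S0 = K = 100, r = 0.05, σ = 0.2 and T = 1, the closed-form price is about 10.45. The standard error of the Monte Carlo estimate shrinks as 1/sqrt(n_paths), which is why such simulations need many independent paths and parallelize well.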
The Cf4ocl project is a GPLv3/LGPLv3 initiative to provide an object-oriented interface to the OpenCL C API with integrated profiling, promoting the rapid development of OpenCL host programs and avoiding boilerplate code. Its main goal is to allow developers to focus on OpenCL device code. After two alpha releases, the first beta is out and can be tested on Linux, Windows and OS X. The framework is independent of the OpenCL platform version and vendor, and includes utilities that simplify the analysis of the OpenCL environment and of kernel requirements. While the project is making progress, it does not yet offer OpenGL/DirectX interoperability or support for sub-devices, pipes, or shared virtual memory (SVM).
Cf4ocl can be downloaded from http://fakenmc.github.io/cf4ocl/.