July 29th, 2010
Abstract:
Ocelot is a dynamic compilation framework designed to map the explicitly data-parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. The dynamic compiler is able to execute existing CUDA binaries without recompilation from source and supports switching between execution on an NVIDIA GPU and a many-core CPU at runtime. It has been validated against over 130 applications taken from the CUDA SDK, the UIUC Parboil benchmark, the Virginia Rodinia benchmarks, the GPU-VSIPL signal and image processing library, the Thrust library, and several domain-specific applications.
This paper presents a high-level overview of the implementation of the Ocelot dynamic compiler, highlighting design decisions and trade-offs and showcasing their effect on application performance. Several novel code transformations are explored that are applicable only when compiling explicitly parallel applications, and traditional dynamic compiler optimizations are revisited for this new class of applications. This study is expected to inform the design of compilation tools for explicitly parallel programming models (such as OpenCL) as well as future CPU and GPU architectures.
This paper identifies several key areas of research and open problems for optimizing the performance of data parallel programs (such as CUDA and OpenCL) that were encountered when designing a binary translator from PTX to LLVM/x86. The complete implementation of Ocelot is available open-source under the new BSD license at http://code.google.com/p/gpuocelot. Ongoing work involves translating PTX to AMD’s IL allowing CUDA programs to be executed on AMD GPUs, developing parallel-aware PTX to PTX optimizations, and exploring new programming and execution models that are layered on PTX.
(Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili and Nathan Clark: “Ocelot: A dynamic compiler for bulk-synchronous applications in heterogeneous systems”. 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010), September 2010).
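To make the idea of mapping the CUDA execution model onto a CPU more concrete, here is a minimal Python sketch. It is not Ocelot's implementation (Ocelot translates PTX through LLVM); it only illustrates one simple way to run a thread-block kernel on a sequential processor. The names `launch`, `load_phase`, `reduce_phase` and `make_shared` are hypothetical, and splitting the kernel body into phases at each barrier is an assumption of this sketch.

```python
def launch(grid_dim, block_dim, phases, make_shared, *args):
    """Run a CUDA-style kernel on the CPU, one thread block at a time.

    `phases` is the kernel body split at each barrier. Running a phase for
    every thread of the block before starting the next phase stands in for
    the barrier itself. (Assumption of this sketch, not Ocelot's design.)
    """
    for block_idx in range(grid_dim):            # blocks are independent
        shared = make_shared(block_dim)          # per-block scratch memory
        for phase in phases:
            for thread_idx in range(block_dim):  # serialized "threads"
                phase(block_idx, thread_idx, block_dim, shared, *args)


def load_phase(block_idx, thread_idx, block_dim, shared, data, out):
    # Each thread copies one element into shared memory (0 if out of range).
    i = block_idx * block_dim + thread_idx
    shared[thread_idx] = data[i] if i < len(data) else 0


def reduce_phase(block_idx, thread_idx, block_dim, shared, data, out):
    # After the barrier, thread 0 sums the block's shared memory.
    if thread_idx == 0:
        out[block_idx] = sum(shared)


data = list(range(10))
block_dim = 4
grid_dim = (len(data) + block_dim - 1) // block_dim
out = [0] * grid_dim
launch(grid_dim, block_dim, [load_phase, reduce_phase],
       lambda n: [0] * n, data, out)
print(out)  # [6, 22, 17]
```

Because the blocks do not depend on each other, the outer loop is the natural place to spread work across CPU cores; the sketch keeps it sequential for clarity.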
Posted in Developer Resources, Research | Tags: Compilers, Heterogeneous Computing, NVIDIA CUDA, Ocelot, Papers | 1 Comment
June 6th, 2010
CAPS has recently added an OpenCL code generator to the just-released version 2.3 of its HMPP directive-based hybrid compiler. The CUDA back-end generator has also been enhanced with Fermi capabilities, and the new release supports more native compilers: Intel ifort/icc, GNU gcc/gfortran and PGI pgcc/pgfort. Developers can therefore use their preferred compiler with HMPP 2.3.
Based on GPU programming and tuning directives, HMPP offers an incremental programming model that allows developers with different levels of expertise to fully exploit GPU hardware accelerators in their legacy code.
Posted in Business, Developer Resources | Tags: Compilers, Programming Environments | Write a comment
April 12th, 2010
Abstract:
A new compilation framework enables the execution of numerical-intensive applications, written in Python, on a hybrid execution environment formed by a CPU and a GPU. This compiler automatically computes the set of memory locations that need to be transferred to the GPU, and produces the correct mapping between the CPU and the GPU address spaces. Thus, the programming model implements a virtual shared address space. This framework is implemented as a combination of unPython, an ahead-of-time compiler from Python/NumPy to the C++ programming language, and jit4GPU, a just-in-time compiler to the AMD CAL interface using CAL pixel shaders. Jit4GPU includes an optimizer that performs several loop transformations and reduces the number of texture instructions. Experimental evaluation was done on a Radeon 4850 and demonstrates that for some benchmarks the generated GPU code is 50 times faster than generated OpenMP code. The GPU performance also compares favorably with optimized CPU BLAS code for single-precision computations in most cases. Code transformations performed by Jit4GPU on GPU code were also shown to produce considerable speedup compared to unoptimized GPU code.
(Rahul Garg and José Nelson Amaral: “Compiling Python to a Hybrid Execution Environment”. Third Workshop on General-Purpose Computation on Graphics Processing Units, held in conjunction with ASPLOS XV, Pittsburgh, PA, March 2010.)
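The abstract mentions that the compiler works out which memory locations must be copied to the GPU. As a rough illustration of that kind of analysis, and not of the actual algorithm in jit4GPU, the sketch below computes a bounding index range per array for a loop whose accesses have the form `a[stride*i + offset]`. The function names and the tuple format for accesses are hypothetical.

```python
def touched_range(lo, hi, stride, offset):
    """Inclusive (first, last) indices touched by a[stride*i + offset]
    for i in range(lo, hi), or None if the loop never runs."""
    if hi <= lo:
        return None
    first = stride * lo + offset
    last = stride * (hi - 1) + offset
    return (min(first, last), max(first, last))


def transfer_plan(accesses, lo, hi):
    """Merge the per-access ranges into one bounding range per array."""
    plan = {}
    for name, stride, offset in accesses:
        r = touched_range(lo, hi, stride, offset)
        if r is None:
            continue
        if name in plan:
            r = (min(plan[name][0], r[0]), max(plan[name][1], r[1]))
        plan[name] = r
    return plan


# Loop being analysed (hypothetical example):
#   for i in range(1, 9): c[i] = a[i - 1] + a[i + 1] + b[2 * i]
accesses = [("a", 1, -1), ("a", 1, 1), ("b", 2, 0), ("c", 1, 0)]
plan = transfer_plan(accesses, 1, 9)
print(plan)  # {'a': (0, 9), 'b': (2, 16), 'c': (1, 8)}
```

A bounding range is a conservative choice: for `b`, which is read with stride 2, it also covers elements the loop never touches. A more precise analysis would trade extra bookkeeping for smaller transfers.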
Posted in Research | Tags: AMD CAL, ATI Stream, Compilers, Papers, Python | Write a comment
November 24th, 2009
The Portland Group has announced the general availability of its CUDA Fortran compiler for x64 and x86 processor-based systems running Linux, Mac OS X and Windows, including a 15-day trial license. From the press release:
Developed in collaboration with NVIDIA Corporation (Nasdaq: NVDA), the inventor of the graphics processing unit (GPU), PGI Release 2010 includes the first Fortran compiler compatible with the NVIDIA line of CUDA-enabled GPUs. A compiler is a software tool that translates applications from the high-level programming languages in which they are written by software developers into a binary form a computer can execute.
With developers taking advantage of the hundreds of cores and the relatively low cost of NVIDIA GPUs, programming to take advantage of the CUDA C compiler has become a popular means for accelerating the solution of complex computing problems. The PGI CUDA Fortran compiler is expected to accelerate GPU adoption even further in the High-Performance Computing (HPC) industry, where many important applications are written in Fortran. HPC is the field of technical computing engaged in the modeling and simulation of complex processes, such as ocean modeling, weather forecasting, environmental modeling, seismic analysis, bioinformatics and other areas.
The CUDA Fortran compiler is compatible with all NVIDIA GPUs that include Compute Capability 1.3 or higher, which includes most NVIDIA Quadro Professional Graphics solutions and all NVIDIA Tesla GPU Computing solutions. Developers are invited to download the PGI CUDA Fortran compiler from The Portland Group website at www.pgroup.com/support/downloads.php.
A 15-day trial license is available at no charge. In an effort to simplify adoption, NVIDIA has granted PGI rights to redistribute the relevant components of the CUDA Software Development Kit (SDK) as part of the PGI CUDA Fortran installation package.
Posted in Developer Resources | Tags: Compilers, Fortran, NVIDIA CUDA | Write a comment
September 29th, 2009
A public beta release of the CUDA-enabled Fortran Compiler from PGI enables programmers to write code in Fortran for NVIDIA CUDA GPUs. From a press release:
What: NVIDIA today announced that a public beta release of the PGI® CUDA-enabled Fortran compiler is now available. Developed in collaboration with The Portland Group®, it is the first Fortran compiler compatible with NVIDIA® CUDA™-enabled graphics processing units (GPUs).
A compiler is a software tool that translates applications from the high-level programming languages used by software developers into a binary form a computer can execute.
Why: GPU computing with the CUDA C-compiler has gained significant momentum in the High-Performance Computing (HPC) space as it enables developers to get transformative increases in performance with minimal coding required.
Fortran is particularly well suited to numeric computation and scientific computing and remains widely used in a wide range of applications such as weather modeling, computational fluid dynamics and seismic processing.
Where can I get it?: Read the rest of this entry »
Posted in Developer Resources, Press | Tags: Compilers, Fortran, NVIDIA CUDA, Programming Languages | 2 Comments
June 24th, 2009
Yesterday The Portland Group announced the release of version 9.0 of its Fortran and C compilers with support for GPUs and x64 multi-core CPUs. An introduction to PGI Accelerator Fortran and C programming is available online, as is the PGI Accelerator v1.0 specification. Evaluation copies of the new PGI 9.0 compilers are available from The Portland Group web site. Registration is required.
From the press release:
The use of Graphics Processing Units (GPUs) as general purpose accelerators has been a growing trend in high-performance computing (HPC). Until now, use of GPUs from Fortran applications has been extremely limited. Developers targeting GPU accelerators have had to program in C at a detailed level using sequences of function calls to manage movement of data between the x64 host and GPU, and to offload computations from the host to the GPU. The PGI Accelerator Fortran and C compilers automatically analyze whole program structure and data, split portions of an application between a multi-core x64 CPU and a GPU as specified by user directives, and define and generate a mapping of loops to automatically use the parallel cores, hardware threading capabilities and SIMD vector capabilities of modern GPUs.
Posted in Developer Resources | Tags: C/C++, Compilers, Fortran, NVIDIA CUDA | Write a comment
December 3rd, 2008
At SC08, Aggregate.Org/University of Kentucky demonstrated open source technology for running arbitrary MIMD programs directly on GPUs. There are two environments for MOG (MIMD On GPU): a simulator that interprets the MIMD code, and a “Meta-State Converter” compilation system that performs a state-space transformation of MIMD code into pure (SIMD) native GPU code. With the current version of either, MIMD C code using shared memory communication can do recursion, etc., while running on a CUDA GPU. Support for both C and Fortran, with both shared memory and MPI for communications, and for both NVIDIA CUDA and ATI CAL targets, is planned. The work is very new, but detailed publications, performance benchmarks, and code releases are expected to start appearing by early next year. (MOG at SC08)
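For readers wondering how MIMD code can run on SIMD hardware at all, here is a small Python sketch of the interpreter approach in general terms. It is not MOG's code, and the five-instruction program format is hypothetical. Each processing element (PE) keeps its own program counter, and on every step the interpreter makes one pass per opcode, enabling only the PEs whose current instruction matches.

```python
# Hypothetical MIMD program: each PE sums n, n-1, ..., 1 into acc.
PROGRAM = [
    ("jz", 4),    # 0: if n == 0, jump to halt
    ("addn",),    # 1: acc += n
    ("dec",),     # 2: n -= 1
    ("jmp", 0),   # 3: back to the loop test
    ("halt",),    # 4: stop this PE
]


def run_mimd_on_simd(program, counters):
    n_pe = len(counters)
    pc = [0] * n_pe            # one program counter per PE
    acc = [0] * n_pe
    n = list(counters)
    halted = [False] * n_pe

    while not all(halted):
        # Snapshot the opcode each live PE wants to execute this step.
        current = [None if halted[p] else program[pc[p]][0]
                   for p in range(n_pe)]
        # One SIMD-style pass per distinct opcode.
        for op in sorted(set(current) - {None}):
            for p in range(n_pe):
                if current[p] != op:
                    continue   # this PE is masked off during the pass
                instr = program[pc[p]]
                if op == "jz":
                    pc[p] = instr[1] if n[p] == 0 else pc[p] + 1
                elif op == "addn":
                    acc[p] += n[p]
                    pc[p] += 1
                elif op == "dec":
                    n[p] -= 1
                    pc[p] += 1
                elif op == "jmp":
                    pc[p] = instr[1]
                elif op == "halt":
                    halted[p] = True
                else:
                    raise ValueError("unknown opcode: " + op)
    return acc


result = run_mimd_on_simd(PROGRAM, [0, 1, 2, 3])
print(result)  # [0, 1, 3, 6]
```

The inner loop over PEs stands in for what would be a single masked SIMD operation on a GPU. The cost of this scheme grows with the number of distinct opcodes in flight, which is one reason a compiled approach such as the meta-state conversion mentioned above is attractive.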
Posted in Developer Resources | Tags: Compilers | Write a comment
August 19th, 2004
Linear expressions constitute one of the most basic operations in scientific computations. This paper proposes a SIMD code optimization technique that enables efficient shader code to be generated for evaluating linear expressions. Performance can be improved considerably by efficiently packing arithmetic operations into four-wide SIMD instructions through reordering of the operations in linear expressions. We demonstrate that this technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications. (SIMD Optimization of Linear Expressions for Programmable Graphics Hardware. C. Bajaj, I. Ihm, J. Min, and J. Oh)
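As a toy illustration of packing a linear expression into four-wide operations, the Python sketch below drops zero coefficients and groups the remaining terms four at a time, so each group maps to one four-component dot product. This is a simplified stand-in, not the reordering algorithm from the paper, and `pack_terms`, `dot4` and `evaluate` are hypothetical names.

```python
def pack_terms(coeffs):
    """Drop zero coefficients and group the remaining (index, coeff)
    terms four at a time, padding the last group with zero terms."""
    terms = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    while len(terms) % 4:
        terms.append((0, 0.0))      # padding lane contributes 0 * x[0]
    return [terms[k:k + 4] for k in range(0, len(terms), 4)]


def dot4(a, b):
    """Stand-in for one four-wide dot-product instruction."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def evaluate(groups, x):
    """Evaluate the packed expression with one dot4 per group."""
    total = 0.0
    for group in groups:
        cvec = [c for _, c in group]
        xvec = [x[i] for i, _ in group]   # gather the matching inputs
        total += dot4(cvec, xvec)
    return total


# y = 2*x0 - x3 + 3*x5 + 0.5*x6 + 4*x8, stored with explicit zeros
coeffs = [2.0, 0.0, 0.0, -1.0, 0.0, 3.0, 0.5, 0.0, 4.0]
x = [1.0, 9.0, 9.0, 2.0, 9.0, 3.0, 4.0, 9.0, 5.0]
groups = pack_terms(coeffs)
y = evaluate(groups, x)
print(len(groups), y)  # 2 31.0
```

Here nine scalar terms shrink to two four-wide operations. On real shader hardware the gather step is not free, so how the terms are ordered and where the inputs live matters a great deal.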
Posted in Research | Tags: Compilers, Optimization, Papers | Write a comment