Overview

Architectural Overview

The CU2CL (CUDA-to-OpenCL) translator is implemented as a Clang Tool. The tool interface allows a single invocation of CU2CL to translate all the CUDA source files that make up a complete, fully linked executable, rather than requiring a separate invocation for each translation unit. CU2CL uses Clang's CUDA front end for preprocessing, parsing, and abstract syntax tree (AST) generation, and relies on several of Clang's libraries throughout the translation process. These libraries provide core functionality such as file management (Basic), AST traversal and information retrieval (AST), the tool interface (libTooling), preprocessor token access (Lex), and rewriting mechanisms (Rewrite/Replacements). By building on these libraries, a robust CUDA-to-OpenCL translator was realized in just over 4000 lines of code.

CU2CL Architecture

Translation Process

CU2CL performs a translation process we refer to as "AST-Driven, String-Based Translation". This process uses the AST generated by Clang to identify sections of source code containing CUDA structures that need to be translated. Once such a structure is identified, the translator recurses into each of its components to look for sub-structures that may also need translation. Once all sub-structures have been checked, the translator performs highly localized, string-based rewrites of the CUDA structures from the bottom up.

For example, consider a typical CUDA kernel math function, the native power function __powf.

__powf(x[threadIdx.x], y[threadIdx.y])

CU2CL first identifies that this is a CUDA kernel function that must be translated.

__powf identified as func node

It then identifies that the function has two children, for its first and second parameters.

x[threadIdx.x] and y[threadIdx.y] identified as param nodes

These are standard C array accesses, so the arrays themselves do not need to be translated. However, CU2CL must check whether their indices are CUDA structures. In this case, they are the CUDA-specific threadIdx structures.

threadIdx.x and threadIdx.y identified as struct nodes

Finally, because the array indices are structs, CU2CL recurses one step further to determine which specific fields are referenced.

x and y identified as field nodes

After reaching the AST leaf nodes, the translator must rewrite the text of every node that either is a CUDA structure or contains a child that is one. On returning from each recursive call, the rewritten text of the child nodes is integrated with the rewritten text of the current node and returned to the parent.

So for the native power translation, CU2CL first rewrites the threadIdx fields for each of the arguments.

x and y rewritten to 0 and 1

The rewritten 0 and 1 are then passed upward to their parent nodes, which rewrite each threadIdx structure into the corresponding OpenCL get_local_id call, using the 0 or 1 as the argument to that call.

threadIdx rewritten to get_local_id
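The two rewrite steps above (field to dimension index, then struct to get_local_id call) can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions: the helper names dimensionForField and rewriteThreadIdx are hypothetical and are not taken from the CU2CL sources, which operate on Clang AST nodes rather than on bare field names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the CU2CL sources): maps a threadIdx field
// name to the dimension index expected by OpenCL's get_local_id.
// Returns -1 for anything that is not x, y, or z.
int dimensionForField(const std::string &field) {
    if (field == "x") return 0;
    if (field == "y") return 1;
    if (field == "z") return 2;
    return -1;
}

// Rewrites "threadIdx.<field>" from the bottom up: the leaf (the field) is
// rewritten to a dimension index first, and the parent (the struct
// reference) then wraps that index in a get_local_id call.
std::string rewriteThreadIdx(const std::string &field) {
    const int dim = dimensionForField(field);
    assert(dim >= 0 && "threadIdx only has the fields x, y and z");
    return "get_local_id(" + std::to_string(dim) + ")";
}
```

Under these assumptions, rewriteThreadIdx("x") yields get_local_id(0) and rewriteThreadIdx("y") yields get_local_id(1), matching the walkthrough.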

This rewritten structure is then passed upwards to the parent nodes associated with the x and y arrays. No special rewriting is performed on the arrays themselves, but new strings are returned to the parent node with the original CUDA threadIdx indices replaced.

array indices rewritten

Finally, after receiving the rewritten strings for both function parameters, CU2CL translates the __powf function itself.

function rewritten
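The whole bottom-up pass for this expression can be approximated with a toy tree in standard C++. The sketch below is not CU2CL's implementation: the Node type and its Kind values are hypothetical stand-ins for Clang's AST classes, and the choice of native_powr as the OpenCL replacement for __powf is an assumption, since this page does not name the target function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy node kinds mirroring the walkthrough (hypothetical, not Clang's AST).
enum class Kind { Call, ArrayAccess, ThreadIdxField };

struct Node {
    Kind kind;
    std::string text;            // callee name, array name, or field name
    std::vector<Node> children;  // call arguments or the index expression
};

// Recursively rewrites a node: children are rewritten first, and their
// strings are spliced into the string produced for the current node.
std::string rewrite(const Node &n) {
    switch (n.kind) {
    case Kind::ThreadIdxField: {
        // Leaf: field -> dimension index, wrapped in get_local_id.
        const std::string dim = n.text == "x" ? "0" : n.text == "y" ? "1" : "2";
        return "get_local_id(" + dim + ")";
    }
    case Kind::ArrayAccess:
        // The array itself is plain C; only its index text is replaced.
        assert(n.children.size() == 1);
        return n.text + "[" + rewrite(n.children[0]) + "]";
    case Kind::Call: {
        std::string args;
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i != 0) args += ", ";
            args += rewrite(n.children[i]);
        }
        // Assumption: __powf maps to OpenCL's native_powr.
        const std::string callee = n.text == "__powf" ? "native_powr" : n.text;
        return callee + "(" + args + ")";
    }
    }
    return n.text;  // not reached for the kinds above
}

// Builds the tree for __powf(x[threadIdx.x], y[threadIdx.y]).
Node buildExample() {
    Node fieldX{Kind::ThreadIdxField, "x", {}};
    Node fieldY{Kind::ThreadIdxField, "y", {}};
    Node arrayX{Kind::ArrayAccess, "x", {fieldX}};
    Node arrayY{Kind::ArrayAccess, "y", {fieldY}};
    return Node{Kind::Call, "__powf", {arrayX, arrayY}};
}
```

Calling rewrite(buildExample()) produces the string native_powr(x[get_local_id(0)], y[get_local_id(1)]). Each node only ever concatenates strings returned by its children, which is what keeps the rewrite local to the translated structure.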

Once the CUDA structure is fully translated, it is inserted into the source code as a direct replacement for the original structure, without modifying the surrounding code. This highly localized translation mechanism preserves the original formatting and comments in the source code, significantly simplifying maintenance of the automatically translated OpenCL code.

finished translation
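To illustrate why a range-limited replacement leaves formatting and comments intact, here is a minimal sketch in standard C++. It makes several assumptions: replaceRange and translateLine are hypothetical names, the text search stands in for the source ranges that CU2CL would obtain from the AST, and native_powr is assumed as the OpenCL counterpart of __powf because this page does not name it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces exactly the characters [offset, offset + length) of `source`
// with `replacement`; everything outside that range is copied unchanged.
std::string replaceRange(const std::string &source, std::size_t offset,
                         std::size_t length, const std::string &replacement) {
    assert(offset + length <= source.size());
    std::string out = source;
    out.replace(offset, length, replacement);
    return out;
}

// Stand-in for the final step of the walkthrough. A real translator would
// take the range from AST source locations; a plain text search is used
// here only to keep the sketch self-contained.
std::string translateLine(const std::string &line) {
    const std::string cuda = "__powf(x[threadIdx.x], y[threadIdx.y])";
    const std::string opencl =
        "native_powr(x[get_local_id(0)], y[get_local_id(1)])";  // assumed target
    const std::size_t pos = line.find(cuda);
    if (pos == std::string::npos) return line;  // nothing to translate
    return replaceRange(line, pos, cuda.size(), opencl);
}
```

Given a line such as `    r[i] = __powf(x[threadIdx.x], y[threadIdx.y]);  /* fast pow */`, only the call expression changes; the indentation, the assignment, and the trailing comment pass through untouched.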

Recent News

CU2CL 0.8.0b Released

03/21/17: We are pleased to announce the release of the 0.8.0b version of CU2CL. The whole program translation architecture debuted in CU2CL 0.7.0b has been expanded to include our first cross-AST translation and type propagations. A binary tarball is available with registration here and the full source is available on GitHub.

CU2CL at SC'16 Emerging Technologies Showcase

09/27/16: Sathre, Gardner and Feng have been selected to demonstrate CU2CL's effectiveness during the Emerging Technologies Showcase at Supercomputing'16 in Salt Lake City.
