The CU2CL (CUDA-to-OpenCL) translator is implemented as a Clang Tool. The tool interface allows a single invocation of CU2CL to translate all the CUDA source files that make up a complete, fully-linked executable, rather than requiring a separate invocation for each translation unit. CU2CL uses Clang's CUDA front-end for preprocessing, parsing, and abstract syntax tree (AST) generation, and relies on several of Clang's libraries throughout the translation process. These libraries provide core functionality such as file management (Basic), AST traversal and information retrieval (AST), the tool interface (libTooling), preprocessor token access (Lex), and rewriting mechanisms (Rewrite/Replacements). Building on these libraries made it possible to realize a robust CUDA-to-OpenCL translator in just over 4000 lines of code.
CU2CL performs a translation process we refer to as "AST-Driven, String-Based Translation". This process uses the AST generated by Clang to identify the sections of source code that contain CUDA structures in need of translation. Once such a structure is identified, the translator recurses into each of its components to look for further sub-structures that may also need translation. Once all sub-structures have been checked, the translator performs highly localized, string-based rewrites of the CUDA structures from the bottom up.
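For example, consider a typical CUDA kernel math function, the native power function __powf, called on two array elements indexed by thread ID, as in a call of the form __powf(x[threadIdx.x], y[threadIdx.y]).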
CU2CL first identifies that this is a call to a CUDA-specific function that must be translated.
It then identifies that the call has two children, one for each of its two arguments.
These are standard C arrays, so they do not need to be translated themselves. However, CU2CL must check whether their indices are CUDA structures. In this case, they are the CUDA-specific threadIdx structures.
Finally, because the array indices are structs, CU2CL recurses one step further to determine which specific fields are referenced.
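The descent described above can be sketched on a toy tree. This is only an illustration, not CU2CL's code: the Node class, the CUDA_NAMES set, and the helper names are hypothetical stand-ins for Clang's AST classes, and the expression __powf(x[threadIdx.x], y[threadIdx.y]) is assumed from the names used in the walkthrough.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an AST node (not a Clang class)."""
    kind: str                      # "call", "index", "struct" or "field"
    text: str                      # identifier carried by the node
    children: list = field(default_factory=list)


# Assumed set of names this toy walker treats as CUDA-specific.
CUDA_NAMES = {"__powf", "threadIdx"}


def find_cuda_nodes(node, depth=0, found=None):
    """Depth-first descent recording (depth, kind, text) for CUDA nodes."""
    if found is None:
        found = []
    if node.text in CUDA_NAMES:
        found.append((depth, node.kind, node.text))
    for child in node.children:
        find_cuda_nodes(child, depth + 1, found)
    return found


def indexed(array, fld):
    """Build the toy tree for array[threadIdx.<fld>]."""
    struct = Node("struct", "threadIdx", [Node("field", fld)])
    return Node("index", array, [struct])


# Toy tree for the assumed expression __powf(x[threadIdx.x], y[threadIdx.y]).
expr = Node("call", "__powf", [indexed("x", "x"), indexed("y", "y")])
hits = find_cuda_nodes(expr)
for depth, kind, text in hits:
    print("  " * depth + kind + ": " + text)
```

The walker reports the call at the root and one threadIdx struct under each array index. The arrays themselves are visited but not flagged, matching the description that plain C arrays need no translation while their indices must still be checked.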
After reaching the AST leaf nodes, the translator rewrites the text of every node that either is a CUDA structure or contains a child that is a CUDA structure. On returning from each recursive call, the rewritten text of the child nodes is integrated with the rewritten text of the current node, and the result is returned to the parent.
For the native power translation, CU2CL first rewrites the threadIdx field referenced in each of the arguments, producing the dimension indices zero and one.
The rewritten zero and one are then passed upward to their parent nodes, which rewrite the threadIdx structure into the corresponding OpenCL get_local_id call, using the zero and one as the arguments to the call.
This rewritten structure is then passed upwards to the parent nodes associated with the x and y arrays. No special rewriting is performed on the arrays themselves, but new strings are returned to the parent node with the original CUDA threadIdx indices replaced.
Finally, after receiving the rewritten strings for both function arguments, CU2CL translates the __powf function itself.
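The bottom-up pass can be illustrated with a small recursive function that returns a rewritten string from every node. Again this is a sketch under stated assumptions rather than CU2CL's implementation: the Node class and the mapping tables are hypothetical, the input expression is reconstructed from the walkthrough, and the target name native_powr is an assumption based on the phrase "native power" and the OpenCL built-in of that name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an AST node (not a Clang class)."""
    kind: str                      # "call", "index", "struct" or "field"
    text: str
    children: list = field(default_factory=list)


FIELD_TO_DIM = {"x": "0", "y": "1", "z": "2"}     # struct field -> dimension index
STRUCT_TO_CALL = {"threadIdx": "get_local_id"}    # mapping named in the text
CALL_TO_CALL = {"__powf": "native_powr"}          # assumed OpenCL counterpart


def translate(node):
    """Rewrite children first, then merge their strings into this node's text."""
    parts = [translate(child) for child in node.children]
    if node.kind == "field":
        return FIELD_TO_DIM[node.text]
    if node.kind == "struct":
        return STRUCT_TO_CALL[node.text] + "(" + parts[0] + ")"
    if node.kind == "index":
        # Plain C array: keep its name, splice in the rewritten index.
        return node.text + "[" + parts[0] + "]"
    if node.kind == "call":
        name = CALL_TO_CALL.get(node.text, node.text)
        return name + "(" + ", ".join(parts) + ")"
    raise ValueError("unknown node kind: " + node.kind)


def indexed(array, fld):
    """Build the toy tree for array[threadIdx.<fld>]."""
    struct = Node("struct", "threadIdx", [Node("field", fld)])
    return Node("index", array, [struct])


expr = Node("call", "__powf", [indexed("x", "x"), indexed("y", "y")])
result = translate(expr)
print(result)
```

Each level only concatenates the strings handed back by its children with its own rewritten name, which is the same order of operations as the walkthrough: fields, then the threadIdx structures, then the array accesses, and finally the call.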
Once the CUDA structure is fully translated, it is inserted into the source code as a direct replacement for the original structure, without modifying the surrounding code. Because the translation is this highly localized, the original formatting and comments in the source code are preserved, which significantly simplifies maintenance of the automatically translated OpenCL code.
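A rough sketch of such a localized replacement follows. It is not Clang's Rewrite or Replacements API: apply_replacements is a hypothetical helper, the sample source line is invented for illustration, and the offsets are found with a string search where a real tool would take them from AST source ranges. The native_powr target name is likewise an assumption.

```python
def apply_replacements(source, replacements):
    """Apply non-overlapping (start, end, new_text) edits to source.

    Offsets refer to the original text, so edits are applied from the
    end of the string backwards to keep earlier offsets valid.
    """
    for start, end, new_text in sorted(replacements, reverse=True):
        source = source[:start] + new_text + source[end:]
    return source


# Invented sample line: one CUDA expression with code and a comment around it.
cuda_line = "float r = __powf(x[threadIdx.x], y[threadIdx.y]);  // per-thread power"
old = "__powf(x[threadIdx.x], y[threadIdx.y])"
new = "native_powr(x[get_local_id(0)], y[get_local_id(1)])"

start = cuda_line.index(old)       # stand-in for an AST source location
translated = apply_replacements(cuda_line, [(start, start + len(old), new)])
print(translated)
```

Only the characters inside the replaced range change. The declaration before the expression, the spacing, and the trailing comment are carried over byte for byte, which is the property that keeps the translated source maintainable.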
As thanks for supporting our work, sponsors receive early access to all major releases. If you are interested in becoming a sponsor, please contact us.