Automating CUDA-to-OpenCL Translation

  • If I have seen further it is by standing on the shoulders of giants — Isaac Newton


Release Notes

CU2CL Beta Release 0.7.0b - 2016.11.13

Welcome to the CU2CL (CUDA-to-OpenCL) source-to-source translator project. CU2CL is a work-in-progress academic prototype translator that endeavors to provide translation of many of the most frequently used CUDA features. However, CUDA is a large, moving target and CU2CL may not support all syntax and API elements of the current CUDA specification.

Currently, it targets OpenCL 1.0 as the output standard. Further, it aims to provide easily maintained translated source by preserving formatting and developer comments from the original CUDA, as well as injecting searchable comment tags for untranslated structures. (These are intended to aid manual post-translation intervention.)

The tool translates both host- and device-side code, generating separate families of output files suitable for a standard C++ compiler (*-cl.cpp/h files), as well as an OpenCL kernel compiler (* files). The majority of supported translations are focused on core Runtime API calls for memory, device, thread, stream, and event management as well as the CUDA C-style kernel invocation and full kernel function translation.
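
As an illustration of the kernel invocation translation mentioned above, the sketch below shows one general difference between the two launch models: CUDA specifies a grid in blocks, while OpenCL's clEnqueueNDRangeKernel takes the total number of work-items. The LaunchDims type and both function names are hypothetical and are not part of CU2CL's output. This is only a one-dimensional sketch of the arithmetic, not the translator's generated code.

```cpp
#include <cstddef>

// Hypothetical holder for the two sizes an OpenCL NDRange launch needs.
struct LaunchDims {
    std::size_t global;  // total work-items (OpenCL global work size)
    std::size_t local;   // work-items per work-group (OpenCL local work size)
};

// CUDA: kernel<<<gridDim, blockDim>>>(...) counts blocks and threads per block.
// OpenCL: the global size counts every work-item, so it is gridDim * blockDim.
constexpr LaunchDims toNDRange(std::size_t gridDim, std::size_t blockDim) {
    return LaunchDims{gridDim * blockDim, blockDim};
}

// The common CUDA index blockIdx.x * blockDim.x + threadIdx.x corresponds to
// get_global_id(0) in OpenCL when no global offset is used.
constexpr std::size_t flatIndex(std::size_t blockIdx, std::size_t blockDim,
                                std::size_t threadIdx) {
    return blockIdx * blockDim + threadIdx;
}
```

For example, a CUDA launch of 4 blocks of 256 threads corresponds to an OpenCL global size of 1024 and a local size of 256.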

Additionally, the 0.7.0b release has added a range of new features to support global translation of large, multi-file codebases. Please see the full release notes for more information.

This release currently has limited or no support for several features including (but not limited to):

  • CUDA Textures and Surfaces
  • CUDA/OpenGL Interoperability
  • C++ Templates in kernel code
  • CUDA Driver API
  • CUDA Runtime API features newer than 3.2
  • CUDA Kernel features newer than 3.2
  • Precompiled CUDA libraries

We have endeavored to provide translation-time warnings for as many of these cases as possible, but the tool is undergoing active development and the CUDA specification is a moving target so there are likely cases we've missed. There is also a potential for the translator to produce small mistranslations in the presence of atypical usage of CUDA syntax. There will be bugs, and we'd love to hear about them so that we can improve the tool. Contact us!


Currently, CU2CL is only tested and officially supported on 64-bit variants of Ubuntu Linux (14.04 and later) and CentOS 6.5, but it is likely to work on other 64-bit Debian and RHEL variants.

  • make
  • cmake >= 2.8
  • wget
  • CUDA Runtime >=3.2
  • GNU C Library (glibc) >= 2.11.1

Ensure you have internet access as the install process will download the specific LLVM/Clang revision used to build the CU2CL binary.

Unpack this tarball, and invoke the BASH script to start the install process. The script will use wget to download the necessary revision of the LLVM/Clang source, and unpack it to subdirectory “llvm-3.4” and create the build directory “llvm-build”. It will then configure cmake and build the LLVM/Clang binaries required by CU2CL. Compiling Clang can take some time, so by default the installation script is configured to use two make jobs for compilation. To change this behavior, edit line 23 of the script “make -j2” to reflect the desired number of jobs.

The script then symbolically links a small number of include files generated during the LLVM/Clang build so that they are locatable by the CU2CL build process. It then configures cmake to build CU2CL and performs the build. CU2CL is a single-source tool, so no threaded make is necessary for this step.

After compiling LLVM/Clang and CU2CL, the installation should be complete. The binary cu2cl-tool in directory "cu2cl-build" can now be used to perform translations. Do not remove the "llvm-build" directory as CU2CL needs access to (If you have a local packaged install of this library, you may be able to use that, however this is untested.)


Once built, the cu2cl_tool binary can be used to translate sets of CUDA source files, along with any *.cu or *.cuh source files they include. However, it will not provide translation of any included headers specified with the system include "<...>" syntax. After successful translation, CU2CL will output *-cl.cpp, *, and *-cl.h files as appropriate for host, device, and header code. (*-cl.h headers will only be generated from *.cuh input.)

In order to run the tool the following syntax must be used:

"./cu2cl_tool fooRuntimeMain.cpp -- -DGPU_ON -I ./fooIncludes" 

This would generate fooRuntimeMain.cpp-cl.cpp and any associated translated device and header files.

Additionally, a set of CU2CL utility functions will be generated in cu2cl_util.c/h/cl. cu2cl_util.c must be compiled and linked into the finished executable for the linking to succeed, as it includes requisite initialization, cleanup, and other OpenCL utility functions.

CU2CL will not attempt to translate any included *.c, *.h, *.cpp, *.hpp, or other source file types, regardless of whether they contain CUDA syntax (variables, runtime function calls, or special syntax); ONLY *.cu and *.cuh includes are translated. It will also not attempt to translate any includes specified with the #include <...> syntax reserved for system headers; project headers should use the #include "..." syntax. Finally, it does not support the CUDA SDK Samples' shrUtils or cutils, and will likely emit malformed source code if they are present. Please manually refactor your code to handle these constraints before attempting translation.
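
The include rules above can be summarized as a small predicate, which may be handy as a pre-translation check on a codebase. This is a sketch of the rule as stated here, not code taken from CU2CL; the function names endsWith and includeWouldBeTranslated are hypothetical.

```cpp
#include <string>

// True if `name` ends with `suffix`.
static bool endsWith(const std::string& name, const std::string& suffix) {
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Applies the stated rule to an *included* file: only *.cu and *.cuh files
// pulled in with the double-quote syntax are candidates for translation.
// `angleBrackets` is true for #include <...> and false for #include "...".
bool includeWouldBeTranslated(const std::string& file, bool angleBrackets) {
    if (angleBrackets) {
        return false;  // treated as a system header, never translated
    }
    return endsWith(file, ".cu") || endsWith(file, ".cuh");
}
```

Under this rule, "kernels.cuh" included with double quotes qualifies, while the same file included with angle brackets, or a quoted "kernels.hpp", does not.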


I've included a (*.h, *.c, *.hpp, *.cpp) source file which contains CUDA, but I don't see any translator output!

While CU2CL will translate these file types if specified as the main translation file, it will not translate these types if they are specified as headers. Please convert them to *.cu or *.cuh types as appropriate if you wish them to be translated as includes.

I have CUDA syntax in my header files but I don't see any translator output!

CU2CL will not translate headers specified using the angle brace syntax reserved for system header files, only those using the appropriate double quotes syntax for locally-included files.

i.e. NOT

#include <localCUDAheader.cuh>

but instead

#include "localCUDAheader.cuh"

My code uses NVIDIA's cUtils/shrUtils, but the translated code looks broken.

CU2CL does not support translation of these non-canonical CUDA libraries, and we do not intend to. Please manually refactor your code to not rely on any of these non-canonical functions.

My code uses macro FOO(a, b), why does it look broken in output OpenCL?

CU2CL's macro support is still rather basic. We are working on improving its ability to handle preprocessor directives, but in the meantime, factoring out preprocessor directives as much as possible will dramatically improve the correctness of output.

Why are sections of my code surrounded by #ifdef or #ifndef not translated?

CU2CL acts like a compiler and will only provide translation of the conditionally compiled regions defined at translation time. If you wish to translate other regions, preprocessor definitions can be specified to the translation script.

i.e. "./cu2cl_tool -- -D SOMEVAR=1 -I ./"

My CUDA device variable that's declared in header file "foo.h" isn't translated, even though its initialization in another file is. What's going on?

Rewriting the cl_mem type for device variables across files still has some issues. To improve translation, move global device variable declarations as close to their cudaMalloc and kernel functions as possible.

My well-packed 3-member vector type is taking up too much memory in OpenCL!

CU2CL currently targets OpenCL 1.0 compatibility which does not provide a three-member vector type. Therefore these are automatically upconverted to four-member vectors, which may cause issues with data alignment.
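
The memory cost of this widening can be seen on the host side. The sketch below uses hypothetical Float3 and Float4 structs as stand-ins for a packed CUDA float3 and a 16-byte four-member OpenCL vector; the widen and repack helpers are illustrative names, not functions provided by CU2CL.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins for a packed three-member vector and
// the four-member vector it is widened to.
struct Float3 { float x, y, z; };
struct alignas(16) Float4 { float x, y, z, w; };

// Widen one element, zero-filling the unused fourth lane.
constexpr Float4 widen(Float3 v) { return Float4{v.x, v.y, v.z, 0.0f}; }

// Repack a tightly packed host array so its layout matches the widened type.
inline std::vector<Float4> repack(const std::vector<Float3>& in) {
    std::vector<Float4> out;
    out.reserve(in.size());
    for (const Float3& v : in) {
        out.push_back(widen(v));
    }
    return out;
}
```

With 4-byte floats, each element grows from 12 to 16 bytes, so a buffer of n elements needs one third more memory, and host arrays that were copied as raw packed bytes would need repacking along these lines to match the widened layout.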

The templates/classes in my kernel code aren't translated!

OpenCL does not support C++ in device code, and CU2CL does not support translation of these features to C. Please manually remove all C++ from device code before translation.
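
One common way to do this manual refactoring is to replace each template with plain functions for the types the kernels actually use. The sketch below is shown as host C++ so that it compiles on its own; the axpy helper and the _f / _i suffix convention are hypothetical, and device qualifiers are omitted.

```cpp
// Before: a templated helper, which cannot appear in OpenCL C device code.
template <typename T>
T axpy(T a, T x, T y) { return a * x + y; }

// After: one plain C-style function per type that was instantiated.
// These bodies use no C++ features, so they can be moved into device code
// before translation.
float axpy_f(float a, float x, float y) { return a * x + y; }
int   axpy_i(int a, int x, int y)       { return a * x + y; }
```

Call sites such as axpy<float>(a, x, y) would then be rewritten by hand to axpy_f(a, x, y) before running the translator.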

The CUDA library my application relies on doesn't appear to be used anymore.

CU2CL cannot translate precompiled CUDA libraries. Manually convert your OpenCL code to use an equivalent OpenCL library after translation.

My OpenCL code is trying to call a nonexistent kernel function! / Invoking a CUDA function pointer doesn't appear to be working after translation.

We do not currently support recognizing CUDA kernel function pointers. You will need to manually adapt your OpenCL code to emulate this behavior.

My OpenCL kernel code is trying to include system headers! Why is it doing that?

This kernel code likely resided in a CUDA file which included these system headers. They were copied without modification into the OpenCL device source code file. Please manually remove them.

I'm getting errors about "invalid output constraint '=h' in asm" in CUDA headers from CUDA versions >= 5.0 and can't proceed with translation!

Some versions of Clang 3.2 have difficulty parsing some inline assembly from the CUDA headers (from CUDA versions >= 5.0). A workaround that allows translation to proceed in most cases (those not requiring these specialized intrinsics for Compute Capability > 3.0) is to nullify these headers with a #define.

Try adding the "-D __SM_32_INTRINSICS_H__" override to the CU2CL invocation like so:

i.e. "./cu2cl_tool -- -D __SM_32_INTRINSICS_H__ -I ./"

One warning: if your CUDA code actually makes use of these intrinsics, the translation behavior after this workaround is essentially undefined.
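
The workaround relies on how include guards behave: if the guard macro is already defined, the preprocessor skips the guarded body entirely. The single-file sketch below imitates this, assuming the header uses a conventional #ifndef guard. DEMO_INTRINSICS_H and the other names are hypothetical stand-ins, and the "header" is pasted inline so the example is self-contained.

```cpp
// Equivalent in effect to passing -D DEMO_INTRINSICS_H on the command line.
#define DEMO_INTRINSICS_H

// --- contents of a guarded header, pasted inline for illustration ---
#ifndef DEMO_INTRINSICS_H
#define DEMO_INTRINSICS_H
#define DEMO_INTRINSICS_PARSED 1  // stands in for the problematic declarations
#endif
// --- end of header contents ---

// Record whether the guarded body was seen by the preprocessor.
#ifdef DEMO_INTRINSICS_PARSED
constexpr bool kHeaderBodySeen = true;
#else
constexpr bool kHeaderBodySeen = false;
#endif
```

Because the guard was predefined, kHeaderBodySeen is false: nothing inside the guard is parsed, which is also why any code that depends on those declarations can no longer be translated reliably.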


Xilinx Harris

As a thanks for supporting our work, sponsors receive early access to all major releases. If you are interested in becoming a sponsor, please contact us.

Recent News

CU2CL 0.7.0b Released

11/13/16: We are pleased to announce the release of the LGPL-licensed 0.7.0b version of CU2CL. Based on Clang 3.4+'s libTooling interface, this new rebuild supports higher-fidelity whole program translation in a single invocation. A binary tarball is available with registration here and the full source is available on GitHub.

CU2CL at SC'16 Emerging Technologies Showcase

09/27/16: Sathre, Gardner and Feng have been selected to demonstrate CU2CL's effectiveness during the Emerging Technologies Showcase at Supercomputing'16 in Salt Lake City.
