This chapter discusses some implementation details of GPU Ocelot, particularly the implementation of the PTX emulator, and how GPU Ocelot may be used to prototype, debug, and tune CUDA applications for efficient execution on GPUs. Additionally, the chapter discusses GPU Ocelot's role as a dynamic compilation framework for heterogeneous many-core compute systems that combine GPUs and multicore CPUs. It explains how users can benefit from the application profiling and correctness tools built into Ocelot, and how to extend Ocelot's trace generator interface to perform custom workload characterization and profiling.

Ocelot's support for efficient execution on multicore CPUs has enabled research in heterogeneous computing, such as predictive performance modeling, as well as research in optimization techniques for data-parallel workloads. To enable the efficient migration of existing CUDA applications across diverse many-core architectures, Ocelot implements a set of translation techniques for mapping low-level GPU-specific operations to many-core architectures, so that CUDA applications run efficiently on generic architectures.

To use Ocelot, a developer links their compiled CUDA application against Ocelot's static library instead of NVIDIA's libcudart, which makes integration with existing compiled applications seamless. GPU Ocelot is tested for correctness against the CUDA SDK, the Parboil benchmark suite, and the Thrust unit tests, and is currently part of the development toolchains of several GPU-computing projects.
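To make the idea of an emulator with a pluggable trace interface more concrete, here is a minimal C++ sketch. It is not Ocelot's actual API: `TraceObserver`, `OpcodeHistogram`, `ToyEmulator`, and `emulatedVectorAdd` are hypothetical names invented for illustration, and the "kernel" is an ordinary C++ lambda that reports PTX-like opcode names by hand rather than real PTX being interpreted. The sketch only shows the general pattern: the emulator runs the kernel body once per thread on the CPU and forwards each reported event to any attached observer, which can then build a custom workload profile.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// All names below are hypothetical; this is NOT Ocelot's real interface.

// One record per emulated event, loosely analogous to a trace event.
struct TraceEvent {
    std::string opcode;  // PTX-like opcode name, e.g. "ld.global"
    unsigned threadId;   // flattened index of the emulated thread
};

// Observer interface that a custom profiling tool would implement.
class TraceObserver {
public:
    virtual ~TraceObserver() = default;
    virtual void onKernelLaunch(const std::string& /*kernelName*/) {}
    virtual void onEvent(const TraceEvent& event) = 0;
    virtual void onKernelFinish() {}
};

// Example workload characterization: count how often each opcode occurs.
class OpcodeHistogram : public TraceObserver {
public:
    void onEvent(const TraceEvent& event) override { ++counts[event.opcode]; }
    std::map<std::string, std::size_t> counts;
};

// Toy emulator: runs the kernel body once per thread, serially, on the CPU.
class ToyEmulator {
public:
    using Emit = std::function<void(const std::string&)>;
    using KernelBody = std::function<void(unsigned, const Emit&)>;

    void attach(TraceObserver* observer) { observers_.push_back(observer); }

    void launch(const std::string& name, unsigned threadCount,
                const KernelBody& body) {
        for (TraceObserver* o : observers_) o->onKernelLaunch(name);
        for (unsigned tid = 0; tid < threadCount; ++tid) {
            // The kernel body calls emit() to report each emulated operation.
            Emit emit = [this, tid](const std::string& opcode) {
                TraceEvent event{opcode, tid};
                for (TraceObserver* o : observers_) o->onEvent(event);
            };
            body(tid, emit);
        }
        for (TraceObserver* o : observers_) o->onKernelFinish();
    }

private:
    std::vector<TraceObserver*> observers_;
};

// Emulated vector addition: one "thread" per element, with tracing.
std::vector<float> emulatedVectorAdd(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     TraceObserver* observer) {
    assert(a.size() == b.size());  // inputs must have equal length
    std::vector<float> c(a.size());
    ToyEmulator emulator;
    if (observer != nullptr) emulator.attach(observer);
    emulator.launch("vectorAdd", static_cast<unsigned>(a.size()),
        [&](unsigned tid, const ToyEmulator::Emit& emit) {
            emit("ld.global"); float x = a[tid];
            emit("ld.global"); float y = b[tid];
            emit("add.f32");   float sum = x + y;
            emit("st.global"); c[tid] = sum;
        });
    return c;
}
```

Calling `emulatedVectorAdd` with an `OpcodeHistogram` attached returns the element-wise sum and leaves per-opcode counts in the histogram's `counts` map. Keeping the profiler behind an observer interface means new analyses can be added without touching the emulator loop, which is the design idea the trace generator discussion above points at.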