CUDA- and NUMA-Aware Multi-CPU / Multi-GPU communication benchmarks for C3SR Scope.
Leverage the TensorCore in modern GPUs and DNN accelerators to implement high-performance reduction and scan primitives.
A hardware- and software-agnostic, extensible, and customizable platform for evaluating and profiling ML models across datasets, frameworks, and hardware, and within AI application pipelines.
Cognitive Application Builder