high performance computing on graphics processing units: hgpu.org

Dragon-Alpha&cu32: A Java-based Tensor Computing Framework With its High-Performance CUDA Library

Zhiyi Zhang, Pengfei Zhang, Qi Wang

Hefei Institutes of Physical Science, Chinese Academy of Sciences

arXiv:2305.08819 [cs.LG], (15 May 2023)

DOI:10.48550/arXiv.2305.08819

@misc{zhang2023dragonalphacu32,
  title={Dragon-Alpha&cu32: A Java-based Tensor Computing Framework With its High-Performance CUDA Library},
  author={Zhiyi Zhang and Pengfei Zhang and Qi Wang},
  year={2023},
  eprint={2305.08819},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Java is very powerful, but its capabilities in the deep learning field have probably not been sufficiently exploited. Compared to Java-based deep learning frameworks, Python-based ones (PyTorch, TensorFlow, etc.) are undoubtedly the mainstream, owing to their ease of use, flexibility, and better ecosystems. Dragon-Alpha is a Java-based tensor computing framework designed for ease of use, high scalability, and high performance; it tries to break Java's dilemma in the deep learning field and make Java more effective there. Dragon-Alpha offers APIs at different levels and can be used as a deep learning framework through its user-friendly high-level APIs. Based on its multi-layer architecture and Java's big-data ecosystem, Dragon-Alpha has the potential to aggregate computing power across heterogeneous platforms and devices. Dragon-Alpha provides asynchronous APIs to improve parallelism, as well as a highly optimized CUDA library, cu32, which adopts unique convolution/deconvolution operators for small feature maps. The experiments show that, compared to PyTorch&cuDNN, Dragon-Alpha&cu32 costs less time and memory (time: 75.38% to 97.32%; memory: 29.2% to 66.4%) to train some typical neural networks (AlexNet, VGG, GoogleNet, ResNet) on CIFAR-10.
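The abstract credits asynchronous APIs with improving parallelism. The general idea can be sketched with the JDK's own `CompletableFuture`: each call returns a future immediately, so independent operations overlap, and the caller synchronizes only when it needs the combined result. This is a minimal sketch under stated assumptions, not Dragon-Alpha's API: the class `AsyncSketch` and the methods `scale`, `scaleAsync`, and `addScaled` are hypothetical, and a CPU thread pool stands in for whatever device streams the framework actually uses.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical illustration of an asynchronous-API style.
 * NOT Dragon-Alpha's API: all class and method names here are invented.
 */
public class AsyncSketch {
    // Stand-in for a device kernel: multiplies every element by a factor.
    static float[] scale(float[] x, float factor) {
        float[] y = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = x[i] * factor;
        }
        return y;
    }

    // Asynchronous wrapper: submits the work to the pool and returns a future
    // without blocking the caller.
    static CompletableFuture<float[]> scaleAsync(float[] x, float factor, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> scale(x, factor), pool);
    }

    // Launches two independent operations, then waits once (join) for both
    // results and adds them element-wise.
    static float[] addScaled(float[] a, float[] b, ExecutorService pool) {
        CompletableFuture<float[]> fa = scaleAsync(a, 2.0f, pool);
        CompletableFuture<float[]> fb = scaleAsync(b, 3.0f, pool);
        return fa.thenCombine(fb, (x, y) -> {
            float[] out = new float[x.length];
            for (int i = 0; i < x.length; i++) {
                out[i] = x[i] + y[i];
            }
            return out;
        }).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            float[] r = addScaled(new float[]{1f, 2f}, new float[]{10f, 20f}, pool);
            System.out.println(r[0] + " " + r[1]); // prints "32.0 64.0"
        } finally {
            pool.shutdown();
        }
    }
}
```

The design choice illustrated here is deferring synchronization: the two `scaleAsync` calls are issued back to back and may run concurrently, and the single `join()` is the only point where the caller blocks.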

Tags: Computer science, CUDA, Deep learning, Heterogeneous systems, Java, nVidia, nVidia GeForce RTX 3060 Ti, Package

May 21, 2023 by hgpu
