TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory
Inria and École Normale Supérieure, Paris, France
hal-02441163 (16 January 2020)
@inproceedings{drebes2020tc,
  title     = {TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory},
  author    = {Drebes, Andi and Chelini, Lorenzo and Zinenko, Oleksandr and Cohen, Albert and Corporaal, Henk and Grosser, Tobias and Vadivel, Kanishkan and Vasilache, Nicolas},
  booktitle = {IMPACT 2020 - 10th International Workshop on Polyhedral Compilation Techniques},
  year      = {2020}
}
Memristor-based, non-von-Neumann architectures performing tensor operations directly in memory are a promising approach to address the ever-increasing demand for energy-efficient, high-throughput hardware accelerators for Machine Learning (ML) inference. A major challenge for the programmability and exploitation of such Computing-In-Memory (CIM) architectures is the efficient mapping of tensor operations from high-level ML frameworks to fixed-function hardware blocks implementing in-memory computations. We demonstrate the programmability of memristor-based accelerators with TC-CIM, a fully automatic, end-to-end compilation flow from Tensor Comprehensions, a mathematical notation for tensor operations, to fixed-function memristor-based hardware blocks. Operations suitable for acceleration are identified using Loop Tactics, a declarative framework for describing computational patterns in a polyhedral representation. We evaluate our compilation flow on a Gem5-based system-level simulator incorporating crossbar arrays of memristive devices. Our results show that TC-CIM reliably recognizes tensor operations commonly used in ML workloads across multiple benchmarks and offloads them to the accelerator.
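To give a sense of the input notation, a matrix multiplication expressed in Tensor Comprehensions looks roughly as follows; this is an illustrative sketch based on the publicly documented TC syntax, not an excerpt from the paper:

def matmul(float(M,K) A, float(K,N) B) -> (C) {
    C(m,n) +=! A(m,k) * B(k,n)
}

Here +=! denotes a reduction over the unbound index k with (re-)initialization of the accumulator. It is contractions of this kind that a flow such as TC-CIM must recognize, via Loop Tactics pattern matching on the polyhedral representation, and offload to the memristive crossbar instead of lowering them to ordinary CPU loop nests.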
February 9, 2020 by hgpu