18463

Posts

Sep, 16

Using the Tsetlin Machine to Learn Human-Interpretable Rules for High-Accuracy Text Categorization with Medical Applications

Medical applications challenge today’s text categorization techniques by demanding both high accuracy and ease-of-interpretation. Although deep learning has provided a leap ahead in accuracy, this leap comes at the sacrifice of interpretability. To address this accuracy-interpretability challenge, we here introduce, for the first time, a text categorization approach that leverages the recently introduced Tsetlin Machine. […]
Sep, 16

A deep learning approach to autonomous lunar landing

Over the past few years, in the huge field of Artificial Intelligence (AI), new Machine Learning techniques are playing a central role, proving to be very powerful and versatile. For this reason, it is expected that they could become protagonist of space applications and they are already under study. Thanks to the large availability of […]
Sep, 16

Benchmarking and Optimization of Gradient Boosted Decision Tree Algorithms

Gradient boosted decision trees (GBDTs) have seen widespread adoption in academia, industry and competitive data science due to their state-of-the-art performance in a wide variety of machine learning tasks. In this paper, we present an extensive empirical comparison of XGBoost, LightGBM and CatBoost, three popular GBDT algorithms, to aid the data science practitioner in the […]
Sep, 16

ZUCL: A ZYNQ UltraScale+ Framework for OpenCL HLS Applications

In this work, we are proposing the ZUCL framework for implementing and running OpenCL applications for the latest Xilinx ZYNQ UltraScale+ platform. ZUCL is a holistic framework addressing the FPGA OS infrastructure, high level synthesis (HLS) module implementation as well as the runtime management. ZUCL enables partial reconfiguration (PR) on this platform by providing an […]
Sep, 9

Efficient and Scalable k-Means on GPUs

k-Means is a versatile clustering algorithm widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data to knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU. We identify two main shortcomings of this approach. First, it requires expensive data exchange between […]
Sep, 9

Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU

Sparse matrix-vector multiplication (SpMV) can be used to solve diverse-scaled linear systems and eigenvalue problems that exist in numerous, and varying scientific applications. One of the scientific applications that SpMV is involved in is known as Configuration Interaction (CI). CI is a linear method for solving the non-relativistic Schroedinger equation for quantum chemical multi-electron systems, […]
Sep, 9

Doctor AI: Interpretable Deep Learning for Modeling Electronic Health Records

Deep learning recently has been showing superior performance in complex domains such as computer vision, audio processing and natural language processing compared to traditional statistical methods. Naturally, deep learning techniques, combined with large electronic health records (EHR) data generated from healthcare organizations have potential to bring dramatic changes to the healthcare industry. However, typical deep […]
Sep, 9

Using SIMD and SIMT vectorization to evaluate sparse chemical kinetic Jacobian matrices and thermochemical source terms

Accurately predicting key combustion phenomena in reactive-flow simulations, e.g., lean blow-out, extinction/ignition limits and pollutant formation, necessitates the use of detailed chemical kinetics. The large size and high levels of numerical stiffness typically present in chemical kinetic models relevant to transportation/power-generation applications make the efficient evaluation/factorization of the chemical kinetic Jacobian and thermochemical source-terms critical […]
Sep, 9

Cracks in the Sky: Abelian-Higgs Cosmic String Evolution with CUDA

Topological defects form at cosmological phase transitions by the Kibble mechanism, with cosmic strings and superstrings having the most interesting phenomenology. A rigorous analysis of their astrophysical consequences is limited by the availability of accurate numerical simulations, and therefore by hardware resources and computation time. Improving the speed and efficiency of existing codes is therefore […]
Sep, 2

Optimizing Communication for Clusters of GPUs

GPUs are frequently used to accelerate data-parallel workloads across a wide variety of application domains. While GPUs offer a large amount of computational throughput within a single node, the largest problems require a cluster of such devices communicating with different compute nodes across a network. These clusters can range in size from a small handful […]
Sep, 2

Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Matrix multiplication is one of the fundamental building blocks of numerical linear algebra. It requires computer systems have huge computing capability and consumes much more power as problem size is increased. In this research, an OpenCL-based matrix multiplier is presented. When data are single precision floating-points, compared with the software simulations based on the Intel […]
Sep, 2

Implementing Strassen’s Algorithm with CUTLASS on NVIDIA Volta GPUs

Conventional GPU implementations of Strassen’s algorithm (Strassen) typically rely on the existing high-performance matrix multiplication (GEMM), trading space for time. As a result, such approaches can only achieve practical speedup for relatively large, "squarish" matrices due to the extra memory overhead, and their usages are limited due to the considerable workspace. We present novel Strassen […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: