Posts

Jan, 26

Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition

Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. […]
Jan, 26

Compositional Compilation for Sparse, Irregular Data Parallelism

While contemporary GPU architectures are heavily biased towards the execution of predictably regular data parallelism, many real application domains are based around data structures which are naturally sparse and irregular. In this paper we demonstrate that high level programming and high performance GPU execution for sparse, irregular problems are not mutually exclusive. Our insight is […]
Jan, 26

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in […]
Jan, 26

Power Consumption Modeling and Prediction in a Hybrid CPU-GPU-MIC Supercomputer

Power consumption is a major obstacle for High Performance Computing (HPC) systems in their quest towards the holy grail of ExaFLOP performance. Significant advances in power efficiency have to be made before this goal can be attained, and accurate modeling is an essential step towards power efficiency by optimizing system operating parameters to match dynamic […]
Jan, 26

4th International Symposium on Computational and Business Intelligence (IEEE-ISCBI), 2016

The 4th International Symposium on Computational and Business Intelligence (ISCBI 2016) will be held in Olten, Switzerland, on September 5-7, 2016. ISCBI 2016, organized by the International Neural Network Society (INNS) India Regional Chapter and the University of Applied Sciences and Arts Northwestern Switzerland, is the flagship event of INNS-India. All submissions will be peer […]
Jan, 22

Parallel Explicit FEM Algorithms Using GPU’s

The Explicit Finite Element Method is a powerful tool in nonlinear dynamic finite element analysis. Recent major developments in computational devices, in particular General Purpose Graphical Processing Units (GPGPUs), now make it possible to increase the performance of the explicit FEM. This dissertation investigates existing explicit finite element method algorithms which are then redesigned for […]
Jan, 22

Heterogeneous (CPU+GPU) Working-set Hash Tables

In this paper, we propose heterogeneous (CPU+GPU) hash tables that optimize operations for frequently accessed keys. The idea is to maintain a dynamic set of the most frequently accessed keys in the GPU memory and the rest of the keys in the CPU main memory. Further, queries are processed in batches of fixed size. We measured […]
Jan, 22

Exploring LLVM Infrastructure for Simplified Multi-GPU Programming

GPUs have established themselves in the computing landscape, convincing users and designers by their excellent performance and energy efficiency. They differ in many aspects from general-purpose CPUs, for instance their highly parallel architecture, their thread-collective bulk-synchronous execution model, and their programming model. In particular, languages like CUDA or OpenCL require users to express parallelism very […]
Jan, 22

Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts

Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to parallelize the Gilbert-Peierls algorithm for sparse LU factorization. As architectures evolve, there exists a need for algorithms that are hierarchical […]
Jan, 22

GPU Multisplit

Multisplit is a broadly useful parallel primitive that permutes its input data into contiguous buckets or bins, where the function that categorizes an element into a bucket is provided by the programmer. Due to the lack of an efficient multisplit on GPUs, programmers often choose to implement multisplit with a sort. However, sort does more […]
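
As a rough illustration of the primitive described in this abstract, the following is a minimal sequential sketch in Python. It is not the paper's GPU algorithm; the names `multisplit`, `bucket_of`, and `num_buckets` are hypothetical, and the counting-plus-prefix-sum approach is only one assumed way to produce contiguous buckets without a full sort.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def multisplit(
    data: Sequence[T],
    bucket_of: Callable[[T], int],
    num_buckets: int,
) -> Tuple[List[T], List[int]]:
    """Permute `data` into contiguous buckets (sequential sketch, not a GPU kernel).

    `bucket_of` is the programmer-supplied function that maps an element to a
    bucket index in the range [0, num_buckets). Returns the permuted data and
    the start offset of each bucket in the output.
    """
    # Classify every element once.
    ids = [bucket_of(x) for x in data]

    # Count how many elements fall into each bucket.
    counts = [0] * num_buckets
    for b in ids:
        counts[b] += 1

    # Exclusive prefix sum of the counts gives each bucket's start offset.
    offsets = [0] * num_buckets
    running = 0
    for b in range(num_buckets):
        offsets[b] = running
        running += counts[b]

    # Scatter each element to the next free slot of its bucket.
    out: List[T] = list(data)
    cursor = list(offsets)
    for x, b in zip(data, ids):
        out[cursor[b]] = x
        cursor[b] += 1
    return out, offsets


if __name__ == "__main__":
    values = [7, 2, 9, 4, 1, 8]
    permuted, starts = multisplit(values, lambda x: x % 3, 3)
    print(permuted, starts)
```

Unlike a sort, this sketch only groups elements by bucket and leaves the order inside each bucket as it was in the input.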
Jan, 22

The 9th International Conference on Machine Vision (SPIE-ICMV), 2016

All accepted papers of ICMV 2016 will be published by SPIE and indexed by EI and Scopus. Previous proceedings from 2007 to 2015 (http://www.icmv.org/pro.html) have all been published and indexed by EI and Scopus. For conference photos, see 2015 (http://www.icmv.org/photo2015.html); 2014 (http://www.icmv.org/photo2014.html); 2013 (http://www.icmv.org/photo2013.html). Keynote & Plenary Speakers: Prof. Antanas Verikas, Halmstad University, Sweden; Prof. Petia Radeva, University of Barcelona, Spain; […]
Jan, 19

Adaptive and Hybrid Machine Learning Approaches Utilizing General Purpose Computing on Graphical Processing Units

Unlike machines, humans and animals have very complex reasoning capabilities that allow them to adapt to changes in the natural world, while computers tend to be very limited in that same aspect. What limits machines from becoming adaptable can span many topics, but one of the attributes which would limit a machine's ability to adapt is […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
