Posts
Jul, 8
Sorting and Permuting without Bank Conflicts on GPUs
In this paper, we look at the complexity of designing algorithms without any bank conflicts in the shared memory of Graphics Processing Units (GPUs). Given an input of size $n$, $w$ processors and $w$ memory banks, we study three fundamental problems: sorting, permuting and $w$-way partitioning (defined as sorting an input containing exactly $n/w$ copies of […]
Jul, 8
Experiments on Parallel Training of Deep Neural Network using Model Averaging
In this work we apply model averaging to the parallel training of deep neural networks (DNNs). Data is partitioned and distributed to different nodes for local model updates, and the models are averaged across nodes every few minibatches. We use multiple GPUs for data parallelization, and Message Passing […]
Jul, 6
High Performance Extreme Learning Machines: A Complete Toolbox for Big Data Applications
This work presents a complete approach to the successful use of a high-performance Extreme Learning Machines (ELMs) toolbox for Big Data. It summarizes recent advances in algorithmic performance; gives a fresh view on the ELM solution in relation to traditional linear algebraic performance; and reaps the latest software and hardware performance achievements. The […]
Jul, 6
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well exploited with a combination of SIMD, multi-threading, and MPI-based SPMD/MPMD parallelism, while GPUs can be used as accelerators to compute interactions offloaded from the CPU. Here we evaluate which […]
Jul, 6
LTTng CLUST: A system-wide unified CPU and GPU tracing tool for OpenCL applications
As computation schemes evolve and many new tools become available to programmers to enhance the performance of their applications, many programmers have started to look towards highly parallel platforms such as the Graphics Processing Unit (GPU). Offloading computations that can take advantage of the architecture of the GPU is a technique that has proven fruitful in recent […]
Jul, 6
Hetero-DB: Next Generation High-Performance Database Systems by Best Utilizing Heterogeneous Computing and Storage Resources
With recent advances in hardware technologies, new general-purpose high-performance devices have been widely adopted, such as the graphics processing unit (GPU) and the solid state drive (SSD). A GPU may offer an order of magnitude higher throughput than a multicore CPU for applications with massive data parallelism. Moreover, the SSD, as a new storage device, is also capable of offering […]
Jul, 6
Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores
In-memory key-value stores play a critical role in data processing to provide high throughput and low latency data accesses. In-memory key-value stores have several unique properties that include (1) data intensive operations demanding high memory bandwidth for fast data accesses, (2) high data parallelism and simple computing operations demanding many slim parallel computing units, and […]
Jul, 3
High Performance Approximate Sort Algorithm Using GPUs
Sorting is a fundamental problem in computer science, and strict sorting usually means a strict ascending or descending order. However, some real-world applications don’t require a strict ascending or descending order; an approximately ascending or descending order is enough to meet the requirement. Graphics processing units (GPUs) have become accelerators for parallel computing. […]
Jul, 3
Modelling the Formation of Ordered Acentrosomal Microtubule Arrays
Acentrosomal microtubules are not bound to a microtubule organising centre yet are still able to form ordered arrays. Two clear examples of this behaviour are the acentrosomal apico-basal (side wall) array in epithelial cells and the parallel organisation of plant cortical microtubules. This research investigates their formation through mathematical modelling and Monte Carlo simulations with […]
Jul, 3
Texture Cache Approximation on GPUs
We present texture cache approximation as a method for using existing hardware on GPUs to eliminate costly global memory accesses. We develop a technique for using a GPU’s texture fetch units to generate approximate values, and argue that this technique is applicable to a wide variety of GPU kernels. Applying texture cache approximation to an […]
Jul, 3
GPU phase-field lattice Boltzmann simulations of growth and motion of a binary alloy dendrite
A GPU code has been developed for a phase-field lattice Boltzmann (PFLB) method, which can simulate dendritic growth with motion of solids in a dilute binary alloy melt. The GPU-accelerated PFLB method has been implemented using CUDA C. Equiaxed dendritic growth under shear flow and settling conditions has been simulated by […]
Jul, 3
Parallel Sparse Coding for Seafloor Image Analysis
Sparse coding has been a popular learning model in the machine learning field. However, the high computational cost caused by the complexity of the learning model has seriously hindered its application. To address this, this paper presents a parallel sparse coding method that improves performance by exploiting the power of acceleration technologies such as Intel […]