
Posts

Mar, 12

Automatic and Explicit Parallelization Approaches for Mathematical Simulation Models

The move from single core and processor systems to multi-core and many-processors systems comes with the requirement of implementing computations in a way that can utilize these multiple units efficiently. This task of writing efficient multi-threaded algorithms will not be possible without improving programming languages and compilers to provide the mechanisms to do so. Computer aided mathematical modeling […]
Mar, 10

Accelerating Wright-Fisher Forward Simulations on the Graphics Processing Unit

Forward Wright-Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the CPU, thus limiting their usefulness. The single-locus Wright-Fisher forward algorithm is, however, exceedingly parallelizable, with many steps which are so-called embarrassingly parallel, consisting of a vast number of individual computations that are all […]
Mar, 10

Automatic Data Layout Generation and Kernel Mapping for CPU+GPU Architectures

The ubiquity of hybrid CPU+GPU architectures has led to renewed interest in automatic data layout generation owing to the fact that data layouts have a large impact on performance, and that different data layouts yield the best performance on CPUs vs. GPUs. Unfortunately, current programming models still fail to provide an effective solution to the […]
Mar, 10

Pragma Directed Shared Memory Centric Optimizations on GPUs

GPUs have become a ubiquitous choice as coprocessors since they have excellent ability in concurrent processing. In GPU architecture, shared memory plays a very important role in system performance as it can largely improve bandwidth utilization and accelerate memory operations. However, even for affine GPU applications that contain regular access patterns, optimizing for shared memory is […]
Mar, 10

Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures

One area of computing applications that poses a significant challenge for performance scalability on chip multiprocessors (CMPs) is irregular applications. Such applications have very little computation and unpredictable memory access patterns, making them memory-bound in contrast to compute-bound applications. Since the gap between processor and memory performance continues to exist, difficulty to hide and decrease this gap […]
Mar, 10

Testing fine-grained parallelism for the ADMM on a factor-graph

There is an ongoing effort to develop tools that apply distributed computational resources to tackle large problems or reduce the time to solve them. In this context, the Alternating Direction Method of Multipliers (ADMM) arises as a method that can exploit distributed resources like the dual ascent method and has the robustness and improved convergence […]
Mar, 8

D-face: Parallel Implementation of CNN Based Face Classifier using Drone Data On K40 & Jetson TK1

Convolutional Neural Networks (CNNs) are shown to perform very well in areas such as video surveillance, object classification and face classification. Face classification has become pertinent to numerous applications, especially in this big data era of social platforms and social media. With the usage of unmanned air-borne vehicles like drones, the problem of face […]
Mar, 8

Enhancing productivity and performance portability of OpenCL applications on heterogeneous systems using runtime optimizations

Initially driven by a strong need for increased computational performance in science and engineering, heterogeneous systems have become ubiquitous and they are getting increasingly complex. The single processor era has been replaced with multi-core processors, which have quickly been surrounded by satellite devices aiming to increase the throughput of the entire system. These auxiliary devices, […]
Mar, 8

Compiler and runtime techniques for bulk-synchronous programming models on CPU architectures

The rising pressure to simultaneously improve performance and reduce power consumption is driving more heterogeneity into all aspects of computing devices. However, wide adoption of specialized computing devices such as GPUs and Xeon Phis comes with a programming challenge. A carefully optimized program that is well matched to the target hardware can run many times […]
Mar, 8

A Novel Mapping of Arbitrary Precision Integer Operations to the GPU

With modern processing hardware converging on the physical barrier in terms of transistor size and speed per single core, hardware manufacturers have shifted their focus to improve performance from raw clock power towards parallelization. Solutions to utilize the computation power of GPUs are published and supported by graphics card manufacturers. While there exist solutions for […]
Mar, 7

Topology optimization design of 3D electrothermomechanical actuators by using GPU as a co-processor

The topology optimization method (TOM) requires high computational resources to be solved, especially in multiphysics problems. The high number of computational requirements is because TOM is an iterative technique, in which the iterations go from tens to thousands. Furthermore, at each TOM iteration, it is necessary to execute several routines such as the finite element […]
Mar, 5

Performance Analysis of kNN on large datasets using CUDA & Pthreads

Several organizations have large databases that are growing rapidly day by day and need to be regularly maintained. Content-based searches are similarity searches based on certain features that are obtained from various multimedia data. For various applications like multimedia content retrieval, data mining, pattern recognition, etc., performing the nearest neighbor […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
