
Posts

Sep, 7

Multi-Tasking Scheduling for Heterogeneous Systems

Heterogeneous platforms play an increasingly important role in modern computer systems. They combine high performance with low power consumption. From mobiles to supercomputers, we see an increasing number of computer systems that are heterogeneous. The best-known heterogeneous systems, CPU+GPU platforms, have been widely used in recent years. As they become more mainstream, serving multiple […]
Sep, 3

Benchmarking Harp-DAAL: High Performance Hadoop on KNL Clusters

Data analytics is undergoing a revolution in many scientific domains, demanding cost-effective parallel data analysis techniques. Traditional Java-based Big Data processing tools like Hadoop MapReduce are designed for commodity CPUs. In contrast, emerging manycore processors like Xeon Phi have an order of magnitude more computation power and memory bandwidth. To harness the computing capabilities, we […]
Sep, 3

Integer sorting on multicores: some (experiments and) observations

There have been many proposals for sorting integers on multicores/GPUs that include radix-sort and its variants or other approaches that exploit specialized hardware features of a particular multicore architecture. Comparison-based algorithms have also been used, as have network-based algorithms, the primary example being Batcher’s bitonic sorting algorithm. Although the latter approach is theoretically […]
Sep, 3

Real-Time Rendering of Molecular Dynamics Simulation Data: A Tutorial

Achieving real-time molecular dynamics rendering is a challenge, especially when the rendering requires intensive computation involving a large simulation data-set. The task becomes even more challenging when the size of the data is too large to fit into random access memory (RAM) and the final imagery depends on the input and output (I/O) performance. The […]
Sep, 3

Towards On-Chip Optical FFTs for Convolutional Neural Networks

Convolutional neural networks have become an essential element of spatial deep learning systems. In the prevailing architecture, the convolution operation is performed with Fast Fourier Transforms (FFT) electronically in GPUs. The parallelism of GPUs provides an efficiency advantage over CPUs; however, both approaches, being electronic, are bound by the speed and power limits of the interconnect […]
Sep, 3

Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes

The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deeplearning4j, and H2O) are considered and compared. A comparative analysis was performed and conclusions were drawn as to the advantages and disadvantages of these platforms. The performance tests for the de facto standard MNIST data set […]
Aug, 26

Accelerate Local Tone Mapping for High Dynamic Range Images Using OpenCL with GPU

Tone mapping has been used to transfer HDR (high dynamic range) images to low dynamic range. This paper describes an algorithm to display high dynamic range images. Although local tone-mapping operators are better than global operators at reproducing images with better detail and contrast, local tone-mapping algorithms usually require a huge amount of […]
Aug, 26

Vulnerability Analysis and Attacks on Intel Xeon Phi Coprocessor

The Intel Xeon Phi coprocessor is a PCIe-based add-in card. Though it is prone to simple attacks, many high performance computing systems are constructed by combining CPUs and coprocessors. This paper describes two attacks that exploit vulnerabilities related to the boot process of the coprocessor and the ownership of the offload user. Proof-of-concept codes are […]
Aug, 26

Large Integer Arithmetic in GPU for Cryptography

Most computers nowadays support 32-bit or 64-bit data types in various programming languages, and these are sufficient for most use cases. However, in cryptography, the required range and precision exceed 64 bits, which is computationally expensive on CPUs. In this report, we present our design and implementation of […]
Aug, 26

Dynamic Parallelism in GPU Optimized Barnes Hut Trees for Molecular Dynamics Simulations

Since the beginning of the modern computing era, high performance computing has been pushing the boundaries of the types of problems that can be solved in many different disciplines. One of the leading fields is computational biophysics where molecular dynamics (MD) simulations provide microscopic resolution details of how biomolecules move, fold, and assemble into intricate […]
Aug, 26

eccCL: parallelized GPU implementation of Ensemble Classifier Chains

BACKGROUND: Multi-label classification has recently gained great attention in diverse fields of research, e.g., in biomedical applications such as protein function prediction or drug resistance testing in HIV. In this context, the concept of Classifier Chains has been shown to improve prediction accuracy, especially when applied as Ensemble Classifier Chains. However, these techniques lack computational […]
Aug, 17

Simulating the Cardinal Movements of Childbirth Using Finite Element Analysis on the Graphics Processing Unit

Many problems can occur during childbirth which may lead to instant or future morbidity and even mortality. Therefore the computer-based simulation of the mechanisms and biomechanics of human childbirth is becoming an increasingly important area of study, to avoid potential trauma to the baby and the mother throughout, and immediately following, the childbirth process. Computer-based […]

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
