Posts
Feb, 12
Increasing precision of uniform pseudorandom number generators
A general method is proposed for producing uniformly distributed pseudorandom numbers with extended precision by combining two pseudorandom numbers of lower precision. In particular, the method can be used for extended-precision pseudorandom number generation on graphics processing units (GPUs), where the performance of single- and double-precision operations can differ significantly.
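The idea of combining two lower-precision draws can be illustrated with a minimal sketch. This is not necessarily the construction used in the paper; it is the widely used approach of packing the top bits of two independent 32-bit uniform integers into the 53-bit mantissa of a double. The function name `combine_uniform` and the 27/26 bit split are assumptions made for illustration.

```python
import random


def combine_uniform(hi: int, lo: int) -> float:
    """Combine two 32-bit uniform integers into one double in [0, 1).

    Illustrative sketch only (hypothetical helper, not the paper's method):
    the top 27 bits of `hi` and the top 26 bits of `lo` are concatenated
    into a 53-bit integer, which is then scaled by 2**-53.
    """
    a = (hi & 0xFFFFFFFF) >> 5   # keep the 27 most significant bits
    b = (lo & 0xFFFFFFFF) >> 6   # keep the 26 most significant bits
    return (a * (1 << 26) + b) / float(1 << 53)


def extended_uniform(rng: random.Random) -> float:
    """Draw two 32-bit integers from `rng` and combine them."""
    return combine_uniform(rng.getrandbits(32), rng.getrandbits(32))
```

Calling `extended_uniform(random.Random(1))` yields a double with 53 random mantissa bits, whereas a single 32-bit draw scaled to [0, 1) would resolve only 2^32 distinct values.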
Feb, 12
Designing Bit-Reproducible Portable High-Performance Applications
Bit-reproducibility has many advantages in the context of high-performance computing. Besides making debugging and testing simpler and more accurate, it allows applications to be deployed on heterogeneous systems while keeping the computations consistent. In this work we analyze the basic operations performed by scientific applications and identify the […]
Feb, 12
GROMACS on Hybrid CPU-GPU and CPU-MIC Clusters: Preliminary Porting Experiences, Results and Next Steps
This report introduces a hybrid implementation of the GROMACS application, and provides instructions on building and executing it on PRACE prototype platforms with Graphics Processing Unit (GPU) and Many Integrated Core (MIC) accelerator technologies. GROMACS currently employs message-passing MPI parallelism and multi-threading using OpenMP, and contains kernels for non-bonded interactions that are accelerated using the CUDA programming language. […]
Feb, 12
Transparent use of Java objects on the GPU in the JaMP/OpenMP framework
Many computationally intensive applications benefit from parallel execution, whether on multiple CPU cores, through data-parallel GPGPU processing, or across several machines, as in clusters. However, changing a program to run in parallel requires considerable effort and is therefore a time-consuming step during development. During the implementation, it is necessary to consider many steps […]
Feb, 12
Minerals detection for hyperspectral images using adapted linear unmixing: LinMin
Mineral detection over large volumes of spectra is the challenge addressed by current hyperspectral imaging spectrometers in planetary science. Instruments such as OMEGA (Mars Express), CRISM (Mars Reconnaissance Orbiter), M^{3} (Chandrayaan-1), VIRTIS (Rosetta) and many more have been producing very large datasets for a decade. We propose here a fast supervised detection algorithm called LinMin, in […]
Feb, 12
Yang-Mills lattice on CUDA
Yang-Mills fields play an important role in the non-Abelian gauge field theory that describes the properties of the quark-gluon plasma. The real-time evolution of the classical fields is given by the equations of motion, which are derived from Hamiltonians containing the SU(2) gauge field tensor term. The dynamics of […]
Feb, 11
A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators
We present a system that enables simple and intuitive programming of CPU+GPU clusters. This system relieves the programmer of the burden of load balancing, detailed data communication, task mapping, scheduling, etc. Our programming model is based on a bulk synchronous distributed shared memory model, which is suitable for heterogeneous multi-GPU clusters, especially so for compute-intensive […]
Feb, 11
Confidentiality Issues on a GPU in a Virtualized Environment
General-Purpose computing on Graphics Processing Units (GPGPU) combined with cloud computing is already a commercial success. However, there is little literature that investigates its security implications. Our objective is to highlight possible information leakage due to GPUs in virtualized and cloud computing environments. We provide insight into the different GPU virtualization techniques, along with their […]
Feb, 11
Exploiting GPU Parallelism to Optimize Real-World Problems
Constructing an optimal schedule for airline crews requires high computation time. The main objective in creating this schedule is to assign all crews to the available flights in a minimum amount of time. This is a highly constrained optimization problem. In this paper, we implement a co-evolutionary genetic algorithm to solve this problem. […]
Feb, 11
Exploring Multiple Levels of Performance Modeling for Heterogeneous Systems
One of the major challenges faced by the HPC community today is user-friendly and accurate heterogeneous performance modeling. Although performance prediction models exist to fine-tune applications, they are seldom easy-to-use and do not address multiple levels of design space abstraction. Our research aims to bridge the gap between reliable performance model selection and user-friendly analysis. […]
Feb, 11
Benchmarking the Intel Xeon Phi Coprocessor
This document summarizes our first experience with the Intel Xeon Phi. This is a coprocessor that uses Intel’s Many Integrated Core (MIC) architecture to speed up highly parallel processes involving intensive numerical computations. The MIC coprocessor communicates with a regular Intel Xeon ("host") processor through its operating system. The Xeon Phi coprocessor is sometimes referred […]
Feb, 11
Point Rendering in CUDA Path Tracer
A novel technique for point rendering in a CUDA path tracer is introduced in this proposal. The approach makes it possible to render point-represented geometries with global illumination effects. An octree data structure is incorporated for more efficient intersection determination. Furthermore, the octree enables users and artists to choose the level of detail of the […]