Applications

T. Takaki, R. Rojas, M. Ohno, T. Shimokawabe, T. Aoki
A GPU code has been developed for a phase-field lattice Boltzmann (PFLB) method, which can simulate the dendritic growth with motion of solids in a dilute binary alloy melt. The GPU accelerated PFLB method has been implemented using CUDA C. The equiaxed dendritic growth in a shear flow and settling condition have been simulated by […]
Jun Xiao, Hao Chen, Jianhua Sun
Sorting is a fundamental problem in computer science, and the strict sorting usually means a strict order with ascending or descending. However, some applications in reality don’t require the strict ascending or descending order and the approximate ascending or descending order just meets the requirement. Graphics processing units (GPUs) have become accelerators for parallel computing. […]
Alexander Ross Mace
Acentrosomal microtubules are not bound to a microtubule organising centre yet are still able to form ordered arrays. Two clear examples of this behaviour are the acentrosomal apico-basal (side wall) array in epithelial cells and the parallel organisation of plant cortical microtubules. This research investigates their formation through mathematical modelling and Monte Carlo simulations with […]
Mark Sutherland, Joshua San Miguel, Natalie Enright Jerger
We present texture cache approximation as a method for using existing hardware on GPUs to eliminate costly global memory accesses. We develop a technique for using a GPU’s texture fetch units to generate approximate values, and argue that this technique is applicable to a wide variety of GPU kernels. Applying texture cache approximation to an […]
Genlang Chen, Chenggang Lai, Miaoqing Huang
Sparse coding has been a popular learning model in the machine learning field. However, due to the complexity of the learning model, the high computational cost has seriously hindered its application. To address this, this paper presents a parallel sparse coding method to improve the performance by exploiting the power of acceleration technologies such as Intel […]
Mu Wang, John F. Brady
In this work we develop the Spectral Ewald Accelerated Stokesian Dynamics (SEASD), a novel computational method for dynamic simulations of polydisperse colloidal suspensions with full hydrodynamic interactions. SEASD is based on the framework of Stokesian Dynamics (SD) with extension to compressible solvents, and uses the Spectral Ewald (SE) method [Lindbo & Tornberg, J. Comput. Phys. […]
Andrea Miele
We present a preliminary study of buffer overflow vulnerabilities in CUDA software running on GPUs. We show how an attacker can overrun a buffer to corrupt sensitive data or steer the execution flow by overwriting function pointers, e.g., manipulating the virtual table of a C++ object. In view of a potential mass market diffusion of […]
Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, Thomas S. Huang
Data embedding is used in many machine learning applications to create low-dimensional feature representations, which preserves the structure of data points in their original space. In this paper, we examine the scenario of a heterogeneous network with nodes and content of various types. Such networks are notoriously difficult to mine because of the bewildering combination […]
Suejb Memeti, Sabri Pllana
Genetic information is increasing exponentially, doubling every 18 months. Analyzing this information within a reasonable amount of time requires parallel computing resources. While considerable research has addressed DNA analysis using GPUs, so far not much attention has been paid to the Intel Xeon Phi coprocessor. In this paper we present an algorithm for large-scale DNA […]
Andre Viebke, Sabri Pllana
Supervised learning of Convolutional Neural Networks (CNNs), also known as supervised Deep Learning, is a computationally demanding process. To find the most suitable parameters of a network for a given application, numerous training sessions are required. Therefore, reducing the training time per session is essential to fully utilize CNNs in practice. While numerous research groups […]
Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros
We present a new library for parallel distributed Fast Fourier Transforms (FFT). Despite the large amount of work on FFTs, we show that significant speedups can be achieved for distributed transforms. The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements. AccFFT extends existing FFT libraries for […]
Imran Ashraf, Vlad-Mihai Sima, Koen Bertels
The growing demand for processing power is being satisfied mainly by an increase in the number of computing cores in a system. One of the main challenges to be addressed is efficient utilization of these architectures. This demands data-communication aware mapping of applications on these architectures. Appropriate tools are required to provide the detailed intra-application […]

* * *


Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
