high performance computing on graphics processing units: hgpu.org

Automatic classification of object code using machine learning

John Clemens

University of Maryland, Baltimore County (UMBC), Baltimore, MD, USA

Digital Investigation, Volume 14, Supplement 1, Pages S156-S162, 2015

DOI:10.1016/j.diin.2015.05.007

@article{clemens2015automatic,
  title={Automatic classification of object code using machine learning},
  author={Clemens, John},
  journal={Digital Investigation},
  volume={14},
  pages={S156--S162},
  year={2015},
  publisher={Elsevier}
}

Recent research has repeatedly shown that machine learning techniques can be applied to either whole files or file fragments to classify them for analysis. We build upon these techniques to show that for samples of unlabeled compiled computer object code, one can apply the same type of analysis to classify important aspects of the code, such as its target architecture and endianness. We show that using simple byte-value histograms we retain enough information about the opcodes within a sample to classify the target architecture with high accuracy, and then discuss heuristic-based features that exploit information within the operands to determine endianness. We introduce a dataset with over 16,000 code samples from 20 architectures and experimentally show that by using our features, classifiers can achieve very high accuracy with relatively small sample sizes.
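
The two feature families named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names `byte_histogram` and `endianness_pair_counts` are hypothetical, and the specific byte pairs counted for endianness are an assumption about what an operand-based heuristic might look like (small constants such as +1 or -2 tend to be laid out as `00 01` or `ff fe` in big-endian code and as `01 00` or `fe ff` in little-endian code). The abstract itself does not say which heuristics are used.

```python
from collections import Counter
from typing import Dict, List


def byte_histogram(sample: bytes) -> List[float]:
    """Return a 256-bin histogram of byte values, normalized to sum to 1.

    An empty sample yields an all-zero histogram.
    """
    if not sample:
        return [0.0] * 256
    counts = Counter(sample)  # iterating over bytes yields ints in 0..255
    total = len(sample)
    return [counts.get(value, 0) / total for value in range(256)]


def endianness_pair_counts(sample: bytes) -> Dict[str, int]:
    """Count adjacent byte pairs that may hint at byte order.

    ASSUMPTION: these four patterns are illustrative only; they are not
    taken from the abstract above.
    """
    patterns = {
        "00_01": b"\x00\x01",  # low bytes of +1 stored big-endian
        "01_00": b"\x01\x00",  # low bytes of +1 stored little-endian
        "ff_fe": b"\xff\xfe",  # low bytes of -2 stored big-endian
        "fe_ff": b"\xfe\xff",  # low bytes of -2 stored little-endian
    }
    counts = {name: 0 for name in patterns}
    # Slide a two-byte window over the sample, counting overlapping matches.
    for i in range(len(sample) - 1):
        pair = sample[i:i + 2]
        for name, pattern in patterns.items():
            if pair == pattern:
                counts[name] += 1
    return counts
```

A feature vector for one code sample could then be the 256 histogram values followed by the four pair counts, passed to any off-the-shelf classifier. Normalizing the histogram keeps samples of different lengths comparable; whether the pair counts should also be normalized is left open here.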

Tags: Computer science, CUDA, Machine learning, nVidia, Security

August 14, 2015 by hgpu
