Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
AI Thrust, The Hong Kong University of Science and Technology (Guangzhou)
arXiv:2406.09324 [cs.CR], 13 Jun 2024
@misc{xu2024bag,
  title={Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs},
  author={Zhao Xu and Fan Liu and Hao Liu},
  year={2024},
  eprint={2406.09324},
  archivePrefix={arXiv},
  primaryClass={cs.CR}
}
Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work has largely overlooked the diverse key factors of jailbreak attacks, with most studies concentrating on LLM vulnerabilities and lacking exploration of defense-enhanced LLMs. To address these issues, we evaluate the impact of various attack settings on LLM performance and provide a baseline benchmark for jailbreak attacks, encouraging the adoption of a standardized evaluation framework. Specifically, we evaluate eight key factors of implementing jailbreak attacks on LLMs from both target-level and attack-level perspectives. We further conduct seven representative jailbreak attacks against six defense methods across two widely used datasets, encompassing approximately 320 experiments and about 50,000 GPU hours on A800-80G GPUs. Our experimental results highlight the need for standardized benchmarking to evaluate these attacks on defense-enhanced LLMs. Our code is available.
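The experimental grid described in the abstract (seven attacks evaluated against six defense methods on two datasets) can be sketched as a simple benchmarking loop. This is a minimal illustration only; all names below (`run_trial`, the attack/defense/dataset labels) are hypothetical placeholders, not the authors' actual harness or API.

```python
from itertools import product

# Hypothetical grid mirroring the paper's setup:
# 7 representative attacks x 6 defense methods x 2 datasets.
attacks = [f"attack_{i}" for i in range(1, 8)]
defenses = [f"defense_{j}" for j in range(1, 7)]
datasets = ["dataset_A", "dataset_B"]

def run_trial(attack: str, defense: str, dataset: str) -> float:
    """Placeholder for one attack-vs-defense evaluation.

    A real harness would run the jailbreak attack against the
    defense-enhanced LLM here and return an attack success rate.
    """
    return 0.0

results = {}
for attack, defense, dataset in product(attacks, defenses, datasets):
    results[(attack, defense, dataset)] = run_trial(attack, defense, dataset)

print(len(results))  # 7 * 6 * 2 = 84 attack/defense/dataset combinations
```

A standardized benchmark of this shape makes results comparable across attack implementations, which is the kind of evaluation framework the abstract advocates.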
June 16, 2024 by hgpu