Instrument · Adversarial

Quality-diversity archive

Optimize one number and search collapses onto the single best exploit. MAP-Elites keeps the best solution in every cell of a behavior space, mapping a diverse archive of vulnerabilities. Run both on the same budget and watch the difference.

Sources: Cross-generational transfer (extended) · Code

Paper / Quality-diversity, Red Queen

Why quality-diversity finds attacks a single objective misses

Optimize for one number, attack success, and search collapses onto the single best exploit. MAP-Elites instead keeps the best attack in every cell of a behavior space, illuminating a diverse, transferable archive of vulnerabilities. Run it against a fitness-only baseline or a novelty search and watch coverage and quality diverge. The axes are Red Queen's real behavior descriptors: the six attack strategies on one axis and prompt length on the other.

[Interactive figure: two archive grids, "MAP-Elites · quality-diversity" and "Single-objective · fitness only". X: attack strategy · Y: prompt length · shade: attack success. Readouts: coverage (%) and QD-score for MAP-Elites and for the baseline, plus a plot of MAP-Elites coverage and baseline coverage over evaluations.]

How this is computed

A faithful toy of the algorithm, not a live LLM attack. Genomes live in a 2-D behavior space: six real attack strategies (Roleplay, Encoding, Authority, Hypothetical, MultiTurn, DirectJailbreak) × prompt length. Fitness is a fixed multi-modal "attack success" landscape where different strategy/length regions succeed to different degrees.
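
As a rough illustration of what such a toy could look like, here is a minimal Python sketch of a 2-D behavior space and a fixed multi-modal fitness. Only the six strategy names come from the text above. The number of length bins, the peak positions, and the helper names `descriptor` and `fitness` are assumptions made for this example, not the widget's actual landscape.

```python
import math

# Strategy names are taken from the text; everything else is an illustrative assumption.
STRATEGIES = ["Roleplay", "Encoding", "Authority",
              "Hypothetical", "MultiTurn", "DirectJailbreak"]
N_LENGTH_BINS = 8  # assumed number of prompt-length bins

# Hypothetical peaks: (strategy position, length position, height, width), positions in [0, 1].
PEAKS = [
    (0.10, 0.80, 1.00, 0.12),
    (0.45, 0.30, 0.70, 0.15),
    (0.85, 0.60, 0.85, 0.10),
]


def descriptor(genome):
    """Map a genome (x, y) in [0, 1]^2 to its (strategy index, length bin) cell."""
    x, y = genome
    col = min(int(x * len(STRATEGIES)), len(STRATEGIES) - 1)
    row = min(int(y * N_LENGTH_BINS), N_LENGTH_BINS - 1)
    return col, row


def fitness(genome):
    """Fixed multi-modal 'attack success' in [0, 1]: the max over Gaussian bumps."""
    x, y = genome
    return max(
        h * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * w ** 2))
        for px, py, h, w in PEAKS
    )
```

Because the landscape is fixed and has several bumps of different heights, different strategy/length regions succeed to different degrees, which is the property the comparison depends on.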

  • MAP-Elites keeps the best genome per niche: it mutates a randomly chosen elite and stores the child if the child's niche is empty or the child beats that niche's current elite. It maximizes both coverage (niches filled) and QD-score (summed elite fitness).
  • Single-objective (tournament) cares only about global fitness, so the population converges onto the strongest exploit: high fitness, almost no coverage.
  • Novelty search selects for behavioral uniqueness and ignores fitness, so it spreads across the space (high coverage) but does not optimize each niche (lower QD-score).
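
The first two loops above can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not the widget's code: the landscape (`PEAKS`), the 6 × 8 grid, Gaussian mutation, the steady-state replace-worst baseline, and the choice to measure the baseline by projecting its final population onto the same grid are all hypothetical. Novelty search is omitted for brevity.

```python
import math
import random

N_STRATEGIES, N_LENGTH_BINS = 6, 8  # six strategies from the text; 8 length bins is assumed
PEAKS = [(0.10, 0.80, 1.00, 0.12), (0.45, 0.30, 0.70, 0.15), (0.85, 0.60, 0.85, 0.10)]  # hypothetical


def fitness(g):
    """Toy multi-modal 'attack success': max over Gaussian bumps."""
    return max(h * math.exp(-((g[0] - px) ** 2 + (g[1] - py) ** 2) / (2 * w * w))
               for px, py, h, w in PEAKS)


def cell(g):
    """Discretize a genome in [0, 1]^2 into its (strategy, length-bin) niche."""
    return (min(int(g[0] * N_STRATEGIES), N_STRATEGIES - 1),
            min(int(g[1] * N_LENGTH_BINS), N_LENGTH_BINS - 1))


def mutate(g, rng, sigma=0.1):
    """Gaussian perturbation, clamped to the unit square (an assumed operator)."""
    return tuple(min(max(v + rng.gauss(0, sigma), 0.0), 1.0) for v in g)


def to_archive(pairs):
    """Keep the best (fitness, genome) per niche."""
    archive = {}
    for f, g in pairs:
        c = cell(g)
        if c not in archive or f > archive[c][0]:
            archive[c] = (f, g)
    return archive


def map_elites(budget, rng, n_init=20):
    archive = {}  # niche -> (fitness, genome)
    for i in range(budget):
        if i < n_init:
            child = (rng.random(), rng.random())  # random bootstrap
        else:
            _, parent = rng.choice(list(archive.values()))  # random elite
            child = mutate(parent, rng)
        f, c = fitness(child), cell(child)
        if c not in archive or f > archive[c][0]:  # empty niche, or beats its elite
            archive[c] = (f, child)
    return archive


def single_objective(budget, rng, pop_size=20, k=3):
    pop = [(rng.random(), rng.random()) for _ in range(pop_size)]
    fits = [fitness(g) for g in pop]
    for _ in range(budget - pop_size):
        # Tournament selection: the fittest of k random individuals is the parent.
        parent = max(rng.sample(range(pop_size), k), key=lambda i: fits[i])
        child = mutate(pop[parent], rng)
        f = fitness(child)
        worst = min(range(pop_size), key=lambda i: fits[i])
        if f > fits[worst]:  # replace the worst individual; niches are ignored
            pop[worst], fits[worst] = child, f
    return to_archive(zip(fits, pop))  # projected onto the grid only for measurement


def coverage(archive):
    return len(archive) / (N_STRATEGIES * N_LENGTH_BINS)


def qd_score(archive):
    return sum(f for f, _ in archive.values())


if __name__ == "__main__":
    for name, run in [("MAP-Elites", map_elites), ("baseline", single_objective)]:
        a = run(2000, random.Random(0))
        print(f"{name}: coverage={coverage(a):.0%}  QD-score={qd_score(a):.2f}")
```

Both loops spend the same evaluation budget. The only structural difference is the replacement rule: MAP-Elites compares a child against the occupant of its own niche, while the baseline compares it against the worst individual anywhere in the population.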

The lesson: only quality-diversity gives you both coverage and quality, which is the whole point of Red Queen (extended paper: arXiv:2606.00813). Method: MAP-Elites (Mouret & Clune, 2015); selection operators after the Red Queen core.

Subhadip Mitra