SmoothGuard – Robust Defense for Multimodal LLMs
A lightweight, model-agnostic defense for multimodal large language models that improves robustness against adversarial attacks using noise perturbation and clustering aggregation.
The method injects calibrated noise perturbations into inputs and applies clustering aggregation over multiple noisy runs to filter out harmful or unstable responses while preserving model utility.
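The perturb-then-aggregate idea can be sketched as follows. This is a minimal illustration, not the repository's implementation: `model_fn` (a callable that maps an image array to a text response) and `embed_fn` (a callable that maps a response to a vector) are hypothetical stand-ins, and the aggregation step here is a simple density-based pick (return the response with the most near-duplicate neighbors by cosine similarity), which is one plausible reading of "clustering aggregation".

```python
import numpy as np

def smoothguard_answer(model_fn, embed_fn, image,
                       n_samples=5, sigma=0.1, tau=0.9, seed=0):
    """Sketch of noise-perturbation + clustering aggregation.

    model_fn: image array -> response string (hypothetical model interface)
    embed_fn: response string -> embedding vector (hypothetical encoder)
    sigma:    std of the calibrated Gaussian noise added to the input
    tau:      cosine-similarity threshold for grouping responses
    """
    rng = np.random.default_rng(seed)
    # Query the model on several independently perturbed copies of the input.
    responses = [model_fn(image + rng.normal(0.0, sigma, size=image.shape))
                 for _ in range(n_samples)]
    # Embed and L2-normalize the responses, then compare pairwise.
    embs = np.stack([embed_fn(r) for r in responses]).astype(float)
    embs /= np.linalg.norm(embs, axis=1, keepdims=True) + 1e-12
    sim = embs @ embs.T
    # Support = size of the similarity cluster around each response;
    # an adversarially induced answer tends to be unstable under noise,
    # so the densest cluster is kept and outliers are filtered out.
    support = (sim >= tau).sum(axis=1)
    return responses[int(support.argmax())]
```

A usage example: if five noisy runs produce responses `["A", "A", "B", "A", "C"]`, the three mutually similar `"A"` answers form the largest cluster, so `"A"` is returned and the unstable `"B"`/`"C"` outputs are discarded.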
This project includes evaluation pipelines on MM-SafetyBench, Bench-in-the-Wild, and POPE, demonstrating that SmoothGuard can significantly reduce attack success rates with minimal impact on standard multimodal performance.