A Sanity Check for AI-generated Image Detection

1Xiaohongshu Inc.     2University of Science and Technology of China     3Shanghai Jiao Tong University    

Figure 1. Two contemporary AI-generated image benchmarks, namely (a) the AIGCDetect Benchmark and (b) the GenImage Benchmark, where all images are produced by publicly available generators, including ProGAN (GAN-based), SD v1.4 (DM-based), and Midjourney (commercial API). These images are conditioned on simple prompts (e.g., "photo of a plane") without careful manual adjustment, and therefore tend to exhibit obvious artifacts in consistency and semantics (marked with red boxes). In contrast, our Chameleon dataset in (c) aims to simulate real-world scenarios by collecting diverse images from online websites, where the images have been carefully adjusted by photographers and AI artists.

Abstract

With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on whether the task of AI-generated image detection has been solved. To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on the Chameleon dataset. Upon analysis, almost all models misclassify AI-generated images as real ones. We then propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns. Specifically, to capture high-level semantics, we utilize CLIP to compute visual embeddings, which enables the model to discern AI-generated images from semantic or contextual information. In addition, we select the highest-frequency and lowest-frequency patches in the image and compute low-level patchwise features, aiming to detect AI-generated images from low-level artifacts such as noise patterns and anti-aliasing. When evaluated on existing benchmarks, e.g., AIGCDetectBenchmark and GenImage, AIDE achieves +3.5% and +4.6% improvements over state-of-the-art methods, and on our proposed challenging Chameleon benchmark it also achieves promising results, although the problem of detecting AI-generated images remains far from solved.
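To make the two experts concrete, below is a minimal sketch of the hybrid-feature idea, not the authors' released implementation: a frozen CLIP vision encoder provides the semantic embedding, patches are ranked by frequency energy (here with a 2-D DCT as a stand-in for the paper's frequency criterion), and the highest- and lowest-frequency patches feed a small CNN before fusion. The model name, patch size, and fusion head are illustrative assumptions.

import torch
import torch.nn as nn
from scipy.fft import dctn
from transformers import CLIPVisionModel

def patch_frequency_score(patch):
    # patch: (3, P, P) tensor; score = DCT energy outside the low-frequency corner
    gray = patch.mean(dim=0).detach().cpu().numpy()
    coeffs = dctn(gray, norm="ortho")
    return float((coeffs ** 2).sum() - (coeffs[:8, :8] ** 2).sum())

def select_patches(image, patch_size=64, k=2):
    # Return the k lowest- and k highest-frequency non-overlapping patches of a (3, H, W) image.
    _, h, w = image.shape
    patches = [image[:, i:i + patch_size, j:j + patch_size]
               for i in range(0, h - patch_size + 1, patch_size)
               for j in range(0, w - patch_size + 1, patch_size)]
    order = sorted(range(len(patches)), key=lambda i: patch_frequency_score(patches[i]))
    return [patches[i] for i in order[:k]], [patches[i] for i in order[-k:]]

class HybridDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.clip = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
        self.clip.requires_grad_(False)                  # frozen semantic expert
        self.patch_cnn = nn.Sequential(                  # tiny low-level expert
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(self.clip.config.hidden_size + 32, 2)

    def forward(self, pixel_values, patches):
        # pixel_values: (1, 3, 224, 224) CLIP-preprocessed image; patches: list of (3, P, P) crops
        sem = self.clip(pixel_values=pixel_values).pooler_output                # (1, D) semantic feature
        low = torch.stack([self.patch_cnn(p.unsqueeze(0)) for p in patches]).mean(dim=0)  # (1, 32)
        return self.head(torch.cat([sem, low], dim=1))                          # (1, 2) real/fake logits

In use, one would call lo, hi = select_patches(image) and pass the CLIP-preprocessed image together with lo + hi to the detector; the key design point is simply that semantic and low-level cues are extracted by separate experts and fused before the final real/fake classification.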

Experiments

We benchmark state-of-the-art methods to the best of our knowledge; please see the AIDE report for details.

TABLE 1. AIGCDetect Benchmark. Accuracy (%) of different detectors (rows) in detecting real and fake images from different generators (columns). GAN-Average and DM-Average are averaged over the first 8 and the last 8 test sets, respectively.

TABLE 2. GenImage Benchmark. Accuracy (%) of different baselines (columns) in detecting real and fake images from different generators (rows). These methods are trained on real images from ImageNet and fake images generated by SD v1.4, and evaluated over eight generators.
TABLE 3. Chameleon Benchmark. Accuracy (%) of different detectors (rows) in detecting real and fake images on the Chameleon test set. For each training dataset, the first row reports the average Acc on the Chameleon test set, and the second row gives "fake image Acc / real image Acc" for detailed analysis.
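For clarity, the metrics reported above can be computed as in the helper below; this is an assumption about the evaluation protocol, not the released evaluation script. The toy example mirrors the failure mode noted in the abstract: a detector that labels every image as real scores perfectly on real images but near zero on fake ones.

import numpy as np

def detection_accuracy(labels, preds):
    # labels/preds: arrays with 0 = real, 1 = fake
    labels, preds = np.asarray(labels), np.asarray(preds)
    overall = (preds == labels).mean()              # average Acc
    fake_acc = (preds[labels == 1] == 1).mean()     # fake image Acc
    real_acc = (preds[labels == 0] == 0).mean()     # real image Acc
    return overall, fake_acc, real_acc

# A detector that classifies every image as real:
print(detection_accuracy([1, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0]))  # (0.5, 0.0, 1.0)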

Downloads



Citation

Please consider citing our work if it helps your research.
@article{yan2024sanity,
  title={A Sanity Check for AI-generated Image Detection},
  author={Yan, Shilin and Li, Ouxiang and Cai, Jiayin and Hao, Yanbin and Jiang, Xiaolong and Hu, Yao and Xie, Weidi},
  journal={arXiv preprint arXiv:2406.19435},
  year={2024}
}

License

AIDE is licensed under a CC BY-NC-SA 4.0 License.

Contact

Any questions, suggestions, and feedback are welcome. Please contact tattoo.ysl@gmail.com.