
SEED_balanced

SEED overview

SEED_balanced is the public balanced release of SEED, a benchmark for provenance tracing in sequential deepfake facial edits. Unlike conventional deepfake datasets that focus on single-step manipulations or binary real/fake detection, SEED models multi-step diffusion-based facial editing trajectories and supports three complementary tasks: Authenticity Analysis, Editing Trace Analysis, and Spatial Evidence Analysis. The full SEED benchmark contains 91,526 images with step-wise provenance annotations, while the balanced benchmark partition contains 100,000 images with equal proportions of sequence lengths (L = 0, 1, 2, 3, 4).


Overview

| Item | Description |
|---|---|
| Dataset name | SEED_balanced |
| Full benchmark scale | 91,526 images |
| Balanced benchmark scale | 100,000 images |
| Domain | Facial imagery |
| Editing type | Sequential diffusion-based facial edits |
| Source real datasets | FFHQ, CelebAMask-HQ |
| Step-wise metadata | Edit order, attribute labels, prompts, masks, editor identity |
| Supported tasks | Authenticity, Editing Trace, Spatial Evidence |
| Official evaluation | CodaBench |

SEED is built from FFHQ and CelebAMask-HQ and edited using diffusion-based pipelines. Each manipulated sample is generated by applying one to four attribute edits sequentially, and each step is logged with provenance metadata including the edited attribute, prompt, mask, and editing model.


Supported Tasks

| Task | Description | Output |
|---|---|---|
| Authenticity Analysis | Distinguish real images from sequentially edited ones | Binary or sequence-based decision |
| Editing Trace Analysis | Predict edited attributes and their temporal order | Ordered attribute sequence |
| Spatial Evidence Analysis | Localize manipulated regions | Mask / localization map |

These three tasks are explicitly described in the paper and illustrated in the benchmark overview figure.


Data Construction

Construction pipeline

SEED is constructed in three stages:

| Stage | Description |
|---|---|
| Preprocessing | Build attribute-specific masks and text conditions |
| Sequential manipulation | Sample a sequence length L ∈ {1, 2, 3, 4}, choose attributes, and apply a diffusion editor step by step |
| Quality evaluation | Filter degenerate results using perceptual and semantic consistency checks |

The editing pipeline uses multiple diffusion editors, including LEdits, SDXL, and SD3-style models fine-tuned with UltraEdit. Prompt templates are varied to preserve edit intent while increasing linguistic diversity.
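The sequential-manipulation stage above can be sketched as follows. The attribute pool matches the six attributes listed later in this card, but the uniform length prior and the sampling code are illustrative assumptions, not the paper's exact scheme.

```python
import random

# SEED's six editable attributes (the sampling logic itself is assumed).
ATTRIBUTES = ["eyes", "lip", "hair", "eyebrows", "glasses", "hat"]

def sample_trajectory(rng: random.Random) -> list[str]:
    """Sample one editing trajectory: a length L in {1,2,3,4} and L distinct attributes."""
    length = rng.choice([1, 2, 3, 4])        # assumed uniform prior over lengths
    return rng.sample(ATTRIBUTES, k=length)  # distinct attributes, kept in edit order

rng = random.Random(0)
trajectory = sample_trajectory(rng)
```

Each sampled attribute would then be edited in turn by one of the diffusion editors, with the mask and prompt for that step logged as provenance metadata.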


Prompt Template Examples

| Attribute | Instruction Template | Caption Template |
|---|---|---|
| Eyes | Make the eyes {color}. | A person with {color} eyes. |
| Lip | Change the lipstick color to {color}. | A person with {color} lipstick. |
| Hair | Turn the hair {color}. / Make the hair {style}. | A person with {color} hair. / A person with {style} hair. |
| Eyebrows | Make the eyebrows {style}. | A person with {style} eyebrows. |
| Glasses | Add a pair of {glasses}. | A person wearing {glasses}. |
| Hat | Add a {hat}. | A person wearing a {hat}. |

These prompt templates are taken from the paper’s dataset construction section.
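Filling a template pair for one edit step is straightforward string substitution; the `render` helper and the slot values below are illustrative, not the official vocabulary.

```python
# Subset of the template table above; {color}/{glasses} slot values are assumptions.
TEMPLATES = {
    "eyes":    ("Make the eyes {color}.",                "A person with {color} eyes."),
    "lip":     ("Change the lipstick color to {color}.", "A person with {color} lipstick."),
    "glasses": ("Add a pair of {glasses}.",              "A person wearing {glasses}."),
}

def render(attribute: str, **slots: str) -> tuple[str, str]:
    """Fill the instruction/caption template pair for one edit step."""
    instruction, caption = TEMPLATES[attribute]
    return instruction.format(**slots), caption.format(**slots)

instr, cap = render("eyes", color="green")
# instr -> "Make the eyes green."
# cap   -> "A person with green eyes."
```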


Dataset Statistics

| Statistic | Value |
|---|---|
| Full SEED images | 91,526 |
| Sequence length L = 1 | 29.91% |
| Sequence length L = 2 | 26.21% |
| Sequence length L = 3 | 21.88% |
| Sequence length L = 4 | 22.00% |
| UltraEdit | 38.28% |
| LEdits | 37.34% |
| SDXL | 24.38% |

| Attribute | Proportion |
|---|---|
| Lip | 28% |
| Eyebrow | 18% |
| Eye | 17% |
| Hat | 14% |
| Hair | 14% |
| Glasses | 9% |

These distributions are reported in the dataset statistics section of the paper.
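As a quick sanity check, each reported distribution sums to 100%:

```python
# Shares copied from the statistics tables above.
length_share = {1: 29.91, 2: 26.21, 3: 21.88, 4: 22.00}
editor_share = {"UltraEdit": 38.28, "LEdits": 37.34, "SDXL": 24.38}
attribute_share = {"Lip": 28, "Eyebrow": 18, "Eye": 17, "Hat": 14, "Hair": 14, "Glasses": 9}

for name, share in [("length", length_share),
                    ("editor", editor_share),
                    ("attribute", attribute_share)]:
    total = sum(share.values())
    assert abs(total - 100.0) < 0.05, f"{name} shares sum to {total}"
```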


Balanced Partition and Split Protocol

| Length bucket | Count |
|---|---|
| L = 0 (real) | 20,000 |
| L = 1 | 20,000 |
| L = 2 | 20,000 |
| L = 3 | 20,000 |
| L = 4 | 20,000 |
| Total | 100,000 |
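Assuming each image record carries its sequence length, a balanced partition like the one above can be reproduced by sampling a fixed quota per length bucket. The 20,000 quota comes from the table; the `(image_id, length)` record format is hypothetical.

```python
import random
from collections import defaultdict

QUOTA = 20_000  # images per length bucket (L = 0 ... 4), from the table above

def balance(records, quota=QUOTA, seed=0):
    """Sample `quota` records per sequence length; records are (image_id, length) pairs."""
    buckets = defaultdict(list)
    for image_id, length in records:
        buckets[length].append(image_id)
    rng = random.Random(seed)
    return {L: rng.sample(ids, quota) for L, ids in sorted(buckets.items())}

# Toy usage: 50 records spread over lengths 0-4, quota of 5 per bucket.
sample_records = [(f"img_{i}", i % 5) for i in range(50)]
splits = balance(sample_records, quota=5)
```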

Benchmark Evaluation

Official evaluation is conducted on CodaBench using three metrics:

| Metric | Meaning |
|---|---|
| Fixed-Acc | Token-level accuracy under a fixed sequence comparison protocol |
| Adaptive-Acc | Token-level accuracy under adaptive sequence comparison |
| Full-Acc | Exact sequence match, the strictest metric |

The paper emphasizes that Full-Acc is the strictest metric because the whole predicted edit history must match the ground truth.
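A minimal sketch of the three metrics under assumed definitions (the official CodaBench scorer may differ): Fixed-Acc pads both sequences to the maximum length with a no-edit token, Adaptive-Acc compares only up to the longer of the two actual lengths, and Full-Acc is exact match.

```python
PAD = "none"  # assumed no-edit padding token
MAX_LEN = 4   # sequences contain at most four edits

def _pad(seq, n):
    return list(seq) + [PAD] * (n - len(seq))

def fixed_acc(pred, gt):
    """Token accuracy after padding both sequences to MAX_LEN (assumed protocol)."""
    p, g = _pad(pred, MAX_LEN), _pad(gt, MAX_LEN)
    return sum(a == b for a, b in zip(p, g)) / MAX_LEN

def adaptive_acc(pred, gt):
    """Token accuracy over the longer of the two actual lengths (assumed protocol)."""
    n = max(len(pred), len(gt), 1)
    p, g = _pad(pred, n), _pad(gt, n)
    return sum(a == b for a, b in zip(p, g)) / n

def full_acc(pred, gt):
    """Exact match: 1.0 only when the whole predicted edit history is correct."""
    return float(list(pred) == list(gt))

# Example: the prediction gets the first edit right but misses the second.
f = fixed_acc(["lip", "hair"], ["lip", "eyes"])     # 3/4: positions 1, 3, 4 agree
a = adaptive_acc(["lip", "hair"], ["lip", "eyes"])  # 1/2
e = full_acc(["lip", "hair"], ["lip", "eyes"])      # 0.0
```

The example illustrates why Full-Acc is strictest: a single wrong step zeroes it out, while the token-level metrics still award partial credit.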

Average Results Reported in the Paper

| Model | Fixed-Acc | Adaptive-Acc | Full-Acc |
|---|---|---|---|
| Shuai et al. | 71.50 | 54.07 | 48.72 |
| FreqNet | 70.08 | 52.59 | 48.27 |
| Ba et al. | 68.78 | 54.80 | 50.80 |
| SeqFakeFormer | 81.62 | 68.53 | 66.97 |
| FAITH (DCT) | 81.70 | 68.56 | 67.02 |
| FAITH (FFT) | 81.75 | 68.58 | 67.03 |
| FAITH (DWT) | 81.87 | 68.84 | 67.26 |

The paper reports that performance drops as edit chains become longer, and that DWT-based FAITH is the strongest average variant overall.

Robustness Settings

The paper also evaluates robustness under:

| Perturbation | Levels |
|---|---|
| JPEG compression | 25%, 50%, 75% |
| Gaussian noise | 10%, 15%, 20% |
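A sketch of the noise perturbation, assuming the 10/15/20% levels mean a noise standard deviation expressed as a fraction of the 8-bit intensity range (the paper's exact parameterization may differ; JPEG compression at quality 25/50/75 would be applied with a standard encoder such as Pillow's `Image.save`).

```python
import numpy as np

def add_gaussian_noise(img: np.ndarray, level: float, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise with std = level * 255 to a uint8 image.

    The fraction-of-range reading of the percentage levels is an assumption.
    """
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, level * 255.0, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

img = np.full((8, 8, 3), 128, dtype=np.uint8)  # dummy mid-gray image
perturbed = add_gaussian_noise(img, level=0.10)
```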

Repository Contents

This Hugging Face repository hosts the public release only.

| File | Description |
|---|---|
| seqdeepfake_train_data.zip | Public training archive |
| README.md | Dataset card |
| sample_submission.csv | Optional example submission file |

This repository does not contain:

  • hidden test labels
  • hidden reference annotations
  • official private evaluation data

Those components are handled through CodaBench.


Intended Usage

This dataset is intended for:

  • deepfake forensics research
  • diffusion-edit provenance tracing
  • edit-order prediction
  • localization and evidence analysis
  • robustness benchmarking under image degradation

Recommended workflow:

  1. Download and extract the public training data from this repository.
  2. Train or fine-tune your method locally.
  3. Validate locally using your own protocol.
  4. Submit predictions to CodaBench for official hidden-set evaluation.

CodaBench: <PASTE_CODABENCH_LINK_HERE>


Data Usage Policy

Please use this dataset for research, benchmarking, and forensic analysis only.

Please do not use it for:

  • identity recognition or surveillance
  • face-based profiling
  • deceptive content generation
  • unauthorized inference about real individuals

Users should also respect the licenses and usage conditions of the original source datasets and any benchmark-specific release conditions.


FAITH Baseline Setup and Usage

This repository provides the FAITH baseline and the associated training data package for the SeqDeepFake setting.

Repository Contents

After downloading the full repository, the top-level structure is expected to look like this:

.
├── FAITH/
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip

1. Environment Setup

Create a new conda environment from req.txt, then activate it:

conda create -n <environment-name> --file req.txt
conda activate <environment-name>

Example:

conda create -n faith --file req.txt
conda activate faith

2. Download the Complete Repository

Please download the complete repository contents from Hugging Face, not only the code folder.

You should have all of the following at the repository root:

  • FAITH/
  • assets/
  • req.txt
  • README.md
  • seqdeepfake_train_data.zip

If you use Git, a typical workflow is:

git clone <your-huggingface-repository-url>
cd <your-repository-name>

If you download from the web interface instead, make sure the downloaded archive is fully extracted before continuing.

3. Prepare the Training Data

The training data is provided as:

seqdeepfake_train_data.zip

Unzip this file into the FAITH directory, then rename the extracted folder to data.

Run the following commands from the repository root:

unzip seqdeepfake_train_data.zip -d FAITH/

Then rename the extracted folder to data.

For example, if the extracted folder is named seqdeepfake_train_data, run:

mv FAITH/seqdeepfake_train_data FAITH/data
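If `unzip` is unavailable, the same two steps can be done with Python's standard library. The helper below is a sketch: it assumes, as in the example above, that the archive extracts to a folder named after the zip file.

```python
import shutil
import zipfile
from pathlib import Path

def prepare_data(archive, repo_root="."):
    """Extract the training archive into FAITH/ and rename the folder to data.

    Mirrors the shell steps above (unzip + mv); the extracted folder name is
    assumed to match the archive stem.
    """
    faith = Path(repo_root) / "FAITH"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(faith)                      # unzip seqdeepfake_train_data.zip -d FAITH/
    extracted = faith / Path(archive).stem        # e.g. FAITH/seqdeepfake_train_data
    target = faith / "data"
    if extracted.exists() and not target.exists():
        shutil.move(str(extracted), str(target))  # mv FAITH/seqdeepfake_train_data FAITH/data
    return target
```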

After this step, the expected structure should be:

.
├── FAITH/
│   ├── data/
│   ├── ...
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip

4. Verify the Data Placement

Before running the baseline, confirm that the dataset is located at:

FAITH/data

That is the expected folder name used by the baseline instructions in this repository.

5. Run the Baseline

After the environment is ready and the dataset has been placed in FAITH/data, enter the FAITH directory and run the baseline script.

cd FAITH
python <your_main_script>.py

Please replace <your_main_script>.py with the actual entry script used in your repository.

If your repository provides separate scripts for training and evaluation, use the appropriate one instead, for example:

cd FAITH
python train.py

or

cd FAITH
python test.py

Notes

  1. Make sure you download the full repository contents, not only individual files.
  2. Make sure the extracted dataset folder is renamed exactly to data.
  3. If unzip is not installed on your system, install it first or extract the archive manually.
  4. If the extracted folder name is different on your machine, rename that extracted folder to FAITH/data.
  5. If the project has a custom launch script, use that script instead of the generic python <your_main_script>.py command.

Troubleshooting

PackagesNotFoundError during conda creation

This usually means some packages in req.txt are unavailable in your current conda channels. In that case, try updating conda first, or recreate the environment with the channels required by your project.

The dataset cannot be found

Check that the final path is exactly:

FAITH/data

python: can't open file ...

This means the entry script name is different from the placeholder command in this README. Please replace <your_main_script>.py with the actual script name in the FAITH folder.


If you are preparing the Hugging Face repository page, you can copy this file directly as the project README.md and then replace the script placeholder with the exact training or evaluation command used by your codebase.


Citation

If you use this dataset, please cite the SEED paper.

@inproceedings{seed2026,
  title={SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits},
  author={Anonymous ECCV 2026 Submission},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2026}
}