
SEED_balanced

SEED overview

SEED_balanced is the public balanced release of SEED, a benchmark for provenance tracing in sequential deepfake facial edits. Unlike conventional deepfake datasets that focus on single-step manipulations or binary real/fake detection, SEED models multi-step diffusion-based facial editing trajectories and supports three complementary tasks: Authenticity Analysis, Editing Trace Analysis, and Spatial Evidence Analysis. The full SEED benchmark contains 91,526 images with step-wise provenance annotations, while the balanced benchmark partition contains 100,000 images with equal proportions of sequence lengths (L = 0, 1, 2, 3, 4).


Overview

| Item | Description |
|---|---|
| Dataset name | SEED_balanced |
| Full benchmark scale | 91,526 images |
| Balanced benchmark scale | 100,000 images |
| Domain | Facial imagery |
| Editing type | Sequential diffusion-based facial edits |
| Source real datasets | FFHQ, CelebAMask-HQ |
| Step-wise metadata | Edit order, attribute labels, prompts, masks, editor identity |
| Supported tasks | Authenticity, Editing Trace, Spatial Evidence |
| Official evaluation | CodaBench |

SEED is built from FFHQ and CelebAMask-HQ and edited using diffusion-based pipelines. Each manipulated sample is generated by applying one to four attribute edits sequentially, and each step is logged with provenance metadata including the edited attribute, prompt, mask, and editing model.


Supported Tasks

| Task | Description | Output |
|---|---|---|
| Authenticity Analysis | Distinguish real images from sequentially edited ones | Binary or sequence-based decision |
| Editing Trace Analysis | Predict edited attributes and their temporal order | Ordered attribute sequence |
| Spatial Evidence Analysis | Localize manipulated regions | Mask / localization map |

These three tasks are explicitly described in the paper and illustrated in the benchmark overview figure.


Data Construction

Construction pipeline

SEED is constructed in three stages:

| Stage | Description |
|---|---|
| Preprocessing | Build attribute-specific masks and text conditions |
| Sequential manipulation | Sample a sequence length L ∈ {1, 2, 3, 4}, choose attributes, and apply a diffusion editor step by step |
| Quality evaluation | Filter degenerate results using perceptual and semantic consistency checks |

The editing pipeline uses multiple diffusion editors, including LEdits, SDXL, and SD3-style models fine-tuned with UltraEdit. Prompt templates are varied to preserve edit intent while increasing linguistic diversity.
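The sequential-manipulation stage above can be sketched as follows. The attribute pool matches the six attributes listed later in this card, but the uniform length prior and the sampling code are illustrative assumptions, not the paper's exact scheme.

```python
import random

# SEED's six editable attributes (the sampling logic itself is assumed).
ATTRIBUTES = ["eyes", "lip", "hair", "eyebrows", "glasses", "hat"]

def sample_trajectory(rng: random.Random) -> list[str]:
    """Sample one editing trajectory: a length L in {1,2,3,4} and L distinct attributes."""
    length = rng.choice([1, 2, 3, 4])        # assumed uniform prior over lengths
    return rng.sample(ATTRIBUTES, k=length)  # distinct attributes, kept in edit order

rng = random.Random(0)
trajectory = sample_trajectory(rng)
```

Each sampled attribute would then be edited in turn by one of the diffusion editors, with the mask and prompt for that step logged as provenance metadata.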


Prompt Template Examples

| Attribute | Instruction Template | Caption Template |
|---|---|---|
| Eyes | Make the eyes {color}. | A person with {color} eyes. |
| Lip | Change the lipstick color to {color}. | A person with {color} lipstick. |
| Hair | Turn the hair {color}. / Make the hair {style}. | A person with {color} hair. / A person with {style} hair. |
| Eyebrows | Make the eyebrows {style}. | A person with {style} eyebrows. |
| Glasses | Add a pair of {glasses}. | A person wearing {glasses}. |
| Hat | Add a {hat}. | A person wearing a {hat}. |

These prompt templates are taken from the paper’s dataset construction section.
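Filling a template pair for one edit step is straightforward string substitution; the `render` helper and the slot values below are illustrative, not the official vocabulary.

```python
# Subset of the template table above; {color}/{glasses} slot values are assumptions.
TEMPLATES = {
    "eyes":    ("Make the eyes {color}.",                "A person with {color} eyes."),
    "lip":     ("Change the lipstick color to {color}.", "A person with {color} lipstick."),
    "glasses": ("Add a pair of {glasses}.",              "A person wearing {glasses}."),
}

def render(attribute: str, **slots: str) -> tuple[str, str]:
    """Fill the instruction/caption template pair for one edit step."""
    instruction, caption = TEMPLATES[attribute]
    return instruction.format(**slots), caption.format(**slots)

instr, cap = render("eyes", color="green")
# instr -> "Make the eyes green."
# cap   -> "A person with green eyes."
```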


Dataset Statistics

| Statistic | Value |
|---|---|
| Full SEED images | 91,526 |
| Sequence length L = 1 | 29.91% |
| Sequence length L = 2 | 26.21% |
| Sequence length L = 3 | 21.88% |
| Sequence length L = 4 | 22.00% |
| UltraEdit | 38.28% |
| LEdits | 37.34% |
| SDXL | 24.38% |

| Attribute | Proportion |
|---|---|
| Lip | 28% |
| Eyebrow | 18% |
| Eye | 17% |
| Hat | 14% |
| Hair | 14% |
| Glasses | 9% |

These distributions are reported in the dataset statistics section of the paper.
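As a quick sanity check, each reported distribution sums to 100%:

```python
# Shares copied from the statistics tables above.
length_share = {1: 29.91, 2: 26.21, 3: 21.88, 4: 22.00}
editor_share = {"UltraEdit": 38.28, "LEdits": 37.34, "SDXL": 24.38}
attribute_share = {"Lip": 28, "Eyebrow": 18, "Eye": 17, "Hat": 14, "Hair": 14, "Glasses": 9}

for name, share in [("length", length_share),
                    ("editor", editor_share),
                    ("attribute", attribute_share)]:
    total = sum(share.values())
    assert abs(total - 100.0) < 0.05, f"{name} shares sum to {total}"
```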


Balanced Partition and Split Protocol

| Length bucket | Count |
|---|---|
| L = 0 (real) | 20,000 |
| L = 1 | 20,000 |
| L = 2 | 20,000 |
| L = 3 | 20,000 |
| L = 4 | 20,000 |
| Total | 100,000 |
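Assuming each image record carries its sequence length, a balanced partition like the one above can be reproduced by sampling a fixed quota per length bucket. The 20,000 quota comes from the table; the `(image_id, length)` record format is hypothetical.

```python
import random
from collections import defaultdict

QUOTA = 20_000  # images per length bucket (L = 0 ... 4), from the table above

def balance(records, quota=QUOTA, seed=0):
    """Sample `quota` records per sequence length; records are (image_id, length) pairs."""
    buckets = defaultdict(list)
    for image_id, length in records:
        buckets[length].append(image_id)
    rng = random.Random(seed)
    return {L: rng.sample(ids, quota) for L, ids in sorted(buckets.items())}

# Toy usage: 50 records spread over lengths 0-4, quota of 5 per bucket.
sample_records = [(f"img_{i}", i % 5) for i in range(50)]
splits = balance(sample_records, quota=5)
```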

Benchmark Evaluation

Official evaluation is conducted on CodaBench using three metrics:

| Metric | Meaning |
|---|---|
| Fixed-Acc | Token-level accuracy under a fixed sequence comparison protocol |
| Adaptive-Acc | Token-level accuracy under adaptive sequence comparison |
| Full-Acc | Exact sequence match, the strictest metric |

The paper emphasizes that Full-Acc is the strictest metric because the whole predicted edit history must match the ground truth.
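A minimal sketch of the three metrics under assumed definitions (the official CodaBench scorer may differ): Fixed-Acc pads both sequences to the maximum length with a no-edit token, Adaptive-Acc compares only up to the longer of the two actual lengths, and Full-Acc is exact match.

```python
PAD = "none"  # assumed no-edit padding token
MAX_LEN = 4   # sequences contain at most four edits

def _pad(seq, n):
    return list(seq) + [PAD] * (n - len(seq))

def fixed_acc(pred, gt):
    """Token accuracy after padding both sequences to MAX_LEN (assumed protocol)."""
    p, g = _pad(pred, MAX_LEN), _pad(gt, MAX_LEN)
    return sum(a == b for a, b in zip(p, g)) / MAX_LEN

def adaptive_acc(pred, gt):
    """Token accuracy over the longer of the two actual lengths (assumed protocol)."""
    n = max(len(pred), len(gt), 1)
    p, g = _pad(pred, n), _pad(gt, n)
    return sum(a == b for a, b in zip(p, g)) / n

def full_acc(pred, gt):
    """Exact match: 1.0 only when the whole predicted edit history is correct."""
    return float(list(pred) == list(gt))

# Example: the prediction gets the first edit right but misses the second.
f = fixed_acc(["lip", "hair"], ["lip", "eyes"])     # 3/4: positions 1, 3, 4 agree
a = adaptive_acc(["lip", "hair"], ["lip", "eyes"])  # 1/2
e = full_acc(["lip", "hair"], ["lip", "eyes"])      # 0.0
```

The example illustrates why Full-Acc is strictest: a single wrong step zeroes it out, while the token-level metrics still award partial credit.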

Average Results Reported in the Paper

| Model | Fixed-Acc | Adaptive-Acc | Full-Acc |
|---|---|---|---|
| Shuai et al. | 71.50 | 54.07 | 48.72 |
| FreqNet | 70.08 | 52.59 | 48.27 |
| Ba et al. | 68.78 | 54.80 | 50.80 |
| SeqFakeFormer | 81.62 | 68.53 | 66.97 |
| FAITH (DCT) | 81.70 | 68.56 | 67.02 |
| FAITH (FFT) | 81.75 | 68.58 | 67.03 |
| FAITH (DWT) | 81.87 | 68.84 | 67.26 |

The paper reports that performance drops as edit chains become longer, and that DWT-based FAITH is the strongest average variant overall.

Robustness Settings

The paper also evaluates robustness under:

| Perturbation | Levels |
|---|---|
| JPEG compression | 25%, 50%, 75% |
| Gaussian noise | 10%, 15%, 20% |
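A sketch of the noise perturbation, assuming the 10/15/20% levels mean a noise standard deviation expressed as a fraction of the 8-bit intensity range (the paper's exact parameterization may differ; JPEG compression at quality 25/50/75 would be applied with a standard encoder such as Pillow's `Image.save`).

```python
import numpy as np

def add_gaussian_noise(img: np.ndarray, level: float, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise with std = level * 255 to a uint8 image.

    The fraction-of-range reading of the percentage levels is an assumption.
    """
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, level * 255.0, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

img = np.full((8, 8, 3), 128, dtype=np.uint8)  # dummy mid-gray image
perturbed = add_gaussian_noise(img, level=0.10)
```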

Repository Contents

This Hugging Face repository hosts the public release only.

| File | Description |
|---|---|
| seqdeepfake_train_data.zip | Public training archive |
| README.md | Dataset card |
| sample_submission.csv | Optional example submission file |

This repository does not contain:

  • hidden test labels
  • hidden reference annotations
  • official private evaluation data

Those components are handled through CodaBench.


Intended Usage

This dataset is intended for:

  • deepfake forensics research
  • diffusion-edit provenance tracing
  • edit-order prediction
  • localization and evidence analysis
  • robustness benchmarking under image degradation

Recommended workflow:

  1. Download and extract the public training data from this repository.
  2. Train or fine-tune your method locally.
  3. Validate locally using your own protocol.
  4. Submit predictions to CodaBench for official hidden-set evaluation.

CodaBench: <PASTE_CODABENCH_LINK_HERE>


Data Usage Policy

Please use this dataset for research, benchmarking, and forensic analysis only.

Please do not use it for:

  • identity recognition or surveillance
  • face-based profiling
  • deceptive content generation
  • unauthorized inference about real individuals

Users should also respect the licenses and usage conditions of the original source datasets and any benchmark-specific release conditions.


FAITH Baseline Setup and Usage

This repository provides the FAITH baseline and the associated training data package for the SeqDeepFake setting.

Repository Contents

After downloading the full repository, the top-level structure is expected to look like this:

.
├── FAITH/
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip

1. Environment Setup

Create a new conda environment from req.txt, then activate it:

conda create -n <environment-name> --file req.txt
conda activate <environment-name>

Example:

conda create -n faith --file req.txt
conda activate faith

2. Download the Complete Repository

Please download the complete repository contents from Hugging Face, not only the code folder.

You should have all of the following at the repository root:

  • FAITH/
  • assets/
  • req.txt
  • README.md
  • seqdeepfake_train_data.zip

If you use Git, a typical workflow is:

git clone <your-huggingface-repository-url>
cd <your-repository-name>

If you download from the web interface instead, make sure the downloaded archive is fully extracted before continuing.

3. Prepare the Training Data

The training data is provided as:

seqdeepfake_train_data.zip

Unzip this file into the FAITH directory, then rename the extracted folder to data.

Run the following commands from the repository root:

unzip seqdeepfake_train_data.zip -d FAITH/

Then rename the extracted folder to data.

For example, if the extracted folder is named seqdeepfake_train_data, run:

mv FAITH/seqdeepfake_train_data FAITH/data
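If `unzip` is unavailable, the same two steps can be done with Python's standard library. The helper below is a sketch: it assumes, as in the example above, that the archive extracts to a folder named after the zip file.

```python
import shutil
import zipfile
from pathlib import Path

def prepare_data(archive, repo_root="."):
    """Extract the training archive into FAITH/ and rename the folder to data.

    Mirrors the shell steps above (unzip + mv); the extracted folder name is
    assumed to match the archive stem.
    """
    faith = Path(repo_root) / "FAITH"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(faith)                      # unzip seqdeepfake_train_data.zip -d FAITH/
    extracted = faith / Path(archive).stem        # e.g. FAITH/seqdeepfake_train_data
    target = faith / "data"
    if extracted.exists() and not target.exists():
        shutil.move(str(extracted), str(target))  # mv FAITH/seqdeepfake_train_data FAITH/data
    return target
```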

After this step, the expected structure should be:

.
├── FAITH/
│   ├── data/
│   ├── ...
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip

4. Verify the Data Placement

Before running the baseline, confirm that the dataset is located at:

FAITH/data

That is the expected folder name used by the baseline instructions in this repository.

5. Run the Baseline

After the environment is ready and the dataset has been placed in FAITH/data, enter the FAITH directory and run the baseline script.

cd FAITH
python <your_main_script>.py

Please replace <your_main_script>.py with the actual entry script used in your repository.

If your repository provides separate scripts for training and evaluation, use the appropriate one instead, for example:

cd FAITH
python train.py

or

cd FAITH
python test.py

Notes

  1. Make sure you download the full repository contents, not only individual files.
  2. Make sure the extracted dataset folder is renamed exactly to data.
  3. If unzip is not installed on your system, install it first or extract the archive manually.
  4. If the extracted folder name is different on your machine, rename that extracted folder to FAITH/data.
  5. If the project has a custom launch script, use that script instead of the generic python <your_main_script>.py command.

Troubleshooting

PackagesNotFoundError during conda creation

This usually means some packages in req.txt are unavailable in your current conda channels. In that case, try updating conda first, or recreate the environment with the channels required by your project.

The dataset cannot be found

Check that the final path is exactly:

FAITH/data

python: can't open file ...

This means the entry script name is different from the placeholder command in this README. Please replace <your_main_script>.py with the actual script name in the FAITH folder.


If you are preparing the Hugging Face repository page, you can copy this file directly as the project README.md and then replace the script placeholder with the exact training or evaluation command used by your codebase.


Citation

If you use this dataset, please cite the SEED paper.

@inproceedings{seed2026,
  title={SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits},
  author={Anonymous ECCV 2026 Submission},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2026}
}