SEED_balanced
SEED_balanced is the public balanced release of SEED, a benchmark for provenance tracing in sequential deepfake facial edits. Unlike conventional deepfake datasets that focus on single-step manipulations or binary real/fake detection, SEED models multi-step diffusion-based facial editing trajectories and supports three complementary tasks: Authenticity Analysis, Editing Trace Analysis, and Spatial Evidence Analysis. The full SEED benchmark contains 91,526 images with step-wise provenance annotations, while the balanced benchmark partition contains 100,000 images with equal proportions of sequence lengths (L=0,1,2,3,4).
Overview
| Item | Description |
|---|---|
| Dataset name | SEED_balanced |
| Full benchmark scale | 91,526 images |
| Balanced benchmark scale | 100,000 images |
| Domain | Facial imagery |
| Editing type | Sequential diffusion-based facial edits |
| Source real datasets | FFHQ, CelebAMask-HQ |
| Step-wise metadata | Edit order, attribute labels, prompts, masks, editor identity |
| Supported tasks | Authenticity, Editing Trace, Spatial Evidence |
| Official evaluation | CodaBench |
SEED is built from FFHQ and CelebAMask-HQ and edited using diffusion-based pipelines. Each manipulated sample is generated by applying one to four attribute edits sequentially, and each step is logged with provenance metadata including edited attribute, prompt, mask, and editing model.
Supported Tasks
| Task | Description | Output |
|---|---|---|
| Authenticity Analysis | Distinguish real images from sequentially edited ones | Binary or sequence-based decision |
| Editing Trace Analysis | Predict edited attributes and their temporal order | Ordered attribute sequence |
| Spatial Evidence Analysis | Localize manipulated regions | Mask / localization map |
These three tasks are explicitly described in the paper and illustrated in the benchmark overview figure.
Data Construction
SEED is constructed in three stages:
| Stage | Description |
|---|---|
| Preprocessing | Build attribute-specific masks and text conditions |
| Sequential manipulation | Sample sequence length (L \in {1,2,3,4}), choose attributes, and apply a diffusion editor step by step |
| Quality evaluation | Filter degenerate results using perceptual and semantic consistency checks |
The editing pipeline uses multiple diffusion editors, including LEdits, SDXL, and SD3-style models fine-tuned with UltraEdit. Prompt templates are varied to preserve edit intent while increasing linguistic diversity.
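The sequential manipulation stage described above can be sketched as a sampling loop. This is an illustrative reconstruction, not the paper's code: the attribute and editor names are taken from the tables in this card, and `build_edit_sequence` and its sampling logic are assumptions.

```python
import random

# Attribute and editor pools, as listed in this dataset card.
ATTRIBUTES = ["eyes", "lip", "hair", "eyebrows", "glasses", "hat"]
EDITORS = ["UltraEdit", "LEdits", "SDXL"]

def build_edit_sequence(rng: random.Random) -> list[dict]:
    """Sample one manipulation trajectory: a length L in {1,...,4},
    distinct attributes, and one editor per step."""
    length = rng.randint(1, 4)
    attrs = rng.sample(ATTRIBUTES, length)  # no attribute edited twice
    return [
        {"order": order, "attribute": attr, "editor": rng.choice(EDITORS)}
        for order, attr in enumerate(attrs, start=1)
    ]

seq = build_edit_sequence(random.Random(0))
```

Each step record mirrors the step-wise provenance metadata (edit order, attribute, editor identity) that the real pipeline logs alongside prompts and masks.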
Prompt Template Examples
| Attribute | Instruction Template | Caption Template |
|---|---|---|
| Eyes | Make the eyes {color}. | A person with {color} eyes. |
| Lip | Change the lipstick color to {color}. | A person with {color} lipstick. |
| Hair | Turn the hair {color}. / Make the hair {style}. | A person with {color} hair. / A person with {style} hair. |
| Eyebrows | Make the eyebrows {style}. | A person with {style} eyebrows. |
| Glasses | Add a pair of {glasses}. | A person wearing {glasses}. |
| Hat | Add a {hat}. | A person wearing a {hat}. |
These prompt templates are taken from the paper's dataset construction section.
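The templates above can be instantiated with plain string formatting. The sketch below is a hypothetical helper (`TEMPLATES` and `render` are not part of the release) showing how an instruction/caption pair is filled for one attribute.

```python
# (instruction template, caption template) per attribute, copied from the table above.
TEMPLATES = {
    "eyes":     ("Make the eyes {color}.",                "A person with {color} eyes."),
    "lip":      ("Change the lipstick color to {color}.", "A person with {color} lipstick."),
    "hair":     ("Turn the hair {color}.",                "A person with {color} hair."),
    "eyebrows": ("Make the eyebrows {style}.",            "A person with {style} eyebrows."),
    "glasses":  ("Add a pair of {glasses}.",              "A person wearing {glasses}."),
    "hat":      ("Add a {hat}.",                          "A person wearing a {hat}."),
}

def render(attribute: str, **slots: str) -> tuple[str, str]:
    """Fill both templates for one attribute edit."""
    instruction, caption = TEMPLATES[attribute]
    return instruction.format(**slots), caption.format(**slots)

instr, cap = render("eyes", color="blue")
# instr == "Make the eyes blue."; cap == "A person with blue eyes."
```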
Dataset Statistics
| Statistic | Value |
|---|---|
| Full SEED images | 91,526 |
| Sequence length (L=1) | 29.91% |
| Sequence length (L=2) | 26.21% |
| Sequence length (L=3) | 21.88% |
| Sequence length (L=4) | 22.00% |

| Editor | Proportion |
|---|---|
| UltraEdit | 38.28% |
| LEdits | 37.34% |
| SDXL | 24.38% |

| Attribute | Proportion |
|---|---|
| Lip | 28% |
| Eyebrow | 18% |
| Eye | 17% |
| Hat | 14% |
| Hair | 14% |
| Glasses | 9% |
These distributions are reported in the dataset statistics section of the paper.
Balanced Partition and Split Protocol
| Length bucket | Count |
|---|---|
| (L=0), real | 20,000 |
| (L=1) | 20,000 |
| (L=2) | 20,000 |
| (L=3) | 20,000 |
| (L=4) | 20,000 |
| Total | 100,000 |
Benchmark Evaluation
Official evaluation is conducted on CodaBench using three metrics:
| Metric | Meaning |
|---|---|
| Fixed-Acc | Token-level accuracy under a fixed sequence comparison protocol |
| Adaptive-Acc | Token-level accuracy under adaptive sequence comparison |
| Full-Acc | Exact sequence match, the strictest metric |
The paper emphasizes that Full-Acc is the strictest metric because the whole predicted edit history must match the ground truth.
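Full-Acc can be read directly as exact sequence match. The precise fixed and adaptive comparison protocols are defined by the official CodaBench evaluation, so the sketch below is only illustrative: `full_acc`, `fixed_token_acc`, and the `NO_EDIT` padding token are assumed names, not the benchmark's implementation.

```python
NO_EDIT = "none"  # padding token; illustrative, not the official protocol

def full_acc(preds: list[list[str]], gts: list[list[str]]) -> float:
    """Exact sequence match: the whole predicted edit history must equal GT."""
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)

def fixed_token_acc(preds, gts, max_len: int = 4) -> float:
    """Token-level accuracy after padding every sequence to a fixed length."""
    correct = total = 0
    for p, g in zip(preds, gts):
        p = (p + [NO_EDIT] * max_len)[:max_len]
        g = (g + [NO_EDIT] * max_len)[:max_len]
        correct += sum(a == b for a, b in zip(p, g))
        total += max_len
    return correct / total

preds = [["lip", "hair"], ["eyes"]]
gts   = [["lip", "hair"], ["eyes", "hat"]]
# full_acc -> 0.5 (one exact match of two)
# fixed_token_acc -> 7/8 = 0.875 (only the missing "hat" token disagrees)
```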
Average Results Reported in the Paper
| Model | Fixed-Acc | Adaptive-Acc | Full-Acc |
|---|---|---|---|
| Shuai et al. | 71.50 | 54.07 | 48.72 |
| FreqNet | 70.08 | 52.59 | 48.27 |
| Ba et al. | 68.78 | 54.80 | 50.80 |
| SeqFakeFormer | 81.62 | 68.53 | 66.97 |
| FAITH (DCT) | 81.70 | 68.56 | 67.02 |
| FAITH (FFT) | 81.75 | 68.58 | 67.03 |
| FAITH (DWT) | 81.87 | 68.84 | 67.26 |
The paper reports that performance drops as edit chains become longer, and that DWT-based FAITH is the strongest average variant overall.
Robustness Settings
The paper also evaluates robustness under:
| Perturbation | Levels |
|---|---|
| JPEG compression | 25%, 50%, 75% |
| Gaussian noise | 10%, 15%, 20% |
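A minimal sketch of these perturbations using Pillow and NumPy (assumed to be installed). The exact mapping from the percentage levels in the table to JPEG quality values and noise standard deviation is not specified in this card, so the two helper functions below are illustrative assumptions.

```python
import io

import numpy as np
from PIL import Image

def jpeg_compress(img: Image.Image, quality: int) -> Image.Image:
    """Re-encode the image as JPEG at the given quality (e.g. 25, 50, 75)."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

def add_gaussian_noise(img: Image.Image, sigma_pct: float) -> Image.Image:
    """Add zero-mean Gaussian noise with std = sigma_pct% of the 0-255 range."""
    arr = np.asarray(img.convert("RGB"), dtype=np.float32)
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    noise = rng.normal(0.0, 255.0 * sigma_pct / 100.0, arr.shape)
    return Image.fromarray(np.clip(arr + noise, 0, 255).astype(np.uint8))

img = Image.new("RGB", (64, 64), (128, 128, 128))
degraded = add_gaussian_noise(jpeg_compress(img, 50), 15)
```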
Repository Contents
This Hugging Face repository hosts the public release only.
| File | Description |
|---|---|
| `seqdeepfake_train_data.zip` | Public training archive |
| `README.md` | Dataset card |
| `sample_submission.csv` | Optional example submission file |
This repository does not contain:
- hidden test labels
- hidden reference annotations
- official private evaluation data
Those components are handled through CodaBench.
Intended Usage
This dataset is intended for:
- deepfake forensics research
- diffusion-edit provenance tracing
- edit-order prediction
- localization and evidence analysis
- robustness benchmarking under image degradation
Recommended workflow:
- Download and extract the public training data from this repository.
- Train or fine-tune your method locally.
- Validate locally using your own protocol.
- Submit predictions to CodaBench for official hidden-set evaluation.
CodaBench: <PASTE_CODABENCH_LINK_HERE>
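As a sketch of the final submission step, a predictions file might be assembled as below. The column names here are hypothetical; check `sample_submission.csv` in this repository for the real schema before submitting.

```python
import csv

# Hypothetical rows: an empty prediction stands for a real image (L=0).
rows = [
    {"image_id": "000001.png", "prediction": "lip,hair"},
    {"image_id": "000002.png", "prediction": ""},
]

with open("submission.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["image_id", "prediction"])
    writer.writeheader()
    writer.writerows(rows)
```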
Data Usage Policy
Please use this dataset for research, benchmarking, and forensic analysis only.
Please do not use it for:
- identity recognition or surveillance
- face-based profiling
- deceptive content generation
- unauthorized inference about real individuals
Users should also respect the licenses and usage conditions of the original source datasets and any benchmark-specific release conditions.
FAITH Baseline Setup and Usage
This repository provides the FAITH baseline and the associated training data package for the SeqDeepFake setting.
Repository Contents
After downloading the full repository, the top-level structure is expected to look like this:
.
├── FAITH/
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip
1. Environment Setup
Create a new conda environment from req.txt, then activate it:
conda create -n <environment-name> --file req.txt
conda activate <environment-name>
Example:
conda create -n faith --file req.txt
conda activate faith
2. Download the Complete Repository
Please download the complete repository contents from Hugging Face, not only the code folder.
You should have all of the following at the repository root:
- `FAITH/`
- `assets/`
- `req.txt`
- `README.md`
- `seqdeepfake_train_data.zip`
If you use Git, a typical workflow is:
git clone <your-huggingface-repository-url>
cd <your-repository-name>
If you download from the web interface instead, make sure the downloaded archive is fully extracted before continuing.
3. Prepare the Training Data
The training data is provided as:
seqdeepfake_train_data.zip
Unzip this file into the FAITH directory, then rename the extracted folder to data.
Run the following commands from the repository root:
unzip seqdeepfake_train_data.zip -d FAITH/
Then rename the extracted folder to data.
For example, if the extracted folder is named seqdeepfake_train_data, run:
mv FAITH/seqdeepfake_train_data FAITH/data
After this step, the expected structure should be:
.
├── FAITH/
│   ├── data/
│   └── ...
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip
4. Verify the Data Placement
Before running the baseline, confirm that the dataset is located at:
FAITH/data
That is the expected folder name used by the baseline instructions in this repository.
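As an optional sanity check before launching training, a small helper can fail fast if the dataset is misplaced. This is a hypothetical utility (`check_data_placement` is not part of the baseline code):

```python
from pathlib import Path

def check_data_placement(root: str = ".") -> Path:
    """Raise if the extracted dataset is not at <root>/FAITH/data."""
    data_dir = Path(root) / "FAITH" / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"Expected dataset at {data_dir}. Unzip seqdeepfake_train_data.zip "
            "into FAITH/ and rename the extracted folder to data."
        )
    return data_dir
```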
5. Run the Baseline
After the environment is ready and the dataset has been placed in FAITH/data, enter the FAITH directory and run the baseline script.
cd FAITH
python <your_main_script>.py
Please replace <your_main_script>.py with the actual entry script used in your repository.
If your repository provides separate scripts for training and evaluation, use the appropriate one instead, for example:
cd FAITH
python train.py
or
cd FAITH
python test.py
Notes
- Make sure you download the full repository contents, not only individual files.
- Make sure the extracted dataset folder is renamed exactly to `data`.
- If `unzip` is not installed on your system, install it first or extract the archive manually.
- If the extracted folder name is different on your machine, rename it to `FAITH/data`.
- If the project has a custom launch script, use that script instead of the generic `python <your_main_script>.py` command.
Troubleshooting
PackagesNotFoundError during conda creation
This usually means some packages in req.txt are unavailable in your current conda channels. In that case, try updating conda first, or recreate the environment with the channels required by your project.
The dataset cannot be found
Check that the final path is exactly:
FAITH/data
python: can't open file ...
This means the entry script name is different from the placeholder command in this README. Please replace <your_main_script>.py with the actual script name in the FAITH folder.
If you are preparing the Hugging Face repository page, you can copy this file directly as the project README.md and then replace the script placeholder with the exact training or evaluation command used by your codebase.
Citation
If you use this dataset, please cite the SEED paper.
@inproceedings{seed2026,
title={SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits},
author={Anonymous ECCV 2026 Submission},
booktitle={Proceedings of the European Conference on Computer Vision},
year={2026}
}