Hugging Face
Datasets: 6,393
Sort: Trending
Active filters: 10M<n<100M
open-index/hacker-news • Updated 4 minutes ago • 16.3k downloads • 241 likes
nvidia/Nemotron-Cascade-2-SFT-Data • Updated 14 days ago • 15.9M rows • 11.2k downloads • 49 likes
wikimedia/wikipedia • Updated Jan 9, 2024 • 61.6M rows • 94.8k downloads • 1.17k likes
autogluon/chronos_datasets • Updated Mar 18, 2025 • 10.4M rows • 18.7k downloads • 69 likes
hq-bench/quitobench • Updated 3 days ago • 12.5M rows • 27 downloads • 4 likes
OpenGVLab/InternVid • Updated Aug 13, 2024 • 21.3M rows • 568 downloads • 96 likes
HuggingFaceTB/cosmopedia • Updated Aug 12, 2024 • 31.1M rows • 12.4k downloads • 680 likes
amphion/Emilia-Dataset • Updated Feb 28, 2025 • 54.8M rows • 58.5k downloads • 446 likes
malaysia-ai/Multilingual-TTS • Updated 25 days ago • 62.7M rows • 2.4k downloads • 17 likes
nvidia/Nemotron-Pretraining-Specialized-v1.1 • Updated 22 days ago • 19.8M rows • 3.39k downloads • 32 likes
sKT-Ai-Labs/ZX • Updated 3 days ago • 10.4M rows • 3 downloads • 3 likes
speechcolab/gigaspeech • Updated Feb 7 • 11.9M rows • 12.5k downloads • 155 likes
UCSC-VLAA/MedTrinity-25M • Updated Oct 11, 2024 • 24.9M rows • 4.16k downloads • 200 likes
BAAI/Infinity-Instruct • Updated Dec 4, 2025 • 21.9M rows • 2.88k downloads • 703 likes
mythicinfinity/libriheavy • Updated 5 days ago • 12.4M rows • 952 downloads • 11 likes
laion/LAION-DISCO-12M • Updated Nov 14, 2024 • 12.3M rows • 162 downloads • 42 likes
NewEden-Forge/lemonilia-Roleplay-Forums • Updated Nov 9, 2024 • 2.1k downloads • 5 likes
HuggingFaceM4/FineVision • Updated Oct 21, 2025 • 24.2M rows • 152k downloads • 478 likes
mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M • Updated Nov 24, 2025 • 91.5M rows • 294k downloads • 66 likes
yayoimizuha/Glint360k • Updated Oct 12, 2025 • 17.4M rows • 544 downloads • 4 likes
PleIAs/SYNTH • Updated Nov 11, 2025 • 68M rows • 66.4k downloads • 259 likes
PleIAs/French-Science-Commons • Updated 14 days ago • 42.6M rows • 1.66k downloads • 18 likes
minishlab/tokenlearn-c4-multilingual-bge-m3 • Updated 6 days ago • 12M rows • 94 downloads • 2 likes
statmt/cc100 • Updated Mar 5, 2024 • 1.93k downloads • 105 likes
ncbi/pubmed • Updated Jan 26, 2024 • 946 downloads • 162 likes
anuragshas/mr_cc100_processed • Updated Feb 6, 2022 • 12.2M rows • 54 downloads • 1 like
Babelscape/SREDFM • Updated Jun 20, 2023 • 15.9M rows • 749 downloads • 14 likes
manu/tok-corpus-shuffled • Updated Oct 13, 2023 • 31.7M rows • 14 downloads • 1 like
CohereLabs/beir-embed-english-v3 • Updated 8 days ago • 50.5M rows • 1.15k downloads • 7 likes
ayymen/Weblate-Translations • Updated Apr 2, 2024 • 11.7M rows • 507 downloads • 18 likes