Images10K Compendium

By Sara Barbu, & Lune Bellec
Published on June 18, 2025 June 18, 2025

"A web-based tool for exploring visual datasets across real-world categories using carousels, tables, and interactive views."

Project definition

Background

Many studies in cognitive neuroscience still use really simplified or artificial images, like objects on plain backgrounds.That can limit ecological validity and create a disconnect between what you see in the lab and real world perception. In contrast to that, Images10K provides naturalistic scenes, showing objects in their natural environments, which makes it easier to study how people understand visual scenes in real-world contexts. This approach fits with recent work that emphasizes the importance of using more realistic images in research (Hosu, Lin, Szirányi, & Saupe, 2019).

This project is based on the Images10K dataset — containing over 8,000 naturalistic images, annotated by human participants on the Zooniverse platform. The overall goal is to provide a well-organized and richly annotated image set (8,382 images, 15 semantic categories) that can be reused for training visual recognition models in AI and neuroscience

Tools

GitHub for version control and to organize the project in a clear, collaborative, and shareable format.
DataLad to retrieve and manage the dataset in a reproducible way.
Python scripts to filter images by category, convert metadata formats, and prepare image paths for display.
Jupyter Notebooks to explore the metadata, test visualizations, and generate previews of the dataset.
Dash Bootstrap Components to build interactive UI elements like carousels and dropdown menus for browsing images.
MyST Markdown to structure the content of the interactive website and document the project cleanly within Jupyter Book.
Ubuntu terminal to navigate the file system, run scripts, and better understand how to work with files and folders at the command line.

Data

The dataset includes:

Over 8,000 naturalistic images, stored in semantically organized folders based on object categories
Annotated labels for each image, including category, subcategory, and file path
High-level semantic classifications (e.g., living vs. non-living, natural vs. artificial)
Rich image-level metadata such as:
- Number of participant annotations (via Zooniverse)
- Inter-participant agreement scores
- License information and usage permissions
- Source URLs and author attributions

Deliverables

A structured GitHub repository with scripts, metadata, and documentation
Jupyter Notebooks for metadata preview and exploratory analysis
An interactive web interface built with Jupyter Book and MyST Markdown
Image carousels and scrollable metadata tables for category-based exploration
Downloadable metadata files hosted via Google Drive

Results

Tools I learned during this project

Tools are listed above in the Tools section

Results

Deliverable:

Link to view google slide presentation

presentation

Link to the Image10k-Compendium Website

You can view the full project and explore the carousels here:
Image10k Compendium Website

Note Only a limited set of images is included on the website due to GitHub storage and bandwidth constraints.

Note The interactive carousels (Dash apps) require Python to run, so they won’t launch directly in the website view.

To view the carousels interactively on the Website:

Clone this repository:
Images10k-compendium
Install dependencies:
pip install -r binder/requirements.txt
Run both notebooks in the content/ folder:
- animated_being.ipynb
- Objects.ipynb

Conclusion and acknowledgement

This project helped me learn how to organize data and build interactive visual tools using Dash and Jupyter. I also got more comfortable working with GitHub and sharing work online.

Thanks to Lune, Marie, Cléo, and the whole BrainHack School team for all the help and support along the way!

Images10K Compendium

Project definition

Background

Tools

Data

Deliverables

Results

Tools I learned during this project

Results

Deliverable:

Link to view google slide presentation

Link to the Image10k-Compendium Website

To view the carousels interactively on the Website:

Conclusion and acknowledgement

See also these similar projects

Analyzing variability of working memory and reward processing in children with and without ADHD using fMRI data

Introduction to data visualization in Python

Experimenting with Occlusion methods to visualize the features learned by a CNN from audio or visual inputs