Images10K Compendium

By Sara Barbu & Lune Bellec
Published on June 18, 2025

"A web-based tool for exploring visual datasets across real-world categories using carousels, tables, and interactive views."

Project definition

Background

Many studies in cognitive neuroscience still rely on simplified or artificial images, such as isolated objects on plain backgrounds. This can limit ecological validity and create a gap between what participants see in the lab and real-world perception. In contrast, Images10K provides naturalistic scenes that show objects in their natural environments, which makes it easier to study how people understand visual scenes in real-world contexts. This approach is in line with recent work that emphasizes the importance of using more realistic images in research (Hosu, Lin, Szirányi, & Saupe, 2019).

This project is based on the Images10K dataset, which contains 8,382 naturalistic images across 15 semantic categories, annotated by human participants on the Zooniverse platform. The overall goal is to provide a well-organized and richly annotated image set that can be reused for training visual recognition models in AI and neuroscience.

Tools

  • GitHub for version control and to organize the project in a clear, collaborative, and shareable format.
  • DataLad to retrieve and manage the dataset in a reproducible way.
  • Python scripts to filter images by category, convert metadata formats, and prepare image paths for display.
  • Jupyter Notebooks to explore the metadata, test visualizations, and generate previews of the dataset.
  • Dash Bootstrap Components to build interactive UI elements like carousels and dropdown menus for browsing images.
  • MyST Markdown to structure the content of the interactive website and document the project cleanly within Jupyter Book.
  • Ubuntu terminal to navigate the file system, run scripts, and better understand how to work with files and folders at the command line.
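
As an illustration of what the Python scripts mentioned above might do, here is a minimal sketch that filters metadata rows by category, collects the image paths, and converts the result from CSV to JSON. The column names (`category`, `subcategory`, `file_path`), the category values, and the file paths are made up for the example and may not match the actual Images10K metadata.

```python
import csv
import io
import json

# Hypothetical metadata in CSV form; the real column names and values may differ.
SAMPLE_CSV = """category,subcategory,file_path
animal,dog,images/animal/dog/dog_01.jpg
animal,cat,images/animal/cat/cat_01.jpg
vehicle,car,images/vehicle/car/car_01.jpg
"""


def filter_by_category(csv_text, category):
    """Return the metadata rows whose 'category' column equals `category`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["category"] == category]


def rows_to_json(rows):
    """Convert a list of row dictionaries to a JSON string."""
    return json.dumps(rows, indent=2)


# Keep only the rows of one category and collect their paths for display.
animal_rows = filter_by_category(SAMPLE_CSV, "animal")
animal_paths = [row["file_path"] for row in animal_rows]

print(rows_to_json(animal_rows))
```

Only the standard library is used here, so the same functions can be called from a script or from a notebook cell.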

Data

The dataset includes:

  • Over 8,000 naturalistic images, stored in semantically organized folders based on object categories
  • Annotated labels for each image, including category, subcategory, and file path
  • High-level semantic classifications (e.g., living vs. non-living, natural vs. artificial)
  • Rich image-level metadata such as:
    • Number of participant annotations (via Zooniverse)
    • Inter-participant agreement scores
    • License information and usage permissions
    • Source URLs and author attributions
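
To make the metadata fields listed above more concrete, the sketch below models one image record and shows how records could be filtered by annotation count and agreement score. All field names, the assumed 0 to 1 range for agreement, the thresholds, and the sample records are illustrative assumptions, not the dataset's actual schema or values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """One image's metadata; every field name here is an assumption."""
    file_path: str
    category: str
    subcategory: str
    n_annotations: int   # number of participant annotations
    agreement: float     # inter-participant agreement, assumed to lie in [0, 1]
    license: str
    source_url: str
    author: str


def select_reliable(records, min_agreement=0.8, min_annotations=3):
    """Keep records with enough annotations and high enough agreement."""
    return [
        r for r in records
        if r.agreement >= min_agreement and r.n_annotations >= min_annotations
    ]


def count_by_category(records):
    """Count how many records fall into each category."""
    return Counter(r.category for r in records)


# Made-up records, for demonstration only.
records = [
    ImageRecord("animal/dog/dog_01.jpg", "animal", "dog", 5, 0.9,
                "CC BY 4.0", "https://example.org/dog_01", "Author A"),
    ImageRecord("animal/cat/cat_01.jpg", "animal", "cat", 2, 0.95,
                "CC BY 4.0", "https://example.org/cat_01", "Author B"),
    ImageRecord("vehicle/car/car_01.jpg", "vehicle", "car", 6, 0.6,
                "CC0", "https://example.org/car_01", "Author C"),
]

reliable = select_reliable(records)
print([r.file_path for r in reliable])
print(count_by_category(records))
```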

Deliverables

  • A structured GitHub repository with scripts, metadata, and documentation
  • Jupyter Notebooks for metadata preview and exploratory analysis
  • An interactive web interface built with Jupyter Book and MyST Markdown
  • Image carousels and scrollable metadata tables for category-based exploration
  • Downloadable metadata files hosted via Google Drive
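
The image carousels listed above could be assembled roughly as follows. This is a sketch under stated assumptions, not the project's actual code: the helper names `build_carousel_items` and `make_app` are hypothetical, and the images are assumed to live under the Dash app's `assets/` folder. The `items` list uses the dictionary keys that `dbc.Carousel` from Dash Bootstrap Components accepts.

```python
from pathlib import PurePosixPath


def build_carousel_items(image_paths):
    """Turn image paths into the list of dicts passed to dbc.Carousel(items=...)."""
    items = []
    for index, path in enumerate(image_paths):
        name = PurePosixPath(path).stem  # file name without extension
        items.append({
            "key": str(index),
            "src": f"/assets/{path}",  # assumes images are served from assets/
            "alt": name,
            "caption": name.replace("_", " "),
        })
    return items


def make_app(image_paths):
    """Build a small Dash app with one carousel.

    Requires the dash and dash-bootstrap-components packages; they are
    imported here so the helper above works without them installed.
    """
    import dash
    import dash_bootstrap_components as dbc

    app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
    app.layout = dbc.Container(
        dbc.Carousel(
            items=build_carousel_items(image_paths),
            controls=True,    # previous/next arrows
            indicators=True,  # position markers below the image
        )
    )
    return app


# Hypothetical image paths, relative to the assets/ folder.
items = build_carousel_items(["animal/dog_01.jpg", "animal/cat_01.jpg"])
print(items)
```

With both packages installed, calling `make_app([...]).run(debug=True)` in a recent Dash version should start a local server that displays the carousel.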

Results

Tools I learned during this project

The tools I learned are listed in the Tools section above.

Results

Deliverable:

presentation

You can view the full project and explore the carousels here:
Image10k Compendium Website

Note: Only a limited set of images is included on the website due to GitHub storage and bandwidth constraints.

Note: The interactive carousels (Dash apps) require Python to run, so they won’t launch directly in the website view.

To run the carousels interactively on your own machine:
  1. Clone this repository:
    Images10k-compendium

  2. Install dependencies:
    pip install -r binder/requirements.txt

  3. Run both notebooks in the content/ folder:

    • animated_being.ipynb
    • Objects.ipynb

Conclusion and acknowledgement

This project helped me learn how to organize data and build interactive visual tools using Dash and Jupyter. I also got more comfortable working with GitHub and sharing work online.

Thanks to Lune, Marie, Cléo, and the whole BrainHack School team for all the help and support along the way!
