Fine-Tuning a Video Large Language Model (Vid-LLM) for Automatic Annotation of Gameplay
This project aims to develop a preprocessing pipeline for fine-tuning a Video Large Language Model (Vid-LLM) to automatically annotate gameplay recordings in cognitive neuroscience studies. Leveraging the Gym Retro ecosystem and the Courtois NeuroMod dataset, we convert recorded gameplay event logs into videos and generate detailed, timestamped annotations to train the Vid-LLM (a sketch of the log-to-video step is shown below). Deliverables include cleaned datasets, documentation, and Jupyter notebooks.
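As a rough illustration of the log-to-video step, the sketch below replays a Gym Retro `.bk2` event log using the classic gym-retro playback API, writes each emulator frame to an MP4, and logs per-frame game variables with timestamps as candidate annotations. The file names, the 60 fps rate, and the JSON annotation schema are illustrative placeholders, not the project's actual pipeline; users of stable-retro would need to adapt `env.step` to its five-value return.

```python
# Sketch: replay a Gym Retro .bk2 event log to video while logging
# timestamped annotations. Paths, fps, and the annotation schema are
# assumptions for illustration; adapt them to your dataset.
import json

import imageio
import retro

BK2_PATH = "subject01_run01.bk2"  # hypothetical recording file name
FPS = 60                          # assumed console frame rate

movie = retro.Movie(BK2_PATH)
movie.step()  # advance past the initial frame, per the gym-retro docs

env = retro.make(
    game=movie.get_game(),
    state=None,
    use_restricted_actions=retro.Actions.ALL,  # bk2s may press any button
    players=movie.players,
)
env.initial_state = movie.get_state()
env.reset()

annotations = []
with imageio.get_writer("subject01_run01.mp4", fps=FPS) as writer:
    frame = 0
    while movie.step():
        # Reconstruct the button presses recorded for this frame
        keys = [
            movie.get_key(i, p)
            for p in range(movie.players)
            for i in range(env.num_buttons)
        ]
        obs, rew, done, info = env.step(keys)
        writer.append_data(obs)  # RGB frame straight from the emulator
        # Keep per-frame game variables as timestamped annotations
        # (gym-retro exposes integer game variables in the info dict)
        annotations.append(
            {"t": frame / FPS, "reward": float(rew),
             **{k: int(v) for k, v in info.items()}}
        )
        frame += 1

with open("subject01_run01_annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```

The replay loop follows the playback pattern from the gym-retro documentation; the annotation records here are just raw game variables per frame, which a later pipeline stage could aggregate into the event descriptions used to fine-tune the Vid-LLM.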