Research data management using DataLad
Information
The estimated time to complete this training module is 3h.
The prerequisites to take this module are:
- the installation module.
- the introduction to the terminal module.
- the introduction to Git and GitHub module can help, but is not required.
- the project management module is recommended, but not required.
If you have any questions regarding the module content, please ask them in the relevant module channel on the school Discord server. If you do not have access to the server and would like to join, please send us an email at school.brainhack [at] gmail [dot] com.
Resources
This module was presented by Adina Wagner during the HBM Brainhack in 2020.
The material of the tutorial is available here.
The video of her presentation is available below:
To install DataLad, please follow the instructions in the DataLad Handbook.
Exercise
- Follow along with Adina's tutorial. You can copy and paste the commands from the DataLad Handbook section linked above while watching the video.
  - Warning 1: The URL for one of the books in the tutorial (`byte-of-python.pdf`) is broken, so the PDF is unreadable. This does not affect the tutorial, so don't be surprised if that document does not open. It also shows how important it is to use persistent URLs when you release material, such as those offered by platforms like Zenodo, OSF, or figshare.
  - Warning 2: While following the tutorial, you may need to install additional command-line tools, such as `tree`.
  - Warning 3: To clone some of the repositories used in the hands-on parts of the lecture, you will need to create an SSH key and register it with your GitHub account. To create your SSH key, please follow the instructions from GitHub. Then, from a terminal (on Windows, the Git Bash terminal, a bash emulation that comes with the installation of Git), run `cat ~/.ssh/id_rsa.pub` to display the public key. It is a very long string of letters and numbers that starts with `ssh-rsa`. If you created an Ed25519 key instead of an RSA key, the file is `~/.ssh/id_ed25519.pub` and the key starts with `ssh-ed25519`. Copy the whole key string, go to your GitHub account, and under Settings > SSH and GPG keys click the New SSH key button. Paste the copied key into the `Key` text box, give the key a title such as `home_laptop_github_key`, and click the `Add SSH key` button to save it. Your SSH key is now set up for the current operating system environment, and you are ready to run `datalad clone` with the `git@github.com:...` links listed throughout the tutorial.
- Follow up with your local TA(s) to validate that you completed the exercises correctly.
- 🎉 🎉 🎉 You completed this training module! 🎉 🎉 🎉
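Warnings 2 and 3 come down to two checks you can run before starting the tutorial: are the required command-line tools installed, and is there a public SSH key to register with GitHub? The following is a minimal sketch of both checks. The function names `check_tools` and `show_public_key` are made up for this illustration and are not part of DataLad or Git; the default key path is the RSA location mentioned in Warning 3.

```shell
#!/usr/bin/env bash
# Pre-flight checks for the tutorial (illustrative helpers, not part of DataLad).

# Report which of the given command-line tools are missing from PATH.
# Returns 0 if all tools are present, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Print a public SSH key so it can be pasted into GitHub.
# The path defaults to the RSA key location used in this module;
# pass another path (e.g. ~/.ssh/id_ed25519.pub) as the first argument.
show_public_key() {
    local key_file="${1:-$HOME/.ssh/id_rsa.pub}"
    if [ -f "$key_file" ]; then
        cat "$key_file"
    else
        echo "no public key found at $key_file" >&2
        return 1
    fi
}
```

For example, `check_tools git datalad tree` lists any tool that still needs installing (on Debian or Ubuntu, `sudo apt-get install tree`; with Homebrew, `brew install tree`), and `show_public_key` prints the key to copy. Once the key is registered, `ssh -T git@github.com` is a quick way to confirm that GitHub accepts it.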
More resources
If you want to learn more, check:
- The DataLad Handbook, which features a lot of additional resources as well!
- The DataLad datasets GitHub organization, which provides easy access to a number of data resources. DataLad repositories of this type are the easiest way to get access to datasets.
- The DataLad lecture series
- The DataLad Course Material
  - Note that for the last part of the tutorial you will need to install singularity and the `datalad-container` extension (installable through `pip`).
- All of the OpenNeuro datasets, which are available on the OpenNeuro GitHub organization.
- You can also read about the YODA principles for reproducible papers.
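To illustrate how the dataset repositories listed above are typically used, here is a hedged sketch that clones an OpenNeuro dataset and fetches a single file on demand. It assumes DataLad is installed and that repositories in the OpenNeuroDatasets GitHub organization are named after their OpenNeuro accession number (`ds000001` serves only as an example). `openneuro_url` and `fetch_file` are hypothetical helper names, not DataLad commands.

```shell
#!/usr/bin/env bash
# Illustrative helpers for getting data from an OpenNeuro dataset with DataLad.

# Build the GitHub clone URL for a dataset from its accession number.
# Assumption: the repository is github.com/OpenNeuroDatasets/<accession>.
openneuro_url() {
    printf 'https://github.com/OpenNeuroDatasets/%s.git\n' "$1"
}

# Clone the dataset into ./<accession>, then download the content of one file.
# The clone itself only retrieves the file listing and metadata;
# `datalad get` is what fetches actual file content.
# Needs DataLad and network access, so it only runs when called explicitly.
fetch_file() {
    local accession="$1" file="$2"
    datalad clone "$(openneuro_url "$accession")" "$accession" &&
        (cd "$accession" && datalad get "$file")
}
```

Calling `fetch_file ds000001 <path/inside/dataset>` would clone the dataset and retrieve that one file; the path is a placeholder to replace with a real file from the dataset.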