The Neurobagel CLI

The bagel-cli is a Python command-line tool that automatically parses and describes subject-level phenotypic and imaging attributes in an annotated dataset for integration into the Neurobagel graph.

Installation

Docker / Singularity

Option 1 (RECOMMENDED): Pull the Docker image for the CLI from DockerHub:

docker pull neurobagel/bagelcli

Option 2: Clone the repository and build the Docker image locally:

git clone https://github.com/neurobagel/bagel-cli.git
cd bagel-cli
docker build -t bagel .

Build a Singularity image for bagel-cli using the DockerHub image:

singularity pull bagel.sif docker://neurobagel/bagelcli

Running the CLI

CLI commands can be accessed using the Docker/Singularity image.

Note

The Docker examples below assume that you are using the official Neurobagel Docker Hub image for the CLI. If you have instead locally built an image, replace neurobagel/bagelcli in commands with your built image tag.

Input files

The Neurobagel CLI can compile information from several different data sources to create a single harmonized representation of subject data. To run the CLI on a dataset, you will need:

A phenotypic TSV
A Neurobagel JSON data dictionary for the TSV
(Optional) The imaging dataset in BIDS format, if subjects have imaging data available (1)
(Optional) A TSV containing subject statuses for any image processing pipelines that have been run, following the Nipoppy processing status file schema (2)

(1) The CLI will use a valid BIDS dataset to generate harmonized raw imaging metadata for subjects.
(2) The CLI will use this file to generate harmonized processing pipeline and derivative metadata for subjects. It is compatible with the Nipoppy workflow and can be automatically generated using the Nipoppy pipeline trackers.
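Because the phenotypic TSV and its data dictionary are both required, it can help to confirm they exist before starting a container. The following is a minimal sketch; `check_bagel_inputs` is a hypothetical helper, not part of bagel-cli, and it only checks that the files exist, not that their contents are valid.

```shell
# Hypothetical helper (not part of bagel-cli): check that the two required
# inputs exist. Returns 0 if both are present, 1 otherwise.
check_bagel_inputs() {
    pheno_tsv=$1
    data_dict=$2
    rc=0
    for f in "$pheno_tsv" "$data_dict"; do
        if [ ! -f "$f" ]; then
            echo "Missing required input: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_bagel_inputs my_pheno.tsv my_pheno.json && echo "Inputs found"` prints the confirmation only when both (placeholder) files are present.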

Viewing CLI commands and options

The bagel-cli has different commands, each generating a different type of subject (meta)data:

pheno
bids
derivatives

The pheno command must be run first on a dataset (each subject in a Neurobagel graph must have at least phenotypic information); other metadata are optional and can be added in an arbitrary order.

To view the general CLI help and information about the commands:

Docker / Singularity

# This is a shorthand for `docker run --rm neurobagel/bagelcli --help`
docker run --rm neurobagel/bagelcli

# This is a shorthand for `singularity run bagel.sif --help`
singularity run bagel.sif

To view the command-line arguments for a specific command (e.g., pheno):

Docker / Singularity

docker run --rm neurobagel/bagelcli pheno -h

singularity run bagel.sif pheno -h

Running the CLI on your data

cd into your local directory containing your CLI input files (at minimum, a phenotypic TSV and corresponding Neurobagel annotated JSON data dictionary).
Run a bagel-cli container and include your CLI command and arguments at the end in the following format:

Docker / Singularity

docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli <CLI command here>

What is this command doing?

The combination of options --volume=$PWD:$PWD -w $PWD mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine). This allows you to pass paths to the containerized CLI which are composed the same way as on your local machine. Both absolute paths and relative paths that stay within your working directory will work.

singularity run --no-home --bind $PWD --pwd $PWD /path/to/bagel.sif <CLI command here>

What is this command doing?

The combination of options --bind $PWD --pwd $PWD mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine). This allows you to pass paths to the containerized CLI which are composed the same way as on your local machine. Both absolute paths and relative paths that stay within your working directory will work.
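Since the mount options are identical for every command, one option is to wrap them in a small shell function. The sketch below covers the Docker case and assumes the official image; the names `bagel` and `BAGEL_DRY_RUN` are hypothetical and not part of bagel-cli. With `BAGEL_DRY_RUN` set, the function only prints the command it would run, which is a convenient way to inspect the resolved paths.

```shell
# Hypothetical wrapper (not part of bagel-cli): run a CLI command in the
# official Docker image with the current directory mounted at the same path.
# If BAGEL_DRY_RUN is set and non-empty, print the command instead of running it.
bagel() {
    ${BAGEL_DRY_RUN:+echo} docker run --rm --volume="$PWD:$PWD" -w "$PWD" \
        neurobagel/bagelcli "$@"
}

# Dry run in a subshell, so the variable does not leak into later calls.
( BAGEL_DRY_RUN=1; bagel pheno -h )
```

A Singularity version would follow the same pattern, swapping in the `singularity run --no-home --bind $PWD --pwd $PWD` options and the path to your .sif file.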

Example

If your dataset lives in /home/data/Dataset1:

home/
└── data/
    └── Dataset1/
        ├── tabular/
        │   ├── Dataset1_pheno.tsv
        │   ├── Dataset1_pheno.json
        │   └── ...
        ├── bids/
        │   ├── sub-01/
        │   ├── sub-02/
        │   └── ...
        ├── derivatives/
        │   ├── Dataset1_proc_status.tsv
        │   └── ...
        └── ...

Note

This is an example directory structure following the Nipoppy specification for dataset organization. Your input data may be organized differently.

To generate a single, graph-ready JSONLD file incorporating all subject data sources recognized by Neurobagel (Dataset1.jsonld), you could run the CLI as follows:

Docker / Singularity

cd /home/data/Dataset1

# 1. Generate harmonized phenotypic data at the subject level
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno \
    --pheno "tabular/Dataset1_pheno.tsv" \
    --dictionary "tabular/Dataset1_pheno.json" \
    --name "My dataset 1" \
    --output "Dataset1.jsonld"

# 2. Add subjects' BIDS data to the existing .jsonld
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli bids \
    --jsonld-path "Dataset1.jsonld" \
    --bids-dir "bids" \
    --output "Dataset1.jsonld" \
    --overwrite  # (1)

# 3. Add subjects' processing pipeline metadata to the existing .jsonld
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli derivatives \
    --tabular "derivatives/Dataset1_proc_status.tsv" \
    --jsonld-path "Dataset1.jsonld" \
    --output "Dataset1.jsonld" \
    --overwrite

(1) To keep outputs of different CLI commands as separate files, omit the --overwrite flag.

Tip

Short forms for a CLI command's options can be found by running:
docker run --rm neurobagel/bagelcli pheno --help

cd /home/data/Dataset1

# 1. Generate harmonized phenotypic data at the subject level
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif pheno \
    --pheno "tabular/Dataset1_pheno.tsv" \
    --dictionary "tabular/Dataset1_pheno.json" \
    --name "My dataset 1" \
    --output "Dataset1.jsonld"

# 2. Add subjects' BIDS data to the existing .jsonld
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif bids \
    --jsonld-path "Dataset1.jsonld" \
    --bids-dir "bids" \
    --output "Dataset1.jsonld" \
    --overwrite  # (1)

# 3. Add subjects' processing pipeline metadata to the existing .jsonld
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif derivatives \
    --tabular "derivatives/Dataset1_proc_status.tsv" \
    --jsonld-path "Dataset1.jsonld" \
    --output "Dataset1.jsonld" \
    --overwrite

(1) To keep outputs of different CLI commands as separate files, omit the --overwrite flag.

Tip

Short forms for a CLI command's options can be found by running:
singularity run bagel.sif pheno --help

Speed of the bids command

The bids command of the bagel-cli (step 2) can currently take several minutes or more for datasets with more than a few hundred subjects, due to the time pyBIDS needs to read the dataset structure. Once the slow initial dataset reading step is complete, you should see the message:

BIDS parsing completed.
...
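To get a rough idea of whether a dataset falls into the slow range, you can count its subject directories before running the bids command. This is a minimal sketch, assuming subject folders named sub-<label> sit directly under the BIDS root; `count_bids_subjects` is a hypothetical helper, not part of bagel-cli.

```shell
# Hypothetical helper (not part of bagel-cli): count the sub-* directories
# directly under a BIDS dataset root.
count_bids_subjects() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -name 'sub-*' | wc -l | tr -d '[:space:]'
}

# Example: count_bids_subjects bids
```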

Upgrading to a newer version of the CLI

Neurobagel is under active, early development and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a .jsonld graph file. Breaking changes will be highlighted in the release notes!

If you have already created .jsonld files for your Neurobagel graph database using the CLI, they can be quickly re-generated under the new data model by following the instructions here so that they will not conflict with dataset .jsonld files generated using the latest CLI version.

Development environment

To ensure that our Docker images are built in a predictable way, we use requirements.txt as a lock-file. That is, requirements.txt includes the entire dependency tree of our tool, with pinned versions for every dependency (see here for more information).
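As a quick sanity check that a requirements file really pins every dependency, you could list any requirement lines that lack an exact `==` version. This is a rough sketch rather than a full parser: `unpinned_requirements` is a hypothetical helper, and it simply skips blank lines, comments, and option lines that start with `-`.

```shell
# Hypothetical helper: print requirement lines that are not pinned with '=='.
# Blank lines, comment lines, and option lines starting with '-' are ignored.
# Exit status is 0 if any unpinned line was found, 1 if none.
unpinned_requirements() {
    grep -vE '^[[:space:]]*(#|-|$)' "$1" | grep -v '=='
}

# Example: unpinned_requirements requirements.txt
```

No output means every remaining line carries an exact version pin.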

Setting up a local development environment

To work on the CLI, we suggest that you create a development environment that is as close as possible to the environment we run in production.

Install the dependencies from the development lock-file (dev_requirements.txt):
```
pip install -r dev_requirements.txt
```
Install the CLI without touching the dependencies:
```
pip install --no-deps -e .
```
Install the bids-examples and neurobagel_examples submodules needed to run the test suite:
```
git submodule init
git submodule update
```

Confirm that everything works well by running the tests: pytest .

Setting up code formatting and linting (recommended)

pre-commit is configured in the development environment for this repository, and can be set up to automatically run a number of code linters and formatters on any commit you make according to the consistent code style set for this project.

Run the following from the repository root to install the configured pre-commit "hooks" for your local clone of the repo:

pre-commit install

pre-commit will now run automatically whenever you run git commit.

Updating Python lock-file

The requirements.txt file is automatically generated from the setup.cfg constraints. To update it, we use pip-compile from the pip-tools package. Here is how you can use these tools to update the requirements.txt file.

Ensure pip-tools is installed:
```
pip install pip-tools
```
Update the runtime dependencies in requirements.txt:
```
pip-compile -o requirements.txt --upgrade
```
The above command only updates the runtime dependencies. Now, update the developer dependencies in dev_requirements.txt:
```
pip-compile -o dev_requirements.txt --extra all
```