Generating a reference library for molecular identification

3DMolMS can be used to generate a reference library of small molecule MS/MS spectra, which can then be used for small molecule identification through MS/MS searching.

Using molecules from HMDB

Setup

Please set up the environment as shown in the Source code setup page.

Step 1: Data preparation

Download the HMDB molecules dataset from HMDB Downloads. The expected data directory structure is:

|- data
  |- hmdb
    |- structures.sdf
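
Before preprocessing, it can help to confirm that the download landed in the expected location and looks complete. The following is a minimal sketch, not part of 3DMolMS: the helper name count_sdf_records is hypothetical, and it relies only on the SDF convention that each molecule record ends with a line containing `$$$$`.

```python
from pathlib import Path


def count_sdf_records(sdf_path):
    """Count molecule records in an SDF file; each record ends with a '$$$$' line."""
    count = 0
    with open(sdf_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                count += 1
    return count


if __name__ == "__main__":
    # Path taken from the directory structure shown above.
    sdf_path = Path("data") / "hmdb" / "structures.sdf"
    if sdf_path.is_file():
        print(f"{sdf_path}: {count_sdf_records(sdf_path)} molecule records")
    else:
        print(f"Missing {sdf_path}; download it from HMDB Downloads first.")
```

Run it from the repository root so that the relative path resolves.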

Step 2: Preprocessing

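Preprocess the dataset with one of the following commands. Each command reads its dataset configuration from the file passed to --data_config_path: ./molnetpack/config/preprocess_etkdgv3.yml for ETKDGv3 conformations, or ./molnetpack/config/preprocess_hmdb.yml for the original conformations.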

Using ETKDGv3:

python scripts/hmdb2pkl.py --data_config_path ./molnetpack/config/preprocess_etkdgv3.yml

Or using the original conformations:

python scripts/hmdb2pkl.py --data_config_path ./molnetpack/config/preprocess_hmdb.yml

Step 3: MS/MS generation

Use the following loop to generate MS/MS spectra; it runs the prediction once for each preprocessed file, indexed 0 to 21. The model configuration is stored in ./molnetpack/config/molnet.yml. If you are using the original conformations from HMDB instead of ETKDGv3, adjust the paths in the command accordingly.

for i in {0..21}; do
  python scripts/predict.py --task msms \
    --test_data ./data/hmdb/hmdb_etkdgv3_${i}.pkl \
    --model_config_path ./molnetpack/config/molnet.yml \
    --data_config_path ./molnetpack/config/preprocess_etkdgv3.yml \
    --resume_path ./check_point/molnet_qtof_etkdgv3.pt \
    --result_path ./data/hmdb/molnet_hmdb_etkdgv3_${i}.mgf
done
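
The loop above writes one MGF file per input file. For searching, a single library file is often more convenient. The sketch below is one possible way to merge the results, not a script shipped with 3DMolMS: the helper merge_mgf and the output name molnet_hmdb_etkdgv3_all.mgf are hypothetical, and it assumes each spectrum in the generated files is delimited by the standard MGF `BEGIN IONS` and `END IONS` lines.

```python
from pathlib import Path


def merge_mgf(chunk_paths, merged_path):
    """Concatenate MGF files into merged_path and return the number of spectra written."""
    n_spectra = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in chunk_paths:
            last_line = ""
            with open(path, "r", encoding="utf-8") as handle:
                for line in handle:
                    if line.strip() == "BEGIN IONS":
                        n_spectra += 1
                    out.write(line)
                    last_line = line
            # Keep spectra separated even if a file lacks a trailing newline.
            if last_line and not last_line.endswith("\n"):
                out.write("\n")
    return n_spectra


if __name__ == "__main__":
    data_dir = Path("data") / "hmdb"
    # File names follow the --result_path pattern used in the loop above.
    chunks = [data_dir / f"molnet_hmdb_etkdgv3_{i}.mgf" for i in range(22)]
    missing = [p for p in chunks if not p.is_file()]
    if missing:
        print("Missing result files:", ", ".join(str(p) for p in missing))
    else:
        # Hypothetical output name; choose any path you like.
        total = merge_mgf(chunks, data_dir / "molnet_hmdb_etkdgv3_all.mgf")
        print(f"Merged {total} spectra")
```

Listing missing files first makes it easy to spot a prediction run that failed partway through the loop.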

Using molecules from RefMet

Setup

Please set up the environment as shown in the Source code setup page.

Step 1: Data preparation

Download the RefMet molecules dataset from RefMet Browse. The expected data directory structure is:

|- data
  |- refmet
    |- refmet.csv
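
A quick look at the downloaded file can catch a truncated or misnamed export before preprocessing. This is a minimal sketch using only the Python standard library; the helper name summarize_csv is hypothetical, and the sketch makes no assumption about which columns RefMet provides or which ones refmet2pkl.py reads.

```python
import csv
from pathlib import Path


def summarize_csv(csv_path):
    """Return (column names, number of non-empty data rows) for a CSV file."""
    with open(csv_path, "r", encoding="utf-8-sig", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        n_rows = sum(1 for row in reader if row)
    return header, n_rows


if __name__ == "__main__":
    # Path taken from the directory structure shown above.
    csv_path = Path("data") / "refmet" / "refmet.csv"
    if csv_path.is_file():
        columns, n_rows = summarize_csv(csv_path)
        print(f"{csv_path}: {n_rows} rows")
        print("Columns:", ", ".join(columns))
    else:
        print(f"Missing {csv_path}; download it from RefMet Browse first.")
```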

Step 2: Preprocessing

Use the following command to preprocess the dataset. The dataset configuration is stored in ./molnetpack/config/preprocess_etkdgv3.yml.

python scripts/refmet2pkl.py --data_config_path ./molnetpack/config/preprocess_etkdgv3.yml

Step 3: MS/MS generation

Use the following command to generate MS/MS spectra. The model configuration is stored in ./molnetpack/config/molnet.yml.

python scripts/predict.py --task msms \
--test_data ./data/refmet/refmet_etkdgv3.pkl \
--model_config_path ./molnetpack/config/molnet.yml \
--data_config_path ./molnetpack/config/preprocess_etkdgv3.yml \
--resume_path ./check_point/molnet_qtof_etkdgv3.pt \
--result_path ./data/refmet/molnet_refmet_etkdgv3.mgf
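
To illustrate how a generated library could be used for MS/MS searching, the sketch below reads an MGF file and scores two peak lists with a binned cosine similarity. It is an illustration under stated assumptions, not the search method of 3DMolMS or of any particular tool: the helpers read_mgf and binned_cosine and the 0.2 m/z bin width are hypothetical choices, and the parser assumes standard MGF blocks with `KEY=value` parameter lines and whitespace-separated `m/z intensity` peak lines.

```python
import math
from pathlib import Path


def read_mgf(path):
    """Parse an MGF file into a list of {'params': dict, 'peaks': [(mz, intensity)]}."""
    spectra, current = [], None
    with open(path, "r", encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line == "BEGIN IONS":
                current = {"params": {}, "peaks": []}
            elif line == "END IONS":
                if current is not None:
                    spectra.append(current)
                current = None
            elif current is not None and line:
                if "=" in line and not line[0].isdigit():
                    # Parameter line, e.g. TITLE=...; split on the first '=' only.
                    key, value = line.split("=", 1)
                    current["params"][key.strip().upper()] = value.strip()
                else:
                    fields = line.split()
                    if len(fields) >= 2:
                        current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


def binned_cosine(peaks_a, peaks_b, bin_width=0.2):
    """Cosine similarity of two peak lists after summing intensities into m/z bins."""
    def to_bins(peaks):
        bins = {}
        for mz, intensity in peaks:
            index = int(mz / bin_width)
            bins[index] = bins.get(index, 0.0) + intensity
        return bins

    a, b = to_bins(peaks_a), to_bins(peaks_b)
    dot = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


if __name__ == "__main__":
    # Path taken from the --result_path argument above.
    library_path = Path("data") / "refmet" / "molnet_refmet_etkdgv3.mgf"
    if library_path.is_file():
        library = read_mgf(library_path)
        print(f"Loaded {len(library)} predicted spectra")
    else:
        print(f"Missing {library_path}; run the prediction step first.")
```

To search, compute binned_cosine between a query peak list and the peaks of every library entry, then rank the entries by score. Fixed-width binning is the simplest option; real search tools typically also filter candidates by precursor m/z and use a tolerance-based peak matching.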