Generating a reference library for molecular identification
3DMolMS can be used to generate a reference library of small molecule MS/MS spectra, which can then be used for small molecule identification through MS/MS searching.
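To make "MS/MS searching" concrete, here is a minimal sketch of library matching by cosine similarity on binned peaks. It is a generic illustration, not the search method used by 3DMolMS or any particular tool; the function names, the 1.0 m/z bin width, and the in-memory library layout are all assumptions chosen for the example.

```python
import math
from collections import defaultdict


def bin_peaks(peaks, bin_width=1.0):
    """Sum intensities of (mz, intensity) pairs into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_score(query, reference, bin_width=1.0):
    """Cosine similarity between two peak lists after binning (0.0 to 1.0)."""
    a = bin_peaks(query, bin_width)
    b = bin_peaks(reference, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, library, bin_width=1.0):
    """Return (name, score) of the highest-scoring library entry.

    `library` is a hypothetical dict mapping a molecule name to its peak list.
    """
    scored = ((name, cosine_score(query, peaks, bin_width))
              for name, peaks in library.items())
    return max(scored, key=lambda item: item[1], default=(None, 0.0))


if __name__ == "__main__":
    # Toy spectra for illustration only; these are not real measurements.
    library = {
        "mol_a": [(100.0, 1.0), (150.2, 0.5)],
        "mol_b": [(200.0, 1.0)],
    }
    print(best_match([(100.1, 0.9), (150.3, 0.6)], library))
```

In practice a search would also filter candidates by precursor m/z and use a tolerance-based peak matcher, but the ranking idea is the same.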
Using molecules from HMDB
Setup
Please set up the environment as shown in the Source code setup page.
Step 1: Data preparation
Download the HMDB molecules dataset from HMDB Downloads. The expected data directory structure is:
|- data
   |- hmdb
      |- structures.sdf
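Before preprocessing, it can help to confirm that the download is in place and not truncated. The sketch below counts records in the SDF file by counting the `$$$$` delimiter lines that end each record in the SDF format. The helper name `count_sdf_records` is hypothetical and not part of 3DMolMS.

```python
from pathlib import Path


def count_sdf_records(sdf_path):
    """Count molecule records in an SDF file.

    Each record in an SDF file ends with a line containing only '$$$$'.
    """
    count = 0
    with open(sdf_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                count += 1
    return count


if __name__ == "__main__":
    sdf_path = Path("./data/hmdb/structures.sdf")
    if sdf_path.is_file():
        print(f"{sdf_path}: {count_sdf_records(sdf_path)} records")
    else:
        print(f"Missing {sdf_path}; download it from HMDB first")
```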
Step 2: Preprocessing
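Use one of the following commands to preprocess the dataset. The dataset configuration is stored in ./molnetpack/config/preprocess_etkdgv3.yml for conformations generated with ETKDGv3, or in ./molnetpack/config/preprocess_hmdb.yml for the original conformations from HMDB.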
Using ETKDGv3:
python scripts/hmdb2pkl.py --data_config_path ./molnetpack/config/preprocess_etkdgv3.yml
Or using the original conformations:
python scripts/hmdb2pkl.py --data_config_path ./molnetpack/config/preprocess_hmdb.yml
Step 3: MS/MS generation
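Use the following commands to generate MS/MS spectra. The loop runs over the preprocessed files hmdb_etkdgv3_0.pkl through hmdb_etkdgv3_21.pkl and writes one MGF file per input. The model configuration is stored in ./molnetpack/config/molnet.yml. The commands assume ETKDGv3 conformations; if you’re using the original conformations from HMDB, update the data paths and --data_config_path accordingly.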
for i in {0..21}; do
    python scripts/predict.py --task msms \
        --test_data ./data/hmdb/hmdb_etkdgv3_$i.pkl \
        --model_config_path ./molnetpack/config/molnet.yml \
        --data_config_path ./molnetpack/config/preprocess_etkdgv3.yml \
        --resume_path ./check_point/molnet_qtof_etkdgv3.pt \
        --result_path ./data/hmdb/molnet_hmdb_etkdgv3_$i.mgf
done
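The loop above leaves 22 separate MGF files. If a single library file is more convenient for searching, the sketch below concatenates them and reports how many spectra were written. It assumes the outputs are standard MGF text in which every spectrum starts with a `BEGIN IONS` line; the merged file name `molnet_hmdb_etkdgv3_all.mgf` and the helper `merge_mgf` are hypothetical.

```python
from pathlib import Path


def merge_mgf(chunk_paths, merged_path):
    """Concatenate MGF files into one file and return the number of spectra."""
    n_spectra = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in chunk_paths:
            last_line = "\n"
            with open(path, "r", encoding="utf-8") as handle:
                for line in handle:
                    if line.strip() == "BEGIN IONS":
                        n_spectra += 1
                    out.write(line)
                    last_line = line
            # Keep records separated if a chunk lacks a trailing newline.
            if not last_line.endswith("\n"):
                out.write("\n")
    return n_spectra


if __name__ == "__main__":
    chunks = [Path(f"./data/hmdb/molnet_hmdb_etkdgv3_{i}.mgf") for i in range(22)]
    missing = [str(p) for p in chunks if not p.is_file()]
    if missing:
        print("Missing chunk files:", ", ".join(missing))
    else:
        total = merge_mgf(chunks, "./data/hmdb/molnet_hmdb_etkdgv3_all.mgf")
        print(f"Merged {len(chunks)} files, {total} spectra")
```

Checking for missing chunks first avoids silently building a partial library when one of the prediction runs failed.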
Using molecules from RefMet
Setup
Please set up the environment as shown in the Source code setup page.
Step 1: Data preparation
Download the RefMet molecules dataset from RefMet Browse. The expected data directory structure is:
|- data
   |- refmet
      |- refmet.csv
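A quick look at the downloaded CSV can catch an incomplete or mis-saved export before preprocessing. The sketch below reports the header and row count using only the standard library. It makes no assumption about which columns RefMet provides or which ones refmet2pkl.py reads; `summarize_csv` is a hypothetical helper.

```python
import csv
from pathlib import Path


def summarize_csv(csv_path, preview_rows=3):
    """Return (column names, number of data rows, first few rows as dicts)."""
    preview = []
    n_rows = 0
    # utf-8-sig strips a byte-order mark if the export contains one.
    with open(csv_path, "r", encoding="utf-8-sig", newline="") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        for row in reader:
            if n_rows < preview_rows:
                preview.append(row)
            n_rows += 1
    return columns, n_rows, preview


if __name__ == "__main__":
    csv_path = Path("./data/refmet/refmet.csv")
    if csv_path.is_file():
        columns, n_rows, preview = summarize_csv(csv_path)
        print("Columns:", columns)
        print("Rows:", n_rows)
        for row in preview:
            print(row)
    else:
        print(f"Missing {csv_path}; download it from RefMet first")
```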
Step 2: Preprocessing
Use the following command to preprocess the dataset. The dataset configuration is stored in ./molnetpack/config/preprocess_etkdgv3.yml.
python scripts/refmet2pkl.py --data_config_path ./molnetpack/config/preprocess_etkdgv3.yml
Step 3: MS/MS generation
Use the following command to generate MS/MS spectra. The model configuration is stored in ./molnetpack/config/molnet.yml.
python scripts/predict.py --task msms \
    --test_data ./data/refmet/refmet_etkdgv3.pkl \
    --model_config_path ./molnetpack/config/molnet.yml \
    --data_config_path ./molnetpack/config/preprocess_etkdgv3.yml \
    --resume_path ./check_point/molnet_qtof_etkdgv3.pt \
    --result_path ./data/refmet/molnet_refmet_etkdgv3.mgf
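To inspect the generated library, a small MGF reader is usually enough. The sketch below assumes the output follows the common MGF layout: each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=VALUE` parameter lines followed by whitespace-separated m/z and intensity pairs. Which parameter keys 3DMolMS writes is not assumed here, and `read_mgf` is a hypothetical helper.

```python
from pathlib import Path


def read_mgf(mgf_path):
    """Parse an MGF file into a list of {'params': dict, 'peaks': list} records."""
    spectra = []
    current = None
    with open(mgf_path, "r", encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line == "BEGIN IONS":
                current = {"params": {}, "peaks": []}
            elif line == "END IONS":
                if current is not None:
                    spectra.append(current)
                current = None
            elif current is not None:
                if "=" in line:
                    key, value = line.split("=", 1)
                    current["params"][key.strip()] = value.strip()
                else:
                    parts = line.split()
                    try:
                        current["peaks"].append((float(parts[0]), float(parts[1])))
                    except (IndexError, ValueError):
                        # Skip lines that are not "mz intensity" pairs.
                        continue
    return spectra


if __name__ == "__main__":
    mgf_path = Path("./data/refmet/molnet_refmet_etkdgv3.mgf")
    if mgf_path.is_file():
        spectra = read_mgf(mgf_path)
        print(f"{len(spectra)} spectra")
        if spectra:
            print("Parameter keys:", sorted(spectra[0]["params"]))
            print("First peaks:", spectra[0]["peaks"][:5])
    else:
        print(f"Missing {mgf_path}; run the prediction step first")
```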