Pretraining 3DMolMS on QM9
This guide explains how to pretrain the 3DMolMS model on the QM9 dataset.
Setup
Set up the environment as described on the Source code setup page.
Step 1: Data preparation
Download the QM9 dataset from Figshare. The expected data directory structure is:
|- data
   |- qm9
      |- dsgdb9nsd.xyz.tar.bz2
      |- dsC7O2H10nsd.xyz.tar.bz2
      |- uncharacterized.txt
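Before preprocessing, it can help to confirm that the downloaded files are where the scripts expect them. The sketch below is a hypothetical helper (`check_qm9_layout` is not part of 3DMolMS). It assumes the layout shown above, relative to the repository root, and returns the paths of any expected files that are missing.

```python
from pathlib import Path

# Files expected under data/qm9, taken from the layout above.
EXPECTED_FILES = (
    "dsgdb9nsd.xyz.tar.bz2",
    "dsC7O2H10nsd.xyz.tar.bz2",
    "uncharacterized.txt",
)


def check_qm9_layout(root="."):
    """Hypothetical helper: list the expected QM9 files missing under root/data/qm9."""
    qm9_dir = Path(root) / "data" / "qm9"
    return [str(qm9_dir / name) for name in EXPECTED_FILES
            if not (qm9_dir / name).is_file()]


if __name__ == "__main__":
    missing = check_qm9_layout()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("QM9 data directory looks complete.")
```

Run it from the repository root, the same directory the preprocessing and pretraining commands are run from.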
Step 2: Preprocessing
Use the following command to preprocess the dataset. The preprocessing configuration is stored in ./molnetpack/config/preprocess_etkdgv3.yml.
python scripts/qm92pkl.py --data_config_path ./molnetpack/config/preprocess_etkdgv3.yml
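To sanity-check the preprocessing output, you can load the resulting pickle files and look at their top-level structure. This is a minimal sketch: the file names are taken from the pretraining command in Step 3, and because the record format written by qm92pkl.py is not described in this guide, the hypothetical helper `summarize_pickle` only reports the object type, its length, and (for list-like outputs) the type and keys of the first item. Run it inside the 3DMolMS environment, since the pickles may reference third-party types such as NumPy arrays.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Hypothetical helper: load a pickle file and describe its top-level object."""
    # Only unpickle files you produced yourself; unpickling can execute arbitrary code.
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"path": str(path), "type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    # Peek at the first element of list-like outputs, if there is one.
    if isinstance(obj, (list, tuple)) and obj:
        first = obj[0]
        summary["first_item_type"] = type(first).__name__
        if isinstance(first, dict):
            summary["first_item_keys"] = list(first)
    return summary


if __name__ == "__main__":
    for name in ("qm9_etkdgv3_train.pkl", "qm9_etkdgv3_test.pkl"):
        path = Path("data") / name
        if path.is_file():
            print(summarize_pickle(path))
        else:
            print(f"{path} not found; has preprocessing finished?")
```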
Step 3: Pretraining
Use the following command to pretrain the model. The model and training settings are stored in ./molnetpack/config/molnet_pre.yml.
python scripts/pretrain.py \
--train_data ./data/qm9_etkdgv3_train.pkl \
--test_data ./data/qm9_etkdgv3_test.pkl \
--model_config_path ./molnetpack/config/molnet_pre.yml \
--data_config_path ./molnetpack/config/preprocess_etkdgv3.yml \
--checkpoint_path ./check_point/molnet_pre_etkdgv3.pt
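If you prefer to launch pretraining from a Python driver (for example, to try several checkpoint names), the command above can be assembled as an argument list. The sketch below reuses only the flags shown in the command; `build_pretrain_cmd` is a hypothetical helper, and the driver declines to launch when an input file is missing rather than letting the script fail later. Creating ./check_point beforehand is a precaution, assuming scripts/pretrain.py may not create that directory itself.

```python
import subprocess
import sys
from pathlib import Path


def build_pretrain_cmd(train_data, test_data, model_config, data_config, checkpoint,
                       python=sys.executable):
    """Hypothetical helper: build the scripts/pretrain.py argument list used in Step 3."""
    return [
        python, "scripts/pretrain.py",
        "--train_data", str(train_data),
        "--test_data", str(test_data),
        "--model_config_path", str(model_config),
        "--data_config_path", str(data_config),
        "--checkpoint_path", str(checkpoint),
    ]


if __name__ == "__main__":
    inputs = {
        "train_data": "./data/qm9_etkdgv3_train.pkl",
        "test_data": "./data/qm9_etkdgv3_test.pkl",
        "model_config": "./molnetpack/config/molnet_pre.yml",
        "data_config": "./molnetpack/config/preprocess_etkdgv3.yml",
    }
    missing = [p for p in inputs.values() if not Path(p).is_file()]
    if missing:
        print("Not launching; missing inputs:", ", ".join(missing))
    else:
        # Precaution: create the checkpoint directory in case the script does not.
        Path("./check_point").mkdir(exist_ok=True)
        cmd = build_pretrain_cmd(checkpoint="./check_point/molnet_pre_etkdgv3.pt", **inputs)
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems when paths contain spaces.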