Accurate global machine learning force fields with hundreds of atoms
Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., Müller, K.-R., Science Advances, 3(5), 2017, e1603015.
Chmiela, S., Sauceda, H. E., Müller, K.-R., Tkatchenko, A., Nature Communications, 9(1), 2018, 3887.
Chmiela, S., Vassilev-Galindo, V., Unke, O. T., Kabylda, A., Sauceda, H. E., Tkatchenko, A., Müller, K.-R., Science Advances, 9(2), 2023, eadf0873.
Chmiela, S., Sauceda, H. E., Poltavsky, I., Müller, K.-R., Tkatchenko, A., Computer Physics Communications, 240, 2019, pp. 38-45.
Sauceda, H. E., Vassilev-Galindo, V., Chmiela, S., Müller, K.-R., Tkatchenko, A., Nature Communications, 12(1), 2021, 442.
Sauceda, H. E., Gastegger, M., Chmiela, S., Müller, K.-R., Tkatchenko, A., The Journal of Chemical Physics, 153, 2020, 124109.
Sauceda, H. E., Chmiela, S., Poltavsky, I., Müller, K.-R., Tkatchenko, A., The Journal of Chemical Physics, 150, 2019, 114102.
Sauceda, H. E., Chmiela, S., Poltavsky, I., Müller, K.-R., Tkatchenko, A., In: Machine Learning Meets Quantum Physics, Lecture Notes in Physics (Springer), 968, 2020, pp. 277-307.
Wang, J., Chmiela, S., Müller, K.-R., Noé, F., Clementi, C., The Journal of Chemical Physics, 152, 2020, 194106.
Sauceda, H. E., Gálvez-González, L. E., Chmiela, S., Paz-Borbón, L. O., Müller, K.-R., Tkatchenko, A., Nature Communications, 13(1), 2022, 3733.
Replicate our numerical results or reconstruct a force field from your own dataset with a Python implementation of sGDML.
The sgdml package uses a proprietary dataset format, but it is easy to convert to and from Extended XYZ files and other popular file formats:
$ sgdml_dataset_via_ase.py <xyz_dataset_file>
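If your reference data is held in memory rather than in a file, it first has to be written out as Extended XYZ. The following is a minimal sketch of one way to format a single frame, assuming the converter expects per-atom forces and a per-frame energy laid out as Extended XYZ usually carries them. The helper name to_extxyz_frame and the water geometry with its numbers are hypothetical and not part of the sgdml package.

```python
def to_extxyz_frame(symbols, positions, forces, energy):
    """Format one geometry as an Extended XYZ frame (returned as a string)."""
    if not (len(symbols) == len(positions) == len(forces)):
        raise ValueError("symbols, positions and forces must have equal length")
    # First line: number of atoms in the frame.
    lines = [str(len(symbols))]
    # Comment line: declares the per-atom columns and carries the frame energy.
    lines.append(f"Properties=species:S:1:pos:R:3:forces:R:3 energy={energy:.8f}")
    # One line per atom: element symbol, x y z, then fx fy fz.
    for sym, pos, frc in zip(symbols, positions, forces):
        cols = [sym] + [f"{v:.8f}" for v in (*pos, *frc)]
        lines.append(" ".join(cols))
    return "\n".join(lines) + "\n"


# Hypothetical water geometry with made-up numbers, for illustration only.
frame = to_extxyz_frame(
    symbols=["O", "H", "H"],
    positions=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
    forces=[(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (-0.01, 0.0, 0.0)],
    energy=-76.4,
)
print(frame, end="")
```

Concatenating one such frame per geometry gives a multi-frame file that can then be passed to the conversion script above.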
A force field is created via a single command-line call that yields a ready-to-use model file:
$ sgdml all <sgdml_dataset_file> <n_train> <n_validate> [<n_test>]
The last three parameters specify the sizes for the training, validation and test dataset splits, which are sampled from the provided dataset file without overlap. Leave out <n_test> to use all remaining points for testing.
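To make the "without overlap" guarantee concrete, here is a minimal sketch of drawing three disjoint index sets with plain uniform sampling. It is an illustration only: the function split_indices is hypothetical, and the package itself may use a different sampling strategy when it selects points.

```python
import random


def split_indices(n_points, n_train, n_validate, n_test=None, seed=0):
    """Draw disjoint train/validation/test index sets from range(n_points)."""
    if n_test is None:
        # Mirror the CLI convention: everything left over is used for testing.
        n_test = n_points - n_train - n_validate
    total = n_train + n_validate + n_test
    if min(n_train, n_validate, n_test) < 0 or total > n_points:
        raise ValueError("split sizes exceed the number of available points")
    # Sampling without replacement guarantees the three subsets never overlap.
    picked = random.Random(seed).sample(range(n_points), total)
    train = picked[:n_train]
    validate = picked[n_train:n_train + n_validate]
    test = picked[n_train + n_validate:]
    return train, validate, test


train, validate, test = split_indices(1000, n_train=200, n_validate=100)
print(len(train), len(validate), len(test))  # 200 100 700
```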
A force field model is effectively a parametrization of your dataset that provides energy e and forces f for any input geometry r:
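import numpy as np
from sgdml.predict import GDMLPredict
from sgdml.utils import io

# Load a trained model file and set up the predictor.
model = np.load('model.npz')
gdml = GDMLPredict(model)

# Read the input geometry, then query energy and forces for it.
r, _ = io.read_xyz('geometry.xyz')
e, f = gdml.predict(r)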
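A quick sanity check for any predictor of this kind is to compare its forces against the negative numerical gradient of its energy. The sketch below assumes forces are defined as f = -dE/dr and that coordinates are passed as a flat list. The function toy_predict is a hypothetical harmonic stand-in so that the example runs without a model file; it is not the sgdml API.

```python
def toy_predict(r):
    """Stand-in for a model: harmonic energy and analytic forces for flat coords r."""
    k = 0.5
    e = 0.5 * k * sum(x * x for x in r)
    f = [-k * x for x in r]
    return e, f


def max_force_error(predict, r, h=1e-5):
    """Largest gap between predicted forces and -dE/dr from central differences."""
    _, f = predict(r)
    worst = 0.0
    for i in range(len(r)):
        # Displace one coordinate at a time in both directions.
        r_plus = list(r)
        r_plus[i] += h
        r_minus = list(r)
        r_minus[i] -= h
        e_plus, _ = predict(r_plus)
        e_minus, _ = predict(r_minus)
        f_numeric = -(e_plus - e_minus) / (2 * h)
        worst = max(worst, abs(f[i] - f_numeric))
    return worst


r = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4]  # two atoms, flattened x, y, z
err = max_force_error(toy_predict, r)
print(err < 1e-6)
```

To apply the same check to a trained model, one would wrap the real predictor in a small adapter that matches the flat-list convention assumed here.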
This flexibility enables many applications, e.g. by interfacing with the Atomic Simulation Environment (ASE).
We offer an experimental model training service for anyone without sufficient compute resources. Simply upload your dataset, schedule some training jobs and return later to collect your model files.
Name | Size
DFT [FHI-aims, light tier 1] |
Benzene (Chmiela et al., 2018) | 49,863
MD17 dataset (Chmiela et al., 2017) |
Benzene | 627,983
Uracil | 133,770
Naphthalene | 326,250
Aspirin | 211,762
Salicylic acid | 320,231
Malonaldehyde | 993,237
Ethanol | 555,092
Toluene | 442,790
Paracetamol | 106,490
Azobenzene | 99,999
MD22 dataset (Chmiela et al., 2023) |
DFT [FHI-aims, tight tiers 1&2] |
Ac-Ala3-NHMe | 85,109
Docosahexaenoic acid (DHA) | 69,753
Stachyose | 27,272
DNA base pair (AT-AT) | 20,001
DNA base pair (AT-AT-CG-CG) | 10,153
DFT [FHI-aims, light tier 1] |
Buckyball catcher | 6,102
Double-walled nanotube (DWNT) | 5,032
CCSD [Psi4, cc-pVDZ] |
Aspirin | 1,500
CCSD(T) [Psi4, cc-pVDZ] |
Benzene | 1,500
Malonaldehyde | 1,500
Toluene | 1,500
CCSD(T) [Psi4, cc-pVTZ] |
Ethanol | 2,000