ML tasks
Here we describe the key words controlling the settings of models in input.ml
file.
- ml_type (integer)
-
The type of of ML task. The following options are possible:
Regression models
ml_type=0
Regression using basis functions (e.g. LML, QNML, polynomial chaos, etc.)
ml_type=1
Kernel ridge regression
ml_type=2
Reserved only for advanced users
Other tasks
ml_type=-1
Compute descriptors only without any fit
ml_type=-2
Analyze the data and, if required, make a choice of kernel to use in
ml_type=1
Default is
ml_type=0
.
Descriptors only
The following keywords are useful for the tasks with ml_type=-1
, where descriptors are computed and written into files without performing regression. Other types of ml_type
do not exclude using these options. For the descriptor-specific settings, see the Descriptors section.
- desc_forces (logical)
-
This key word controlls wheather the descriptors of forces are computed or not. When setting
desc_forces=.true.
, please verify that descriptors of forces are implemented for the descriptor that you choose.Default is
desc_forces=.false.
- write_desc (logical)
-
This key word activates writing descriptors into files. When using
ml_type=-1
, consider settingwrite_desc=.true.
Default is
write_desc=.false.
Example: typical commands for the task of computing descriptors (e.g. for structural analysis)
ml_type=-1
write_desc=.true.
desc_forces=.false.
More examples can be found in the Examples section
Regression models
Type of regression
The following options are relevant for the ml_type=0
regression models with basis functions (linear in parameters models) namely:
linear (LML), quandratic (QNML, QML) or polynomial chaos.
The kernel regressions require separate treatment threfore the options for kernel models are presented in separate
section Kernel section.
- mld_order (integer)
-
Define the type of regression the descriptor space:
mld_order = 1
this enable a linear fit (known also as LML) and will gives \(1+D\) parametersmld_order = 2
this enable a quandratic fit and will gives \(1 + D + D^2\) parameters. In this case there is a supplementary choice about the type of quandratic model by the optionsnap_type_quadratic
. About the quadratic fit see Goryaeva et al. 2021 for more details.mld_order = 3
this enable a polynomial chaos type of fitting. Two others parameters should be set:polyc_n_poly
andpolyc_n_hermite
. The number of paramters is very large given by this formulae: 1 +polyc_n_hermite
\(\times D\) +polyc_n_hermite
\(\times D^2\) + + … +polyc_n_hermite
\(\times D\) polyc_n_poly
Default is
mld_order = 1
.
- mld_type_quadratic (integer)
-
The type of quadratic fit. For the case 1 the solution is preconditionned by the linear fit i.e. the first 1 + \(D\) are exactelly set to the LML solution and only the remaining \(D^2\) parameters are fitted quadratically. For the case 2 the full quadratic solution is provided, all the parameters are fitted without preconditionning. Shortly speaking
mld_type_quadratic=1
is for QNML andmld_type_quadratic=2
is for QML. See Goryaeva et al. 2021 for more details.Default is
mld_type_quadratic = 1
.
- polyc_n_poly (integer)
-
active for polynomial chaos regression i.e.
mld_order=3
. Is the order of polynomial degree.Default is
polyc_n_poly=3
.
- polyc_n_hermite (integer)
-
The maxiumum degree for Hermite polynomials for the polynomial chaos regression.
Milady
handle Hermite basis up to the 4 th order.Default is
polyc_n_hermite=2
.
Solving algorithm
- mld_fit_type (integer)
-
The type of algorithm used in order to solve least square (LS) problem \(\mathbf{A} \beta= \mathbf{b}\). We recommend without any hesitation
mld_fit_type = 4
.\(\mathbf{A}\) is \(M \times D\) matrix, \(M\) being the number of observations and \(D\) the number of parameters (in the case of linear ML the dimension of descriptor + 1), \(\beta\) the parameter matrix \(D \times 1\) and \(\mathbf{b}\) the observations matrix \(M \times 1\). Actually in the
MiLaDy
implementation we build \(\mathbf{Amat}\) matrix that has the dimensions \(d \times m\) being in fact \(\mathbf{A}^T\) (with the notation used for this documentation):mld_fit_type=0
: home made solver based on LU / QR decomposition for general matrix. There is no any particular assuption for the size or rank of \(\mathbf{A}\) matrix. All the cases \(M > D\), \(M < D\) and \(M = D\) are treated.mld_fit_type=1
: solution based on QR decomposition for serial and ScaLapack version. Adapted for full rank matrix \(\mathbf{A}\) and use the assumption that \(\textrm{rank}(A) = \min(M,D)\), in other words, \(A\) has full rank. In serial version, if \(\mathbf{A}\) is not full rank the inversion will stop with a error. Uses a QR or LQ factorization of \(\mathbf{A}\).mld_fit_type=2
: restricted only for advanced users Solution with constraints. The constraints are of form \(\mathbf{B}x=\mathbf{d}\). The matrix \(\mathbf{B}\) and the vector \(\mathbf{d}\) are filled with all the data (input and target for energy, force or stress) contained in the class fixed bysnap_class_constraints
.-
mld_fit_type=3
: For serial version this is adapted for the general case when we may have \(\textrm{rank}(\mathbf{A}) < \min(M,D)\), in other words, \(\mathbf{A}\) may be rank-deficient, we seek the minimum norm least squares solution \(\beta\) which minimizes both \(\left| \beta \right|^2\) and \(\left| b - A \beta \right|^2\). With this option a rank estimation is possible. The ScaLapack version uses Cholesky decomposition for symmetric and positive definite matrix consequnetly sometimes should be avoided.Warning
Avoid this solution for Scalapack version. Is very likely to obtain weird results in the most favorable cases but probably you will have segmetation fault and or
NaN
as parameters :)). mld_fit_type=4
: In the general case when we may have \(\textrm{rank}(\mathbf{A}) < \min(M,D)\), in other words, \(\mathbf{A}\) may be rank-deficient, we seek the minimum norm least squares solution \(\beta\) which minimizes both \(\left| \beta \right|^2\) and \(\left| b - A \beta \right|^2\). Is the slowest but is by far mathematically most complete solution based on SVD decomposition. With this option a rank estimation (via SVD and driven by the optionsvd_rcond
).
Default is mld_fit_type=4
.
- svd_rcond (real)
-
The value of the limit from which the singular eigenvalues of the design matrix \((\mathbf{A})\) (or any matrix) are zero. Any eigenvalue lower that this limit is treated as zero. Obviously the value of
svd_rcond
has an impact on the \(\textrm{rank}(\mathbf{A})\). If a negative value is choosen thenMilady
fixsvd_rcond=100.d0*epsilon(1.d0)
where epsilon is the Lapack machine precision function (around1.d-15
in most of the cases).Default is
svd_rcond=-1
.
- snap_class_constraints (character)
-
The class that imposes the constraints on fit. Is active only if
mld_fit_type=2
. All the configuration mentioned in this class will fill the constraints matrix \(\mathbf{B}\) and the target vector \(\mathbf{d}\).Default is
"07"
.
- write_design_matrix (logical)
-
Dump the design matrix, which contains the descriptors (energy, force and stress if mentioned in
db_model.in
) and the weigths. The design matrix is writen design in filedesign_matrix.dat
. This matrix has the dimensionnumber_of_data
\(\times\)dim_desc
, where number_of_data are the number of data points (energy, force or stress) used to fit the potential anddim_desc
is the dimension of descriptor. And additional filedesign_matrix.info
is written with information about the database used for trainning.This option is active for normal trainning (
ml_type >= 0
).Warning
The design matrix is written as the transpose of the matrix used internally by
milady
Format of the design_matrix.dat :
The first 3 lines are comments
Then are
number_of_data
lines withdim_desc
columns.
Format of the design_matrix.info :
The first 5 lines are comments.
Then there are
number_of_data
lines with4
columns.The first column is a tag that indicates whether the corresponding line comes from energy (1), force (2), or stress (3).
The second column gives the target value (energy, force, or stress) used for training.
The third column indicates the weight used for that data point in the loss function.
The fourth column indicates the source file of that data point.
Default
write_design_matrix=.false.
Regularization and loss
- mld_regularization_type (integer)
-
mld_regularization_type
=0: no regularization-
mld_regularization_type
=1: applies the regularized solution of parameters, \(\mathbf{w}(\lambda_{krr})\) found by the Moore-Penrose inversion:\[\mathbf{w}(\lambda_{krr})= \left( \mathbf{A}^T \mathbf{A} + \lambda_{rr} \mathbf{I} \right)^{-1} \mathbf{A}^T \mathbf{y}\]The properties of the logarithmic search grid of \(\lambda_{krr}\) are defined by the following parameters:
-
real
,min_lambda_krr
andmax_lambda_krr
the min and max of the logarithmic grid. If one of them is negative then an automatic grid with 21 points is set-up between1.d-10
and1.d+10
.Defaultmin_lambda_krr =1.d-10
andmax_lambda_krr =1.d+10
. -
integer
,n_values_lambda_krr
the number of points on grid.Defaultn_values_lambda_krr=21
-
Default is
0
.
- type_of_loss (integer)
-
This option defines the type of the loss function. It can have values 1, 2 or 3. The loss function has the following four terms:
\[J(\mathbf{w}) = J_E(\mathbf{w}) + J_F(\mathbf{w}) + J_S(\mathbf{w}) + R(\mathbf{w}, \lambda) \, ,\]for energy, forces, stress losses and regularization, respectively.
We have implemented three types of losses. Here are the details for each of them. The energy part of loss:
\[\begin{split}\begin{aligned} J_E^1(\mathbf{w}) & = & \frac{1}{2} \sum_{m_E=1}^{M_E} \omega_{m_E}\left( E_{m_E} - \hat{E}_{m_E}(\mathbf{w}) \right)^2 \\ J_E^2(\mathbf{w}) & = & \frac{1}{2} \sum_{m_E=1}^{M_E} \frac{ \omega_{m_E}} {M_E} \left( E_{m_E} - \hat{E}_{m_E}(\mathbf{w}) \right)^2 \\ J_E^3(\mathbf{w}) & = & \frac{1}{2} \sum_{m_E=1}^{M_E} \frac{ \omega_{m_E}} {M_E} \left( \frac{E_{m_E} - \hat{E}_{m_E}(\mathbf{w})}{N_{m_E}} \right)^2\end{aligned}\end{split}\]where \(M_E\) are the number of energy configuration included in the train database whilst \(\omega_{m_E}\) (the one which is defined in the
db_model.in
file) and \(N_{m_E}\) are the weights and the number of atoms of the \(m_E^{\textrm{th}}\) configuration.In the case of forces:
\[\begin{split}\begin{aligned} J_F^1(\mathbf{w}) & = & \frac{1}{2} \sum_{m_F=1}^{M_F} \omega_{m_F}\left( f_{m_E} - \hat{f}_{m_E}(\mathbf{w}) \right)^2 \\ J_F^2(\mathbf{w}) & = & \frac{1}{2} \sum_{m_f=1}^{M_F} \frac{ \omega_{m_F}} {M_F} \left( f_{m_F} - \hat{f}_{m_F}(\mathbf{w}) \right)^2 \\ J_F^3(\mathbf{w}) & = & \frac{1}{2} \frac{1}{M_{F,S}} \sum_{s=1}^{M_{F,S}} \sum_{a=1}^{N_s} \frac{1}{3 N_{s}} \left( f_{s,a} - \hat{f}_{s,a}(\mathbf{w}) \right)^2\end{aligned}\end{split}\]where \(\omega_{m_F}\) is the weight of the \(m_F^{\textrm{th}}\) point in the forces database of a total of \(M_F\) datapoints. \(M_{F,S}\) is the number of systems, which contain forces that should be fitted, \(s\) is some order nummber of the system and \(N_s\) is the the number of atoms in that \(s\) system.
In the case of stress:
\[\begin{split}\begin{aligned} J_S^1(\mathbf{w}) & = & \frac{1}{2} \sum_{m_S=1}^{M_S} \omega_{m_S}\left( \sigma_{m_S} - \hat{\sigma}_{m_S}(\mathbf{w}) \right)^2 \\ J_S^2(\mathbf{w}) & = & \frac{1}{2} \sum_{m_S=1}^{M_S} \frac{ \omega_{m_S}} {M_S} \left( \sigma_{m_S} - \hat{\sigma}_{m_S}(\mathbf{w}) \right)^2 \\ J_F^3(\mathbf{w}) & = & \frac{1}{2} \frac{1}{M_{S,S}} \sum_{s=1}^{M_{S,S}} \sum_{\alpha=1}^{6} \frac{1}{6} \left( \sigma_{s, \alpha} - \hat{\sigma}_{s, \alpha}(\mathbf{w}) \right)^2\end{aligned}\end{split}\]where the above notations have the same meaning as in the case of forces. \(M_S\) denotes the number of datapoints of stress observables (6 points for each system). \(M_{S,S}\) denotes the number of systems that have stress information. and \(\sigma_{s, \alpha}\) is one component of the stress in the particular system \(s\).
Default is
1
.
- train_only (loginal)
-
Only the trainning is performed. No tests at all. This option is tested only in the case of
ml_type=0
. target vector \(d\).Default is
.false.
.