Welcome to qmflows’ documentation!

Contents:

https://github.com/SCM-NV/qmflows/workflows/build%20with%20conda/badge.svg https://codecov.io/gh/SCM-NV/qmflows/branch/master/graph/badge.svg Documentation Status https://zenodo.org/badge/DOI/10.5281/zenodo.3274284.svg https://badge.fury.io/py/qmflows.svg _images/qmflows.png

QMFlows

See documentation for tutorials and documentation.

Motivation

Research on modern computational quantum chemistry relies on a set of computational tools to carry out calculations. The complexity of the calculations usually requires intercommunication between the aforementioned tools, such communication is usually done through shell scripts that try to automate input/output actions like: launching the computations in a cluster, reading the resulting output and feeding the relevant numerical result to another program. Such scripts are difficult to maintain and extend, requiring a significant programming expertise to work with them. Being then desirable a set of automatic and extensible tools that allows to perform complex simulations in heterogeneous hardware platforms.

This library tackles the construction and efficient execution of computational chemistry workflows. This allows computational chemists to use the emerging massively parallel compute environments in an easy manner and focus on interpretation of scientific data rather than on tedious job submission procedures and manual data processing.

Description

This library consists of a set of modules written in Python3 to automate the following tasks:

  1. Input generation.

  2. Handle tasks dependencies (Noodles).

  3. Advanced molecular manipulation capabilities with (rdkit).

  4. Jobs failure detection and recovery.

  5. Numerical data storage (h5py).

Tutorial and Examples

A tutorial written as a jupyter-notebook is available from: tutorial-qmflows. You can also access direclty more advanced examples.

Installation

  • Download miniconda for python3: miniconda (also you can install the complete anaconda version).

  • Install according to: installConda.

  • Create a new virtual environment using the following commands:

    • conda create -n qmflows

  • Activate the new virtual environment

    • source activate qmflows

To exit the virtual environment type source deactivate.

Dependencies installation

  • Type in your terminal:

    conda activate qmflows

Using the conda environment the following packages should be installed:

  • install rdkit and h5py using conda:

    • conda install -y -q -c conda-forge rdkit h5py

    • Note that rdkit is optional for Python 3.7 and later.

Package installation

Finally install the package:

  • Install QMFlows using pip: - pip install qmflows

Now you are ready to use qmflows.

Notes:

  • Once the libraries and the virtual environment are installed, you only need to type conda activate qmflows each time that you want to use the software.

Starting the environment

Once QMflows has been installed the user should run the following command to initialize the environment:

[user@int1 ~]$ source activate qmflows
discarding /home/user/anaconda3/bin from PATH
prepending /home/user/anaconda3/envs/qmflows/bin to PATH
(qmflows)[user@int1 ~]$ python --version
Python 3.5.2 :: Anaconda custom (64-bit)

To leave the environment the following command is used

(qmflows)[user@int1 ~]$ source deactivate
discarding /home/user/anaconda3/envs/qmflows/bin from PATH

To finalize preparations before running QMflows: if you don’t want the results to end up in the current work directory, create a new results folder.

[26]:
mkdir tutorial_results

What is QMflows?

QMflows is a python library that enables executing complicated workflows of interdependent quantum chemical (QM) calculations in python. It aims at providing a common interface to multiple QM packages, enabling easy and systematic generation of the calculation inputs, as well as facilitating automatic analysis of the results. Furthermore it is build on top of the powerful Noodles framework for executing the calculations in parallel where possible.

The basics: calling packages

Currently QMFLOWS offers an interface with the following simulation software:

  • SCM (ADF and DTFB)

  • CP2K

  • ORCA

Note:

Please make sure that the packages you want to use in QMflows are installed and active; in most supercomputer the simulation package are available using a command like (consult your system administrator):

load module superAwesomeQuantumPackage/3.1421

Also some simulation packages required that you configure a scratch folder. For instance Orca requires a SCR folder to be defnied while ADF called it SCM_TMPDIR.

With qmflows you can write a python script that simply calls one of the package objects adf, dftb, cp2k, or orca. As arguments to the call, you need to provide a settings objects defining the input of a calculation, a molecular geometry, and, optionally, a job name that enables you to find back the “raw” data of the calculation later on.

Let’s see how this works:

First we define a molecule, for example by reading one from an xyz file:

[7]:
from scm.plams import Molecule
acetonitrile = Molecule("../test/test_files/molecules/acetonitrile.xyz")
print(acetonitrile)
  Atoms:
    1         C      2.366998      0.579794     -0.000000
    2         C      1.660642      1.834189      0.000000
    3         N      1.089031      2.847969      0.000000
    4         H      2.100157      0.010030      0.887206
    5         H      2.100157      0.010030     -0.887206
    6         H      3.439065      0.764079     -0.000000

Then we can perform geometry optimization on the molecule by a call to the dftb package object:

[8]:
from qmflows import dftb, templates, run
job = dftb(templates.geometry, acetonitrile, job_name="dftb_geometry_optimization")
print(job)
<noodles.interface.decorator.PromisedObject object at 0x7ff8fe871da0>

As you can see, “job” is a so-called “promised object”. It means it first needs to be “run” by the Noodles scheduler to return a normal python object.

[29]:
result = run(job, path="tutorial_results", folder="run_one", cache="tutorial_cache.json")
print(result)
[09:14:04] PLAMS working folder: /home/lars/workspace/qmflows/jupyterNotebooks/tutorial_results/run_one
╭─(running jobs)
│ Running dftb dftb_geometry_optimization...
╰()─(success)
<qmflows.packages.SCM.DFTB_Result object at 0x7f6c8e30bcf8>

We can easily retrieve the calculated properties from the DFTB calculation such as the dipole or the optimized geometry for use in subsequent calculations.

[30]:
print("Dipole: ", result.dipole)
print(result.molecule)
Dipole:  [1.0864213029, -1.9278296041, -0.0]
  Atoms:
    1         C      2.366998      0.579794     -0.000000
    2         C      1.660642      1.834189      0.000000
    3         N      1.089031      2.847969      0.000000
    4         H      2.100157      0.010030      0.887206
    5         H      2.100157      0.010030     -0.887206
    6         H      3.439065      0.764079     -0.000000

Settings and templates

In the above example templates.geometry was actually a predefined Settings object. You can define and manipulate Settings in a completely flexible manner as will be explained in this section. To facilitate combining different packages in one script, QMflows defines a set of commonly used generic keywords, which can be combined with package specific keywords, to provide maximum flexibility.

[10]:
from qmflows import Settings
s = Settings()
s.basis = "DZP"
s.specific.adf.basis.core = "large"
s.freeze = [1,2,3]
print(s)
basis:  DZP
specific:
         adf:
             basis:
                   core:        large
freeze:         [1, 2, 3]

This code snippet illustrates that the Settings can be specified in two ways, using generic or specific keywords. Generic keywords represent input properties that are present in most simulation packages like a basis set while specific keywords allow the user to apply specific keywords for a package that are not in a generic dictionary.

Expert info:

Settings are a subclass of python dictionaries to represent herarchical structures, like

[3]:
from IPython.display import Image
Image(filename="files/simpleTree.png")
[3]:
_images/interactive_tutorial_19_0.png

In QMflows/PLAMS multiple settings objects can be combined using the overlay function.

[11]:
merged_settings = templates.geometry.overlay(s)
print(merged_settings)
specific:
         adf:
             basis:
                   type:        SZ
                   core:        large
             geometry:
                      optim:    delocal
             numericalquality:  good
             scf:
                 converge:      1e-6
                 iterations:    100
         ams:
             ams:
                 GeometryOptimization:
                                      MaxIterations:    500
                 Task:  GeometryOptimization
         cp2k:
              force_eval:
                         dft:
                             mgrid:
                                   cutoff:      400
                                   ngrids:      4
                             qs:
                                method:         gpw
                             scf:
                                 OT:
                                    N_DIIS:     7
                                    minimizer:  DIIS
                                    preconditioner:     full_single_inverse
                                 eps_scf:       1e-06
                                 max_scf:       200
                                 scf_guess:     atomic
                         subsys:
                                cell:
                                     periodic:  xyz
              global:
                     print_level:       low
                     project:   cp2k
                     run_type:  geometry_optimization
              motion:
                     geo_opt:
                             max_iter:  500
                             optimizer:         bfgs
                             type:      minimization
         cp2k_mm:
                 force_eval:
                            method:     FIST
                            mm:
                               forcefield:
                                          do_nonbonded:
                                          ei_scale14:   1.0
                                          ignore_missing_critical_params:
                                          shift_cutoff:         .TRUE.
                                          spline:
                                                 r0_nb:         0.25
                                          vdw_scale14:  1.0
                               poisson:
                                       ewald:
                                             ewald_type:        NONE
                                       periodic:        NONE
                               print:
                                     ff_info low:
                                                 spline_data:   .FALSE.
                                                 spline_info:   .FALSE.
                            subsys:
                                   cell:
                                        abc:    [angstrom] 50.0 50.0 50.0
                                        periodic:       NONE
                                   topology:
                                            center_coordinates:
                                                               center_point:    0.0 0.0 0.0
                                            coord_file_format:  OFF
                 global:
                        print_level:    low
                        project:        cp2k
                        run_type:       geometry_optimization
                 motion:
                        geo_opt:
                                max_iter:       500
                                optimizer:      lbfgs
                                type:   minimization
         dftb:
              dftb:
                   resourcesdir:        DFTB.org/3ob-3-1
              task:
                   runtype:     GO
         orca:
              basis:
                    basis:      sto_sz
              method:
                     functional:        lda
                     method:    dft
                     runtyp:    opt
basis:  DZP
freeze:         [1, 2, 3]

The overlay method merged the template containing default settings for geometry optimizations with different packages with the arguments provided by the user

[2]:
Image(filename="files/merged.png")
[2]:
_images/interactive_tutorial_23_0.png

resulting in:

[3]:

Image(filename="files/result_merged.png")
[3]:
_images/interactive_tutorial_25_0.png

Note that the generic and specific keywords still exist next to each other and may not be consistent (e.g. different basis sets are defined in generic and specific keywords). Upon calling a package with a Settings object, the generic keywords are first translated into package specific keywords and combined with the relevant user defined specific keywords. In this step, the settings defined in generic keywords take preference. Subsequently, the input file(s) for the given package is/are generated, based on the keywords after specific.[package] based on the PLAMS software.

[12]:
from qmflows import adf
print(adf.generic2specific(merged_settings))
basis:  DZP
freeze:         [1, 2, 3]
specific:
         adf:
             basis:
                   core:        large
                   type:        DZP
             geometry:
                      optim:    cartesian
             numericalquality:  good
             scf:
                 converge:      1e-6
                 iterations:    100
             constraints:
                         atom 1:
                         atom 2:
                         atom 3:
         ams:
             ams:
                 GeometryOptimization:
                                      MaxIterations:    500
                 Task:  GeometryOptimization
         cp2k:
              force_eval:
                         dft:
                             mgrid:
                                   cutoff:      400
                                   ngrids:      4
                             qs:
                                method:         gpw
                             scf:
                                 OT:
                                    N_DIIS:     7
                                    minimizer:  DIIS
                                    preconditioner:     full_single_inverse
                                 eps_scf:       1e-06
                                 max_scf:       200
                                 scf_guess:     atomic
                         subsys:
                                cell:
                                     periodic:  xyz
              global:
                     print_level:       low
                     project:   cp2k
                     run_type:  geometry_optimization
              motion:
                     geo_opt:
                             max_iter:  500
                             optimizer:         bfgs
                             type:      minimization
         cp2k_mm:
                 force_eval:
                            method:     FIST
                            mm:
                               forcefield:
                                          do_nonbonded:
                                          ei_scale14:   1.0
                                          ignore_missing_critical_params:
                                          shift_cutoff:         .TRUE.
                                          spline:
                                                 r0_nb:         0.25
                                          vdw_scale14:  1.0
                               poisson:
                                       ewald:
                                             ewald_type:        NONE
                                       periodic:        NONE
                               print:
                                     ff_info low:
                                                 spline_data:   .FALSE.
                                                 spline_info:   .FALSE.
                            subsys:
                                   cell:
                                        abc:    [angstrom] 50.0 50.0 50.0
                                        periodic:       NONE
                                   topology:
                                            center_coordinates:
                                                               center_point:    0.0 0.0 0.0
                                            coord_file_format:  OFF
                 global:
                        print_level:    low
                        project:        cp2k
                        run_type:       geometry_optimization
                 motion:
                        geo_opt:
                                max_iter:       500
                                optimizer:      lbfgs
                                type:   minimization
         dftb:
              dftb:
                   resourcesdir:        DFTB.org/3ob-3-1
              task:
                   runtype:     GO
         orca:
              basis:
                    basis:      sto_sz
              method:
                     functional:        lda
                     method:    dft
                     runtyp:    opt

In the case of adf the above keywords result in the following input file for ADF package:

[34]:
adf_job = adf(merged_settings, acetonitrile, job_name='adf_acetonitrile')
result = run(adf_job, path="tutorial_results",
             folder="run_two", cache="tutorial_cache.json")
print(open('tutorial_results/run_two/adf_acetonitrile/adf_acetonitrile.in').read())
[09:14:04] PLAMS working folder: /home/lars/workspace/qmflows/jupyterNotebooks/tutorial_results/run_two
╭─(running jobs)
│ Running adf adf_acetonitrile...
()╰─(success)
atoms
      1         C      2.419290      0.606560      0.000000
      2         C      1.671470      1.829570      0.000000
      3         N      1.065290      2.809960      0.000000
      4         H      2.000000      0.000000      1.000000
      5         H      2.000000      0.000000     -1.000000
      6         H      3.600000      0.800000      0.000000
end

basis
  core large
  type DZP
end

constraints
  atom 2
  atom 3
  atom 4
end

geometry
  optim cartesian
end

integration
  accint 6.0
end

scf
  converge 1e-06
  iterations 100
end

xc
  lda
end

end input

Combining multiple jobs

Multiple jobs can be combined, while calling the run function only once. The script below combines components outlined above:

[35]:
from scm.plams import Molecule
from qmflows import dftb, adf, templates, run, Settings

acetonitrile = Molecule("files/acetonitrile.xyz")

dftb_opt = dftb(templates.geometry, acetonitrile, job_name="dftb_opt")

s = Settings()
s.basis = "DZP"
s.specific.adf.basis.core = "large"
adf_single = adf(templates.singlepoint.overlay(s), dftb_opt.molecule, job_name="adf_single")

adf_result = run(adf_single, path="tutorial_results", folder="workflow", cache="tutorial_cache.json")
print(adf_result.molecule)
print(adf_result.energy)
[09:15:08] PLAMS working folder: /home/lars/workspace/qmflows/jupyterNotebooks/tutorial_results/workflow
╭─(running jobs)
│ Running dftb dftb_opt...
()│ Running adf adf_single...
()╰─(success)
  Atoms:
    1         C      0.000000      0.000000      0.656511
    2         C      0.000000      0.000000     -0.783088
    3         N      0.000000      0.000000     -1.946913
    4         H     -0.512221     -0.887193      1.022016
    5         H      1.024442      0.000000      1.022016
    6         H     -0.512221      0.887193      1.022016

-1.4094874734528888

In this case the second task adf_single reads the molecule optimized in the first job dftb_opt. Note that dftb_opt as well as dftb_opt.molecule are promised objects. When run is applied to the adf_single job, noodles builds a graph of dependencies and makes sure all the calculations required to obtain adf_result are performed.

All data related to the calculations, i.e. input files generated by QMflows and the resulting output files generated by the QM packages are stored in folders named after the job_names, residing inside a results folder:

[36]:
ls tutorial_results
run_one/  run_two/  workflow/
[37]:
ls tutorial_results/workflow
adf_single/  dftb_opt/  workflow.log
[38]:
ls tutorial_results/workflow/adf_single
adf_single.dill  adf_single.in   adf_single.run*  logfile  t21.H
adf_single.err   adf_single.out  adf_single.t21   t21.C    t21.N
[ ]:

Further Reading

Generic Keywords

Quantum chemistry packages use gaussian type orbitals (GTO) or slater type orbitals (STO) to perform the simulation. The packages use the same standards for the basis set and it will be really handy if we can defined a “generic” keyword for basis sets. Fortunately qmflows already offers such keyword that can be used among the packages that use the same basis standard,

[11]:
from qmflows import Settings
s = Settings()
s.basis = "DZP"

Internally QMflows will create a hierarchical structure representing basis DZP for the packages that can handle that basis set. Other generic keyowrds like: functional, inithess, etc. have been implemented.

The Following table describes some of the available generic keywords and the Packages where the funcionality is implemented

Generic Keyword

Packages

Description

basis

ADF, CP2K, Orca

Set the basis set

potential

CP2K

Set the pseudo potential

cell_angles

CP2K

Specified the angles of the unit cell

cell_parameters

CP2K

Specified the vectors of the unit cell

constraint

ADF, Orca

Constrain the distance, angle or dihedral angle for a set of molecules

freeze

ADF, Orca

Freeze a set of atoms indicated by their indexes or symbols

functional

ADF, CP2K

Set the DFT functional

inithess

ADF, Orca

Provide an initial Hessian matrix

optimize

ADF, DFTB, Orca

Perform a molecular optimization

selected_atoms

ADF, Orca

Optimize the given set of atoms while keeping the rest fixed

ts

ADF, DFTB, Orca

Carry out a transition state optimization

Note: QMflows Does not have chemical intuition and if you provide a meaningless keyword, like a wrong basis set it will not warm you.

Templates

As has been shown so far, Settings can be specified in two ways: generic or specific. Generic keywords represent input properties that are present in most simulation packages like a basis set while specific keywords resemble the input structure of a given package.

Generic and Specific Settings can express both simple and complex simulation inputs, but it would be nice if we can pre-defined a set of templates for the most common quantum chemistry simulations like: single point calculations, geometry optimizations, transition state optimization, frequency calculations, etc. qmflows already has a pre-defined set of templates containing some defaults that the user can modify for her/his own purpose. Templates are stored inside the qmflows.templates module and are load from JSON files. A JSON file is basically a nested dictionary that is translated to a Settings object by qmflows.

Below it is shown the defaults for single point calculation

[1]:
from qmflows import templates
templates.singlepoint
RDKit WARNING: [16:43:01] Enabling RDKit 2019.09.1 jupyter extensions
[1]:
specific:
         adf:
             basis:
                   type:        SZ
             numericalquality:  normal
             scf:
                 converge:      1e-6
                 iterations:    100
         ams:
             ams:
                 Task:  SinglePoint
         dftb:
              dftb:
                   resourcesdir:        DFTB.org/3ob-3-1
              task:
                   runtype:     SP
         cp2k:
              force_eval:
                         dft:
                             mgrid:
                                   cutoff:      400
                                   ngrids:      4
                             print:
                                   mo:
                                      add_last:         numeric
                                      each:
                                           qs_scf:      0
                                      eigenvalues:
                                      eigenvectors:
                                      filename:         ./mo.data
                                      ndigits:  36
                                      occupation_numbers:
                             qs:
                                method:         gpw
                             scf:
                                 eps_scf:       1e-06
                                 max_scf:       200
                                 scf_guess:     restart
                         subsys:
                                cell:
                                     periodic:  xyz
              global:
                     print_level:       low
                     project:   cp2k
                     run_type:  energy
         cp2k_mm:
                 force_eval:
                            method:     FIST
                            mm:
                               print:
                                     ff_info low:
                                                 spline_data:   .FALSE.
                                                 spline_info:   .FALSE.
                               forcefield:
                                          ei_scale14:   1.0
                                          vdw_scale14:  1.0
                                          ignore_missing_critical_params:
                                          do_nonbonded:
                                          shift_cutoff:         .TRUE.
                                          spline:
                                                 r0_nb:         0.25
                               poisson:
                                       periodic:        NONE
                                       ewald:
                                             ewald_type:        NONE
                            subsys:
                                   cell:
                                        abc:    [angstrom] 50.0 50.0 50.0
                                        periodic:       NONE
                                   topology:
                                            coord_file_format:  OFF
                                            center_coordinates:
                                                               center_point:    0.0 0.0 0.0
                 global:
                        print_level:    low
                        project:        cp2k
                        run_type:       energy
         orca:
              method:
                     method:    dft
                     functional:        lda
              basis:
                    basis:      sto_sz
_ipython_canary_method_should_not_exist_:

The question is then, how I can modify a template with my own changes?

Suppose you are perfoming a bunch of constrained DFT optimizations using ADF . You need first to define a basis set and the constrains.

[3]:
from qmflows import Settings
s = Settings()
# Basis
s.basis = "DZP"
s.specific.adf.basis.core = "Large"

# Constrain
s.freeze = [1, 2, 3]

We use two generic keywords: freeze to indicate a constrain and basis to provide the basis set. Also, we introduce an specific ADF keywords core = Large. Now you merge your Settings with the correspoding template to carry out molecular geometry optimizations, using a method called overlay.

[4]:
from qmflows import templates
inp = templates.geometry.overlay(s)

The overlay method takes as input a template containing a default set for different packages and also takes the arguments provided by the user, as shown schematically 3d207ec1e7884e409a216e837a7594e1

This overlay method merged the defaults for a given packages (ADF in this case) with the input supplied by the user, always given preference to the user input 77fd9adaef5d4fbd8aa2982472a3ce23

Below it is shown a combination of templates, generic and specific keywords to generate the input for a CP2K job

[2]:
from qmflows import templates

# Template
s = templates.singlepoint

# Generic keywords
s.cell_angles = [90.0, 90.0, 90.0]
s.cell_parameters=  38.0
s.basis = 'DZVP-MOLOPT-SR-GTH'
s.potential ='GTH-PBE'

# Specific Keywords
s.specific.cp2k.force_eval.dft.scf.max_scf  = 100
s.specific.cp2k.force_eval.subsys.cell.periodic = 'None'

print(s)
specific:
         adf:
             basis:
                   type:        SZ
             numericalquality:  normal
             scf:
                 converge:      1e-6
                 iterations:    100
         ams:
             ams:
                 Task:  SinglePoint
         dftb:
              dftb:
                   resourcesdir:        DFTB.org/3ob-3-1
              task:
                   runtype:     SP
         cp2k:
              force_eval:
                         dft:
                             mgrid:
                                   cutoff:      400
                                   ngrids:      4
                             print:
                                   mo:
                                      add_last:         numeric
                                      each:
                                           qs_scf:      0
                                      eigenvalues:
                                      eigenvectors:
                                      filename:         ./mo.data
                                      ndigits:  36
                                      occupation_numbers:
                             qs:
                                method:         gpw
                             scf:
                                 eps_scf:       1e-06
                                 max_scf:       100
                                 scf_guess:     restart
                         subsys:
                                cell:
                                     periodic:  None
              global:
                     print_level:       low
                     project:   cp2k
                     run_type:  energy
         cp2k_mm:
                 force_eval:
                            method:     FIST
                            mm:
                               print:
                                     ff_info low:
                                                 spline_data:   .FALSE.
                                                 spline_info:   .FALSE.
                               forcefield:
                                          ei_scale14:   1.0
                                          vdw_scale14:  1.0
                                          ignore_missing_critical_params:
                                          do_nonbonded:
                                          shift_cutoff:         .TRUE.
                                          spline:
                                                 r0_nb:         0.25
                               poisson:
                                       periodic:        NONE
                                       ewald:
                                             ewald_type:        NONE
                            subsys:
                                   cell:
                                        abc:    [angstrom] 50.0 50.0 50.0
                                        periodic:       NONE
                                   topology:
                                            coord_file_format:  OFF
                                            center_coordinates:
                                                               center_point:    0.0 0.0 0.0
                 global:
                        print_level:    low
                        project:        cp2k
                        run_type:       energy
         orca:
              method:
                     method:    dft
                     functional:        lda
              basis:
                    basis:      sto_sz
_ipython_canary_method_should_not_exist_:
cell_angles:    [90.0, 90.0, 90.0]
cell_parameters:        38.0
basis:  DZVP-MOLOPT-SR-GTH
potential:      GTH-PBE

Molecule

The next component to carry out a simulation is a molecular geometry. qmflows offers a convinient way to read Molecular geometries using the Plams library in several formats like: xyz , pdb, mol, etc.

[6]:
from scm.plams import Molecule
acetonitrile = Molecule("../test/test_files/acetonitrile.xyz")
print(acetonitrile)
  Atoms:
    1         C      2.366998      0.579794     -0.000000
    2         C      1.660642      1.834189      0.000000
    3         N      1.089031      2.847969      0.000000
    4         H      2.100157      0.010030      0.887206
    5         H      2.100157      0.010030     -0.887206
    6         H      3.439065      0.764079     -0.000000

You can also create the molecule one atom at a time

[7]:
from scm.plams import (Atom, Molecule)
m  = Molecule()
m.add_atom(Atom(symbol='C', coords=(2.41929, 0.60656 , 0.0)))
m.add_atom(Atom(symbol='C', coords=(1.67147,  1.82957, 0.0)))
m.add_atom(Atom(symbol='N', coords=(1.06529, 2.80996, 0.0)))
m.add_atom(Atom(symbol='H',  coords=(2.0, 0.0, 1.0)))
m.add_atom(Atom(symbol='H',  coords=(2.0, 0.0, -1.0)))
m.add_atom(Atom(symbol='H',  coords=(3.6, 0.8, 0.0)))
print(m)
  Atoms:
    1         C      2.419290      0.606560      0.000000
    2         C      1.671470      1.829570      0.000000
    3         N      1.065290      2.809960      0.000000
    4         H      2.000000      0.000000      1.000000
    5         H      2.000000      0.000000     -1.000000
    6         H      3.600000      0.800000      0.000000

QMflows Can also handle smiles as shown below

[8]:
from scm.plams import from_smiles

# String representing the smile
smile = 'C1CC2(CCCCC2)C=C1'
#Molecule creation
mol = from_smiles(smile)
print(mol)
  Atoms:
    1         C     -2.532311      0.731744      0.709758
    2         C     -1.368476      0.367070      1.231109
    3         C     -0.433052     -0.045396      0.145886
    4         C     -1.425943     -0.476305     -0.942143
    5         C     -2.482346      0.596323     -0.777123
    6         C      0.340730     -1.284527      0.522195
    7         C      1.554856     -1.363207     -0.361120
    8         C      2.541783     -0.246962     -0.011749
    9         C      1.783591      1.008298      0.379075
   10         C      0.441973      1.070128     -0.323205
   11         H     -3.396188      1.077206      1.265628
   12         H     -1.110278      0.360527      2.283834
   13         H     -1.876171     -1.425792     -0.595758
   14         H     -0.977017     -0.510609     -1.929188
   15         H     -2.195853      1.551129     -1.261514
   16         H     -3.436929      0.219830     -1.185449
   17         H      0.683363     -1.246115      1.584294
   18         H     -0.240309     -2.217667      0.374878
   19         H      1.304291     -1.253517     -1.418776
   20         H      2.005028     -2.361315     -0.225630
   21         H      3.183193     -0.002291     -0.890513
   22         H      3.139644     -0.569212      0.868506
   23         H      1.573985      0.978425      1.469548
   24         H      2.392337      1.915661      0.175061
   25         H     -0.024235      2.038641     -0.029064
   26         H      0.554336      1.087931     -1.429810
  Bonds:
   (1)--2.0--(2)
   (2)--1.0--(3)
   (3)--1.0--(4)
   (4)--1.0--(5)
   (3)--1.0--(6)
   (6)--1.0--(7)
   (7)--1.0--(8)
   (8)--1.0--(9)
   (9)--1.0--(10)
   (5)--1.0--(1)
   (10)--1.0--(3)
   (1)--1.0--(11)
   (2)--1.0--(12)
   (4)--1.0--(13)
   (4)--1.0--(14)
   (5)--1.0--(15)
   (5)--1.0--(16)
   (6)--1.0--(17)
   (6)--1.0--(18)
   (7)--1.0--(19)
   (7)--1.0--(20)
   (8)--1.0--(21)
   (8)--1.0--(22)
   (9)--1.0--(23)
   (9)--1.0--(24)
   (10)--1.0--(25)
   (10)--1.0--(26)

The Molecule class has an extensive functionally to carry out molecular manipulations, for a comprenhesive disccusion about it have a look at the molecule documentation. Also the module scm.plams.interfaces.molecule.molkit contains an extensive functionality to apply transformation over a molecule using the RDKit library.

Runinng a quantum mechanics simulation

We now have our components to perform a calculation: Settings and Molecule. We can now invoke a quantum chemistry package to perform the computation,

[9]:
from qmflows import adf
optmized_mol_adf = adf(inp, acetonitrile, job_name='acetonitrile_opt')

the previous code snippet does not execute the code immediatly, instead the simulation is started when the user invokes the run function, as shown below

from plams import Molecule
from qmflows import (adf, run, Settings, templates)

# Settings
s = templates.geometry
s.basis = "DZP"
s.specific.adf.basis.core = "Large"
s.freeze = [1, 2, 3]

# molecule
from plams import Molecule
acetonitrile = Molecule("acetonitrile.xyz")

# Job
optimized_mol_adf = adf(s, acetonitrile, job_name='acetonitrile_opt')

# run the  job
result = run(optimized_mol_adf.molecule, folder='acetonitrile')

you can run the previous script by saving it in a file called acetonitrile_opt.py and typing the following command in your console:

(qmflows)[user@int1 ~]$ python acetonitrile_opt.py

you will then see in your current work directory something similar to the following

(qmflows)[user@int1 ~]$ ls
acetonitrile      acetonitrile_opt.py   cache.json   acetonitrile.xyz

acetonitrile is the folder containing the output from the quantum package call (ADF in this case). The cache.json file contains all the information required to perform a restart, as we will explore below. Inside the acetonitrile you can find the input/output files resulting from the simulation

(qmflows)[user@int1 ~]$ ls acetonitrile
acetonitrile.log  acetonitrile_opt

(qmflows)[user@int1 ~]$ ls acetonitrile/acetonitrile_opt
 acetonitrile_opt.dill  acetonitrile_opt.out  logfile  t21.N
 acetonitrile_opt.err   acetonitrile_opt.run  t21.C
 acetonitrile_opt.in    acetonitrile_opt.t21  t21.H

Extracting Properties

In general, properties are extracted using the standard Object.attribute notation in python, as shown below.

result = optmized_mol_adf.molecule

Some of the available properties are shown in the following table,

Property

type

Description

dipole

Double

Molecular dipole mopment

energy

Double

Total energy

enthalpy

Double

Enthalpy

gradient

Numpy array

First derivatives of the energy

hessian

Numpy array

Second derivative of the energy

molecule

Molecule

Object representing a physical entity

runtime

Double

Time spent in the simulation

On the background QMflows has a mechanism to read the properties from the output files and make them available inside Python.

Communicating different packages

We can use the previous optimized geometry for further calculations using for instance another package like Orca to run a frequencies calculation,

[20]:
from qmflows import orca
s2 = Settings()
s2.specific.orca.main = "freq"
s2.specific.orca.basis.basis = 'sto_sz'
s2.specific.orca.method.functional = 'lda'
s2.specific.orca.method.method = 'dft'

job_freq = orca(s2, optmized_mol_adf)

frequencies = job_freq.frequencies

The whole script is

from qmflows import (adf, orca, run, templates, Settings)
from plams import Molecule
import plams

def main():
    s = templates.geometry
    s.basis = "DZP"
    s.specific.adf.basis.core = "large"

    acetonitrile = Molecule("files/acetonitrile.xyz")
    job = adf(inp, acetonitrile)
    optmized_mol_adf = job.molecule

    s2 = Settings()
    s2.specific.orca.main = "freq"
    s2.specific.orca.basis.basis = 'sto_sz'
    s2.specific.orca.method.functional = 'lda'
    s2.specific.orca.method.method = 'dft'

    job_freq = orca(s2, optmized_mol_adf)
    frequencies = job_freq.frequencies

    print(run(frequencies))

Once you run the script an input file for the ADF and Orca jobs are created. The ADF input looks like

Atoms
      1         C      2.419290      0.606560      0.000000
      2         C      1.671470      1.829570      0.000000
      3         N      1.065290      2.809960      0.000000
      4         H      2.000000      0.000000      1.000000
      5         H      2.000000      0.000000     -1.000000
      6         H      3.600000      0.800000      0.000000
End

Basis
  Type DZP
End

Constraints
  Atom 1
  Atom 2
  Atom 3
End

Geometry
  Optim cartesian
End

Integration
  Accint 6.0
End

Scf
  Converge 1e-06
  Iterations 100
End

Running in a supercomputer

Running in Cartesius or Bazis through the Slurm resource manager can be done using and script like

#!/bin/bash
#SBATCH -t 00:10:00
#SBATCH -N 1
#SBATCH -n 8

module load orca
module load adf/2016.102

source activate qmflows
python optimization_ADF_freq_ORCA.py

The Slurm output looks like:

[21]:
 """
 load orca/3.0.3 (PATH)
 discarding /home/user/anaconda3/envs/qmflows/bin from PATH
 prepending /home/user/anaconda3/envs/qmflows/bin to PATH
 [11:17:59] PLAMS working folder: /nfs/home/user/orca/Opt/example/plams.23412
 +-(running jobs)
 | Running adf ...
 [11:17:59] Job ADFjob started
 [11:18:18] Job ADFjob finished with status 'successful'
 [11:18:18] Job ORCAjob started
 [11:18:26] Job ORCAjob finished with status 'successful'

 [    0.           0.           0.           0.           0.           0.
  -360.547382  -360.14986    953.943089   954.3062    1049.2305
  1385.756519  1399.961717  1399.979552  2602.599662  3080.45671
  3175.710785  3177.612274]
"""
[21]:
"\nload orca/3.0.3 (PATH)\ndiscarding /home/user/anaconda3/envs/qmflows/bin from PATH\nprepending /home/user/anaconda3/envs/qmflows/bin to PATH\n[11:17:59] PLAMS working folder: /nfs/home/user/orca/Opt/example/plams.23412\n+-(running jobs)\n| Running adf ...\n[11:17:59] Job ADFjob started\n[11:18:18] Job ADFjob finished with status 'successful' \n[11:18:18] Job ORCAjob started\n[11:18:26] Job ORCAjob finished with status 'successful' \n\n[    0.           0.           0.           0.           0.           0.\n -360.547382  -360.14986    953.943089   954.3062    1049.2305\n 1385.756519  1399.961717  1399.979552  2602.599662  3080.45671\n 3175.710785  3177.612274]\n"

How the run function works?

A little discussion about graphs

qmflows is meant to be used for both workflow generation and execution. When you write a python script representing a workflow you are explicitly declaring set of computations and their dependencies. For instance the following workflow represent ADF and Orca computations of the aforementioned example. In this graph the octagons represent quantum simulation using a package, while the ovals represent both user input or data extracted from a simulation. Finally, the arrows (called edges) represent the dependencies between all these objects. b82165dc115c4c118249c699f776331a

QMflows automatically identify the dependencies between computations and run them in the correct order (if possible in parallel).

Restarting a simulation

If you are running many computationally expensive calculations in a supercomputer, it can happen that the computations take more time than that allowed by the resource manager in your supercomputer and the workflows gets cancel. But do not worry, you do not need to re-run all the computations. Fortunately, QMflows offers a mechanism to restart the workflow computations.

When running a workflow you will see that QMflows creates a set of files called cache. These files contain the information about the workflow and its calculation. In order to restart a workflow you only need to relaunch it, that’s it!

Advanced Examples

Conditional Workflows

from noodles import gather
from qmflows import dftb, adf, orca, run, Settings, templates, molkit, find_first_job

# This examples illustrates the possibility to use different packages interchangeably.
# Analytical frequencies are not available for B3LYP in ADF
# This workflow captures the resulting error and submits the same job to ORCA.

# Define the condition for a successful calculation
def is_successful(result):
    return result.status not in ["failed", "crashed"]

# Generate water molecule
water = molkit.from_smiles('[OH2]', forcefield='mmff')

# Pre-optimize the water molecule
opt_water = dftb(
     templates.geometry, water, job_name="dftb_geometry")

jobs = []

# Generate freq jobs for 3 functionals
for functional in ['pbe', 'b3lyp', 'blyp']:
    s=Settings()
    s.basis = 'DZ'
    s.functional = functional
    # Try to perform the jobs with adf or orca
    # take result from  first successful calculation
    freqjob = find_first_job(
          is_successful, [adf, orca], templates.freq.overlay(s),
          opt_water.molecule, job_name=functional)
    jobs.append(freqjob)

# Run workflow
results = run(gather(*jobs), n_processes=1)

After running the above script you have a table like

pbe     1533.267   3676.165   3817.097
b3lyp   1515.799   3670.390   3825.813
blyp    1529.691   3655.573   3794.110

Non-adiabatic couplings

qmflows-namd is a package based on QMflows to compute the Non-adiabatic couplings for large system involving thr use of QMflows, Cython and Numpy.

Exception Handling

Suppose you have a set of non-dependent calculations, for example single point calculations coming from a molecular dynamic trajectory, as shown in the figure below 7e1e1cd530d54f61ba494f3fdb852417

If one of the single point calculations fails, the rest of the point in the workflow will keep on running and the failed job will return a None value for the requested property.

If the single point calculation would be the dependency of another quantum calculation then the computation will crash.

Library Documentation

For a more detailed description of QMFlows read the documentation

Settings

Settings is a dict subclass implemented in PLAMS and modified in Qmflows. This class represents the data in a hierarchical tree-like structure. for example:

>>> from qmflows import Settings, templates

>>> s = Settings()  # (1)

# generic keyword
>>> s.basis = "DZP"  #  (2)

# "specific" allows the user to apply specific keywords for a package
>>> s.specific.adf.basis.core = "large"  # (3)

>>> input_settings = templates.singlepoint.overlay(s)  # (4)

The above code snippet shows how to create a Settings instance object in (1), then in (2) the generic keyword basis declares that the “DZP” should be used together with the large keyword of ADF as shown at (3). Finally in line (4) the user’s keywords are merged with the defaults resultin in a input like:

basis
    Core large
    Type DZP
end

integration
    accint 4.0
end

scf
    converge 1e-06
    iterations 100
wnd

xc
    lda
end

API

class qmflows.Settings(*args, **kwargs)[source]

A subclass of plams.Settings.

The difference with respect to plams’ Settings are:

  • settings['a.b'] is equivalent to settings['a']['b'] = settings.a.b

__deepcopy__(_)[source]

Implement copy.deepcopy(self).

Serves as an alias for Settings.copy().

__delitem__(name)[source]

Implement del self[name].

__getitem__(name)[source]

Implement self[name].

__setitem__(name, value)[source]

Implement self[name] = value.

Like its counterpart in dict but passed dictionaries are converted into instances of type(self).

copy()[source]

Create a deep(-ish) copy of this instance.

All nested settings instances embedded within self are copied recursively; all other objects set without copying.

overlay(other)[source]

Return new instance of Settings that is a copy of this instance updated with other.

Templates

The input generations consist of two parts: chosing a template (see templates) for the kind of calculation to perform and adding some settings to that template. Notice that the user can either pick a specific package template or provides only generic keywords.

YAML

The YAML markdown format is used together with the yaml module to implement the mechanism to load the templates.

For Example, the default parameter for a single point calculation for several package is:

specific:
  adf:
     basis:
       type: SZ
     numericalquality:
       normal
     scf:
       converge: 1e-6
       iterations: 100
  ams:
     ams:
       Task: SinglePoint
  dftb:
    dftb:
      resourcesdir:
        "DFTB.org/3ob-3-1"
    task:
      runtype: SP

  cp2k:
    force_eval:
      dft:
        mgrid:
          cutoff: 400
          ngrids: 4
        print:
          mo:
            add_last: numeric
            each:
              qs_scf: 0
            eigenvalues: ""
            eigenvectors: ""
            filename: "./mo.data"
            ndigits: 36
            occupation_numbers: ""
        qs:
            method: gpw
        scf:
            eps_scf: 1e-06
            max_scf: 200
            scf_guess: restart

      subsys:
        cell:
          periodic: xyz
    global:
      print_level: low
      project: cp2k
      run_type: energy

  orca:
    method:
        method: dft
        functional: lda
    basis:
        basis: sto_sz

Dictionaries

While templates are used as defaults, the YAML files stored in the dictionaries folder are use for two mainly two purposes: translation of the generic keywords and properties extraction.

to translate the generic keywords provided by the user to specific keywords used for each package. For instance these files are used by the Package class.

Packages

The base class Package is the library core, it provides the general scaffold to call a quantum code. On top of this infrastructure it has been created several subclasses that contain the specific details for each quantum code. The available interfaces to quantum codes are:

This class must not be call directly, instead the correspoding class for the quantum package should be called or in case that there is not an interface to your quantum code, you can make a new subclass that implement the following method:

  • run_job – This methods takes a Settings object a molecule and call a function to create the input automatically and takes cares of the bookkeeping associated with creating new folders, calling the package and retrieving and Object-result.

Instead of implementing all the runners and the bookkeeping functions ourselves, we use the plams library.

For example to carry out a simulation using ADF we call plams as follows:

>>> from scm import plams

>>> mol = plams.Molecule(...)
>>> adf_settings = plams.Settings(...)

>>> result = plams.ADFJob(molecule=mol, settings=adf_settings).run()

Running a workflow

A workflow in QMFlows consist of a set of computations and the dependencies between them, explicitly declared by the user in the python script. This dependecies and the relation between them form an graph (specifically an acyclic direct graph) that represent these relations.

QMFlows Builds the aforemention graph in order to realize the workflow evaluation order. For instance the figure below represent a simulation where firstly a molecular geometry optimization is carried out using the ADF package and some user defined Settings for the ADF simulation package. Subsequently, using the optimized molecular geometry from the previous step and another Settings for an orca simulation a job to compute the molecular frequencies is carried out.

_images/simple_graph1.png

A python script corresponding with this graph can be

>>> from plams import Molecule
>>> from qmflows import (adf, orca, run, Settings)

>>> inp = Settings(...)
>>> acetonitrile = Molecule(...)

# ADF optimization
>>> optmized_mol_adf = adf(inp, acetonitrile, job_name='acetonitrile_opt')

# Orca Settings definition
>>> s2 = Settings()
>>> s2.specific.orca.main = "freq"
>>> s2.specific.orca.basis.basis = 'sto_sz'
>>> s2.specific.orca.method.functional = 'lda'
>>> s2.specific.orca.method.method = 'dft'

# Orca Frequencies
>>> job_freq = orca(s2, optmized_mol_adf)

# Extract the frequencies from the Orca job
>>> frequencies = job_freq.frequencies

# Run the graph
>>> result = run(frequencies)
>>> print(result)

Up to the invocation of the run() function none of the computations have been executed, it is the run() function which builds and executes the dependencies. Since QMFlows needs to figure out all the dependecies in the script, the run() function takes as argument last dependency (or inner most dependy), which in this case are the frequencies. The reason behind this, is that from the last dependency it is possible to retrace all the dependecies.

QMFlows uses the noodles library under the hook to takes care of the construction and execution of the dependecy graph.

Extracting numerical properties from output files

Quantum packages simulations generate output file in different formats. For examples the SCM simulation suite (ADF and DFTB in QMFlows) generate binary outputs, while other packages like CP2K and ORCA generate ascii text files.

QMFlows abstract away all the different commmunication protocols with the different output formats, allowing the user to extract the desire property by using the convention:

>>> result = job.property

where job is the simulation perform with a given package and property is the numerical value of interest (scalar or array).

The QMFlows implementation of the aforemention mechanism search in the YAML files located at qmflows/data/dictionaries/ for instructions about how to read that given property from the output file. Nevertheless, Not all the properties for a given pacakge are implemented. If the property of your interest is not available you can request it in the Qmflows issues page.

Parsers

Internally QMFlows uses different mechanism to extract different properties from the output files. In the case of the ADF and DFTB packages, QMFlows take advantages of the python interface to kftools files developed by the SCM. In the case of XML output, QMFlows direcltly uses the python built-in xml reader. For the output files in text format Qmflows uses a mixuture of awk and parsers.

Parsers are a robust alternative to regular expressions, parsers are modular and reusable, while re tends to be abstruse and difficult to reuse. A parser is a function that decomposes a string (or binary) into its syntactic components using some predefined rules or grammar. The library pyparsing offers all the functionality to parse strings, some detail explanation about the library can be found at docs.

HDF5

HDF5 is a data model to store and represent complex numerical data in a hierarchical way. HDF5 is extremely optimized to perform fast I/O operations in large data set and it is implemented as a library with APIs to Python, C, C++, etc. The python interface is called pyh5 and it was designed to run along with Numpy as Scipy.

PackageWrapper

A module which adds the PackageWrapper class.

The herein implemented class serves as a wrapper around the qmflows Package class, taking a single plams.Job type as argument and, upon calling pkg = PackageWrapper(...); pkg() an appropiate instance of Package subclas instance is called.

For example, passing plams.ADFJob will automatically call adf, plams.Cp2kJob will call cp2k, etc.

When no appropiate Package is found, let’s say after passing the MyFancyJob type, the PackageWrapper class will still run the job as usual and return the matching ResultWrapper object.

There are however three caveats:

  1. No generic keywords are implemented for such jobs.

  2. Specialized warning message will not be available.

  3. The availability of property-extraction methods is limited.

The therein embedded results can be still extracted by calling the plams.Results methods appropiate to the passed job type, e.g. plams.AMSResults.

For example:

>>> from scm.plams import AMSJob, Molecule, Settings

>>> from qmflows import run, PackageWrapper
>>> from qmflows.packages import ResultWrapper

>>> mol = Molecule(...)  
>>> settings = Settings(...)  

>>> pkg = PackageWrapper(AMSJob)
>>> job = pkg(settings, mol, name='amsjob')
>>> result: ResultWrapper = run(job)

>>> energy = result.get_energy()  
>>> mol = result.get_molecule()  
>>> freq = result.get_frequencies()  

Index

PackageWrapper(job_type[, name])

A Package subclass for processing arbitrary plams.Job types.

PackageWrapper.__init__(job_type[, name])

Initialize this instance.

PackageWrapper.__call__(settings, mol[, ...])

(scheduled) If possible, call __call__() of the Package instance appropiate to PackageWrapper.job_type.

PackageWrapper.run_job(settings, mol[, ...])

Run the job and pass the resulting plams.Results object to ResultWrapper.

PackageWrapper.handle_special_keywords(...)

Method not implemented.

ResultWrapper(settings, molecule, job_name)

The matching Result subclass for PackageWrapper.

JOB_MAP

Placeholder docstring for sphinx.

API

class qmflows.packages.PackageWrapper(job_type, name=None)[source]

A Package subclass for processing arbitrary plams.Job types.

Will automatically convert the passed Job type into the appropiate Package instance upon calling PackageWrapper.__call__().

Examples

>>> from scm.plams import ADFJob, AMSJob

>>> from qmflows import PackageWrapper, run
>>> from qmflows.packages import ResultWrapper, ADF_Result

# Start of with two PackageWrapper instances
>>> pkg_adf = PackageWrapper(ADFJob)
>>> pkg_ams = PackageWrapper(AMSJob)

# End up with two different Result instances
>>> result_adf: ADF_Result = run(pkg_adf(...), ...)  
>>> result_ams: ResultWrapper = run(pkg_ams(...), ...)  
job_type

The to-be executed plams.Job type. Will be automatically translated, when possible, into to the appropiate Package instance upon calling PackageWrapper.__call__(). If not, default to the more bare-bones implementation within this class and the matching ResultWrapper instance.

Type:

type[plams.Job]

See also

JOB_MAP

A dictionary mapping PLAMS Job types to appropiate QMFlows Package instances.

PackageWrapper.__init__(job_type, name=None)[source]

Initialize this instance.

Parameters:

job_type (type[plams.Job]) – The to-be executed plams.Job type. Will be automatically translated, when possible, into to the appropiate Package instance upon calling PackageWrapper.__call__(). If not, default to the more bare-bones implementation within this class and the matching ResultWrapper instance. See also PackageWrapper.job_type.

PackageWrapper.__call__(settings, mol, job_name='', **kwargs)[source]

(scheduled) If possible, call __call__() of the Package instance appropiate to PackageWrapper.job_type.

If not, default to the base __call__() method.

PackageWrapper.run_job(settings, mol, job_name='job', work_dir=None, validate_output=True, **kwargs)[source]

Run the job and pass the resulting plams.Results object to ResultWrapper.

static PackageWrapper.handle_special_keywords(settings, key, value, mol)[source]

Method not implemented.

class qmflows.packages.ResultWrapper(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

The matching Result subclass for PackageWrapper.

qmflows.packages.JOB_MAP : dict[type[plams.Job], Package]

A dictionary mapping PLAMS Job types to appropiate QMFlows Package instance.

>>> from typing import Dict, Type

>>> from qmflows.packages import Package, cp2k, adf, dftb, orca
>>> from scm import plams

>>> plams.Job = plams.core.basejob.Job
>>> plams.ORCAJob = plams.interfaces.thirdparty.orca.ORCAJob

>>> JOB_MAP: Dict[Type[plams.Job], Package] = {
...     plams.Cp2kJob: cp2k,
...     plams.ADFJob: adf,
...     plams.DFTBJob: dftb,
...     plams.ORCAJob: orca
... }

qmflows.packages

The Package (sub-)classes of QMFlows.

The Package Class

Package(pkg_name)

Package is the base class to handle the invocation to different quantum package.

Package.__init__(pkg_name)

Initialize a Package instance.

Package.__call__(settings, mol[, job_name, ...])

(scheduled) Perform a job with the package specified by Package.pkg_name.

Package.prerun(settings, mol, **kwargs)

Run a set of tasks before running the actual job.

Package.run_job(settings, mol[, job_name, ...])

Abstract method; should be implemented by the child class.

Package.postrun(result[, output_warnings, ...])

Run a set of tasks after running the actual job.

Package.generic2specific(settings[, mol])

Traverse settings and convert generic into package specific keys.

Package.handle_special_keywords(settings, ...)

Abstract method; should be implemented by the child class.

ADF([pkg_name])

Package subclass for ADF.

DFTB([pkg_name])

Package subclass for DFTB.

CP2K([pkg_name])

A Package subclass for running CP2K Jobs.

CP2KMM([pkg_name])

A Package subclass for running CP2K Jobs for classical forcefield calculations.

ORCA([pkg_name])

A class for preparing the input for running a Orca job using both PLAMS and templates.

PackageWrapper(job_type[, name])

A Package subclass for processing arbitrary plams.Job types.

Package Instances

adf(self, settings, mol[, job_name, ...])

(scheduled) Perform a job with the package specified by Package.pkg_name.

dftb(self, settings, mol[, job_name, ...])

(scheduled) Perform a job with the package specified by Package.pkg_name.

cp2k(self, settings, mol[, job_name, ...])

(scheduled) Perform a job with the package specified by Package.pkg_name.

cp2k_mm(self, settings, mol[, job_name, ...])

(scheduled) Perform a job with the package specified by Package.pkg_name.

orca(self, settings, mol[, job_name, ...])

(scheduled) Perform a job with the package specified by Package.pkg_name.

The Result Class

Result(settings, molecule, job_name[, ...])

Class containing the results associated with a quantum chemistry simulation.

Result.__init__(settings, molecule, job_name)

Initialize a Result instance.

Result.get_property(prop)

Look for the optional arguments to parse a property, which are stored in the properties dictionary.

Result.results

Getter for Result.results.

ADF_Result(settings, molecule, job_name[, ...])

Class providing access to PLAMS ADFJob result results.

DFTB_Result(settings, molecule, job_name[, ...])

Class providing access to PLAMS DFTBJob result results.

CP2K_Result(settings, molecule, job_name[, ...])

Class providing access to CP2K result.

CP2KMM_Result(settings, molecule, job_name)

A class providing access to CP2KMM result.

ORCA_Result(settings, molecule, job_name[, ...])

Class providing access to PLAMS OrcaJob results.

ResultWrapper(settings, molecule, job_name)

The matching Result subclass for PackageWrapper.

API

qmflows.packages.run(job, runner=None, path=None, folder=None, load_jobs=False, **kwargs)[source]

Pickup a runner and initialize it.

Serves as a wrapper around noodles.run_parallel().

Parameters:
  • job (noodles.PromisedObject) – The computation to run as constructed by Package.__call__().

  • runner (str, optional) – The job runner. Note that this value should be left at None.

  • path (str or PathLike, optional) – The path where the PLAMS working directory will be created. Will default to the current working directory if None.

  • folder (str or PathLike, optional) – The name of the new PLAMS working directory. Will default to "plams_workdir" if None.

  • load_jobs (bool) – Load all pre-existing Jobs (contained within the working directory) into memory. Note that this can be quite slow if a large number of pre-existing jobs is present.

  • **kwargs (Any) – Further keyword arguments to-be passed to call_default().

Returns:

A new Result instance. The exact type will depend on job.

Return type:

Result

See also

noodles.run_parallel()

Run a workflow in parallel threads, storing results in a Sqlite3 database.


class qmflows.packages.Package(pkg_name)[source]

Package is the base class to handle the invocation to different quantum package.

The only relevant (instance) attribute of this class is Package.pkg_name which is a string representing the quantum package name that is going to be used to carry out the compuation.

The life-cycle of Package consists of 5 general steps:

  1. Initializing an instance: Package.__init__().

  2. Starting the job: Package.__call__(). This method handles the task distribution between the instance’s various methods.

  3. Converting all generic into specific settings: Package.generic2specific().

  4. Running the actual plams.Job (including pre- and post-processing): Package.run_job().

  5. Returning the final Result instance at the end of Package.__call__().

generic_mapping = NotImplemented

A class variable with the name of the generic .yaml file. Should be set when creating a subclass.

pkg_name

An instance variable with the name of the respective quantum chemical package.

result_type = NotImplemented

A class variable pointing to the Package-specific Result class. Should be set when creating a subclass.

Package.__init__(pkg_name)[source]

Initialize a Package instance.

Parameters:

pkg_name (str) – The name of the respective quantum chemical package. See Package.pkg_name.

Package.__call__(settings, mol, job_name='', validate_output=True, **kwargs)[source]

(scheduled) Perform a job with the package specified by Package.pkg_name.

Parameters:
  • settings (qmflows.Settings) – The user settings.

  • mol (plams.Molecule or rdkit.Mol) – A PLAMS or RDKit molecule to-be passed to the calculation.

  • job_name (str) – The name of the job.

  • validate_output (bool) – If True, perform a package-specific validation of the output files’ content. Only relevant if the particular Package subclass has actually implemented output validation.

  • **kwargs (Any) – Further keyword arguments to-be passed to Package.prerun(), Package.run_job() and Package.post_run().

Returns:

A new Result instance.

Return type:

Result

Package.prerun(settings, mol, **kwargs)[source]

Run a set of tasks before running the actual job.

Parameters:
  • settings (qmflows.Settings) – Settings provided by the user. Note that these settings can still contain generic keywords.

  • mol (plams.Molecule, optional) – A PLAMS molecule to-be passed to the calculation.

  • **kwargs (Any) – Further keyword arguments to-be passed to Package.run_job().

See also

Package.run_job()

A method which handles the running of the actual plams.Job.

abstract classmethod Package.run_job(settings, mol, job_name='job', work_dir=None, validate_output=False, **kwargs)[source]

Abstract method; should be implemented by the child class.

A method which handles the running of the actual plams.Job.

Parameters:
  • settings (qmflows.Settings, optional) – User-provided Settings as processed by Package.generic2specific().

  • mol (plams.Molecule) – A PLAMS molecule to-be passed to the calculation.

  • job_name (str) – The name of the job.

  • workdir (str or PathLike, optional) – The path+folder name of the PLAMS working directory.

  • validate_output (bool) – If True, perform a package-specific validation of the output files’ content. Only relevant if the particular Package subclass has actually implemented output validation.

  • **kwargs (Any) – Further keyword arguments.

Returns:

A new Result instance.

Return type:

Result

Package.postrun(result, output_warnings=None, settings=None, mol=None, **kwargs)[source]

Run a set of tasks after running the actual job.

Parameters:

See also

Package.run_job()

A method which handles the running of the actual plams.Job.

Package.generic2specific(settings, mol=None)[source]

Traverse settings and convert generic into package specific keys.

Traverse all the key, value pairs of the settings, translating the generic keys into package specific keys as defined in the specific dictionary. If one key is not in the specific dictionary an error is raised. These new specific settings take preference over existing specific settings.

Parameters:
Returns:

A new settings instance without any generic keys.

Return type:

qmflows.Settings

abstract static Package.handle_special_keywords(settings, key, value, mol)[source]

Abstract method; should be implemented by the child class.

A method providing additional processing for Package dependant generic keywords.

Parameters:

See also

Package.generic2specific()

Traverse settings and convert generic into package specific keys.


class qmflows.packages.ADF(pkg_name='adf')[source]

Package subclass for ADF.

This class takes care of calling the ADF quantum package. it uses both Plams and the Templates module to create the input files, while Plams takes care of running the plams.ADFJob. It returns a ADF_Result instance containing the output data.


class qmflows.packages.DFTB(pkg_name='dftb')[source]

Package subclass for DFTB.


class qmflows.packages.CP2K(pkg_name='cp2k')[source]

A Package subclass for running CP2K Jobs.

It uses plams together with the templates to generate the stucture input and also uses Plams to invoke the binary CP2K code. This class is not intended to be called directly by the user, instead the cp2k function should be called.


class qmflows.packages.CP2KMM(pkg_name='cp2k')[source]

A Package subclass for running CP2K Jobs for classical forcefield calculations.

It uses plams together with the templates to generate the stucture input and also uses Plams to invoke the binary CP2K code. This class is not intended to be called directly by the user, instead the cp2k_mm function should be called.


class qmflows.packages.ORCA(pkg_name='orca')[source]

A class for preparing the input for running a Orca job using both PLAMS and templates.

It also does the manangement of the input/output files resulting from running Orca and returns a Results object that containing the methods and data required to retrieve the output.


class qmflows.packages.PackageWrapper(job_type, name=None)[source]

A Package subclass for processing arbitrary plams.Job types.

Will automatically convert the passed Job type into the appropiate Package instance upon calling PackageWrapper.__call__().

Examples

>>> from scm.plams import ADFJob, AMSJob

>>> from qmflows import PackageWrapper, run
>>> from qmflows.packages import ResultWrapper, ADF_Result

# Start of with two PackageWrapper instances
>>> pkg_adf = PackageWrapper(ADFJob)
>>> pkg_ams = PackageWrapper(AMSJob)

# End up with two different Result instances
>>> result_adf: ADF_Result = run(pkg_adf(...), ...)  
>>> result_ams: ResultWrapper = run(pkg_ams(...), ...)  
job_type

The to-be executed plams.Job type. Will be automatically translated, when possible, into to the appropiate Package instance upon calling PackageWrapper.__call__(). If not, default to the more bare-bones implementation within this class and the matching ResultWrapper instance.

Type:

type[plams.Job]

See also

JOB_MAP

A dictionary mapping PLAMS Job types to appropiate QMFlows Package instances.


class qmflows.packages.Result(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

Class containing the results associated with a quantum chemistry simulation.

prop_mapping = NotImplemented

A Settings instance with Result-specific properties. Should be set when creating a subclass.

Result.__init__(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

Initialize a Result instance.

Parameters:
  • settings (qmflows.Settings) – Job Settings.

  • molecule (plams.Molecule) – molecular Geometry

  • job_name (str) – Name of the computations

  • dill_path (str) – The absolute path to the pickled .dill file.

  • plams_dir (str) – path to the Plams folder.

  • work_dir – scratch or another directory different from the plams_dir.

Type:

work_dir: str

Result.get_property(prop)[source]

Look for the optional arguments to parse a property, which are stored in the properties dictionary.

property Result.results

Getter for Result.results.

Get will load the .dill file and add all of its class attributes to this instance, barring the following three exceptions:

  • Private attributes/methods.

  • Magic methods.

  • Methods/attributes with names already contained within this instance.

This attribute’s value is set to None if the unpickling process fails.


class qmflows.packages.ADF_Result(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

Class providing access to PLAMS ADFJob result results.

class qmflows.packages.DFTB_Result(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

Class providing access to PLAMS DFTBJob result results.

class qmflows.packages.CP2K_Result(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

Class providing access to CP2K result.

class qmflows.packages.CP2KMM_Result(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

A class providing access to CP2KMM result.

class qmflows.packages.ORCA_Result(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

Class providing access to PLAMS OrcaJob results.

class qmflows.packages.ResultWrapper(settings, molecule, job_name, dill_path=None, plams_dir=None, work_dir=None, status='successful', warnings=None)[source]

The matching Result subclass for PackageWrapper.


qmflows.adf(self, settings, mol, job_name='', validate_output=True, **kwargs)

(scheduled) Perform a job with the package specified by Package.pkg_name.

Parameters:
  • settings (qmflows.Settings) – The user settings.

  • mol (plams.Molecule or rdkit.Mol) – A PLAMS or RDKit molecule to-be passed to the calculation.

  • job_name (str) – The name of the job.

  • validate_output (bool) – If True, perform a package-specific validation of the output files’ content. Only relevant if the particular Package subclass has actually implemented output validation.

  • **kwargs (Any) – Further keyword arguments to-be passed to Package.prerun(), Package.run_job() and Package.post_run().

Returns:

A new Result instance.

Return type:

Result

qmflows.dftb(self, settings, mol, job_name='', validate_output=True, **kwargs)

(scheduled) Perform a job with the package specified by Package.pkg_name.

Parameters:
  • settings (qmflows.Settings) – The user settings.

  • mol (plams.Molecule or rdkit.Mol) – A PLAMS or RDKit molecule to-be passed to the calculation.

  • job_name (str) – The name of the job.

  • validate_output (bool) – If True, perform a package-specific validation of the output files’ content. Only relevant if the particular Package subclass has actually implemented output validation.

  • **kwargs (Any) – Further keyword arguments to-be passed to Package.prerun(), Package.run_job() and Package.post_run().

Returns:

A new Result instance.

Return type:

Result

qmflows.cp2k(self, settings, mol, job_name='', validate_output=True, **kwargs)

(scheduled) Perform a job with the package specified by Package.pkg_name.

Parameters:
  • settings (qmflows.Settings) – The user settings.

  • mol (plams.Molecule or rdkit.Mol) – A PLAMS or RDKit molecule to-be passed to the calculation.

  • job_name (str) – The name of the job.

  • validate_output (bool) – If True, perform a package-specific validation of the output files’ content. Only relevant if the particular Package subclass has actually implemented output validation.

  • **kwargs (Any) – Further keyword arguments to-be passed to Package.prerun(), Package.run_job() and Package.post_run().

Returns:

A new Result instance.

Return type:

Result

qmflows.cp2k_mm(self, settings, mol, job_name='', validate_output=True, **kwargs)

(scheduled) Perform a job with the package specified by Package.pkg_name.

Parameters:
  • settings (qmflows.Settings) – The user settings.

  • mol (plams.Molecule or rdkit.Mol) – A PLAMS or RDKit molecule to-be passed to the calculation.

  • job_name (str) – The name of the job.

  • validate_output (bool) – If True, perform a package-specific validation of the output files’ content. Only relevant if the particular Package subclass has actually implemented output validation.

  • **kwargs (Any) – Further keyword arguments to-be passed to Package.prerun(), Package.run_job() and Package.post_run().

Returns:

A new Result instance.

Return type:

Result

qmflows.orca(self, settings, mol, job_name='', validate_output=True, **kwargs)

(scheduled) Perform a job with the package specified by Package.pkg_name.

Parameters:
  • settings (qmflows.Settings) – The user settings.

  • mol (plams.Molecule or rdkit.Mol) – A PLAMS or RDKit molecule to-be passed to the calculation.

  • job_name (str) – The name of the job.

  • validate_output (bool) – If True, perform a package-specific validation of the output files’ content. Only relevant if the particular Package subclass has actually implemented output validation.

  • **kwargs (Any) – Further keyword arguments to-be passed to Package.prerun(), Package.run_job() and Package.post_run().

Returns:

A new Result instance.

Return type:

Result

Indices and tables