Welcome to experi’s documentation!

When running a series of experiments it can be difficult to remember the exact parameters of the experiment, or even how to run the simulation again. Additionally, for complex experiments with many variables, iterating through all the combinations of variables can be unwieldy, error-prone, and frustrating to write.

Experi keeps all the information to run an experiment in an experiment.yml file which resides in the same directory as the experimental results. This makes it possible to version control the experiment through the experiment.yml file. Experi supports complex iteration of variables defined in a human-readable YAML syntax, making it simple to understand the experimental conditions defined within. Having an easy-to-understand definition of the experimental conditions also provides a reference when coming back to look at the results.

Installation

Experi is compatible with python>=3.6, supporting installation using pip

pip3 install experi

Note that for the experi command to work, the directory containing the executable needs to be in the PATH variable. In most cases this will probably be $HOME/.local/bin, although this is installation dependent. If you don’t know where the executable is, on *nix systems the command

find $HOME -name experi

will search everywhere in your home directory to find it. Alternatively, replacing $HOME with / will search the entire filesystem.

For installation from source

git clone https://github.com/malramsay64/experi.git
cd experi
pip3 install .

Installation of the development environment requires pipenv, which has comprehensive install instructions available. Once pipenv is installed, you can install the development dependencies with the command

pipenv install --dev --three

which creates a virtual environment for the project into which the exact versions of all required packages are installed. You can activate the virtualenv by running

pipenv shell

which creates a new shell with the environment activated. Alternatively, a single command (like the test cases) can be run using

pipenv run pytest

For those of you trying to run this on a cluster with only user privileges, including the --user flag will resolve issues with pip requiring elevated permissions by installing to your home directory rather than system-wide.

pip3 install --user experi

Introduction to YAML

Experi uses the YAML file format for the input file since it has widespread use and support, and the format is simple for a human to parse. From the perspective of a python developer, YAML is a method of specifying python data structures in a text file. A key construct of the YAML file is the mapping of a key to a value, like a python dictionary. Also like a python dictionary, the mapping uses a colon to separate the key from the value

key: value

All keys have to be strings, although values can be a range of types. Values that look like a string will become python strings, values that look like integers will become python integers, and values that look like floats will become python floats. A value can also itself be a mapping

key:
  value1: 1
  value2: 2
  value3: 3
  value4: 4

where the above example is the same as the python data structure

{'"key"': {"value1": 1, "value2": 2, "value3": 3, "value4": 4}}

YAML also supports using lists as values, which are denoted with bullet points

key:
  - 1
  - 2
  - 3
  - 4

where the value is now the python list [1, 2, 3, 4]. The python list syntax is another way to specify a list in a YAML file, with the example below having the same value as the example above

key: [1, 2, 3, 4]

The final feature of YAML files I will highlight here is the specification of long strings, which are particularly useful when writing long bash commands. To create a long single-line string, start the string with the > character.

key: >
  A long string
  written over
  multiple lines

which will have no newline characters inserted. This is so much better than having to end every line in bash with a \. To include the newlines in the string, you can instead use the | symbol

key: |
  Line 1
  Line 2
  Line 3
  END

which includes a newline character after each line.

The examples presented should be enough to get started using Experi.

Input File Specification

This document is a guide to the experiment.yml file which is used as the input file for Experi. This file specifies all the parameters for running an experiment. The experiment.yml file is designed to be a human readable reference to the experimental data it generates; there is no simple method of running the experiment from a different directory.

The experiment.yml file has three main sections, each having a different role in the running of the experiment; a minimal example combining them follows the list.

  • The command section defines the shell commands to run the experiment.

  • The variables section defines the variables to substitute into the commands.

  • The scheduler section defines the scheduler to use and the associated options.
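As a minimal sketch of how the three sections fit together (the greeting variable is made up; pbs: True requests the scheduler defaults described later):

# experiment.yml

command: echo {greeting} World

variables:
    greeting: Hello

pbs: True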

Experi uses the YAML file format for the input file since it has widespread use and support and the format is simple for a human to parse. If you are unfamiliar with YAML have a look at this quick guide.

Commands

The commands section is one of the main elements of the input file, specifying the commands to run and how to run them. At its simplest, the command key is a bash command to execute, as in the example below.

# experiment.yml

command: echo Hello World

The command key can also take a list of bash commands, executing each of the commands in order.

# experiment.yml

command:
    - echo First Command
    - echo Second Command
    - echo Third Command

Variable Substitution

The power of Experi is taking a single command instance and replacing a variable with its values defined under the variables key. Variables take the form of the new style python string formatting, with variable names surrounded by braces; {variable} will be replaced with the value of variable. For more information on these format strings, the python string formatting documentation is a reference guide, while pyformat.info is a guide to practical use.

In practice, using variables looks like this:

# experiment.yml

command: echo {nth} Command

variables:
    nth: [First, Second, Third]

Unlike the previous example with a list, there is no guarantee of order for the commands to run. Each combination of variables is effectively a separate command instance which could be running at the same time as any other command instance. Where there is a dependence between tasks, like creating a directory, passing a list to the command key has a guarantee on ordering.

# experiment.yml

command:
    - mkdir -p {variable}
    - echo {variable} > {variable}/out.txt

In the above example, mkdir -p {variable} will always be executed before the file out.txt is written in it.

There is no limit to the number of variables specified in a command; however, every variable used in a command needs to have a definition under the variables key.

After variable substitution, only unique command objects are run, where uniqueness takes into account all bash commands in a list. This is to allow for the over-specification of variables for certain steps of more complicated workflows (see Jobs). The rationale for this choice is that commands which are non-unique will typically have the same output, overwriting each other. Where this is a problem, adding an echo {variable} to the list within a command key is a reasonable workaround.
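As a sketch of where this matters (the variable names and values are illustrative), the first job below substitutes to an identical command for every temperature, so it is only run once:

jobs:
    - command: mkdir -p results
    - command: echo {temperature} > results/{temperature}.txt

variables:
    temperature: [0.5, 1.0]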

Command Failure

When running an array job on a scheduler, every command in the array will run even if the first one fails. This is the behaviour that Experi replicates in all environments it can run in. Every combination of variables is executed; the command is considered successful when the exit code for every combination was 0, while if any one combination of variables fails then the entire command is considered to have failed.

Managing Complex Jobs

A common component of running an experiment is that the number of tasks changes at different points. An experiment could consist of three steps:

  1. An initial phase which generates some starting configurations,

  2. A simulation phase which subjects the starting configurations to a large number of experimental conditions,

  3. An analysis phase which aggregates the data from the simulation phase.

Here steps 1 and 3 might have a single set of variables, while step 2 has hundreds. Experi has the jobs keyword to facilitate these types of experiments.

jobs:
    - command: echo "Command 1"
    - command:
        - mkdir -p {var}
        - cd {var}
        - echo "Command 2 {var}"
    - command: echo "Command 3"

The jobs key allows you to break an experiment into a list of commands, with each separate command being a different job on the scheduler. Each command key behaves as described in the above sections.

Note

I should note that the command key will work fine when submitting a job to the scheduler; the above example can be expressed with a single command key

command:
    - echo "Command 1"
    - mkdir -p {var}
    - cd {var}
    - echo "Command 2 {var}"
    - echo "Command 3"

The difference is that in the first example Command 1 and Command 3 are only echoed once, while in this example they are both echoed for each value of {var}.
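Concretely, if {var} takes the values a and b, the jobs version produces (up to ordering within each array job)

Command 1
Command 2 a
Command 2 b
Command 3

while the single command key version produces

Command 1
Command 2 a
Command 3
Command 1
Command 2 b
Command 3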

When using the jobs keyword, a prerequisite of executing the next set of commands is a successful exit code from all shell commands executed in the current command key. This assumes that all experimental conditions are going to succeed and are required for the following steps. That makes a lot of sense for a setup step, although less so for a search of parameter space where numeric instabilities are likely. I have an open issue to allow the user to override this feature; a workaround in the meantime is to suffix commands which might fail with || true. This is the or operator followed by a command which will always succeed (a more informative alternative is to echo a message), meaning the return value of the shell command always indicates success.
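As a sketch of that workaround (run_simulation is a placeholder for whichever command might fail):

jobs:
    - command: mkdir -p {temperature}
    - command: run_simulation {temperature} || echo "simulation failed for {temperature}"
    - command: echo "analysis step"

variables:
    temperature: [0.5, 1.0, 1.5]

Here the second job always exits successfully, so the final job still runs even when some of the simulations fail.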

Variables

This is where the real power of Experi lies: being able to specify complex sets of variables in a simple, human-readable fashion. Variables are specified using the names given in the command section. The simplest case is a single value for a variable

# experiment.yml

command: echo hello {name}

variables:
    name: Alice

Specifying a list of values for a variable can be done in the same way as for commands; again, for this simple case,

variables:
    variable1:
        - Alice
        - Bob
        - Charmaine

Multiple Variables

Specifying multiple variables is as simple as specifying a single variable; however, by default all possible combinations of the variables are generated. In the simplest case, with just a single value per variable

command: echo {greeting} {name}
variables:
    greeting: hello
    name: Alice

the result of the command would be hello Alice. To greet multiple people we just add more names

command: echo {greeting} {name}
variables:
    greeting: hello
    name:
        - Alice
        - Bob
        - Charmaine

which would result in

hello Alice
hello Bob
hello Charmaine

We have all possible combinations of the greeting and the name. Extending this, to greet all the people in both English and French, we can add both greetings and all the names, giving the input file

command: echo {greeting} {name}
variables:
    greeting:
        - hello
        - bonjour
    name:
        - Alice
        - Bob
        - Charmaine

and resulting in the output

hello Alice
hello Bob
hello Charmaine
bonjour Alice
bonjour Bob
bonjour Charmaine

Iterators

Product

In the above examples we are using the try-everything approach; however, there is more control over how variables are specified. By default we are using a product iterator, which could be explicitly defined like so

command: echo {greeting} {name}
variables:
    product:
        greeting:
            - hello
            - bonjour
        name:
            - Alice
            - Bob
            - Charmaine

Note that the product iterator doesn’t support a list of variables. There is no way that a list of values makes sense here, so providing one will raise a ValueError.
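For example, a specification along these lines (a sketch of the rejected form) will raise a ValueError:

variables:
    product:
        - greeting: [hello, bonjour]
        - name: [Alice, Bob]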

Zip Iterator

If we know that Alice speaks English, Bob speaks French, and Charmaine speaks Spanish, we can use a similar specification; however, instead of a product iterator we can use zip.

command: echo {greeting} {name}
variables:
    zip:
        greeting:
            - hello
            - bonjour
            - hola
        name:
            - Alice
            - Bob
            - Charmaine

This is just the python zip function under the hood, and will produce the output

hello Alice
bonjour Bob
hola Charmaine

This definition of the iterator applies to all variables defined in the level directly under the iterator. So if we wanted to both echo to the screen and, assuming we are on macOS, use the say command,

command: {command} {greeting} {name}
variables:
    command:
        - echo
        - say
    zip:
        greeting:
            - hello
            - bonjour
            - hola
        name:
            - Alice
            - Bob
            - Charmaine

In the above specification, we are applying the zip iterator to the variables greeting and name; all the resulting pairs are then combined with command using the product iterator, resulting in the following sequence of commands.

echo hello Alice
echo bonjour Bob
echo hola Charmaine
say hello Alice
say bonjour Bob
say hola Charmaine

In more complicated contexts multiple zip iterators are supported by having each set of values nested in a list.

variables:
    zip:
        - var1: [1, 2, 3]
          var2: [4, 5, 6]
        - var3: ['A', 'B', 'C']
          var4: ['D', 'E', 'F']

Which will zip var1 and var2, separately zip var3 and var4, then take the product of the result of those two operations.
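For the example above this produces the nine combinations

var1: 1, var2: 4, var3: A, var4: D
var1: 1, var2: 4, var3: B, var4: E
var1: 1, var2: 4, var3: C, var4: F
var1: 2, var2: 5, var3: A, var4: D
var1: 2, var2: 5, var3: B, var4: E
var1: 2, var2: 5, var3: C, var4: F
var1: 3, var2: 6, var3: A, var4: D
var1: 3, var2: 6, var3: B, var4: E
var1: 3, var2: 6, var3: C, var4: F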

Chain Iterator

This handles the scenario where a single simulation has separate components, which in my case is two separate pressures. Each pressure I run the simulation at has a different sequence of temperatures

variables:
    chain:
        - pressure: 1.0
          temperature: [0.2, 0.3, 0.4]
        - pressure: 13.0
          temperature: [1.0, 1.5, 2.0]

This will generate the list of values

pressure: 1.0, temperature: 0.2
pressure: 1.0, temperature: 0.3
pressure: 1.0, temperature: 0.4
pressure: 13.0, temperature: 1.0
pressure: 13.0, temperature: 1.5
pressure: 13.0, temperature: 2.0

The append keyword is an alias for chain and operates in exactly the same way.

Cycle Iterator

This is an iterator which is particularly useful when combined with the chain and zip iterators. It will cycle through a sequence of values a specified number of times

variables:
    cycle:
        times: 2
        steps: [100, 10, 1]

This will give the list

steps: 100
steps: 10
steps: 1
steps: 100
steps: 10
steps: 1

When combined with the chain example above, we get

variables:
  zip:
    chain:
      - pressure: 1.0
        temperature: [0.2, 0.3, 0.4]
      - pressure: 13.0
        temperature: [1.0, 1.5, 2.0]
    cycle:
      times: 2
      steps: [100, 10, 1]

Where the number of steps follows the change in temperature.

pressure: 1.0, temperature: 0.2, steps: 100
pressure: 1.0, temperature: 0.3, steps: 10
pressure: 1.0, temperature: 0.4, steps: 1
pressure: 13.0, temperature: 1.0, steps: 100
pressure: 13.0, temperature: 1.5, steps: 10
pressure: 13.0, temperature: 2.0, steps: 1

Arange Iterator

In cases where the values for a variable are too numerous to list manually, Experi supports a range iterator, specified using arange like below

var:
    arange: 100

arange reflects the use of the NumPy arange function to generate the values, and is less likely to collide with a variable name than range. Like the NumPy function, this iterator supports arguments for start, stop, step and dtype, which can all be specified as key value pairs

var:
    arange:
        start: 100
        stop: 110
        step: 2.5
        dtype: float

which will set var to [100., 102.5, 105., 107.5]. In this case the specification is not particularly helpful; however, for hundreds of values

var:
    arange:
        stop: 500
        step: 5

this approach, which generates the values 0, 5, 10, ..., 495, is a definite improvement.

pbs

This section is for the specification of the options for submission to a job scheduler.

The simplest case is just specifying

pbs: True

which will submit the job to the scheduler using the default values which are

pbs:
    ncpus: 1
    select: 1
    walltime: 1:00
    setup: ''

Of these default values, setup is the only one that should require explanation. This is a sequence of commands in the pbs file that set up the environment, like loading modules, modifying the PATH, or activating a virtual environment. They are inserted at the top of the file before the command is run.

pbs:
    setup:
        - module load hoomd
        - export PATH=$HOME/.local/bin:$PATH

While there are some niceties to make specifying options easier, it is possible to pass any option by using the flag as the dictionary key, like in the example below with the mail address M and the path for the output stream o

pbs:
    M: malramsay64@gmail.com
    o: dest

Command Line Interface

Experi is designed so that it typically does not require command line options. Primarily this is to ensure the reproducibility of running the code.

experi

experi [OPTIONS]

Options

--version

Show the version and exit.

-f, --input-file <input_file>

Path to a YAML file containing experiment data. Note that the experiment will be run from the directory in which the file exists, not the directory the script was run from.

-s, --scheduler <scheduler>

The scheduler with which to run the jobs.

Options

shell|pbs|slurm

--use-dependencies

Use the dependencies specified in the command to reduce the processing.

--dry-run

Don’t run commands or submit jobs, just show the commands that would be run.

-v, --verbose

Increase the verbosity of logging events.
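As a usage sketch combining the options above:

experi --input-file experiment.yml --scheduler pbs --dry-run

which reads experiment.yml, targets the pbs scheduler, and prints the commands that would be run without executing them.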

Examples

This is a collection of example workflows using Experi demonstrating some different ways of using it for real experiments.

Command Line Options

This is an example of using Experi in my own research, which was the reason I developed it. I have a tool sdrun which translates command line options into a molecular dynamics simulation. This experiment investigates how temperature affects the motion of particles in the simulation. The experiment consists of three separate parts:

  1. The creation of a high temperature configuration which is well mixed (create)

  2. Cooling the high temperature configuration to each of the desired temperatures for data collection (equil)

  3. Collect data on the motion of particles within the simulation (prod)

The terms create, equil, and prod are the arguments to sdrun which reflect these stages. For this simulation I would like each of the steps to be a separate job on the scheduler, hence I use the jobs key. Part of the reason for this is that I am only creating a single configuration in the create step, which is then used for each temperature in the equil step. By running separate jobs I will have a job with a single task for the first step, which once finished will allow the equilibration array job with 10 elements to start.
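The experiment file for this workflow would look something like the sketch below; the sdrun options are elided placeholders rather than the tool’s real interface, and the temperature values are made up.

# experiment.yml (sketch; sdrun options elided)

jobs:
    - command: sdrun create <options>                 # one task: initial configuration
    - command: sdrun equil <options> {temperature}    # array job: one task per temperature
    - command: sdrun prod <options> {temperature}     # array job: data collection

variables:
    temperature: [0.30, 0.35, 0.40]    # ten values in the real experiment

pbs: True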

Input Files

A common workflow for many software packages is to define the computation with the use of input files. Better support for input files is planned (see issue <experi #>), though it is still possible to use them. The example below creates an input file for use with the software LAMMPS.

# experiment.yml

command:
  - |
    echo -e "
    <file>
    " < file.in
  - lmp_run -in file.in
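An alternative sketch for writing the input file uses a shell heredoc inside the literal block, which avoids the quoting behaviour of echo -e (lmp_run as in the example above):

command:
  - |
    cat > file.in <<'EOF'
    # LAMMPS commands go here
    EOF
  - lmp_run -in file.in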

Subdirectories

Breaking the output into subdirectories allows for more organisation of experimental results, particularly where many output files are generated. Experi will always run from the directory containing the experiment.yml file; however, that doesn’t prevent you from creating subdirectories and running commands in them. This example shows how you can use Experi to run code in a separate subdirectory for each set of variables.

command:
  - mkdir -p {directory}
  - cd {directory}
  - run command

variables:
  directory: [first, second]   # illustrative values
