Welcome to experi’s documentation!¶
When running a series of experiments it can be difficult to remember the exact parameters of each experiment, or even how to run the simulation again. Additionally, for complex experiments with many variables, iterating through all the combinations of variables can be unwieldy, error-prone, and frustrating to write.
Experi keeps all the information to run an experiment in an experiment.yml file which resides in the same directory as the experimental results. This makes it possible to version control the experiment through the experiment.yml file. Experi supports complex iteration of variables defined in a human readable yaml syntax, making it simple to understand the experimental conditions defined within. Having an easy to understand definition of the experimental conditions also provides a reference when coming back to look at the results.
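As a taste of what the rest of this documentation covers, a minimal experiment.yml needs little more than a command and the variables to substitute into it:

# experiment.yml
command: echo hello {name}
variables:
  name: Alice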
Installation¶
Experi is compatible with python>=3.6, supporting installation using pip
pip3 install experi
Note that for the experi command to work, the directory containing the executable needs to be in the PATH variable. In most cases this will probably be $HOME/.local/bin, although this is installation dependent. If you don't know where the executable is, on *nix systems the command

find $HOME -name experi

will search everywhere in your home directory to find it. Alternatively, replacing $HOME with / will search everywhere.
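Once you know where the executable lives, prepending that directory to PATH makes the command available; a sketch assuming the common user install location:

# add the pip user install directory to PATH (location may vary)
export PATH="$HOME/.local/bin:$PATH"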
For installation from source
git clone https://github.com/malramsay64/experi.git
cd experi
pip3 install .
Installing the development environment requires pipenv, which has comprehensive install instructions available. Once pipenv is installed, you can install the development dependencies with the command
pipenv install --dev --three
which creates a virtual environment for the project into which the exact versions of all required packages are installed. You can activate the virtualenv by running
pipenv shell
which creates a new shell with the environment activated. Alternatively, a single command (like the test cases) can be run using
pipenv run pytest
If you are trying to run this on a cluster with only user privileges, including the --user flag will resolve issues with pip requiring elevated permissions, by installing to your home directory rather than for everyone.
pip3 install --user experi
Introduction to YAML¶
Experi uses the YAML file format for the input file since it has widespread use and support, and the format is simple for a human to parse. From the perspective of a python developer, YAML is a method of specifying python data structures in a text file. A key construct of the YAML file is the mapping of a key to a value, like a python dictionary. Also like a python dictionary, the mapping uses a colon
key: value
All keys have to be strings, although values can be a range of types. Values that look like a string will become python strings, values that are integers will become python integers, and values that are floats will become python floats. A value can itself be a mapping
key:
  value1: 1
  value2: 2
  value3: 3
  value4: 4
where the above example is the same as the python data structure
{'"key"': {"value1": 1, "value2": 2, "value3": 3, "value4": 4}}
YAML also supports using lists as values, which are denoted with bullet points
key:
  - 1
  - 2
  - 3
  - 4
where the value is now the python list [1, 2, 3, 4]. The python list syntax is another way to specify a list in a YAML file, with the example below having the same value as the example above
key: [1, 2, 3, 4]
The final feature of YAML files I will highlight here is the specification of long strings, which are particularly useful when writing long bash commands. To create a long single line string, start the string with the > character.
key: >
  A long string
  written over
  multiple lines
which will have no newline characters inserted. This is so much better than having to end each line in bash with a \. To include the newlines in the string, you can instead use the | symbol
key: |
  Line 1
  Line 2
  Line 3
  END
which includes a newline character after each line.
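In terms of python values, the two examples above correspond roughly to (a sketch of the resulting strings):

{"key": "A long string written over multiple lines\n"}  # folded with >
{"key": "Line 1\nLine 2\nLine 3\nEND\n"}                # literal with |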
The examples presented should be enough to get started using Experi.
Input File Specification¶
This document is a guide to the experiment.yml file which is used as the input file for Experi. This file specifies all the parameters for running an experiment. The experiment.yml file is designed to be a human readable reference to the experimental data it generates; there is no simple method of running the experiment from a different directory.
The experiment.yml file has three main sections, each having a different role in the running of the experiment.
The command section defines the shell commands to run the experiment.
The variables section defines the variables to substitute into the commands.
The scheduler section defines the scheduler to use and the associated options.
Experi uses the YAML file format for the input file since it has widespread use and support and the format is simple for a human to parse. If you are unfamiliar with YAML, have a look at the quick guide above.
Commands¶
The commands section is one of the main elements of the input file, specifying the commands to run and how to run them. At its simplest, the command key is a bash command to execute, as in the example below.
# experiment.yml
command: echo Hello World
The command key can also take a list of bash commands, executing each of the commands in order.
# experiment.yml
command:
  - echo First Command
  - echo Second Command
  - echo Third Command
Variable Substitution¶
The power of Experi is taking a single command instance and replacing a variable with its values defined under the variables key. Variables take the form of the new style python string formatting, with variable names surrounded by braces; {variable} will be replaced with the value of variable. For more information on these format strings, the python string formatting documentation is a reference guide, while pyformat.info is a guide to practical use. In practice, using variables looks like this:
# experiment.yml
command: echo {nth} Command
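For this to actually run, nth also needs a definition under the variables key (described below); a minimal sketch:

# experiment.yml
command: echo {nth} Command
variables:
  nth: [First, Second, Third]

which expands to the same three echo commands as the list example above.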
Unlike the previous example with a list, there is no guarantee of order for the commands to run. Each combination of variables is effectively a separate command instance which could be running at the same time as any other command instance. Where there is a dependence between tasks, like creating a directory, passing a list to the command key has a guarantee on ordering.
# experiment.yml
command:
  - mkdir -p {variable}
  - echo {variable} > {variable}/out.txt
In the above example, mkdir -p {variable} will always be executed before the file out.txt is written in it.
There is no limit to the number of variables specified in a command; however, variables specified in a command need to have a definition in the variables key.
After variable substitution, only unique command objects are run, where uniqueness takes into account all bash commands in a list. This is to allow for the over-specification of variables for certain steps of more complicated workflows (see Jobs). The rationale for this choice is that commands which are non-unique will typically have the same output, overwriting each other. Where this is a problem, adding an echo {variable} to the list within a command key is a reasonable workaround.
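As a sketch of that workaround, suppose a setup step doesn't mention a variable that later steps do (the names here are hypothetical). Without the echo, every value of {temperature} would produce the identical command list and be collapsed to a single run; the echo makes each instance unique so each value runs:

command:
  - echo {temperature}
  - mkdir -p output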
Command Failure¶
When running an array job on a scheduler, every command in the array will run even if the first one fails. This is the behaviour that Experi replicates for all environments it can run in. Every combination of variables is executed; the command is considered successful only when the exit code for every combination was 0 (success), while if any one combination of variables fails then the entire command is considered to have failed.
Managing Complex Jobs¶
A common component of running an experiment is that the number of tasks changes at different points. An experiment could consist of three steps:
1. An initial phase which generates some starting configurations,
2. A simulation phase which subjects the starting configurations to a large number of experimental conditions, and
3. An analysis phase which aggregates the data from the simulation phase.
Here steps 1 and 3 might have a single set of variables, while step 2 has hundreds. Experi has the jobs keyword to facilitate these types of experiments.
jobs:
  - command: echo "Command 1"
  - command:
      - mkdir -p {var}
      - cd {var}
      - echo "Command 2 {var}"
  - command: echo "Command 3"
The jobs key allows you to break an experiment into a list of commands, with each separate command being a different job on the scheduler. Each command key is the same as described in the above sections.
Note
The command key will still work fine when submitting a job to the scheduler; the above example can be expressed with a single command key
command:
  - echo "Command 1"
  - mkdir -p {var}
  - cd {var}
  - echo "Command 2 {var}"
  - echo "Command 3"
The difference is that in the first example Command 1 and Command 3 are only echoed once, while in this example they are both echoed for each value of {var}.
When using the jobs keyword, a prerequisite of executing the next set of commands is a successful exit code from all shell commands executed in the current command key. This makes the assumption that all experimental conditions are going to succeed and are required for the following steps. This makes a lot of sense for a setup step, although less so for a search of parameter space which is likely to have numeric instabilities. I have an open issue to allow for the user override of this feature, although a workaround in the meantime is to suffix commands which might fail with || true. This is the or operator followed by a command which will always succeed; another more informative alternative is to echo a message. Either way, the return value of the shell command always indicates success.
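A sketch of the more informative alternative, where simulate stands in for whichever command might fail:

command:
  - simulate {pressure} {temperature} || echo "simulation failed for {pressure} {temperature}"

The echo always succeeds, so the next job still starts, and the message records which conditions failed.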
Variables¶
This is where the real power of experi lies, in being able to specify complex sets of variables in a simple human readable fashion. Variables are specified using the names as given in the command section. The simplest case is for a single value of a variable
# experiment.yml
command: echo hello {name}
variables:
  name: Alice
Specifying a list of values for a variable can be done in the same way as for the commands, again for this simple case,
variables:
  variable1:
    - Alice
    - Bob
    - Charmaine
Multiple Variables¶
Specifying multiple variables is as simple as specifying a single variable, however by default, all possible combinations of the variables are generated. In the simplest case, with just a single value per variable
command: echo {greeting} {name}
variables:
  greeting: hello
  name: Alice
the result of the command would be hello Alice. To greet multiple people we just add more names
command: echo {greeting} {name}
variables:
  greeting: hello
  name:
    - Alice
    - Bob
    - Charmaine
which would result in
hello Alice
hello Bob
hello Charmaine
We have all possible combinations of the greeting and the name. Extending this, to greet all the people in both English and French we can add both greetings and all the names, giving the input file
command: echo {greeting} {name}
variables:
  greeting:
    - hello
    - bonjour
  name:
    - Alice
    - Bob
    - Charmaine
and resulting in the output
hello Alice
hello Bob
hello Charmaine
bonjour Alice
bonjour Bob
bonjour Charmaine
Iterators¶
Product¶
In the above examples we are using the try everything approach; however, there is more control over how variables are specified. By default we are using a product iterator, which could be explicitly defined like so
command: echo {greeting} {name}
variables:
  product:
    greeting:
      - hello
      - bonjour
    name:
      - Alice
      - Bob
      - Charmaine
Note that the product iterator doesn't support a list of variables. There is no way that a list of values makes sense here, so one will raise a ValueError.
Zip Iterator¶
If we know that Alice speaks English, Bob speaks French, and Charmaine speaks Spanish, we can use a similar specification; however, instead of a product iterator we can use zip.
command: echo {greeting} {name}
variables:
  zip:
    greeting:
      - hello
      - bonjour
      - hola
    name:
      - Alice
      - Bob
      - Charmaine
This is just the python zip function under the hood, and will produce the output
hello Alice
bonjour Bob
hola Charmaine
This definition of the iterator applies to all variables defined in the level directly under the iterator. So if we wanted to both echo to the screen and, assuming we are on macOS, use the say command,
command: {command} {greeting} {name}
variables:
  command:
    - echo
    - say
  zip:
    greeting:
      - hello
      - bonjour
      - hola
    name:
      - Alice
      - Bob
      - Charmaine
In the above specification, we are applying the zip iterator to the variables greeting and name, however all the resulting values will then use the product iterator, resulting in the following sequence of commands.
echo hello Alice
echo bonjour Bob
echo hola Charmaine
say hello Alice
say bonjour Bob
say hola Charmaine
In more complicated contexts multiple zip iterators are supported by having each set of values nested in a list.
variables:
  zip:
    - var1: [1, 2, 3]
      var2: [4, 5, 6]
    - var3: ['A', 'B', 'C']
      var4: ['D', 'E', 'F']
This will zip var1 and var2, separately zip var3 and var4, then take the product of the result of those two operations.
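To make that concrete, the specification above should produce the nine combinations

var1: 1, var2: 4, var3: 'A', var4: 'D'
var1: 1, var2: 4, var3: 'B', var4: 'E'
var1: 1, var2: 4, var3: 'C', var4: 'F'
var1: 2, var2: 5, var3: 'A', var4: 'D'
var1: 2, var2: 5, var3: 'B', var4: 'E'
var1: 2, var2: 5, var3: 'C', var4: 'F'
var1: 3, var2: 6, var3: 'A', var4: 'D'
var1: 3, var2: 6, var3: 'B', var4: 'E'
var1: 3, var2: 6, var3: 'C', var4: 'F'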
Chain Iterator¶
This handles the scenario where a single simulation has separate components, which in my case is two separate pressures. Each pressure I run the simulation at has a different sequence of temperatures
variables:
  chain:
    - pressure: 1.0
      temperature: [0.2, 0.3, 0.4]
    - pressure: 13.0
      temperature: [1.0, 1.5, 2.0]
This will generate the list of values
pressure: 1.0, temperature: 0.2
pressure: 1.0, temperature: 0.3
pressure: 1.0, temperature: 0.4
pressure: 13.0, temperature: 1.0
pressure: 13.0, temperature: 1.5
pressure: 13.0, temperature: 2.0
The append keyword is an alias for chain and operates in exactly the same way.
Cycle Iterator¶
This is an iterator which is particularly useful when combined with the chain and zip iterators. It will cycle through a sequence of values a specified number of times
variables:
  cycle:
    times: 2
    steps: [100, 10, 1]
This will give the list
steps: 100
steps: 10
steps: 1
steps: 100
steps: 10
steps: 1
When this is combined with the chain example above we can get
variables:
  zip:
    chain:
      - pressure: 1.0
        temperature: [0.2, 0.3, 0.4]
      - pressure: 13.0
        temperature: [1.0, 1.5, 2.0]
    cycle:
      times: 2
      steps: [100, 10, 1]
Where the number of steps follows the change in temperature.
pressure: 1.0, temperature: 0.2, steps: 100
pressure: 1.0, temperature: 0.3, steps: 10
pressure: 1.0, temperature: 0.4, steps: 1
pressure: 13.0, temperature: 1.0, steps: 100
pressure: 13.0, temperature: 1.5, steps: 10
pressure: 13.0, temperature: 2.0, steps: 1
Arange Iterator¶
In cases where the values for a variable are too numerous to list manually, Experi supports a range iterator, specified using arange like below
var:
  arange: 100
arange reflects the use of the NumPy arange function to generate the values, and is less likely to be a variable name. Like the NumPy function, this iterator supports arguments for start, stop, step and dtype, which can all be specified as key value pairs
var:
  arange:
    start: 100
    stop: 110
    step: 2.5
    dtype: float
which will set var to [100., 102.5, 105., 107.5]. In this case the specification is not particularly helpful, however for hundreds of values
var:
  arange:
    stop: 500
    step: 5
this approach is a definite improvement.
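Assuming the NumPy defaults apply here (a start of 0), this specification corresponds to the call

numpy.arange(0, 500, 5)  # 0, 5, 10, ..., 495 -- one hundred values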
pbs¶
This section is for the specification of the options for submission to a job scheduler.
The simplest case is just specifying
pbs: True
which will submit the job to the scheduler using the default values which are
pbs:
  ncpus: 1
  select: 1
  walltime: 1:00
  setup: ''
Of these default values, setup is the only one that should require explaining. This is a sequence of commands in the pbs file that set up the environment, like loading modules, modifying the PATH, activating a virtual environment, etc. They are just inserted at the top of the file before the command is run.
pbs:
  setup:
    - module load hoomd
    - export PATH=$HOME/.local/bin:$PATH
While there are some niceties to make specifying options easier, it is possible to pass any option by using the flag as the dictionary key, like in the example below with the mail address M and the path to the output stream o
pbs:
  M: malramsay64@gmail.com
  o: dest
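With PBS, these options would presumably be rendered as directives at the top of the generated submission script, along the lines of

#PBS -M malramsay64@gmail.com
#PBS -o dest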
Command Line Interface¶
Experi is designed so that command line options are typically not required. Primarily this is to ensure the reproducibility of running the code.
experi¶
experi [OPTIONS]
Options

--version
    Show the version and exit.

-f, --input-file <input_file>
    Path to a YAML file containing experiment data. Note that the experiment will be run from the directory in which the file exists, not the directory the script was run from.

-s, --scheduler <scheduler>
    The scheduler with which to run the jobs. Options: shell | pbs | slurm

--use-dependencies
    Use the dependencies specified in the command to reduce the processing.

--dry-run
    Don't run commands or submit jobs, just show the commands that would be run.

-v, --verbose
    Increase the verbosity of logging events.
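For example, to preview the commands an experiment would run without executing anything:

experi --input-file experiment.yml --dry-run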
Examples¶
This is a collection of example workflows using Experi demonstrating some different ways of using it for real experiments.
Command Line Options¶
This is an example of using Experi in my own research, which was the reason I developed it. I have a tool sdrun which translates command line options into a molecular dynamics simulation. This experiment is investigating how temperature affects the motion of particles in the simulation. The experiment consists of three separate parts:
1. The creation of a high temperature configuration which is well mixed (create),
2. Cooling the high temperature configuration to each of the desired temperatures for data collection (equil), and
3. Collecting data on the motion of particles within the simulation (prod).
The terms create, equil, and prod are the arguments to sdrun which reflect these stages. For this simulation I would like each of the steps to be a separate job on the scheduler, hence I use the jobs key. Part of the reason for this is that I am only creating a single configuration in the first step, which is then used for each temperature in the equil step. By running as separate jobs I will have a job with a single task for the first step, which once finished will allow the equilibration array job with 10 elements to start.
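A sketch of what this experiment could look like; the sdrun options and file names here are illustrative rather than the tool's real interface:

# experiment.yml
jobs:
  - command: sdrun create --temperature {init_temp} initial.gsd
  - command: sdrun equil --temperature {temperature} initial.gsd equil-{temperature}.gsd
  - command: sdrun prod --temperature {temperature} equil-{temperature}.gsd
variables:
  init_temp: 2.50
  temperature: [0.50, 0.60, 0.70, 0.80, 0.90, 1.00, 1.10, 1.20, 1.30, 1.40]
pbs: True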
Input Files¶
A common pattern for many software packages is to define the workflow with the use of input files. Better support for input files is planned (see issue <experi #>), though it is still possible to use them. The below example creates an input file for use with the software LAMMPS.
# experiment.yml
command:
  - |
    echo -e "
    <file>
    " > file.in
  - lmp_run -in file.in
Subdirectories¶
Breaking the output into subdirectories allows for more organisation of experimental results, particularly where there are many output files generated. Experi will always run from the directory containing the experiment.yml file, however that doesn't prevent you from creating subdirectories and running commands in them. This example shows how you can use Experi to run code in a separate subdirectory for each set of variables.
command:
  - mkdir -p {directory}
  - cd {directory}
  - run command
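A fuller sketch with the variable definition filled in; the final command is a placeholder for the real analysis or simulation:

# experiment.yml
command:
  - mkdir -p {directory}
  - cd {directory}
  - echo "output for {directory}" > out.txt
variables:
  directory:
    - run_1
    - run_2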