Input File Specification¶
This document is a guide to the experiment.yml
file which is used as the input file for
Experi. This file specifies all the parameters for running an experiment. The experiment.yml
file is designed to be a human readable reference to the experimental data it generates; there is
no simple method of running the experiment from a different directory.
The experiment.yml
file has three main sections each section having a different role in the
running of the experiment.
- The command section defines the shell commands to run the experiment.
- The variables section defines the variables to substitute into the commands.
- The scheduler section defines the scheduler to use and the associated options.
Experi uses the YAML file format for the input file since it has widespread use and support and the format is simple for a human to parse. If you are unfamiliar with YAML have a look at this quick guide.
Commands¶
The commands
section is one of the main elements of the input file, specifying the commands to run
and how to run them. At it’s simplest the command key is a bash command to execute as in the example
below.
# experiment.yml
command: echo Hello World
The command key can also take a list of bash commands, executing each of the commands in order.
# experiment.yml
command:
- echo First Command
- echo Second Command
- echo Third Command
Variable Substitution¶
The power of Experi is taking a single command instance and replacing a variable with it’s values
defined under the variables key. Variables take the form of the new style python
string formatting, with variable names surrounded by braces — {variable}
will be replaced with
the value of variable
. For more information on these format strings the python string
formatting documentation is a reference guide, while pyformat.info is a guide to practical use.
In practice, using variables looks like this;
# experiment.yml
command: echo {nth} Command
Unlike the previous example with a list, there is no guarantee of order for the commands to run. Each combination of variables is effectively a separate command instance which could be running at the same time as any other command instance. Where there is a dependence between tasks, like creating a directory, passing a list to the command key has a guarantee on ordering.
# experiment.yml
command:
- mkdir -p {variable}
- echo {variable} > {variable}/out.txt
In the above example, mkdir -p {variable}
will always be executed before the file out.txt
is
written in it.
There is no limit to the number of variables specified to a command however, variables specified in a command need to have a definition in the variables key.
After variable substitution, only unique command objects are run, where uniqueness takes into account
all bash commands in a list. This is to allow for the over-specification of variables for certain
steps of more complicated workflows (see Jobs). The rationale for this choice is that
commands which are non-unique will typically have the same output, overwriting each other. Where
this is a problem, adding an echo {variable}
to the list within a command key is a reasonable
workaround.
Command Failure¶
When running an array job on a scheduler every command in the array will run even if the first one fails. This is the behaviour that Experi replicates for all environments it can run in. Every combination of variables is executed, with a successful command meaning the exit code for all variables was 0 (success), while if one combination of variables fails then the entire command is considered to have failed.
Managing Complex Jobs¶
A common component of running an experiment is that the number of tasks changes at different points. An experiment could consist of 3 steps;
- An initial phase which generates some starting configurations,
- A simulation phase which subjects the starting configurations to a large number experimental conditions,
- An analysis phase which aggregates the data from the simulation phase
Here steps 1 and 3 might have a single set of variables, while step 2 has hundreds. Experi has the
jobs
keyword to facilitate these types of experiments.
jobs:
- command: echo "Command 1"
- command:
- mkdir -p {var}
- cd {var}
- echo "Command 2 {var}"
- command: echo "Command 3"
The jobs key allows you to break an experiment into a list of commands, with each separate command being a different job on the scheduler. Each of command key is the same as described in the above sections.
Note
I should note that the command key will work fine when submitting a job to the scheduler, the above example can be expressed with a single command key
command:
1. echo "Command 1"
2. mkdir -p {var}
3. cd {var}
4. echo "Command 2 {var}"
5. echo "Command 3"
The difference is that in the first example Command 1
and Command 3
are only echoed once,
while in this example they are both echoed for each value of {var}
.
When using the jobs keyword, a prerequisite of executing the next set of commands is a successful
exit code of all shell commands executed in the current command key. This is making the assumption
that all experimental conditions are going to succeed and are required for the following steps. This
makes a lot of sense for a setup step, although less so for a search of parameter space where it is
likely to have numeric instabilities. I have an open issue to allow for the user
override of this feature, although a workaround in the meantime is to suffix commands which might
fail with || true
. This is the or operator followed by a command which will always
succeed—another more informative alternative is to echo
a message. This means that the return
value of the shell command always indicates success.
Variables¶
This is where the real power of experi lies, in being able to specify complex sets of variables in a simple human readable fashion. Variables are specified using the names as given in the command section. The simplest case is for a single value of a variable
# experiment.yml
command: echo hello {name}
variables:
name: Alice
Specifying lists of variables can be done in the same way as the commands, again for this simple case,
variables:
variable1:
- Alice
- Bob
- Charmaine
Multiple Variables¶
Specifying multiple variables is as simple as specifying a single variable, however by default, all possible combinations of the variables are generated. In the simplest case, with just a single value per variable
command: echo {greeting} {name}
variables:
greeting: hello
name: Alice
the resulting of the command would be hello Alice
. To greet multiple people we just add more
names
comamnd: echo {greeting} {name}
variables:
greeting: hello
name:
- Alice
- Bob
- Charmaine
which would result in
hello Alice
hello Bob
hello Charmaine
We have all possible combinations of the greeting and the name. Extending this, to greet all the people in both English and French we can add both the greetings, and all the names giving the input file
command: echo {greeting} {name}
variables:
greeting:
- hello
- bonjour
name:
- Alice
- Bob
- Charmaine
and resulting in the output
hello Alice
hello Bob
hello Charmaine
bonjour Bob
bonjour Alice
bonjour Charmaine
Complex specifications¶
In the above examples we are using the try everything approach, however there is more control over how variables are specified. By default we are using a product iterator, which could be explicitly defined like so
command: echo {greeting} {name}
variables:
product:
greeting:
- hello
- bonjour
name:
- Alice
- Bob
- Charmaine
however if we know that Alice speaks English, Bob speaks French, and Charmaine speaks Spanish we can use a similar specification, however instead of a product iterator we can use zip.
command: echo {greeting} {name}
variables:
zip:
greeting:
- hello
- bonjour
- hola
name:
- Alice
- Bob
- Charmaine
This is just the python zip
function under the hood, and will produce the output
hello Alice
bonjour Bob
hola Charmaine
This definition of the iterator applies to all variables defined in the level directly under the
iterator. So if we wanted to echo
to the screen and assuming we are on macOS use the say
command,
command: {command} {greeting} {name}
variables:
command:
- echo
- say
zip:
greeting:
- hello
- bonjour
- holj
name:
- Alice
- Bob
- Charmaine
In the above specification, we are applying the zip
iterator to the variables greeting and name,
however all the resulting values will then use the product
iterator, resulting in the following
sequence of commands.
echo hello Alice
echo bonjour Bob
echo hola Charmaine
say hello Alice
say bonjour Bob
say hola Charmaine
In more complicated contexts multiple zip
iterators are supported by having each set of values
nested in a list.
variables:
zip:
- var1: [1, 2, 3]
var2: [4, 5, 6]
- var3: ['A', 'B', 'C']
var4: ['D', 'E', 'F']
Which will zip
var1
and var2
, separately zip var3
and var4
, then take the
product of the result of those two operations.
pbs¶
This section is for the specification of the options for submission to a job scheduler.
The simplest case is just specifying
pbs: True
which will submit the job to the scheduler using the default values which are
pbs:
ncpus: 1
select: 1
walltime: 1:00
setup: ''
Of these default values setup
is the only one that should require explaining. This is a sequence
of commands in the pbs file that setup the environment, like loading modules, modifying the PATH,
activating a virtual end, etc. They are just inserted at the top of the file before the command is
run.
pbs:
setup:
- module load hoomd
- export PATH=$HOME/.local/bin:$PATH
While there are some niceties to make specifying options easier it is possible to pass any option by
using the flag as the dictionary key like in the example below with the mail address M
and path
to the output stream o
pbs:
M: malramsay64@gmail.com
o: dest