XML-format for defining interfaces for external programs

In order to use external programs within PriorsEditor, their interfaces must be explained to PriorsEditor through special configuration files written in XML-format.
This page describes how to write such configuration files.

The box below shows the XML-code for a simple example program. It could also be useful to study the other XML-files that are available from this website.

  <?xml version="1.0" encoding="UTF-8"?>

  <program name="RandomFilter" class="ExternalProgram">

	 <service type="local" location="C:\bioinformatics\randomfilter.exe" />

	 <parameter type="regular" name="Region Track" class="RegionDataset" required="yes">   
	   <dataformat name="GFF" />
	   <argument type="valued option" switch="-i"/>
	 </parameter>

	 <parameter type="regular" name="Probability" class="Float" required="yes">
	   <min>0</min>
	   <max>1.0</max>
	   <default>0.5</default>
	   <argument type="valued option" switch="-p"/>
	 </parameter>

	 <parameter type="result" name="Result" class="RegionDataset" required="yes">
	   <dataformat name="GFF" />
	   <argument type="valued option" switch="-o"/>
	 </parameter>

  </program>

After the compulsory XML-header follows a <program>-element which contains the actual description of the program and its interface.
The <program>-element has two arguments: a name and a class. The name argument is just an arbitrary name used to refer to the program.
If the external program is a motif discovery program, the class argument must be "MotifDiscovery" and for motif scanning programs it must be "MotifScanning".
For other external programs that do not fall into either of these two categories, the class argument is not important and can be set to any string.
A third, optional argument to <program> is cygwin, which can take the values "yes" or "no" (default is "no"). This argument signals that the program is originally a Linux program and needs Cygwin to be installed in order to run under Windows. If cygwin is set to "yes", some file paths might be converted to UNIX-style as necessary.
The <program>-element contains other elements that describe various properties of the program, including information about where the program is located and how to execute it and descriptions of the input and output parameters of the program.
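
As a minimal sketch of the optional cygwin argument, the opening tag of a motif discovery program that was originally written for Linux could look as follows. The program name "ExampleFinder" is hypothetical and the contents of the element are left out.

   <!-- "ExampleFinder" is a made-up name; cygwin="yes" signals that Cygwin is needed on Windows -->
   <program name="ExampleFinder" class="MotifDiscovery" cygwin="yes">
       ...
   </program>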

Program properties

The <program>-element can contain an optional <properties>-element which describes various properties of the program, including names of the authors, a short description of the program itself, contact information, websites and citations. These properties are displayed when the user double-clicks on a program in the External Programs Dialog, and they are mostly useful if one wants to share an XML-configuration file with other users who are not familiar with the program. The <properties>-element can also contain a <license>-element with a license agreement that the user must accept in order to use the program. HTML-code can be used in the text as long as the brackets are escaped. For example, to use italics, "<i>" must be escaped as "&lt;i&gt;".


  <properties>
      <author>Timothy L. Bailey and Charles Elkan</author>
      <citation>
      	Timothy L. Bailey and Charles Elkan (1994)
      	"Fitting a mixture model by expectation maximization to discover motifs in biopolymers",
      	&lt;i&gt;Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology&lt;/i&gt;,
      	(28-36), AAAI Press, 1994
      </citation>
      <contact>donotreply@somewhere.org</contact>
      <homepage>http://meme.sdsc.edu</homepage>
      <description>MEME searches for novel motifs in DNA (and protein) sequences using an expectation maximization strategy</description>   
  </properties>
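
The example above does not include a license. A sketch of a <properties>-element with a <license>-element might look like this, assuming the license text is placed directly between the tags in the same way as the other properties. The wording of the license is made up for illustration.

   <properties>
       <description>A short description of the program</description>
       <!-- Assumption: the license text is given as plain element content -->
       <license>
           This program may be used freely for academic purposes.
           Commercial use requires permission from the authors.
       </license>
   </properties>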


Service type and location


   <service type="local">
     <source version="3.1" os="Windows"          url="http://homes.esat.kuleuven.be/~thijs/download/windows/MotifScanner.exe"/>
     <source version="3.1" os="Windows (mirror)" url="http://tare.medisin.ntnu.no/priorseditor/tools/windows/MotifScanner.exe"/>   
     <source version="3.2" os="Linux"            url="http://homes.esat.kuleuven.be/~thijs/download/linux_3.2/MotifScanner"/>
     <source version="3.2" os="Linux x86-64"     url="http://homes.esat.kuleuven.be/~thijs/download/linux_x86-64/MotifScanner"/>
     <source version="3.2" os="Mac OS X"         url="http://homes.esat.kuleuven.be/~thijs/download/macosx_ppc/MotifScanner"/>
   </service>

The <service>-element describes the program's location and how it should be accessed.
The current version of PriorsEditor only supports use of programs that are installed locally on the user's computer (type="local"), but future versions might also support the use of web services.
If the location of the executable program is known, it can be provided as an argument to the <service>-element, as seen in the example for the "RandomFilter" program at the top of this page.
If the location of the program is not stated in the XML-file, the user must specify the location when the XML-file is installed in PriorsEditor.
If a precompiled executable of the program can be obtained from an external source, the location of this source can be provided inside the <service>-element using optional <source>-elements. The version and os arguments just provide a description of the program source, but the url argument must point to a single file that can be downloaded and "installed" locally by PriorsEditor. (The source file must be usable 'as is', since PriorsEditor does not unpack the file or run any installation scripts; it just copies the file to a location in the local filesystem.)

Describing the program's interface

The description of the interface itself mostly consists of a list of <parameter>-elements, each describing an input or output parameter of the program.

   <parameter type="regular" name="Positional priors" class="NumericDataset" required="no" hidden="no">   
       <description>A positional priors track (Note: sum of priors for all positions must not exceed 1.0!) </description>   
       <argument type="valued option" switch="-psp"/>
       <dataformat name="PSP">
           <setting name="Orientation" class="String">Direct</setting>
           <setting name="Motif width" class="Integer">8</setting>
       </dataformat>
   </parameter>

Each parameter has a type argument which can be either "source", "result" or "regular". A parameter must also have a name argument, which is used to refer to the parameter and is also the name displayed in GUI dialogs. Finally, a parameter must have a class argument which specifies the type of data the parameter holds. The class argument must be one of the data classes known to PriorsEditor; those that appear in the examples on this page are Integer, Float, String, RegionDataset, NumericDataset, DNASequenceData and MotifCollection.
Motif discovery and scanning programs should have one source-type parameter of class="DNASequenceData". Other external programs should use regular parameters to pass input data.
Motif discovery programs require two result-type parameters: one named "Motifs" of class="MotifCollection" and one named "Result" of class="RegionDataset". Motif scanning programs should have only one result parameter, named "Result", of class="RegionDataset". Other external programs should also have only one result parameter (support for multiple outputs is planned for future releases). For these programs the name of the result parameter is not important, and its class should match the type of data object returned.
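
Put together, the parameter list of a motif scanning program could be sketched as below. This is only an outline under stated assumptions: the program name, the parameter name "Sequences", the switches and the data format name "FASTA" are hypothetical, and a real scanning program would normally also need a parameter for the motifs to scan with.

   <program name="ExampleScanner" class="MotifScanning">
       <service type="local" />
       <!-- The single source parameter holding the sequences to scan -->
       <parameter type="source" name="Sequences" class="DNASequenceData" required="yes">
           <dataformat name="FASTA" />   <!-- assumed format name -->
           <argument type="valued option" switch="-s"/>
       </parameter>
       <!-- The single result parameter, which must be named "Result" -->
       <parameter type="result" name="Result" class="RegionDataset" required="yes">
           <dataformat name="GFF" />
           <argument type="valued option" switch="-o"/>
       </parameter>
   </program>
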
Parameters can have two additional optional arguments, required and hidden, which can be set to either "yes" or "no". Hidden parameters do not show up in GUI dialogs, and the user cannot change the value of a hidden parameter. Hidden parameters can, however, be used to pass default settings to programs.
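
For instance, a hidden parameter that always passes a fixed setting to the program might be written as follows. The parameter name, the value and the switch are invented for this sketch.

   <parameter type="regular" name="Iterations" class="Integer" hidden="yes">
       <default>100</default>   <!-- the user cannot change this value -->
       <argument type="valued option" switch="-n"/>
   </parameter>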

<parameter>-elements can contain other elements, for instance an optional <description> of the parameter which can be displayed to the user in a GUI dialog (HTML-code can be used if brackets are escaped as explained above).
The <argument>-element inside the parameter is required and describes how the parameter is passed to the program. The type of the <argument>-element can be either "valued option", "flag" (for Boolean parameters) or "implicit". Arguments can also have a switch which precedes the parameter on the command line.
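
As an illustration of the "flag" type, a Boolean parameter that contributes only its switch to the command line could be declared like this. The parameter name and the switch are hypothetical, and the class name "Boolean" is assumed by analogy with the other simple classes.

   <parameter type="regular" name="Verbose" class="Boolean" required="no">
       <!-- With type="flag" only the switch is written, depending on the Boolean value -->
       <argument type="flag" switch="-v"/>
   </parameter>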

Restricting values of simple parameters

Simple parameters such as Integers, Floats, Strings and Booleans can be given default settings with a <default>-element inside the parameter, as can be seen for the second parameter (Probability) in the example at the top of the page. For numeric parameters the allowed range can also be specified by providing <min>- and <max>-elements (although this is not checked in the current version of PriorsEditor). String parameters can normally take on any value, but they can also be restricted to a limited set of options:

   <parameter class="String" name="Size" type="regular" >   
     <option>Small</option>
     <option>Medium</option>
     <option>Large</option>
   </parameter>

The options are presented to the user, who chooses among the allowed values. The value used for the parameter is normally the string between the opening and closing <option> tags (here Small, Medium or Large), but it is also possible to specify that a different value should be used. In the example below, the value "S" is used if the user selects "Small", "M" is used instead of "Medium" and "L" instead of "Large".

   <parameter class="String" name="Size" type="regular" >   
     <option value="S">Small</option>
     <option value="M">Medium</option>
     <option value="L">Large</option>
   </parameter>

Specifying the data format for complex parameters

Complex parameters (not Integers, Floats, Strings or Booleans) are passed to external programs through temporary files. In order to output these parameters to files, the data format to use must be specified with a <dataformat>-element inside the parameter. The name of the format must be given, and the format might also require additional format-specific <setting>-elements.

   <parameter type="regular" name="Positional priors" class="NumericDataset" required="no" hidden="no">   
       <dataformat name="PSP">
           <setting name="Orientation" class="String">Direct</setting>
           <setting name="Motif width" class="Integer">8</setting>
       </dataformat>
   </parameter>

Data formats currently supported by PriorsEditor include the GFF and PSP formats used in the examples on this page.
A description of these data formats and their parameter settings can be found elsewhere.

Setting up the command-line

The command-line used to execute the external program can be defined in two different ways. One way is to explicitly specify the command-line using the <command>-element, as described below. This method is the most powerful. However, programs that have very straightforward interfaces can do without the <command>-element.
If no <command>-element is specified, the command-line is built up by writing out the name of the executable program followed by all the parameters in the order that they appear in the XML-file. The values of "simple" parameter types, like numbers and strings, are written directly to the command-line, whereas complex types (such as large datasets) are written to temporary files and the filename is written to the command-line. If a parameter has an associated switch, the switch is written out before the parameter itself. If the parameter is a Boolean "flag", only the switch is output (or not, depending on the Boolean value of the parameter). "Implicit" arguments are not written to the command-line. They can be used when the value for a parameter is always the same, for instance if the external program always writes its output to a file named "output.txt". Arguments that are implicit should specify the (already known) filename instead of a switch.

If the "RandomFilter" program described at the top of this page is executed, and the user has chosen a region dataset to use for the first parameter and a value of e.g. "0.45" to use for the second parameter, the resulting command-line that is executed will look like this:

   C:\bioinformatics\randomfilter.exe -i <tempfile_1> -p 0.45 -o <tempfile_2>   

Before executing the command, however, the region dataset the user selected for the first regular parameter is output (in GFF-format) to a temporary file named tempfile_1. The third parameter also refers to a region dataset, but since this is a "result" parameter only the name of the file (randomly chosen for the occasion) is passed to the external program on the command-line. The external program is expected to write its output to this file (in GFF-format as specified in the XML-file) whose contents will later be read back by PriorsEditor after the program execution has finished.

The command element

If the program requires a more complex command-line than just the name of the program followed by the parameters in the order specified, the command-line can be specified explicitly with a <command>-element. For instance, if the RandomFilter program above was not a standalone executable, but rather a Perl script, we might have to specify the command-line like this:

   <command>perl %PROGRAM {Region Track} {Probability} {Result}</command>   

Here, %PROGRAM is a special string which refers to the program itself (this was implicit when we didn't use the <command>-element). Other special strings that can be used include %APPDIR, which refers to the directory where the program resides, and %WORKDIR, which is the "working directory" used when executing the command. Parameters are referred to on the command-line by placing the name of the parameter in braces. The command-line will be parsed and these braces will be replaced by the actual value of the parameter (or a filename for complex parameters), possibly preceded by a switch if one is specified. It is possible to execute multiple commands in succession by separating the commands with semicolons. This can be useful if there is a need to perform any pre- or post-processing (for instance to convert a non-standard output format to GFF, which can be read by PriorsEditor).
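
A sketch of two commands in succession is shown below. It assumes that the Perl script writes a non-standard output file and that a second, hypothetical script to_gff.pl (located in the program directory) converts that file to GFF. The file name raw_output.txt and the literal -o option in the first command are likewise assumptions. Since the Result parameter has the switch "-o", {Result} expands to that switch followed by the name of the temporary file.

   <command>perl %PROGRAM {Region Track} {Probability} -o %WORKDIR/raw_output.txt; perl %APPDIR/to_gff.pl %WORKDIR/raw_output.txt {Result}</command>

An intermediate file such as raw_output.txt is not one of the temporary files created by PriorsEditor, so it should be listed for cleanup as described at the end of this page.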

An XML-configuration file should be designed to be usable irrespective of which operating system the program will eventually run on. However, references to specific files within a command-line might be tricky since different operating systems have different ways of representing file paths. Also, some operating systems might need to escape filenames containing spaces by enclosing them in quotes. PriorsEditor performs the necessary conversions automatically for temporary files and the %PROGRAM special string, but if you want to refer directly to other files within the command-line, you might have to explicitly state that this part of the string refers to a file and should be processed accordingly. There are two ways to inform PriorsEditor that you want to refer to a file, and both work by enclosing the filename in "special quotes". The first uses "dollar-brace" style, like this: ${filepath}$, and the other uses "dollar-quote-brace" style, like so: $'{filepath}'$. (Note that the closing delimiter is the mirror image of the opening delimiter.) The difference between the two is really only apparent for programs that run on Windows using Cygwin. With the first style, Windows paths are converted to Cygwin Unix-style paths and enclosed in quotes if they contain spaces. The latter style does not convert the paths but will enclose them in quotes if they contain spaces. Use the latter style to refer to programs that should be executed and the first style for other file references. For an example of the latter style, have a look at the XML-configuration file for Weeder.
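
As a sketch of the "dollar-brace" style, the command below passes a settings file to the Perl script. The --settings option and the file path are hypothetical.

   <command>perl %PROGRAM {Region Track} {Probability} {Result} --settings ${C:\bioinformatics\filter settings.txt}$</command>

Because the path contains a space it would be enclosed in quotes, and for a program running under Cygwin it would also be converted to a Unix-style path.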

Advanced options

It is possible for a parameter to take on the same value as another parameter by "linking" to this other parameter. This is accomplished by specifying a link argument containing the name of the target parameter. Note that parameters can only link to other parameters that have already been defined earlier in the XML-file, and they can only link to parameters of the same class. Parameters that link to others should be "hidden", since their values should not be explicitly set by the user (only indirectly). Settings for data formats can also link to other parameters (but not other settings), and this is the only way a user can (indirectly) change values for data format settings (since information about the data formats used for parameter passing is not usually revealed to the user).

   <parameter type="regular" name="Positional priors" class="NumericDataset" hidden="yes" link="name_of_target_parameter">   
       <argument type="valued option" switch="-P"/>
   </parameter>
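
A data format setting that follows the value of a user-visible parameter could be sketched as below. This assumes that the link argument is written on a <setting>-element in the same way as on a <parameter>-element; the "Motif width" parameter and its switch are made up for the example.

   <parameter type="regular" name="Motif width" class="Integer" required="yes">
       <default>8</default>
       <argument type="valued option" switch="-w"/>
   </parameter>

   <parameter type="regular" name="Positional priors" class="NumericDataset" required="no">
       <argument type="valued option" switch="-psp"/>
       <dataformat name="PSP">
           <!-- Assumed syntax: the setting takes its value from the parameter defined above -->
           <setting name="Motif width" class="Integer" link="Motif width"/>
       </dataformat>
   </parameter>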


Cleaning up

If a program creates any additional files or directories during its execution (besides the temporary files created to pass complex parameters), it is prudent to specify these so that PriorsEditor can perform the necessary cleanup after the execution has finished. The <temporary>-element is used to specify the names of these temporary files (or directories). The special strings %WORKDIR and %APPDIR explained above can prefix the filenames if necessary.


  <program>
      ...
      ...
      ...
      <temporary filename="tempfile1" />   
      <temporary filename="%WORKDIR/tempfile2" />
  </program>