Map transform |
Map transforms apply to all properties of type Text and can be used to convert
specific values found in the database to other values for use in MotifLab (or vice versa).
The transform is specified as a comma-separated list of key=value pairs
where the key in each pair denotes the value of the property in MotifLab and the value-part
denotes the corresponding value as found in the database. For example, consider a text property named "RepeatType" which has the following transform "Transposon=LINE,Transposon=SINE,Transposon=LTR,Tandem Repeat=Minisatellite,Tandem Repeat=Microsatellite,Tandem Repeat=Short Tandem Repeat". This transform assumes that the database column corresponding to the RepeatType property can (at least) take on the following six values: LINE, SINE, LTR, Minisatellite, Minisatellite and Short Tandem Repeat. When the datasource encounters a value in the database which can be found on the right-hand side of a key=value pair in the list, it will assign the region the corresponding left-hand side value for the property. Note that the same key can be repeated multiple times in the list in order to map a larger set of values in the database onto a smaller set of values in MotifLab. In this case, for example, the six different types of repeats mentioned in the database will be grouped into two different types ("Transposon" or "Tandem Repeat") for the RepeatType property of the regions. Values encountered in the database that are not mentioned in the transform will not be converted but used as is. For the "strand" (orientation) property, MotifLab recognizes the values "direct", "1" or anything starting with "+" as referring to the direct strand, whereas "reverse" or anything starting with "-" would be taken as a reference to the reverse strand. If a database uses a different scheme for denoting strand orientation, e.g. "D" for direct strand and "R" for reverse strand, the map transform can be used to convert these to values recognized by MotifLab, like so: "direct=D,reverse=R". Although chromosomes in MotifLab are often presented on the form "chr12", they are represented internally using only the chromosome number (or letter) without the "chr" prefix. If a database uses "chr" prefixes for chromosomes, these values must be converted with a map transform on the form "1=chr1,2=chr2,3=chr3, ...etc./" (remember that the left-hand side of each pair is the value used in MotifLab and the right-hand side is the corresponding value in the database). Since this is a very common scenario, MotifLab has a special short-hand notation for this called the "chr-prefix" transform (see below). |
Chr-prefix |
This transform applies only to the "chromosome" property. MotifLab assumes that the chromosome column in the SQL-table refers to e.g. "chromosome 1" simply as "1" and "chromosome X" as "X", etc. However, some databases might include a "chr"-prefix for all chromosomes (e.g. "chr1" and "chrX"). If the database uses a "chr"-prefix, MotifLab must be made aware of this in advance in order to process the data correctly. This can be done by simply typing any value that is not a valid map transform into the Transform-column of the "chromosome" property. Any such non-empty transform-value will be interpreted to mean that the database uses "chr"-prefixes. |
Coordinate offset |
This transform applies to the "start" and "end" properties. In the genomic coordinate system used by MotifLab, the first nucleotide on a chromosome is said to correspond to position 1 (so-called "1-indexed" coordinates) and the location of a sequence region is referred to as "chrN:start-end" where the first nucleotide in the region is at position "start" and the last nucleotide is at position "end". Hence, a region corresponding to the first 8 nucleotides of chromosome 2 is referred to as "chr2:1-8". This way of referring to sequence regions is used by the GFF dataformat and many other dataformats. However, some dataformats, such as e.g. the popular BED-format, rely on different coordinate systems. According to the BED-format (which is "0-indexed"), the first nucleotide of a chromosome is said to be at position 0 and the end-coordinate refers to the first position after the region rather than the last position within the region (i.e. the start-coordinate is inclusive but the end-coordinate is exclusive). The first 8 nucleotides of chromosome 2 would hence be referred to as "chr2:0-8" in BED-format. If the coordinates in your database are not in an "inclusive 1-indexed" format, they have to be converted to the coordinate system used by MotifLab using a "coordinate offset" transform. This can be accomplished simply by typing a positive or negative number into the Transform-column of the "start" and/or "end" properties. This value will then be added to the value which is obtained from the database. If your database uses BED-coordinates you can convert these simply by entering the offset-value "1" into the Transform-column for the "start" property, and this will convert the 0-indexed coordinate to an 1-indexed coordinate (note that the end-coordinate does not have to be transformed since an "exclusive 0-indexed" coordinate would be the same as an "inclusive 1-indexed" coordinate). |