*********************************************************************************
*                                                                               *                          
*                        PriorsEditor:   README                                 *
*                                                                               *
*********************************************************************************


To start PriorsEditor with a graphical user interface, use the following command (or run "runPriorsEditor.bat"):

javaw -Xms256M -Xmx256M -splash:PriorsEditorSplash.png -jar PriorsEditor.jar


(The "256M" values set the initial (-Xms) and maximum (-Xmx) Java heap size available to PriorsEditor and can be increased if necessary.)


PriorsEditor can also be run from the command line without starting up the GUI.
The command to run PriorsEditor in this way is:

java -Xms256M -Xmx256M -classpath PriorsEditor.jar priorseditor.engine.PriorsEditor


In addition, you need to provide a protocol script (option -p) and a file describing which sequences to analyze (option -s).
You can also specify a log file with the "-l <filename>" option. All messages will then be written to this file instead of STDERR.

To obtain output from PriorsEditor when it is run from the command line, the protocol script must contain an "output" command for each data object you want to keep.
For instance, if the protocol script contains the line "results = output BindingSites in GFF format",
the data from the BindingSites track will be saved to the file "results.gff"
(the filename is the name of the output data object, and the file suffix is set according to the data format used).
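
The naming rule above can be illustrated with a small Python sketch that scans a protocol script
and lists the files a command-line run would be expected to produce. This is not part of PriorsEditor.
The line pattern is inferred from the single example above and is an assumption about the protocol syntax,
and only the GFF suffix is documented here, so any other format is reported with an unknown filename.

```python
import re

# Pattern inferred from the one example line in this README (an assumption,
# not a specification of the protocol language).
_OUTPUT_LINE = re.compile(
    r"^\s*(\w+)\s*=\s*output\s+(\w+)\s+in\s+(\S+)\s+format\s*$"
)

# Only the GFF suffix is documented in this README.
KNOWN_SUFFIXES = {"GFF": ".gff"}


def expected_outputs(protocol_text):
    """Return (object name, source data, format, filename or None) per output line."""
    results = []
    for line in protocol_text.splitlines():
        match = _OUTPUT_LINE.match(line)
        if match is None:
            continue  # not an output command of the assumed form
        name, source, fmt = match.groups()
        suffix = KNOWN_SUFFIXES.get(fmt)
        filename = name + suffix if suffix is not None else None
        results.append((name, source, fmt, filename))
    return results


if __name__ == "__main__":
    script = "results = output BindingSites in GFF format\n"
    for entry in expected_outputs(script):
        print(entry)
```

A filename of None means only that the suffix for that format is not listed in this README.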


Example of full command:

java -Xms256M -Xmx256M -classpath PriorsEditor.jar priorseditor.engine.PriorsEditor -p testprotocol.txt -s seqInfo.txt
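
If you launch PriorsEditor from a script, the full command can be assembled programmatically.
The Python sketch below is a hypothetical wrapper, not part of PriorsEditor: the function name and its
parameters are illustrative, and it uses only the options described above (-p, -s and the optional -l).

```python
import subprocess


def build_command(protocol, sequences, log_file=None, memory="256M",
                  jar="PriorsEditor.jar"):
    """Assemble the argument list for a command-line (non-GUI) run."""
    cmd = [
        "java", f"-Xms{memory}", f"-Xmx{memory}",
        "-classpath", jar,
        "priorseditor.engine.PriorsEditor",
        "-p", protocol,   # protocol script
        "-s", sequences,  # file describing the sequences to analyze
    ]
    if log_file is not None:
        cmd += ["-l", log_file]  # messages go to this file instead of STDERR
    return cmd


if __name__ == "__main__":
    cmd = build_command("testprotocol.txt", "seqInfo.txt", log_file="run.log")
    print(" ".join(cmd))
    # Uncomment to actually start the run (requires Java and PriorsEditor.jar
    # in the current directory):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when filenames contain spaces.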


The file with sequence information (-s option) can either contain the sequences themselves in FASTA format or contain lines
describing the location of the sequences. Note that if you use FASTA files as input, the DNA sequence track will automatically
be named "DNA". Also, to use additional feature data loaded from databases, the header of each FASTA sequence must
specify the origin of the sequence like so:

>NTNG1|chr1:107482152-107484351|Direct strand|9606:hg18
AGTGTGAAGT....

The fields in the header are separated by pipes ("|"). 
[1] The name of the sequence. 
[2] Location within the genome: chr:start-end
[3] The strand that the sequence is taken from (Direct strand or Reverse strand)
[4] Organism (NCBI Taxonomy ID) and genome build (separated by colon).
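
To check FASTA headers before a run, the four fields can be split and validated as in the Python sketch below.
It is not part of PriorsEditor. The names FastaOrigin and parse_header are illustrative, and the strict
matching of the strand text is an assumption (PriorsEditor itself may be more lenient).

```python
import re
from typing import NamedTuple


class FastaOrigin(NamedTuple):
    name: str
    chromosome: str   # kept exactly as written in the header, e.g. "chr1"
    start: int
    end: int
    strand: str       # "Direct strand" or "Reverse strand"
    taxonomy_id: int  # NCBI Taxonomy ID
    build: str        # genome build, e.g. "hg18"


_LOCATION = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def parse_header(header: str) -> FastaOrigin:
    """Split a header of the form >name|chr:start-end|strand|taxid:build."""
    fields = [f.strip() for f in header.strip().lstrip(">").split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(fields)}")
    name, location, strand, origin = fields

    match = _LOCATION.match(location)
    if match is None:
        raise ValueError(f"location is not of the form chr:start-end: {location!r}")
    if strand not in ("Direct strand", "Reverse strand"):
        raise ValueError(f"unrecognized strand: {strand!r}")

    organism, sep, build = origin.partition(":")
    if not sep or not organism.isdigit() or not build:
        raise ValueError(f"origin is not of the form taxid:build: {origin!r}")

    return FastaOrigin(name, match.group(1), int(match.group(2)),
                       int(match.group(3)), strand, int(organism), build)


if __name__ == "__main__":
    print(parse_header(">NTNG1|chr1:107482152-107484351|Direct strand|9606:hg18"))
```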


If the sequence file does not contain FASTA sequences, it should contain lines describing the location of the sequences.
These lines can either specify the sequence regions exactly or use locations relative to known genes.
Each line must contain either 8 or 6 fields separated by tabs (you can mix and match these formats within the file).


The 8-field format specifies the sequence regions exactly

[1] Name of gene/sequence (this can be an arbitrary name)
[2] Genome build
[3] Chromosome
[4] Start of sequence region (genomic coordinate)
[5] End of sequence region  (genomic coordinate)
[6] Transcription start site for the gene (genomic coordinate). This is optional and can be set to NULL
[7] Transcription end site for the gene (genomic coordinate). This is optional and can be set to NULL
[8] The strand the gene is taken from: direct (DIRECT, +1 or +) or reverse (REVERSE, -1 or -)

Examples
--------
UNG	hg18	12	108017798	108019997	108019798	NULL	DIRECT
BRCA2	hg18	13	31785617	31787816	31787617	NULL	+
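
A line in the 8-field format can be checked with a few lines of Python before it is passed to PriorsEditor.
The sketch below is illustrative only and not part of PriorsEditor. The names Region and parse_region_line
are hypothetical, and treating the strand values as case-sensitive is an assumption.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    build: str
    chromosome: str
    start: int
    end: int
    tss: Optional[int]  # None when the field is NULL
    tes: Optional[int]  # None when the field is NULL
    strand: int         # +1 for direct, -1 for reverse


# Accepted spellings of the strand field, as listed for field [8].
_STRANDS = {"DIRECT": 1, "+1": 1, "+": 1, "REVERSE": -1, "-1": -1, "-": -1}


def _optional_int(text: str) -> Optional[int]:
    return None if text == "NULL" else int(text)


def parse_region_line(line: str) -> Region:
    """Parse one tab-separated line in the 8-field format."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-separated fields, got {len(fields)}")
    name, build, chromosome, start, end, tss, tes, strand = fields
    if strand not in _STRANDS:
        raise ValueError(f"unrecognized strand: {strand!r}")
    return Region(name, build, chromosome, int(start), int(end),
                  _optional_int(tss), _optional_int(tes), _STRANDS[strand])


if __name__ == "__main__":
    line = "UNG\thg18\t12\t108017798\t108019997\t108019798\tNULL\tDIRECT"
    print(parse_region_line(line))
```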


The 6-field format specifies the locations relative to known genes

[1] Gene identifier (depending on format)
[2] Name of gene identifier format (e.g. "Ensembl Gene", "Entrez Gene" or "HGNC Symbol")
[3] Genome build
[4] Start of sequence region relative to anchor point (see field #6)
[5] End of sequence region relative to anchor point (see field #6)
[6] Anchor point. This can be either "TSS" (transcription start site) or "TES" (transcription end site)

Examples
--------
NTNG1	HGNC Symbol	hg18	-2000	200	TSS
56475	Entrez Gene	hg18	-2000	200	TSS
ENSG00000111249	Ensembl Gene	hg18	-2000	200	TSS
ENSG00000187664	Ensembl Gene	hg18	-2000	200	TSS
ENSG00000196358	Ensembl Gene	hg18	-2000	200	TSS
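
When the same region is wanted for many genes, a sequence file in the 6-field format can be generated by a script.
The Python sketch below is illustrative only and not part of PriorsEditor. The function names and the
output filename "seqInfo.txt" are placeholders.

```python
def relative_line(identifier, id_format, build, start, end, anchor="TSS"):
    """Format one tab-separated line in the 6-field format."""
    if anchor not in ("TSS", "TES"):
        raise ValueError(f"anchor must be 'TSS' or 'TES', got {anchor!r}")
    fields = [identifier, id_format, build, str(int(start)), str(int(end)), anchor]
    return "\t".join(fields)


def write_sequence_file(path, identifiers, id_format, build,
                        start=-2000, end=200, anchor="TSS"):
    """Write one line per gene identifier, all using the same relative region."""
    with open(path, "w") as handle:
        for identifier in identifiers:
            handle.write(relative_line(identifier, id_format, build,
                                       start, end, anchor) + "\n")


if __name__ == "__main__":
    genes = ["ENSG00000111249", "ENSG00000187664", "ENSG00000196358"]
    write_sequence_file("seqInfo.txt", genes, "Ensembl Gene", "hg18")
```

The resulting file can then be passed to PriorsEditor with the -s option.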