*********************************************************************************
*                                                                               *
*                             PriorsEditor: README                              *
*                                                                               *
*********************************************************************************

To start PriorsEditor with a graphical user interface, use the following command
(or run "runPriorsEditor.bat"):

    javaw -Xms256M -Xmx256M -splash:PriorsEditorSplash.png -jar PriorsEditor.jar

(The -Xms and -Xmx options set the initial and maximum Java heap size. The value
"256M" specifies how much memory PriorsEditor can use and can be increased if
necessary.)

PriorsEditor can also be run from the command line without starting the GUI.
The command to run PriorsEditor in this way is:

    java -Xms256M -Xmx256M -classpath PriorsEditor.jar priorseditor.engine.PriorsEditor

In addition, you need to provide a protocol script (option -p) and a file
describing which sequences to analyze (option -s). You can also specify a log
file with the -l option; all messages will then be written to this file instead
of STDERR.

To obtain output from PriorsEditor when it is run from the command line, the
protocol script must contain "output" commands for all the data you want to
keep. For instance, if the protocol script contains the line

    results = output BindingSites in GFF format

the data from the BindingSites track will be saved to the file "results.gff".
The filename is the same as the name of the output data object, and the file
suffix is set according to the data format used.

Example of a full command:

    java -Xms256M -Xmx256M -classpath PriorsEditor.jar priorseditor.engine.PriorsEditor -p testprotocol.txt -s seqInfo.txt

The file with sequence information (option -s) can either contain the sequences
themselves in FASTA format or contain lines describing the location of the
sequences. Note that if you use FASTA files as input, the DNA sequence track
will automatically be named "DNA".
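
A headless run can also be started from a script. The following minimal Python
sketch only assembles the argument list used in the full example command above.
The function name build_command is hypothetical (it is not part of PriorsEditor),
and the sketch assumes that PriorsEditor.jar is in the working directory and
that -l takes the log file name as its next argument.

```python
from typing import List, Optional

# Assumption: PriorsEditor.jar is located in the current working directory.
JAR = "PriorsEditor.jar"
MAIN_CLASS = "priorseditor.engine.PriorsEditor"


def build_command(protocol: str, sequences: str,
                  log_file: Optional[str] = None,
                  memory: str = "256M") -> List[str]:
    """Return the argument list for a headless (no-GUI) PriorsEditor run.

    protocol  -- protocol script passed with -p; it must contain "output"
                 commands for every data object you want written to disk
    sequences -- sequence file passed with -s (FASTA or location lines)
    log_file  -- optional log file passed with -l (assumed to take the file
                 name as the next argument); without it messages go to STDERR
    memory    -- value used for both -Xms and -Xmx (initial/maximum heap size)
    """
    cmd = [
        "java", f"-Xms{memory}", f"-Xmx{memory}",
        "-classpath", JAR, MAIN_CLASS,
        "-p", protocol,
        "-s", sequences,
    ]
    if log_file is not None:
        cmd += ["-l", log_file]
    return cmd
```

Passing build_command("testprotocol.txt", "seqInfo.txt") to
subprocess.run(..., check=True) runs the full example command. Handing
subprocess a list instead of a single command string avoids shell quoting
problems when file names contain spaces.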

If you want to use additional feature data loaded from databases, the header of
each FASTA sequence must specify the origin of the sequence, like so:

    >NTNG1|chr1:107482152-107484351|Direct strand|9606:hg18
    AGTGTGAAGT....

The fields in the header are separated by pipes ("|"):

    [1] The name of the sequence
    [2] Location within the genome: chr:start-end
    [3] The strand that the sequence is taken from ("Direct strand" or
        "Reverse strand")
    [4] Organism (NCBI Taxonomy ID) and genome build, separated by a colon

If the sequence file does not contain FASTA sequences, it should contain lines
describing the location of the sequences. These lines can either specify the
sequence regions exactly or use locations relative to known genes. Each line
must contain either 8 or 6 fields separated by tabs (you can mix and match these
two formats within the same file).

The 8-field format
------------------
    [1] Name of gene/sequence (this can be an arbitrary name)
    [2] Genome build
    [3] Chromosome
    [4] Start of sequence region (genomic coordinate)
    [5] End of sequence region (genomic coordinate)
    [6] Transcription start site for the gene (genomic coordinate). This is
        optional and can be set to NULL
    [7] Transcription end site for the gene (genomic coordinate). This is
        optional and can be set to NULL
    [8] The strand the gene is taken from: DIRECT, +1 or + for the direct
        strand; REVERSE, -1 or - for the reverse strand

Examples
--------
UNG	hg18	12	108017798	108019997	108019798	NULL	DIRECT
BRCA2	hg18	13	31785617	31787816	31787617	NULL	+

The 6-field format
------------------
This format specifies the locations relative to known genes:

    [1] Gene identifier (depending on format)
    [2] Name of gene identifier format (e.g. "Ensembl Gene", "Entrez Gene" or
        "HGNC Symbol")
    [3] Genome build
    [4] Start of sequence region relative to anchor point (see field [6])
    [5] End of sequence region relative to anchor point (see field [6])
    [6] Anchor point.
        This can be either "TSS" (transcription start site) or "TES"
        (transcription end site)

Examples
--------
NTNG1	HGNC Symbol	hg18	-2000	200	TSS
56475	Entrez Gene	hg18	-2000	200	TSS
ENSG00000111249	Ensembl Gene	hg18	-2000	200	TSS
ENSG00000187664	Ensembl Gene	hg18	-2000	200	TSS
ENSG00000196358	Ensembl Gene	hg18	-2000	200	TSS
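
Before passing a sequence file with the -s option, it can be useful to check
that every line follows one of the formats described above. The following is a
minimal Python sketch of such a check. It is not part of PriorsEditor, all
function names are hypothetical, and it assumes that coordinates are plain
integers, that blank lines can be ignored, and that a file is treated as FASTA
when its first non-blank line starts with ">". It does not verify that a genome
build or gene identifier format is one that PriorsEditor actually supports.

```python
from typing import Dict, Iterable, List, Optional

# Strand tokens accepted in field [8] of the 8-field format
DIRECT = {"DIRECT", "+1", "+"}
REVERSE = {"REVERSE", "-1", "-"}


def _strand(token: str) -> str:
    if token in DIRECT:
        return "direct"
    if token in REVERSE:
        return "reverse"
    raise ValueError(f"unknown strand: {token!r}")


def _optional_int(token: str) -> Optional[int]:
    # Fields [6] (TSS) and [7] (TES) of the 8-field format may be NULL
    return None if token == "NULL" else int(token)


def parse_fasta_header(header: str) -> Dict:
    """Split '>name|chr:start-end|strand|taxid:build' into its four fields."""
    name, location, strand, origin = header.strip().lstrip(">").split("|")
    chrom, span = location.split(":")
    start, end = span.split("-")
    if strand not in ("Direct strand", "Reverse strand"):
        raise ValueError(f"unknown strand: {strand!r}")
    taxonomy_id, build = origin.split(":")
    return {"name": name, "chromosome": chrom,
            "start": int(start), "end": int(end),
            "strand": strand.split()[0].lower(),
            "taxonomy_id": int(taxonomy_id), "build": build}


def parse_location_line(line: str) -> Dict:
    """Parse one tab-separated line in the 8-field or 6-field format."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 8:
        name, build, chrom, start, end, tss, tes, strand = fields
        return {"format": 8, "name": name, "build": build,
                "chromosome": chrom, "start": int(start), "end": int(end),
                "tss": _optional_int(tss), "tes": _optional_int(tes),
                "strand": _strand(strand)}
    if len(fields) == 6:
        gene_id, id_format, build, start, end, anchor = fields
        if anchor not in ("TSS", "TES"):
            raise ValueError(f"anchor must be TSS or TES, got {anchor!r}")
        # Start/end are relative to the anchor point and may be negative
        return {"format": 6, "gene_id": gene_id, "id_format": id_format,
                "build": build, "start": int(start), "end": int(end),
                "anchor": anchor}
    raise ValueError(f"expected 8 or 6 tab-separated fields, got {len(fields)}")


def parse_sequence_file(lines: Iterable[str]) -> List[Dict]:
    """Parse FASTA headers or location lines; FASTA sequence data is skipped."""
    records: List[Dict] = []
    fasta: Optional[bool] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines carry no information
        if fasta is None:
            # Assumption: the first non-blank line decides the file type
            fasta = line.startswith(">")
        if fasta:
            if line.startswith(">"):
                records.append(parse_fasta_header(line))
            # other lines in a FASTA file are sequence data and are ignored
        else:
            records.append(parse_location_line(line))
    return records
```

For example, parse_location_line("NTNG1\tHGNC Symbol\thg18\t-2000\t200\tTSS")
returns a 6-field record with start -2000 and anchor "TSS". Splitting on tabs
rather than on arbitrary whitespace matters here, because identifier format
names such as "HGNC Symbol" contain a space.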