User Documentation

The documentation is currently incomplete, but will be done someday soon. For the time being, a very brief description of how to use the program is given below.

What is RepeatMap?

The RepeatMap program quickly calculates the number of times a kmer (i.e. a sequence of length k) occurs in a large sequence. For example, the D. melanogaster genome contains the sequence "ACAGTCGATCGA" 24 times (14 on the forward strand, 10 on the reverse strand). The program can perform hundreds of thousands of such lookups in seconds. We currently provide this service for several genomes (see the program for current availability) and provide all the tools necessary to add more genomes to the list.

Sample Session

If you'd like to test the program, you can click the "Load Test Settings" button and then look at the "Repeat Graph", the "Get Probes" output, or both. A sample session is
Notes on Use

The program is still in beta, meaning it is usable but may behave unexpectedly. If you have a problem with the program, don't hesitate to contact the maintainers.

Common problems and fixes
Program Documentation

The program is documented in the publications:-------. We provide a brief summary of the results in the paper here.

To create the dictionary, we take a sample across all 20mers and estimate a distribution function across 2^K bins, where K is a number in the range of 5-12 depending on the size of the genome. We then go through all 20mers in the genome, put each one into one of the bins, and sort each bin. Once every bin is sorted, identical 20mers sit next to each other, so we go through the bins and read off how many times each 20mer is repeated. We then write the 20mers and their counts to a file.

When the server is started, it reads in the file of 20mers and counts and creates a very large table with all the counts. The count for a given 20mer is then found by a binary search. To determine counts on the forward strand, we search for the DNA entry (after converting the byte sequence into a double). To determine counts on the reverse strand, we search for the reverse complement of the sequence of interest.

Let's say our genome is:

      TATTGGACTTACGGCATTAC
    3'--------------------5'  Reverse strand
    5'--------------------3'  Forward strand
      ATAACCTGAATGCCGTAATG

We are given the sequence ATGCATT, and we're interested in 2mers. In the listings below, each digit is the count of the 2mer that starts at the base above it, and "-" marks the last base, where no 2mer starts.

The forward counts:

    ATGCATT
    331030-

The reverse counts (look 5'->3' on the reverse strand):

    ATGCATT
    301333-

The reverse counts via the reverse complement on the forward strand (look 5'->3' on the forward strand):

    AATGCAT
    333103-

This leads to an output file in which each row gives a base of the sequence, its forward count, its reverse count, and the complementary base:

    5'          3'
    A   3   -   T
    T   3   3   A
    G   1   0   C
    C   0   1   G
    A   3   3   T
    T   0   3   A
    T   -   3   A
    3'          5'
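The counting scheme described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the program's actual code: it uses a single bin instead of 2^K bins, keeps kmers as plain strings rather than converting them to doubles, and all function names (build_table, lookup, position_counts, reverse_complement) are hypothetical. It uses the example genome and the sequence ATGCATT with k = 2.

```python
from bisect import bisect_left
from itertools import groupby

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_table(genome, k):
    """Sort all kmers of the genome and collapse equal runs into counts.

    Assumption: one bin holds everything; the real program spreads
    kmers over 2^K bins before sorting.
    """
    kmers = sorted(genome[i:i + k] for i in range(len(genome) - k + 1))
    keys, counts = [], []
    for kmer, run in groupby(kmers):
        keys.append(kmer)
        counts.append(sum(1 for _ in run))
    return keys, counts


def lookup(table, kmer):
    """Binary-search the sorted table; a kmer that is absent has count 0."""
    keys, counts = table
    i = bisect_left(keys, kmer)
    return counts[i] if i < len(keys) and keys[i] == kmer else 0


def position_counts(table, query, k):
    """Count the kmer starting at each base of the query.

    The last k-1 bases start no full kmer and are reported as None.
    """
    n = max(len(query) - k + 1, 0)
    return [lookup(table, query[i:i + k]) for i in range(n)] + [None] * (len(query) - n)


genome = "ATAACCTGAATGCCGTAATG"  # forward strand of the example, 5'->3'
query = "ATGCATT"
table = build_table(genome, 2)

# Forward-strand counts: look the query's 2mers up directly.
forward = position_counts(table, query, 2)
# Reverse-strand counts: look up the reverse complement on the forward strand.
reverse = position_counts(table, reverse_complement(query), 2)

print(query, forward)
print(reverse_complement(query), reverse)
```

Only the forward strand is ever indexed here. Reverse-strand counts come from searching for the reverse complement, so the table does not need to store both strands.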
This page last modified Tuesday, 21-Oct-2008 16:10:43 UTC