asivache
9d56355abe
bug fixed when reference name was passed as a string instead of actual reference bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@416 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 21:46:27 +00:00
kiran
222c4e5865
Commented out some debugging lines
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@415 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:15:41 +00:00
kiran
49d76014d1
Commented out a debugging line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@414 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:15:11 +00:00
kiran
b39e584787
Primary or secondary bases that got a quality score of literally zero led to unfortunate infinities. Added an epsilon (1e-5) to every prob.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@413 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:04:49 +00:00
jmaguire
d28e9f9b98
search over q's for finding argmax[q] p(D|q)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@412 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:15:45 +00:00
aaron
96248cdec4
Added some output to all the classes, including build in runtime analysis
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@411 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:14:53 +00:00
ebanks
647827b18c
Transitioned indel code to use GATK and Walkers
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@410 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:14:15 +00:00
ebanks
b363eedd2c
Deal with screwy reads by changing logic to determine whether we are
...
past the last interval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@409 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:13:16 +00:00
hanna
0629f79049
Moved fasta support files into their own package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@408 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 18:13:23 +00:00
hanna
7a4a5a17c0
Made sequence index compatible with Aaron's junit changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@407 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:53:20 +00:00
aaron
88ebf1a05b
Fied some documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@406 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:41:38 +00:00
hanna
186c799ffc
Class to read an .fai file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@405 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:37:18 +00:00
aaron
704f1bd634
It helps if I check the new base class in with my changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@404 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:18:16 +00:00
aaron
4b3578e1de
Added the base test case, fixed the rest of the test cases to follow suit. Added more verbose output to ant for junit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@403 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:11:38 +00:00
jmaguire
961dbbd4ef
Now output bases and qhat and qstar into the GFF.
...
Quals coming soon (four-base)
QHAT : Most likely alt allele freq (unconstrained by number of chromosomes).
QSTAR : Most likely alt allele freq (constrained by number of chromosomes).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@402 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 15:23:00 +00:00
kiran
dafdff1974
All bases are now indexed as A:0, C:1, G:2, T:3.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@401 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:49:43 +00:00
kiran
40ea22eb17
Added some methods to return the cross-talk partner base of a given base or base index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@400 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:49:12 +00:00
aaron
eb4b4a053b
A bunch of updates to the SAM/BAM data source, along with test cases for the merging of multiple files (it works!).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@399 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:19:20 +00:00
kiran
30121534ed
Outputs the secondary bases and quals (if available) in verbose mode. Prefixed with the tag 'SQ='.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@398 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 13:58:28 +00:00
kiran
998fad76c6
Some utility methods for creating pileups of secondary bases and secondary quals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@397 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 13:57:54 +00:00
depristo
8b2c2e677b
Uses the cleaner new GenomeLoc(read) syntax
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@396 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:55:43 +00:00
depristo
1cee7948ab
Added lots of assertions to check for problems.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@395 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:55:19 +00:00
depristo
794360c410
Added verbose option to show mapping qualities and base qualities as ints!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@394 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:54:48 +00:00
depristo
cc75e8f712
Uses the cleaner new GenomeLoc(read) syntax
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@393 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:53:58 +00:00
depristo
11377ef390
Added lots of assertions to check for problems. The current GenomeLoc needs to be cleaned up and refactored but at least it runs. We need unit tests ASAP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@392 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:53:08 +00:00
depristo
bb666ce392
Added mappingQualPileup function for use in the verbose mode of Pileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@391 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:51:26 +00:00
asivache
bc43c0eefc
there are really cases when we can not merge until we get just two pilesant now we do not crash in those cases but print a warning and just show the resulting n piles even when n>2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@390 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:45:47 +00:00
asivache
8e6093d5a5
remove mom/dad/kid cmd line arguments that were needed for mendelian walker; now we can use generic track binding!!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@389 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:45:34 +00:00
kiran
f838a5e511
Changed some double comparisons of the form a == b to abs(a - b) <= precision. Now we shouldn't be passing or failing some if conditions due to floating-point precision.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@388 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 20:05:46 +00:00
aaron
887adcfc7f
Some minor fixes to the last check-in
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@387 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:24:51 +00:00
aaron
f2d0d73309
removed old shard strategy code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@386 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:13:45 +00:00
aaron
dd604799dc
Added some new code for shard support over reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@385 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:11:43 +00:00
asivache
d44c30154a
added MAX_READ_LENGTH - now we can ignore long reads (454?); a bad idea in general, but the performance hit is to hard to take, at least for preliminary testing runs...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@384 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 16:53:12 +00:00
hanna
e91a429c58
A class to print out as much context about the given locus site as is possible. Useful for testing traversal engines -- run old and new code across a given region and diff the output to make sure they have the same context.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@383 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:29:55 +00:00
jmaguire
6652f13a17
more verbose gff output!
...
EVEN MORE verbosity to come!
Tremble in anticipation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@382 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:21:23 +00:00
hanna
cf929a8275
Get rid of test case's dependence on transient methods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@381 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:16:42 +00:00
jmaguire
6e180ed44e
Unified caller is go.
...
AlleleFrequencyWalker and related classes work equally well for 2 or 200 chromosomes.
Single Sample Calling:
Allele Frequency Metrics (LOD >= 5)
-------------------------------------------------
Total loci : 171575
Total called with confidence : 168615 (98.27%)
Number of variants : 111 (0.07%) (1/1519)
Fraction of variant sites in dbSNP : 87.39%
-------------------------------------------------
Hapmap metrics are coming up all zero. Will fix.
Pooled Calling:
AAF r-squared after EM is 0.99.
AAF r-squared after EM for alleles < 20% (in pools of ~100-200 chromosomes) is 0.95 (0.75 before EM)
Still not using fractional genotype counts in EM. That should improve r-squared for low frequency alleles.
Chores still outstanding:
- make a real pooled caller walker (as opposed to my experiment framework).
- add fractional genotype counts to EM cycle.
- add pool metrics to the metrics class? *shrug* we don't really have truth outside of a contrived experiment...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@380 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:29:51 +00:00
jmaguire
f39092526d
Added function RandomSubset
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@379 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:14:53 +00:00
asivache
b4136b6d6e
a few tweaks to make it more robust: ignore reads with cigars containing anything but I,D,M; don't set up contig ordering manually, rely upon reference sequence and its dictionary; don't die if a record does not have NM tag, but faal back to direct counting instead; now requires reference as a cmdline arg
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@378 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 04:49:19 +00:00
kiran
32e000bbfe
Added MatchSQTagToStrand jar target.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@377 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:50:36 +00:00
kiran
756e6c61d8
Strictness args are presented as lowercase in the help, but only accepted if uppercase. Changed help to list the valid arguments in uppercase.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@376 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:50:19 +00:00
kiran
c51f51f255
Make sure we always write at least 1000 points per base in each cycle's scatterplot. Print the disagreement rate between Bustard and FourBaseRecaller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@375 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:49:41 +00:00
kiran
1fb16d54e0
For SAM files that have no alignments and when no reference is specified, contigInfo.getSequence() is null, causing an error when getSequenceName() is called on the resulting null pointer. Check for null instead and return that instead of barfing here.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@374 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:48:21 +00:00
kiran
5e96ab6161
Helpful functions for converting a base (char) to a base index (A:0, C:1, G:2, T:3, alphabetical and consistent with Illumina conventions to minimize confusion.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@373 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:46:23 +00:00
kiran
35fc002d5d
Debugging information is now written in such a way to make it easier to import into R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@372 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:45:33 +00:00
kiran
6ee4fe5a20
Fixed a Bustard/Firecrest file synchronization bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@371 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:44:07 +00:00
kiran
817278be46
If a SAMRecord is on the negative strand, reverse complement the SQ tag.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@370 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:42:24 +00:00
kiran
1d5a22cacf
Extracts a Fastq file and the SQ tags to a separate file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@369 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:41:44 +00:00
kiran
e410c005c0
A debugging tool to ensure the SQ tag in a four-prob SAM file matches the SAMRecord strand orientation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@368 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:40:42 +00:00
hanna
9c37400c4f
Added basic performance testing so I can make sure concurrent access doesn't slow down overall fasta access.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@367 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 18:05:56 +00:00