andrewk
bdb34fcf38
Updated integration tests for VariantEval. Hooray for IT!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1866 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 20:00:29 +00:00
aaron
41a95cb3f0
fixing unified genotyper test for change: VCF output now emits no calls as ./.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1865 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 19:38:58 +00:00
hanna
85a4fbc256
Bumping version of Picard for firehose compatibility.
...
Integration tests were validated against svn rev 1861, before the wonder
twins committed their changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1864 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 19:38:56 +00:00
aaron
8aacc43203
VCF output now emits no calls as ./.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1863 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 18:51:31 +00:00
andrewk
d1a4cd2f73
Added ValidationData analysis type to VariantEvalWalker; this eval takes a GFF file with validated truth data positions (bound to "validation") and calculates the accuracy of the genotype calls bound to "eval".
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1862 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 15:39:08 +00:00
ebanks
07b134a124
Added some integration tests for multiple samples
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1861 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 15:22:10 +00:00
ebanks
418e007ca6
A cleaner interface: now everyone can use UG's initialize method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1860 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 14:09:16 +00:00
aaron
96972c3a5c
a fix for a bug Eric found: if your first call contains fewer samples than calls at other loci, your VCFHeader was set up incorrectly.
...
Also moved a bunch of Lists over to Sets for consistency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1859 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:57:50 +00:00
aaron
a69ea9b57c
Cleaning up the VCF code, adding lots of tests for a variety of edge cases. Two issues are still outstanding: updating the no call string with the standard 1000g decided on today, and fixing Eric's issue where not all the VCF sample names are present initially.
...
also: their, I hope your happy Eric, from now on I'll try not to flout my awesomest grammer in the future accept when I need to illicit a strong response :-)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1858 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:11:34 +00:00
ebanks
b82c3b6040
Better error output (and fixed spelling mistakes)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1857 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 01:01:45 +00:00
ebanks
993c567bd8
I had to remove some of my more aggressive optimizations, as they were causing us to get slightly different results from MSG. This results in only a small cost to running time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1856 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 00:59:32 +00:00
asivache
7d7ff09f54
throw an exception if read has no associated read group
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1855 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 18:11:32 +00:00
chartl
b9544d3f89
Output formatting change (very slight)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1854 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 16:47:29 +00:00
hanna
839c5d66bc
Read uints directly into longs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1853 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 16:15:11 +00:00
hanna
ce38fa7c81
Breaking the signed int glass ceiling; stage 1: convert critical ints to longs. Code cleanup and documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1852 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 15:28:56 +00:00
kcibul
79993be46c
changed blank gene name to UNKNOWN
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1851 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 13:47:00 +00:00
depristo
0c2016c19a
Improved error messages -- now easier to read, points to the GATK Error Messages wiki, and avoids double printing of stack traces
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1850 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 12:07:44 +00:00
aaron
a9094c835c
clean-up and fixes to the VCF input
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1849 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 04:53:59 +00:00
ebanks
a32470cea1
Deal with the fact that walkers can call UG's init/map functions directly.
...
We need to filter contexts in that case since the calling walkers don't get UG's traversal-level filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1848 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 02:31:45 +00:00
hanna
8dca236958
Base-packed reader cleanup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1847 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 01:26:23 +00:00
hanna
316b30ee56
On the road to human: make sure the suffix array will fit in a Java array.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1846 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 21:45:35 +00:00
ebanks
e740e7a7ce
Because walkers call UG's map function, we need to move the actual writing out to UG's reduce function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1845 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:49:26 +00:00
kcibul
825e6c7a4d
added calculation for bases over 2x,10x,20x,30x plus gene name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1844 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:32:26 +00:00
aaron
727b69fce0
catch null output destinations earlier
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1843 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:07:15 +00:00
chartl
1f66738c8e
Fix a hashing function bug. Ignore reads with non-reference bases in the pileup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1842 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:41:26 +00:00
hanna
72c34f11dd
Bug fixing for BWA output formats.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1841 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:32:22 +00:00
aaron
60183229ab
the oldest java mistake in the book...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1840 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:32:13 +00:00
ebanks
52d2e0ca07
All walkers now use read.getReadGroup()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1839 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:27:40 +00:00
chartl
0a09fa4d5c
Rename to distinguish this transition table calculator from the scala version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1838 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:52:21 +00:00
chartl
1d055011bd
Getting rid of this so I can rename it without the world blowing up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1837 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:45:11 +00:00
aaron
eb90e5c4d7
changes to VCF output, and updated MD5s in the integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1836 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:42:48 +00:00
ebanks
89771fef05
-Use read.getReadGroup()
...
-Add another filter for read groups for Chris
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1835 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:08:32 +00:00
ebanks
311ab8da5a
A helper class to create the masks for the sequenom design maker.
...
This project is now officially done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1834 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:28:51 +00:00
hanna
3553fc9ec0
Preparing for human -- support bwa output files directly rather than relying on a custom fixed sa interval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1833 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:17:46 +00:00
ebanks
d89bc2c796
This class no longer outputs in sequenom format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1832 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:16:36 +00:00
ebanks
0c95d6906f
Merge both versions of the Sequenom assay design maker: use Jared's base code and add in indels. (Jared, this still emits the same output for SNPs as your original version.)
...
Remove all sequenom stuff from the FastaAlternateReferenceMaker so it can just concentrate on making alternate references...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1831 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:11:45 +00:00
ebanks
49af5269e5
Jared: feel free to change or revert, but until we move over to UG version...
...
Only print out positions with at least one non-ref call
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1830 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:08:57 +00:00
chartl
f5a2e6dd50
Fix!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1829 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 16:15:20 +00:00
ebanks
f2886d88e0
We now emit genotype calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1828 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 02:49:56 +00:00
ebanks
1b214c0de5
Fixed logic: throw exception if contigs are NOT equal
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1827 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 02:48:44 +00:00
ebanks
aeca14d052
On our side of 5CC, we spell multi M-U-L-T-I.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1826 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 01:41:25 +00:00
ebanks
c9c8fd1fef
Added the discovery LOD score to the meta data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1825 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 01:24:06 +00:00
ebanks
0c06bf9dbc
Explicitly set output to GELI now that default is VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1824 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 22:12:03 +00:00
hanna
a76fac4687
Cleanup existing speedups. Minor performance improvements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1823 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 21:51:18 +00:00
hanna
837ae1d33a
Optimization: from 22k reads/min to 30k reads/min.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1822 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:59:29 +00:00
ebanks
96b8499a31
Remodeled version of the UnifiedGenotyper.
...
We currently get identical lods and slods as MultiSampleCaller (except slods for ref calls, as I discussed with Jared) and are a bit faster in my few test cases. Single-sample mode still emulates SSG.
The remaining to do items:
1. more testing still needed
2. we currently only output lods/slods, but I need to emit actual calls
3. stubs are in place for Mark's proposed version of the EM calculation and now I need to add the actual code.
More check-ins coming soon...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1821 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:27:01 +00:00
ebanks
b28446acac
Multi-sample calls now have associated meta-data (SLOD, allele freq), which will soon actually be used...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1820 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:08:43 +00:00
hanna
db642fd08b
Optimization: from 10k reads/sec to 22k reads/sec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1819 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 18:07:15 +00:00
aaron
77499e35ac
fixes for GSA-199: Need easier way to write binary outputs to standard output. GLF and VCF now have stream constructors, and can get dumped to standard out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1818 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 15:50:20 +00:00
hanna
f37564e63a
Our BWA is now looking at roughly the same number of candidate alignments as BWA/C. Performance is now at 11k reads / min, still a long way from BWA/C.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1817 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 15:50:04 +00:00