ebanks
35382468ee
Better error checking/output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4676 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 16:36:34 +00:00
depristo
7a3a464959
Finally, the logic is right
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4674 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 14:02:09 +00:00
depristo
8d66637fc2
Bug fix for VariantsToTable with filtered records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4673 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 13:49:16 +00:00
depristo
d76b87d6e3
Useful debug file output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4672 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 13:36:52 +00:00
ebanks
28142408ff
Refactoring so that all counting in UGv2 is done on the filtered context. In particular, tests for empty pileups and too many spanning deletions now use the correct counts. Also, -all_bases mode now trumps all; this one is for you, chartl.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4671 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 05:01:12 +00:00
ebanks
c7229abbf7
Get rid of 'meaningless and random values' that prevent Sendu from merging PG lines. I have to admit that he did have a good point there.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4670 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 03:35:12 +00:00
delangel
cb1e8ad43a
Temp bug fix for indel genotyper: if there are two or more variant contexts at a site, just choose the first one containing an indel and genotype that. There might be cases where IGv2 emits 2 indel variant contexts in at the same ref location which made us fail there. A better solution will be to form underlying haplotypes supported by reads and compute likelihoods of that.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4667 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 00:21:54 +00:00
depristo
82f9327b5e
Throw the right exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4666 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 22:18:42 +00:00
depristo
44d0cb6cde
New version of cutting routines for VQSR. Old code removed. Working unit tests. Best practice with testng integration test (everyone look at it). Walker test now allows you to not specify no. input files, if it can infer input counts from MD5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4664 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 16:19:56 +00:00
kshakir
673fa841a4
Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader.
...
Removed obsolete usages of PackageUtils with updated PluginManager.
Ported Queue interval utilities written in scala over to Sting's java IntervalUtils.
Added a very basic intergration test to ensure that the fullCallingPipeline.q compiles.
Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test).
While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1".
Upgraded to scala 2.8.1 and updated calls to deprecated functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:14:28 +00:00
depristo
988da428ae
Bug fix for old style tranches file. ApplyVariantCuts moved over, and passes integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4657 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:38:26 +00:00
depristo
c5f8c4dd0d
VariantEval test for tranches file, plus cutting over VE to use the generic Tranches framework
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4656 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 13:52:40 +00:00
ebanks
69de3e51bf
Better precision for the calculated AF value. Now looks at the total number of samples to determine how much precision is necessary. Also, changing default min BQ used for calling in UGv2 to Q17.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4655 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 08:31:40 +00:00
depristo
ec83a4b765
Initial commit, without any tool changes, of a new infrastructure for determining tranches. This new version walker up from the lowest quality snps and determines Ti/Tv. This is marginally more stable than moving in the other direction when there are few novel variants (exomes). Can make a substantial difference in the size of the call set (10-20%). I'll hook it into the main system now. Includes an new class Tranche, isolated read/writing utilities that are now testing in TestVariantRecalibrator, which should be moved to UnitTest as soon as I can figure out how to do this on my mac.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4654 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 23:52:49 +00:00
depristo
ed6396ed43
No longer getting the inet, it seems to potentially hang the JVM
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4653 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 23:49:42 +00:00
ebanks
2f6666a988
Correcting traversal statistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4652 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 22:46:58 +00:00
depristo
dbde721dd0
Bug fix for filtered records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4651 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:54:51 +00:00
aaron
698e5cf345
for GATK style codecs, make sure we fill in their GenomeLocParser from the RMDIndexer
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4650 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:44:15 +00:00
delangel
2f3be24a00
Improvement in exact allele frequency calculation model (still under test, but this is definitely better than what I had before). Instead of approximating log(10^x+10^y) as max(x,y), approximate full Jacobian formula max(x,y)+log(1+10^-abs(x-y)) with static lookup table for the second term.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4647 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 01:22:35 +00:00
asivache
2e0296fef9
NWayOut logic slightly changed: 1) results.list file is gone; 2) now with -nWayOut one can specify either a) suffix to attach to every output file (i.e. cleaned reads from inputK.bam will be sent to inputK.suffix.bam) or b) *.map tab-separated file that must list <input_name> <output_name> mappings, one per line, for every input file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4645 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:32:16 +00:00
asivache
a1adfb91ce
And now @Hidden tags are really in place :-/
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4644 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:28:40 +00:00
asivache
68ce55148e
(pseudo-)genotyping functionality added: force-emits calls (including REF) at specified locations. Currently @Hidden for testing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4643 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:25:40 +00:00
hanna
8e36a07bea
Convert GenomeLocParser into an instance variable. This change is required
...
for anything that needs to be simultaneously aware of multiple references, eg
Queue's interval sharding code, liftover support, distributed GATK etc.
GenomeLocParser instances must now be used to create/parse GenomeLocs.
GenomeLocParser instances are available in walkers by calling either
-getToolkit().getGenomeLocParser()
or
-refContext.getGenomeLocParser()
This is an intermediate change; GenomeLocParser will eventually be merged
with the reference, but we're not clear exactly how to do that yet. This
will become clearer when contig aliasing is implemented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 17:59:50 +00:00
depristo
4759fdd2ac
V1 of read and variant simulator and assessor. SimulateReadsForVariants generates BAM and VCF with given combinations of variant and read properties. AssessSimulatedPerformance produces a table suitable for analysis in R
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4637 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-08 21:01:33 +00:00
aaron
97db593efb
making my last commit message actually true
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4636 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 18:26:23 +00:00
aaron
be499fc986
making the reference optional (the GATK will set it on the first run if it's not included), and setting the seq index if they do supply it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4635 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 18:15:31 +00:00
ebanks
e05af54f3e
Found the cause of 80% of our non-called FNs: an excess of filtered bases were causing us to choose the wrong alternate allele. More details to dev team.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4634 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 03:39:57 +00:00
aaron
2a8c97a4a7
better error catching, as well as allowing for default index naming, <filename>.idx
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4633 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-06 19:12:19 +00:00
aaron
cb2e26a004
by request, an indexer tool to create Tribble style indexes outside of the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4632 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-06 18:59:06 +00:00
depristo
bbb890dd6c
Bug fix for variants in VCF header fetching to avoid null pointer when a VariantContext tribble codec doesn't have a header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4630 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-05 12:43:25 +00:00
ebanks
c9dbd8f80a
Bug fix for Tim: all point events must be treated equally
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4629 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-05 03:42:51 +00:00
rpoplin
913db5d1ab
Unfortunately when annotating sites with the UG the -G None option was wiping out the single annotations added by -A options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4625 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 19:27:23 +00:00
ebanks
816c86776e
Walker description was wrong and it was bothering me
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4624 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 02:17:09 +00:00
ebanks
87f6738d4c
Deprecated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4623 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 02:07:40 +00:00
chartl
42e9987e69
Bug fix to GenotypeConcordance. AC metrics get instantiated based on number of eval samples; if Comp has more samples, we can see AC indeces outside the bounds of the array.
...
Bug fix to LiftoverVariants - no barfing at reference sites.
AlleleFrequencyComparison - local changes added to make sure parsing works properly
Added HammingDistance annotation. Mostly useless. But only mostly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4622 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 19:23:03 +00:00
fromer
3d27defe93
Fixed output stats (percentage denominator)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4621 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 18:47:06 +00:00
ebanks
4e109f58bf
In preparation for Ryan's jumping into SLOD: getting rid of bad hack to ensure P(AF=i) is calculated in the strand-specific cases. With Mark's recent changes this is no longer necessary and just makes the code slower.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4620 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 03:44:59 +00:00
fromer
22d64f77ff
Added hidden --outputMultipleBaseCountsFile option to detect cases where a single read has more than one base at the same position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4619 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 03:22:48 +00:00
fromer
a885ecf046
When merging MNPs, the phased flag and the phase quality (PQ) are determined simultaneously
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4613 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-02 14:44:26 +00:00
hanna
861ee3e37a
Changing testing framework from junit -> testng, for its enhanced configurability.
...
Initial test to see how Bamboo will respond. More detailed email to follow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4609 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 21:31:44 +00:00
asivache
fe3f78e1d3
make it full (absolute) path for the file names recorded in results.list
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4608 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:53:51 +00:00
asivache
2ac5e55130
typo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4607 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:38:02 +00:00
asivache
0e6dd38936
In n-way-out mode, added printing names of all the output files into 'results.list' file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4606 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:37:38 +00:00
fromer
64599d1074
Added debugging message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4605 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 19:51:42 +00:00
fromer
639ecdc931
Noted in comment that using a single sample in MergePhasedSegregatingAlternateAllelesVCFWriter does NOT update any of the INFO fields, though this could be changed in the future...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4604 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 19:02:52 +00:00
fromer
8439f0aa61
Check for VCFConstants.MISSING_VALUE_v4 when retrieving INFO fields and consider such values as non-existent
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4603 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 17:51:35 +00:00
asivache
aadd230636
N-Way-Out is back. Now uses SAMReadID to identify each read's source bam, so should be reliable. Interface is sort of ugly fo now: to generate output file names, .bam is stripped from input file names, then the value of -nWayOut argument is pasted on (and all the output files are written into the current dir).
...
Unrelated change: in the sorted-target mode (when we read sorted target intervals one by on from a file), one can now specify multiple semicolon-separated interval files (all must be sorted). Not hugely useful probably, but makes --targetIntervals always process its values in exactly the same way, so we are consistent (it has been already taking ;-separated args in unsorted mode)
NwayIntervalMergingIterator: reads in multiple sorted GenomeLoc input streams (iterators) and presents them as a single sorted and merged stream
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4602 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 16:06:51 +00:00
depristo
23cb399a88
Reasonable first pass at a correct SB calculation. Simple utilities to support it. VariantsToTable no longer prints filtered sites by default. New non-standard variant eval module to print comp sites not present in eval (FN finder)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4601 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-31 12:41:52 +00:00
delangel
30fae5cf18
Major redo of exact AF computation for UnifiedGenotyperV2. Fact of life is, there's no way we can compute an exact QUAL field and keep performing the AF computation in linear probability space. In good sites with lots of samples, the ratio of Pr(AC=K*|D) to Pr(AC=0|D) can be 10^1500 or some ridiculous large number like that, which no double can represent. So, we abandon probablity space and work now in log likelihood space, which has several major repercussions:
...
a) Sites were numerically well behaved now, but another hard fact of life is that the AF iteration is defined in linear Pr space, not in log likelihood space, and the math doesn't work out in log space. So, we need to convert back and forth from lin to log space.
b) As a consequence of a), the code got a major slowdown, and calling the 629 samples was about 15 times slower than before (sic).
c) To solve b), log10 of integers are now cached at init, and numerical approximations are now made. Most importantly, I'm using the approximation that log(exp(a) + exp(b)) ~= max(a,b) which seems almost inconsequential in practical performance but reduces computation time to what it was before. More detailes analyses are forthcoming. This approximation can be refined further on to avoid expensive log-exp conversions if further profiling and analysis deems it necessary.
Also, two other issues were solved:
a) Strand bias computation was actually wrong in the case where the optimal AC was bigger than max(forward reads,reverse reads). Now the code is exactly as buggy as the grid search model (all bugs are equal, but some are more equal than others)
b) Genotype likelihoods are now computed in a better way and if a likelihood < 0 we don't just cap to 0 but do something a bit smarter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4600 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-31 01:26:04 +00:00
hanna
d492621122
The TraversalEngine's habit of hanging onto old ROD states seems to have a bad
...
interaction with Tribble. In Tribble, keeping these references in memory until
the shard is flushed means keeping one 512K character buffer per object in
memory. Fixed by purging the reference to the object at the end of the
shard traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4599 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 17:09:58 +00:00
ebanks
1c056ea791
Users can now use VariantAnnotator to add annotations from one VCF to another. For example, if you want to annotate your target VCF with the AC field value from the rod bound to CEU1kg, you can specify -E CEU1kg.AC and records will be annotated with CEU1kg.AC=N when a record exists in that rod at the given position.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4598 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 16:38:31 +00:00
ebanks
1b3fc8ddd2
Doing things too quickly is also naughty. Thanks, Andrey. Now, we're even.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4597 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 14:50:04 +00:00
ebanks
58f7b4c595
Naughty use of assertions means that malformed records are not caught.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4596 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 14:41:38 +00:00
delangel
9a60e72364
Trivial change to LeftAlignVariants: make walker return number of aligned variants on map(), and print out the # of aligned variants at the end of the traversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4595 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 02:03:36 +00:00
hanna
2f8057bf24
Cleanup for multithreading memory leak during integration tests...unregister MXBean at end
...
of traversal to avoid holding a reference to the microscheduler, which holds a reference to
the engine, which in turn holds a reference to the walker, which itself holds a reference to
all the data aggregated during the course of the traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4594 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 18:37:42 +00:00
depristo
860de05a7c
Bug fix for PL vs. GL in header. PL now truly default output for UGv2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4592 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 12:39:18 +00:00
depristo
9782dde3dd
Bug fix for PL vs. GL in header. PL now truly default output for UGv2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4591 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 12:38:48 +00:00
ebanks
fe3cfb067c
very minor cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4590 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 02:11:33 +00:00
depristo
cbce3e3c83
General support for both GL (log10) and PL (phred-scaled) genotype likelihoods. All walkers now use the Tribble GenotypeLikelihoods object for parsing VCFs with genotype likelihood fields. Please use GenotypeLikelihoods object from now on for seamless support for GL and PL tags. UGv2 now uses PL by default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4589 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 01:48:47 +00:00
fromer
15183ed778
Reduced header to single sample when useSingleSample arg is given (to prevent lots of pointless no-calls)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4588 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 23:02:10 +00:00
fromer
34538bf2b3
Added ability to focus only on a single sample and/or emit only merged records in MNP merger
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4587 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 20:41:05 +00:00
kshakir
5cdd7a7ba4
There's no such thing as a sam index, so the GATK extension generator doesn't need to add an @Input for them.
...
Updated a call to swapExt to specify the directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4586 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 20:39:03 +00:00
hanna
4c23b1fe9c
Get rid of the static cache of ArgumentTypeDescriptors by making them an integral part of the
...
parsing engine. Hugely lowers our memory footprint in integrationtests, but not yet enough to
run Mark's new parallelized VariantEvalIntegrationTests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4585 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 19:44:55 +00:00
ebanks
e112df20df
Use a sorting VCF writer because records can flip positions during left-alignment
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4583 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 06:33:03 +00:00
ebanks
708e973911
Adding a walker to left-align indels in a VCF file (was able to reuse code from AlignmentUtils to do the hard part). The code correctly updates the alleles if they change. This makes it much easier to compare our indel calls to e.g. CG or dbSNP.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4582 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 06:08:26 +00:00
ebanks
ec442086ec
Minor refactoring of the cleaner allows me to add a trivial walker that left aligns the indels present in reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4581 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 03:39:10 +00:00
ebanks
ffc0ed2b32
Renamed getName() to getSource() in VariantContext to be more accurate
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4579 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 02:21:41 +00:00
ebanks
52fc023d80
Added convenience methods to check/get the ID of the VariantContext
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4578 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 01:56:58 +00:00
fromer
a7af1a164b
Updated MNP merging to merge VC records if any sample has a haplotype of ALT-ALT, since this could possibly change annotations. Note that, besides the "interesting" case of an ALT-ALT MNP in a pair of HET sites, this could even occur if two records are hom-var (irrespective of using phasing). Note also that this procedure may generate more than one ALT allele.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4577 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 01:50:36 +00:00
depristo
e02aac0743
No longer print out 0 reads were filtered out... message when there were no reads scene at all
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4575 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 20:22:16 +00:00
depristo
b085648141
Parallelized VariantEval. Refactored output to support parallel output style. Minor improvements to testing framework to enable easy executeTestParallel to run -nt 1 and -nt 4 by default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4574 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 20:21:38 +00:00
kshakir
8211cee0b2
Queue UI Improvements:
...
- Forcing user to set the temp directory via -Djava.io.tmpdir to avoid filling up /tmp.
- By default deleting job outputs tagged as intermediate.
- Defaulting pipeline to scatter count 1 (no reads deleted).
- Cleaning up temp classes even when scripting fails.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4573 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 19:49:08 +00:00
ebanks
cedceb33cd
My only experience with getting external groups (GAP,dbSNP) to use VCF has been painful at best, so I'm not holding my breath to get indels for CG in VCF. To that extent, here's a oneoffs walker to convert from CG format to VCF for all 'del' & 'ins' types (but not 'sub' types, since they're too complex to code up in VCF and I don't care about them for now). rs ids are included.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4572 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 17:53:14 +00:00
ebanks
071799453c
More complete fix to previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4571 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 20:47:37 +00:00
ebanks
67a776d53c
Yikes! VariantEval was always loading genotypes unnecessarily when no sample list was provided because the order of the checks in the if statement wasn't optimal. This results in a massive performance penalty when running with many-sample VCFs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4570 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 20:30:23 +00:00
ebanks
0d97394c4f
Add capability to liftover to do the right thing when sections of the genome are reverse complemented. This does not work for indels (we don't try to reverse complement) because we need to figure out what the hell to do about the fact that the 'base to the left' that we automatically add on will be wrong because the location of the indel actually changes when reverse complemented. Sheesh.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4569 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 20:03:03 +00:00
fromer
c357ec775a
Trivially phases any hom site (since it is always correct to continue the previous haplotypes by appending the same allele onto both haplotypes)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4568 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 16:58:41 +00:00
rpoplin
da64183854
Fix for the case of the truth VCF file having multiple SNPs at the same locus.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4567 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 15:04:50 +00:00
hanna
3039c0de3c
Retire old ROD syntax.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4564 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 23:52:11 +00:00
depristo
78e71c4167
Fisher exact makes a return. Seems to be working properly. Current tagged as a work in progress. Needs to take the filtered context to be truly correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4561 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 20:35:44 +00:00
fromer
f06f955e06
Added count of number of mergeable records (within specified distance cutoff)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4560 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 20:11:15 +00:00
depristo
84b6d2926b
Useful walker that creates a new interval list with only the interval overlapping input sites list. Really a one-off walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4559 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 19:55:04 +00:00
depristo
78b4a1c240
VariantsToTable now supports the virtual TRANSITION field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4558 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 19:53:46 +00:00
hanna
e6d61197e6
Disable OTF indexing when writing indices for temporary VCFs when running
...
with -nt option. When last I checked in, Ryan was seeing a ~25% speedup
per shard by not indexing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4556 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 17:40:37 +00:00
depristo
e6b008f87c
Fixed >= vs. > test leading to failure to tolerate dynamic indexes that are created at *exactly* the instant the output VCF is closed too
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4555 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 16:11:14 +00:00
ebanks
72c5b75460
Tribble exceptions can be generated outside of the normal codec parsing code because we now lazy load the VCF genotype fields. I'm not sure how else to account for this (to make sure they show up as user errors and not GATK system errors) besides catching them here.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4554 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:22:17 +00:00
delangel
e24f7fec47
Fixed indel genotyper which broke yet again because we can't just call context.getBasePileup() without checking again for its existence in the first place.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4553 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:17:11 +00:00
ebanks
c0b4317311
Er, here's the right fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4552 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:08:25 +00:00
ebanks
181f901126
Fix for Ryan: don't pull reference sequence for the portions of reads that extend beyond the contig boundaries
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4551 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 14:38:26 +00:00
ebanks
9f76aed515
Fix for IDs 5zP7jJeffK2sdPH1BH4JBVSrQztVEDKP and nX0cuBjoqBW4NQFpM6dE13KpkCuYFpZu
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4550 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 14:05:27 +00:00
hanna
d4feb99d9a
For parallel ROD traversals, simplified reference sharding. Will replace
...
with a more sensible strategy for sharding w/o BAMs at some point after
ASHG.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4549 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 05:08:15 +00:00
fromer
60f88866dd
Uses VCFConstants instead of hard-coded constants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4547 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:49:01 +00:00
fromer
883b8ff80e
Removed flush() method from VCFWriter interface; added takeOwnershipOfInner parameter in constructor of wrapper VCFWriters to designate if the Writer should close the inner Writer it receives on construction
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4546 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:48:00 +00:00
fromer
1ea43be976
Removed flush() method from VCFWriter interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4545 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:46:42 +00:00
chartl
3566ad2146
Wrong if statement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4544 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 17:37:45 +00:00
chartl
bf17f92b64
Do not look for samples in dbsnp binding
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4543 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 17:36:38 +00:00
ebanks
225cf49128
Implementing reference confidence estimate in UGv2 as per UGv1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4542 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 16:57:59 +00:00
delangel
cf9c9ae241
Three important updates for Dindel genotyper:
...
a) Fix it up because it broke with a recent checkin to annotate vcf with unfiltered depth.
b) Printout of ref/alt alleles in output vcf was incorrect because the start/stop positions of associated GenomeLoc were incorrectly computed in case of a deletion.
c) Redid Beagle input/output walkers as not assume that ref was a single base, not to assume that variant was a vcf and generalized it to be indel-capable, so now the Beagle walkers can be used for indels as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4541 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 16:00:16 +00:00
ebanks
8f38ebf98e
Throw a user exception when using the clustered SNP filter in the presence of ref calls. It's unfortunate, but until we get a windowed ROD context this is just too much of a headache to support.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4537 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 02:44:10 +00:00
kshakir
88a0d77433
Changed parsing engine to store the order the argument bindings based on their definition in the class, moving "-T" to the front of Queue command lines.
...
Queue GATK generated .intervals is now a List(File) again removing special case handling in the generator.
Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered.
Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers.
Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 21:43:52 +00:00
ebanks
91049269c2
Optimizations across the board, with help from Guillermo, Matt, and JProfiler. Too tired to give details now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4535 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 20:47:41 +00:00
fromer
f76865abbc
ReadBackedPhasing now uses a SortedVCFWriter to simplify, and has the ability to merge phased SNPs into MNPs on the fly [turned off by default]; MergeSegregatingPolymorphismsWalker can also do this as a post-processing step; Integration tests for MergeSegregatingPolymorphismsWalker were also added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4534 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 20:27:10 +00:00
fromer
e8079399ac
Added flush() method to VCFWriters
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4533 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 20:23:22 +00:00
fromer
00726b6c4b
Added mergeIntoMNPs to merge successive VCF records into a single MNP VCF [if possible]
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4532 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 19:40:26 +00:00
fromer
55230ce5f3
Added startsBefore, startsAfter, and minDistance [calculates distance between any pair of bases in the two GenomeLocs]
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4531 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 19:12:34 +00:00
ebanks
4f77581087
More optimizations for HaplotypeScore: pulling final constants out of loops
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4530 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 17:40:57 +00:00
hanna
20fac43521
Add extra logging to the GATK run report at the start of metrics aggregation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4529 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 17:32:51 +00:00
ebanks
a205900eff
Naughty use of Strings in HaplotypeScore literally double the runtime of Unified Genotyper. Moved over to bytes and no longer allow Strings in the Haplotype util class. New round of profiling on tap for tomorrow.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4528 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 03:32:21 +00:00
depristo
f9541b78d3
Timing of traversal now starts at the start of the traversal, so the rate is reasonable right off the bat. For example, we now see: INFO 22:45:02,476 TraversalEngine - [TRAVERSAL STARTING]; INFO 22:45:32,484 TraversalEngine - [PROGRESS] Traversed to 2:50850686, processing 18,646 sites in 30.05 secs (1611.50 secs per 1M sites)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4527 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 02:47:34 +00:00
depristo
f7ce18553e
GenotypeConcordance now prints interesting sites more nicely. RMDTrackBuilder is now uses the root class FeatureSource not BasicFeatureSource.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4525 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 00:29:02 +00:00
ebanks
7a291a8ff3
First pass at a VCF validator. Will test more tonight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4524 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-19 19:55:49 +00:00
chartl
341e93ee12
The reference fixer seems to have munged the OMNI rather than making it better. Looks like some sites need to only have the ref and alt bases swapped, and others need to have the genotypes swapped as well? E.g.
...
some subset need
A C 1/1 --> C A 0/0
while another subset need
A C 1/1 --> C A 1/1
it's unclear how big these subsets are (or even if one is empty). What I do know is, doing the first one totally screws up concordance metrics for the 421-sample chip. So either something else needs to be done, or there's a bug in this walker. Until I know for sure, I've added an initialize exception to disable this thing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4523 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-19 12:50:24 +00:00
ebanks
5251f49a90
Including Marian Thieme's BaseCounts class (with some modifications)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4522 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-19 03:07:30 +00:00
hanna
c5f105d050
Fix boneheaded mistake in the new interval filtering code I added on Sunday.
...
Sorry everyone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4521 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-19 01:20:12 +00:00
ebanks
524cb8257c
Renaming for accuracy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4519 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 18:11:07 +00:00
ebanks
0fe504b748
Use filtered depth for Exact model (just like grid search)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4518 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 18:08:31 +00:00
ebanks
d54d9880d7
Now that G's new genotyping algorithm is live, I've cleaned up the code to completely separate the grid search from the exact model. AlleleFrequencyCalculationModel is now completely abstract.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4517 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 18:04:06 +00:00
ebanks
80e5ac65b4
CAP_BASE_QUALITY needs to be included in the clone() method for it to be usable in UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4516 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 03:11:03 +00:00
hanna
6af9532090
Fix for GATK slowdowns at the ends of intervals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4514 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 23:21:23 +00:00
chartl
5889138f4a
*facepalm*
...
forgot to add the samples to the header. How could the VCFWriter let me get away with something so boneheaded?!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4513 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 05:36:29 +00:00
chartl
2bc5971ca1
Added - a tool to fix reference bases of a VCF. The OMNI had a couple of sites with incorrect reference bases (look to be legacy from other chips), and a few more that had ref and alt flipped. GAP should probably take care of it, but since I need results by monday, I'm doing it.
...
Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC.
**IMPORTANT** I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed -- it's unclear what to do when the header specifies AC of being one class, but recalculating it casts to another class, and I'm not sure what to do.
I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 03:18:01 +00:00
ebanks
7aa030a9a4
Hmm. Apparently variants can get lifted over to different chromosomes. Who knew? Reverting changes from a couple of days ago. The only way to do this correctly (without requiring lots of memory) is to turn off on-the-fly indexing for this walker. Integration tests cover this now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4510 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 02:54:12 +00:00
chartl
8b2d387643
Added in an eval module that calculates the dispersion histograms between eval and comp (e.g. M_{i,j} = # of times eval observed to have AC i, comp AC j -- for af it's i/100 vs j/100 )
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4507 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 19:07:43 +00:00
ebanks
f78ff08e2b
This is less correct than my previous change but it's what UGv1 does and now is not the right time to start mucking with things.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4506 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 18:56:45 +00:00
ebanks
471c18054f
Fix for SB calculation: the best overall AF might not have any mass when just looking at reads from a single strand. We need to compute the best AF for each stratification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4505 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 17:51:18 +00:00
asivache
42c3d74432
bug fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4503 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 16:27:40 +00:00
chartl
c9d473edee
More changes to Variant Eval and Genotype Concordance (passes all integration tests):
...
1: -sample can now include a file, which will be parsed for sample-name entries
2: If you request a sample to run analysis on, but it is not present in any of your RODs, VEW will exception out
3: Change added to parse Integer, String, and List<Integer> type Allele Count annotations (error otherwise)
4 [slightly problematic]: The count objects now maintain row-keys in order, as the keys were taking an inordinate amount of time in onTraversalDone (multiple calls to getRowKeys(), so many multiple sorts of the same underlying unsorted object, very bad)
There is a legacy comparison object which is unused which I will strip out soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4502 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 12:40:36 +00:00
ebanks
9f54170dff
Hooking up the liftover tool to the new on-the-fly sorting VCF writer so that records can now get emitted in order.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4499 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 07:27:01 +00:00
ebanks
d41c252b13
Looking over the calling results with Ryan, it's clear that while the grid search optimization (ignoring samples that are clearly ref) can work for assigning genotypes, it cannot be used for calculating P(AF>0). There's too much area under the likelihood curve that gets lost and the QUALs are negatively affected. However, testing showed that this only slightly affects runtime (~15 minutes per 1Mbase for the 1kg allpops). The optimization does remain for genotyping.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4498 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 19:06:32 +00:00
ebanks
2606e67cf1
Reverting Matt's change from yesterday which I accidentally blew away when trying to cope with the stupid svn update issues we've been plagued with recently.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4495 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 14:40:42 +00:00
ebanks
cfb33d8e12
Filtering optimizations are now live for UGv2. Instead of re-computing filtered bases at every locus, they are computed just once per read and stored in the read itself. Eyeballing the results on the ~600 sample set from 1kg, we cut out ~40% of the runtime! QUALs are now sometimes different from UGv1 because I noticed a bug in v1 where samples with spanning deletions only were assigned ref calls instead of no-calls which ever so slightly affects the QUAL. Not a big deal though.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4494 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 05:04:28 +00:00
chartl
4ac636e288
Minor change: when tabulating concordance by AC, ignore sites with multiple segregating alleles in the population, at least for now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4493 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 01:35:33 +00:00
chartl
7c9ef59d65
This is simultaneously a minor and major change to VariantEval, so take heed:
...
The core walker has been modified so that when variant contexts (eval and comp) are subset to command-line-specified sample(s), the chromosome count annotations (AC/AN/AF) are altered to reflect the AC/AN/AF of only those samples involved in the comparison. No more getting AC500 when you're comparing a 10-sample overlap. Interestingly enough, this didn't break any integration tests.
GenotypeConcordance now has two additional tables: Allele Count Statistics, and Allele Count Summary Statistics. These work exactly identically to the Sample Statistics and Sample Summary Statistics tables, except that the partition being used is no longer the sample, but instead the allele count of the variant sites. These tables stratify by both eval and comp ACs, e.g.
evalAC0
evalAC1
evalAC2
compAC0
compAC1
compAC2
Differences with previous integration tests were verified to only be in the Allele Count tables (by grepping them out of the diff); a new test has been added for the simple case of an AC=1 site in the eval becoming an AC=2 site in the comp.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4491 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 22:26:15 +00:00
hanna
83b8676b69
Hack to fix mysterious disappearing read attributes. Ultimately caused
...
by the fact that the GATKSAMRecord, by design, needs to both inherit from
SAMRecord and wrap a 'member' SAMRecord, and method calls that aren't
implemented as explicit passthroughs can compromise the content of the
SAMRecord in subtle ways.
Will be automatically fixed when Picard moves to a lightweight SAMRecord
interface rather than the current heavyweight implementation. But in
the short-term, there's no obvious fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4489 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 19:06:54 +00:00
depristo
da29fcdb68
No longer writes the index to disk twice. But fixes for closing VCFWriters throughout the codebase
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4488 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 14:26:06 +00:00
aaron
28a1020c89
comment out debugging line that was clogging the performance test output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4487 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 03:26:55 +00:00
aaron
272ac2ae4a
more fixes for tests broken by indexing-on-the-fly; I think this should do it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4486 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 01:54:32 +00:00
hanna
ed39af53cd
Fix for exception when trying to load reference segment for a read that aligns
...
to 0 bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4485 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 23:50:51 +00:00
ebanks
fe9f128631
Better fix for earlier bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4484 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 19:21:33 +00:00
aaron
ff0df1a2da
A fix for an integration test that was broken by on-the-fly indexing. Also, better reporting of Tribble exceptions in GATK integration tests. Trying to get the tests back up and running...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4483 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 18:39:56 +00:00
ebanks
69652e08c6
Bug fix for reads that completely fall within an insertion: the I cigar string element was 1 base too long.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4482 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 14:46:21 +00:00
kiran
f348ca2976
Now processes VCF files with repeated loci without crashing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4481 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 04:36:07 +00:00
ebanks
fd8351cd49
Get rid of useless test/'optimization' that was carried over from UGv1. New codde is (minimally) faster with same results.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4478 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-11 04:04:07 +00:00
ebanks
f28523e7de
Implemented SB for UGv2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4477 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-11 03:56:01 +00:00
hanna
7008a469dc
Update MalformedReadFilter to pass reads that have cigar strings like 40S36I
...
that have 0 aligned bases in the genome. We'll have to fix walkers as faults
appear.
Also added JIRA GSA-406: finer-grained control of MalformedReadFilter: want
to exception out by default in these cases but pass them with a warning with
a corresponding -U flag.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4476 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-11 03:01:04 +00:00
ebanks
530875817f
Experimental code for better filtering of bases in sam records. Not hooked up yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4475 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-11 02:19:51 +00:00
ebanks
a0de269c4b
Better message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4474 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-10 20:11:51 +00:00
rpoplin
0a4cf02a52
Fix for index out of bounds exception in VR.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4473 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-10 17:35:15 +00:00
depristo
116309b3c3
More test cases for UG integration test. We currently fail doing multi-threaded gzip output, FYI
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4472 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 20:22:12 +00:00
depristo
38a67fed63
High performance version of standard vcf writer. New general static Tribble class for common constants, including general .idx constant and functions to get standard index name for a given file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4471 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 19:53:21 +00:00
fromer
bdd3a9752e
Changed min MQ and BQ to 20 (for phasing)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4469 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 19:27:45 +00:00
asivache
05500d1a8d
An iterator wrapper/adapter: takes GenomeLoc iterators 1 and 2 and traverses intersections of intervals from 1 with intervals from 2. Both 1 and 2 must be SORTED and NON_OVERLAPPING, but this iterator does NOT perfrom any checks, so if these conditions are not met, the behavior is unspecified
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4468 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 16:34:00 +00:00
asivache
253d528e49
not ready for commit yet
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4467 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:30:55 +00:00
asivache
4f2f33b42a
fix method invocation to conform to new API; this version of the code will compile but new functionality is still not fully in
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4466 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:30:26 +00:00
asivache
cece19d4d2
not ready for commit yet
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4465 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:14:54 +00:00
asivache
39e373af6e
deleting accidentally committed junk
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4464 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:13:01 +00:00
asivache
b3d81984aa
renaming MergingIterator to RODMergingIterator as it is more appropriate for this specialized implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4462 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 14:10:11 +00:00
asivache
77dddd0afa
renaming MergingIterator to RODMergingIterator as it is more appropriate for this specialized implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4461 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 14:08:28 +00:00
chartl
21ec44339d
Somewhat major update. Changes:
...
- ProduceBeagleInputWalker
+ Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present
+ Takes a bootstrap argument -- can use some given %age of the validation sites
+ Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap
-BeagleOutputToVCFWalker
+ Now filters sites where the genotypes have been reverted to hom ref
+ Now calls in to the new VCUtils to calculate AC/AN
-Queue
+ New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype
+ full calling pipeline v2 uses the above libraries
+ minor changes to some of my own scripts
+ no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 13:30:28 +00:00
ebanks
97b153f2fa
Quick fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4457 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 06:10:52 +00:00
ebanks
acd238f3f2
For Chris: pull out the chromosome counting code into VCUtils so that other tools can make use of it. Transitioned SelectVariants over to use it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4456 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 04:37:54 +00:00
delangel
3838823262
Two ugly hopefully temporary fixes for new genotyping model:
...
a) In Indel genotyper: we can't deal yet with extended events correctly and we are still triggering at each extended event which results in repeated records on a vcf. So, to avoid this, keep track of start position of candidate variantes we've visited and if we've visited a variant before we don't do it again.
b) Avoid infinite terms in QUAL and in genotype likelihoods which can happen if posterior AF happens to be exactly zero. For now, hard-code a minimum value of each term of the posterior AF likelihood to be -300 (ie 1e-300 in lin space). This can be solved with better and smarter log-to-lin conversions and some precision fixes in AF calculation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4455 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 00:53:16 +00:00
rpoplin
0de658534d
Removed the qScale arguments in VariantRecalibrator. It is smarter about how it tries to find a cut so the arbitrary scale factor hopefully is no longer necessary. Now the recalibrated variant quality score more accurately reflects our believed lod of the call.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4451 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-07 18:04:57 +00:00
fromer
ee00dcb79d
1. Phasing now ignores bases without minimum base quality (BQ) and minimum mapping quality (MQ); 2. The probability of a non-called base is now divided by 3, to evenly split up the error probability over the non-called bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4450 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-07 17:40:59 +00:00
fromer
f8f1cc45a3
Now ReadBackedPhasing caps Base Quality by Mapping Quality
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4445 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:48:57 +00:00
scalvo
bda427f078
Change specification of AnnotationInputTable, and fix 2 bugs.
...
Previous output spec contained 3 columns:
haplotypeReference,haplotypeAlternate,haplotypeStrand
where haplotypeReference was always on the + strand, and haplotypeAlternate was on the strand specified by haplotypeStrand.
The new specification contains 3 columns:
haplotypeReference,haplotypeAlternate,transcriptStrand
where haplotypeRef and haplotypeAlt are required to be on the + strand. transcriptStrand now specifies the strand of the transcript, which is needed for interpreting the haplotypes.
Bugfix #1 : fix incorrect assignment of variantCodon and variantAA
(Previously variantCodon was incorrectly set to referenceCodon)
Bugfix #2 : fix incorrect codingCoordStr values for - strands (bug reported by Giulio Genovese), and incorrect usage of "m." for mitochondrial transcripts (bug reported by Steve Hershman)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4444 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:46:09 +00:00
scalvo
b5c127e643
Removed HAPLOTYPE_STRAND_COLUMN; Previously, GenomicAnnotation allowed a user to specify the strand of the haplotypeAlternate, and would reverseComplement the haplotypeAlternate if HAPLOTYPE_STRAND_COLUMN was "-". The new specification does not allow this functionality, and instead requires both the reference and the alternate haplotypes to be on the + strand (as in VCF format).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4443 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:37:41 +00:00
kshakir
ca5db821ce
Added the ability to Queue to run scala functions inside the JVM. NOTE: Extend from InProcessFunction instead of CommandLineFunction to use this functionality.
...
Queue now submits new LSF jobs only after previous functions have completed successfully.
When the Queue process is shutdown (ex: via Control-C) sends a bkill command for any running jobs.
Ported commands like creating directories and scatter/gather interval list to scala functions.
Updates to LSF status tracking by porting the python to internally generated bash scripts.
Temporarily disabled job name submission to LSF. Plus side is that the full command is now available in "bjobs -w". TODO: Put back jobName passing to LSF based on an option?
Changed BaseTest to allow scala to access paths to references.
Changed the extension generator to default the analysis name to the walker "name".
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 18:29:56 +00:00
ebanks
3c5dc675ab
For Guillermo: only decide that something is a clear reference call if it is at least 10 times as likely as the next best genotype
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4441 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 15:16:41 +00:00
depristo
00491fcd2e
Only see not writing GATK Run Report if you are running with debug enabled
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4437 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:09:21 +00:00
rpoplin
69485d6a7a
Added command line argument for the max value of the allele count prior in VariantRecalibrator (--max_ac_prior). Default value increased to 0.99 from 0.95.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4436 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:00:53 +00:00
ebanks
3d564f4a29
reverting an accidental change from the dindel merge
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4434 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 03:08:09 +00:00
ebanks
b5e148140b
Officially fixed the UG priors; updated the default min MQ/BQs to pipeline values of q20 and min calling threshold to Q50
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4431 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 18:35:36 +00:00
fromer
c6668bd49c
Fixed bug in phasing, where mapping probability was incorrectly raised to the power of number of non-null bases [instead, it is just multiplied into phasing probability once]
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4430 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 17:07:31 +00:00
hanna
250c18e679
Error message fixes for the following issues:
...
nvjpM4yOwQAu3fNGxi4oXLuVpKn6aAlf,1GL0OuXK2xKQfvbu34tWYgbojSVSLo0l,
ehEGBJOfgc4V7qj8W0Homf5ICuVK5Sm3,cZsreLm1CbY3aYKZhV7DOSvQNwur41zp,
GlrlyGEyP9kJDIRCQNFQp7BGJBXSzdDJ,hyz1uiHXr39ANmdZu9K1epOSX8EL3mDw,
q0n4EucZESCI4LZhQik306zD4VAuH2cb.
Messages:
camrhG5tHzlY9WUSEVpVZGkU1tyJqKb5,s0OX2g7nYRctJxyFoQCa6clac9IsjHyi,
THIAtjllvYNlnTmiMnJEIHd2Ju4gqQIO,jwVk3JYZJNHloW7HO4LeGxFexknqro0v,
BFNRGOGmGGJNNPZqgeF1ikTNFfskbyLc,...
Were fixed in 4392.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4428 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 03:37:13 +00:00
ebanks
aa00801108
remove reference to -mrl
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4423 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 17:27:01 +00:00
chartl
f978c25b9d
Perhaps both, Eric. Perhaps both.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4422 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 13:56:04 +00:00
chartl
0eb777612a
Swap "." over to VCFConstants.MISSING_DEPTH_v3
...
Why v3, you ask? Why not? Simply because v2 was a String so old and clunky, the sun would fizzle out and grow cold before any VCF could be successfully parsed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4421 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 13:41:41 +00:00
chartl
74087c44ae
Fixed a bug which caused a parsing exception when there was a variant with a dp field of ".", e.g. "GT:DP 0/1:." -- which can happen when using imputation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4420 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 12:37:36 +00:00
ebanks
6448753cf7
Removed the SequenomValidationConvertor and renamed it VariantValidationAssessor since it no longer handles ped/sequenom files (but instead works on vcfs/variantcontexts). Updated all of the wiki docs, including adding instructions on how to convert ped files to vcf, a la Shaun Purcell. We now officially no longer support ped files everyone. Other misc cleanup in the code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4419 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 02:11:38 +00:00
ebanks
d8db48204e
Fix typo and tell people not to post user errors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4415 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-03 18:58:03 +00:00
ebanks
490e5e1b0f
Better error when bad ref bases are provided
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4414 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-03 05:40:37 +00:00
aaron
64b7b3f83b
fix for a recent change to the indexing code where we ignore the results of locking the file (this is bad), and as a result don't write the index; this should fix the build.
...
Off to Yosemite in 4 hours, enjoy the week gsa folks!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4410 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 04:35:11 +00:00
depristo
7551ba8249
Trival refactoring in preparation for on-the-fly indexing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4409 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 22:32:59 +00:00
rpoplin
2f7892601c
Useful debugging argument added to VariantRecalibrator to only use sites whose qual field is above --qual
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4406 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 21:08:55 +00:00
hanna
575c38fc04
Accidental fail to commit missing file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4405 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 20:26:51 +00:00
delangel
d4398f2686
silly bug fix: if I'm to do a short term hack to avoid -infinity likelihoods I might as well do it right.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4403 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:39:45 +00:00
hanna
8d25a5f9f2
A mechanism for supplying attribution text -- mainly useful for external
...
walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4402 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:31:19 +00:00
delangel
e920badcc4
Temporary fix for case where genotype likelihoods are exactly (1,0,0) or (0,1,0) etc. at a site with new indel genotyper: this would make us blow up when converting to log space and try to assign genotypes at a site. A more robust solution is in the works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4401 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 17:43:43 +00:00
rpoplin
b83fdf8a17
Bug fix in AnalyzeAnnotations. Be sure the site is a biallelic, unfiltered SNP.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4400 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 13:09:46 +00:00
delangel
fa9c21c020
More fixes for exact AF calculation model in new unified genotyper:
...
a) Fixed bugs in new dynamic programming-based genotyper
b) Fixed up temp hack that handles extended pileups for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4398 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 02:32:50 +00:00
delangel
eb67aee732
bug fix: forgot to uncomment code to compute genotype likelihoods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4397 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 21:38:22 +00:00
delangel
ece694d0af
Next iteration on new UG framework:
...
- Brought over exact AF estimation from branch (which is now dead). Exact model is default in UnifiedGenotyperV2.
- Implemented completely new genotyping algorithm given best AF estimate using dynamic programming, which in theory should be better than both greedy search and any HWE-based genotyper.
- Integrated and added new Dindel likelihood estimation model.
- Corrected annotators that would call readBasePileup: since we can be annotating extended events, best way is to interrogate context for kind of pileup and either readBasePileup or readExtendedEventPileup.
All changes above except last one are still in playground since they require more testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4396 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 21:33:59 +00:00
hanna
bf7fd08810
Fix newly-introduced bug in the PluginManager/DynamicClassResolutionException
...
where, when the system can't find a plugin of the correct name, the system
prefers to crap all over itself and throw an unintelligible NullPointerException
rather than displaying an intelligent error.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4393 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 19:07:05 +00:00
hanna
14e19f4605
(Slightly) better exception text when SAM/BAM output file can't be created.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4392 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 18:43:22 +00:00
hanna
1fb8c86f6d
Looks like we've got two competing models for an empty interval list: null and
...
the empty list. Score another victory for the integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4391 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 17:11:47 +00:00
hanna
78343be52c
At some time in the recent past, we lost our ability to process the '-L all'
...
argument. Brought it back, and added an integrationtest to make sure it
stays around.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4390 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 15:58:43 +00:00
delangel
e80742e72f
Use -o as argument for output file in ProduceBeagleInputWalker, to be consistent with other walkers (you're welcome, chartl :)).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4386 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 22:46:39 +00:00
hanna
732aa32758
Every Sting app from now on will be forced into the US English locale.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4385 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 21:55:21 +00:00
fromer
20ffe484bc
Added detection and INFO field marking of phasing inconsistencies (and optional filtration using --filterInconsistentSites)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4384 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 19:28:56 +00:00
rpoplin
a6c7de95c8
By using the AC info field instead of parsing the genotypes we cut 78% off the runtime of VariantRecalibrator. There is a new argument to force the parsing of genotypes if necessary. Various other optimizations throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4383 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 18:56:50 +00:00
ebanks
2d1265771f
Fix for G: make sure to generate the genotype conformations in the grid for the target frequency when not using grid search for anything except the conformations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4382 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 16:44:53 +00:00
delangel
4556e3b273
First iteration in filling up exact AF calculation with new refactored UG. Code computes EM iterations of exact AF spectrum and returns to caller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4381 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 16:21:54 +00:00
ebanks
0d71dff928
Small bug fix to the new UG (need to initialize the entire posteriors array) means that we also get identical results as old UG when calling with 60 samples in the pilot1 data. Now that I'm happier with UGv2, I've transitioned it to use the correct AF priors instead of the busted ones still in the old UG.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4379 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 14:24:50 +00:00
hanna
eee134baf2
Chris found a bug in the downsampler where, if the number of reads entering
...
the pileup at the next alignment start is large, we don't add as many of those
incoming reads as we should. No integration tests were affected.
Thanks, Chris!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4378 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 11:18:12 +00:00
ebanks
0ec07ad99a
Initial version of refactored Unified Genotyper. Using SNP genotype likelihoods and GRID_SEARCH AF estimation models, achieves the exact same results as original UG on 1-2 samples with the exception of strand bias (not implemented yet); other than that I have no idea. Needs tons more testing. Do not use. For Guillermo only.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4377 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 08:42:25 +00:00
kshakir
6df7f9318f
For enums generate the full path to the Enum type to avoid collisions such as enum Model and enum Model used in the same class.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4376 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 05:28:59 +00:00
fromer
e322e71c2f
Restored SVN history for phasing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4373 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 00:02:02 +00:00
fromer
720aaca8a0
Trying to restore SVN history for phasing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4372 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:50:28 +00:00
fromer
bf88117ead
Trying to restore SVN history for phasing directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4371 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:48:24 +00:00
fromer
dfb5143a41
Restore folder
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4370 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:46:07 +00:00
fromer
7c909bef82
Moved phasing classes out of playground! The code is still under production, though...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4369 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:21:28 +00:00
fromer
8d8980e8eb
Fixed phasing algorithm to: 1. More correctly weed out irrelevant reads and sites; 2. Crudely flag sites with large phase discrepancies betweens reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4368 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:02:53 +00:00
chartl
5a5c72c80d
Accidentally commited some debug output to PackageUtils, reverting change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4367 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 21:58:42 +00:00
chartl
862c94c8ce
Small change for Matt -- output partition types in lexicographic order.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4365 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 20:08:03 +00:00
ebanks
7ad87d328d
Make sure to uppercase ref bases since they aren't coming from the engine
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4364 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 19:05:46 +00:00
bthomas
96cccafb0d
Adding a few helper methods for accessing sample metadata, and associated unit tests. These are motivated by discussion with Ryan about how he'll use sample metadata in VariantEvalwalker - hopefully will make it easier for him. Methods are:
...
-- getToolkit().subContextFromSampleProperty(): filters a VariantContext to genotypes that come from samples that have a given property value
-- getToolkit().getSamplesWithProperty(): gets all samples with a given property
-- getToolkit().getSamplesFromVariantContext(): sample objects that are referenced by name in a VariantContext
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4361 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 02:16:25 +00:00
ebanks
1034853a84
Adding 'solexa' to list of known/supported platforms
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4357 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-27 02:38:38 +00:00
aaron
70f03a7113
first pass of well-formatted tribble exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4352 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 03:29:33 +00:00
kshakir
edaa278edd
Removed cases where various toolkit functions were accessing GenomeAnalysisEngine.instance.
...
This will allow other programs like Queue to reuse the functionality.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4351 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 02:49:30 +00:00
hanna
497bcbcbb7
Recent changes to the build system make the build system complain loudly about
...
pieces of core that depend on playground. Most of these have been eliminated by
(temporarily) promoting Aaron's report system to core in this checkin. I'll
follow up with other changes in separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4350 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 22:09:12 +00:00
hanna
6ebca5d219
Enhancements to build external projects for walker sharing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4348 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 21:17:16 +00:00
corin
eb1fa4bff3
changes an argument to an output so I can use it to track dependencies in queue
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4347 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 21:07:09 +00:00
depristo
745b8cc6d3
GATK now detects and UserExceptions when human lexicographically sorted data is provided
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4343 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 15:19:48 +00:00
rpoplin
1931b2e1bd
Three fixes for VariantFiltrationWalker: Trying to filter an empty VCF file will produce a well-formed VCF file with zero records instead of a blank file, needed for pipelines. The first record's genotype info fields are now in the same order as all the others. The VCF header lines are pulled from just the input variant rod instead of from all rods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4341 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 13:52:56 +00:00
kshakir
4ed9f437e9
Sliced the GAE in half like a gordian knot to avoid the constant merge conflicts.
...
The GAE half has all the walker specific code. The new "Abstract" GAE has the rest of the logic.
More refactoring to come, with the end goal of having a tool that other java analysis programs (Queue, etc.) can use to read in genomic data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4339 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 23:28:55 +00:00
rpoplin
0c9fabb06f
Fix in AnalyzeAnnotations, somebody changed it look for ID in the vc's info field. This dinosaur desperately needs integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4338 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 19:48:44 +00:00
hanna
0c781968fb
Tried to do a bit of pre-commit refactoring and screwed it up. Fixed.
...
Thanks to Ryan for identifying the problem.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4336 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 18:17:29 +00:00
depristo
d081b9b352
Improvements to error messages about @Requires and @Allows
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4334 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 12:08:27 +00:00
hanna
7841b301c4
Added more diagnostics so that I have some idea of what a 'general' exception
...
is. Required to fix bug ZjhCJAdwhtFq1x54ZlmlN8pFNcbrRpdJ and similar. We
might want to change this particular case to a ReviewedStingException after
we gain a bit more experience with it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4333 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 21:32:01 +00:00
fromer
44ccfc3531
Updated Phasing algorithm + evaluation module to properly implement haplotypes [including homozygous genotypes]; Implemented dynamic window phasing model for LARGE increase in efficiency
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4332 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 21:29:58 +00:00
hanna
8f75d88519
Fix for GATK run report ids:
...
mOVsxGfDiiSMxVs2PPTVjzYTVbizlD6e
f9kUHUADFsZ0LiTGxRL5zPmq9kZcA4cQ
8eGHWJFAlBVmgxwPi3sMd1RmiN2PwHOf
iLhvHWveypKb2F8vKS5irHylc3pYvlOb
HDttXKUMEVoPrvVeWrH7E0htxYyNydMx
plus a bit of cleanup of custom exceptions in the sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4330 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:49:25 +00:00
kshakir
20b38b38f3
Updated from SnakeYAML 1.6 to 1.7.
...
Added a pipeline java bean and YAML utility to serialize java beans.
Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format.
Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference.
More changes to come as this code gets tested out in the fullCallingPipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:47:49 +00:00
hanna
0c99c97685
The engine now automatically adds the command-line arguments to the header of every VCF, unless -NO_HEADER is specified.
...
Changed integration tests, adding the -NO_HEADER argument, for walkers that previously did not include the command-line
arg headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4326 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 15:27:58 +00:00
depristo
522830fb01
Support for --assume-single-sample in UG, better malformated bam exceptions, and ignoring out of order contigs in seqdictutils. All for the CG bam file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4323 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 20:33:34 +00:00
aaron
b968af5db5
The tribble indexes are now updated with correct sequence lengths for each contig they have in their sequence dictionary. Also clean-up in the RMD track builder.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4321 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:21:22 +00:00
rpoplin
547763b230
Better error message for Petr's null pointer exception. Also added an exception integration test because I'm certain this used to work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4319 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 13:44:40 +00:00
depristo
8719dde59d
Now prints out PASS when a variant is unfiltered
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4318 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 13:16:41 +00:00
delangel
205fc0b636
Cleanup: Use Tribble's version of createVariantContextWithPaddedAlleles (no real functional difference) to avoid duplicated code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4315 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 19:53:30 +00:00
delangel
a10cfe213b
Small bug fix in simple indel genotyper: Likelihood of case where best haplotype pair was (REF,REF) was not computed correctly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4314 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 17:04:39 +00:00
ebanks
f5a30d0248
I just spoke to Andrey & Kiran (the original authors of these tools), and they voted to kill these in favor of Picard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4313 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 13:27:35 +00:00
delangel
f64b6fddc1
Major changes/improvements to indel genotyper:
...
a) Redid way to compute path metrics in indel error model. Paper formulation where we have an anchor point in the alignemt between read and haplotype won't work in practice except in nice data sets that are perfectly indel-realigned and that are well mapped by aligner. New formulation doesn't assume this, and it's actually simpler and uses less code. It now resembles more a classic SW dynamic programming formulation but it still preserves the HMM probabilistic formulation.
b) Added a programmable call threshold, set by command line.
c) Use now sample name from BAM file, remove -sampleName argument.
d) Simplify loop to compute read-haplotype likelihoods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4311 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-19 23:47:31 +00:00
rpoplin
c6351a11d6
Clearer logger output when not using by-hapmap
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4308 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-18 16:10:42 +00:00
rpoplin
7e58d8ed61
CombineVariants now outputs the command line in the VCF header. Added a new hidden argument to VR walkers called --NoByHapMapValidationStatus to turn off the by-hapmap dbsnp rod behavior. Very useful for experimenting with which sets to use as training data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4307 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-18 16:06:50 +00:00
kshakir
a3f31e5df0
When QScript writers use the RodBind, then the File version of the same argument should be optional, i.e. should not always try to output the file, which when unpopulated will be null.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4305 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 18:22:07 +00:00
bthomas
c6c6d32b46
Quickly adding a new convenience method for retreiving a group of samples. The method is getSamples(Collection<String>) and returns a set of sample objects. There's also a test there.
...
Ryan is using this to modify VCF code today...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4303 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 15:55:17 +00:00
kshakir
a898908918
The output BAM file optional arguments of compression and whether to write an index are not outputs themselves.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4302 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 15:35:54 +00:00
bthomas
bc12055fcf
Quick patch to fix the sample code. It wasn't actually initializing the sample data source, so I added a call to initializeSampleDataSource() in GenomeAnalysisEngine. I think there was just an error resolving the versions of GenomeAnalysisEngine
...
Also added a new error message that I thought would be helpful...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4301 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 14:05:26 +00:00
ebanks
a10b2a00a5
Moving the util VariantContext 'modifying' routines into VC itself (as opposed to VCUtils) so that we can pass the genotype data directly into it and are no longer forced to decode the genotypes for no reason. This means that any walker that takes in a VCF and modifies the records without touching the genotypes never have to decode them. I've hooked this into the other two Variant Recalibrator walkers for Ryan. One side effect, though, is that we no longer can sort the sample names in the VCF (i.e. if the input VCF doesn't have samples in alphabetical order, then we used to sort them when writing a new VCF but no longer do that), because if we don't decode then we can't re-order the genotypes. I don't think this is a big concern given that the Unified Genotyper does emit sorted samples and that's the main source for most of the VCFs we use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4300 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 07:09:58 +00:00
bthomas
f66ef4626e
Fixing two minor issues: 1) adding a new error message if the user adds a fasta file in a directory that doesn't exist; 2) renaming my sample unit tests so they actually run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4299 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 20:45:51 +00:00
rpoplin
2eb5d9b2d2
CountCovariates makes sure that it sees a rod type that it expects for use as a variant mask (accepted types are dbsnp, vcf, and bed)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4296 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 18:53:42 +00:00
aaron
782e0018e4
removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come.
...
*** Three integration tests had to change: ***
RecalibarationWalkersIntegrationTest:
One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates)
SequenomValidationConverterIntegrationTest:
relies on Plink ROD which we've removed.
PileupWalkerIntegrationTest:
we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 22:54:49 +00:00
delangel
c604ed9440
Several improvements to new indel genotyper (more to come soon):
...
a) Turns out previous change of centering haplotype around indel was a bad idea. Context to the left of indel is important but not as important as right one, because by definition all alleles start at the same location, so haplotype is the same to the left of indel regardless of allele. So, go back to having a constant size window to the left of event.
b) Expand reference context so we can test larger haplotypes.
c) Optimize computation of read likelihoods by doing them in linear array instead of in a matrix - no difference in biallelic sites but could be significantly faster in multiallelic sites.
d) Bug fix: read alignment wasn't being computed correctly if, a) we were at an insertion, b) read started right at the insertion, c) read CIGAR didn't include insertion - more of these corner conditions are lurking, so a revamped computation of how reads align to candidate haplotypes is in the works.
e) Add debug option not to use prior haplotype likelihoods.
f) Don't hard-code NA12878 for genotyping, now sample name is a required input argument.
g) Bug fix: if there are no reads covering a candidate indel event, just output NO_CALL (didn't notice this in HiSeq, but in P1 data it happens all the time). I need to add a confidence threshold for calling later on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4291 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 21:53:08 +00:00
depristo
fb6d7d19f9
Better window size error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4290 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 20:40:56 +00:00
rpoplin
b5d2e299d2
Make it more clear what is going on with the by-hapmap validation status in the dbSNP rod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4289 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 17:29:31 +00:00
rpoplin
0a06fbdb94
Adding header lines to output of VR walkers to settle validator warnings. Command lines are added to the VCF header. GATK version numbers will be added to the header lines by Matt.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4288 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:45:03 +00:00
asivache
d7b5baf8e5
Now uses tagging of -I arguments. Multiple -I options (merging) is now allowed. In somatic mode 'tumor' and 'normal' tags are required for each input bam, the order does not matter anymore (since we use tags!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4286 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 13:58:51 +00:00
bthomas
e5f81d25d4
Adding the --sample-metadata (-SM) command line argument and associated functionality. This is something Matt and I have been working on for a while. Basically, it allows you to integrate sample metadata into an analysis, by including a sample file. More detailed documentation is on the wiki: http://www.broadinstitute.org/gsa/wiki/index.php/Adding_Sample_data_to_an_analysis
...
This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource.
This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit.
In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies.
Note that this also adds a new dependency on SnakeYaml, a YAML parser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 11:50:22 +00:00
ebanks
dd23f204ab
Making the UG args that allow users to proceed with insufficient bam headers (no SM or PL tags) @Hidden; removed them from wiki.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4283 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:54:50 +00:00
ebanks
514b28210e
Have VF write to sdout when no -o is supplied
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4282 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:48:33 +00:00
ebanks
4e83ba411f
We now do lazy loading for the genotype data in VCF. Practically, almost all walkers end of loading the genotype data because we need to be smarter about transfering the unparsed genotype string when modifying VariantContexts; however, this does solve the problem for VR's piece to generate clusters (shaved off 75% of runtime for Ryan's large case). That further optimization will happen later.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4279 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 00:18:17 +00:00
depristo
74d4f124b1
Bug fixes to allow us to generate GATKRunReports for very early errors that leave the engine in a corrupt state. Vastly better error handling of common command line problems. Analysis output now notes whether an exception is a a UserException or a StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4278 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 22:45:15 +00:00
delangel
6d07181dc9
When processing Beagle output and creating new vcf, output the filtered records in the original input vcf as is, so that we don't lose the information on them when we run Beagle.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4276 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 19:18:45 +00:00
hanna
7fa6b2135b
Added a back door so that integration tests can reset the sequence dictionary
...
in the reference. Reset routine is not accessible to any class outside
GenomeLocParser's package.
We'll have to do something more intelligent with this when the GATK goes
distributed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4275 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 18:58:08 +00:00
depristo
dbb641280e
CycleCovariate now tolerates SOLEXA as machine type. Also, exception handling is now written to stderr.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4274 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 12:35:57 +00:00
ebanks
71d2d69b41
Better error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4273 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 05:04:26 +00:00
fromer
248cc308b2
ReadBackedPhasing silently ignores sites with ploidy != 2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4272 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 21:14:17 +00:00
fromer
528f6344af
Moved ReadBackedPhasingWalker to phasing sub-directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4271 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 19:36:41 +00:00
depristo
fa3be2209f
Improvements to the error display code to print out the SVN number in all messages. Fixes to CallableLoci and tests to check for that case
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4270 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 18:36:45 +00:00
depristo
7880863eb7
Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:07:38 +00:00
depristo
7ad8fbdd5a
Moved GATKException to exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:47:19 +00:00
depristo
bccebf8899
Newly placed StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4264 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:38:46 +00:00
depristo
3964e02fb6
Newly placed StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4263 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:38:32 +00:00
depristo
595907e98e
Moving StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:34:15 +00:00
depristo
40e6179911
Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:02:43 +00:00
delangel
da2e879bbc
Miscellaneous improvements to indel genotyper:
...
- Add a simple calculation model for Pr(R|H) that doesn't rely on Dindel's HMM model. MUCH faster, at a cost of slightly worse performance since we're more sensitive to bad reads coming from sequencing artifacts (add -simple to command line to activate).
- Add debug option to calculation model so that we can optionally output useful info on current read being evaluated. (add -debugout to commandline).
- Small performance improvement: instead of evaluating haplotype to the right of indel (just with a 5 base addition to the left), it seems better to center the indel and to add context evenly around event.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4257 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 13:50:28 +00:00
ebanks
61d511f601
Small memory performance improvement: remove the mapping from the hash instead of setting the value to null (i.e. remove the key too)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4256 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:19:09 +00:00
ebanks
a0231f073f
Damnit. Enabling the Picard code to recalculate all of the relevant SAMRecord attribute tags means that I need to have reference bases over all read bases even after realignment (and there are some big indels in dbsnp). Fortunately, I have my trusty IndexedFastaSequenceFile reader handy! Re-enabling the previously broken performance test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4255 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:06:37 +00:00
hanna
87aca64716
Jumped the gun a bit on bam on-the-fly indexing -- Tim says it's not ready yet.
...
Turned it off by default and added a property to turn it back on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4254 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 21:16:03 +00:00
rpoplin
7b113a4886
Truncate the floating point numbers coming out of the variant recalibration walkers. Integration tests now work with both 1.6.0_16-b01 and 1.6.0_21-b06
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4253 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 18:37:49 +00:00
depristo
8f1a32acae
All exceptions thrown by the GATK have been reviewed and UserErrors replaced where appropriate. Shazam. Another check-in will remove the GATKException and restore the StingException.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4252 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 15:25:30 +00:00
rpoplin
61e848c4f0
It's clear from Sendu's calling and my own calling that -qScale 100.0 is a much better default value for low pass data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4248 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 01:47:21 +00:00
depristo
1de713f354
Massive review of maybe 50% of the exceptions in the GATK. GATKException is a tmp. tracker so that I can tell which StingExceptions I've reviewed. Please don't use it. If you are working on new code and are considering throwing exceptions, it's either UserError or StingException, please
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4246 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 23:21:17 +00:00
aaron
f5c295b6b2
add a little bit of documentation to the RMD track builder and wrap any exceptions thrown in tribble with the file source and line that caused the error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4243 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 17:56:36 +00:00
rpoplin
aeb897db7f
VR walkers look at by-hapmap validation status by default. Eric will be updating the syntax to allow for more flexibility here.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4242 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:40:56 +00:00
depristo
6a30617a60
Initial implementation of UserError exceptions and error message overhaul. UserErrors and their subclasses UserError.MalFormedBam for example should be used when the GATK detects errors on part of the user. The output for errors is now much clearer and hopefully will reduce GS posts. Please start using UserError and its subclasses in your code. I've replace some, but not all, of the StingExceptions in the GATK with UserError where appropriate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4239 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 11:32:20 +00:00
depristo
ca9c7389ee
Not useful
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4238 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:33:03 +00:00
depristo
8708753a6a
checkin for removal
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4237 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:32:46 +00:00
hanna
5119bdb55e
- Update DoC to support output to /dev/null.
...
- Add a release sanity check for DoC.
- Update release sanity checks with new command-line argument system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4236 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 23:43:18 +00:00
fromer
1b1ec7e52d
Changed default phasing window size to 10
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4235 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 21:28:36 +00:00
fromer
ce031b2f05
PhasingEvaluator prints out interesting sites (only 1 phased, or phases disagree)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4233 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:21:21 +00:00
ebanks
40283f6456
Success! TranscriptToGenomicInfo now works without the delicate hacks that Ben had put in.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4232 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:06:00 +00:00
ebanks
cd091d7309
This walker can NOT be tree-reducible (in its current state). Given that it's meant to be run just once for any given transcript set, this is not at all a problem.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4231 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 16:47:51 +00:00
ebanks
ae9cba1c73
After an epic battle with this code until 3am last night, I have discovered that it is tragically and fatally busted. Ben clearly didn't understand how the ROD system works when writing it and so it is unusable in its current state. I've ripped out all code and it now gracefully exits telling the user that we are actively working on a replacement for this tool. Sigh.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4230 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 16:39:41 +00:00
ebanks
29f7b1e6d6
Trivial update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4229 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 14:02:38 +00:00
ebanks
cd2bfb09ef
Change for Tim: invalidate the MD tag (temporarily) if it exists in a read that gets realigned
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4228 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 13:59:09 +00:00
ebanks
65edbced36
Addition for Tim: recalculate the NM and UQ tags after realignment. Also, don't fix the insert size calculation, since that's done by fix mate information.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4227 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 04:02:14 +00:00
chartl
71046e650e
Added a more robust check for Jishu -- am pretty sure the .bam header is busticated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4223 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 01:11:22 +00:00
fromer
ae3f7026a4
Corrected phasing quality evaluation to correctly account for hom sites that break phase
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4222 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 22:43:54 +00:00
hanna
501f6a0e14
Temporary hack to disable index creation when target BAM is /dev/null. Tim
...
promises me that Picard will put in a real solution next week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4220 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 16:57:51 +00:00
fromer
754c2c761e
Added minimum phasing quality for phasing evaluation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4219 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 14:29:11 +00:00
ebanks
5d0d9c7dce
My parallel version of TranscriptToInfo now emits 'chr start end' instead of 'chr:start-end' for records so that 1) they can be easily sorted in coordinate order (allowing me to emit records out of order if I choose) and 2) the file can be tabix indexed (when we stop finding 'critical' bugs in that code).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4218 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 05:20:40 +00:00
ebanks
4d4ef5b42c
In the end, it's not worth rewriting TranscriptToInfo from scratch. I'm keeping the old one around for a bit so I can play with this new version which 1. doesn't store the records in memory so can be run in under 1Gb of memory, 2. actually emits all of the records (the original fails in some cases), and 3. is refactored to cut out ~20% of the code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4215 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 02:37:34 +00:00
kiran
0dd5a0990d
Now annotates sites marked as filtered out (this is important if sites are in a lower-quality tranche).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4214 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-04 00:36:55 +00:00
delangel
ef7454a241
Minor improvements to indel genotyper:
...
a) Ability to specify haplotype size from command line
b) Expand reference context window so we can form haplotypes for longer indel events.
c) small bug fix in temp output writer (to be removed once I can emit vcfs)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4212 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 22:52:08 +00:00
depristo
7eeabe534a
QSample walker for 1KG -- measures aggregate quality of sequencing. Includes misc. improvements throughtout the code, including using the new Tribble GenotypeLikelihoods class for working with VCF GLs from the UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4211 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 18:21:43 +00:00
rpoplin
e3962c0d13
VR integration tests are longer but much more useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4210 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 15:50:19 +00:00
hanna
da11efa1a2
Automatically write BAM file indices for coordinate-sorted BAMs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4209 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 14:10:44 +00:00
fromer
529eecd4dc
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4208 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:38:46 +00:00
fromer
c0ce9ca8cc
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4207 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:32:30 +00:00
rpoplin
60003aeaca
Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4206 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:31:49 +00:00
fromer
c119f64514
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4205 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:24:18 +00:00
depristo
3c9597d45a
OnTraversalDone writes output to out now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4203 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:55:03 +00:00
depristo
73d41bfa24
CountLoci nows writes out to a file for Queue status tracking. VariantAnnotatorEngine has a special group None that doesn't add any annotations; useful for those who are testing UG performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4202 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:52:33 +00:00
hanna
70bb480939
The battle is over. Picard is revved.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 05:28:01 +00:00
ebanks
fdaac4aa78
As the VCF guru, I'll take this one for Andrey. Someone has actually found a deletion at the beginning of the chromosome. Instead of failing with an ArrayIndexOutOfBoundsException, just don't try to print out the record. Our VCF writer doesn't really support this case (yet).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4199 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:27:43 +00:00
ebanks
c45ffcdaed
Changing documentation (temporarily) to warn people that -U is not supported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4198 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:18:07 +00:00
delangel
8a7f5aba4b
First more or less sort of functional framework for statistical Indel error caller. Current implementation computes Pr(read|haplotype) based on Dindel's error model. A simple walker that takes an existing vcf, generates haplotypes around calls and computes genotype likelihoods is used to test this as first example. No attempt yet to use prior information on indel AF, nor to use multi-sample caller abilities.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4197 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 00:25:34 +00:00
fromer
a1cf3398a5
Added basic version of phasing evaluation: GenotypePhasingEvaluator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4196 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 22:09:50 +00:00
kshakir
fd5970fdd4
At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them.
...
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
rpoplin
0bb05fb472
Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:12:09 +00:00
chartl
3a4844ebde
Additional partition types into DepthOfCoverage:
...
- Sequencing Center
- Platform
- Sample by Center
- Sample by Platform
- Sample by Platform by Center <---- needed for analysis I'm doing
The fact that the latter three needed their own partition types, rather than being dictatable from the command line, combined with the new hierarchical traversal types, and new output formatting engine, suggest that DepthOfCoverageV3 is about ready to be retired in favor of a newer, sleeker version.
For now, this will do.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4193 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 19:30:03 +00:00
chartl
590bb50d16
Test for missing read group
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4192 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 14:22:13 +00:00
kiran
acd6bd2430
Experimental tool to annotates indels that are provided in a VCF file based on RefGene. Specifies gene, transcript, strand, type (Non-frameshift, frameshift, 5'-UTR, 3'-UTR, SpliceSiteDisruption, Intron, or Unknown).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4191 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 23:30:28 +00:00
hanna
dc5f858d29
Replaced placeholder support for splitting by read group with read support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:24:50 +00:00
rpoplin
b28f63a948
Base recalibrator now uses -o and deprecates -outputBam
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:13:50 +00:00
depristo
995cfe34fe
You can have an error so early that some engine fields are uninitialized. Commit protects RunReport from these errors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4185 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 19:00:25 +00:00
rpoplin
a975db2c2e
Bug fix for the case of reads with no read bases!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4184 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:58:54 +00:00
rpoplin
469bbaa240
Added more integration tests for the variant quality score recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:31:24 +00:00
rpoplin
5b94c926c8
More precise language.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4178 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:44:22 +00:00
rpoplin
96040726ac
Better exception text for the common error of providing only dbsnp but giving dbsnp sites zero clustering weight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4177 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:36:43 +00:00
depristo
32c6b48106
Proper memory metrics in the file. Please use -et if at all possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4175 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:30:09 +00:00
chartl
63c7cbd89b
Forgot to commit this long ago, change so the tables are correctly propagated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4174 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 19:06:52 +00:00
aaron
db4ff7317f
allowing empty RMD files (we need to not validate their sequence dictionaries against the reference in this case)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4173 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 17:45:33 +00:00
ebanks
3d6c4fc55f
Removing the obsolete --hapmap and --hapmap_chip options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:57:05 +00:00
depristo
b33873206a
GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:18:57 +00:00
ebanks
3c956110f3
Fixing up the VCFWriter storage code: instead of assuming all samples are coming from the input bam file (they're not), just use the original VCF header for writing the temporary thread files. Now parallelization in e.g. the Genomic Annotator works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4168 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 02:16:07 +00:00
aaron
69d92fab4f
adding the ability to get iterators from Tribble without having an index, and updating the Tabix code to the latest Samtools SVN version (this still doesn't fix the outstanding tabix bugs, waiting for Heng on that).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4167 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:49:23 +00:00
fromer
50f7f18cbd
Changed ReadBackedPhasing default PQ threshold to 10
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4166 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:26:15 +00:00
chartl
e64d1be475
Check if VC is null before trying to subset it (can happen with indels)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4165 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 20:43:37 +00:00
depristo
1ddb5d17c9
hostname now fully qualified and working
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4163 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 17:04:37 +00:00
depristo
4c28fc3a39
Clear documentation for GATKRunReport
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4161 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 15:59:25 +00:00
kiran
16b75e3b9a
A new version of the ErrorRateByReadPosition walker, using the GATKReport functionality to store and emit its output. This version of the walker is roughly half the number of lines as the previous version, owing simply to the removal of all of the output formatting that's now handled by GATKReport.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4160 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:41:13 +00:00
kiran
fd19c63aaf
A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module).
...
This object designed to be both the structure that holds data during the execution of the walker, as well as the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this:
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
2 9.087604507451836E-4 32.87590975254731
3 5.452562704471102E-4 34.498999090081895
4 9.087604507451836E-4 35.14831665150137
5 5.452562704471102E-4 36.07223435225619
6 5.452562704471102E-4 36.1217248908297
7 5.452562704471102E-4 36.1910480349345
8 5.452562704471102E-4 36.00345705967977
...
A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
The display property of individual columns can be turned off. This is useful when a column is used to store intermediate results, necesary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.
Finally, the GATKReportTable allows for some simple, mathematical, element-wise and column-wise operations. For instance, two whole columns can be divided, the results of the operation being stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
ebanks
df76474b34
Proper filtering when indels are being lifted over
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4158 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 04:48:31 +00:00
depristo
3fd2392090
Improved interface to getting command line options. Now fully traverses all objects to get all internal argument collections. Preliminary (but disabled version) of phoning home (see -et argument for more information). Captures correct and erroring out runs and writes out gzipped, xml report with lots of useful information. Needs a bit more information but is approximately working. Reports going to /humgen/gsa-hpprojects/GATK/reports/ in submitted directory that will be collated by some external tool. Only operating if -et STANDARD or -et STDOUT are provided currently and REPORT_DIR contains a file called ENABLE. WalkerTest now adds -et NO_ET to tests to avoid populating the reports with tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4155 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:53:32 +00:00
rpoplin
9c3f403307
Add the calculated lod value to the info field of each recalibrated VCF record.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4153 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 21:33:58 +00:00
delangel
fe19539188
Small bug fix: if a read falls at the edge of an indel event (but is not part of it), don't count it towards consistency computation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4152 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 20:37:27 +00:00
rpoplin
54355b1864
In variant quality score recalibrator Preserve the definition of known and novel to be presence in dbSNP or not even when training with 1KG project calls.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4151 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:07:59 +00:00
ebanks
7a5f297083
actually modify the vcf when a sample has been down-sampled
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4150 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:03:21 +00:00
ebanks
9860db64a3
Fix up liftover to enable lifting over indels
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4148 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:55:27 +00:00
hanna
fb177c4fee
If only dcov is specified, assume that selected downsample type is BY_SAMPLE.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4147 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:35:41 +00:00
ebanks
9584cbc05e
UG now downsamples to 250x by default
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4146 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:53:15 +00:00
ebanks
431392330e
Re-enable the max records in ram argument, which I accidentally removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4145 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:42:49 +00:00
hanna
de5ccfb0b1
Moved hasPileupBeenDownsampled() based on Eric's request. Also eliminated
...
@Deprecated constructors from AlignmentContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4142 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:12:05 +00:00
ebanks
427a2f85e9
The Indel Realigner now lets the engine do all of the setup for args affecting the SAM writer. Thanks, Matt!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4141 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:19:47 +00:00
asivache
a3d9d23b0f
Now prints het genotype with GQ=0 for each indel; in two-sample (normal-tumor) mode, prints both genotypes (N and T) as hets for germline events or hom ref for N and het for T for somatic events (all genotypes still have GQ=0)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4140 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:06:42 +00:00
ebanks
dda84a0e54
Re-enabling indels for the Genomic Annotator as per Steve's patch. Steve assures me that he will test this out really well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4139 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:01:25 +00:00
hanna
6f4af47aac
setMaxRecordsInRam now a member of StingSAMFileWriter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4138 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 14:50:41 +00:00
ebanks
bfcac33e80
Cleaning up playground utils and tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:25:47 +00:00
ebanks
4979dcc9a7
Finishing up the playground cleanup (for now)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4135 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:19:37 +00:00
ebanks
0452b1ab68
archiving, removing, or promoting to core from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4134 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:07:42 +00:00
hanna
d773b3264b
Eliminated -mrl option.
...
Eliminated -fmq0 option.
Eliminated read group hallucination.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 21:38:03 +00:00
depristo
f384d4a5d6
A java reimplementation of vcf2table in python; supports getting more useful information about genotypes (HET, e.g.) than was possible in python.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4130 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 17:50:33 +00:00
asivache
1e193e4c20
prinring '\n' at the end of line leads to some aesthetical advantages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4129 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:29:42 +00:00
asivache
9b3ffa5f64
Now outputs VCF (as standard output associated with -o)! Can also outptut, in parallel, a lightweight bed and fully annotated .txt (old verbose format) with --bed and --verbose, respectively
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4128 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:26:03 +00:00
ebanks
dfae48cee0
Moving supported tools to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 13:56:19 +00:00
ebanks
45d895dcf4
Remove the check in the Unified Genotyper for hitting the max reads at locus value. Instead, simply add a flag to the INFO field if any of the samples has been downsampled. 95% hooked up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4126 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:50:47 +00:00
ebanks
e06b2c90ef
Cap the default size of join tables; this can be modified with the --maxJoinTableSize argument. Also, misc cleanup of the comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4125 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:21:26 +00:00
ebanks
79cd716671
More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 04:43:52 +00:00
kshakir
0105e8d063
Updated Queue GATK generation to reflect -B and -I changes.
...
To add support for "-I:tumor tumor.bam", the GATK argument
import_file (-I) is now generated as a List of NamedFile objects.
Could not get sugar working 100%. To activate sugar import the
gatk package. This effectively adds a new method to java.io.File
called toNamedFile. When adding a file to the list call
countReads.import_file :+= myJavaFile.toNamedFile
See scala/qscript/examples for actual examples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:17:36 +00:00
hanna
bdb3a7ebe6
The tagger was automatically combining identical tags, but this is a problem
...
for the ROD system. Eliminate tag combine operation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4121 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:01:32 +00:00
fromer
39da567d48
Changed ReadBackedPhasing to be a RodWalker (corrected to By(READS))
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4120 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:53:04 +00:00
ebanks
4678613893
Significant fixes for the Genomic Annotator.
...
1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when
I saw this code). This will improve memory consumption when running with -nt.
2. Don't annotate indels or > bi-allelic sites.
3. Fix bug where not all records were making it into the output VCF.
4. General code clean up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:16:50 +00:00
fromer
41e53d37e1
Changed ReadBackedPhasing to be a RodWalker (more efficient, since it is ROD-focused)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4117 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 19:43:57 +00:00
rpoplin
ac58eb3cbb
Slightly better error message for the common error of only providing a dbsnp track but giving it zero clustering weight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4114 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:41:21 +00:00
rpoplin
5623e01602
GenerateVariantClusters and VariantRecalibrator now uses hapmap and 1kg ROD bindings (in addition to dbsnp) to distinguish between knowns and novels. It no longer looks at by-hapmap validation status so providing hapmap is highly recommended. Example on the wiki. Input variants tracks now must start with input.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4113 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:33:40 +00:00
hanna
bf0b6bd486
Update integration tests to use the new ROD syntax.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:13:30 +00:00
asivache
14198b74d5
Can now compute av. qualities and stddevs per cycle for both original (when present in bam) and recalibrated quals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4111 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 17:14:58 +00:00
asivache
23dbaa68e6
Can design assays when multiple (distinct) events occur at the same locus (one assay per event)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4110 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 16:52:47 +00:00
ebanks
b4baa3eb8f
Cleanup. INDELS model is now disconnected (and renamed 'DINDEL' in preparation for adding plumbing for Guillermo soon)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4106 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 14:52:51 +00:00
hanna
3dc78855fd
Command-line argument tagging is in, and the ROD system is hacked slightly to support the new syntax
...
(-B:name,type file) as well as the old syntax. Also, a bonus feature: BAMs can now be tagged at the
command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 03:47:57 +00:00
fromer
aa8cf25d08
Implemented fully symmetric sliding window read-backed phaser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4104 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 21:12:32 +00:00
ebanks
cba5f05538
Small fixes for consistency in the numbers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4103 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:48:25 +00:00
rpoplin
7bbd67f3c4
Fixing stray comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4102 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:19:39 +00:00
rpoplin
85007ffa87
Some clean up for the variant recalibrator. Now uses @Input and @Output so that it can join the Queue party. Users now specify a -o, -clusterFile, -tranchesFile, and -reportDatFile. Example on the wiki. ApplyVariantCuts now has an integration test. Base quality recalibrator now requires a dbsnp rod or vcf file. Now that the base quality recalibrator is using @Output the PrintStream shouldn't be closed in OnTraversalDone.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4101 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:14:58 +00:00
delangel
f2b138d975
Small refactoring: make Haplotype a public class since it will be soon extended and shared with other callers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4100 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 17:52:36 +00:00
ebanks
43f1fb2380
Okay, finally done with VCF compression. Now:
...
1. Uses blocked gzip compression.
2. No more -bzip option available (since we can't compress to sdout).
3. Only file extensions that are compressed are .gz and .gzip.
4. No more need for CompressedVCFWriter.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4099 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 16:36:54 +00:00
ebanks
25fb53e7a2
Oops, forgot to call toLowerCase().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4097 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 14:43:24 +00:00
ebanks
7957b60768
We now automatically compress the output VCF if the file suffix is one of the supported types (.gz, .bz, .bz2). You can still specify -bzip if you want to use another file suffix (or pipe it to sdout for some reason).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4096 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 14:39:59 +00:00
rpoplin
7a8b6b87da
Committing Michael Yourshaw's patch for AnalyzeCovariates. We spawn each RScript process and wait for it to finish in series. Thanks Michael!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4095 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 13:06:25 +00:00
ebanks
9fb151f417
Minor update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4094 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 05:17:10 +00:00
ebanks
44f3c5639a
I have finally figured out that when you volunteer to do something in group meeting, you keep getting pestered about it on Mark's Omniplan doc until it gets done (except for contig aliasing, of course). As such...
...
We can now emit bzipped VCFs from the GATK.
Details: any walker that defines a VCFWriter for its @Output (i.e. pretty much every core walker from UG and on), also has associated with it the -bzip (--bzip_compression) boolean argument. When set, it will emit a VCF that is compressed with bzip2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4093 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 04:14:50 +00:00
hanna
691333f75c
Force isRequired() to be false for @Deprecated args.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4092 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:50:30 +00:00
hanna
5d6a6420a9
New behavior for filling it output streams: if required==true for a field and the field
...
is an output stream, we'll automatically create it and point it to stdout. Otherwise,
we'll leave it empty.
I think about it like this: marking a field 'required' indicates to the GATK that the
walker author requires a value for this field, and if the GATK can provide one without
end user intervention, it will. Maybe this is hackish. We'll try it and see.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4091 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:39:13 +00:00
ebanks
90aef66ec5
Minor fixes for my last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:25:29 +00:00
ebanks
ef795825fd
Yet more argument consistency updates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4089 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 20:52:30 +00:00
aaron
7474afa7a3
allow other objects access to the static method that resolves bam lists, and some renaming and improved documentation for the function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4087 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:52:00 +00:00
ebanks
ccda4f6ec1
More output consistency changes (updating wiki docs as I go along).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks
c9c6ff49c2
Deprecated 'O' in favor of 'o' in the cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:09:24 +00:00
ebanks
55a8306a0d
Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:49:37 +00:00
aaron
35b9883dd6
vcfwriter is in tribble now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:01:04 +00:00
aaron
2d3b6d89dc
adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the
...
most appropriate based on feature density, and the longest feature in the file. Also:
- Converted Tribble to TestNG; it has better features and is about 6x faster.
- As much code clean-up as I could get done. More to do, especially in the example code.
- Moved asserts in the code to throw exceptions.
- Added getBinSize to the index interface; both indexes already implemented this.
- Removed the abstract parts of the indexCreator interface; this is now more simple.
- Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 06:54:59 +00:00
kiran
295472bf69
Simple change to handle a no-call (must avoid asking for the second allele, which will be be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:13:09 +00:00
kiran
dec713a184
Simple test code from Steve Schaffner to compute R^2 and D'. This is just for educational purposes. Don't use this code for anything, ever!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4080 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:06:16 +00:00
hanna
c177801d81
Add deprecated command-line arguments, and switched over UG to output to
...
-o/--out instead of -varout. Let's watch as our intrepid support engineer
gracefully responds to all the incoming questions of the form: "the GATK told
me to use -o instead of -varout. What do I do?"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 21:01:44 +00:00
hanna
b80cf7d1d9
Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 14:27:05 +00:00
ebanks
30a104228a
Don't require entropy reduction when cleaning only at known sites; instead we need to trust the known indels. This will improve consistency between lane-level and aggregated cleaning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4076 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 02:44:38 +00:00
depristo
b6989289fc
Potential bug fix for bad references where some codons may have Ns
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4075 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 12:09:33 +00:00
kiran
121b4f23b6
Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 00:01:48 +00:00
ebanks
165dc6d3b0
Ryan, what did you decide about supporting this tool? Is it still useful?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4073 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 19:16:14 +00:00
ebanks
2ef2f1b24a
Fix UG's simple indel calculation model so that deletions are created correctly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4072 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 15:35:47 +00:00
fromer
1c4784999a
Updated to work exclusively in log10 space
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4069 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 21:31:07 +00:00
fromer
3af4e618cc
Fixed precision issues with PQ (phasing quality)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4068 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:34:47 +00:00
kshakir
88ca1fb22c
Lazy loading reflections so Queue can hack the classpath before the PluginManager looks for classes.
...
Removed extra quotes from 'cd' pre-exec command.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4067 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:29:52 +00:00
aaron
63ada20da5
allow RefSeq files to optionally contain the header line, which is the default output from the UCSC table browser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4065 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:25:37 +00:00
fromer
effeedf1a3
Updated Bayesian phasing method to output per-site phasing statistics (and to not cap PQ at 40)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4064 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:55:47 +00:00
aaron
04e5b28f6d
updates for VCF; we can no longer cache genotypes or alleles in a static array, this is bad for sharred memory parallel runs. One instance per codec was better for performance than using ThreadLocal code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4063 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:34:44 +00:00
corin
8054b6b295
Changing a name of a column for variantevals output for easier reading by R--let me know if this needs to be updated elsewhere; it's just a space to an underscore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:18:16 +00:00
ebanks
4b94f8c21b
Silly me, I forgot to check for the contig boundaries. Thank goodness for performance tests!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4061 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 18:40:26 +00:00
aaron
f16bb1e830
fix for a bug in package utils.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4060 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 15:01:50 +00:00
fromer
15c5aa6e48
Efficient iteration over all possible combinations of variable assignments, for variables of arbitrary cardinalities
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4059 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 14:14:37 +00:00
ebanks
1ec305cd15
Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads.
...
Thanks to Matt for the advice.
Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 11:31:13 +00:00
ebanks
98f7679619
Fixed the bug reported on GS regarding a clipped read that got moved several hundred bases away. The code that got triggered here was written back in the original version of the cleaner and it never actually did the right thing.
...
While I was fixing it, I noticed that we weren't allowing the cleaner to un-clean reads with indels when they're wrong even though we should. Hypothetically, that should rarely happen: only when we can left-align out an indel or when the original mapper really went haywire. This situation is rare enough that I'm calling logger.info to let the user know it's happening and suggesting that they double-check that everything looks right with their reads. Better to be extra-cautious now that the cleaner is moving into the 1kg and Broad production pipelines soon
.
Mark, have no fear: this was truly a rare edge case - one that won't affect the cleaning stats. There is no need to re-clean the data processing paper bams!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4057 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 01:42:48 +00:00
aaron
3dc4d3c3a9
removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also
...
removed the auto-deletion of the reflections jar, and removed the very old OmniPlan document we had checked-in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4056 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 00:42:37 +00:00
fromer
1336ea17a3
quality-scored-based Bayesian phasing algorithm implemented
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4055 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:17:46 +00:00
fromer
553bda4e0e
PreciseNonNegativeDouble permits precise arithmetic operations on NON-NEGATIVE double values
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4054 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:10:58 +00:00
rpoplin
8f15b2ba72
Memory optimization for the VariantRecalibrator. Only add variants to the list if they pass the novelty and qual filters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4051 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 21:57:28 +00:00
kshakir
b7c60b9729
Queue now uses its own version instead of the gatk version.
...
Added a Queue release directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4050 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 19:34:23 +00:00
aaron
c1df293feb
remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned intergration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 23:52:01 +00:00
rpoplin
578e7fa36d
Don't output -0 as qual value in VariantRecalibrator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4044 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 16:47:58 +00:00
kiran
3d63302b70
Deprecated. Use SelectVariants instead.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4043 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 15:07:50 +00:00
depristo
20db00a3e8
Lazy reference loading; the engine doesn't fetch the reference bases until you actually call ref.getBases(). With the new hidden --dontUpdateUG to table recalibrator this is 2-3x faster than before. Enabled for locus, read, and rod walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4042 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:46:22 +00:00
aaron
9ab647b730
adding checks to the RefSeq rod for line's that contain less than the required number of columns (we expect there to be 16 columns)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4041 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:34:32 +00:00
aaron
b23545fafa
re-enable the check for up-to-date versions in the Tribble index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4039 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 12:47:58 +00:00
ebanks
37586d3a43
Don't exception out when bad aligners emit wonky alignments; instead, just don't clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4038 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 02:36:04 +00:00
depristo
a36951f11a
@output and @input arguments for table recalibration for use with Q
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4037 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:36:28 +00:00
depristo
61064d7075
GenotypeConcordance log file -- if provided, GC module will write FN/FP information to this file by context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4036 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:35:57 +00:00
depristo
0d209d5442
Nicer printing out of clustering
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4035 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 16:02:13 +00:00
kshakir
307c8ca027
Created a new playground script for cleaning bams in Firehose.
...
Some refactoring of Queue extensions for reusability in scripts.
Putting the extensions into the Queue.jar after building them.
More updates to GATK walker arguments specifying @Input and @Output for Queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 23:52:24 +00:00
fromer
dfe2922b5e
First working version of statistical haplotype phaser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4031 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 21:29:45 +00:00
ebanks
f36c0ed613
Stop building obsolete VCFTools and CGUtilities
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4030 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:28:36 +00:00
rpoplin
222f61df87
Bug fix for damoskow in TableRecalibration. Shouldn't try to update the reference mismatch rate tag for an unmapped read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4028 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 18:57:07 +00:00
kshakir
80a70ccf03
Repopulating rodsToSamples. Code reviewed by Eric.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4027 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 17:07:18 +00:00
hanna
cb144734c0
Getting rid of GenotypeWriter interface. Of note:
...
- GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble.
- VCFWriter is now an interface, for easier redirection.
- VCFWriterImpl fleshes out the VCFWriter interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 16:33:22 +00:00
kshakir
542d394e09
Cleaning up Queue debugging output.
...
-l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run.
More documentation in the examples with a new even simpler CountReads example.
Took out unused option to build Queue GATK extensions separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:54:08 +00:00
chartl
49a3db9dfe
A brief implementation of a QD calculation that is not quite so bimodal for known variants (multiplicatively penalizes QD by (n variant samples)/(n variant alleles) ). Not sure how helpful this will be (which is why it is in oneoffs). Seems nice on MCKD1, but I'm still playing with the optimization.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4024 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:42:37 +00:00
chartl
c6a8fba922
Occasionally if a JEXL expression results in no variants being captured (like "QD > 20.0" on filtered variants) the per-sample mapping from samples to eval objects can be empty. This semi-hacky fix prevents null pointer exceptions in setting up the resulting empty table (by jumping straight to it in this case)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4023 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:37:45 +00:00
ebanks
f874e548aa
Shame on us. FlagStat used ints instead of longs, so we ended up getting negative read counts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4022 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 03:00:57 +00:00
kshakir
f39dce1082
Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
...
Added ability to skip up-to-date jobs where the outputs are older than the inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00