depristo
82f9327b5e
Throw the right exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4666 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 22:18:42 +00:00
depristo
44d0cb6cde
New version of cutting routines for VQSR. Old code removed. Working unit tests. Best practice with testng integration test (everyone look at it). Walker test now allows you to omit the number of input files if it can infer input counts from the MD5s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4664 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 16:19:56 +00:00
kshakir
62a106ca5a
Disabled VariantGaussianMixtureModelUnitTest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4663 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 03:53:33 +00:00
kshakir
673fa841a4
Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader.
Removed obsolete usages of PackageUtils with updated PluginManager.
Ported Queue interval utilities written in scala over to Sting's java IntervalUtils.
Added a very basic integration test to ensure that the fullCallingPipeline.q compiles.
Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test).
While adding tempDir, also added options to specify the run directory from the command line, for example "-runDir v1".
Upgraded to scala 2.8.1 and updated calls to deprecated functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:14:28 +00:00
depristo
42acc968b1
Unit tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4660 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:09:39 +00:00
ebanks
b51762c279
When you commit code late at night you tend to make careless mistakes... like forgetting to update integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4658 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:41:10 +00:00
depristo
988da428ae
Bug fix for old style tranches file. ApplyVariantCuts moved over, and passes integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4657 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:38:26 +00:00
depristo
c5f8c4dd0d
VariantEval test for tranches file, plus cutting over VE to use the generic Tranches framework
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4656 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 13:52:40 +00:00
ebanks
69de3e51bf
Better precision for the calculated AF value. Now looks at the total number of samples to determine how much precision is necessary. Also, changing default min BQ used for calling in UGv2 to Q17.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4655 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 08:31:40 +00:00
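A minimal Python sketch of AF precision that depends on the number of samples. The rule used here (enough decimal places to resolve one allele out of AN = 2N chromosomes, never fewer than two places) and the function names are hypothetical and assume diploid samples; they are not taken from the UGv2 code.

```python
import math


def af_decimal_places(n_samples):
    """Hypothetical rule: enough decimal places to resolve one allele among
    AN = 2 * n_samples chromosomes (diploid assumption), never fewer than two."""
    an = 2 * n_samples
    return max(2, math.ceil(math.log10(an)) + 1)


def format_af(ac, n_samples):
    """Format AF = AC / AN with a precision that grows with the sample count."""
    an = 2 * n_samples
    return f"{ac / an:.{af_decimal_places(n_samples)}f}"
```

With 629 samples this gives five decimal places, so a singleton (AC=1) still prints as a non-zero AF.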
depristo
ec83a4b765
Initial commit, without any tool changes, of a new infrastructure for determining tranches. This new version walks up from the lowest quality SNPs and determines Ti/Tv. This is marginally more stable than moving in the other direction when there are few novel variants (exomes), and can make a substantial difference in the size of the call set (10-20%). I'll hook it into the main system now. Includes a new class Tranche and isolated reading/writing utilities that are now tested in TestVariantRecalibrator, which should be moved to UnitTest as soon as I can figure out how to do this on my mac.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4654 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 23:52:49 +00:00
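One plausible reading of the bottom-up walk, sketched in Python: calls are hypothetical (score, is_transition) pairs with distinct scores, and the function returns the lowest score cutoff whose retained calls reach a target Ti/Tv. This is not the Tranche class or its API.

```python
def find_cutoff(calls, target_titv):
    """Walk up from the lowest-scoring SNP, tracking Ti/Tv of the calls kept at
    or above each candidate cutoff; return the first cutoff reaching the target."""
    ordered = sorted(calls, key=lambda call: call[0])  # lowest quality first
    ti = sum(1 for _, is_ti in ordered if is_ti)
    tv = len(ordered) - ti
    for score, is_ti in ordered:
        # ti and tv now describe every call with score >= this score.
        # Ti/Tv is undefined without transversions, so such cutoffs are skipped.
        if tv > 0 and ti / tv >= target_titv:
            return score
        # Drop this call before moving up to the next cutoff.
        if is_ti:
            ti -= 1
        else:
            tv -= 1
    return None
```

Starting at the low end means each step removes one call from running counts, so the walk is linear after the sort.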
depristo
ed6396ed43
No longer getting the inet; it appears that doing so can hang the JVM
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4653 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 23:49:42 +00:00
ebanks
2f6666a988
Correcting traversal statistics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4652 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 22:46:58 +00:00
depristo
dbde721dd0
Bug fix for filtered records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4651 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:54:51 +00:00
aaron
698e5cf345
for GATK style codecs, make sure we fill in their GenomeLocParser from the RMDIndexer
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4650 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:44:15 +00:00
delangel
2f3be24a00
Improvement in exact allele frequency calculation model (still under test, but this is definitely better than what I had before). Instead of approximating log10(10^x+10^y) as max(x,y), use the full Jacobian formula max(x,y)+log10(1+10^-abs(x-y)), with a static lookup table for the second term.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4647 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 01:22:35 +00:00
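A small Python sketch of the Jacobian logarithm with a static lookup table for the correction term. The table step (0.01) and cutoff (|x - y| >= 8, where the term drops below 1e-8) are assumptions chosen for illustration, not the values used in the exact AF model.

```python
import math

STEP = 0.01        # hypothetical table resolution for |x - y|
N_ENTRIES = 801    # covers |x - y| in [0, 8]; past that the term is below 1e-8
MAX_DIFF = (N_ENTRIES - 1) * STEP

# Static lookup table for the correction term log10(1 + 10^-d).
JACOBIAN_TABLE = [math.log10(1.0 + 10.0 ** (-i * STEP)) for i in range(N_ENTRIES)]


def log10_sum(x, y):
    """log10(10^x + 10^y) via the Jacobian logarithm:
    max(x, y) + log10(1 + 10^-|x - y|), second term read from the table."""
    diff = abs(x - y)
    if diff >= MAX_DIFF:
        return max(x, y)  # correction is negligible
    return max(x, y) + JACOBIAN_TABLE[int(round(diff / STEP))]
```

Compared with plain max(x, y), the table lookup removes most of the (up to log10(2)) bias at the cost of one array access.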
asivache
2e0296fef9
NWayOut logic slightly changed: 1) the results.list file is gone; 2) with -nWayOut one can now specify either a) a suffix to attach to every output file (e.g. cleaned reads from inputK.bam will be sent to inputK.suffix.bam) or b) a *.map tab-separated file that must list <input_name> <output_name> mappings, one per line, for every input file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4645 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:32:16 +00:00
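A Python sketch of the two -nWayOut modes. It assumes, following the earlier N-Way-Out commit, that in suffix mode .bam is stripped and the argument is pasted on verbatim (so .cleaned.bam maps inputK.bam to inputK.cleaned.bam); the function names and error handling are hypothetical.

```python
def parse_map(lines):
    """Parse tab-separated '<input_name> <output_name>' lines, skipping blanks."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        src, dst = line.split("\t")
        mapping[src] = dst
    return mapping


def output_names(inputs, n_way_out, map_lines=None):
    """Return {input_bam: output_bam} for either -nWayOut mode."""
    if n_way_out.endswith(".map"):
        if map_lines is None:
            with open(n_way_out) as fh:
                map_lines = fh.readlines()
        mapping = parse_map(map_lines)
        # The map must cover every input file.
        missing = [name for name in inputs if name not in mapping]
        if missing:
            raise ValueError("no output mapping for: " + ", ".join(missing))
        return {name: mapping[name] for name in inputs}
    # Suffix mode: strip a trailing .bam, then paste the argument on.
    result = {}
    for name in inputs:
        stem = name[:-len(".bam")] if name.endswith(".bam") else name
        result[name] = stem + n_way_out
    return result
```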
asivache
a1adfb91ce
And now @Hidden tags are really in place :-/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4644 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:28:40 +00:00
asivache
68ce55148e
(pseudo-)genotyping functionality added: force-emits calls (including REF) at specified locations. Currently @Hidden for testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4643 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:25:40 +00:00
hanna
8e36a07bea
Convert GenomeLocParser into an instance variable. This change is required
for anything that needs to be simultaneously aware of multiple references, eg
Queue's interval sharding code, liftover support, distributed GATK etc.
GenomeLocParser instances must now be used to create/parse GenomeLocs.
GenomeLocParser instances are available in walkers by calling either
-getToolkit().getGenomeLocParser()
or
-refContext.getGenomeLocParser()
This is an intermediate change; GenomeLocParser will eventually be merged
with the reference, but we're not clear exactly how to do that yet. This
will become clearer when contig aliasing is implemented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 17:59:50 +00:00
depristo
5ef4b234d8
Updates for broken integration tests. Counting annotations (AC, AF) now work correctly for AC = 0 sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4640 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-09 19:43:43 +00:00
depristo
4759fdd2ac
V1 of read and variant simulator and assessor. SimulateReadsForVariants generates BAM and VCF with given combinations of variant and read properties. AssessSimulatedPerformance produces a table suitable for analysis in R
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4637 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-08 21:01:33 +00:00
aaron
97db593efb
making my last commit message actually true
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4636 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 18:26:23 +00:00
aaron
be499fc986
making the reference optional (the GATK will set it on the first run if it's not included), and setting the seq index if they do supply it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4635 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 18:15:31 +00:00
ebanks
e05af54f3e
Found the cause of 80% of our non-called FNs: an excess of filtered bases were causing us to choose the wrong alternate allele. More details to dev team.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4634 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 03:39:57 +00:00
aaron
2a8c97a4a7
better error catching, as well as allowing for default index naming, <filename>.idx
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4633 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-06 19:12:19 +00:00
aaron
cb2e26a004
by request, an indexer tool to create Tribble style indexes outside of the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4632 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-06 18:59:06 +00:00
depristo
bbb890dd6c
Bug fix for variants in VCF header fetching to avoid null pointer when a VariantContext tribble codec doesn't have a header
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4630 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-05 12:43:25 +00:00
ebanks
c9dbd8f80a
Bug fix for Tim: all point events must be treated equally
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4629 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-05 03:42:51 +00:00
rpoplin
913db5d1ab
Unfortunately, when annotating sites with the UG, the -G None option was wiping out the individual annotations added by -A options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4625 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 19:27:23 +00:00
ebanks
816c86776e
Walker description was wrong and it was bothering me
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4624 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 02:17:09 +00:00
ebanks
87f6738d4c
Deprecated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4623 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 02:07:40 +00:00
chartl
42e9987e69
Bug fix to GenotypeConcordance. AC metrics get instantiated based on the number of eval samples; if Comp has more samples, we can see AC indices outside the bounds of the array.
Bug fix to LiftoverVariants - no barfing at reference sites.
AlleleFrequencyComparison - local changes added to make sure parsing works properly
Added HammingDistance annotation. Mostly useless. But only mostly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4622 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 19:23:03 +00:00
fromer
3d27defe93
Fixed output stats (percentage denominator)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4621 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 18:47:06 +00:00
ebanks
4e109f58bf
In preparation for Ryan's jumping into SLOD: getting rid of bad hack to ensure P(AF=i) is calculated in the strand-specific cases. With Mark's recent changes this is no longer necessary and just makes the code slower.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4620 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 03:44:59 +00:00
fromer
22d64f77ff
Added hidden --outputMultipleBaseCountsFile option to detect cases where a single read has more than one base at the same position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4619 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 03:22:48 +00:00
hanna
8f9bf82aa7
Bamboo is correctly interpreting test failures. Reverting forced-fail test code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4617 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-02 19:32:34 +00:00
hanna
1df166b76e
Forcing a unit test failure to ensure that Bamboo is picking up on failed tests as well as successes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4616 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-02 19:03:12 +00:00
fromer
a885ecf046
When merging MNPs, the phased flag and the phase quality (PQ) are determined simultaneously
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4613 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-02 14:44:26 +00:00
hanna
861ee3e37a
Changing testing framework from junit -> testng, for its enhanced configurability.
Initial test to see how Bamboo will respond. More detailed email to follow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4609 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 21:31:44 +00:00
asivache
fe3f78e1d3
Use full (absolute) paths for the file names recorded in results.list
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4608 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:53:51 +00:00
asivache
2ac5e55130
typo
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4607 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:38:02 +00:00
asivache
0e6dd38936
In n-way-out mode, added printing of the names of all output files into a 'results.list' file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4606 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:37:38 +00:00
fromer
64599d1074
Added debugging message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4605 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 19:51:42 +00:00
fromer
639ecdc931
Noted in comment that using a single sample in MergePhasedSegregatingAlternateAllelesVCFWriter does NOT update any of the INFO fields, though this could be changed in the future...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4604 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 19:02:52 +00:00
fromer
8439f0aa61
Check for VCFConstants.MISSING_VALUE_v4 when retrieving INFO fields and consider such values as non-existent
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4603 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 17:51:35 +00:00
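A minimal Python sketch of treating the VCF 4 missing value (".") as a non-existent INFO field. The parsing helper and dict representation are hypothetical stand-ins for the VCFConstants.MISSING_VALUE_v4 handling in the Java code.

```python
MISSING_VALUE_V4 = "."  # the VCF 4 missing-value marker


def parse_info(field):
    """Split an INFO column like 'DP=10;AF=.;DB' into a dict; flags map to True."""
    info = {}
    if field == MISSING_VALUE_V4:
        return info  # the whole INFO column is missing
    for entry in field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def get_info(info, key):
    """Return the value for key, treating an explicit '.' like an absent key."""
    value = info.get(key)
    if value is None or value == MISSING_VALUE_V4:
        return None
    return value
```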
asivache
aadd230636
N-Way-Out is back. Now uses SAMReadID to identify each read's source BAM, so it should be reliable. The interface is sort of ugly for now: to generate output file names, .bam is stripped from input file names, then the value of the -nWayOut argument is appended (and all the output files are written into the current dir).
Unrelated change: in the sorted-target mode (when we read sorted target intervals one by one from a file), one can now specify multiple semicolon-separated interval files (all must be sorted). Probably not hugely useful, but it makes --targetIntervals always process its values in exactly the same way, so we are consistent (it was already taking ;-separated args in unsorted mode)
NwayIntervalMergingIterator: reads in multiple sorted GenomeLoc input streams (iterators) and presents them as a single sorted and merged stream
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4602 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 16:06:51 +00:00
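A Python sketch of what an N-way merging iterator does, assuming intervals are (contig_index, start, stop) tuples with closed coordinates, each input stream is already sorted, and overlapping or abutting intervals are coalesced. heapq.merge is the standard-library k-way merge; the helper names are hypothetical and this is not NwayIntervalMergingIterator itself.

```python
import heapq


def split_interval_args(arg):
    """'a.intervals;b.intervals' -> ['a.intervals', 'b.intervals']."""
    return [part for part in arg.split(";") if part]


def merge_sorted_intervals(*streams):
    """Lazily merge sorted streams of (contig_index, start, stop) into a single
    sorted stream, coalescing intervals that overlap or abut."""
    current = None
    for contig, start, stop in heapq.merge(*streams):
        if current is not None and contig == current[0] and start <= current[2] + 1:
            current = (contig, current[1], max(current[2], stop))
        else:
            if current is not None:
                yield current
            current = (contig, start, stop)
    if current is not None:
        yield current
```

Because both heapq.merge and the generator are lazy, only one pending interval per stream is held in memory, which matters for large sorted target files.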
depristo
23cb399a88
Reasonable first pass at a correct SB calculation. Simple utilities to support it. VariantsToTable no longer prints filtered sites by default. New non-standard variant eval module to print comp sites not present in eval (FN finder)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4601 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-31 12:41:52 +00:00
delangel
30fae5cf18
Major redo of exact AF computation for UnifiedGenotyperV2. Fact of life is, there's no way we can compute an exact QUAL field and keep performing the AF computation in linear probability space. In good sites with lots of samples, the ratio of Pr(AC=K*|D) to Pr(AC=0|D) can be 10^1500 or some ridiculously large number like that, which no double can represent. So, we abandon probability space and now work in log likelihood space, which has several major repercussions:
a) Sites are now numerically well behaved, but another hard fact of life is that the AF iteration is defined in linear Pr space, not in log likelihood space, and the math doesn't work out in log space. So, we need to convert back and forth between lin and log space.
b) As a consequence of a), the code got a major slowdown, and calling the 629 samples was about 15 times slower than before (sic).
c) To solve b), log10 values of integers are now cached at init, and numerical approximations are now made. Most importantly, I'm using the approximation that log(exp(a) + exp(b)) ~= max(a,b), which seems almost inconsequential in practical performance but reduces computation time to what it was before. More detailed analyses are forthcoming. This approximation can be refined further to avoid expensive log-exp conversions if further profiling and analysis deem it necessary.
Also, two other issues were solved:
a) Strand bias computation was actually wrong in the case where the optimal AC was bigger than max(forward reads,reverse reads). Now the code is exactly as buggy as the grid search model (all bugs are equal, but some are more equal than others)
b) Genotype likelihoods are now computed in a better way and if a likelihood < 0 we don't just cap to 0 but do something a bit smarter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4600 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-31 01:26:04 +00:00
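A Python sketch of the log-space bookkeeping described above: log10 of integers cached once, an exact log-sum that never leaves log space, and the cheap max() approximation, whose error is bounded by log10(2) (about 0.301). The cache size (2 x 629 chromosomes) and all names are illustrative assumptions.

```python
import math

# Hypothetical cache size: 2N chromosomes for the 629 samples mentioned above.
MAX_AN = 2 * 629
# log10 of every integer 0..MAX_AN, computed once at init (log10(0) = -inf).
LOG10_INT = [float("-inf")] + [math.log10(i) for i in range(1, MAX_AN + 1)]


def log10_allele_fraction(ac, an):
    """log10(AC / AN) from two table lookups instead of a division and a log call."""
    return LOG10_INT[ac] - LOG10_INT[an]


def exact_log10_sum(a, b):
    """log10(10^a + 10^b) computed without ever leaving log space."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log10(1.0 + 10.0 ** (lo - hi))


def approx_log10_sum(a, b):
    """The cheap approximation: keep only the larger term.
    It underestimates the exact sum by at most log10(2), about 0.301."""
    return max(a, b)
```

Ratios such as Pr(AC=K*|D) / Pr(AC=0|D) then become plain differences of log10 values (1500 rather than 10^1500), and the bound follows from max(a,b) <= log10(10^a + 10^b) <= max(a,b) + log10(2).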
hanna
d492621122
The TraversalEngine's habit of hanging onto old ROD states seems to have a bad
interaction with Tribble. In Tribble, keeping these references in memory until
the shard is flushed means keeping one 512K character buffer per object in
memory. Fixed by purging the reference to the object at the end of the
shard traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4599 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 17:09:58 +00:00
ebanks
1c056ea791
Users can now use VariantAnnotator to add annotations from one VCF to another. For example, if you want to annotate your target VCF with the AC field value from the rod bound to CEU1kg, you can specify -E CEU1kg.AC and records will be annotated with CEU1kg.AC=N when a record exists in that rod at the given position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4598 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 16:38:31 +00:00
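A Python sketch of the -E ROD.FIELD behavior, assuming records are dicts keyed by (chrom, pos), rod names contain no '.', and a missing record at that position leaves the target untouched. The data layout and function name are hypothetical, not VariantAnnotator's internals.

```python
def annotate_from_rods(record, rods, expressions):
    """Return a copy of record whose INFO gains 'ROD.FIELD=value' for each
    expression whose rod has a record with that field at the same position."""
    info = dict(record["info"])
    position = (record["chrom"], record["pos"])
    for expression in expressions:
        rod_name, _, field = expression.partition(".")
        other = rods.get(rod_name, {}).get(position)
        if other is not None and field in other:
            info[expression] = other[field]
    return {**record, "info": info}
```

Using the full expression as the INFO key mirrors the CEU1kg.AC=N output described in the commit.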
ebanks
1b3fc8ddd2
Doing things too quickly is also naughty. Thanks, Andrey. Now, we're even.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4597 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 14:50:04 +00:00
ebanks
58f7b4c595
Naughty use of assertions means that malformed records are not caught.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4596 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 14:41:38 +00:00
delangel
9a60e72364
Trivial change to LeftAlignVariants: make walker return number of aligned variants on map(), and print out the # of aligned variants at the end of the traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4595 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 02:03:36 +00:00
hanna
2f8057bf24
Cleanup for multithreading memory leak during integration tests...unregister MXBean at end
of traversal to avoid holding a reference to the microscheduler, which holds a reference to
the engine, which in turn holds a reference to the walker, which itself holds a reference to
all the data aggregated during the course of the traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4594 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 18:37:42 +00:00
depristo
860de05a7c
Bug fix for PL vs. GL in header. PL now truly default output for UGv2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4592 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 12:39:18 +00:00
depristo
9782dde3dd
Bug fix for PL vs. GL in header. PL now truly default output for UGv2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4591 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 12:38:48 +00:00
ebanks
fe3cfb067c
very minor cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4590 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 02:11:33 +00:00
depristo
cbce3e3c83
General support for both GL (log10) and PL (phred-scaled) genotype likelihoods. All walkers now use the Tribble GenotypeLikelihoods object for parsing VCFs with genotype likelihood fields. Please use GenotypeLikelihoods object from now on for seamless support for GL and PL tags. UGv2 now uses PL by default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4589 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 01:48:47 +00:00
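For reference, a Python sketch of the GL/PL relationship: PL = -10 * GL, rounded to an integer and normalized so the most likely genotype gets PL 0 (the VCF convention). These helpers are hypothetical and are not the Tribble GenotypeLikelihoods API.

```python
def gl_to_pl(gls):
    """log10 likelihoods (GL) -> phred-scaled PL, best genotype normalized to 0."""
    best = max(gls)
    return [int(round(-10.0 * (gl - best))) for gl in gls]


def pl_to_gl(pls):
    """PL -> log10 likelihoods, relative to the best genotype (GL = -PL / 10)."""
    return [-pl / 10.0 for pl in pls]
```

The round trip is lossy: normalization discards the absolute scale and rounding keeps only 0.1 log10 units of resolution, which is usually fine for genotype likelihoods.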
fromer
15183ed778
Reduced header to single sample when useSingleSample arg is given (to prevent lots of pointless no-calls)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4588 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 23:02:10 +00:00
fromer
34538bf2b3
Added ability to focus only on a single sample and/or emit only merged records in MNP merger
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4587 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 20:41:05 +00:00
kshakir
5cdd7a7ba4
There's no such thing as a sam index, so the GATK extension generator doesn't need to add an @Input for them.
Updated a call to swapExt to specify the directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4586 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 20:39:03 +00:00
hanna
4c23b1fe9c
Get rid of the static cache of ArgumentTypeDescriptors by making them an integral part of the
parsing engine. Hugely lowers our memory footprint in integration tests, but not yet enough to
run Mark's new parallelized VariantEvalIntegrationTests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4585 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 19:44:55 +00:00
ebanks
e112df20df
Use a sorting VCF writer because records can flip positions during left-alignment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4583 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 06:33:03 +00:00
ebanks
708e973911
Adding a walker to left-align indels in a VCF file (was able to reuse code from AlignmentUtils to do the hard part). The code correctly updates the alleles if they change. This makes it much easier to compare our indel calls to e.g. CG or dbSNP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4582 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 06:08:26 +00:00
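A Python sketch of the standard left-shift for indels on a reference string with 0-based coordinates: a deletion moves left while the base before it equals its last deleted base, and an insertion moves left while the preceding base equals its last inserted base, rotating the inserted sequence. This is the textbook technique, not the AlignmentUtils code, and it ignores VCF padding bases.

```python
def left_align_deletion(ref, start, length):
    """Deletion of ref[start:start + length]; return the leftmost equivalent start."""
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def left_align_insertion(ref, pos, ins):
    """Insertion of ins immediately before ref[pos]; return (new_pos, new_ins)."""
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]  # rotate so the sequence stays equivalent
        pos -= 1
    return pos, ins
```

For example, deleting "CA" at offset 3 of GCACACAT shifts to offset 1, and both versions produce the same sequence, which is why left-aligned calls are easier to compare across call sets.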
ebanks
ec442086ec
Minor refactoring of the cleaner allows me to add a trivial walker that left aligns the indels present in reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4581 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 03:39:10 +00:00
hanna
04e38929f0
Disabling parallelized version of VE integration tests. Still slow, but not
deadlocking any more.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4580 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 02:47:03 +00:00
ebanks
ffc0ed2b32
Renamed getName() to getSource() in VariantContext to be more accurate
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4579 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 02:21:41 +00:00
ebanks
52fc023d80
Added convenience methods to check/get the ID of the VariantContext
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4578 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 01:56:58 +00:00
fromer
a7af1a164b
Updated MNP merging to merge VC records if any sample has a haplotype of ALT-ALT, since this could possibly change annotations. Note that, besides the "interesting" case of an ALT-ALT MNP in a pair of HET sites, this could even occur if two records are hom-var (irrespective of using phasing). Note also that this procedure may generate more than one ALT allele.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4577 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 01:50:36 +00:00
depristo
e02aac0743
No longer print out the "0 reads were filtered out..." message when no reads were seen at all
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4575 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 20:22:16 +00:00
depristo
b085648141
Parallelized VariantEval. Refactored output to support parallel output style. Minor improvements to testing framework to enable easy executeTestParallel to run -nt 1 and -nt 4 by default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4574 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 20:21:38 +00:00
kshakir
8211cee0b2
Queue UI Improvements:
- Forcing user to set the temp directory via -Djava.io.tmpdir to avoid filling up /tmp.
- By default deleting job outputs tagged as intermediate.
- Defaulting pipeline to scatter count 1 (no reads deleted).
- Cleaning up temp classes even when scripting fails.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4573 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 19:49:08 +00:00
ebanks
cedceb33cd
My only experience with getting external groups (GAP, dbSNP) to use VCF has been painful at best, so I'm not holding my breath to get indels for CG in VCF. To that end, here's a oneoffs walker to convert from CG format to VCF for all 'del' & 'ins' types (but not 'sub' types, since they're too complex to code up in VCF and I don't care about them for now). rs IDs are included.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4572 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 17:53:14 +00:00
ebanks
071799453c
More complete fix to previous commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4571 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 20:47:37 +00:00
ebanks
67a776d53c
Yikes! VariantEval was always loading genotypes unnecessarily when no sample list was provided because the order of the checks in the if statement wasn't optimal. This results in a massive performance penalty when running with many-sample VCFs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4570 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 20:30:23 +00:00
ebanks
0d97394c4f
Add capability to liftover to do the right thing when sections of the genome are reverse complemented. This does not work for indels (we don't try to reverse complement) because we need to figure out what the hell to do about the fact that the 'base to the left' that we automatically add on will be wrong because the location of the indel actually changes when reverse complemented. Sheesh.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4569 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 20:03:03 +00:00
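A Python sketch of reverse complementing SNP alleles when the target interval lies on the opposite strand; like the commit, it refuses to handle indels on that strand. The function names and the strand flag are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def lift_alleles(ref, alts, target_on_negative_strand):
    """Reverse complement single-base alleles when the target is on the opposite
    strand. Indels are rejected there: their anchoring base would also move."""
    if not target_on_negative_strand:
        return ref, list(alts)
    if any(len(allele) != 1 for allele in [ref, *alts]):
        raise ValueError("only single-base (SNP) alleles are supported")
    return reverse_complement(ref), [reverse_complement(a) for a in alts]
```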
fromer
c357ec775a
Trivially phases any hom site (since it is always correct to continue the previous haplotypes by appending the same allele onto both haplotypes)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4568 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 16:58:41 +00:00
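A Python sketch of why hom sites phase trivially: with haplotypes as lists of alleles, a hom genotype just appends its allele to both. The representation is hypothetical; het sites return None here because their phase has to come from the data.

```python
def extend_haplotypes(hap1, hap2, genotype):
    """Append the next site's alleles to a pair of haplotypes when phasing is trivial.

    A hom genotype (same allele twice) can always be appended to both haplotypes
    without any risk of a phasing error; a het genotype returns None here because
    its phase must be inferred from the reads."""
    first, second = genotype
    if first != second:
        return None
    return hap1 + [first], hap2 + [second]
```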
rpoplin
da64183854
Fix for the case of the truth VCF file having multiple SNPs at the same locus.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4567 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-25 15:04:50 +00:00
hanna
3039c0de3c
Retire old ROD syntax.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4564 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 23:52:11 +00:00
depristo
78e71c4167
Fisher exact makes a return. Seems to be working properly. Currently tagged as a work in progress. Needs to take the filtered context to be truly correct.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4561 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 20:35:44 +00:00
fromer
f06f955e06
Added count of number of mergeable records (within specified distance cutoff)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4560 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 20:11:15 +00:00
depristo
84b6d2926b
Useful walker that creates a new interval list containing only the intervals that overlap the input sites list. Really a one-off walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4559 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 19:55:04 +00:00
depristo
78b4a1c240
VariantsToTable now supports the virtual TRANSITION field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4558 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 19:53:46 +00:00
hanna
e6d61197e6
Disable OTF indexing when writing indices for temporary VCFs when running
with -nt option. When last I checked in, Ryan was seeing a ~25% speedup
per shard by not indexing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4556 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 17:40:37 +00:00
depristo
e6b008f87c
Fixed >= vs. > test that led to a failure to tolerate dynamic indexes created at *exactly* the same instant the output VCF is closed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4555 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 16:11:14 +00:00
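A Python sketch of the timestamp check, assuming the index is considered current when its modification time is not earlier than the VCF's. With '>' an index written in the same instant (same recorded mtime) would be rejected; '>=' accepts it. The function is hypothetical, not the GATK's index-validation code.

```python
import os


def index_is_current(vcf_path, index_path):
    """True if the index was written at or after the time the VCF was last modified."""
    return os.path.getmtime(index_path) >= os.path.getmtime(vcf_path)
```

Filesystem timestamps often have coarse resolution, so an on-the-fly index finished right as the VCF is closed can easily share its mtime.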
ebanks
72c5b75460
Tribble exceptions can be generated outside of the normal codec parsing code because we now lazy load the VCF genotype fields. I'm not sure how else to account for this (to make sure they show up as user errors and not GATK system errors) besides catching them here.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4554 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:22:17 +00:00
delangel
e24f7fec47
Fixed indel genotyper which broke yet again because we can't just call context.getBasePileup() without checking again for its existence in the first place.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4553 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:17:11 +00:00
ebanks
c0b4317311
Er, here's the right fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4552 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:08:25 +00:00
ebanks
181f901126
Fix for Ryan: don't pull reference sequence for the portions of reads that extend beyond the contig boundaries
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4551 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 14:38:26 +00:00
ebanks
9f76aed515
Fix for IDs 5zP7jJeffK2sdPH1BH4JBVSrQztVEDKP and nX0cuBjoqBW4NQFpM6dE13KpkCuYFpZu
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4550 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 14:05:27 +00:00
hanna
d4feb99d9a
For parallel ROD traversals, simplified reference sharding. Will replace
with a more sensible strategy for sharding w/o BAMs at some point after
ASHG.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4549 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 05:08:15 +00:00
fromer
9ba7269728
Fixed Integration Tests to output VCF files with -NO_HEADER
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4548 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:49:44 +00:00
fromer
60f88866dd
Uses VCFConstants instead of hard-coded constants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4547 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:49:01 +00:00
fromer
883b8ff80e
Removed flush() method from VCFWriter interface; added takeOwnershipOfInner parameter in constructor of wrapper VCFWriters to designate if the Writer should close the inner Writer it receives on construction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4546 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:48:00 +00:00
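A Python sketch of the ownership flag on wrapper writers: close() on the wrapper closes the inner writer only when the wrapper was constructed with take_ownership_of_inner=True. Class and method names are hypothetical stand-ins for the Java VCFWriter wrappers.

```python
class ListWriter:
    """Hypothetical inner writer that just collects records."""

    def __init__(self):
        self.records = []
        self.closed = False

    def add(self, record):
        self.records.append(record)

    def close(self):
        self.closed = True


class WrappingWriter:
    """Forwards records to an inner writer; closes it only if it owns it."""

    def __init__(self, inner, take_ownership_of_inner):
        self.inner = inner
        self.take_ownership_of_inner = take_ownership_of_inner

    def add(self, record):
        self.inner.add(record)

    def close(self):
        if self.take_ownership_of_inner:
            self.inner.close()
```

Leaving the inner writer open by default lets several wrappers share one underlying output without the first close() cutting off the rest.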
fromer
1ea43be976
Removed flush() method from VCFWriter interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4545 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:46:42 +00:00
chartl
3566ad2146
Wrong if statement.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4544 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 17:37:45 +00:00
chartl
bf17f92b64
Do not look for samples in dbsnp binding
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4543 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 17:36:38 +00:00
ebanks
225cf49128
Implementing reference confidence estimate in UGv2 as per UGv1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4542 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 16:57:59 +00:00
delangel
cf9c9ae241
Three important updates for Dindel genotyper:
a) Fix it up because it broke with a recent checkin to annotate vcf with unfiltered depth.
b) Printout of ref/alt alleles in output vcf was incorrect because the start/stop positions of associated GenomeLoc were incorrectly computed in case of a deletion.
c) Redid the Beagle input/output walkers so as not to assume that ref was a single base or that variant was a VCF, and generalized them to be indel-capable, so now the Beagle walkers can be used for indels as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4541 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 16:00:16 +00:00
kshakir
b88cfd2939
Updated MD5s of VCFs, since the approximate command line arguments injected into the VCF headers now have a little more order to them thanks to changes in the ParsingEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4538 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 03:07:40 +00:00