hanna
391f248640
Inserted a dangerous (but hidden) command-line argument for use by the Picard team.
...
Used to process intervals over BAMs without indices. Tim understands the risks but
wants this anyway, as a temporary solution to a pipeline problem.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5148 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 22:10:06 +00:00
kiran
cab426f86f
VariantEval 3.0 is now in core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5139 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 17:42:08 +00:00
fromer
c59b2a8296
Removed experimental "master merging" from CombineVariants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5138 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 17:13:05 +00:00
kiran
b0432ee1e2
First part of a two-stage commit. Removing old VariantEval to make room for VariantEval 3.0 in core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5137 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 17:03:41 +00:00
ebanks
d406d9b3fc
There's no reason to special case no-calls if they already have PLs associated with them. Just use the PLs!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5136 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 15:05:45 +00:00
kiran
83dcca7e82
Added ability to load a GATKReport from disk.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5134 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 05:31:49 +00:00
depristo
b5d1aab8dc
Scripts to create the GATK IAM user and give him/her rights to PutObject (and only PutObject) into the S3 storage instance. Updated the GATKRunReport to now upload using the GATK user, not mark@depristo.com. Running with -et AWS_S3 sends run reports up to the Amazon S3 cloud now. Going to request that a few external users try this option so we can see it running at scale. I'm sure S3 can handle a few hundred thousand 1Kb uploads per day, though
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5132 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 03:48:33 +00:00
depristo
197c91e2fb
Working implementation of GATKRunReport POSTing to Amazon Web Services S3 storage. Requires users to explicitly provide the secret key to do the upload. Am investigating options to avoid having to do this in the future. Pretty cool little experiment for those who are interested in S3 interaction (extremely trivial)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5130 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-30 21:23:54 +00:00
depristo
8640ca6278
Trivial bug fix so that we don't bring the start up TraversalEngine banner twice when we only process a single locus
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5129 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-30 21:22:16 +00:00
scalvo
5934b9cb82
Augment function isChrM by allowing "CRS" in addition to "chrM" or "MT", as a standard contig name indicating the mitochondrial chromosome. CRS stands for Cambridge Reference Sequence and is the standard in the field.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5119 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 22:45:45 +00:00
asivache
8d389e149f
Now can deal with input files that contain multiple copies of the same event. Only one assay sequence will be designed for each distinct variant, redundant variants will be discarded. Redundancy is defined as same start, same variant type, same ref and alt alleles (it does not matter, e.g., what the sample was as we do not record sample information anywhere).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5115 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 21:42:29 +00:00
kiran
9cb1ae384c
Constant precision for floating point numbers. Added integration test - carries over tests from VariantEval with the necessary modifications to command-line arguments and md5s. Disabled use of 'synchronized' keyword because I clearly don't get how that keyword is supposed to work yet...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5107 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 05:19:18 +00:00
depristo
f29bb0639b
Documentation and cleanup of the distributed GATK implementation. Detailed documentation -- given that Matt will be extending the system in the near future -- about how the locking and processing trackers work. Added error trapping to note that distributed, shared-memory parallelism isn't yet implemented, instead of just not working silently. General utility function for the analysis of distributedGATK operation in the analysis directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5106 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 03:40:09 +00:00
asivache
f036a178f1
Added support for MAF features. So far works for MAF Lite only, annotated MAF is NOT TESTED yet AT ALL.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5105 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 03:20:46 +00:00
asivache
ac3fd567b4
Ugly off-by-one error fixed in building design sequences for indels: the event position is immediately *before* the event, so the ref base at the current locus is the base immediately *before* [ref/alt] element
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5103 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 02:53:03 +00:00
kiran
3e9f185dad
Fixed issue with GenotypeConcordance being initialized incorrectly when the first seen comptrack had no samples.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5102 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 01:12:27 +00:00
kiran
58f0ecff89
Fixes to support evaluations with TableType elements - each such object now gets a separate entry in the output table. Added codon degeneracy stratification. Handle null elements in reports (useful for debugging).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5101 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 22:09:59 +00:00
hanna
a264b16358
Patch from Brett (with minor tweaking by me) to expose all the relationships
...
of a particular sample in hash format. Thanks, Brett!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5100 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 21:46:13 +00:00
depristo
61c29d550d
Fix for NullPointer where a run starts but there's nothing to do (no shards) and reduceInit() wasn't being called correctly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5096 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 15:15:10 +00:00
ebanks
d33162145b
Moving the --sites_only argument up into the VCFWriter itself so that any walkers that write VCFs can choose not to emit genotypes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5088 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 19:38:16 +00:00
kiran
22e599ec76
Fixed output report to properly handle evaluation modules with TableType objects. Promoted CpG to a standard stratification. Demoted Filter to a non-standard stratification. Now, if the filter stratification is not specified, VariantEval only evaluates PASSing sites.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5084 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 17:38:21 +00:00
ebanks
0429301536
Added ability to output just sites (no genotypes) from UG with the --sites_only argument. Note that we do still genotype in this mode so that the INFO annotations are identical, but we strip the genotypes out of the VC right before writing to output. In other words, this is not designed to make UG go faster; the point here is to allow downstream tools not to have to parse GTs if they don't want to. Here you go, Ryan.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5081 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:52:38 +00:00
ebanks
01e032e89c
Missorted BAMs are User Exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5080 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:09:39 +00:00
depristo
be697d96f9
An apparently robust implementation of the file locking for distributed computation, using Lucene's file creation locking approach. It is worth trying out for those with large-scale, high-cost data sets. Details and discussion at group meeting on Wednesday. Some cleanup still needed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5079 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 13:45:40 +00:00
delangel
db2e2cb0ff
Another trivial change to make VQSR work with indels
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5073 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 19:05:31 +00:00
hanna
9db02059ac
Fix for Ryan's issue: reads ending with indel distort the location of the
...
pileup, resulting in two map() calls for the same locus (and no map call for
the locus immediately following).
Fixed bug and added comprehensive unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5067 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 19:49:39 +00:00
depristo
c50f39a147
V3 of the distributed GATK. High-efficiency implementation. Support for status tracking for debugging and display. Still not safe for production use due to NFS filelock problem. V4 will use alternative file locking mechanism
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5063 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 16:45:07 +00:00
delangel
fd864e8e3a
Minimal necessary (but most likely not sufficient) changes to run VQSR on indel data: don't fill Ti/Tv fields if non-SNP, request VC only at start of position, check if isSNP() before doing SNP-specific operations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5062 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 02:36:36 +00:00
depristo
a51061fd96
Improved distributed processing analytics. Still not 100% ready for prime-time. More improvements incoming. Iterator claim now supports requests to obtain in a single atomic claim (one lock) multiple sequential shards, which radically reduces overhead. However, deadlocking is still possible...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5061 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 16:17:25 +00:00
ebanks
2d4bcb60a1
Don't print out alt alleles for ref calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5060 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 06:33:31 +00:00
ebanks
2ba35dc7ba
Bad chain files are user errors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5059 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 06:04:36 +00:00
ebanks
2bbcc9275a
Committing the fragment-based calling code. Results look great in all datasets (will show this at 1000G this week with Ryan). Note that this is an intermediate commit. The code needs to be cleaned up and the fragmentation code needs to be moved up into LocusIteratorByState. This should all happen later this week, but I don't want Ryan to have to keep running from my own personal Sting directory. The current crappy implementation adds ~10% to the runtime, but that should all go away in the next iteration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5058 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 05:04:17 +00:00
depristo
9b1b8d46aa
Performance tracking of GenomeLocProcessingTrackers, as well as a marker for where to put tracker in HierarchicalMicroScheduler
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5051 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:24:42 +00:00
rpoplin
95d6ddc38c
lastProgressPrintTime should only be updated when a progress log is printed not when a performance log is printed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5050 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:23:14 +00:00
ebanks
78a43faebe
Adding options to warn instead of erroring out (so that you can see all errors in one shot) and to skip filtered records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5042 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 05:24:28 +00:00
ebanks
02b5d4357f
Deprecated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5041 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 05:05:07 +00:00
ebanks
c3dbbe7f91
Bug fix: don't assume users won't use arbitrary rods on the commandline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5040 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 04:59:28 +00:00
hanna
aea121a9d5
<key>=<value> tagging support for command-line arguments. Unfortunately, still
...
very hard to validate and still very hard to use (requires core hacking to
support additional tags).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5038 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 00:22:42 +00:00
depristo
85553cf5cb
V2: cleaner, easily testable shared-memory and distributed GATK job management. Serious unit testing. Very much cleaner processing. Some code cleanup remains in removing now unused classes but the system is ready for general testing. Confirmed that one can run the UG 100 ways parallel without error, but edge cases may remain.
...
See documentation at:
http://www.broadinstitute.org/gsa/wiki/index.php/Parallelism_and_the_GATK#Distributed_Parallelism_.28Experimental.29
for examples on how to run this, or the testing Scala script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5032 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:58:13 +00:00
depristo
41c8552d0a
Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:54:03 +00:00
depristo
cacdac3914
Major refactoring of shards. No longer uses interfaces but is now an actual object hierarchy with most of the important and common functionality pushed up to base classes. Eliminated a lot of duplicated code, and the shards are much more understandable now. Also now require a GenomeLocParser to work with their own GenomeLocs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5030 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:36:56 +00:00
rpoplin
24bc843ae8
Dynamically change the log message update rate so that short jobs receive frequent updates while longer running jobs receive fewer updates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5016 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 15:09:11 +00:00
rpoplin
bd2af33a16
misc clean up in VQSR
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5014 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 21:04:31 +00:00
rpoplin
00453919d2
VQSR now only uses the valid polymorphic sites for training and truth sensitivity calculations. Any number of tracks whose ROD binding begins with the name truth can be used as truth sensitivity tracks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5012 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 20:48:19 +00:00
depristo
f8ba76d87c
Incremental commit for distributed computation. Appears to work but has potential deadlock situation not yet debugged. Do not use yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5010 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-17 21:23:09 +00:00
ebanks
366c3a0b8f
Incompatible chain files are user exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5008 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-16 05:26:47 +00:00
hanna
579e0d59fa
Rewrote warning message to discourage use of unsafe mode.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5003 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 21:32:53 +00:00
hanna
af31d02a2d
Fix concurrency issue that periodically kills VariantEvalIntegrationTest --
...
a member field of RMDTrackBuilder was getting rebuilt every time it was
called, creating concurrency issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5001 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 18:52:21 +00:00
hanna
bfbf75fe3e
Fix error in command-line validation: don't ever allow intervaled access to unindexed read stream, no
...
matter what type of traversal it is.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4997 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 02:49:04 +00:00
delangel
00310c05bb
Fix corner condition that happens when there are indels right at the end of a contig and there's not enough reference to build a haplotype.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4996 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 21:08:22 +00:00
fromer
b107c97c1a
Cannot have "=" sign in reason, so change to ":"
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4991 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 17:23:44 +00:00
fromer
b4a2112a0d
Added the "previous locus" to interesting sites VCF (locus with respect to which the site is phased)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4990 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 17:19:20 +00:00
fromer
e8f0ae4b09
Renamed and documented some phasing-specific classes to make their purpose clearer to someone browsing through the code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4989 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 16:17:36 +00:00
fromer
ffae7bf537
Moved phasing-specific utilities to phasing sub-directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4987 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 15:38:20 +00:00
rpoplin
ce3d226183
Reverting back to the old definition of QD because it works better with large numbers of samples. The new QD is relegated to a new annotation: sumGLbyD. Tweaks to the new HaplotypeScore based on evaluation with better QD calculation. The default qual threshold in GenerateVariantClusters is updated to be in line with the variant quality scores coming from the exact model.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4984 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 14:12:30 +00:00
hanna
e0092bb160
Experimental feature: change the rate at which log messages appear on-the-fly
...
and enable/disable performance logs from outside the JVM process. Making this
available for the moment; we'll see whether it ends up being useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4983 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 04:20:53 +00:00
carneiro
9e93091e9a
-baqGOP now takes phred scaled scores instead of probabilities in the command line.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 00:06:38 +00:00
hanna
6d855041ec
Oops...forgot to commit the changes that allow primitive VCF streaming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4979 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 21:54:51 +00:00
delangel
8a6b126ea8
Several cleanups to IndelMetricsByAC:
...
- No longer a standard eval module to keep integration tests happy
- Remove class name overlaps with SimpleMetricsByAC so that modules don't overwrite each other's files, and to make it easier to grep results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4978 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 18:35:24 +00:00
depristo
8fe5641b2e
can explicitly set the now required ReferenceDataSource in unit tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4977 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 18:25:12 +00:00
depristo
468ef382b7
vastly improved progress meter that estimates % of work done and time until the job finishes and time remaining. Reordered GATK core initialization order -- intervals are created before the scheduler.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4975 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 17:32:27 +00:00
delangel
bdd382198c
Necessary changes to enable HaplotypeScore annotation for indels
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4974 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 01:09:12 +00:00
delangel
23597a2bde
Variant Eval module that collects indel statistics (basic counts and event sizes) and partitions by AC (similar to SimpleMetricsByAC in the SNP case)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4973 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 01:08:09 +00:00
fromer
48052907a6
A hom genotype can always be considered phased
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4972 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-11 18:48:48 +00:00
fromer
c2dd956888
Moved PrintReferenceVariantsWalker to playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4971 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 22:07:41 +00:00
ebanks
ee348ac9d4
Add a hidden mode to the realigner to turn off SW but still use indels other than known ones (i.e. those already in the reads)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4969 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 20:27:04 +00:00
fromer
01c2091cd9
A LocusWalker to print the haploid reference genome as a VCF file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4968 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 16:59:41 +00:00
delangel
9648399630
Boneheaded silly bug in indel caller - posterior probability computation was using priors gotten from SNP heterozygosity, not indel heterozygosity. Added the indel het. argument to the command line and hooked it up (not a radical change in calls though, just a few dubious calls around the edges fall off)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4967 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 14:56:28 +00:00
aaron
b24e1134f9
unfortunately samrecord pileup also uses zero length intervals to indicate deletions; this will have to be a BED specific exception.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4964 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 22:32:50 +00:00
kshakir
b34e2f733f
Removed stochasticity from IndelRealigner by random sampling using a seed based on the read list.
...
Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set.
Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found.
Now building all scala code at the same time, just like all java code is compiled at the same time.
Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile.
Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles.
Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified.
Fixed a couple errors when the <javadoc> task is run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 22:03:36 +00:00
ebanks
60f45a7c49
Stupid me. Forgot to put this check in the last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4959 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 19:16:41 +00:00
aaron
56b87da8f9
a better error message for the situation where a RMD track generates a negative length interval; the user will now see a message like "Bad input: A feature produced by the reference metadata track named "bed" at position chr1:10434-10433 has a start greater than the stop; this is an invalid position "
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4958 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 19:06:04 +00:00
ebanks
4272b824d6
unused imports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4957 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 18:33:12 +00:00
ebanks
2ac5c52281
Better error message as per Mark
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4953 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:44:02 +00:00
ebanks
e0d091b3db
Die gracefully if the bam is malformed with quals that are too high
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4952 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:39:08 +00:00
kiran
d88fd7212f
Changes to allow the primary key of a table to be hidden. Formatting changes to account for when that column is hidden.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4948 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:27:19 +00:00
kiran
307c41c128
Changes to allow the primary key of a table to be hidden. Formatting changes to account for when that column is hidden.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4947 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:26:38 +00:00
kiran
e9201b81d1
A more general method for specifying samples to act on from the command-line. Supports samples specified individually on the console, a file of samples, or regular expressions to select multiple samples.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4945 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 14:54:56 +00:00
carneiro
5e9a8f9cb3
Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base.
...
Adding the first version of the techdev pipeline (tdPipeline)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 22:25:08 +00:00
aaron
cba436fa2f
small fix for the table codec; if you see a header line, you know you've finished parsing the header. Also some changes to return the ref ordered data pool test to using MappedStreamSegment instead of EntireStream
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4942 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 21:20:26 +00:00
fromer
4b37710bcd
Added validator for phasing using read information, e.g., PacBio: ReadBasedPhasingValidationWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4940 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 20:05:56 +00:00
delangel
d203f5e39a
Experimental change in how we classify indels - up to now, an indel of say AA was counted as a 2-mer repeat expansion. But in reality, if the event is surrounded by A's it's really a multiple monomer expansion. So, we first reduce the indel bases in case they are made of repeated elements before classifying them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4939 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 17:13:18 +00:00
rpoplin
4ac0590744
Fix for NaNs in the rank sum tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4938 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 15:21:30 +00:00
hanna
7cdaffbe5c
Create tmpdir if it doesn't exist.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4936 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 03:07:11 +00:00
hanna
0982d35f5b
Bug fixes in streaming in Tribble data via /dev/stdin.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4935 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 02:43:04 +00:00
rpoplin
23dbc5ccf3
HaplotypeScore is revamped. It now uses reads' Cigar strings when building the haplotype blocks to skip over soft-clipped bases and factor in insertions and deletions. The statistic now uses only the reads from the filtered context to build the haplotypes but it scores all reads against the two best haplotypes. The score is now computed individually for each sample's reads and then averaged together. Bug fixes throughout. The math for the base quality and mapping quality rank sum tests is fixed. The annotations remain as ExperimentalAnnotations pending more investigation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4934 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 00:28:05 +00:00
ebanks
85714621be
Better interface to GenotypeLikelihoods class. Now you need to specify the format (GL vs PL) of the output string when calling getAsString(). All likelihoods are represented as GLs internally. QualByDepth no longer does its own conversion.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4933 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-04 21:48:14 +00:00
ebanks
96729acd0d
Optional argument to put the original position into the INFO field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4930 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-04 19:22:44 +00:00
delangel
caedfed860
Fix bug where indels were being incorrectly classified in VariantEval module
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4929 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-04 18:01:48 +00:00
hanna
8d2c14b29c
Update Picard / sam-jdk at Tim's request.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4925 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 02:17:25 +00:00
depristo
d31c658c2e
Organized performance monitoring passes unit tests and is more efficient
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4924 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 02:09:08 +00:00
depristo
c51e745bae
The engine can be null in a unit test, so check for it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4923 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 01:00:52 +00:00
depristo
75a7d8a76e
Trivial formatting error
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4922 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:44:36 +00:00
depristo
5539c2d9f3
--performanceLog (-PF) X.dat argument now enabled. Writes out a table (R-friendly) of the performance of the GATK over time, exactly as a more detailed version of the INFO progress meter. R script for useful plotting of the performance of the GATK over time. Will be helpful for upcoming scalability testing and debugging of memory leaks and other incremental performance problems
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4921 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:34:21 +00:00
depristo
4c9746f463
Disabled performance log intermediate commit. Will be refactored and committed to the repository along with documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4919 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 22:18:12 +00:00
hanna
3fc9862964
Unit test fixed - Tribble codecs aren't designed to be stateless, but I was
...
using one as though it was. Fixed, and debug code reverted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4917 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 17:47:52 +00:00
hanna
b9cb57f4b9
A unit test is failing on bamboo in a way I can't reproduce (or even explain).
...
Checking in some debugging info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4916 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 16:35:04 +00:00
hanna
cba18116e4
A significant refactoring of the ROD system, done largely to simplify the process of
...
streaming/piping VCFs into the GATK. Notable changes:
- Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build
RMDTracks and lookup codecs.
- RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now
manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class
are the walkers that feel the need to access the ROD system directly. (We need to stamp out
this access pattern.)
A few minor warts were introduced as part of this process, labeled with TODOs. These'll be
fixed as part of the VCF streaming project.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 04:52:22 +00:00
ebanks
d70483c50a
Automatically filter out reads with consecutive indel operators in the CIGAR string
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4914 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 04:42:54 +00:00
ebanks
848977678d
No reason to convert the GLs to a String for formatting when they're just going to be converted to PLs later. That was 5% of the UG runtime...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4913 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-29 22:06:19 +00:00