Commit Graph

2315 Commits (255cc246a2209f3a9a4b86fad74f0753a2ece575)

Author SHA1 Message Date
hanna 391f248640 Inserted a dangerous (but hidden) command-line argument for use by the Picard team.
Used to process intervals over BAMs without indices.  Tim understands the risks but
wants this anyway, as a temporary solution to a pipeline problem.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5148 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 22:10:06 +00:00
kiran cab426f86f VariantEval 3.0 is now in core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5139 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 17:42:08 +00:00
fromer c59b2a8296 Removed experimental "master merging" from CombineVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5138 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 17:13:05 +00:00
kiran b0432ee1e2 First part of a two-stage commit. Removing old VariantEval to make room for VariantEval 3.0 in core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5137 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 17:03:41 +00:00
ebanks d406d9b3fc There's no reason to special case no-calls if they already have PLs associated with them. Just use the PLs!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5136 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 15:05:45 +00:00
kiran 83dcca7e82 Added ability to load a GATKReport from disk.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5134 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 05:31:49 +00:00
depristo b5d1aab8dc Scripts to create the GATK IAM user and give him/her rights to PutObject (and only PutObject) into the S3 storage instance. Updated the GATKRunReport to now upload using the GATK user, not mark@depristo.com. Running with -et AWS_S3 sends run reports up to the Amazon S3 cloud now. Going to request a few external users try this option so we can see it running at scale. I'm sure S3 can handle a few hundred thousand 1Kb uploads per day, though.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5132 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 03:48:33 +00:00
depristo 197c91e2fb Working implementation of GATKRunReport POSTing to Amazon Web Services S3 storage. Requires users to explicitly provide the secret key to do the upload. Am investigating options to avoid having to do this in the future. Pretty cool little experiment for those who are interested in S3 interaction (extremely trivial)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5130 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-30 21:23:54 +00:00
depristo 8640ca6278 Trivial bug fix so that we don't bring the start up TraversalEngine banner twice when we only process a single locus
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5129 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-30 21:22:16 +00:00
scalvo 5934b9cb82 Augment function isChrM by allowing "CRS" in addition to "chrM" or "MT", as a standard contig name indicating the mitochondrial chromosome. CRS stands for Cambridge Reference Sequence and is the standard in the field.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5119 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 22:45:45 +00:00
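
Note: the contig-name check described in the commit above can be sketched roughly as follows. This is a hypothetical Python illustration, not the GATK's Java code; the function name `is_chr_m`, the set name, and the exact-match (case-sensitive) comparison are assumptions made for the example.

```python
# Hypothetical sketch of the check described above; the real isChrM is Java
# and may differ (for example in how it treats letter case).
MITO_CONTIG_NAMES = {"chrM", "MT", "CRS"}  # names listed in the commit message


def is_chr_m(contig_name: str) -> bool:
    """Return True if the contig name denotes the mitochondrial chromosome."""
    return contig_name in MITO_CONTIG_NAMES
```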
asivache 8d389e149f Now can deal with input files that contain multiple copies of the same event. Only one assay sequence will be designed for each distinct variant, redundant variants will be discarded. Redundancy is defined as same start, same variant type, same ref and alt alleles (it does not matter, e.g., what the sample was as we do not record sample information anywhere).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5115 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 21:42:29 +00:00
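
Note: the redundancy rule in the commit above (same start, same variant type, same ref and alt alleles, sample ignored) could be sketched like this. The `Variant` record and `drop_redundant` function are hypothetical names for illustration only; the actual tool is Java and its data model is not shown in this log.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical record; field names are assumptions for this sketch."""
    start: int
    variant_type: str
    ref: str
    alt: str
    sample: str  # deliberately ignored when testing for redundancy


def drop_redundant(variants: Iterable[Variant]) -> List[Variant]:
    """Keep the first variant seen for each (start, type, ref, alt) key."""
    seen = set()
    unique = []
    for v in variants:
        key = (v.start, v.variant_type, v.ref, v.alt)
        if key not in seen:
            seen.add(key)
            unique.append(v)
    return unique
```

Keying on a tuple keeps the first occurrence and preserves input order, so one assay sequence would be designed per distinct variant.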
kiran 9cb1ae384c Constant precision for floating point numbers. Added integration test - carries over tests from VariantEval with the necessary modifications to command-line arguments and md5s. Disabled use of 'synchronized' keyword because I clearly don't get how that keyword is supposed to work yet...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5107 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 05:19:18 +00:00
depristo f29bb0639b Documentation and cleanup of the distributed GATK implementation. Detailed documentation -- given that Matt will be extending the system in the near future -- about how the locking and processing trackers work. Added error trapping to note that distributed, shared-memory parallelism isn't yet implemented, instead of just not working silently. General utility function for the analysis of distributedGATK operation in the analysis directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5106 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 03:40:09 +00:00
asivache f036a178f1 Added support for MAF features. So far works for MAF Lite only, annotated MAF is NOT TESTED yet AT ALL.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5105 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 03:20:46 +00:00
asivache ac3fd567b4 Ugly off-by-one error fixed in building design sequences for indels: the event position is immediately *before* the event, so the ref base at the current locus is the base immediately *before* [ref/alt] element
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5103 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 02:53:03 +00:00
kiran 3e9f185dad Fixed issue with GenotypeConcordance being initialized incorrectly when the first seen comptrack had no samples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5102 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 01:12:27 +00:00
kiran 58f0ecff89 Fixes to support evaluations with TableType elements - each such object now gets a separate entry in the output table. Added codon degeneracy stratification. Handle null elements in reports (useful for debugging).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5101 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 22:09:59 +00:00
hanna a264b16358 Patch from Brett (with minor tweaking by me) to expose all the relationships
of a particular sample in hash format.  Thanks, Brett!


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5100 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 21:46:13 +00:00
depristo 61c29d550d Fix for NullPointer where a run starts but there's nothing to do (no shards) and reduceInit() wasn't being called correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5096 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 15:15:10 +00:00
ebanks d33162145b Moving the --sites_only argument up into the VCFWriter itself so that any walkers that write VCFs can choose not to emit genotypes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5088 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 19:38:16 +00:00
kiran 22e599ec76 Fixed output report to properly handle evaluation modules with TableType objects. Promoted CpG to a standard stratification. Demoted Filter to a non-standard stratification. Now, if the filter stratification is not specified, VariantEval only evaluates PASSing sites.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5084 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 17:38:21 +00:00
ebanks 0429301536 Added ability to output just sites (no genotypes) from UG with the --sites_only argument. Note that we do still genotype in this mode so that the INFO annotations are identical, but we strip the genotypes out of the VC right before writing to output. In other words, this is not designed to make UG go faster; the point here is to allow downstream tools not to have to parse GTs if they don't want to. Here you go, Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5081 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:52:38 +00:00
ebanks 01e032e89c Missorted BAMs are User Exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5080 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:09:39 +00:00
depristo be697d96f9 An apparently robust implementation of the file locking for distributed computation, using Lucene's file creation locking approach. It is worth trying out for those with large-scale, high-cost data sets. Details and discussion at group meeting on Wednesday. Some cleanup still needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5079 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 13:45:40 +00:00
delangel db2e2cb0ff Another trivial change to make VQSR work with indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5073 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 19:05:31 +00:00
hanna 9db02059ac Fix for Ryan's issue: reads ending with indel distort the location of the
pileup, resulting in two map() calls for the same locus (and no map call for
the locus immediately following).
Fixed bug and added comprehensive unit tests.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5067 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 19:49:39 +00:00
depristo c50f39a147 V3 of the distributed GATK. High-efficiency implementation. Support for status tracking for debugging and display. Still not safe for production use due to NFS filelock problem. V4 will use alternative file locking mechanism
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5063 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 16:45:07 +00:00
delangel fd864e8e3a Minimal necessary (but most likely not sufficient) changes to run VQSR on indel data: don't fill Ti/Tv fields if non-SNP, request VC only at start of position, check if isSNP() before doing SNP-specific operations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5062 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 02:36:36 +00:00
depristo a51061fd96 Improved distributed processing analytics. Still not 100% ready for prime-time. More improvements incoming. Iterator claim now supports requests to obtain in a single atomic claim (one lock) multiple sequential shards, which radically reduces overhead. However, deadlocking is still possible...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5061 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 16:17:25 +00:00
ebanks 2d4bcb60a1 Don't print out alt alleles for ref calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5060 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 06:33:31 +00:00
ebanks 2ba35dc7ba Bad chain files are user errors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5059 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 06:04:36 +00:00
ebanks 2bbcc9275a Committing the fragment-based calling code. Results look great in all datasets (will show this at 1000G this week with Ryan). Note that this is an intermediate commit. The code needs to be cleaned up and the fragmentation code needs to be moved up into LocusIteratorByState. This should all happen later this week, but I don't want Ryan to have to keep running from my own personal Sting directory. The current crappy implementation adds ~10% to the runtime, but that should all go away in the next iteration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5058 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 05:04:17 +00:00
depristo 9b1b8d46aa Performance tracking of GenomeLocProcessingTrackers, as well as a marker for where to put tracker in HierarchicalMicroScheduler
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5051 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:24:42 +00:00
rpoplin 95d6ddc38c lastProgressPrintTime should only be updated when a progress log is printed not when a performance log is printed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5050 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:23:14 +00:00
ebanks 78a43faebe Adding options to warn instead of erroring out (so that you can see all errors in one shot) and to skip filtered records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5042 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 05:24:28 +00:00
ebanks 02b5d4357f Deprecated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5041 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 05:05:07 +00:00
ebanks c3dbbe7f91 Bug fix: don't assume users won't use arbitrary rods on the commandline
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5040 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 04:59:28 +00:00
hanna aea121a9d5 <key>=<value> tagging support for command-line arguments. Unfortunately, still
very hard to validate and still very hard to use (requires core hacking to 
support additional tags).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5038 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 00:22:42 +00:00
depristo 85553cf5cb V2 cleaner, easily testable, shared memory and distributed GATK job management. Serious unit testing. Very much cleaner processing. Some code cleanup remains in removing now unused classes but the system is ready for general testing. Confirmed that one can run the UG 100 ways parallel without error, but edge cases may remain.
See documentation at:

http://www.broadinstitute.org/gsa/wiki/index.php/Parallelism_and_the_GATK#Distributed_Parallelism_.28Experimental.29

for examples on how to run this, or the testing Scala script

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5032 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:58:13 +00:00
depristo 41c8552d0a Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:54:03 +00:00
depristo cacdac3914 Major refactoring of shards. No longer uses interfaces but is now an actual object hierarchy with most of the important and common functionality pushed up to base classes. Eliminated a lot of duplicated code, and the shards are much more understandable now. Also now require a GenomeLocParser to work with their own GenomeLocs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5030 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:36:56 +00:00
rpoplin 24bc843ae8 Dynamically change the log message update rate so that short jobs receive frequent updates while longer running jobs receive fewer updates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5016 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 15:09:11 +00:00
rpoplin bd2af33a16 misc clean up in VQSR
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5014 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 21:04:31 +00:00
rpoplin 00453919d2 VQSR now only uses the valid polymorphic sites for training and truth sensitivity calculations. Any number of tracks whose ROD binding begins with the name truth can be used as truth sensitivity tracks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5012 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 20:48:19 +00:00
depristo f8ba76d87c Incremental commit for distributed computation. Appears to work but has potential deadlock situation not yet debugged. Do not use yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5010 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-17 21:23:09 +00:00
ebanks 366c3a0b8f Incompatible chain files are user exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5008 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-16 05:26:47 +00:00
hanna 579e0d59fa Rewrote warning message to discourage use of unsafe mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5003 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 21:32:53 +00:00
hanna af31d02a2d Fix concurrency issue that periodically kills VariantEvalIntegrationTest --
a member field of RMDTrackBuilder was getting rebuilt every time it was
called, creating concurrency issues.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5001 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 18:52:21 +00:00
hanna bfbf75fe3e Fix error in command-line validation: don't ever allow intervaled access to unindexed read stream, no
matter what type of traversal it is.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4997 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 02:49:04 +00:00
delangel 00310c05bb Fix corner condition that happens when there are indels right at the end of a contig and there's not enough reference to build a haplotype.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4996 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 21:08:22 +00:00
fromer b107c97c1a Cannot have "=" sign in reason, so change to ":"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4991 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 17:23:44 +00:00
fromer b4a2112a0d Added the "previous locus" to interesting sites VCF (locus with respect to which the site is phased)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4990 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 17:19:20 +00:00
fromer e8f0ae4b09 Renamed and documented some phasing-specific classes to make their purpose clearer to someone browsing through the code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4989 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 16:17:36 +00:00
fromer ffae7bf537 Moved phasing-specific utilities to phasing sub-directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4987 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 15:38:20 +00:00
rpoplin ce3d226183 Reverting back to the old definition of QD because it works better with large numbers of samples. The new QD is relegated to a new annotation: sumGLbyD. Tweaks to the new HaplotypeScore based on evaluation with better QD calculation. The default qual threshold in GenerateVariantClusters is updated to be in line with the variant quality scores coming from the exact model.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4984 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 14:12:30 +00:00
hanna e0092bb160 Experimental feature: change the rate at which log messages appear on-the-fly
and enable/disable performance logs from outside the JVM process.  Making this
available for the moment; we'll see whether it ends up being useful.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4983 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 04:20:53 +00:00
carneiro 9e93091e9a -baqGOP now takes phred scaled scores instead of probabilities in the command line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 00:06:38 +00:00
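
Note: for readers unfamiliar with the two scales mentioned in the commit above, the standard Phred relation is Q = -10 * log10(p). A minimal Python sketch of the conversion follows; the function names are hypothetical, and how the GATK applies the value of -baqGOP internally is not shown in this log.

```python
import math


def phred_to_probability(q: float) -> float:
    """Convert a Phred-scaled score Q to a probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def probability_to_phred(p: float) -> float:
    """Convert a probability p (0 < p <= 1) to a Phred-scaled score."""
    return -10 * math.log10(p)
```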
hanna 6d855041ec Oops...forgot to commit the changes that allow primitive VCF streaming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4979 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 21:54:51 +00:00
delangel 8a6b126ea8 Several cleanups to IndelMetricsByAC:
- No longer a standard eval module to keep integration tests happy
- Remove class name overlaps with SimpleMetricsByAC so that modules don't overwrite each other's files, and to make it easier to grep results.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4978 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 18:35:24 +00:00
depristo 8fe5641b2e can explicitly set the now required ReferenceDataSource in unit tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4977 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 18:25:12 +00:00
depristo 468ef382b7 vastly improved progress meter that estimates % of work done and time until the job finishes and time remaining. Reordered GATK core initialization order -- intervals are created before the scheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4975 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 17:32:27 +00:00
delangel bdd382198c Necessary changes to enable HaplotypeScore annotation for indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4974 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 01:09:12 +00:00
delangel 23597a2bde Variant Eval module that collects indel statistics (basic counts and event sizes) and partitions by AC (similar to SimpleMetricsByAC in the SNP case)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4973 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-12 01:08:09 +00:00
fromer 48052907a6 A hom genotype can always be considered phased
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4972 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-11 18:48:48 +00:00
fromer c2dd956888 Moved PrintReferenceVariantsWalker to playground
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4971 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 22:07:41 +00:00
ebanks ee348ac9d4 Add a hidden mode to the realigner to turn off SW but still use indels other than known ones (i.e. those already in the reads)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4969 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 20:27:04 +00:00
fromer 01c2091cd9 A LocusWalker to print the haploid reference genome as a VCF file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4968 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 16:59:41 +00:00
delangel 9648399630 Boneheaded silly bug in indel caller - posterior probability computation was using priors gotten from SNP heterozygosity, not indel heterozygosity. Added the indel het. argument to the command line and hooked it up (not a radical change in calls though, just a few dubious calls around the edges fall off)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4967 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 14:56:28 +00:00
aaron b24e1134f9 unfortunately samrecord pileup also uses zero length intervals to indicate deletions; this will have to be a BED specific exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4964 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 22:32:50 +00:00
kshakir b34e2f733f Removed stochasticity from IndelRealigner by random sampling using a seed based on the read list.
Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set.
Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found.
Now building all scala code at the same time, just like all java code is compiled at the same time.
Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile.
Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles.
Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified.
Fixed a couple errors when the <javadoc> task is run.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 22:03:36 +00:00
ebanks 60f45a7c49 Stupid me. Forgot to put this check in the last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4959 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 19:16:41 +00:00
aaron 56b87da8f9 a better error message for the situation where a RMD track generates a negative length interval; the user will now see a message like "Bad input: A feature produced by the reference metadata track named "bed" at position chr1:10434-10433 has a start greater than the stop; this is an invalid position "
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4958 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 19:06:04 +00:00
ebanks 4272b824d6 unused imports
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4957 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 18:33:12 +00:00
ebanks 2ac5c52281 Better error message as per Mark
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4953 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:44:02 +00:00
ebanks e0d091b3db Die gracefully if the bam is malformed with quals that are too high
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4952 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:39:08 +00:00
kiran d88fd7212f Changes to allow the primary key of a table to be hidden. Formatting changes to account for when that column is hidden.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4948 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:27:19 +00:00
kiran 307c41c128 Changes to allow the primary key of a table to be hidden. Formatting changes to account for when that column is hidden.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4947 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:26:38 +00:00
kiran e9201b81d1 A more general method for specifying samples to act on from the command-line. Supports samples specified individually on the console, a file of samples, or regular expressions to select multiple samples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4945 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 14:54:56 +00:00
carneiro 5e9a8f9cb3 Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base.
Adding the first version of the techdev pipeline (tdPipeline)




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 22:25:08 +00:00
aaron cba436fa2f small fix for the table codec; if you see a header line, you know you've finished parsing the header. Also some changes to return the ref ordered data pool test to using MappedStreamSegment instead of EntireStream
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4942 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 21:20:26 +00:00
fromer 4b37710bcd Added validator for phasing using read information, e.g., PacBio: ReadBasedPhasingValidationWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4940 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 20:05:56 +00:00
delangel d203f5e39a Experimental change in how we classify indels - up to now, an indel of say AA was counted as a 2-mer repeat expansion. But in reality, if the event is surrounded by A's it's really a multiple monomer expansion. So, we first reduce the indel bases in case they are made of repeated elements before classifying them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4939 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 17:13:18 +00:00
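
Note: the "reduce the indel bases" step described in the commit above can be illustrated by finding the shortest unit whose repetition reproduces the indel sequence, so that AA reduces to the monomer A. This is a hypothetical Python sketch of that idea, not the committed Java code; the function name is an assumption, and the check against the flanking reference bases mentioned in the message is not modeled here.

```python
def smallest_repeat_unit(bases: str) -> str:
    """Return the shortest substring that, repeated, rebuilds `bases`."""
    n = len(bases)
    for unit_len in range(1, n + 1):
        # Only lengths that divide n evenly can tile the whole sequence.
        if n % unit_len == 0 and bases[:unit_len] * (n // unit_len) == bases:
            return bases[:unit_len]
    return bases  # reached only for the empty string
```

With this reduction, `AA` is classified by its unit `A`, and `ACAC` by its unit `AC`, while a sequence such as `ACG` is left unchanged.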
rpoplin 4ac0590744 Fix for NaNs in the rank sum tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4938 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 15:21:30 +00:00
hanna 7cdaffbe5c Create tmpdir if it doesn't exist.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4936 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 03:07:11 +00:00
hanna 0982d35f5b Bug fixes in streaming in Tribble data via /dev/stdin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4935 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 02:43:04 +00:00
rpoplin 23dbc5ccf3 HaplotypeScore is revamped. It now uses reads' Cigar strings when building the haplotype blocks to skip over soft-clipped bases and factor in insertions and deletions. The statistic now uses only the reads from the filtered context to build the haplotypes but it scores all reads against the two best haplotypes. The score is now computed individually for each sample's reads and then averaged together. Bug fixes throughout. The math for the base quality and mapping quality rank sum tests is fixed. The annotations remain as ExperimentalAnnotations pending more investigation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4934 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 00:28:05 +00:00
ebanks 85714621be Better interface to Genotypelikelihoods class. Now you need to specify the format (GL vs PL) of the output string when calling getAsString(). All likelihoods are represented as GLs internally. QualByDepth no longer does its own conversion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4933 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-04 21:48:14 +00:00
ebanks 96729acd0d Optional argument to put the original position into the INFO field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4930 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-04 19:22:44 +00:00
delangel caedfed860 Fix bug where indels were being incorrectly classified in VariantEval module
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4929 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-04 18:01:48 +00:00
hanna 8d2c14b29c Update Picard / sam-jdk at Tim's request.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4925 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 02:17:25 +00:00
depristo d31c658c2e Organized performance monitoring passes unit tests and is more efficient
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4924 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 02:09:08 +00:00
depristo c51e745bae The engine can be null in a unit test, so check for it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4923 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 01:00:52 +00:00
depristo 75a7d8a76e Trivial formatting error
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4922 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:44:36 +00:00
depristo 5539c2d9f3 --performanceLog (-PF) X.dat argument now enabled. Writes out a table (R-friendly) of the performance of the GATK over time, exactly as a more detailed version of the INFO progress meter. R script for useful plotting of the performance of the GATK over time. Will be helpful for upcoming scalability testing and debugging of memory leaks and other incremental performance problems
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4921 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:34:21 +00:00
depristo 4c9746f463 Disabled performance log intermediate commit. Will be refactored and committed to the repository along with documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4919 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 22:18:12 +00:00
hanna 3fc9862964 Unit test fixed - Tribble codecs aren't designed to be stateless, but I was
using one as though it was.  Fixed, and debug code reverted.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4917 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 17:47:52 +00:00
hanna b9cb57f4b9 A unit test is failing on bamboo in a way I can't reproduce (or even explain).
Checking in some debugging info.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4916 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 16:35:04 +00:00
hanna cba18116e4 A significant refactoring of the ROD system, done largely to simplify the process of
streaming/piping VCFs into the GATK.  Notable changes:
- Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build 
  RMDTracks and lookup codecs.
- RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now
  manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class
  are the walkers that feel the need to access the ROD system directly.  (We need to stamp out
  this access pattern.)
A few minor warts were introduced as part of this process, labeled with TODOs.  These'll be
fixed as part of the VCF streaming project.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 04:52:22 +00:00
ebanks d70483c50a Automatically filter out reads with consecutive indel operators in the CIGAR string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4914 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 04:42:54 +00:00
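
Note: one way to detect the condition named in the commit above is to parse the CIGAR string into operators and look for an insertion (I) or deletion (D) immediately followed by another I or D. The Python sketch below is a hypothetical illustration; the function name and the exact definition of "consecutive" (any adjacent pair of I/D operators) are assumptions, and the GATK's own read filter is Java.

```python
import re

# CIGAR operators as defined by the SAM format: M I D N S H P = X
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_consecutive_indels(cigar: str) -> bool:
    """Return True if two indel operators (I or D) are adjacent in the CIGAR."""
    ops = [op for _length, op in CIGAR_OP.findall(cigar)]
    return any(a in "ID" and b in "ID" for a, b in zip(ops, ops[1:]))
```

A read with CIGAR `10M2I3D10M` would be flagged, while `10M2I10M3D5M` would pass because a match block separates the two indels.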
ebanks 848977678d No reason to convert the GLs to a String for formatting when they're just going to be converted to PLs later. That was 5% of the UG runtime...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4913 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-29 22:06:19 +00:00