5563 Commits (80d547ae71627e2f292a1a9c3d2b70f8e7efd76a)

Author SHA1 Message Date
droazen 80d547ae71 Fix for bug GSA-445: Sequence dictionary validation can be very slow with
large numbers of contigs. SequenceDictionaryUtils.getCommonContigsByName() was
running in O(n^2) time due to poor choice of data structure -- modified it to
run in O(n) time. Also removed an unnecessary O(n log n) step at another stage
in the sequence dictionary validation process. In tests with a 181,813-entry
sequence dictionary, runtime improved from an average of 21.4 minutes to 45.1
seconds.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5604 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 18:33:10 +00:00
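The commit message above does not say which data structure replaced the slow one. As a minimal sketch, assuming the fix amounts to swapping a nested list scan for hash-based lookups, the O(n) version of "find contigs common to two dictionaries" could look like the following. The class and method names here are hypothetical and are not the actual `SequenceDictionaryUtils` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the GATK implementation.
public class CommonContigs {

    /** Returns the names present in both lists, in the order of the first list. */
    public static List<String> commonContigsByName(List<String> first, List<String> second) {
        // Building the set is O(m); each lookup is O(1) expected,
        // so the whole pass is O(n + m) instead of O(n * m).
        Set<String> secondNames = new HashSet<>(second);
        List<String> common = new ArrayList<>();
        for (String name : first) {
            if (secondNames.contains(name)) {
                common.add(name);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("chr1", "chr2", "chrM");
        List<String> b = Arrays.asList("chrM", "chr1", "chrX");
        System.out.println(commonContigsByName(a, b)); // prints [chr1, chrM]
    }
}
```

Calling `List.contains` inside the loop instead of `Set.contains` would give the quadratic behavior the commit describes.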
chartl b81228fec1 Minor bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5603 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 17:30:40 +00:00
hanna 437db28937 Incorporating Khalid's feedback.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5602 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 16:22:49 +00:00
ebanks b6e7b5dace Updating to reflect my recent Tribble fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5601 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 11:48:00 +00:00
ebanks 4f17004590 Allow walkers to enforce the ordering in which ReadFilters are applied (so that they're now done in the order specified in the walker). Useful if you have a computationally expensive filter (like adaptor clipping) that should only be applied to reads passing all other filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5600 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:34:50 +00:00
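The idea in the commit above, applying filters in a declared order so that an expensive one only sees reads that passed the cheaper ones, can be sketched with short-circuit evaluation. This is an assumption-laden illustration: it models filters as plain `Predicate` objects that return true when a read passes, and the class name is hypothetical. It does not reproduce the GATK `ReadFilter` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration; reads are modeled as plain base strings.
public class OrderedFilters {

    /** Applies the filters in list order and stops at the first one that rejects. */
    public static <T> boolean passesAll(List<Predicate<T>> filters, T read) {
        for (Predicate<T> filter : filters) {
            if (!filter.test(read)) {
                return false; // later (possibly expensive) filters are never run
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(bases -> bases.length() >= 4);   // cheap check first
        filters.add(bases -> !bases.contains("N"));  // stand-in for a costly check
        System.out.println(passesAll(filters, "ACGT")); // prints true
        System.out.println(passesAll(filters, "AC"));   // prints false
    }
}
```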
hanna 53db7b8faa Did some refactoring which broke some unit tests, and then failed to run
the unit tests.  Definitely not my best effort...


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5599 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:31:52 +00:00
ebanks 74755cfd1c Adding a ReadFilter to hard-clip out bases from adaptor sequences. This is actually slightly more correct than having it be part of LocusIteratorByState because it allows us to remove reads that are complete garbage (and there are definitely some) based on the insert sizes. However, although conceptually this is great, it doesn't actually work. 'Why?' you may ask. Because when we hard-clip reads it often changes their start positions... which means that reads are no longer passed to LocusIteratorByState in coordinate order... which makes it (understandably) barf all over the place (and makes for some really fascinating SNP calls). This took me forever to find. I'm going to bed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5598 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:15:58 +00:00
ebanks cd61ef7169 Re-enabling multi-threaded integration tests. To make this work, downsampling and annotations are disabled for this test so that we don't have randomization issues for it based on which shards get executed first.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5597 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:07:39 +00:00
hanna 1763a41e94 Oops...broke a Queue compile dependency on the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5596 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 02:53:22 +00:00
hanna fece2167b3 Prototype implementation of protoshard merging when protoshard n and protoshard
n+1 completely overlap.  Gives a small but consistent performance increase in 
non-intervaled whole exome traversals (2.79min original, 2.69min revised). 
Needs a more in depth analysis of optimal shard sizing to determine a true
optimum.

Also renamed a variable because Khalid disapproved of my naming choices.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5595 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 02:09:14 +00:00
hanna 32d502c122 Enable BAM OTF index writing by default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5594 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 23:44:25 +00:00
chartl cc58e19621 This is now running. Expect results in a few weeks when the ~7k jobs have percolated through the week queue. Pray gsa1 doesn't go down.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5593 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 21:12:59 +00:00
chartl 6a26957b65 Bug squashing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5592 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 20:11:28 +00:00
chartl a1b7d28375 Initial VQSR full search script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5591 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 20:03:48 +00:00
droazen cb3e8aec5e Modified the buildfile and help extractor doclet so that help text is only
extracted from source files that have been modified since the help resource
file was last generated. This significantly speeds up builds where only a few 
source files have been modified, at the expense of making clean builds take 
slightly longer. Here's some performance data gathered by testing the old and 
new versions of extracthelp in isolation and averaging across 10 runs:
old extracthelp, 1 modified source file: 20.1 seconds
new extracthelp, 1 modified source file: 7.2 seconds <-- woohoo! :)
old extracthelp, clean build: 17.8 seconds
new extracthelp, clean build: 20.5 seconds


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5590 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 18:40:53 +00:00
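The selection rule described in the commit above (only re-extract help from sources modified after the help resource file was generated) reduces to a timestamp comparison. The following is a rough sketch under stated assumptions: modification times are passed in as a map rather than read from disk, and the class and method names are hypothetical. The real change lives in the buildfile and the help extractor doclet, which are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; timestamps are milliseconds since the epoch.
public class StaleSources {

    /** Returns the source files whose modification time is newer than the resource file's. */
    public static List<String> modifiedSince(Map<String, Long> sourceMtimes, long resourceMtime) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : sourceMtimes.entrySet()) {
            if (entry.getValue() > resourceMtime) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

The timings in the commit are consistent with this trade-off: the up-to-date check adds a small cost on a clean build, where every file is stale anyway, and saves most of the work when only one file has changed.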
ebanks af09170167 As I threatened yesterday, I've moved the various and disparate randomization code out of the walkers. Now they all (except VQSRv1, whose days are numbered anyways) use a static generator available in the engine itself. Please use this from now on. The seed is reset before every individual integration test is run. I think there may still be an issue with the IndelRealigner but I need to confirm with the commit to see what testNG does. Integration tests are already broken anyways, so no big deal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5589 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 17:03:48 +00:00
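A shared generator whose seed is reset before each test, as the commit above describes, might be sketched as follows. The class name, method names, and seed value are all hypothetical; the commit does not show the engine's actual API.

```java
import java.util.Random;

// Hypothetical illustration of a single engine-wide random generator.
public class SharedRandom {
    private static final long DEFAULT_SEED = 47382911L; // arbitrary, assumed value
    private static final Random GENERATOR = new Random(DEFAULT_SEED);

    /** Returns the one generator that all callers are expected to share. */
    public static Random get() {
        return GENERATOR;
    }

    /** Restores the generator to its initial state, e.g. before each integration test. */
    public static void reset() {
        GENERATOR.setSeed(DEFAULT_SEED);
    }
}
```

Because `Random.setSeed` restores the same state a freshly constructed `Random` with that seed would have, a test that calls `reset()` first sees the same sequence of values on every run, regardless of what earlier tests consumed.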
kshakir 45ebbf725c Instead of always merging Picard interval files they are optionally merged by Sting Utils.
Disabled the MFCP while the FCP gets an update.
Minor updates to email messages for upcoming scala 2.9.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5588 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 21:12:05 +00:00
carneiro 89bb21d024 typo in the argument description
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5587 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:45:32 +00:00
rpoplin febb883511 updates to MDCP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5586 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:44:46 +00:00
rpoplin 3f3f35dea0 UnifiedGenotyper now BAQs via ADD_TAG to facilitate using BAQed quals for GL calculations but unBAQed quals for annotation calculations. UnifiedGenotyper now produces SNP and indel calls simultaneously. 40 base mismatch intrinsic filter removed from UG to greatly simplify the code. RankSumTests are now standard annotations but the integration tests are commented out pending changes that will allow random annotations to work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5585 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:06:24 +00:00
ebanks 1aa4083352 Fortunately this code isn't used by anyone right now, but it needs to be fixed before someone unwittingly does: flags were wrong according to the SAM spec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5584 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 17:16:41 +00:00
hanna b231a40da5 Augment PrintLocusContextWalker with extended event info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5583 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 13:42:48 +00:00
aaron ab5c4064ed quick bug fix for variant context utils: only calculate the max AC if we're using the mergeInfoWithMaxAC flag, and if so deal with sites that have multiple alternate alleles correctly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5582 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 05:36:52 +00:00
rpoplin cc713f2769 fixing exception text
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5581 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 00:29:13 +00:00
ebanks 4b451314b2 Only store a read in the mate hash if it could possibly be moved. This reduces memory consumption especially when dealing with a case of tons of unmapped reads at the end of the bam; however, it's only mildly helpful for chr1 of the Papuans (there's a truly massive pileup 120Mb into it; more thought needed at a later point). Integration tests changed only because some of the reads in the original bam were busted to begin with (it's an old pilot 1000G bam).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5580 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 22:20:09 +00:00
chartl 79b5fa6cc5 Structural refactoring in advance of dichotomization statistics; generalization of statistical test infrastructure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5579 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 18:52:32 +00:00
asivache 77ca4eef31 IntelliJ complains that @Override is not allowed when implementing interface methods. Whatever.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5578 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 16:57:59 +00:00
ebanks f4c06bb4ce Traversal now says 'done with mapped reads' instead of 'done' so we don't confuse users when there are a lot of unmapped reads left to process.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5577 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 15:11:28 +00:00
fromer 5eccc7e528 Added annotation of INCORRECT SNP-based aa annotations in case of MNPdependentAA:true
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5576 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 02:46:45 +00:00
chartl bb6a30611c Forgot to modify the test too. What a bad commit. Sorry guys.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5575 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 02:11:08 +00:00
chartl a0d096c993 Forgot an import statement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5574 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 22:55:00 +00:00
chartl b52c3e7e30 Make the window and slide-by values command-line accessible, and standardize for every context. Move the test classes (which are abstract association context modules) into the proper directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5573 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 22:37:12 +00:00
droazen db9908ec02 Small correction to the unit test code from my last commit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5572 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:55:38 +00:00
droazen a5acb0b7a6 Fix for bug GSA-314: Detect -XL and -L incompatibility. An ArgumentException is
now thrown if the combination of -L and -XL intervals specified on the command 
line results in an empty interval set after set subtraction. 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5571 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:41:55 +00:00
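The check described in the commit above can be sketched as set subtraction followed by an emptiness test. This is a simplified illustration, assuming closed integer intervals on a single contig. `IllegalArgumentException` stands in for the `ArgumentException` named in the commit, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the GATK interval code.
public class IntervalSubtraction {

    /** Closed interval [start, stop] on one contig (a deliberate simplification). */
    public static final class Interval {
        final int start;
        final int stop;

        public Interval(int start, int stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return start + "-" + stop;
        }
    }

    /** Removes every excluded interval from the included ones; fails if nothing is left. */
    public static List<Interval> subtract(List<Interval> include, List<Interval> exclude) {
        List<Interval> current = new ArrayList<>(include);
        for (Interval x : exclude) {
            List<Interval> next = new ArrayList<>();
            for (Interval i : current) {
                if (x.stop < i.start || x.start > i.stop) {
                    next.add(i); // no overlap, keep as is
                    continue;
                }
                if (x.start > i.start) {
                    next.add(new Interval(i.start, x.start - 1)); // piece to the left
                }
                if (x.stop < i.stop) {
                    next.add(new Interval(x.stop + 1, i.stop)); // piece to the right
                }
            }
            current = next;
        }
        if (current.isEmpty()) {
            throw new IllegalArgumentException(
                "The -L and -XL intervals leave nothing to process");
        }
        return current;
    }
}
```

For example, including 1-100 and excluding 40-60 leaves 1-39 and 61-100, while excluding 1-100 raises the exception.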
hanna 798fb6a7a2 First draft of a script to measure performance of read walkers when merging
dynamically.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5570 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 15:35:14 +00:00
carneiro b722ebf244 quick help/comments updates to match the wikipage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5569 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 12:55:55 +00:00
rpoplin 96f0f0d706 Fixing use of String != String
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5568 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 01:12:00 +00:00
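The bug class named in the commit above is a general Java pitfall: `!=` on two `String` references compares object identity, not contents. A small self-contained illustration follows (the class and method names are hypothetical, and this is not the code that was changed):

```java
import java.util.Objects;

// Hypothetical illustration of reference vs. content comparison.
public class StringCompare {

    /** Null-safe content comparison: true when the two strings differ in value. */
    public static boolean differs(String a, String b) {
        return !Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String a = "chr1";
        String b = new String("chr1"); // same characters, distinct object
        System.out.println(a != b);        // prints true: the references differ
        System.out.println(differs(a, b)); // prints false: the contents are equal
    }
}
```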
depristo 095125152b Updated to no longer include 2nd-best base output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5567 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 20:13:10 +00:00
rpoplin b2a0331e2d Pushing hard coded arguments into VariantRecalibratorArgumentCollection
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5566 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 19:55:09 +00:00
rpoplin 79c43845ad Changing Uniform approximation to Normal approximation in rank sum test. n factorial was overflowing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5565 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 18:18:39 +00:00
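The commit above gives no formula, so the following is only a sketch of the textbook normal approximation for the rank sum (Mann-Whitney U) statistic, which needs no factorials at all. It assumes the standard mean n1·n2/2 and variance n1·n2·(n1+n2+1)/12, counts ties as one half, and omits the tie correction to the variance. The class and method names are hypothetical.

```java
// Hypothetical illustration; not the GATK rank sum implementation.
public class RankSumNormalApprox {

    /** U statistic for x: the number of (x, y) pairs with x > y, ties counted as 0.5. */
    public static double uStatistic(double[] x, double[] y) {
        double u = 0.0;
        for (double xi : x) {
            for (double yj : y) {
                if (xi > yj) {
                    u += 1.0;
                } else if (xi == yj) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    /** z-score of U under the null hypothesis, using the normal approximation. */
    public static double zScore(double[] x, double[] y) {
        double n1 = x.length;
        double n2 = y.length;
        double mean = n1 * n2 / 2.0;
        double sd = Math.sqrt(n1 * n2 * (n1 + n2 + 1.0) / 12.0);
        return (uStatistic(x, y) - mean) / sd;
    }
}
```

All arithmetic here stays in `double` and grows only polynomially with the sample sizes. By contrast, a formula that evaluates n! directly runs out of range quickly: 21! already exceeds what a 64-bit `long` can hold.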
depristo b316c9a590 Renamed StratifyAlignmentContext to AlignmentContextUtils, and StatiefyContextType to ReadOrientation. Also, went through the system and deleted all references to second bases. That ship passed long ago. This was the actual commit, the last was an intellij error
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5564 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 15:36:17 +00:00
depristo 349661b958 Renamed StratifyAlignmentContext to AlignmentContextUtils, and StatiefyContextType to ReadOrientation. Also, went through the system and deleted all references to second bases. That ship passed long ago.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5563 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 15:35:09 +00:00
depristo 5cca100aea Eliminated the redundant StratifiedAlignmentContext, which previously just held a ReadBackedPileup, and made all of the class methods here just static functions. Far more logical organization, and avoided O(N) endless copying of data for the COMPLETE context. Many tools have been trivially reorganized to take an alignment context now. Everything passes integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5562 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 14:20:43 +00:00
rpoplin 40a25af58e Bug fixes in MDCP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5561 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 00:04:38 +00:00
rpoplin 98798eb276 Adding ReadPos rank sum test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5560 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 22:28:41 +00:00
rpoplin 09e89c8c97 Adding ReadPos rank sum test. Transitioned rank sum tests over to using Chris's implementation in order to harmonize the codebase. There isn't any reason to have competing implementations of rank sum. Thanks to Chris for adding the necessary hypothesis testing options. WilcoxonRankSum.java will be deleted soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5559 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 22:26:35 +00:00
depristo 11822da578 Standalone, GATK-dependent tool that reads a list of BAM files and slices all of them into a single merged BAM file containing the reads that overlap a chr:start-stop interval. Highly efficient when working with thousands of BAM files. Can merge 1MB of sequence of 1600 4x BAMs in 4g in only 2 hours.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5558 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 13:41:29 +00:00
depristo 0b02804dcb Scripts for creating S3 IGV account
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5557 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 13:39:47 +00:00
depristo 8fdad20f33 Useful utility for looking at the file size of GSA file systems
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5556 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 03:47:27 +00:00
depristo f59862dc44 A bit better echos
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5555 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-02 03:47:03 +00:00