Commit Graph

11366 Commits (14944b5d730c52d73765c9de41d8d157be2e6418)

Author SHA1 Message Date
Mark DePristo 14944b5d73 Incorporating clover into build.xml
-- See http://gatkforums.broadinstitute.org/discussion/2002/clover-coverage-analysis-with-ant for use docs
-- Fix for artificial reads not having proper read groups, causing NPE in some tests
-- Added clover itself to private/resources
2012-12-24 13:35:57 -05:00
Mark DePristo 7796ba7601 Minor optimizations for NanoScheduler
-- Reducer.maybeReleaseLatch is no longer synchronized
-- NanoScheduler only prints progress every 100 or so map calls
2012-12-24 13:35:56 -05:00
Mark DePristo 0f04485c24 NanoScheduler optimization: don't use a PriorityBlockingQueue for the MapResultsQueue
-- Created a separate MapResultsQueue object with a limited interface, replacing what was previously a bare PriorityBlockingQueue.
-- The MapResultsQueue is now backed by a synchronized ExpandingArrayList, since job ids are integers incrementing from 0 to N.  This means we avoid the n log n sort in the priority queue which was generating a lot of cost in the reduce step
-- Had to update ReducerUnitTest because the test itself was brittle, and broken when I changed the underlying code.
-- A few bits of minor code cleanup through the system (removing unused constructors, local variables, etc)
-- ExpandingArrayList called ensureCapacity so that we increase the size of the arraylist once to accommodate the upcoming size needs
2012-12-24 13:35:56 -05:00
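Note on the MapResultsQueue change above: because job ids are consecutive integers starting at 0, each result can be stored at the index given by its id, and the reducer only needs to check whether the next slot is filled. The following is a minimal sketch of that idea, not the GATK implementation. The class name IndexedResultsQueue and its methods are hypothetical, and a plain java.util.ArrayList stands in for ExpandingArrayList.

```java
import java.util.ArrayList;

/** Hypothetical stand-in for the MapResultsQueue idea; not the GATK class. */
final class IndexedResultsQueue<T> {
    // Slot i holds the result of job i, or null if that job has not finished
    // (or its result has already been consumed).
    private final ArrayList<T> slots = new ArrayList<>();
    private int nextJobId = 0; // id of the next result the reducer may consume

    /** Stores the result of job jobId by index; no comparison or sorting is involved. */
    public synchronized void put(int jobId, T result) {
        if (result == null) throw new IllegalArgumentException("result must be non-null");
        if (jobId < nextJobId) throw new IllegalArgumentException("job " + jobId + " was already consumed");
        slots.ensureCapacity(jobId + 1);               // grow the backing array once
        while (slots.size() <= jobId) slots.add(null); // pad with empty slots up to jobId
        slots.set(jobId, result);
    }

    /** True if the result for the next job id in sequence has arrived. */
    public synchronized boolean nextIsReady() {
        return nextJobId < slots.size() && slots.get(nextJobId) != null;
    }

    /** Returns results strictly in job-id order. */
    public synchronized T take() {
        if (!nextIsReady()) throw new IllegalStateException("next result is not available yet");
        T result = slots.set(nextJobId, null); // set() returns the old value; clearing releases the reference
        nextJobId++;
        return result;
    }
}
```

In this sketch the list keeps one (cleared) slot per consumed job, so it grows with the number of jobs; a production version would presumably recycle or reset the storage between batches.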
Mark DePristo b92f563d06 NanoScheduler optimization for TraverseReadsNano
-- Pre-read MapData into a list, which is actually faster than dealing with future lock contention issues with lots of map threads
-- Increase the default ReadShard size to 100K reads
2012-12-24 13:35:56 -05:00
Mark DePristo f849910c4e BQSR optimization: only compute BAQ when there's at least one error to delocalize
-- Saves something like 2/3 of the compute cost of BQSR
2012-12-24 13:35:56 -05:00
Mark DePristo eedc01ff22 BQSRPerformanceOverTime Qscript to assess how much we are improving BQSR 2012-12-24 13:35:09 -05:00
Mark DePristo 0f0188ddb1 Optimization of BQSR
-- Created a ReadRecalibrationInfo class that holds all of the information (read, base quality vectors, error vectors) for a read for the call to updateDataForRead in RecalibrationEngine.  This object has a restrictive interface to just get information about specific qual and error values at offset and for event type.  This restriction allows us to avoid creating a vector of byte 45 for each read to represent BI and BD values not in the reads.  Shaves 5% of the runtime off the entire code.
-- Cleaned up code and added lots more docs
-- With this commit we no longer have much in the way of low-hanging fruit left in the optimization of BQSR.  95% of the runtime is spent in BAQing the read, and updating the RecalData in the NestedIntegerArrays.
2012-12-24 13:35:09 -05:00
Mark DePristo f6d5499582 The GATK engine now ensures that incoming GATKSAMRecords have GATKSAMReadGroupRecord objects in their header
-- Update SAMDataSource so that the merged header contains GATKSAMReadGroupRecord
-- Now getting the NGSPlatform for a GATKSAMRecord is actually efficient, instead of computing the NGS platform over and over from the PL string
-- Updated a few places in the code where the input argument is actually a GATKSAMRecord, not a SAMRecord for type safety
2012-12-24 13:35:09 -05:00
Ryan Poplin c8cd6ac465 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-20 14:58:04 -05:00
Ryan Poplin a098888f4d Updating missed UG md5 2012-12-20 14:57:53 -05:00
Tad Jordan b491c177ff Added functionality of outputting sorted GATKReport Tables
- Added an optional argument to BaseRecalibrator to produce sorted GATKReport Tables
- Modified BQSR Integration Tests to include the optional argument. Tests now produce sorted tables
2012-12-20 14:02:21 -05:00
Ryan Poplin b0cb513793 Adding a manual reviews file for the NA12878 knowledge base 2012-12-20 11:54:23 -05:00
Eric Banks d7dc370160 Re-enabling 1000G exome chip in the KB now that we have a good copy of the VCF 2012-12-20 00:16:33 -05:00
Eric Banks 6c3f5eefe9 Merged bug fix from Stable into Unstable 2012-12-19 22:29:21 -05:00
xingwei2012 22d13ccdab Bug fix for Queue LSF v8.3
The function ls_getLicenseUsage() is not supported by LSF v8.x, so comment out the line:

public static native lsfLicUsage.ByReference ls_getLicenseUsage()

Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-12-19 22:28:53 -05:00
David Roazen 828232d8d7 Fix a few fully-qualified usages of VariantContext classes in the HybridSelectionPipeline
Other QScripts may need to be updated as well to reflect the new package names.
2012-12-19 16:46:00 -05:00
Eric Banks a5f0fb3c4f Added some reviews manually 2012-12-19 15:22:12 -05:00
Eric Banks 58e7a7f6f3 Don't use exome chip for NA12878 KB because it's bad data 2012-12-19 14:26:39 -05:00
Eric Banks 4e173ef3b3 Merged bug fix from Stable into Unstable 2012-12-19 11:47:50 -05:00
Eric Banks 4a7e0427a3 Pushing the RR bug fix that I pushed into unstable into stable, as requested by Tim 2012-12-19 11:47:16 -05:00
Ryan Poplin 54e5c84018 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-19 11:31:40 -05:00
Ryan Poplin aa39037be8 updating UG integration tests. 2012-12-19 11:31:35 -05:00
Eric Banks e6797d2869 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-19 11:02:19 -05:00
Eric Banks 70479cb71d RR bug fix: we were failing when a read started with an insertion just at the edge of the consensus region.
The weird part is that the comments claimed it was doing what it was supposed to, but it didn't actually do it.
Now we maintain the last header element of the consensus (but without bases and quals) if it adjoins an element with an insertion.

Added the user's test file as an integration test.
2012-12-19 10:59:07 -05:00
David Roazen f2f8172d0c Merged bug fix from Stable into Unstable
Resolved merge conflicts

Conflicts:
	private/java/test/org/broadinstitute/sting/gatk/walkers/na12878kb/core/MongoVariantContextUnitTest.java
	private/java/test/org/broadinstitute/sting/gatk/walkers/na12878kb/core/NA12878KBUnitTestBase.java
2012-12-19 10:39:21 -05:00
David Roazen 07b369ca7e Move VCF/BCF2/VariantContext to new standalone org.broadinstitute.variant package
This is an intermediate commit so that there is a record of these changes in our
commit history. Next step is to isolate the test classes as well, and then move
the entire package to the Picard repository and replace it with a jar in our repo.

-Removed all dependencies on org.broadinstitute.sting (still need to do the test classes,
though)

-Had to split some of the utility classes into "GATK-specific" vs generic methods
(eg., GATKVCFUtils vs. VCFUtils)

-Placement of some methods and choice of exception classes to replace the StingExceptions
and UserExceptions may need to be tweaked until everyone is happy, but this can be
done after the move.
2012-12-19 10:25:22 -05:00
Ryan Poplin cda0c48570 auto-merge 2012-12-19 10:12:49 -05:00
Ryan Poplin 92185dd5f4 updating HC integration tests. 2012-12-19 10:12:07 -05:00
David Roazen 3ad45223be Avoid db naming collisions when running multiple instances of na12878kb tests concurrently 2012-12-19 10:01:44 -05:00
Eric Banks 46c7ea834d Updating lftp script based on new alignment.index format 2012-12-19 09:50:09 -05:00
Mark DePristo 1ca13f9581 Fundamentally better model for the NanoScheduler
-- Now each map job reads a value, performs map, and does as much reducing as possible.  This ensures that we scale performance with the nct value, so -nct 2 should result in 2x performance, -nct 3 3x, etc.  All of this is accomplished using exactly NCT% of the CPU of the machine.
-- Has the additional value of actually simplifying the code
-- Resolves a long-standing annoyance with the nano scheduler.
2012-12-19 09:31:31 -05:00
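Note on the NanoScheduler model above: the description (each job reads a value, maps it, then reduces as much as possible) can be illustrated with a small sketch. This is an assumption-laden illustration, not the NanoScheduler code: the class MapThenReduceSketch, its fields, and the choice to serialize the in-order reduce under a single lock are all hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of "read, map, then reduce what you can"; not the GATK NanoScheduler. */
final class MapThenReduceSketch<I, M, R> {
    private final Iterator<I> inputs;
    private final Function<I, M> map;
    private final BiFunction<R, M, R> reduce;
    private final Map<Integer, M> pending = new HashMap<>(); // mapped results awaiting their turn
    private int nextInputId = 0;  // guarded by the lock on inputs
    private int nextReduceId = 0; // guarded by the lock on pending
    private R sum;                // guarded by the lock on pending

    MapThenReduceSketch(Iterator<I> inputs, Function<I, M> map, BiFunction<R, M, R> reduce, R initial) {
        this.inputs = inputs;
        this.map = map;
        this.reduce = reduce;
        this.sum = initial;
    }

    /** Body of every worker thread: read one input, map it, then reduce whatever is ready. */
    private void runJobs() {
        while (true) {
            final I input;
            final int id;
            synchronized (inputs) { // reading is serialized so ids follow input order
                if (!inputs.hasNext()) return;
                input = inputs.next();
                id = nextInputId++;
            }
            final M mapped = map.apply(input); // the expensive part runs outside any lock
            synchronized (pending) {
                pending.put(id, mapped);
                // Reduce every result that is now contiguous with what was already reduced.
                while (pending.containsKey(nextReduceId)) {
                    sum = reduce.apply(sum, pending.remove(nextReduceId));
                    nextReduceId++;
                }
            }
        }
    }

    /** Runs the jobs on nThreads workers and returns the in-order reduction. */
    R execute(int nThreads) {
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(this::runJobs);
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join(); // join makes the final value of sum visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sum;
    }
}
```

Because every result is added and drained under the same lock, the thread that stores the last missing result always completes the reduction, so no separate reducer thread or final cleanup pass is needed in this sketch.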
David Roazen d0cd29cb36 Merged bug fix from Stable into Unstable 2012-12-19 02:20:28 -05:00
David Roazen 393117a596 Revert "Merged bug fix from Stable into Unstable"
This reverts commit 46f870d491e7dcc2c888c93728ad1f21bc01de08, reversing
changes made to 10ff9958443eb94f1ba77dcb50909ab15144a750.
2012-12-19 02:15:35 -05:00
David Roazen 8221b5e8c1 Merged bug fix from Stable into Unstable 2012-12-19 02:11:32 -05:00
David Roazen 0d93330ab9 Fix bug in the PerSampleDownsamplingReadsIterator that could lead to excessive memory usage at traversal startup
This is a MUST-HAVE update for GATK 2.3 users who want to try out the new
ability to use -dcov with ReadWalkers.
2012-12-19 02:05:36 -05:00
Joel Thibault a29df3e094 oops 2012-12-18 19:03:12 -05:00
Joel Thibault ee22c1bf44 More TODOs 2012-12-18 18:47:43 -05:00
Joel Thibault 2b1db519d7 Add reads which overstep a boundary by a single base 2012-12-18 18:47:43 -05:00
Joel Thibault 9828b2990f Reads off the end of a contig fail SAM validation when using actual BAMs 2012-12-18 18:47:43 -05:00
Joel Thibault 72e2394b26 Create actual BAM 2012-12-18 18:47:43 -05:00
Joel Thibault d69d1f8988 Fun with varargs 2012-12-18 18:47:42 -05:00
Joel Thibault 1158c1529f Refactor region/read comparisons 2012-12-18 18:47:42 -05:00
eitanbanks c86f5e46b0 Merge pull request #9 from yfarjoun/speedy
GATKBAMIndex now passes unit test!
2012-12-18 15:31:11 -08:00
Yossi Farjoun 6ed9eb3da9 GATKBAMIndex now passes unit test! Problem was that SeekableBufferedStream seems to have a bug: it will read beyond the end of a file if asked to. 2012-12-18 17:32:26 -05:00
Ryan Poplin 902ca7ea70 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-18 15:45:33 -05:00
Ryan Poplin 3950f7b3e3 Increasing the INFORMATIVE_LIKELIHOOD_THRESHOLD value to 0.2 2012-12-18 15:45:12 -05:00
Ryan Poplin b5d590ba92 Based on NA12878 knowledge base experiments updating HC to allow for a much smaller minimum kmer length in the assembly graph. 2012-12-18 15:43:56 -05:00
eitanbanks 002ce9c1d5 Merge pull request #8 from yfarjoun/master
Huge speedup in initial traversal of BAM index files (x20 speed!)
2012-12-18 10:16:53 -08:00
Eric Banks 18728ec5bd Updates to the bundle script:
1. Add the symbolic 'current' link for the new bundle dir
2. Don't gzip and copy .out files
3. Don't call chr20 SNPs on the example BAM because it's now just a few reads on chr1
2012-12-18 11:16:42 -05:00
Mark DePristo 16eb1c5436 Optimization to TraverseReadsNano
-- Don't just read all inputs into a list and then provide an iterator over that list; instead, make a real iterator so the NanoScheduler input thread can contribute meaningfully to the workload
-- Use NanoScheduler progress function, instead of home-grown updater
2012-12-18 10:14:47 -05:00