gatk-3.8

Author	SHA1	Message	Date
Eric Banks	0c34e47a87	Merge pull request #50 from broadinstitute/gg_new_ReassignOneMappingQualityFilter_GSATDG-77 New ReadFilter allows users to reassign a specific mapping quality...	2013-02-20 04:51:45 -08:00
Eric Banks	551d33686c	Merge pull request #47 from broadinstitute/aw_reduceread_perf_1_GSA-761 Reduce memory footprint of SyntheticRead by replacing several Lists with...	2013-02-20 04:49:07 -08:00
Geraldine Van der Auwera	e674b4a524	Added new ReadFilter that allows users to specifically reassign one single mapping quality to a different value. Useful for TopHat and other RNA-seq software users.	2013-02-20 01:24:45 -05:00
MauricioCarneiro	76810465aa	Merge pull request #40 from broadinstitute/gg_retrieve_readfilters_GSATDG-63	2013-02-19 19:42:35 -08:00
Mark DePristo	910d966428	Extend timeout of NanoScheduler deadlock tests -- The previous timeout of 1 second was just dangerously short. Increase the timeout to 10 seconds	2013-02-19 20:25:25 -05:00
Eric Banks	9dfdb9528b	Merge pull request #49 from broadinstitute/gda_hidden_ug_args Hide arguments related to reference sample operation in UG - for interna...	2013-02-19 16:18:32 -08:00
Eric Banks	0055a6f1cd	Merge pull request #45 from broadinstitute/mc_fix_indelrealigner_GSA-774 Fix to the Indel Realigner bug described in GSA-774	2013-02-19 16:16:48 -08:00
Guillermo del Angel	5a0a9bc488	Hide arguments related to reference sample operation in UG - for internal use only until paper is published and docs are polished.	2013-02-19 19:06:42 -05:00
depristo	334d124145	Merge pull request #48 from broadinstitute/rp_calcAlignmentByteArrayOffset_contract_GSA-772 Fix for calculating read pos rank sum test with reads that are informati...	2013-02-19 15:09:58 -08:00
Geraldine Van der Auwera	faef85841b	Added GATKDocs fct to indicate default Read Filters for each tool -- Added getClazzAnnotations() as hub to retrieve various annotations values and class properties through reflection -- Added getReadFilters() method to retrieve Read Filter annotations -- getReadFilters() uses recursion to walk up the inheritance to also capture superclass annotations -- getClazzAnnotations() stores collected info in doc handler root, which is unit.forTemplate in Doclet -- Modified FreeMarker template to use the Readfilters info (displayed after arg table, before additional capabilities) -- Tadaaa :-) #GSATDG-63 resolve	2013-02-19 16:12:29 -05:00
Mauricio Carneiro	371ea2f24c	Fixed IndelRealigner reference length bug (GSA-774) -- modified ReadBin GenomeLoc to keep track of softStart() and softEnd() of the reads coming in, to make sure the reference will always be sufficient even if we want to use the soft-clipped bases -- changed the verification from readLength to aligned bases to allow reads with soft-clipped bases -- switched TreeSet -> PriorityQueue in the ConstrainedMateFixer as some different reads can be considered equal by picard's SAMRecordCoordinateComparator (the Set was replacing them) -- pulled out ReadBin class so it can be testable -- added unit tests for ReadBin with soft-clips -- added tests for getMismatchCount (AlignmentUtils) to make sure it works with soft-clipped reads GSA-774 #resolve	2013-02-19 16:00:36 -05:00
Mauricio Carneiro	815028edd4	Added verbose error message to the PluginManager -- added a logger.error with a more descriptive message of what the most likely cause of the error is. A typical error happens when a walker's global variable is not initialized properly (usually in test conditions). The old error message was very hard to understand: "Could not create module because of an exception of type NullPointerException ocurred caused by exception null"	2013-02-19 16:00:35 -05:00
Alec Wysoker	ab75e053da	Reduce memory footprint of SyntheticRead by replacing several Lists with a single List of a small private static class that contains the attributes that were scattered across the several Lists.	2013-02-19 15:33:33 -05:00
Ryan Poplin	c025e84c8b	Fix for calculating read pos rank sum test with reads that are informative but don't actually overlap the variant due to some hard clipping. -- Updated a few integration tests for HC, UG, and UG general ploidy	2013-02-19 14:09:24 -05:00
Eric Banks	8eda0c50df	Merge pull request #46 from broadinstitute/md_active_regions_GSA-770 ActivityProfile and ActiveRegions respect engine interval boundaries	2013-02-19 09:49:47 -08:00
Mark DePristo	be45edeff2	ActivityProfile and ActiveRegions respect engine interval boundaries -- Active regions are created as normal, but they are split and trimmed to the engine intervals when added to the traversal, if there are intervals present. -- UnitTests for ActiveRegion.splitAndTrimToIntervals -- GenomeLocSortedSet.getOverlapping uses binary search to find overlapping intervals efficiently, in ~log N time -- UnitTesting overlap function in GenomeLocSortedSet -- Discovered a fundamental implementation bug: adding genome locs out of order (elements on 20 then on 19) produces an invalid GenomeLocSortedSet. Created a JIRA to address this: https://jira.broadinstitute.org/browse/GSA-775 -- Constructor that takes a collection of genome locs now sorts its input and merges overlapping intervals -- Added docs for the constructors in GLSS -- Update HaplotypeCaller MD5s, which change because ActiveRegions are now restricted to the engine intervals; this slightly changes the regions in the tests, and so the reads in the regions, and thus the md5s -- GenomeAnalysisEngineUnitTest needs to provide a non-null genome loc parser	2013-02-18 10:40:25 -05:00
MauricioCarneiro	e919d62156	Merge pull request #44 from broadinstitute/rp_hc_reference_padding	2013-02-17 21:46:58 -08:00
Ryan Poplin	b7e9c342c7	Reducing the size of the reference padding in the HaplotypeCaller.	2013-02-17 11:09:00 -05:00
MauricioCarneiro	029de71a44	Merge pull request #43 from broadinstitute/md_qualutils_cleanup Cleanup of QualityUtils	2013-02-16 12:59:26 -08:00
Mark DePristo	73a363b166	Update MD5s due to new QualityUtils calculations -- Increase the allowed runtime of one UG integration test -- The GGA indels mode runs two UG commands, and was barely under the 10 minute limit before. Some updates can push this right over the edge. Increased limit -- CalibrateGenotypeLikelihoods runs on a small data set now, so it's faster -- Updating MD5s due to more correct quality utils. DuplicatesWalkers quality estimates have changed. One UG test has different FS and rank sum tests because the conversion to phred scores is slightly (second decimal place) different	2013-02-16 07:31:38 -08:00
Mark DePristo	3b67aa8aee	Final edge case bug fixes to QualityUtils routines -- log10 functions in QualityUtils allow -Infinity to allow log10(0.0) values -- Fix edge condition of log10OneMinusX failing with Double.MIN_VALUE -- Fix another edge condition of log10OneMinusX failing with a small but not min_value double	2013-02-16 07:31:38 -08:00
Mark DePristo	b393c27f07	QualityUtils now uses runtime argument checks instead of contracts -- There's some runtime cost for these tests, but it's not big enough to outweigh the value of catching errors quickly	2013-02-16 07:31:38 -08:00
Mark DePristo	3231031c1a	Bugfix for FisherStrand -- FisherStrand pValues can sum to slightly greater than 1.0, so they need to be capped to convert to a Phred-scaled quality score	2013-02-16 07:31:38 -08:00
Mark DePristo	9a29d6d4be	Fix a catastrophic bug (WoW!) in the reference calculation of the UG -- The UG was using MathUtils binomial probability backward, so the estimated confidence was always NaN, and as a side effect other utils converted this to a meaningless 0.0. This is all because there wasn't a unit test. -- I've fixed the calculation, so it's now log10 based and uses robust MathUtils and QualityUtils functions to compute probabilities, and I've added a unit test.	2013-02-16 07:31:38 -08:00
Mark DePristo	9e28d1e347	Cleanup and unit tests for QualityUtils -- Fixed a few conversion bugs with edge case quals (ones that were very high) -- Fixed a critical bug in the conversion of quals that was causing near capped quals to fall below their actual value. Will undoubtedly need to fix md5s -- More precise prob -> qual calculations for very high confidence events in phredScaleCorrectRate, trueProbToQual, and errorProbToQual. Very likely to improve accuracy of many calculations in the GATK -- Added errorProbToQual and trueProbToQual calculations that accept an integer cap, and perform the (tricky) conversion from int to byte correctly. -- Full docs and unit tests for phredScaleCorrectRate and phredScaleErrorRate. -- Renamed probToQual to trueProbToQual -- Added goodProbability and log10OneMinusX to MathUtils -- Went through the GATK and cleaned up many uses of QualityUtils -- Cleanup constants in QualityUtils -- Added full docs for all of the constants -- Rename MAX_QUAL_SCORE to MAX_SAM_QUAL_SCORE for clarity -- Moved MAX_GATK_USABLE_Q_SCORE to RecalDatum, as it's a BQSR-specific feature -- Convert uses of QualityUtils.errorProbToQual(1-x) to QualityUtils.trueProbToQual(x) -- Cleanup duplicate quality score routines in MathUtils. Moved and renamed MathUtils.log10ProbabilityToPhredScale => QualityUtils.phredScaleLog10ErrorRate. Removed 3 routines from MathUtils, and remapped their usages into the better routines in QualityUtils	2013-02-16 07:31:37 -08:00
MauricioCarneiro	307f709cc7	Merge pull request #42 from broadinstitute/yf_add_version_information_CL_option	2013-02-15 21:39:33 -08:00
Yossi Farjoun	aa99a5f47c	Added an option to print out the version string @argument (-)-version (should this be @hidden?). Prints the version to System.out and calls quit(0). No tests (any ideas on how to test this would be happily accepted).	2013-02-15 12:42:59 -05:00
MauricioCarneiro	bbfbe1bc26	Merge pull request #41 from broadinstitute/jt_cmi_queue_packaging ValidatingPileup was renamed to CheckPileup	2013-02-15 09:19:00 -08:00
droazen	664960373d	Merge pull request #31 from broadinstitute/yf_fast_BAM_index_traversal -re-enables fast BAM indexing	2013-02-15 09:12:32 -08:00
Joel Thibault	182a950202	ValidatingPileup was renamed to CheckPileup	2013-02-15 11:56:19 -05:00
MauricioCarneiro	d80b99143f	Merge pull request #37 from broadinstitute/rp_left_alignment_hc_contract_GSA-771	2013-02-15 08:32:45 -08:00
MauricioCarneiro	1dd284a5bb	Merge pull request #39 from broadinstitute/tj_printreads_tag_for_bqsr_GSA-720 PrintReads writes a header when used with -BQSR	2013-02-15 07:18:28 -08:00
MauricioCarneiro	b58a0eca6b	Merge pull request #33 from broadinstitute/gg_more_gatkdocs_tweaks_GSATDG-62 Refactored GATKDocs categories some more ( GSATDG-62 )	2013-02-14 22:35:07 -08:00
MauricioCarneiro	f2669c8438	Merge pull request #38 from broadinstitute/gda_hc_md5 Updated md5's from BiasedDownsamplerIntegrationTest that changed due to ...	2013-02-14 22:19:23 -08:00
Tad Jordan	6cb80591e3	PrintReads writes a header when used with -BQSR	2013-02-14 22:19:14 -05:00
Guillermo del Angel	b18f216033	Updated md5's from BiasedDownsamplerIntegrationTest that changed due to changes in HaplotypeCaller - changing HashMaps to LinkedHashMaps changed the ordering of reads presented to BiasedDownSampler, which changed the reads chosen, thereby marginally changing PLs and some site info.	2013-02-14 20:18:49 -05:00
Yossi Farjoun	3a7c8c13e2	Re-enabled fastBAMindexing by replacing the FileChannel with a SeekableBufferedStream. This helps a lot since FileChannel is very low-level and traversing the BAMIndex involves lots of short reads. - Fixed a deterioration in BAMIndex due to rev'ed picard (see below) - Added unit tests for SeekableBufferedStream - Added integrationTests for GATKBAMIndex (in PileupWalkerIntegrationTest) - Added a runtime-test to verify that the amount read equals the amount requested. - Added failing tests with expectedExceptions - Used a DataProvider to make code nicer	2013-02-14 17:51:15 -05:00
Ryan Poplin	871c8b3866	No need to consider haplotypes which Smith-Waterman aligns off the end of the large padded reference.	2013-02-14 11:18:10 -05:00
depristo	5cc5aedcd1	Merge pull request #34 from broadinstitute/md_longer_default_timeout Extend default timeout to 20 minutes	2013-02-13 17:44:30 -08:00
Mark DePristo	f92328a1a1	Extend default timeout to 20 minutes -- The default of 10 minutes is right on the edge for some tests, and we really want a default not to enforce a max time (test should be short) but to stop testng from failing to terminate ever in the case where some test is truly hung	2013-02-13 17:43:40 -08:00
Geraldine Van der Auwera	6208742f7c	Refactored GATKDocs categories some more ( GSATDG-62 ) -- Renamed ValidatePileup to CheckPileup since validation is a reserved word -- Renamed AlignmentValidation to CheckAlignment (same as above) -- Refactored category definitions to use constants defined in HelpConstants -- Fixed a couple of minor typos and an example error -- Reorganized the GATKDocs index template to use supercategories -- Refactored integration tests for renamed walkers (my earlier refactoring had screwed them up or not carried over)	2013-02-13 16:49:18 -05:00
depristo	357d196dad	Merge pull request #32 from broadinstitute/yf_per-sample-downsampling_GSA_765 Fixed md5s for the per-sample downsampling IntegrationTests that were disabled.	2013-02-13 10:08:11 -08:00
Yossi Farjoun	6d12e5a54f	Fixed md5s for the per-sample downsampling IntegrationTests that were disabled. - got md5s from an interim version that does not have the per-sample downsampling hooked up - added an integration test that forces the result from flat-downsampling to equal that which results from an equivalent flat contamination file	2013-02-13 12:49:39 -05:00
depristo	961f2533a5	Merge pull request #29 from broadinstitute/gda_gga_hc_GSA-722 Gda gga hc gsa 722	2013-02-13 07:58:57 -08:00
Guillermo del Angel	4308b27f8c	Fixed non-determinism in HaplotypeCaller and some UG calls -- HaplotypeCaller and PerReadAlleleLikelihoodMap should use LinkedHashMaps instead of plain HashMaps. That way the ordering when traversing alleles is maintained. If the JVM traverses HashMaps with random ordering, different reads (with same likelihood) may be removed by contamination checker, and different alleles may be picked if they have same likelihoods for all reads. -- Put in some GATKDocs and contracts in HaplotypeCaller files (far from done, code is a beast) -- Update md5's due to different order of iteration in LinkedHashMaps instead of HashMaps inside HaplotypeCaller (due to change in PerReadAlleleLikelihoodMap that also slightly modifies reads chosen by per-read downsampling). -- Reenabled testHaplotypeCallerMultiSampleGGAMultiAllelic test -- Added some defensive argument checks into HaplotypeCaller public functions (not intended to be done yet).	2013-02-12 15:43:29 -05:00
depristo	38cea0a7ab	Merge pull request #28 from broadinstitute/gg_reorganize_gatkdocs_categories_GSATDG-62 Reorganized walker categories in GATKDocs (@DocumentedGATKFeature details)	2013-02-12 11:11:45 -08:00
Geraldine Van der Auwera	dff5ef562b	Reorganized walker categories in GATKDocs (@DocumentedGATKFeature details) -- Sorted out contents of BAM Processing vs. Diagnostics & QC Tools -- Moved two validation-related walkers from Diagnostics & QC to Validation Utilities -- Reworded some category names and descriptions to be more explicit and user-friendly	2013-02-12 13:36:15 -05:00
depristo	59484dfae4	Merge pull request #27 from broadinstitute/rp_ranksumtest_optimization Optimization to ReadPosRankSumTest: Don't do the work of parsing through...	2013-02-11 08:58:22 -08:00
Ryan Poplin	3f2f837b6a	Optimization to ReadPosRankSumTest: Don't do the work of parsing through the cigar string for non-informative reads.	2013-02-11 11:36:09 -05:00
delangel	f8e2153c71	Merge pull request #25 from broadinstitute/md_hmm_fail_GSA-751 Md hmm fail gsa 751	2013-02-09 16:43:50 -08:00

11891 Commits (0c34e47a87d5d87eeaae45f07704fc6bf0ada253)