-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was successfully built but has no variation, or truly failed during construction. This fixes an annoying bug where a sequence would assemble perfectly into the reference graph but a null graph would be returned anyway, causing the kmer size to be increased unnecessarily, because null was also used to indicate assembly failure.
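A minimal sketch of the idea (class and method names are hypothetical, not the actual GATK code): an explicit status object replaces the ambiguous null return, so only a true failure triggers a kmer-size retry.

```java
// Hypothetical sketch: an explicit result object replaces a null return,
// so "assembled but no variation" is no longer confused with "assembly failed".
public class AssemblyResultSketch {
    public enum Status { ASSEMBLED_WITH_VARIATION, ASSEMBLED_NO_VARIATION, FAILED }

    private final Status status;

    public AssemblyResultSketch(final Status status) { this.status = status; }

    public Status getStatus() { return status; }

    // Only a genuine failure should trigger a retry with a larger kmer size;
    // a clean assembly that simply has no variation should not.
    public boolean shouldRetryWithLargerKmer() { return status == Status.FAILED; }
}
```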
--
-- Output format looks like:
20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121
20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217
20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240
20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268
20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267
We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion), but they do not yet account properly for alignment uncertainty.
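As a rough illustration of the ref-vs-non-ref tally described above (all names hypothetical; alignment uncertainty is deliberately ignored, as in the current calculation):

```java
// Hypothetical sketch: compute an AD pair at a hom-ref site as
// ref observations vs. any non-ref observation (mismatch or insertion).
public class RefConfidenceSketch {
    // bases: the observed base per read at the site; refBase: the reference base
    public static int[] computeAD(final char refBase, final char[] bases) {
        int refCount = 0;
        int nonRefCount = 0;
        for (final char b : bases) {
            if (b == refBase) refCount++;
            else nonRefCount++;   // any mismatch counts toward <NON_REF>
        }
        return new int[] { refCount, nonRefCount };
    }
}
```

For the first record above (`AD` of `3,0`), three reads match the reference `T` and none mismatch.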
-- Can be enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning each read to its most likely haplotype and then evaluating the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads.
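The shape of that generalized interface might look roughly like this (a hypothetical sketch, not the actual HaplotypeBAMWriter API): the writer hands each realigned read to a pluggable destination instead of writing a BAM directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a pluggable sink for realigned reads, so callers can
// capture them in memory instead of writing straight to a BAM file.
public class ReadDestinationSketch {
    public interface ReadDestination {
        void add(String realignedRead);
    }

    // A destination that simply captures reads, e.g. for downstream
    // pileup evaluation over the active region interval.
    public static class CapturingDestination implements ReadDestination {
        public final List<String> reads = new ArrayList<>();

        @Override
        public void add(final String realignedRead) { reads.add(realignedRead); }
    }
}
```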
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily create them programmatically without constructing lots of miscellaneous GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as it's no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line. Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
-- The previous code performed adapter clipping before reverting soft clips. Since we only clip the adapter when it's actually aligned (i.e., not within the soft clips), we were not removing adapter bases unless at least 1 bp of the adapter was aligned to the reference. Terrible.
-- Removed the broken logic for determining whether a read adapter is too long.
-- Doesn't require isProperPairFlag to be set for a read to be adapter clipped
-- Update integration tests for new adapter clipping code
-Explicitly state that -dcov does not produce an unbiased random sampling from all available reads
at each locus, and that instead it tries to maintain an even representation of reads from
all alignment start positions (which, of course, is a form of bias)
-Recommend -dfrac for users who want a true across-the-board unbiased random sampling
-Given a list of walkers and a pair of git commits, determines whether each of the
walkers has compile-time dependencies on the Java classes changed between the two
commits.
-Output is in the form of a Java properties file, and can be easily loaded via
the Properties class. Example output:
org.broadinstitute.sting.gatk.walkers.bed.MergeIntervalLists=true
org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper=false
org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller=false
org.broadinstitute.sting.gatk.walkers.na12878kb.NA12878DBWalker=true
org.broadinstitute.sting.gatk.walkers.readutils.PrintReads=false
"true" indicates that the walker does have compile-time dependencies on one or
more of the changed Java classes, "false" indicates no dependencies
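Since the output is a standard Java properties file, it can be consumed directly with java.util.Properties; a minimal sketch (helper name hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Loading the analyzer's output with java.util.Properties, as noted above.
public class WalkerDepsSketch {
    // Returns true iff the given walker class is marked as depending on
    // one or more of the changed Java classes; absent keys default to false.
    public static boolean dependsOnChanges(final String propertiesText, final String walkerClass) {
        final Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (final IOException e) {
            throw new RuntimeException(e);   // cannot happen for an in-memory reader
        }
        return Boolean.parseBoolean(props.getProperty(walkerClass, "false"));
    }
}
```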
-Considers classes within changed jar files as well, provided the jars are stored
in our git repository (as they are with tribble, picard, etc.)
-Ant-based solution with a shell script frontend. The previous Java-based solution
had several issues and introduced problematic dependencies into the GATK.
-- Because LocusWalkers have multiple filtering streams, each counting filters independently, and the close() function was calling setFilter on the global result rather than on the private counter that gets incorporated into the global result, the counts of each filter were being incremented incorrectly.
-- [delivers #52667213]
There are a few pipeline test classes that do not run Queue, but are
classified as pipeline tests because they submit farm jobs. Make these
unconventional pipeline tests respect the pipeline test dry run setting.
This is used in conjunction with the -BAM argument in AssessNA12878 and is necessary for the
Jenkins assessment to work properly (Ryan's commit wasn't enough).
I "fixed" this once before, but instead of testing with unit tests I used integration tests.
Bad decision.
The proper fix is in now, with a bona fide unit test included.
1. MergeIntervalLists should take the global interval padding into account when merging.
2. Update the name of the imported callsets in the setup script because of renaming for expanded intervals.
3. If there are too many intervals to process, MongoDB falls apart. Refactored the site selection code so
that in such cases we pull out all records from the DB and the GATK itself does the interval filtering.
4. Add isComplex to callset summary for the consensus summarizer.
5. Remove the check for out of order records in the SiteIterator since records now do come out of order
(since contigs are sorted lexicographically in MongoDB).
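Item 3 amounts to doing the interval filtering on the GATK side after pulling everything from the DB; a minimal sketch (record and method names hypothetical, using 1-based closed intervals):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of item 3: fetch all records from the DB, then let
// the GATK side keep only those overlapping the interval of interest,
// instead of issuing one MongoDB query per interval.
public class IntervalFilterSketch {
    public static class Record {
        public final String contig;
        public final int start;
        public final int stop;

        public Record(final String contig, final int start, final int stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }
    }

    // Closed-interval overlap test on the same contig.
    public static boolean overlaps(final Record r, final String contig, final int start, final int stop) {
        return r.contig.equals(contig) && r.start <= stop && r.stop >= start;
    }

    public static List<Record> filterByInterval(final List<Record> all, final String contig,
                                                final int start, final int stop) {
        final List<Record> kept = new ArrayList<>();
        for (final Record r : all) {
            if (overlaps(r, contig, start, stop)) kept.add(r);
        }
        return kept;
    }
}
```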
Results:
Iteration over the gencode intervals (90 MB) in AssessNA12878 now takes 90 seconds. I can't tell you how
much time it took before because it kept crashing Mongo (but it was a long, long time).
Previous fixes and tests only covered trailing soft clips. Now that up-front
hard clipping is working properly, though, we were failing on those cases in the tool.
Added a patch for this, as well as a separate test independent of the soft clips,
to make sure that it's working properly.
* Increase the memory limit for HTSLIB: BAM shuffling just eats up a ton of memory.
* Concurrent HTSLIB processes need unique temp files: the BAM shuffling step was clobbering the shared temporary files and failing without returning a nonzero exit status. Fixed it by giving a unique name to each process.
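One way to guarantee per-process temp file uniqueness, sketched in Java (names hypothetical; File.createTempFile already appends a unique random suffix, and we fold the process id into the prefix for good measure):

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: give each concurrent shuffling step its own temp file
// by embedding the process id and a unique suffix in the file name.
public class UniqueTempSketch {
    public static File uniqueTempFile(final String stepName) {
        try {
            final String prefix = stepName + "." + ProcessHandle.current().pid() + ".";
            // createTempFile guarantees a fresh file, so concurrent callers
            // never collide even when they share the same prefix.
            final File f = File.createTempFile(prefix, ".bam.tmp");
            f.deleteOnExit();
            return f;
        } catch (final IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```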
This time we don't accidentally drop reads (phew), but this bug does cause us not to
update the alignment start of the mate. Fixed and added unit test to cover it.