Commit Graph

12521 Commits (191e4ca251f1b655e5dcc5e00ab38c308e3c83f9)

Author SHA1 Message Date
Mark DePristo 191e4ca251 Merge pull request #300 from broadinstitute/mc_move_qualify_intervals_to_protected
A few bug fixes to this tool now that it is in protected
2013-06-24 09:35:45 -07:00
Yossi Farjoun d8ca4d3e6d Merge pull request #299 from broadinstitute/eb_mate_fixer_confused_by_nonprimary_alignment
Another fix for the Indel Realigner that arises because of secondary alignments.
2013-06-24 06:58:27 -07:00
Eric Banks d976aae2b1 Another fix for the Indel Realigner that arises because of secondary alignments.
This time we don't accidentally drop reads (phew), but this bug does cause us not to
update the alignment start of the mate.  Fixed and added unit test to cover it.
2013-06-21 16:59:22 -04:00
Mark DePristo dee51c4189 Error out when NCT and BAMOUT are used with the HaplotypeCaller
-- Currently we don't support writing a BAM file from the HaplotypeCaller when -nct is enabled.  Check in initialize whether this is the case, and throw a UserException
2013-06-21 09:25:57 -04:00
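The guard described in the commit above — fail fast at initialization when incompatible arguments are combined — can be sketched as follows. This is a hypothetical illustration, not the actual GATK API; the class, method, and argument names are made up for the example, and `UserException` here is a stand-in for GATK's exception of the same name.

```java
// Hypothetical sketch of an initialize-time argument check: writing a bamout
// is unsupported when -nct > 1, so reject the combination with a clear
// user-facing error before any data is processed. Names are illustrative.
public class ArgumentCompatibilityCheck {
    /** Stand-in for GATK's UserException (user-correctable configuration errors). */
    public static class UserException extends RuntimeException {
        public UserException(String msg) { super(msg); }
    }

    /** Validate during initialization, before any reads are traversed. */
    public static void validate(int numCpuThreadsPerDataThread, String bamOutputPath) {
        if (numCpuThreadsPerDataThread > 1 && bamOutputPath != null) {
            throw new UserException(
                "Cannot write a BAM file from the HaplotypeCaller when -nct > 1; " +
                "rerun with a single thread or remove the BAM output argument.");
        }
    }

    public static void main(String[] args) {
        validate(1, "debug.bam");   // ok: single thread with bamout
        validate(4, null);          // ok: multi-threaded, no bamout
        try {
            validate(4, "debug.bam");
            throw new AssertionError("expected UserException");
        } catch (UserException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```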
David Roazen e03a5e9486 Update source release script in attempt to work around intermittent github issues
Github was intermittently rejecting large pushes that were in fact
fast-forward updates as being non-fast-forward. Try to prevent this
by ensuring that all refs are up-to-date and properly checked out
after branch filtering and before doing a source release.
2013-06-20 16:58:01 -04:00
David Roazen 0018af0c0a Update README file for the 2.6 release 2013-06-20 13:08:29 -04:00
Eric Banks 6977d6e2a7 Merge remote-tracking branch 'unstable/master' 2013-06-20 12:14:33 -04:00
Eric Banks 9f979fdc81 Merge pull request #297 from broadinstitute/md_vcfversion2
Better GATK version and command line output
2013-06-20 09:11:36 -07:00
Mark DePristo fdfe4e41d5 Better GATK version and command line output
-- Previous version emitted command lines that look like:

##HaplotypeCaller="analysis_type=HaplotypeCaller input_file=[private/testdata/reduced.readNotFullySpanningDeletion.bam] ..."

the new version additionally records when the GATK was run and the GATK version, in a nicer format:

 ##GATKCommandLine=<ID=HaplotypeCaller,Version=2.5-206-gbc7be2b,Date="Thu Jun 20 11:09:01 EDT 2013",Epoch=1371740941197,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[private/testdata/reduced.readNotFullySpanningDeletion.bam] read_buffer_size=null phone_home=AWS ...">

 -- Additionally, the command line options are emitted sequentially in the file, so you can see a running record of how a VCF was produced, such as this example from the integration test:

 ##GATKCommandLine=<ID=HaplotypeCaller,Version=2.5-206-gbc7be2b,Date="Thu Jun 20 11:09:01 EDT 2013",Epoch=1371740941197,CommandLineOptions="lots of stuff">
 ##GATKCommandLine=<ID=SelectVariants,Version=2.5-206-gbc7be2b,Date="Thu Jun 20 11:16:23 EDT 2013",Epoch=1371741383277,CommandLineOptions="lots of stuff">

 -- Removed the ProtectedEngineFeaturesIntegrationTest
 -- Actual unit tests for these features!
2013-06-20 11:19:13 -04:00
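The `##GATKCommandLine=<...>` header lines shown in the commit above can be composed with a simple formatter. This sketch only mimics the example output's layout; it is not the GATK's actual header-writing code, and the class and method names are invented for illustration.

```java
// Illustrative formatter for the new-style VCF header line,
// ##GATKCommandLine=<ID=...,Version=...,Date="...",Epoch=...,CommandLineOptions="...">
// Mirrors the examples in the commit message; not the real GATK implementation.
public class CommandLineHeader {
    public static String format(String id, String version, String date,
                                long epoch, String options) {
        return String.format(
            "##GATKCommandLine=<ID=%s,Version=%s,Date=\"%s\",Epoch=%d,CommandLineOptions=\"%s\">",
            id, version, date, epoch, options);
    }

    public static void main(String[] args) {
        System.out.println(format("HaplotypeCaller", "2.5-206-gbc7be2b",
            "Thu Jun 20 11:09:01 EDT 2013", 1371740941197L, "lots of stuff"));
    }
}
```

Because each tool appends its own line, a VCF processed by several walkers carries one `##GATKCommandLine` entry per step, giving the running record described above.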
Mark DePristo 701d70401f Merge pull request #296 from broadinstitute/md_pubprotfix
Fix public / protected dependency
2013-06-19 17:17:21 -07:00
Mark DePristo 0672ac5032 Fix public / protected dependency 2013-06-19 19:42:09 -04:00
Eric Banks 74415a6a2a Merge pull request #292 from broadinstitute/vrr_analyzeCovariates
Added the AnalyzeCovariates tool to generate BQSR quality assessment plots.
2013-06-19 13:26:59 -07:00
Valentin Ruano-Rubio 1f8282633b Removed plot generation from the BaseRecalibration software
Improved AnalyzeCovariates (AC) integration test.
Renamed AC test files ending with .grp to .table

Implementation:

* Removed RECAL_PDF/CSV_FILE from RecalibrationArgumentCollection (RAC). Updated rest of the code accordingly.
* Fixed BQSRIntegrationTest to work with new changes
2013-06-19 14:47:56 -04:00
Valentin Ruano-Rubio 08f92bb6f9 Added AnalyzeCovariates tool to generate BQSR quality assessment plots.
Implementation details:

* Added tool class *.AnalyzeCovariates
* Added a convenient addAll method to Utils to add the elements of an array.
* Added parameter comparison methods to the RecalibrationArgumentCollection class in order to verify that multiple input recalibration reports are compatible and comparable.
* Modified the BQSR.R script to handle up to 3 different recalibration tables (-BQSR, -before and -after) and removed some irrelevant arguments (or argument values) from the output.
* Added an integration test class.
2013-06-19 14:38:02 -04:00
Mark DePristo fb114e34fe Merge pull request #295 from broadinstitute/dr_remove_PrintReads_ds_argument
PrintReads: remove -ds argument
2013-06-19 10:55:10 -07:00
droazen 573ecadecc Merge pull request #294 from broadinstitute/dr_handle_zero_length_cigar_elements
SAMDataSource: always consolidate cigar strings into canonical form
2013-06-19 10:32:22 -07:00
David Roazen 51ec5404d4 SAMDataSource: always consolidate cigar strings into canonical form
-Collapses zero-length and repeated cigar elements, neither of which
 can necessarily be handled correctly by downstream code (like LIBS).

-Consolidation is done before read filters, because not all read filters
 behave correctly with non-consolidated cigars.

-Examined other uses of consolidateCigar() throughout the GATK, and
 found them to not be redundant with the new engine-level consolidation
 (they're all on artificially-created cigars in the HaplotypeCaller
 and SmithWaterman classes)

-Improved comments in SAMDataSource.applyDecoratingIterators()

-Updated MD5s; differences were examined and found to be innocuous

-Two tests: -Unit test for ReadFormattingIterator
            -Integration test for correct handling of zero-length
             cigar elements by the GATK engine as a whole
2013-06-19 13:29:01 -04:00
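The consolidation described in the commit above — collapsing zero-length elements and merging repeated operators into canonical form — can be sketched on a simplified cigar representation. This stands in for the real `consolidateCigar()` in the SAM libraries; the `Element` type here is an invented simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of cigar consolidation: drop zero-length elements and merge
// adjacent elements sharing an operator, producing the canonical form that
// downstream code (like LIBS) expects. Simplified stand-in, not the SAM API.
public class CigarConsolidator {
    /** One cigar element: a length and an operator character (M, I, D, ...). */
    public static final class Element {
        public final int length;
        public final char op;
        public Element(int length, char op) { this.length = length; this.op = op; }
        @Override public String toString() { return length + "" + op; }
    }

    public static List<Element> consolidate(List<Element> in) {
        List<Element> out = new ArrayList<>();
        for (Element e : in) {
            if (e.length == 0) continue;  // drop zero-length elements
            if (!out.isEmpty() && out.get(out.size() - 1).op == e.op) {
                // merge repeated operators into a single element
                Element last = out.remove(out.size() - 1);
                out.add(new Element(last.length + e.length, e.op));
            } else {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Element> raw = new ArrayList<>();
        raw.add(new Element(10, 'M'));
        raw.add(new Element(0, 'I'));  // zero-length: removed
        raw.add(new Element(5, 'M'));  // now adjacent to 10M: merged to 15M
        raw.add(new Element(2, 'D'));
        System.out.println(consolidate(raw));  // [15M, 2D]
    }
}
```

Note how removing the zero-length `0I` makes the two `M` elements adjacent, which is why both rules must be applied in one pass.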
David Roazen 23ee192d5e PrintReads: remove -ds argument
-This argument was completely redundant with the engine-level -dfrac
 argument.

-Could produce unintended consequences if used in conjunction with
 engine-level downsampling arguments.
2013-06-19 13:22:44 -04:00
David Roazen 0be788f0f9 Fix typo in snpEff documentation 2013-06-19 13:15:24 -04:00
chartl a3d6ad55f9 Merge pull request #271 from broadinstitute/chartl_extend_genotypeconcordance_documentation
Extend Genotype Concordance Documentation
2013-06-19 09:03:05 -07:00
Chris Hartl af275fdf10 Extend the documentation of GenotypeConcordance to include notes about Monomorphic and Filtered VCF records.
Address Geraldine's comments - information on moltenization and explanation of fields

Fix paren
2013-06-19 12:01:58 -04:00
amilev 28a8d74290 Merge pull request #293 from broadinstitute/md_catvariants
CatVariants accepts reference files ending in any standard extension
2013-06-19 08:36:58 -07:00
Mark DePristo 15171c07a8 CatVariants accepts reference files ending in any standard extension
-- [resolves #49339235] Make CatVariants accept reference files ending in .fa (not only .fasta)
2013-06-19 11:10:36 -04:00
MauricioCarneiro 6a5502c94a Merge pull request #289 from broadinstitute/md_fix_bq
Bugfix: defaultBaseQualities actually works now
2013-06-18 11:58:39 -07:00
delangel 1c400e8f8e Merge pull request #291 from broadinstitute/gda_new_hmm_in_ug
Swapping in logless Pair HMM for default usage with UG:
2013-06-18 07:07:57 -07:00
Guillermo del Angel f176c854c6 Swapping in logless Pair HMM for default usage with UG:
-- Changed default HMM model.
-- Removed check.
-- Changed md5's: PL's in the high 100s change by a point or two due to new implementation.
-- Resulting performance improvement is about 30 to 50% less runtime when using -glm INDEL.
2013-06-18 10:06:27 -04:00
Mark DePristo 4c482eb0f0 Merge pull request #290 from broadinstitute/rp_pruning_priority_queue
Adding new pruning parameter to ReadThreadingAssembler
2013-06-17 17:16:00 -07:00
Ryan Poplin 8511c4385c Adding new pruning parameter to ReadThreadingAssembler
-- numPruningSamples allows one to specify that the minPruning factor must be met by this many samples for a path to be considered good (e.g. seen twice in three samples). By default this is just one sample.
-- adding unit test to test this new functionality
2013-06-17 16:46:40 -04:00
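The pruning rule described above ("the minPruning factor must be met by this many samples", e.g. seen twice in three samples) amounts to a small counting criterion. A minimal sketch of that criterion, not the ReadThreadingAssembler implementation; names are illustrative.

```java
// Sketch of the numPruningSamples criterion: a path is kept only if at least
// numPruningSamples samples each support it with multiplicity >= minPruning.
// With numPruningSamples = 1 (the default) this reduces to the old behavior.
public class PruningCriterion {
    /**
     * @param perSampleSupport  number of supporting observations per sample
     * @param minPruning        minimum multiplicity a sample must contribute
     * @param numPruningSamples how many samples must meet minPruning
     */
    public static boolean keepPath(int[] perSampleSupport, int minPruning, int numPruningSamples) {
        int samplesMeetingThreshold = 0;
        for (int support : perSampleSupport)
            if (support >= minPruning) samplesMeetingThreshold++;
        return samplesMeetingThreshold >= numPruningSamples;
    }

    public static void main(String[] args) {
        // require support >= 2 in at least 2 of the samples
        System.out.println(keepPath(new int[]{2, 2, 0}, 2, 2)); // true
        System.out.println(keepPath(new int[]{2, 1, 0}, 2, 2)); // false
    }
}
```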
delangel a6a58cbc78 Merge pull request #288 from broadinstitute/gda_more_ancient_dna_fixes
Feature requested by Reich lab and Paavo lab in Leipzig for ancient DNA ...
2013-06-17 13:04:21 -07:00
Mark DePristo cb5b1c3c34 Create README.md 2013-06-17 16:03:45 -03:00
Mark DePristo 7b22467148 Bugfix: defaultBaseQualities actually works now
-- It was being applied in the wrong order (after the first call to the underlying MalformedReadFilter) so if your first read was malformed you'd blow up there instead of being fixed properly.  Added integration tests to ensure this continues to work.
-- [delivers #49538319]
2013-06-17 14:37:27 -04:00
Guillermo del Angel f6025d25ae Feature requested by Reich lab and Paavo lab in Leipzig for ancient DNA processing:
-- When doing cross-species comparisons and studying population history with ancient DNA data, some measure of confidence that doesn't depend on the reference base is needed at every single site, even in a naive per-site SNP mode. Old versions of GATK emitted GQ and PL values at reference sites, but those values were wrong. This commit addresses this need by adding a new UG command line argument, -allSitePLs, which, if enabled, will:
a) Emit all 3 ALT snp alleles in the ALT column.
b) Emit all corresponding 10 PL values.
It's up to the user to process these PL values downstream to make sense of them. Note that, in order to follow the VCF spec, the QUAL field in a reference call with non-null ALT alleles present will be zero, so QUAL will be useless and filtering will need to be done based on other fields.
-- Tweaks and fixes to processing pipelines for Reich lab.
2013-06-17 13:21:09 -04:00
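Why "all 3 ALT snp alleles" in point (a) implies exactly 10 PL values in point (b): with the reference base plus three alternates there are 4 alleles, and the VCF spec's GL/PL convention enumerates all unordered diploid genotypes, of which there are n(n+1)/2. A one-line illustration of that count:

```java
// For a diploid site with n alleles, the number of unordered genotypes
// (and hence PL values, per the VCF spec's GL ordering) is n*(n+1)/2.
public class PlCount {
    public static int diploidGenotypeCount(int numAlleles) {
        return numAlleles * (numAlleles + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(diploidGenotypeCount(4)); // REF + 3 ALTs -> 10 PLs
        System.out.println(diploidGenotypeCount(2)); // biallelic site -> 3 PLs
    }
}
```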
Mark DePristo fce448cc9e Merge pull request #287 from broadinstitute/md_gzip_vcf_nt
Bugfix: allow gzip VCF output in multi-threaded GATK output
2013-06-17 09:39:37 -07:00
Mark DePristo b69d210255 Bugfix: allow gzip VCF output in multi-threaded GATK output
-- VariantContextWriterStorage was gzipping the intermediate files that would be merged in, but the mergeInto function couldn't read those outputs, and we'd throw a very strange error. Now temporary VCFs aren't compressed, even if the final VCF is.  Added an integration test to ensure this behavior works going forward.
-- [delivers #47399279]
2013-06-17 12:39:18 -04:00
delangel 485ceb1e12 Merge pull request #283 from broadinstitute/md_beagleoutput
Simpler FILTER and info field encoding for BeagleOutputToVCF
2013-06-17 09:31:03 -07:00
Mark DePristo 5b1a472d2c Merge pull request #286 from broadinstitute/eb_add_tiers_to_KBconsensus
Added 2 new fields to the MongoVariantContext: confidence and isComplex.
2013-06-17 08:38:57 -07:00
Mark DePristo ee78927bdb Merge pull request #279 from broadinstitute/eb_make_rms_mq_work_with_rr
Fixes to several of the annotations for reduced reads (and other issues)...
2013-06-16 09:48:19 -07:00
Eric Banks e48f754478 Fixes to several of the annotations for reduced reads (and other issues).
1. Have the RMSMappingQuality annotation take into account the fact that reduced reads represent multiple reads.

2. The rank sum tests should not be using reduced reads (because they do not represent distinct observations).

3. Fixed a massive bug in the BaseQualityRankSumTest annotation!  It was not using the base qualities but rather
the read likelihoods?!

Added a unit test for Rank Sum Tests to prove that the distributions are correctly getting assigned appropriate p-values.
Also, and just as importantly, the test shows that using reduced reads in the rank sum tests skews the results and
makes insignificant distributions look significant (so it can falsely cause the filtering of good sites).

Also included in this commit is a massive refactor of the RankSumTest class as requested by the reviewer.
2013-06-16 01:18:20 -04:00
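Point 1 above — making RMSMappingQuality account for the fact that a reduced read stands in for multiple original reads — amounts to weighting each squared mapping quality by the read's representative count. A minimal sketch under that interpretation; the method and parameter names are invented for illustration, not the annotation's actual API.

```java
// Sketch of a reduced-read-aware RMS mapping quality: each read's squared MQ
// is weighted by how many original reads it represents (1 for a normal read),
// so a reduced read representing 9 originals counts 9 times in the RMS.
public class WeightedRmsMappingQuality {
    public static double rms(int[] mappingQualities, int[] representativeCounts) {
        long sumOfSquares = 0;
        long totalReads = 0;
        for (int i = 0; i < mappingQualities.length; i++) {
            sumOfSquares += (long) representativeCounts[i]
                          * mappingQualities[i] * mappingQualities[i];
            totalReads += representativeCounts[i];
        }
        return Math.sqrt((double) sumOfSquares / totalReads);
    }

    public static void main(String[] args) {
        // one reduced read representing 9 originals at MQ 60, one normal read at MQ 20
        System.out.println(rms(new int[]{60, 20}, new int[]{9, 1}));
    }
}
```

Ignoring the counts (treating both reads equally) would pull the RMS far lower, which is exactly the kind of distortion the commit describes for the rank sum tests as well.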
Eric Banks 9ec71bba26 Added 2 new fields to the MongoVariantContext: confidence and isComplex.
IsComplex will be used to designate calls as representing complex events which have multiple
correct allele representations.  Then call sets can get points for including them but will
not get penalized for missing them (because they may have used a different representation).
This is currently the biggest bane when trying to characterize FNs.

The confidence will be used to refactor the consensus making algorithm for the truth status
of the NA12878 KB.  The previous version allowed for 2 tiers: reviews and everything else.
But that is problematic when some of the input sets are of higher quality than others
because when they disagree the calls become discordant and we lose that information.
The new framework will allow each call to have its own associated confidence.  Then when
determining the consensus truth status we probabilistically calculate it from the
various confidences, so that nothing is hard coded in anymore.

Note that I added some unit tests to ensure the outcome that I expect for various scenarios
and then implemented a very rough version of the estimator that successfully produced those
outcomes.

HOWEVER, THIS IS NOT COMPLETE AND NEITHER FUNCTIONALITY IS HOOKED UP AT ALL.
Rather, this is an interim commit.  The only goal here is to get these fields added to the MVC
for the upcoming release so that Jacob (who prefers to work with stable) can add the
necessary functionality to IGV for us.
2013-06-16 00:31:16 -04:00
droazen 4151753718 Merge pull request #285 from broadinstitute/dr_james_warren_fasta_suffix_bugfix
deducing dictionary path should not use global find and replace
2013-06-14 16:57:10 -07:00
James Warren f46f7d9b23 deducing dictionary path should not use global find and replace
Signed-off-by: David Roazen <droazen@broadinstitute.org>
2013-06-14 19:15:27 -04:00
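The bug fixed above is easy to reproduce: deriving the `.dict` path from a fasta path with a global find-and-replace corrupts any path whose directory names also contain the extension. Replacing only the trailing suffix is safe. A minimal illustration of the failure mode, not the actual GATK code; the path is made up.

```java
// Deriving a sequence-dictionary path from a fasta path. A global replace of
// ".fasta" mangles paths like "/data/hg19.fasta.dir/hg19.fasta"; replacing
// only the final extension does not. Illustrative example, not the real fix.
public class DictPath {
    /** Buggy: replaces every occurrence of ".fasta" in the path. */
    public static String buggy(String fastaPath) {
        return fastaPath.replace(".fasta", ".dict");
    }

    /** Fixed: swap only the final extension. */
    public static String fixed(String fastaPath) {
        return fastaPath.substring(0, fastaPath.lastIndexOf('.')) + ".dict";
    }

    public static void main(String[] args) {
        String p = "/data/hg19.fasta.dir/hg19.fasta";
        System.out.println(buggy(p)); // /data/hg19.dict.dir/hg19.dict   (wrong directory!)
        System.out.println(fixed(p)); // /data/hg19.fasta.dir/hg19.dict
    }
}
```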
Mark DePristo 52677429a0 Merge pull request #284 from broadinstitute/dr_fewer_stranded_temp_files
Reduce number of leftover temp files in GATK runs
2013-06-14 13:06:28 -07:00
Mark DePristo 1677a0a458 Simpler FILTER and info field encoding for BeagleOutputToVCF
-- The previous version created FILTERs for each possible alt allele when that site was set to monomorphic by BEAGLE.  So if you had an A/C SNP in the original file and BEAGLE thought it was AC=0, then you'd get a record with BGL_RM_WAS_A in the FILTER field.  This obviously would cause problems for indels, and so the tool was blowing up in this case.  Now the tool sets the FILTER field to BGL_SET_TO_MONOMORPHIC and sets the INFO field annotation OriginalAltAllele to A instead.  This works in general with any type of allele.
 -- Here's an example output line from the previous and current versions:
 old: 20    64150   rs7274499       C       .       3041.68 BGL_RM_WAS_A    AN=566;DB;DP=1069;Dels=0.00;HRun=0;HaplotypeScore=238.33;LOD=3.5783;MQ=83.74;MQ0=0;NumGenotypesChanged=1;OQ=1949.35;QD=10.95;SB=-6918.88
 new: 20    64062   .       G       .       100.39  BGL_SET_TO_MONOMORPHIC  AN=566;DP=1108;Dels=0.00;HRun=2;HaplotypeScore=221.59;LOD=-0.5051;MQ=85.69;MQ0=0;NumGenotypesChanged=1;OQ=189.66;OriginalAltAllele=A;QD=15.81;SB=-6087.15
-- update MD5s to reflect these changes
-- [delivers #50847721]
2013-06-14 15:56:13 -04:00
David Roazen d167292688 Reduce number of leftover temp files in GATK runs
-WalkerTest now deletes *.idx files on exit

-ArtificialBAMBuilder now deletes *.bai files on exit

-VariantsToBinaryPed walker now deletes its temp files on exit
2013-06-14 15:56:03 -04:00
Mark DePristo b72880cc94 Merge pull request #282 from broadinstitute/md_gatklogs_gitversions
Use git hash to lookup versions when necessary in analyzeRunReports.py
2013-06-14 12:39:54 -07:00
Mark DePristo 20bb4902a3 Use git hash to lookup versions when necessary in analyzeRunReports.py 2013-06-14 15:31:25 -04:00
Mark DePristo 50ea098c11 Merge pull request #281 from broadinstitute/md_gatklogs
Update utilities to get GATKRunReports
2013-06-14 10:00:16 -07:00
Ryan Poplin c4e508a71f Merge pull request #275 from broadinstitute/md_fragment_with_pcr
Improvements to HaplotypeCaller and NA12878 KB
2013-06-14 09:32:26 -07:00
Mark DePristo a057f37331 Update utilities to get GATKRunReports
-- Critical bugfix: the GATK run reports magically changed names from something like GATK-run-report to GATKRunReport in GATK 2.4.  All GATK logs from 2.4 onwards were being eaten by the scripts that download logs, so actual GATK usage is much, much higher than our logs have suggested.  Looking forward to seeing some real numbers.  Unfortunately the error occurred so early in the downloading process that we actually deleted these logs, so they cannot be recovered
-- Added a step in the downloader that archives the raw, unprocessed files so we can recover from such problems in the future
-- The s3 download scripts now download to /local/dev/GATKLogs, so they will only work on gsa4, but that's fine as it's better than taking forever to get the logs to the isilon.
-- Turn off some crazy debugging output from the downloader that was actually preventing me from seeing the issue each night
-- Make analyzeRunReports.py robust to svn version abominations
-- Use python-2.6 in runGATKReport.csh
2013-06-14 10:17:32 -04:00
droazen ac346a93ba Merge pull request #278 from broadinstitute/md_gatk_version_in_vcf
Emit the GATK version number in the VCF header
2013-06-13 13:22:20 -07:00