gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Khalid Shakir	1e25a758f5	Moved files to maven directories. Here are the git moved directories in case other files need to be moved during a merge: git-mv private/java/src/ private/gatk-private/src/main/java/ git-mv private/R/scripts/ private/gatk-private/src/main/resources/ git-mv private/java/test/ private/gatk-private/src/test/java/ git-mv private/testdata/ private/gatk-private/src/test/resources/ git-mv private/scala/qscript/ private/queue-private/src/main/qscripts/ git-mv private/scala/src/ private/queue-private/src/main/scala/ git-mv protected/java/src/ protected/gatk-protected/src/main/java/ git-mv protected/java/test/ protected/gatk-protected/src/test/java/ git-mv public/java/src/ public/gatk-framework/src/main/java/ git-mv public/java/test/ public/gatk-framework/src/test/java/ git-mv public/testdata/ public/gatk-framework/src/test/resources/ git-mv public/scala/qscript/ public/queue-framework/src/main/qscripts/ git-mv public/scala/src/ public/queue-framework/src/main/scala/ git-mv public/scala/test/ public/queue-framework/src/test/scala/	2014-02-03 13:50:44 -05:00
Khalid Shakir	faaef236ea	Moved gsalib, R and other resources, Queue GATK extensions generator, Queue version java files.	2014-02-03 13:49:21 -05:00
Khalid Shakir	eb52dc6a9b	Moved build.xml, ivy.xml, ivysettings.xml, ivy properties, public/packages/*.xml into private/archive/ant	2014-02-03 13:49:20 -05:00
Eric Banks	83d07280ef	Merge pull request #482 from broadinstitute/vrr_reference_model_alt_allele gVCF <NON_REF> in all vcf lines including variant ones when –ERC gVCF is...	2014-01-30 08:25:43 -08:00
Valentin Ruano-Rubio	89c4e57478	gVCF <NON_REF> in all vcf lines including variant ones when –ERC gVCF is requested. Changes: <NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for each read it is calculated as the second-best likelihood amongst the reported alleles. When –ERC gVCF is requested, stand_conf_emit and stand_conf_call are forcefully set to 0. Also dontGenotype is set to false for consistency's sake. Integration test MD5s have been changed accordingly. Additional fix: Especially after adding the <NON_REF> allele, although it also happened without it, QUAL values tended to go to 0 (very large integer number in log 10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that, combineGLs has been replaced by combineGLsPrecise, which uses the log-sum-exp trick. In just a few cases this change results in genotype changes in integration tests, but after double-checking with unit tests and comparing combineGLs against combineGLsPrecise in the affected integration tests, the previous GT calls were either borderline cases and/or due to the underflow.	2014-01-30 11:23:33 -05:00
Valentin Ruano Rubio	383a4f4a70	Merge pull request #481 from broadinstitute/vrr_pairhmm_log_probability_fix Fix for the PairHMM transition probability miscalculation.	2014-01-27 10:59:08 -08:00
Valentin Ruano-Rubio	748d2fdf92	Added integration test to verify that the bugs reported in Pivotal Tracker are no longer present	2014-01-26 23:29:31 -05:00
Valentin Ruano-Rubio	9e7bf75e89	Fix for the PairHMM transition probability miscalculation. Problem: the matchToMatch transition calculation was wrong, resulting in transition probabilities out of the Match state that summed to more than 1. Reports: https://www.pivotaltracker.com/s/projects/793457/stories/62471780 https://www.pivotaltracker.com/s/projects/793457/stories/61082450 Changes: The transition matrix update code has been moved to a common place in PairHMMModel to remove its multiple copies. The matchToMatch transition calculation has been fixed and implemented in PairHMMModel. Affected integration test MD5s have been updated; there were no differences in GT fields, and example differences always implied small changes in likelihoods, which is what is expected.	2014-01-26 16:30:36 -05:00
Ryan Poplin	bdd06ebfc2	Merge pull request #478 from broadinstitute/eb_generalize_hc_values_as_args Pulled out some hard-coded values from the read-threading and isActive c...	2014-01-21 09:01:54 -08:00
Eric Banks	8812278c2c	Merge pull request #479 from broadinstitute/eb_move_test_up_one_level Moving this test up one level to where it actually belongs.	2014-01-21 06:45:55 -08:00
Eric Banks	9e858270d7	Moving this test up one level to where it actually belongs.	2014-01-19 02:33:11 -05:00
Eric Banks	64d5bf650e	Pulled out some hard-coded values from the read-threading and isActive code of the HC, and made them into a single argument. In unifying the arguments it was clear that the values were inconsistent throughout the code, so now there's a single value that is intended to be more liberal in what it allows in (in an attempt to increase sensitivity). Very little code actually changes here, but just about every md5 in the HC integration tests is different (as expected). Added another integration test for the new argument. To be used by David R to test his per-branch QC framework: does this commit make the HC look better against the KB?	2014-01-19 01:15:13 -05:00
Eric Banks	abd4f552ba	Merge pull request #476 from broadinstitute/yf_logging_all_input_SAMFiles Added an info log containing the SAM/BAM files that were eventually found.	2014-01-17 08:54:33 -08:00
Yossi Farjoun	c79e8ca53e	Added an info log containing the SAM/BAM files that were eventually found from the command line (useful for when there are files hiding inside bam.lists which may or may not have been constructed correctly...) Added a @hidden option controlling the appearance of the full BamList in the log	2014-01-17 11:25:21 -05:00
Eric Banks	3b6b7626aa	Merge pull request #472 from broadinstitute/eb_extend_private_simulate_reads_tool Fixed up and refactored what seems to be a useful private tool to create...	2014-01-15 17:51:07 -08:00
Eric Banks	de56134579	Fixed up and refactored what seems to be a useful private tool to create simulated reads around a VCF. It didn't completely work before (it was hard-coded for a particular long-lost data set) but it should work now. Since I thought that it might prove useful to others, I moved it to protected and added integration tests. GERALDINE: NEW TOOL ALERT!	2014-01-15 13:49:31 -05:00
Eric Banks	e2c2aa7b05	Merge pull request #475 from broadinstitute/eb_fix_null_alleles_bug_PT63551060 Added in a check for what would be an empty allele after trimming.	2014-01-15 08:05:21 -08:00
Eric Banks	9f1ab0087a	Added in a check for what would be an empty allele after trimming.	2014-01-15 11:04:19 -05:00
Ryan Poplin	201ad398ac	Merge pull request #473 from broadinstitute/eb_fix_qd_indel_normalization The QD normalization for indels was busted and is now fixed.	2014-01-14 08:56:19 -08:00
Eric Banks	e4fdc5ac44	Merge pull request #474 from broadinstitute/eb_fix_haplotype_resolver_PT63333488 Fixing the Haplotype Resolver so that it doesn't complain about missing header lines	2014-01-14 07:36:53 -08:00
Geraldine Van der Auwera	f67c33919b	Merge pull request #468 from broadinstitute/gg_fixSAMPileup Updated SAMPileup codec and pileup-related docs	2014-01-14 06:30:04 -08:00
Geraldine Van der Auwera	edf5880022	Updated SAMPileup codec and pileup-related docs Problem: the codec was written to take in consensus pileups produced with pileup -c option (which consists of 10 or 13 fields per line depending on the variant type) but errored out on the basic pileup format (which only has 6 fields per line). This was inconsistent and confusing to users. Solution: I added a switch in the parsing to recognize and handle both cases more appropriately, and updated related docs. While I was at it I also improved error messages in CheckPileup, which now emits User Error: Bad Input exceptions when reporting mismatches. Which may not be the best thing to do (ultimately they're not really errors, they're just reporting unwelcome results) but it beats emitting Runtime Exceptions. Tested by CheckPileupIntegrationTest which tests both format cases.	2014-01-14 09:14:16 -05:00
Eric Banks	16ecc53749	Merge pull request #469 from broadinstitute/gg_gatkdoc_fixes Assorted fixes and improvements to gatkdocs	2014-01-14 05:56:07 -08:00
Eric Banks	fd511d12a2	Fixing the Haplotype Resolver so that it doesn't complain about missing header lines. The code comments very clearly state that INFO fields shouldn't be propagated into the output, but someone must have accidentally changed it afterwards. This is just a simple one-line fix to make sure the code adhered to the comments. Delivers #63333488.	2014-01-13 22:47:43 -05:00
droazen	347fab4717	Merge pull request #471 from broadinstitute/eb_output_log_info_for_tim Adding more meta information about the user to the GATK logging output, per Tim F's request.	2014-01-13 17:48:40 -08:00
Geraldine Van der Auwera	bdb3954eb3	removed maxRuntime minValue	2014-01-13 20:45:43 -05:00
Geraldine Van der Auwera	8fcad6680b	Assorted fixes and improvements to gatkdocs -Added docs for ERC mode in HC -Moved RecalibrationPerformance walker to private since it is experimental and unsupported -Updated VR docs and restored percentBad/numBad (but @Hidden) to enable deprecation alert if users try to use them -Improved error msg for conflict between per-interval aggregation and -nt -Minor clean up in exception docs -Added Toy Walkers category for devs and dev supercat (to build out docs for developers) -Added more detailed info to GenotypeConcordance doc based on Chris forum post -Added system to include min/max argument values in gatkdocs (build gatkdocs with 'ant gatkdocs' to test it, see engine and DoC args for in situ examples) -Added tentative min/max argument annotations to DepthOfCoverage and CommandLineGATK arguments (and improved docs while at it) -Added gotoDev annotation to GATKDocumentedFeature to track who is the go-to person in GSA for questions & issues about specific walkers/tools (now discreetly indicated in each gatkdoc)	2014-01-13 17:46:22 -05:00
Eric Banks	c7e08965d0	The QD normalization for indels was busted and is now fixed. It is true that indels of length > 1 have higher QUALS than those of length = 1. But for the HC those QUALS are not that much higher, and it doesn't continue scaling up as the indels get larger. So we no longer normalize by indel length (which massively over-penalizes larger events and effectively drops their QD to 0). For the UG the previous normalization also wasn't perfect. Now we divide the indel length by a factor of 3 to make sure that QD is consistent over the range of indel lengths. Integration tests change because QD is different for indels. Also, got permission from Valentin to archive a failing test that no longer applies. Thanks to Kurt on the GATK forum for pointing this all out.	2014-01-13 15:23:36 -05:00
Eric Banks	851ec67bdc	Adding more meta information about the user to the GATK logging output, per Tim F's request.	2014-01-13 14:36:02 -05:00
droazen	7cd304fb41	Merge pull request #470 from broadinstitute/mf_new_RBP Mf new rbp	2014-01-13 08:46:27 -08:00
Ryan Poplin	3b8209f3b2	Merge pull request #467 from broadinstitute/rp_fix_names_NA12878ROCCurve The ROC Curve report lists the name as the name of the vcf file now inst...	2014-01-09 06:56:34 -08:00
MauricioCarneiro	50cd6781b3	Merge pull request #465 from broadinstitute/eb_improvements_to_ref_confidence_merger Improvements to ref confidence merger	2014-01-08 10:51:01 -08:00
Ryan Poplin	8881926bc6	The ROC Curve report lists the name as the name of the vcf file now instead of project+name.	2014-01-08 09:44:21 -05:00
Ryan Poplin	c86e36c909	Merge pull request #466 from broadinstitute/rp_phase3_vqsr_scala Adding here the Qscript used to perform the VQSR for 1000 Genomes Projec...	2014-01-08 06:39:46 -08:00
Ryan Poplin	7d5a710ea6	Adding here the Qscript used to perform the VQSR for 1000 Genomes Project phase 3	2014-01-08 09:38:13 -05:00
Eric Banks	553b3e56bd	Merge pull request #463 from broadinstitute/eb_fix_realigner_bugs_from_pearson Fixed edge condition in the realigner where a realigned read can sometim...	2014-01-08 05:36:11 -08:00
Eric Banks	0323caefc8	Added some bug fixes to the gVCF merging code after finally getting some real data to play with. Still under construction, awaiting more test data from Valentin.	2014-01-08 08:34:35 -05:00
Eric Banks	f172c349f6	Adding the functionality to enable users to input a file of VCFs for -V. To do this I have added a RodBindingCollection which can represent either a VCF or a file of VCFs. Note that e.g. SelectVariants allows a list of RodBindingCollections so that one can intermix VCFs and VCF lists. For VariantContext tags with a list, by default the tags for the -V argument are applied unless overridden by the individual line. In other words, any given line can have either one token (the file path) or two tokens (the new tags and the file path). For example: foo.vcf VCF,name=bar bar.vcf Note that a VCF list file name must end with '.list'. Added this functionality to CombineVariants, CombineReferenceCalculationVariants, and VariantRecalibrator.	2014-01-08 00:45:00 -05:00
Eric Banks	c133909d32	Fixed edge condition in the realigner where a realigned read can sometimes get partially aligned off the end of the contig. Now we ignore such reads (which is much easier than trying to figure out when to soft-clip). Added unit test.	2014-01-08 00:37:28 -05:00
Menachem Fromer	e33d3dafc6	Add documentation for RBP, and also update the MD5 for the tests now that the output uses HP tags instead of '|', which is now reserved for trio-based phasing	2014-01-03 12:04:47 -05:00
Menachem Fromer	d1275651ae	Merge remote-tracking branch 'origin/master' into mf_new_RBP	2014-01-03 01:13:40 -05:00
Eric Banks	f6a44afa3a	Merge pull request #464 from broadinstitute/eb_rev_variant_jar_for_bcf_fixes Rev'ing the Variant jar to incorporate some patches to the BCF encoder t...	2014-01-02 21:05:13 -08:00
Eric Banks	856c17868b	Rev'ing the Variant jar to incorporate some patches to the BCF encoder that Menachem needs.	2014-01-02 23:33:17 -05:00
Ryan Poplin	5c32ad174a	Merge pull request #452 from broadinstitute/rp_vqsr_aggregate_model Allow for additional input data to be used in the VQSR for clustering bu...	2014-01-02 12:54:45 -08:00
Ryan Poplin	856c1f87c1	Allow for additional input data to be used in the VQSR for clustering but don't carry it forward into the output VCF file. -- New -a argument in the VQSR for specifying additional data to be used in the clustering -- New NA12878KB walker which creates ROC curves by partitioning the data along VQSLOD and calculating how many KB TP/FP's are called.	2014-01-02 14:46:04 -05:00
Ryan Poplin	c82501ac35	Merge pull request #462 from broadinstitute/rp_SingleSampleHC_exome_scala Adding SingleSampleHC_exome.scala for Valentin to use as a jumping off p...	2014-01-02 08:57:27 -08:00
Ryan Poplin	15372c4873	Adding SingleSampleHC_exome.scala for Valentin to use as a jumping off point.	2014-01-02 11:56:17 -05:00
amilev	f81a38f596	Merge pull request #446 from broadinstitute/ami-RNAseq-tools Write a new tool for splitting reads that have N cigar string.	2014-01-01 21:06:25 -08:00
MauricioCarneiro	1223345726	Merge pull request #459 from broadinstitute/eb_fix_bad_hmm_clipping Fixed up edge condition for clipping long reads in the HMM.	2014-01-01 20:00:34 -08:00
Ami Levy-Moonshine	6da53aea09	Write a new tool for splitting reads that have N elements in their cigar string. For example, this tool can be used for processing bowtie RNA-seq data. Each read with k N-cigar elements is split into k+1 reads. The split is done by hard clipping the rest of the bases. In order to do it, a few changes were introduced to some other clipping methods: - make a significant change in ClippingOp.hardClip() that prevents the splitting of a read with cigar: 1M2I1N1M3I. - change getReadCoordinateForReferenceCoordinate in ReadUtils to recognize Ns create unit tests for that walker: - change ReadClipperTestUtils to be more general in order to use its code and avoid code duplication - move some useful methods from ReadClipperTestUtils to CigarUtils create integration test for that class small change in a comment in FullProcessingPipeline last commit: Address review comments: - move to protected under walkers/rnaseq - change the read splitting methods to be more readable and more efficient - change (minor changes) some methods in ReadClipper to allow the changes in split reads - add (minor change) one method to CigarUtils to allow the changes in split reads - change ReadUtils.getReadCoordinateForReferenceCoordinate to include possible N in the cigar - address the rest of the review comments (minor changes) - fix ReadUtilsUnitTest.testReadWithNs according to the default behaviour of getReadCoordinateForReferenceCoordinate (in case of a reference index that falls into a deletion, return the read index of the base before the deletion). - add another test to ReadUtilsUnitTest.testReadWithNs - Allow the user to print the split positions (not working properly currently)	2014-01-01 22:21:36 -05:00
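
The splitting rule described in commit 6da53aea09 (a read with k N-cigar elements becomes k+1 reads, each with the remaining bases hard clipped) can be sketched as follows. This is only an illustration in Python and not the GATK code: the function names parse_cigar and split_on_n are hypothetical, the sketch works on CIGAR strings alone, it does not adjust alignment starts, and it assumes the input read carries no hard clips of its own.

```python
import re

# CIGAR operators that consume read bases, per the SAM specification.
READ_CONSUMING = set("MIS=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Parse a CIGAR string into a list of (length, operator) pairs."""
    elements = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject anything the pattern did not fully cover (or an empty string).
    if not elements or "".join(f"{n}{op}" for n, op in elements) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return elements


def split_on_n(cigar):
    """Split a CIGAR at each N element (hypothetical helper).

    Returns k+1 CIGAR strings for a read with k N elements. In each output,
    the read bases that belong to the other segments are replaced by hard
    clips (H), so every output still accounts for the full read length.
    """
    elements = parse_cigar(cigar)
    if any(op == "H" for _, op in elements):
        # Assumption of this sketch: pre-existing hard clips are not merged.
        raise ValueError("input with hard clips is not supported here")

    read_len = sum(n for n, op in elements if op in READ_CONSUMING)

    # Group the elements into segments separated by N.
    segments = [[]]
    for n, op in elements:
        if op == "N":
            segments.append([])
        else:
            segments[-1].append((n, op))

    result = []
    consumed = 0  # read bases used by earlier segments
    for seg in segments:
        seg_len = sum(n for n, op in seg if op in READ_CONSUMING)
        left = consumed
        right = read_len - consumed - seg_len
        parts = []
        if left:
            parts.append(f"{left}H")
        parts.extend(f"{n}{op}" for n, op in seg)
        if right:
            parts.append(f"{right}H")
        result.append("".join(parts))
        consumed += seg_len
    return result


if __name__ == "__main__":
    print(split_on_n("3M2N4M"))
    print(split_on_n("2M1N3M5N1M"))
```

Under these assumptions, a read with cigar 3M2N4M yields 3M4H and 3H4M, and a read without any N element is returned unchanged.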

12965 Commits (1e25a758f57b74c538afd7da6ba4798e0d2f1590)