Commit Graph

4432 Commits (e351ca869d5cc0ebc13289d911100a8ce3b7e0c0)

Author SHA1 Message Date
Geraldine Van der Auwera e351ca869d Updated gsalib package to match version we put in CRAN 2014-10-30 14:26:38 -04:00
Geraldine Van der Auwera b69b256003 Update pom versions to mark the start of GATK 3.4 development 2014-10-23 22:31:44 -04:00
Geraldine Van der Auwera eee94ec81f Update pom versions for the 3.3 release 2014-10-23 22:25:17 -04:00
Khalid Shakir 75c17bbd6f Adding sysinternals:junction:1.04:exe to local repo.
With Geraldine's blessing, works around issue where the repo for this exe keeps going down.
2014-10-24 05:33:47 +08:00
Khalid Shakir ac33eda231 Queue patches.
Fixed Queue bug with bad localhost addresses.
Spaces in job names, while fine in GridEngine 6, break in (Son of) GE8.
2014-10-23 22:59:19 +08:00
Geraldine Van der Auwera 3ba94b987c Minor documentation clarifications 2014-10-22 17:54:11 -04:00
rpoplin 0f89d1a362 Merge pull request #755 from broadinstitute/sc_Annotation_Docs_73647570
Improvements to documentation of variant annotations
2014-10-22 13:41:00 -04:00
Khalid Shakir 26ba4c11aa Minor fixups for previous commit once tests (only runnable at Broad) were run.
Fixed off by one error in size calculation IntervalUtils.scatterContigIntervals().
In test for fewer files than intervals, adjusted expected intervals.
In test for more files than intervals, adjusted expected exception.
2014-10-22 17:37:37 +08:00
Chris Smowton a62dc84795 Improved scatter contigs algorithm to be fairer when splitting a large number of contigs into a small number of parts.
See also: http://gatkforums.broadinstitute.org/discussion/comment/16010

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-10-22 16:26:17 +08:00
Sheila Chandran b3c5ed4414 Improvements to documentation of variant annotations
- Added or modified explanations for majority of variant annotations
	- Generalized NBaseCount to include all tech platforms (not just SOLiD)
2014-10-21 18:20:04 -04:00
rpoplin c4fcd70a88 Merge pull request #754 from broadinstitute/rhl_variant_array_exception
Do not process a variant if it is too large (> readLength), and log an e...
2014-10-21 12:01:52 -04:00
Ron Levine 239151ac7b Do not process a variant if it is too large (> readLength), and log an error
remove final keyword before refMap and altMap, constructHaplotype() changes their values

return ArtificialHaplotype from constructHaplotype instaed of passing as an argument

Add logic so arraycopy does not throw an IndexOutOfBoundsException, add test for a long insert
2014-10-20 15:51:32 -04:00
Phillip Dexheimer f766608b4e Changed license of gsalib R package from BSD to MIT
- PT 75827540
2014-10-16 21:37:07 -04:00
Laura Gauthier 0f08065ebc Throw UserException if input VCFs have duplicate samples but no genotypemergeoption is specified 2014-10-15 16:03:10 -04:00
Geraldine Van der Auwera e7e8052f84 Updated license information
- Updated license files (private/protected) for version, address and a couple of legal clauses
- Updated license snippet throught the codebase
2014-10-14 17:10:12 -04:00
rpoplin 426907ddd0 Merge pull request #744 from broadinstitute/gg_gatkdocs_annots_and_GSON
Output JSON version of docs for Galaxy
2014-10-14 11:41:16 -04:00
ldgauthier d259f3c84f Merge pull request #745 from broadinstitute/ldg_VariantAnnotatorDocs
Added docs to VariantFiltration is accordance with new htsjdk changes.  ...
2014-10-10 14:11:24 -04:00
Laura Gauthier 0ecb85d321 Added docs to VariantFiltration is accordance with new htsjdk changes. Fixed typo in VariantAnnotator docs. 2014-10-10 11:54:24 -04:00
Ron Levine 36c27155af Made the threshold for the probability of a state being active a command line argument
remove TODO comment after activeProbThreshold

recover static ACTIVE_PROB_THRESHOLD for unit tests

Add min/max values for active_probability_threshold parameter

Move activeProbThreshold parameter to GATKArguemtnCollection

define ACTIVE_PROB_THRESHOLD in unit tests

add construction of argCollection in in ctor

Move arguments from GATKArgumentCollection to ActiveRegionWalker

Throw exception if threshold < 0 or > 1 in ActivityProfile ctor

max propogation distance parameter to ActiveRegionWalker for AcrtivityProfile

Use polymorphic getMaxProbPropagationDistance() so BandPassActivityProfile computes the crrect region size cutoff

Get the maxProbPropagationDistance from the super class's method, instead of directly, this is safer

Removed extraneous command line imports and make maxProbPropagationDistance a hidden argument

remove limit check for activeProbThreshold, not necessary because the check is made when imput as a command line arg

Remove extra 'region' in the doxygen param description for maxProbPropagationDistance
2014-10-10 10:36:02 -04:00
Geraldine Van der Auwera 3f21f63161 Output JSON version of docs for Galaxy 2014-10-09 06:42:25 -04:00
Ryan Poplin ac1a397024 This warning message actually happens all the time in AssessNA12878 when we subset down to biallelic events but I've verified that it is working as intended. Moving the logging level up to debug. 2014-09-29 11:40:38 -04:00
Phillip Dexheimer 1482a53aba Added -writeFullFormat engine-level argument
* This argument forces GATK to always write every record in the VCF format field, even if some records at the end are missing and could be removed
  * Revved htsjdk and picard
  * PT 70993484
2014-09-17 08:25:27 -04:00
Valentin Ruano-Rubio 95b45443ae Updated test according to changes in the AF calculator framework.
Changes:
-------

* Updated current unit and integration test to use the new API components.
* Added unit tests for new classes AFPriorProvider and AFCalculatorProviders.
* Added integration test for mixed ploidy GenotypeGVCFs and CombineGVCFs
2014-09-12 14:59:47 -04:00
Valentin Ruano-Rubio 3cdeab6e9e GenotypingEngines and walkers now use AFCalc(ulator) providers rathern than instanciate their own (fixed) calculators directly.
Changes:
-------

* GenotypingEngine uses now a AFCalc provider instead of
  its own thread-local with one-time initialized and fixed
  AF calculator.

* All walkers that use a GenotypingEngine now are passing
  the appropiate AF calculator provider. For now most
  just use a fix calculator (FixedAFCalculatorProvider)
  except GenotypeGVCFs as this one now can cope with
  mixture of ploidies failing-over to a general-ploidy
  calculator when the preferred implementation is not
  capable to handle a site's analysis.
2014-09-12 14:25:09 -04:00
Phillip Dexheimer a35f5b8685 Moved arguments controlling options in output files into the engine
* Arguments involved are --no_cmdline_in_header, --sites_only, and --bcf for VCF files and --bam_compression, --simplifyBAM, --disable_bam_indexing, and --generate_md5 for BAM files
 * PT 52740563
 * Removed ReadUtils.createSAMFileWriterWithCompression(), replaced with ReadUtils.createSAMFileWriter(), which applies all appropriate engine-level arguments
 * Replaced hard-coded field names in ArgumentDefinitionField (Queue extension generator) with a Reflections-based lookup that will fail noisily during extension generation if there's an error
2014-09-05 21:18:11 -04:00
Khalid Shakir 376592f423 Various fixes for package tests.
Explicitly including gatk/queue test-jar artifacts in package test classpaths.
SelectVariantsIntegrationTest#testInvalidJexl now resets the JexlEngine silent flag that VariantFiltration.initialize() toggles.
External example no longer tries to unpack nonexistent gatk artifact jars during package tests.
2014-09-04 15:30:31 -04:00
droazen 5c087a6e1f Merge pull request #724 from broadinstitute/ks_remove_test_qscript_symbolic_links
Removed symlink creation for tests and qscripts
2014-09-04 09:10:54 -04:00
Valentin Ruano Rubio c7925f6e5c Merge pull request #719 from broadinstitute/vrr_generalize_ploidy_in_genotype_gvcfs
Adds support for omniploidy to GenotypeGVCFs and CombineGVCFs.
2014-09-02 16:51:02 -04:00
Valentin Ruano-Rubio d363725b4b Adds support for omniploidy to GenotypeGVCFs and CombineGVCFs.
Same changes fixed the problem for GenotypeGVCFs and CombineGVCFs.

Stories:

  - https://www.pivotaltracker.com/story/show/77626044
  - https://www.pivotaltracker.com/story/show/77626854

Changes:

  - Generalized the code for the merging in GATKVariantContextUtils to cope
    with ploidy != 2.
  - GenotypeGVCFs now check that the input's ploidy conform to the '-ploidy'
    argument.
  - Moved out Refernce Confidence VC merging code from GATKVariantContextUtils
    so that we can keep new code in protected.

Caveats:

  - GenotypeGVCFs only can deal with input files that have the same ploidy in
    all positions; the one that the user MUST indicate in the -ploidy argument
    (if different to the default 2).
  - CombineGVCFs won't necessarely complain if its passed mixed ploidy
    inputs but you won't be able to genotype it with GenotypeGVCFs.

Test:

   - Removed deprecated unit tests for GATKVariantContextUtils.
   - Moved unit-tests regarding GVCF merging from GATKVariantContextUtilsUnitTest
     to ReferenceConfidenceVariantContextUtilsUnitTest.
   - Added unit test for new code for mapping genotype indices between allele
     index encoding in GenotypeLikelihoodCalculator.
   - GenotypeGVCFs and CombineGVCFs original integration test are unaffected
     by the change.
   - Added tetraploid run integration tests to check on non-diploid execution
     of GenotypeGVCFs and CombineGVCFs.
2014-09-02 15:06:47 -04:00
Eric Banks fe86dafc41 Merge pull request #705 from broadinstitute/gg_simplify_gatkdocs_templates
Changed the GATKDocs format to PHP
2014-09-02 06:28:26 -04:00
Khalid Shakir fcb0eca203 Now passing in the path to the GATK directory to tests.
Changed tests and scripts to use gatkdir full path instead of relative testdata/qscripts symbolic links.
Although symlinks not created, left the symlink deletion script execution with a comment about future removal.
Re-enabled example UG pipeline queue test.
Replaced all hardcoded strings of {public,private}/testdata with BaseTest variables.
Refactored temp list creation method from ListFileUtilsUnitTest to BaseTest.createTempListFile.
Removed list files with hardcoded paths, now using createTempListFile instead with private test dir variable.
2014-09-02 01:40:59 +08:00
Michael Linderman 380cd67146 Update extension generator to recognize RodBindingCollection as 'taggable'
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-09-01 12:19:44 +08:00
Valentin Ruano-Rubio 6695aeafd9 Disable physical phasing for non-diploid HC calling.
Story:

    https://www.pivotaltracker.com/story/show/77452256

Changes:

    If ploidy != 2, disable physical phasing and log an info message to let the user know.

Tests:

    Change md5s affected by this change.
2014-08-23 10:52:07 -04:00
Valentin Ruano-Rubio fc5ce4b662 Created the stand-alone AC and AF annotation AlleleCountBySample
Story:

  https://www.pivotaltracker.com/story/show/77250524

Changes:

  - Remove the annotating code in GeneralPloidyExactAFCalc (GPEAFC) class.
  - Added the asAlleleList to GenotypeAlleleCounts class and get (GPEAFC) to use that instead of implementing its own (nicer and more reusable code).
  - Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker initialize
  - Added utility methods in Utils to wrap and int[] array into a List<Integer>, and double[] array into a List<Double> efficiently.

Test:

  - Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext).
  - Added unit-testing for new methods in Utils : asList(int[]) and asList(double[])
  - Changed UG General Ploidy test to add explicitly those annotations.
  - Non-trivial changes in integration tests involving non-diploid runs (namelly haploid and tetraploid) as they are not showing
    those annotations anylonger, so the MD5s have been changed accordingly.
2014-08-22 20:33:25 -04:00
Valentin Ruano-Rubio 8d9a55ae60 Moving new omniploidy likelihood calculation classes to their final package (as far as this pull-request is concerned) in org.broadinstitute.gatk.tools.walkers.genotyper 2014-08-19 11:54:29 -04:00
Valentin Ruano-Rubio 611b7f25ea Adds unit-test and integration test for new omniploidy likelihood calculation components
Added md5 to HaplotypeCallerIntegrationTest.testHaplotypeCallerSingleSampleWithDbsnp
2014-08-19 11:53:19 -04:00
Valentin Ruano-Rubio 9ee9da36bb Generalize the calculation of the genotype likelihoods in HC to cope with haploid and multiploidy
Changes in several walker to use new sample, allele closed lists and new GenotypingEngine constructors signatures

Rebase adoption of new calculation system in walkers
2014-08-19 11:53:06 -04:00
Valentin Ruano-Rubio 4f993e8dbe Added read-likelihoods array base structure to substitute existing Map-of-Map-of-Maps. 2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio 242cd0e58f Added genotype allele counts and likelihood calculator utilities for arbitrary ploidy and number of alleles 2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio b0a4cb9f0c Added close sample and allele list data-structures and utility classes 2014-08-19 11:50:12 -04:00
Geraldine Van der Auwera cdba069b02 changed the GATKDocs format to PHP 2014-08-18 18:04:07 -04:00
Eric Banks eb84091702 Update the --keepOriginalAC functionality in SelectVariants to work for sites that lose alleles in the selection. 2014-08-14 15:34:09 -04:00
Ryan Poplin 3a9a78c785 Removing an assumption that ADs were in the same order if the number of alleles matched. This happens for example when one sample is C->T and another sample is C->G. 2014-08-13 13:26:40 -04:00
Eric Banks 27193c5048 Merge pull request #700 from broadinstitute/eb_phase_HC_variants_PT74816060
Initial implementation of functionality to add physical phasing informat...
2014-08-13 12:30:32 -04:00
Eric Banks 4512940e87 Initial implementation of functionality to add physical phasing information to the output of the HaplotypeCaller.
If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF.
Can be enabled with the --tryPhysicalPhasing argument.
2014-08-13 12:25:31 -04:00
Geraldine Van der Auwera 49702dc695 Clarified Phone Home system details re: privacy 2014-08-12 17:23:35 -04:00
jmthibault79 6d7201a7f8 Merge pull request #698 from broadinstitute/pd_printreads_subset
Improvements to read-group filtering in PrintReads
2014-08-12 14:13:07 -04:00
Phillip Dexheimer 7e77875c81 Improvements to read-group filtering in PrintReads
- Read groups that are excluded by sample_name, platform, or read_group arguments no longer appear in the header
 - The performance penalty associated with filtering by read group has been essentially eliminated
 - Partial fulfillment of PT 73075482
2014-08-11 20:08:16 -04:00
Valentin Ruano-Rubio 9a9a68409e ReadLikelihoods class introduction final changes before merging
Stories:

        https://www.pivotaltracker.com/story/show/70222086
        https://www.pivotaltracker.com/story/show/67961652

Changes:

  Done some changes that I missed in relation with making sure that all PairHMM implentations use the same interface; as a consequence we were running always the standard PairHMM.
  Fixed some additional bugs detected when running it on full wgs single sample and exom multi sample data set.
  Updated some integration test md5s.

Fixing GraphBased bugs with new master code
Fixed ReadLikelihoods.changeReads difficult to spot bug.
Changed PairHMM interface to fix a bug
Fixed missing changes for various PairHMM implementations to get them to use the new structure.
Fixed various bugs only detectable when running with full sample(s).
Believe to have fixed the lack of annotations in UG runs
Fixed integrationt test MD5s
Updating some md5s
Fixed yet another md5 probably left out by mistake
2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio 0b472f6bff Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s.
Stories:

    https://www.pivotaltracker.com/story/show/70222086
    https://www.pivotaltracker.com/story/show/67961652
2014-08-11 17:46:28 -04:00