Commit Graph

8924 Commits (2f334a57c2433aade864d1c28fa08b6fc5f28dec)

Author SHA1 Message Date
Mark DePristo 2f334a57c2 ReadGroupProperties mk2
-- Includes paired end status (T/F)
-- Includes count of reads used in calculation
-- Includes simple read type (2x76 for example)
-- Better handling of insert size, read length when there's no data, or the data isn't paired end by emitting NA not 0
2012-03-01 18:43:53 -05:00
Mauricio Carneiro 486712bfc2 ugly RG encoding 2012-03-01 17:56:45 -05:00
Mauricio Carneiro 4409293b5d get rid of the sorting parameter 2012-03-01 17:56:45 -05:00
Mauricio Carneiro 29f74b658b Unit tests for the context covariate
this is simple, but it's the infra-structure to start messing around with the context.
2012-03-01 17:56:45 -05:00
Mark DePristo aff508e091 ReadGroupProperties walker and associated infrastructure
-- ReadGroupProperties: Emits a GATKReport containing read group, sample, library, platform, center, median insert size and median read length for each read group in every BAM file.
-- Median tool that collects up to a given maximum number of elements and returns the median of the elements.
-- Unit and integration tests for everything.
-- Making name of TestProvider protected so subclasses and override name more easily
2012-03-01 15:01:11 -05:00
Eric Banks 88e060e1e9 Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-01 10:55:21 -05:00
Eric Banks ed70d9b380 The reduced reads + calling script I am using; adding now for Khalid's benefit 2012-03-01 10:55:08 -05:00
Ryan Poplin 2af470404c HaplotypeCaller integration test goes back to using flat base insertion and base deletion quality scores while the BQSR is in flux. 2012-03-01 09:05:55 -05:00
Mauricio Carneiro 9e95b10789 Context covariate now operates as a highly compressed bitset
* All contexts with 'N' bases are now collapsed as uninformative
   * Context size is now represented internally as a BitSet but output as a dna string
   * Temporarily disabled sorted outputs because of null objects
2012-02-29 19:25:21 -05:00
Mauricio Carneiro d379c3763a DNA Sequence to BitSet and vice-versa conversion tools
* Turns DNA sequences (for context covariates) into bit sets for maximum compression
  * Allows variable context size representation guaranteeing uniqueness.
  * Works with long precision, so it is limited to a context size of 31 bases (can be extended with BigNumber precision if necessary).
  * Unit Tests added
2012-02-29 19:25:20 -05:00
Mark DePristo 18c91e3cb3 Massively expanded haplotype caller likelihood testing
-- Now include combinatorial testing for all input parameters: base quality, indel quality, continuation penalty, base identity, and indel length
-- Disabled default the results coming back are not correct
2012-02-29 08:49:12 -05:00
Mark DePristo a6735d1d88 UnitTest framework for HaplotypeCaller likelihood function
-- Currently disabled as the likelihood function doesn't pass basic unit tests
-- Also make low-level function in LikelihoodCalculationEngine protected
2012-02-28 20:41:52 -05:00
Menachem Fromer 30abe123f1 Added space before command-line parameters 2012-02-28 14:54:45 -05:00
Mauricio Carneiro cf6472eea6 Silly build fix. 2012-02-28 10:52:27 -05:00
Mark DePristo 3ddcd6879f Bugfix for fullRefresh mode for gsafolkLogsForTableau 2012-02-28 10:36:58 -05:00
Eric Banks 129b5e7f6b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-28 10:09:34 -05:00
Eric Banks a4a279ce80 Damn you, Mark 2012-02-28 10:09:09 -05:00
Khalid Shakir 0681bea5a5 Changed DoC from PartitionType.INTERVAL to PartitionType.NONE since it doesn't have a way to gather scattered outputs.
Added MultiallelicSummary to HSP eval.
2012-02-28 09:27:27 -05:00
Eric Banks bd398e30fd Another quick optimization 2012-02-28 09:25:35 -05:00
Eric Banks 40bdadbda5 Minor optimization as per Mark 2012-02-28 09:24:07 -05:00
Eric Banks d7928ad669 Drat, missed one: handle null alleles being passed in. 2012-02-27 21:31:54 -05:00
Mauricio Carneiro 1245a3c868 Tool for diagnosing new technology error modes 2012-02-27 19:37:19 -05:00
Mark DePristo 24356f11b7 Merged bug fix from Stable into Unstable
-- Resolved conflict

Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/datasources/reads/SAMDataSource.java
2012-02-27 17:13:17 -05:00
Mark DePristo 0b29d54937 Changed most BAMSchedule ReviewedStingExceptions to UserExceptions
-- As these represent the bulk of the StingExceptions coming from BAMSchedule and are caused by simple problems like the user providing bad input tmp directories, etc.
2012-02-27 17:08:41 -05:00
Mark DePristo f9e8e82e33 Removed unused class variable from VCFHeaderLineTranslator 2012-02-27 17:07:19 -05:00
Mark DePristo 100ddef930 Fix typo in VariantContextBuilder 2012-02-27 17:06:45 -05:00
Mark DePristo ca0931c01f Adding test for reading samtools VCF file 2012-02-27 17:05:50 -05:00
Menachem Fromer 33cf1368ba Added options to add XHMM command-line parameters for discovery and genotyping 2012-02-27 16:03:02 -05:00
Eric Banks bd944ab04f Another test where we no longer print out 'NaN' for the AF. 2012-02-27 15:19:08 -05:00
Mark DePristo 5f7ccdcc01 Avoid calling getBasePileup when there's no pileup in NBaseCount annotation 2012-02-27 15:12:25 -05:00
Eric Banks 52871187d7 Adding integration test for file with no GTs. Also updated md5 for one other test (since we no longer print out 'NaN' for the AF). 2012-02-27 15:09:56 -05:00
Mark DePristo 729bb954e2 Throws ReviewedStingException for a bug when parent VariantContext argument is null 2012-02-27 15:09:00 -05:00
Eric Banks 998ed8fff3 Bug fix to deal with VCF records that don't have GTs. While in there, optimized a bunch of related functions (including removing a copy of the method calculateChromosomeCounts(); why did we have 2 copies? very dangerous). 2012-02-27 14:56:10 -05:00
Mark DePristo 4d9582de77 More general catching of Exceptions in interval reading to throw MalformedFile exception in all cases
-- Now throws UserException no matter what happens during the reading of the intervals file.
2012-02-27 14:02:26 -05:00
Mark DePristo 9712fed7a5 Trap SAMFormatException and rethrow as MalformatedBAM exception
-- Trap errors in header and rethrow
-- Wrap underlying iterator in MalformatedBAMErrorReformattingIterator
2012-02-27 13:52:50 -05:00
Eric Banks 1ea34058c2 Updating integration tests now that standard annotations support multiple alleles 2012-02-27 11:32:26 -05:00
Eric Banks 64754e7870 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-27 11:31:41 -05:00
Eric Banks 850c5d0db2 Enabling Rank Sum Tests for multi-allelics: use ref vs any alt allele. 2012-02-27 09:59:36 -05:00
Eric Banks dfdf4f989b Enabling Fisher Strand for multi-allelics: use the alt allele with max AC. Added minor optimization to the method in the VC. 2012-02-27 09:50:09 -05:00
Guillermo del Angel 16122bea8d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-25 13:57:54 -05:00
Guillermo del Angel dea35943d1 a) Bug fix in calling new functions that give indel bases and length from regular pileup in LocusIteratorByState, b) Added unit test to cover these. 2012-02-25 13:57:28 -05:00
Mark DePristo c8a06e53c1 DoC now properly handles reference N bases + misc. additional cleanups
-- DoC now by default ignores bases with reference Ns, so these are not included in the coverage calculations at any stage.
-- Added option --includeRefNSites that will include them in the calculation
-- Added integration tests that ensures the per base tables (and so all subsequent calculations) work with and without reference N bases included
-- Reorganized command line options, tagging advanced options with @Advanced
2012-02-25 11:32:50 -05:00
Mark DePristo 50de1a3eab Fixing bad VCFIntegration tests
-- Left disabled a test that should have been enabled
-- Didn't add the md5 to the test I actually added
-- Now VCFIntegrationTests should be working!
2012-02-25 11:26:36 -05:00
Mark DePristo 9bad51877e Generalized gsafolkLSFLogs.py to gsafolkLogsForTableau.py
-- Now updates both LSF logs and filesystem sizes
-- New Tableau emails will include both LSF and FS info!
2012-02-24 15:58:24 -05:00
Mark DePristo 80b5c7ad21 Fix gitVersionNumbers script to not print git status messages to our file 2012-02-24 15:58:22 -05:00
Mark DePristo 747e1a728f Script to recreate entire GATKLog db from scratch
Useful primarily as a reference.  Sometimes necessary when low-level changes are made to the scripts, requiring all of the data to be reprocessed
2012-02-24 15:58:21 -05:00
Mark DePristo e94a534076 Added dry run and verbose options to gsafolkLSFLogs 2012-02-24 15:58:20 -05:00
Mark DePristo 253bb46bcd Add support to analyzeRunReports to tag xml logs with git version numbers 2012-02-24 15:58:19 -05:00
Guillermo del Angel c9a4c74f7a a) Bug fixes for last commit related to PileupElements (unit tests are forthcoming). b) Changes needed to make pool caller work in GENOTYPE_GIVEN_ALLELES mode c) Bug fix (yet again) for UG when GENOTYPE_GIVEN_ALLELES and EMIT_ALL_SITES are on, when there's no coverage at site and when input vcf has genotypes: output vcf would still inherit genotypes from input vcf. Now, we just build vc from scratch instead of initializing from input vc. We just take location and alleles from vc 2012-02-24 10:27:59 -05:00
Mauricio Carneiro 470375db58 added integration test for the ReduceReadsStash bug reported by Adam 2012-02-23 18:59:27 -05:00