Commit Graph

1770 Commits (ce617b2dfc8a24a71c4afaea01b4d885bdbd62a7)

Author SHA1 Message Date
David Roazen 811f871f78 Do not fail tests that require the GATK private key if the user does not have permission to read it
Several of the unit tests for the new key authorization feature require
read access to the GATK master private key file. Since this file is only
readable by members of the group gsagit, this makes it hard for people
outside the group to run the test suite.

Now, we skip tests that require the master private key if the private
key exists (since not existing would be a true error) but is not readable
by the user running the test suite

Bamboo, of course, will always be able to run these tests.
2012-03-06 15:57:02 -05:00
Christopher Hartl 67def6acc8 Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2012-03-06 14:23:14 -05:00
Christopher Hartl 20c1fbaf0f Fixing a merge (turning off downsampling on DoC) 2012-03-06 14:22:45 -05:00
Ryan Poplin 46b470cc69 Minor misc updates 2012-03-06 10:14:45 -05:00
David Roazen 0702ee1587 Public-key authorization scheme to restrict use of NO_ET
-Running the GATK with the -et NO_ET or -et STDOUT options now
 requires a key issued by us. Our reasons for doing this, and the
 procedure for our users to request keys, are documented here:
 http://www.broadinstitute.org/gsa/wiki/index.php/Phone_home

-A GATK user key is an email address plus a cryptographic signature
 signed using our private key, all wrapped in a GZIP container.
 User keys are validated using the public key we now distribute with
 the GATK. Our private key is kept in a secure location.

-Keys are cryptographically secure in that valid keys definitely
 came from us and keys cannot be fabricated, however keys are not
 "copy-protected" in any way.

-Includes private, standalone utilities to create a new GATK user key
 (GenerateGATKUserKey) and to create a new master public/private key
 pair (GenerateKeyPair). Usage of these tools will be documented on
 the internal wiki shortly.

-Comprehensive unit/integration tests, including tests to ensure the
 continued integrity of the GATK master public/private key pair.

-Generation of new user keys and the new unit/integration tests both
 require access to the GATK private key, which can only be read by
 members of the group "gsagit".
2012-03-06 00:09:43 -05:00
Lechu 027843d791 I've simply added a "library(grid)" call at the beginning of the R script generation since R 2.14.2 doesn't seem to load the "grid" package as default. I haven't tested it on previous R versions (you may edit the R version comment to be more precise if desired), but I'm almost certain that this library call shouldn't do any harm on them.
Signed-off-by: Ryan Poplin <rpoplin@broadinstitute.org>
2012-03-05 21:27:03 -05:00
Ryan Poplin f6905630bb Adding Unit test for Haplotype class. Used in HC's genotype given alleles mode. 2012-03-05 21:08:07 -05:00
Ryan Poplin 9b53250bef Adding Unit test for Haplotype class. Used in HC's genotype given alleles mode. 2012-03-05 21:07:36 -05:00
Ryan Poplin b37461587d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-05 17:54:59 -05:00
Ryan Poplin c6ded4d23c Bug fix for hard clipping reads when base insertion and base deletion qualities are present in the read. Updating HaplotypeCaller integration tests to reflect all the recent changes. 2012-03-05 17:54:42 -05:00
Ryan Poplin 14a77b1e71 Getting rid of redundant methods in MathUtils. Adding unit tests for approximateLog10SumLog10 and normalizeFromLog10. Increasing the precision of the Jacobian approximation used by approximateLog10SumLog which changes the UG+HC integration tests ever so slightly. 2012-03-05 12:28:32 -05:00
Mauricio Carneiro e9ad382e74 unifying the BQSR argument collection 2012-03-05 10:48:26 -05:00
Ryan Poplin f879daa7d0 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-05 08:29:08 -05:00
Ryan Poplin d6871967ae Adding more unit tests and contracts to PairHMM util class. Updating HaplotypeCaller to use the new PairHMM util class. Now that the HMM result isn't dependent on the length of the haplotype there is no reason to ensure all haplotypes have the save length which simplifies the code considerably. 2012-03-05 08:28:42 -05:00
Guillermo del Angel 3b5a7c34d7 Added argument to ValidationAmplicons to only output valid sequences - useful for not having to post-filter or grep resulting files before delivering downstream 2012-03-04 10:24:29 -05:00
Mark DePristo 69611af7d3 Workaround for bug in Picard in ReadGroupProperties
-- NPE caused when you call getRunDate on a read group without a date.
2012-03-02 18:53:45 -05:00
Mark DePristo ba71b0aee4 ReadGroupProperties mk3
-- Includes sequencing date
2012-03-02 16:12:42 -05:00
Eric Banks 1e07e97b58 Optimization: create allele list just once, not for each genotype 2012-03-02 13:30:17 -05:00
Ryan Poplin 0ad7d5fbc1 Standalone common Pair HMM utility class with associated unit tests. 2012-03-01 22:41:13 -05:00
Mark DePristo 2f334a57c2 ReadGroupProperties mk2
-- Includes paired end status (T/F)
-- Includes count of reads used in calculation
-- Includes simple read type (2x76 for example)
-- Better handling of insert size, read length when there's no data, or the data isn't paired end by emitting NA not 0
2012-03-01 18:43:53 -05:00
Mauricio Carneiro 486712bfc2 ugly RG encoding 2012-03-01 17:56:45 -05:00
Mauricio Carneiro 29f74b658b Unit tests for the context covariate
this is simple, but it's the infra-structure to start messing around with the context.
2012-03-01 17:56:45 -05:00
Mark DePristo aff508e091 ReadGroupProperties walker and associated infrastructure
-- ReadGroupProperties: Emits a GATKReport containing read group, sample, library, platform, center, median insert size and median read length for each read group in every BAM file.
-- Median tool that collects up to a given maximum number of elements and returns the median of the elements.
-- Unit and integration tests for everything.
-- Making name of TestProvider protected so subclasses and override name more easily
2012-03-01 15:01:11 -05:00
Mauricio Carneiro 9e95b10789 Context covariate now operates as a highly compressed bitset
* All contexts with 'N' bases are now collapsed as uninformative
   * Context size is now represented internally as a BitSet but output as a dna string
   * Temporarily disabled sorted outputs because of null objects
2012-02-29 19:25:21 -05:00
Mauricio Carneiro d379c3763a DNA Sequence to BitSet and vice-versa conversion tools
* Turns DNA sequences (for context covariates) into bit sets for maximum compression
  * Allows variable context size representation guaranteeing uniqueness.
  * Works with long precision, so it is limited to a context size of 31 bases (can be extended with BigNumber precision if necessary).
  * Unit Tests added
2012-02-29 19:25:20 -05:00
Eric Banks 129b5e7f6b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-28 10:09:34 -05:00
Eric Banks a4a279ce80 Damn you, Mark 2012-02-28 10:09:09 -05:00
Khalid Shakir 0681bea5a5 Changed DoC from PartitionType.INTERVAL to PartitionType.NONE since it doesn't have a way to gather scattered outputs.
Added MultiallelicSummary to HSP eval.
2012-02-28 09:27:27 -05:00
Eric Banks bd398e30fd Another quick optimization 2012-02-28 09:25:35 -05:00
Eric Banks 40bdadbda5 Minor optimization as per Mark 2012-02-28 09:24:07 -05:00
Eric Banks d7928ad669 Drat, missed one: handle null alleles being passed in. 2012-02-27 21:31:54 -05:00
Mark DePristo 24356f11b7 Merged bug fix from Stable into Unstable
-- Resolved conflict

Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/datasources/reads/SAMDataSource.java
2012-02-27 17:13:17 -05:00
Mark DePristo 0b29d54937 Changed most BAMSchedule ReviewedStingExceptions to UserExceptions
-- As these represent the bulk of the StingExceptions coming from BAMSchedule and are caused by simple problems like the user providing bad input tmp directories, etc.
2012-02-27 17:08:41 -05:00
Mark DePristo f9e8e82e33 Removed unused class variable from VCFHeaderLineTranslator 2012-02-27 17:07:19 -05:00
Mark DePristo 100ddef930 Fix typo in VariantContextBuilder 2012-02-27 17:06:45 -05:00
Mark DePristo ca0931c01f Adding test for reading samtools VCF file 2012-02-27 17:05:50 -05:00
Eric Banks bd944ab04f Another test where we no longer print out 'NaN' for the AF. 2012-02-27 15:19:08 -05:00
Mark DePristo 5f7ccdcc01 Avoid calling getBasePileup when there's no pileup in NBaseCount annotation 2012-02-27 15:12:25 -05:00
Eric Banks 52871187d7 Adding integration test for file with no GTs. Also updated md5 for one other test (since we no longer print out 'NaN' for the AF). 2012-02-27 15:09:56 -05:00
Mark DePristo 729bb954e2 Throws ReviewedStingException for a bug when parent VariantContext argument is null 2012-02-27 15:09:00 -05:00
Eric Banks 998ed8fff3 Bug fix to deal with VCF records that don't have GTs. While in there, optimized a bunch of related functions (including removing a copy of the method calculateChromosomeCounts(); why did we have 2 copies? very dangerous). 2012-02-27 14:56:10 -05:00
Mark DePristo 4d9582de77 More general catching of Exceptions in interval reading to throw MalformedFile exception in all cases
-- Now throws UserException no matter what happens during the reading of the intervals file.
2012-02-27 14:02:26 -05:00
Mark DePristo 9712fed7a5 Trap SAMFormatException and rethrow as MalformatedBAM exception
-- Trap errors in header and rethrow
-- Wrap underlying iterator in MalformatedBAMErrorReformattingIterator
2012-02-27 13:52:50 -05:00
Eric Banks 1ea34058c2 Updating integration tests now that standard annotations support multiple alleles 2012-02-27 11:32:26 -05:00
Eric Banks 64754e7870 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-27 11:31:41 -05:00
Eric Banks 850c5d0db2 Enabling Rank Sum Tests for multi-allelics: use ref vs any alt allele. 2012-02-27 09:59:36 -05:00
Eric Banks dfdf4f989b Enabling Fisher Strand for multi-allelics: use the alt allele with max AC. Added minor optimization to the method in the VC. 2012-02-27 09:50:09 -05:00
Guillermo del Angel 16122bea8d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-25 13:57:54 -05:00
Guillermo del Angel dea35943d1 a) Bug fix in calling new functions that give indel bases and length from regular pileup in LocusIteratorByState, b) Added unit test to cover these. 2012-02-25 13:57:28 -05:00
Mark DePristo c8a06e53c1 DoC now properly handles reference N bases + misc. additional cleanups
-- DoC now by default ignores bases with reference Ns, so these are not included in the coverage calculations at any stage.
-- Added option --includeRefNSites that will include them in the calculation
-- Added integration tests that ensures the per base tables (and so all subsequent calculations) work with and without reference N bases included
-- Reorganized command line options, tagging advanced options with @Advanced
2012-02-25 11:32:50 -05:00
Mark DePristo 50de1a3eab Fixing bad VCFIntegration tests
-- Left disabled a test that should have been enabled
-- Didn't add the md5 to the test I actually added
-- Now VCFIntegrationTests should be working!
2012-02-25 11:26:36 -05:00
Guillermo del Angel c9a4c74f7a a) Bug fixes for last commit related to PileupElements (unit tests are forthcoming). b) Changes needed to make pool caller work in GENOTYPE_GIVEN_ALLELES mode c) Bug fix (yet again) for UG when GENOTYPE_GIVEN_ALLELES and EMIT_ALL_SITES are on, when there's no coverage at site and when input vcf has genotypes: output vcf would still inherit genotypes from input vcf. Now, we just build vc from scratch instead of initializing from input vc. We just take location and alleles from vc 2012-02-24 10:27:59 -05:00
Mauricio Carneiro ee9a56ad27 Fix subtle bug in the ReduceReads stash reported by Adam
* The tailSet generated every time we flush the reads stash is still being affected by subsequent clears because it is just a pointer to the parent element in the original TreeSet. This is dangerous, and there is a weird  condition where the clear will affects it.
   * Fix by creating a new set, given the tailSet instead of trying to do magic with just the pointer.
2012-02-23 18:35:25 -05:00
Mark DePristo e0c189909f Added support for breakpoint alleles
-- See https://getsatisfaction.com/gsa/topics/support_vcf_4_1_structural_variation_breakend_alleles?utm_content=topic_link&utm_medium=email&utm_source=new_topic
-- Added integrationtest to ensure that we can parse and write out breakpoint example
2012-02-23 12:14:48 -05:00
Guillermo del Angel 6866a41914 Added functionality in pileups to not only determine whether there's an insertion or deletion following the current position, but to also get the indel length and involved bases - definitely needed for extended event removal, and needed for pool caller indel functionality. 2012-02-23 09:45:47 -05:00
Eric Banks d34f07dba0 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-22 20:41:03 -05:00
Ryan Poplin 2b6c0939ab Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-22 19:00:38 -05:00
Ryan Poplin 8695738400 Bug fix in HaplotypeCaller's GENOTYPE_GIVEN_ALLELES mode for insertions greater than length 1. The allele being genotyped was off by one base pair. 2012-02-22 19:00:04 -05:00
Christopher Hartl 2c1b14d35e Mostly small changes to my own scala scripts: .vcf.gz compatibility for output files, smarter beagle generation, simple script to scatter-gather combine variants. Whole genome indel calling now uses the gold standard indel set. 2012-02-22 17:20:04 -05:00
Mauricio Carneiro 75783af6fc int <-> BitSet conversion utils for MathUtils
* added unit tests.
2012-02-21 14:10:36 -05:00
Guillermo del Angel 0f5674b95e Redid fix for corner case when forming consensus with reads that start/end with insertions and that don't agree with each other in inserted bases: since I can't iterate over the elements of a HashMap because keys might change during iteration, and since I can't use ConcurrentHashMaps, the code now copies structure of (bases, number of times seen) into ArrayList, which can be addressed by element index in order to iterate on it. 2012-02-20 09:12:51 -05:00
Ryan Poplin 3d9eee4942 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-18 10:55:29 -05:00
Ryan Poplin a8be96f63d This caching in the BQSR seems to be too slow now that there are so many keys 2012-02-18 10:54:39 -05:00
Ryan Poplin 78718b8d6a Adding Genotype Given Alleles mode to the HaplotypeCaller. It constructs the possible haplotypes via assembly and then injects the desired allele to be genotyped. 2012-02-18 10:31:26 -05:00
Guillermo del Angel e724c63f2b Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a kep from a HashMap that's being iterated on 2012-02-17 17:18:43 -05:00
Guillermo del Angel f2ef8d1d23 Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a kep from a HashMap that's being iterated on 2012-02-17 17:15:53 -05:00
Guillermo del Angel 3e031a540f Solve merge conflict 2012-02-17 10:56:03 -05:00
Guillermo del Angel cd352f502d Corner case bug fix: if a read starts with an insertion, when computing the consensus allele for calling the insertion was only added to the last element in the consensus key hash map. Now, an insertion that partially overlaps with several candidate alleles will have their respective count increased for all of them 2012-02-17 10:21:37 -05:00
Eric Banks 2f33c57060 No reason to restrict HaplotypeScore to bi-allelic SNPs when the plumbing for multi-allelic events is already present. 2012-02-16 13:58:00 -05:00
Guillermo del Angel 2f08846d82 Merged bug fix from Stable into Unstable 2012-02-14 21:26:25 -05:00
Guillermo del Angel 7dc6f73399 Bug fix for validation site selector: records with AC=0 in them were always being thrown out if input vcf was sites-only, even when -ignorePolymorphicStatus flag was set 2012-02-14 21:11:24 -05:00
Ryan Poplin 30085781cf Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-14 14:01:20 -05:00
Ryan Poplin ae5b42c884 Put base insertion and base deletions in the SAMRecord as a string of quality scores instead of an array of bytes. Start of a proper genotype given alleles mode in HaplotypeCaller 2012-02-14 14:01:04 -05:00
David Roazen 85d31f80a2 Merged bug fix from Stable into Unstable 2012-02-13 16:37:11 -05:00
David Roazen 03e5184741 Fix serious engine bug that could cause reads to be dropped under certain circumstances
When aggregating raw BAM file spans into shards, the IntervalSharder tries to combine
file spans when it can. Unfortunately, the method that combines two BAM file
spans was seriously flawed, and would produce a truncated union if the file spans
overlapped in certain ways. This could cause entire regions of the BAM file containing
reads within the requested intervals to be dropped.

Modified GATKBAMFileSpan.union() to correct this problem, and added unit tests
to verify that the correct union is produced regardless of how the file spans
happen to overlap.

Thanks to Khalid, who did at least as much work on this bug as I did.
2012-02-13 16:25:21 -05:00
Eric Banks ad90af94ed Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-13 15:10:10 -05:00
Eric Banks 0920a1921e Minor fixes to splitting multi-allelic records (as regards printing indel alleles correctly); minor code refactoring; adding integration tests to cover +/- splitting multi-allelics. 2012-02-13 15:09:53 -05:00
Eric Banks 14981bed10 Cleaning up VariantsToTable: added docs for supported fields; removed one-off hidden arguments for multi-allelics; default behavior is now to include multi-allelics in one record; added option to split multi-allelics into separate records. 2012-02-13 14:32:03 -05:00
Ryan Poplin e9338e2c20 Context covariate needs to look in the reverse direction for negative stranded reads. 2012-02-13 13:40:41 -05:00
Ryan Poplin 41ffd08d53 On the fly base quality score recalibration now happens up front in a SAMIterator on input instead of in a lazy-loading fashion if the BQSR table is provided as an engine argument. On the fly recalibration is now completely hooked up and live. 2012-02-13 12:35:09 -05:00
Ryan Poplin 3caa1b83bb Updating HC integration tests 2012-02-11 11:48:32 -05:00
Ryan Poplin 9b8fd4c2ff Updating the half of the code that makes use of the recalibration information to work with the new refactoring of the bqsr. Reverting the covariate interface change in the original bqsr because the error model enum was moved to a different class and didn't make sense any more. 2012-02-11 10:57:20 -05:00
Eric Banks f52f1f659f Multiallelic implementation of the TDT should be a pairwise list of values as per Mark Daly. Integration tests change because the count in the header is now A instead of 1. 2012-02-10 14:15:59 -05:00
Mauricio Carneiro 1fb19a0f98 Moving the covariates and shared functionality to public
so Ryan can work on the recalibration on the fly without breaking the build. Supposedly all the secret sauce is in the BQSR walker, which sits in private.
2012-02-10 11:44:01 -05:00
Eric Banks 5e18020a5f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-10 11:08:33 -05:00
Eric Banks f53cd3de1b Based on Ryan's suggestion, there's a new contract for genotyping multiple alleles. Now the requester submits alleles in any arbitrary order - rankings aren't needed. If the Exact model decides that it needs to subset the alleles because too many were requested, it does so based on PL mass (in other words, I moved this code from the SNPGenotypeLikelihoodsCalculationModel to the Exact model). Now subsetting alleles is consistent. 2012-02-10 11:07:32 -05:00
Mauricio Carneiro 5af373a3a1 BQSR with indels integrated!
* added support to base before deletion in the pileup
   * refactored covariates to operate on mismatches, insertions and deletions at the same time
   * all code is in private so original BQSR is still working as usual in public
   * outputs a molten CSV with mismatches, insertions and deletions, time to play!
   * barely tested, passes my very simple tests... haven't tested edge cases.
2012-02-09 18:46:45 -05:00
Eric Banks 7a937dd1eb Several bug fixes to new genotyping strategy. Update integration tests for multi-allelic indels accordingly. 2012-02-09 16:14:22 -05:00
Eric Banks 0f728a0604 The Exact model now subsets the VC to the first N alleles when the VC contains more than the maximum number of alleles (instead of throwing it out completely as it did previously). [Perhaps the culling should be done by the UG engine? But theoretically the Exact model can be called outside of the UG and we'd still want the context subsetted.] 2012-02-09 14:02:34 -05:00
Matt Hanna aa097a83d5 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-09 11:26:48 -05:00
Matt Hanna b57d4250bf Documentation request by Eric. At each stage of the GATK where filtering occurs, added documentation suggesting the goal of the filtering along with examples of suggested inputs and outputs. 2012-02-09 11:24:52 -05:00
Mauricio Carneiro d561914d4f Revert "First implementation of GATKReportGatherer"
premature push from my part. Roger is still working on the new format and we need to update the other tools to operate correctly with the new GATKReport.

This reverts commit aea0de314220810c2666055dc75f04f9010436ad.
2012-02-08 23:28:55 -05:00
Eric Banks 2f800b078c Changes to default behavior of UG: multi-allelic mode is always on; max number of alternate alleles to genotype is 3; alleles in the SNP model are ranked by their likelihood sum (Guillermo will do this for indels); SB is computed again. 2012-02-08 15:27:16 -05:00
Matt Hanna 51ac87b28c Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-08 08:43:55 -05:00
Matt Hanna 5b58fe741a Retiring Picard customizations for async I/O and cleaning up parts of the code to use common Picard utilities I recently discovered.
Also embedded bug fix for issues reading sparse shards and did some cleanup based on comments during BAM reading code transition meetings.
2012-02-08 08:34:37 -05:00
Mauricio Carneiro 337819e791 disabling the test while we fix it 2012-02-07 19:22:32 -05:00
Roger Zurawicki c0c676590b First implementation of GATKReportGatherer
- Added the GATKReportGatherer
- Added private methods in GATKReport to combine Tables and Reports
- It is very conservative and it will only gather if the table columns, match.
- At the column level it uses the (redundant) row ids to add new rows. It will throw an exception if it is overwriting data.
Added the gatherer functions to CoverageByRG

Also added the scatterCount parameter in the Interval Coverage script
Made some more GATKReport methods public

The UnitTest included shows that the merging methods work
Added a getter for the PrimaryKeyName
Fixed bugs that prevented the gatherer form working

Working GATKReportGatherer
Has only the functional to addLines
The input file parser assumes that the first column is the primary key

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-02-07 18:14:47 -05:00
Mauricio Carneiro e89887cd8e laying groundwork to have insertions and deletions going through the system. 2012-02-07 18:11:53 -05:00
Mauricio Carneiro 0d3ea0401c BQSR Parameter cleanup
* get rid of 320C argument that nobody uses.
   * get rid of DEFAULT_READ_GROUP parameter and functionality (later to become an engine argument).
2012-02-07 14:42:11 -05:00
Eric Banks 717cd4b912 Document -L unmapped 2012-02-07 13:30:54 -05:00
Eric Banks 718da7757e Fixes to ValidateVariants as per GS post: ref base of mixed alleles were sometimes wrong, error print out of bad ACs was throwing a RuntimeException, don't validate ACs if there are no genotypes. 2012-02-07 13:15:58 -05:00
Eric Banks 9d1a19bbaa Multi-allelic indels were not being printed out correctly in VariantsToTable; fixed. 2012-02-06 22:49:29 -05:00
Mauricio Carneiro 5961868a7f fixup for BQSR (HC integration tests)
In the new BQSR implementation, covariates do depend on the RecalibrationArgumentCollection.
2012-02-06 22:47:27 -05:00
Mauricio Carneiro 6e6f0f10e1 BaseQualityScoreRecalibration walker (bqsr v2) first commit includes
* Adding the context covariate standard in both modes (including old CountCovariates) with parameters
   * Updating all covariates and modules to use GATKSAMRecord throughout the code.
   * BQSR now processes indels in the pileup (but doesn't do anything with them yet)
2012-02-06 17:38:29 -05:00
Eric Banks 0717c79901 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-06 16:23:36 -05:00
Eric Banks 91897f5fe7 Transpose rows/cols in AF table to make it molten (so I can plot easily in R) 2012-02-06 16:23:32 -05:00
Guillermo del Angel fb5786385c Merged bug fix from Stable into Unstable 2012-02-06 13:22:56 -05:00
Guillermo del Angel 6ec686b877 Complement to previous commit: make sure we also don't inherit filter from input VCF when genotyping at an empty site 2012-02-06 13:19:26 -05:00
Guillermo del Angel 93ffca1e3a Merged bug fix from Stable into Unstable 2012-02-06 11:58:58 -05:00
Guillermo del Angel 827be878b4 Bug fix when running UG in GenotypeGivenAlleles mode: if an input site to genotype had no coverage, the output VCF had AC,AF and AN inherited from input VCF, which could have nothing to do with given BAM so numbers could be non-sensical. Now new vc has clear attributes instead of attributes inherited from input VCF. 2012-02-06 11:58:13 -05:00
Eric Banks fbbd04621d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-06 11:53:31 -05:00
Eric Banks edb4edc08f Commented out unused metrics for now 2012-02-06 11:53:15 -05:00
Ryan Poplin 096c23a473 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-06 11:10:38 -05:00
Ryan Poplin dc05b71e39 Updating Covariate interface with Mauricio to include an errorModel parameter. On the fly recalibration of base insertion and base deletion quals is live for the HaplotypeCaller 2012-02-06 11:10:24 -05:00
Guillermo del Angel 1e11408f8b Merged bug fix from Stable into Unstable 2012-02-06 10:34:26 -05:00
Guillermo del Angel 090d87b48b Bug fix in ValidationSiteSelector: when input vcf had genotypes and was multiallelic, the parsing of the AF/AC fields was wrong. Better logic to unify parsing of field 2012-02-06 10:33:12 -05:00
Eric Banks 9d94f310f1 Break AF histogram into max and min AFs 2012-02-06 09:01:19 -05:00
Ryan Poplin b7ffd144e8 Cleaning up the covariate classes and removing unused code from the bqsr optimizations in 2009. 2012-02-06 08:54:42 -05:00
Eric Banks cef550903e Minor optimization 2012-02-06 00:48:00 -05:00
Ryan Poplin 5343f8ba67 Initial version of on-the-fly, lazy loading base quality score recalibration. It isn't completely hooked up yet but I'm committing so Mauricio and Mark can see how I envision it will fit together. Look it over and give any feedback. With the exception of the Solid specific code we are very very close to being able to remove TableRecalibrationWalker from the code base and just replace it with PrintReads -BQSR recal.csv 2012-02-05 13:09:03 -05:00
Ryan Poplin f94d547e97 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-03 17:14:20 -05:00
Ryan Poplin 894d3340be Active Region Traversal should use GATKSAMRecords everywhere instead of SAMRecords. misc cleanup. 2012-02-03 17:13:52 -05:00
Mauricio Carneiro 4a57add6d0 First implementation of DiagnoseTargets
* calculates and interprets the coverage of a given interval track
   * allows to expand intervals by specified number of bases
   * classifies targets as CALLABLE, LOW_COVERAGE, EXCESSIVE_COVERAGE and POOR_QUALITY.
   * outputs text file for now (testing purposes only), soon to be VCF.
   * filters are overly aggressive for now.
2012-02-03 17:12:43 -05:00
Mauricio Carneiro 3dd6a1f962 Adding some generic sum and average functions to MathUtils 2012-02-03 17:12:43 -05:00
Mauricio Carneiro e1d69e4060 make the size of a GenomeLoc int instead of long
it will never be bigger than an int and it's actually useful to be an int so we can use it as parameters to array/list/hash size creation.
2012-02-03 17:12:42 -05:00
Ryan Poplin 0e44430e47 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-03 13:45:11 -05:00
Christopher Hartl aa3638ecb3 Merge branch 'master' of ssh://chartl@ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-03 13:42:09 -05:00
Eric Banks 3abfbcbcf2 Generalized the TDT for multi-allelic events 2012-02-03 12:23:21 -05:00
Ryan Poplin 601e53d633 Fix when specifying preset active regions with -AR argument 2012-02-02 16:34:26 -05:00
Christopher Hartl 0111505ea9 Terrible. Swapping the paternal and sample ids. 2012-02-02 11:41:16 -05:00
Ryan Poplin 1f50f6970b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-02 10:17:13 -05:00
Ryan Poplin 4ed06801a7 Updating HaplotypeCaller's HMM calc to use GOP as a function of the read instead of a function of the haplotype in preparation for IQSR 2012-02-02 10:17:04 -05:00
Matt Hanna 8adfc79123 Merged bug fix from Stable into Unstable 2012-02-01 16:07:41 -05:00
Matt Hanna 30b937d2af Fix bug discovered in FGTP branch in which BlockInputStream returns -1 in cases where some data could be read,
but not all the data requested by the caller.
2012-02-01 16:06:22 -05:00
Mauricio Carneiro 45da892ecc Better exceptions to catch malformed reads
* throw exceptions in LocusIteratorByState when hitting reads starting or ending with deletions
2012-02-01 11:56:19 -05:00
Christopher Hartl 810996cfca Introducing: VariantsToPed, the world's most annoying walker! And also a busted QScript to run it that I need Khalid's help debugging ( frownie face ). Note that VariantsToPed and PlinkSeq generate the same binary file (up to strand flips...thanks PlinkSeq), so I know it's working properly. Hooray! 2012-02-01 10:39:03 -05:00
Christopher Hartl 25d943f706 Merge branch 'master' of ssh://chartl@ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-01 10:32:11 -05:00
Ryan Poplin 056b24ccd6 Resolving merge conflicts with LocusIteratorByState 2012-01-31 16:13:32 -05:00
Ryan Poplin febc634557 Changing PileupElement's isSoftClipped to isNextToSoftClip since soft clipped bases aren't actually added to pileups, oops. Removing the intrinsic clustered variants filter from the HaplotypeCaller 2012-01-31 16:06:14 -05:00
Matt Hanna 7f70612beb Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-31 11:59:25 -05:00
Matt Hanna a630db1703 Oops...HierarchicalMicroScheduler was transforming any exception from the walker level into a ReviewedStingException.
Thanks to Ryan for pointing this out.
2012-01-31 11:58:21 -05:00
Christopher Hartl faba3dd530 Merge branch 'master' of ssh://chartl@ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-31 10:25:29 -05:00
Mauricio Carneiro 17dbe9a95d A few cleanups in the LocusIteratorByState
* No more N's in the extended event pileups
   * Only add to the pileup MQ0 counter if the read actually goes into the pileup
2012-01-31 09:40:51 -05:00
Ryan Poplin f9162ea705 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-30 19:45:19 -05:00
Ryan Poplin abb91cf26b Increasing the size of the active regions that are produced by the active probability integrator, more context is needed to call more complex events 2012-01-30 15:36:12 -05:00
Mauricio Carneiro d5d4fa8a88 Fixed discordance bug reported by Brad Chapman
discordance now reports discordance between genotypes as well (just like concordance)
2012-01-30 09:50:45 -05:00
Mark DePristo 3164c8dee5 S3 upload now directly creates the XML report in memory and puts that in S3
-- This is a partial fix for the problem with uploading S3 logs reported by Mauricio.  There the problem is that the java.io.tmpdir is not accessible (network just hangs).  Because of that the s3 upload fails because the underlying system uses tmpdir for caching, etc.  As far as I can tell there's no way around this bug -- you cannot overload the java.io.tmpdir programmatically and even if I could what value would we use?  The only solution seems to me is to detect that tmpdir is hanging (how?!) and fail with a meaningful error.
2012-01-29 15:14:58 -05:00
Menachem Fromer 0e17cbbce9 Merged bug fix from Stable into Unstable 2012-01-27 16:03:16 -05:00
Menachem Fromer a9671b73ca Fix to permit proper handling of mapping qualities between 128 to 255 (which get converted to byte values of -128 to -1) 2012-01-27 16:01:30 -05:00
Ryan Poplin f7ac1f4a69 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-27 15:12:55 -05:00