1000 Commits (c50274e02e8bd381a2f7995a77b22f0ac8e0b00b)

Author SHA1 Message Date
David Roazen cdde32acbd Merged bug fix from Stable into Unstable 2011-10-31 14:21:15 -04:00
Eric Banks f62af0291b Check for invalid VCF records (not enough tokens) instead of assuming they are there. 2011-10-31 14:09:51 -04:00
Andrey Sivachenko bed0acaed4 nWayOut now adds PG tag to the header as it should. Also, additional hidden option added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them 2011-10-31 12:28:28 -04:00
Mauricio Carneiro 389380a590 ReduceReads ref bases are now output as '=' to save space
Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.
2011-10-30 12:04:39 -04:00
Eric Banks 0ca7428e76 Allow processing of empty intervals, but warn user when this case is encountered. 2011-10-28 12:12:14 -04:00
Eric Banks 649dfe98f0 Add VCF header for any expressions that are requested 2011-10-28 10:22:19 -04:00
Eric Banks 057a79f598 This argument should be annotated as @Input 2011-10-28 09:44:49 -04:00
Eric Banks 4ba7c0cecd Moving to private 2011-10-28 09:29:28 -04:00
Eric Banks 1bdd76c2f2 These tools now use the IntervalBinding system to handle intervals instead of doing it all manually 2011-10-28 09:28:12 -04:00
Eric Banks 6ba08a103d Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory. 2011-10-28 09:23:25 -04:00
Eric Banks 3d04bb5608 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-27 23:55:18 -04:00
Eric Banks 19e27d4568 Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative. 2011-10-27 23:55:11 -04:00
Eric Banks cafc245a43 For some reason, a class of Codecs (including TableCodec) require that a GenomeLocParser be passed in to do the position processing. Why can't they just return a Feature with chr, start, stop? Isn't that the right thing? 2011-10-27 23:54:28 -04:00
Guillermo del Angel cbc43683ee Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-27 20:54:18 -04:00
Guillermo del Angel 8907e42007 First fully functional implementation of ValidationSiteSelectorWalker. User gives a) a set of input variants, b) a desired number of output variants, c) optionally, a set of samples which will restrict sites to be polymorphic in those samples, d) a frequency selection mode: either uniform (no AF matching), or matching AF so that output sites mirror the input AF spectrum as closely as possible.
More testing is needed and docs need improving but so far all functionality seems up and running
2011-10-27 20:53:48 -04:00
Eric Banks ccfd853b34 Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed. 2011-10-27 20:43:50 -04:00
Eric Banks c2f343773e Oops, working too quickly last time. This is the proper fix for the potential NPE in the equals() test. 2011-10-27 15:32:08 -04:00
Khalid Shakir b80d407dc7 No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks 8c4dbce6d8 Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing. 2011-10-27 13:58:19 -04:00
Eric Banks 4a7e6fee3f Remove support for BED file interval parsing in the GATK; it should all go through Tribble now. IndelRealigner no longer supports unordered interval input (which shouldn't have been used anyways). Temporarily commenting out serialization of arguments so that tests pass; this whole piece will be deleted soon anyways. 2011-10-27 13:38:08 -04:00
Matt Hanna f7df8bdecc Merged bug fix from Stable into Unstable 2011-10-27 11:31:17 -04:00
Matt Hanna 41ddc7bce7 Make sure we output a full stack trace when we encounter Tribble error messages on VCF header merge. 2011-10-27 11:30:04 -04:00
Eric Banks 44f905b5e5 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-26 23:31:11 -04:00
Eric Banks 68283b1651 Fixing docs and adding GATKdocs for the new interval functionality 2011-10-26 22:14:43 -04:00
Mark DePristo c9978316a3 Merge branch 'FragmentUtils' 2011-10-26 19:51:49 -04:00
Mauricio Carneiro add9ad97ec No scatter gather for VQSR or ApplyVQSR.
These walkers should not be scatter gatherable. Annotating them accordingly so that Queue doesn't allow a less than knowledgeable user to try and scatter/gather VQSR.
2011-10-26 16:35:44 -04:00
Ryan Poplin 74aeb22eeb Merged bug fix from Stable into Unstable 2011-10-26 15:57:30 -04:00
Ryan Poplin 86871bd1e3 Throw a UserException in the BQSR when there is no data instead of creating an empty csv file 2011-10-26 15:56:41 -04:00
Mark DePristo 034a997d07 Generalized Reads -> Fragment calculation
-- Supports ReadBackedPileup -> FragmentCollection as before
-- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller
-- General cleanup, renaming, move to separate package, more extensive unit tests, etc.
-- Added toFragment() function to ReadBackedPileup interface
2011-10-26 15:54:38 -04:00
Eric Banks 2f21b6ecfb Removed debugging output 2011-10-26 15:50:20 -04:00
Eric Banks b39fcb1bea Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-26 15:44:25 -04:00
Eric Banks b6ce6ed3f8 Go around the ROD system for now so that we can just call decodeLoc() for efficiency. Noted that we should go through the ROD system once it gets cleaned up. This means that currently gzipped files are not supported with -L. 2011-10-26 15:42:53 -04:00
Eric Banks 9424e8b2ca Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed. 2011-10-26 14:11:49 -04:00
Mark DePristo 7fa943aef1 Renamed FragmentPileup to FragmentUtils 2011-10-26 14:01:45 -04:00
Laurent Francioli 1f044faedd - Genotype assignment in case of equally likely combinations is now random
- Genotype combinations with 0 confidence are now left unphased
2011-10-26 19:57:09 +02:00
Laurent Francioli 81b163ff4d Indentation 2011-10-26 14:49:12 +02:00
Laurent Francioli 62cff266d4 GQ calculation corrected for most likely genotype 2011-10-26 14:40:04 +02:00
Mark DePristo af3613cc5f GATKSAMRecord commit branch summary
First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory.  What's the best way to do this?  Rebase?

Now, on to the changes here:

-- Picard added a SamRecordFactory that is used to create instances of subclasses of SamRecord or BAMRecord.  This factory allows us to have low-level picard readers (SamFileReader) create objects of type GATKSamRecord.  The abomination of the extends and contains GATKSamRecord is now gone.  GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord.  This results in up to a 2x performance improvement in writing BAMs and a ~10% improvement when reading BAM files.

-- As a consequence of this, we no longer officially support SAM records.  Attempting to create SAMRecord objects with the factory will throw a user exception.

-- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value.  The real BQSR (not the copy indel version) got the efficient code to use this.  Please add all future platforms to this enum.

-- GATKSamRecord no longer supports using the OQ or defaultBaseQuality.  This is performed in a wrapper iterator that's only added when these command line options are used.

-- ReducedRead code has been moved from ReadUtils into efficient caching accessors in GATKSamRecord.

-- ArtificialSamUtils creates GATKSamRecords now, not just SAMRecords.  Added code here to create artificial pairs and used that code to create artificial ReadBackedPileups with specific properties

-- New smarter algorithm for FragmentPileup.  This new code is up to 3x faster than the previous version, and is lazy so is more efficient when no overlapping pairs are actually in the pileup.  Created extensive DataProvider driven UnitTest.  Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms.  TODO still remains to make an efficient version that works for non-pileups for the HaplotypeCaller
2011-10-25 20:52:56 -04:00
Mark DePristo 2822f0dc27 Merge branch 'SamRecordFactory' 2011-10-25 20:34:47 -04:00
Mark DePristo 1b722c21cf merge master 2011-10-25 16:08:39 -04:00
Ryan Poplin 56fdf0b865 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-25 15:58:56 -04:00
Ryan Poplin 4a34c1862e misc cleanup. We now filter out haplotypes when it is obvious that the assembly has failed to find a parsimonious event rather than use haplotypes with large numbers of SNPs and small indels on them. 2011-10-25 15:22:28 -04:00
Guillermo del Angel b559936b7a a) New variant eval stratification module for indel size. b) Next iteration on indel caller runtime optimization: when computing likelihood of each haplotype for a given read, many computations will be redundant since pieces of haplotypes will be common to both REF and ALT haplotypes. So, we keep HMM matrices from one haplotype to the next one and recompute starting at the part where either haplotype is different or GOP/GCP are different. 2011-10-25 09:56:43 -04:00
Khalid Shakir fac9932938 Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
Khalid Shakir 89a581a66f Added ability to specify arguments in files via -args/--arg_file
Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()
2011-10-24 15:58:34 -04:00
Mark DePristo 502592671d Cleanup FragmentPileup before main repo commit
-- removed intermediate functions.  Now only original version and best optimized new version remain
-- Moved general artificial read backed pileup creation code into ArtificialSamUtils
2011-10-24 14:40:05 -04:00
Mark DePristo 166174a551 Google caliper example execution script
-- FragmentPileup with final performance testing
2011-10-24 14:04:53 -04:00
Laurent Francioli 62477a0810 Added documentation and comments 2011-10-24 13:45:21 +02:00
Laurent Francioli 38ebf3141a - Now supports parent/child pairs
- Sites with missing genotypes in pairs/trios are handled as follows:
-- Missing child -> Homozygous parents are phased, no transmission probability is emitted
-- Two individuals missing -> Phase if homozygous, no transmission probability is emitted
-- One parent missing -> Phased / transmission probability emitted
- Mutation prior set as argument
2011-10-24 12:30:04 +02:00
Laurent Francioli 7312e35c71 Now makes use of standard Allele and Genotype classes. This allowed quite some code cleaning. 2011-10-24 10:25:53 +02:00
Laurent Francioli 01b16abc8d Genotype quality calculation modified to handle all genotypes the same way. This is inconsistent with GQ output by the UG but is correct even for cases of poor quality genotypes. 2011-10-24 10:24:41 +02:00
Mark DePristo f6ccac889b Merged bug fix from Stable into Unstable 2011-10-23 16:37:12 -04:00
Mark DePristo 585a45b7a3 Bug fix for ClipReadsWalker when stats output isn't provided
-- See http://getsatisfaction.com/gsa/topics/clipreadswalker?utm_content=topic_link&utm_medium=email&utm_source=reply_notification
2011-10-23 16:36:48 -04:00
Ryan Poplin f5d910b8a5 Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes. 2011-10-23 13:29:08 -04:00
Mark DePristo 42bf9adede Initial version of "fast" FragmentPileup code
-- Uses mayOverlapRoutine in ReadUtils
-- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations
-- PileupElement now comparable (sorts on offset then on start)
-- Caliper microbenchmark to assess performance
2011-10-22 21:36:37 -04:00
Mauricio Carneiro 4913f8a60f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-21 17:45:07 -04:00
Mauricio Carneiro 102dafdcbc Validation of GATKSamRecord in read filters
Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.
2011-10-21 17:40:43 -04:00
Guillermo del Angel f4b409fa0d CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result 2011-10-21 14:07:20 -04:00
Mark DePristo b863390cb1 Moving reduced read functionality into GATKSAMRecord
-- More functions take / produce GATKSAMRecords instead of SAMRecord
2011-10-21 13:28:05 -04:00
Mark DePristo 2403e96062 Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs. 2011-10-21 09:59:24 -04:00
Mark DePristo 110e13bc1e Merge branch 'master' into SamRecordFactory 2011-10-21 09:43:52 -04:00
Mark DePristo be797a8a1f Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system 2011-10-21 09:39:21 -04:00
Mark DePristo ed74ebcfa1 GATKSamRecords with efficient NGSPlatform method 2011-10-21 09:38:41 -04:00
Mark DePristo 94e1898d8f A canonical set of NGS platforms as enums with convenient manipulation methods 2011-10-21 09:37:45 -04:00
Laurent Francioli edea90786a Genotype quality is now recalculated for each of the phased Genotypes. Small problem is that we unnecessarily lose a little precision on the genotypes that do not change after assignment. 2011-10-20 17:04:19 +02:00
Laurent Francioli 1c61a57329 Original rewrite of PhaseByTransmission:
- Adapted to get the trio information from the SampleDB (i.e. from Pedigree file (ped)) => Multiple trios can be passed as argument
- Mendelian violations and trio phasing possibilities are pre-calculated and stored in Maps. => Runtime is ~3x faster
- Genotype combinations possible only given two MVs are now given a squared MV prior (e.g. 0/0+0/0=>1/1 is given 10^-16 prior if the MV prior is 10^-8)
- Corrected bug: In case the best genotype combination is Het/Het/Het, the genotypes are now set appropriately (before original genotypes were left even if they weren't Het/Het/Het)
- Basic reporting added:
-- mvf argument lets the user specify a file to report remaining MVs
-- When the walker ends, some basic stats about the genotype reconfiguration and phasing are output

Known problems:
- GQ is not recalculated even if the genotype changes

Possible improvements:
- Phase partially typed trios
- Use standard Allele/Genotype Classes for the storage of the pre-calculated phase
2011-10-20 13:06:44 +02:00
Laurent Francioli ef6a6fdfe4 Added getAsMap -> returns the likelihoods as an EnumMap with Genotypes as keys and likelihoods as values. 2011-10-20 12:49:18 +02:00
Laurent Francioli 76dd816e70 Added getParents() -> returns an arrayList containing the sample's parent(s) if available 2011-10-20 12:47:27 +02:00
Mark DePristo 999a8998ae Constructor for GATKSamRecord with header only, for unit testing 2011-10-19 17:51:48 -04:00
Mark DePristo bba69701b5 Now creates GATKSamRecords, not SamRecords 2011-10-19 17:49:17 -04:00
Christopher Hartl cd8a6d62bb You know how the wiki has a big section on committing local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that.
This commit contains:
 - IntronLossGenotyper is brought into its current incarnation
 - A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate)
 - RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type.
   + the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there.
 - MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added)
   + use this rather than a hard GQ threshold if you're doing MV analyses.
 - Some miscellaneous QScripts
2011-10-19 17:42:37 -04:00
Mark DePristo 52345f0aec Meaningful documentation string 2011-10-19 15:47:36 -04:00
Mark DePristo 1b38aa1a7e Cleaning up reduced read code accessors 2011-10-19 15:46:44 -04:00
Eric Banks d8d73fe4f2 Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown. 2011-10-19 15:11:13 -04:00
Mark DePristo 7928b287fc GATKSamRecord now produced by SAMFileReaders by default
-- Removed all of the unnecessary caching operations in GATKSAMRecord
-- GATKSAMRecord renamed to GATKSamRecord for consistency
2011-10-19 13:15:27 -04:00
Eric Banks 5a6468c11e Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it. 2011-10-19 11:52:05 -04:00
Eric Banks 48c4a8cb33 Make error messages clearer (even I was confused) 2011-10-19 11:49:16 -04:00
Eric Banks 6cadaa84c9 Just use validate() from super class since it does the same thing 2011-10-19 11:48:23 -04:00
Mark DePristo df3e4e1abd First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader 2011-10-19 11:22:35 -04:00
Mauricio Carneiro c27e2fb676 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-18 15:23:05 -04:00
Mark DePristo f77f2eeb7d Fix for new ID structure 2011-10-18 13:04:43 -04:00
Mark DePristo 1a92ee3593 No longer adds a binding of ID -> . when the ID field is dot in the VCF
-- Really we should make ID a primary key in VariantContext.  Putting it into the attributes is just annoying now
2011-10-18 10:57:02 -04:00
Ryan Poplin e45fcb66eb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-17 15:56:19 -04:00
Ryan Poplin 1e6794c539 fixing typo in VariantsToTable docs 2011-10-17 15:56:02 -04:00
Mark DePristo 0de8550f17 Merged bug fix from Stable into Unstable 2011-10-17 15:29:53 -04:00
Mark DePristo c1329c4dde Fixing a binary to logical or 2011-10-17 15:29:45 -04:00
Mark DePristo 9e4963efc8 Merged bug fix from Stable into Unstable 2011-10-17 15:27:38 -04:00
Mark DePristo ec911ce5bb Even better error messages 2011-10-17 15:27:22 -04:00
Mark DePristo d065bf1715 Merged bug fix from Stable into Unstable 2011-10-17 15:25:47 -04:00
Mark DePristo a7cf9cdc67 Fixing error message typo 2011-10-17 15:25:35 -04:00
Ryan Poplin 589df6b7cf Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-17 14:35:14 -04:00
Ryan Poplin 6b02354d84 Adding a new getter in VariantsToTable to extract the indel event length. 2011-10-17 14:34:52 -04:00
Mark DePristo 3550798c4c Merged bug fix from Stable into Unstable 2011-10-17 13:58:56 -04:00
Mark DePristo 4108a294f7 Better error message when a RodBinding file doesn't exist 2011-10-17 13:58:46 -04:00
Mark DePristo cc76826f78 Merged bug fix from Stable into Unstable 2011-10-17 13:38:11 -04:00
Mark DePristo fd4540cd32 Fixed extraordinarily subtle race condition with contracts invariant
-- all of the methods in the class must be synchronized or the internal state can be inconsistent with the contract invariant when entering the class in a non-synchronized method, even when that method doesn't care about the object's internal state
2011-10-17 13:37:55 -04:00
Mark DePristo 5a881360df Merged bug fix from Stable into Unstable 2011-10-13 15:54:43 -04:00
Mark DePristo 7cab6f6bb0 Bug fixes for thread unsafe simple timer and bad Ns treatment in AlignmentUtils
-- SimpleTimer is now threadsafe using synchronized method keywords
-- Bug fix for alignmentToByteArray() where the N case was refPos++ not the now correct refPos += elementLength
2011-10-13 15:53:12 -04:00
Mauricio Carneiro e12ffb6547 Updating docs for GCContentByInterval
This walker does not take any BAMs. It only walks over the reference.
2011-10-13 13:27:00 -04:00
Eric Banks 9aecd50473 Adding ability to exclude annotations from the VA and UG lists. As described in the docs, this argument trumps all others (including -all) so that we can get around the SnpEff issue brought up by Menachem. Added integration test for it. 2011-10-12 15:44:54 -04:00
Mauricio Carneiro e53a952aeb Added ION Torrent support to CountCovariates. 2011-10-12 01:57:02 -04:00
Mauricio Carneiro a2733a451f Added NotCalled feature to GAV
Added "not called" and "no status" to the truth table. Very useful.
2011-10-11 19:31:45 -04:00
David Roazen ae83420637 Merged bug fix from Stable into Unstable 2011-10-11 12:26:08 -04:00
David Roazen 794f275871 SnpEff is now marked as a RodRequiringAnnotation instead of an ExperimentalAnnotation.
Having SnpEff grouped with the Experimental annotations was proving problematic, since it
requires a rod. Placing it in its own group should improve the situation somewhat, making it
easier to request "all annotations except for SnpEff".
2011-10-11 12:08:56 -04:00
David Roazen cfd0ac8410 Merged bug fix from Stable into Unstable
Conflicts:
	public/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java
2011-10-11 12:03:51 -04:00
David Roazen 24b72334b3 UnifiedGenotyper now correctly initializes the VariantAnnotator engine.
This allows the annotation classes to perform any necessary initialization/validation.
For example, it allows the SnpEff annotator to (among other things) validate its rod binding.
This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding
is present.

Added an integration test to cover this case so that it doesn't break again.
2011-10-11 12:02:05 -04:00
Guillermo del Angel 0429b38021 Merged bug fix from Stable into Unstable 2011-10-11 11:19:38 -04:00
Guillermo del Angel 1c485d8b5e Forgot that no matter how trivial a change it's a good idea to compile first 2011-10-11 11:18:41 -04:00
Guillermo del Angel 6418f4d69b Merged bug fix from Stable into Unstable 2011-10-11 11:13:18 -04:00
Guillermo del Angel 1975de1b32 Second try: hide --do_indel_quality in AnalyzeCovariates 2011-10-11 11:11:29 -04:00
Guillermo del Angel 6506ea83e8 Revert "Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users"... a hidden passenger change made it through.
This reverts commit 70e10ccb1be90dcff8f4485ae6ee036db2d1ac86.
2011-10-11 11:03:12 -04:00
Guillermo del Angel 4c1d8c8d44 Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users 2011-10-11 11:01:06 -04:00
Eric Banks 77c983c5b5 No one claimed this walker and it doesn't have integration tests or GATKdocs so it doesn't belong in public. 2011-10-10 15:17:54 -04:00
Mark DePristo fb72bcf732 DiffObjects no longer prints out the file name in the status so MD5 are stable 2011-10-10 15:10:57 -04:00
Mark DePristo 46e7370128 this.allele, getAlleles(), and getAltAlleles() now return List not set
-- Changes associated code throughout the codebase
-- Updated necessary (but minimal) UnitTests to reflect new behavior
-- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC
2011-10-09 11:45:55 -07:00
Mark DePristo c67f6c076b simpleMerge now preserves allele order
-- UnitTests for dangerous PL merging cases in the multi-allelic case.  The new behavior is correct
2011-10-08 17:39:53 -07:00
Mark DePristo ec14a4a606 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-07 08:38:50 -07:00
Eric Banks ca9cd9b688 Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC. 2011-10-06 22:38:44 -04:00
Mark DePristo c7864c7256 Filter application order is now deterministic, in the order defined by the walker
-- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied.  The order now is

(1) the order of the walker intrinsic filters
(2) read group black list (if provided)
(3) command line filters (if provided)
2011-10-06 18:51:40 -07:00
Mark DePristo 0b88af4af9 Counts of records failing filters are displayed sorted
-- Stops random ordering of the output, as the counts are returned sorted by string name of the class
-- Deleted now unused sh*tty accessors in Utils
2011-10-06 18:42:26 -07:00
Mark DePristo d1e70d6ec2 Removed Nx counting of reads in metrics with -nt > 1 2011-10-06 18:29:26 -07:00
Eric Banks c61804a450 Rename the long version of the argument name to more accurately reflect its purpose. 2011-10-06 16:14:04 -04:00
Eric Banks 61a3dfae24 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 15:58:04 -04:00
Eric Banks 6eb87bf58a RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop. 2011-10-06 15:57:49 -04:00
Eric Banks 1b0735f0a3 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 13:41:45 -04:00
Eric Banks c4dfc1fb8b Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt. 2011-10-06 13:41:36 -04:00
Mark DePristo 73f9d1f217 GATK read group requirement iron hand
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExmpleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo 23845ac798 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 08:17:08 -07:00
Mark DePristo daa5999489 Fixed typo in argument description 2011-10-06 08:16:25 -07:00
Guillermo del Angel 8a474e38ff Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 10:08:39 -04:00
Guillermo del Angel 93f7e632bd Minor fix/enhancement for VariantEval: if a vcf has symbolic alleles, program would crash ungracefully - now we'll just skip record without processing. This is a big issue since we can't process 1000G integration files with code as is. 2011-10-06 10:07:46 -04:00
Mark DePristo 190be4d0d1 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-05 21:27:11 -07:00
Mark DePristo 8e6845806a Allowing empty samples list in LIBS
-- Right now we cannot process BAM files without read groups because we enforce the samples list to not be empty when there's a SAM record.  Now if there are reads and there are no samples we add the "null" sample so that LIBS walks the reads properly
2011-10-05 21:26:21 -07:00
Matt Hanna 180c8f286f Merged bug fix from Stable into Unstable 2011-10-05 20:37:43 -04:00
Matt Hanna 55b9f06527 Ensure that IndelRealigner n-way out option supports MD5 generation. 2011-10-05 20:36:28 -04:00
Mark DePristo be2d29ce69 Final PED documentation 2011-10-05 15:17:41 -07:00
Mark DePristo 3226d5dc0d Merge branch 'master' into ped 2011-10-05 15:03:09 -07:00
Mark DePristo 6a573437af Detailed argument documentation for -ped 2011-10-05 15:00:58 -07:00
Mark DePristo e7c80f7c45 Renaming quantitative trait to OtherPhenotype which is now a String not a double
-- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles
2011-10-05 12:26:33 -07:00
Mark DePristo 51ecc20867 getFamily() and associated methods implemented and tested
-- Sample no longer serializable
-- Sample now implements Comparable
2011-10-05 09:55:05 -07:00
Mark DePristo a45d985818 TODO method stubs 2011-10-04 15:54:09 -07:00
Mark DePristo fee89e47ff Only throws an error when there are no samples but there are reads
-- Handles the case when you are running a ROD traversal and yet the LIBS is still used to return null everywhere.
2011-10-04 06:50:54 -07:00
Mark DePristo f552aede42 Only provide the sample names in the BAM file for efficiency 2011-10-04 06:50:12 -07:00
Mark DePristo a27641e1fc Cleaned up imports 2011-10-04 06:28:36 -07:00
Mark DePristo b20689ff55 No longer supports extraProperties
-- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem
-- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record.  If the two records are inconsistent, an error is thrown
-- addSample() in Sample.class now invokes mergeSample() when appropriate
-- Validation types are now only STRICT or SILENT
-- Validation code implemented in SampleDBBuilder
-- Extensive unit tests for SampleDBBuilder
2011-10-03 19:20:33 -07:00
Mauricio Carneiro 3837aa45b4 Fixing conflicts
Conflicts:
	public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java
2011-10-03 19:07:59 -07:00
Mark DePristo 2e3dc52088 Minor function renaming 2011-10-03 14:41:13 -07:00
Mark DePristo dd71884b0c On path to SampleDB engine integration
-- PedReader tag parser
-- Separation of SampleDBBuilder from SampleDB (now immutable)
-- Removed old sample engine arguments
2011-10-03 12:08:07 -07:00
Eric Banks c3eff7451a Found a small inefficiency while profiling: we were still using String.split instead of ParsingUtils.split to break up array values in the INFO field. There was a noticeable (albeit not big) difference in the change when reading sites only files. 2011-10-03 14:20:39 -04:00
Mark DePristo 8ee0f91904 Remove residual processing tracker arguments 2011-10-03 09:50:01 -07:00