Ryan Poplin
f5d910b8a5
Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes.
2011-10-23 13:29:08 -04:00
Mark DePristo
42bf9adede
Initial version of "fast" FragmentPileup code
...
-- Uses mayOverlapRoutine in ReadUtils
-- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations
-- PileupElement now comparable (sorts on offset, then on start)
-- Caliper microbenchmark to assess performance
2011-10-22 21:36:37 -04:00
Mauricio Carneiro
4913f8a60f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-21 17:45:07 -04:00
Mauricio Carneiro
102dafdcbc
Validation of GATKSamRecord in read filters
...
Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.
2011-10-21 17:40:43 -04:00
Guillermo del Angel
f4b409fa0d
CombineVariants bug fix: when merging records with disparate alleles we were leaving the AC and AF fields intact. As a consequence, we could end up with a record with 3 alt alleles but only 2 values in the AC and AF fields. Now, if the alleles in the combined vc differ from the original, and if the AC and AF fields can't be recomputed from genotypes, we remove these attributes from the vc map since they'd be invalid anyway. Integration test md5 changed since there were several badly merged records in the result.
2011-10-21 14:07:20 -04:00
Mark DePristo
b863390cb1
Moving reduced read functionality into GATKSAMRecord
...
-- More functions take / produce GATKSAMRecords instead of SAMRecord
2011-10-21 13:28:05 -04:00
Mark DePristo
2403e96062
Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs.
2011-10-21 09:59:24 -04:00
Mark DePristo
110e13bc1e
Merge branch 'master' into SamRecordFactory
2011-10-21 09:43:52 -04:00
Mark DePristo
be797a8a1f
Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system
2011-10-21 09:39:21 -04:00
Mark DePristo
ed74ebcfa1
GATKSamRecords with efficient NGSPlatform method
2011-10-21 09:38:41 -04:00
Mark DePristo
94e1898d8f
A canonical set of NGS platforms as enums with convenient manipulation methods
2011-10-21 09:37:45 -04:00
Mark DePristo
999a8998ae
Constructor for GATKSamRecord with header only, for unit testing
2011-10-19 17:51:48 -04:00
Mark DePristo
bba69701b5
Now creates GATKSamRecords, not SamRecords
2011-10-19 17:49:17 -04:00
Christopher Hartl
cd8a6d62bb
You know how the wiki has a big section on committing local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that.
...
This commit contains:
- IntronLossGenotyper is brought into its current incarnation
- A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate)
- RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type.
+ the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there.
- MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added)
+ use this rather than a hard GQ threshold if you're doing MV analyses.
- Some miscellaneous QScripts
2011-10-19 17:42:37 -04:00
Mark DePristo
52345f0aec
Meaningful documentation string
2011-10-19 15:47:36 -04:00
Mark DePristo
1b38aa1a7e
Cleaning up reduced read code accessors
2011-10-19 15:46:44 -04:00
Eric Banks
d8d73fe4f2
Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.
2011-10-19 15:11:13 -04:00
Mark DePristo
7928b287fc
GATKSamRecord now produced by SAMFileReaders by default
...
-- Removed all of the unnecessary caching operations in GATKSAMRecord
-- GATKSAMRecord renamed to GATKSamRecord for consistency
2011-10-19 13:15:27 -04:00
Eric Banks
5a6468c11e
Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.
2011-10-19 11:52:05 -04:00
Eric Banks
48c4a8cb33
Make error messages clearer (even I was confused)
2011-10-19 11:49:16 -04:00
Eric Banks
6cadaa84c9
Just use validate() from super class since it does the same thing
2011-10-19 11:48:23 -04:00
Mark DePristo
df3e4e1abd
First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader
2011-10-19 11:22:35 -04:00
Mauricio Carneiro
c27e2fb676
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-18 15:23:05 -04:00
Mark DePristo
f77f2eeb7d
Fix for new ID structure
2011-10-18 13:04:43 -04:00
Mark DePristo
1a92ee3593
No longer adds a binding of ID -> . when the ID field is dot in the VCF
...
-- Really we should make ID a primary key in VariantContext. Putting it into the attributes is just annoying now
2011-10-18 10:57:02 -04:00
Ryan Poplin
e45fcb66eb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 15:56:19 -04:00
Ryan Poplin
1e6794c539
fixing typo in VariantsToTable docs
2011-10-17 15:56:02 -04:00
Mark DePristo
0de8550f17
Merged bug fix from Stable into Unstable
2011-10-17 15:29:53 -04:00
Mark DePristo
c1329c4dde
Fixing a binary to logical or
2011-10-17 15:29:45 -04:00
Mark DePristo
9e4963efc8
Merged bug fix from Stable into Unstable
2011-10-17 15:27:38 -04:00
Mark DePristo
ec911ce5bb
Even better error messages
2011-10-17 15:27:22 -04:00
Mark DePristo
d065bf1715
Merged bug fix from Stable into Unstable
2011-10-17 15:25:47 -04:00
Mark DePristo
a7cf9cdc67
Fixing error message typo
2011-10-17 15:25:35 -04:00
Ryan Poplin
589df6b7cf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 14:35:14 -04:00
Ryan Poplin
6b02354d84
Adding a new getter in VariantsToTable to extract the indel event length.
2011-10-17 14:34:52 -04:00
Mark DePristo
3550798c4c
Merged bug fix from Stable into Unstable
2011-10-17 13:58:56 -04:00
Mark DePristo
4108a294f7
Better error message when a RodBinding file doesn't exist
2011-10-17 13:58:46 -04:00
Mark DePristo
cc76826f78
Merged bug fix from Stable into Unstable
2011-10-17 13:38:11 -04:00
Mark DePristo
fd4540cd32
Fixed extraordinarily subtle race condition with contracts invariant
...
-- all of the methods in the class must be synchronized or the internal state can be inconsistent with the contract invariant when entering the class in a non-synchronized method, even when that method doesn't care about the object's internal state
2011-10-17 13:37:55 -04:00
Mark DePristo
5a881360df
Merged bug fix from Stable into Unstable
2011-10-13 15:54:43 -04:00
Mark DePristo
7cab6f6bb0
Bug fixes for thread unsafe simple timer and bad Ns treatment in AlignmentUtils
...
-- SimpleTimer is now threadsafe using synchronized method keywords
-- Bug fix for alignmentToByteArray() where the N case was refPos++ not the now correct refPos += elementLength
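
The N-case fix above can be illustrated with a minimal sketch. This is not the AlignmentUtils code: the class `RefPosSketch`, the method `referenceEnd`, and the parallel-array CIGAR representation are all hypothetical, chosen only to show why a skipped region (N) has to advance the reference position by the full element length rather than by one.

```java
// Hypothetical sketch; not the actual alignmentToByteArray() implementation.
final class RefPosSketch {
    /**
     * Walks a CIGAR given as parallel arrays of operators and lengths and
     * returns the reference position reached after the last element.
     */
    static int referenceEnd(int start, char[] ops, int[] lengths) {
        int refPos = start;
        for (int i = 0; i < ops.length; i++) {
            switch (ops[i]) {
                case 'M': case '=': case 'X': case 'D': case 'N':
                    // These operators consume reference bases. For N the buggy
                    // form was refPos++, which moves one base however long the
                    // skipped region is.
                    refPos += lengths[i];
                    break;
                default:
                    // I, S, H and P consume no reference bases.
                    break;
            }
        }
        return refPos;
    }

    public static void main(String[] args) {
        // 10M100N5M starting at reference position 0
        System.out.println(referenceEnd(0, new char[] {'M', 'N', 'M'}, new int[] {10, 100, 5}));
    }
}
```

With refPos++ in the N case, every element after the skipped region would be placed at the wrong reference offset.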
2011-10-13 15:53:12 -04:00
Mauricio Carneiro
e12ffb6547
Updating docs for GCContentByInterval
...
This walker does not take any BAMs. It only walks over the reference.
2011-10-13 13:27:00 -04:00
Eric Banks
9aecd50473
Adding ability to exclude annotations from the VA and UG lists. As described in the docs, this argument trumps all others (including -all) so that we can get around the SnpEff issue brought up by Menachem. Added integration test for it.
2011-10-12 15:44:54 -04:00
Mauricio Carneiro
e53a952aeb
Added ION Torrent support to CountCovariates.
2011-10-12 01:57:02 -04:00
Mauricio Carneiro
a2733a451f
Added NotCalled feature to GAV
...
Added "not called" and "no status" to the truth table. Very useful.
2011-10-11 19:31:45 -04:00
David Roazen
ae83420637
Merged bug fix from Stable into Unstable
2011-10-11 12:26:08 -04:00
David Roazen
794f275871
SnpEff is now marked as a RodRequiringAnnotation instead of an ExperimentalAnnotation.
...
Having SnpEff grouped with the Experimental annotations was proving problematic, since it
requires a rod. Placing it in its own group should improve the situation somewhat, making it
easier to request "all annotations except for SnpEff".
2011-10-11 12:08:56 -04:00
David Roazen
cfd0ac8410
Merged bug fix from Stable into Unstable
...
Conflicts:
public/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java
2011-10-11 12:03:51 -04:00
David Roazen
24b72334b3
UnifiedGenotyper now correctly initializes the VariantAnnotator engine.
...
This allows the annotation classes to perform any necessary initialization/validation.
For example, it allows the SnpEff annotator to (among other things) validate its rod binding.
This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding
is present.
Added an integration test to cover this case so that it doesn't break again.
2011-10-11 12:02:05 -04:00
Guillermo del Angel
0429b38021
Merged bug fix from Stable into Unstable
2011-10-11 11:19:38 -04:00
Guillermo del Angel
1c485d8b5e
Forgot that no matter how trivial a change it's a good idea to compile first
2011-10-11 11:18:41 -04:00
Guillermo del Angel
6418f4d69b
Merged bug fix from Stable into Unstable
2011-10-11 11:13:18 -04:00
Guillermo del Angel
1975de1b32
Second try: hide --do_indel_quality in AnalyzeCovariates
2011-10-11 11:11:29 -04:00
Guillermo del Angel
6506ea83e8
Revert "Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users"... a hidden passenger change made it through.
...
This reverts commit 70e10ccb1be90dcff8f4485ae6ee036db2d1ac86.
2011-10-11 11:03:12 -04:00
Guillermo del Angel
4c1d8c8d44
Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users
2011-10-11 11:01:06 -04:00
Eric Banks
77c983c5b5
No one claimed this walker and it doesn't have integration tests or GATKdocs so it doesn't belong in public.
2011-10-10 15:17:54 -04:00
Mark DePristo
fb72bcf732
DiffObjects no longer prints out the file name in the status so MD5 are stable
2011-10-10 15:10:57 -04:00
Mark DePristo
46e7370128
this.allele, getAlleles(), and getAltAlleles() now return List, not Set
...
-- Changes associated code throughout the codebase
-- Updated necessary (but minimal) UnitTests to reflect new behavior
-- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC
2011-10-09 11:45:55 -07:00
Mark DePristo
c67f6c076b
simpleMerge now preserves allele order
...
-- UnitTests for dangerous PL merging cases in the multi-allelic case. The new behavior is correct
2011-10-08 17:39:53 -07:00
Mark DePristo
ec14a4a606
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-07 08:38:50 -07:00
Eric Banks
ca9cd9b688
Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC.
2011-10-06 22:38:44 -04:00
Mark DePristo
c7864c7256
Filter application order is now deterministic, in the order defined by the walker
...
-- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied. The order now is
(1) the order of the walker intrinsic filters
(2) read group black list (if provided)
(3) command line filters (if provided)
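
The ordering above can be sketched as follows. This is an illustration under assumptions, not the engine code: the class `FilterOrderSketch`, the method `orderedFilters`, and the filter names are hypothetical, and filters are modelled as plain strings. The sketch swaps in a LinkedHashSet, which iterates in insertion order, where a HashSet gives no ordering guarantee; the actual commit may well have used a different collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; filters are represented by name only.
final class FilterOrderSketch {
    static List<String> orderedFilters(List<String> walkerFilters,
                                       String readGroupBlackList,
                                       List<String> commandLineFilters) {
        // LinkedHashSet keeps insertion order and still drops duplicates.
        Set<String> filters = new LinkedHashSet<>(walkerFilters); // (1) walker intrinsic filters
        if (readGroupBlackList != null) {
            filters.add(readGroupBlackList);                      // (2) read group black list
        }
        if (commandLineFilters != null) {
            filters.addAll(commandLineFilters);                   // (3) command line filters
        }
        return new ArrayList<>(filters);
    }

    public static void main(String[] args) {
        System.out.println(orderedFilters(
                Arrays.asList("MalformedRead", "UnmappedRead"),
                "ReadGroupBlackList",
                Arrays.asList("MappingQuality")));
    }
}
```

A filter that appears in more than one stage keeps the position of its first occurrence, so the walker's own ordering always wins.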
2011-10-06 18:51:40 -07:00
Mark DePristo
0b88af4af9
Counts of records failing filters are displayed sorted
...
-- Stops random ordering of the output, as the counts are returned sorted by string name of the class
-- Deleted now unused sh*tty assessors in Utils
2011-10-06 18:42:26 -07:00
Mark DePristo
d1e70d6ec2
Removed Nx counting of reads in metrics with -nt > 1
2011-10-06 18:29:26 -07:00
Eric Banks
c61804a450
Rename the long version of the argument name to more accurately reflect its purpose.
2011-10-06 16:14:04 -04:00
Eric Banks
61a3dfae24
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 15:58:04 -04:00
Eric Banks
6eb87bf58a
RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop.
2011-10-06 15:57:49 -04:00
Eric Banks
1b0735f0a3
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 13:41:45 -04:00
Eric Banks
c4dfc1fb8b
Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt.
2011-10-06 13:41:36 -04:00
Mark DePristo
73f9d1f217
GATK read group requirement iron hand
...
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExmpleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo
23845ac798
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 08:17:08 -07:00
Mark DePristo
daa5999489
Fixed typo in argument description
2011-10-06 08:16:25 -07:00
Guillermo del Angel
8a474e38ff
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 10:08:39 -04:00
Guillermo del Angel
93f7e632bd
Minor fix/enhancement for VariantEval: if a vcf has symbolic alleles, program would crash ungracefully - now we'll just skip record without processing. This is a big issue since we can't process 1000G integration files with code as is.
2011-10-06 10:07:46 -04:00
Mark DePristo
190be4d0d1
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-05 21:27:11 -07:00
Mark DePristo
8e6845806a
Allowing empty samples list in LIBS
...
-- Right now we cannot process BAM files without read groups because we enforce the samples list to not be empty when there's a SAM record. Now if there are reads and there are no samples we add the "null" sample so that LIBS walks the reads properly
2011-10-05 21:26:21 -07:00
Matt Hanna
180c8f286f
Merged bug fix from Stable into Unstable
2011-10-05 20:37:43 -04:00
Matt Hanna
55b9f06527
Ensure that IndelRealigner n-way out option supports MD5 generation.
2011-10-05 20:36:28 -04:00
Mark DePristo
be2d29ce69
Final PED documentation
2011-10-05 15:17:41 -07:00
Mark DePristo
3226d5dc0d
Merge branch 'master' into ped
2011-10-05 15:03:09 -07:00
Mark DePristo
6a573437af
Detailed documentation of arguments for -ped
2011-10-05 15:00:58 -07:00
Mark DePristo
e7c80f7c45
Renaming quantitative trait to OtherPhenotype which is now a String not a double
...
-- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles
2011-10-05 12:26:33 -07:00
Mark DePristo
51ecc20867
getFamily() and associated methods implemented and tested
...
-- Sample no longer serializable
-- Sample now implements Comparable
2011-10-05 09:55:05 -07:00
Mark DePristo
a45d985818
TODO method stubs
2011-10-04 15:54:09 -07:00
Mark DePristo
fee89e47ff
Only throws an error when there are no samples but there are reads
...
-- Handles the case when you are running a ROD traversal and yet the LIBS is still used to return null everywhere.
2011-10-04 06:50:54 -07:00
Mark DePristo
f552aede42
Only provide the sample names in the BAM file for efficiency
2011-10-04 06:50:12 -07:00
Mark DePristo
a27641e1fc
Cleaned up imports
2011-10-04 06:28:36 -07:00
Mark DePristo
b20689ff55
No longer supports extraProperties
...
-- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem
-- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown
-- addSample() in Sample.class now invokes mergeSample() when appropriate
-- Validation types are now only STRICT or SILENT
-- Validation code implemented in SampleDBBuilder
-- Extensive unit tests for SampleDBBuilder
2011-10-03 19:20:33 -07:00
Mauricio Carneiro
3837aa45b4
Fixing conflicts
...
Conflicts:
public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java
2011-10-03 19:07:59 -07:00
Mark DePristo
2e3dc52088
Minor function renaming
2011-10-03 14:41:13 -07:00
Mark DePristo
dd71884b0c
On path to SampleDB engine integration
...
-- PedReader tag parser
-- Separation of SampleDBBuilder from SampleDB (now immutable)
-- Removed old sample engine arguments
2011-10-03 12:08:07 -07:00
Eric Banks
c3eff7451a
Found a small inefficiency while profiling: we were still using String.split instead of ParsingUtils.split to break up array values in the INFO field. The change makes a noticeable (albeit not big) difference when reading sites-only files.
2011-10-03 14:20:39 -04:00
Mark DePristo
8ee0f91904
Remove residual processing tracker arguments
2011-10-03 09:50:01 -07:00
Mark DePristo
89ac50e86e
SampleDataSource -> SampleDB
2011-10-03 09:33:30 -07:00
Mark DePristo
93fba06cb5
Support for whitespace only lines
2011-10-03 09:30:10 -07:00
Mark DePristo
0604ce55d1
PedReader support for ; separated lines, not only newline
2011-10-03 09:19:58 -07:00
Mark DePristo
52f670c8b8
100% version of PedReader
...
-- Passes all unit tests
-- Added unit tests for missing fields
2011-10-03 06:12:58 -07:00
Mark DePristo
dd75ad9f49
95% PedReader
...
-- Passes significant unit tests
-- Implicit sample creation for mom / dad when you create single samples
-- Continuing cleanup of Sample and SampleDataSource
2011-09-30 18:03:34 -04:00
Andrey Sivachenko
c7898a9be7
inconsequential change in string constants printed into the vcf which no one uses anyway...
2011-09-30 16:40:21 -04:00
Mark DePristo
010899f886
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-30 15:51:09 -04:00