Eric Banks
f62af0291b
Check for invalid VCF records (not enough tokens) instead of assuming they are there.
2011-10-31 14:09:51 -04:00
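The token-count check described in f62af0291b could look roughly like the sketch below. The class and method names are hypothetical and are not taken from the GATK source; the only outside fact assumed is that a VCF data line carries eight mandatory tab-separated columns (CHROM through INFO).

```java
/** Hypothetical helper, not the GATK codec: validates the column count of one VCF data line. */
final class VcfLineCheck {
    // VCF defines eight mandatory fixed columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
    static final int MIN_FIELDS = 8;

    /** Splits a data line on tabs and fails fast if mandatory columns are missing. */
    static String[] splitRecord(String line) {
        // A limit of -1 keeps trailing empty tokens, so an empty last column still counts.
        String[] tokens = line.split("\t", -1);
        if (tokens.length < MIN_FIELDS) {
            throw new IllegalArgumentException(
                "Invalid VCF record: expected at least " + MIN_FIELDS
                    + " tab-separated fields but found " + tokens.length);
        }
        return tokens;
    }
}
```

Checking the length once, right after the split, means later code can index the fixed columns without risking an ArrayIndexOutOfBoundsException.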
Andrey Sivachenko
bed0acaed4
nWayOut now adds PG tag to the header as it should. Also, additional hidden option added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them
2011-10-31 12:28:28 -04:00
Mauricio Carneiro
389380a590
ReduceReads ref bases are now output as '=' to save space
...
Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.
2011-10-30 12:04:39 -04:00
Mauricio Carneiro
dbd8c25787
No more R resources in the DPP
...
updating the DPP to conform with Analyze Covariates changes.
2011-10-28 16:57:01 -04:00
Khalid Shakir
e25d40882a
Swapping Thread.sleep(0) with Object.wait(0) caused Queue to lock up. Thanks to rpoplin for pointing it out.
2011-10-28 15:51:03 -04:00
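The lock-up in e25d40882a follows from a difference in the Java API: Thread.sleep(0) returns immediately, while Object.wait(0) means "wait with no timeout" and blocks until another thread calls notify. A minimal sketch of a guard against that, using a hypothetical helper that is not the actual Queue code:

```java
/** Hypothetical helper illustrating the wait(0) pitfall; not taken from Queue. */
final class PollDelay {
    /**
     * Pauses for up to millis milliseconds on the given lock.
     * Returns false without waiting when millis is not positive,
     * because lock.wait(0) would block until notified.
     */
    static boolean pause(Object lock, long millis) {
        if (millis <= 0) {
            return false;
        }
        synchronized (lock) {
            try {
                // May return early on notify or a spurious wakeup; callers should re-check their condition.
                lock.wait(millis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can react to shutdown requests.
                Thread.currentThread().interrupt();
            }
        }
        return true;
    }
}
```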
Eric Banks
0ca7428e76
Allow processing of empty intervals, but warn user when this case is encountered.
2011-10-28 12:12:14 -04:00
Eric Banks
649dfe98f0
Add VCF header for any expressions that are requested
2011-10-28 10:22:19 -04:00
Eric Banks
8b1a62da27
Adding unit test to cover overlapping intervals from the same source with the intersection rule.
2011-10-28 09:59:43 -04:00
Eric Banks
057a79f598
This argument should be annotated as @Input
2011-10-28 09:44:49 -04:00
Eric Banks
4ba7c0cecd
Moving to private
2011-10-28 09:29:28 -04:00
Eric Banks
1bdd76c2f2
These tools now use the IntervalBinding system to handle intervals instead of doing it all manually
2011-10-28 09:28:12 -04:00
Eric Banks
6ba08a103d
Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory.
2011-10-28 09:23:25 -04:00
Eric Banks
3d04bb5608
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-27 23:55:18 -04:00
Eric Banks
19e27d4568
Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative.
2011-10-27 23:55:11 -04:00
Eric Banks
cafc245a43
For some reason, a class of Codecs (including TableCodec) require that a GenomeLocParser be passed in to do the position processing. Why can't they just return a Feature with chr, start, stop? Isn't that the right thing?
2011-10-27 23:54:28 -04:00
Guillermo del Angel
cbc43683ee
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-27 20:54:18 -04:00
Guillermo del Angel
8907e42007
First fully functional implementation of ValidationSiteSelectorWalker. User gives a) a set of input variants, b) a desired number of output variants, c) optionally, a set of samples which will restrict sites to be polymorphic in those samples, d) a frequency selection mode: either uniform (no AF matching), or matching AF so that output sites mirror the input AF spectrum as closely as possible.
...
More testing is needed and docs need improving but so far all functionality seems up and running
2011-10-27 20:53:48 -04:00
Eric Banks
ccfd853b34
Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.
2011-10-27 20:43:50 -04:00
Eric Banks
c2f343773e
Oops, working too quickly last time. This is the proper fix for the potential NPE in the equals() test.
2011-10-27 15:32:08 -04:00
Khalid Shakir
4d0e34109f
Compacting pdfs when running under R 2.13+.
2011-10-27 14:51:56 -04:00
Khalid Shakir
b80d407dc7
No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
...
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks
8c4dbce6d8
Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing.
2011-10-27 13:58:19 -04:00
Eric Banks
4a7e6fee3f
Remove support for BED file interval parsing in the GATK; it should all go through Tribble now. IndelRealigner no longer supports unordered interval input (which shouldn't have been used anyways). Temporarily commenting out serialization of arguments so that tests pass; this whole piece will be deleted soon anyways.
2011-10-27 13:38:08 -04:00
Matt Hanna
f7df8bdecc
Merged bug fix from Stable into Unstable
2011-10-27 11:31:17 -04:00
Matt Hanna
41ddc7bce7
Make sure we output a full stack trace when we encounter Tribble error messages on VCF header merge.
2011-10-27 11:30:04 -04:00
Eric Banks
44f905b5e5
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 23:31:11 -04:00
Eric Banks
68283b1651
Fixing docs and adding GATKdocs for the new interval functionality
2011-10-26 22:14:43 -04:00
Mark DePristo
c9978316a3
Merge branch 'FragmentUtils'
2011-10-26 19:51:49 -04:00
Mauricio Carneiro
add9ad97ec
No scatter gather for VQSR or ApplyVQSR.
...
These walkers should not be scatter gatherable. Annotating them accordingly so that Queue doesn't allow a less than knowledgeable user to try and scatter/gather VQSR.
2011-10-26 16:35:44 -04:00
Ryan Poplin
74aeb22eeb
Merged bug fix from Stable into Unstable
2011-10-26 15:57:30 -04:00
Ryan Poplin
86871bd1e3
Throw a UserException in the BQSR when there is no data instead of creating an empty csv file
2011-10-26 15:56:41 -04:00
Mark DePristo
034a997d07
Generalized Reads -> Fragment calculation
...
-- Supports ReadBackedPileup -> FragmentCollection as before
-- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller
-- General cleanup, renaming, move to separate package, more extensive unit tests, etc.
-- Added toFragment() function to ReadBackedPileup interface
2011-10-26 15:54:38 -04:00
Eric Banks
2f21b6ecfb
Removed debugging output
2011-10-26 15:50:20 -04:00
Eric Banks
b39fcb1bea
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 15:44:25 -04:00
Eric Banks
b6ce6ed3f8
Go around the ROD system for now so that we can just call decodeLoc() for efficiency. Noted that we should go through the ROD system once it gets cleaned up. This means that currently gzipped files are not supported with -L.
2011-10-26 15:42:53 -04:00
Eric Banks
3273c20c98
Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes.
2011-10-26 15:29:18 -04:00
Eric Banks
9424e8b2ca
Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.
2011-10-26 14:11:49 -04:00
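The union and intersection rules mentioned in 9424e8b2ca can be illustrated with a small sketch. This is not the GATK interval code: the class name is hypothetical, intervals are simplified to inclusive {start, stop} pairs on a single contig, and only overlapping (not abutting) intervals are merged, which is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of interval merging; intervals are inclusive {start, stop} pairs on one contig. */
final class IntervalMergeSketch {
    /** Union: sort by start, then fold each interval into the previous one when they overlap. */
    static List<int[]> union(List<int[]> intervals) {
        List<int[]> sorted = new ArrayList<>(intervals);
        sorted.sort(Comparator.comparingInt((int[] iv) -> iv[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] iv : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && iv[0] <= last[1]) {
                last[1] = Math.max(last[1], iv[1]);
            } else {
                merged.add(new int[] {iv[0], iv[1]}); // copy so the input is never mutated
            }
        }
        return merged;
    }

    /** Intersection of two lists that are each already sorted and non-overlapping. */
    static List<int[]> intersection(List<int[]> a, List<int[]> b) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int start = Math.max(a.get(i)[0], b.get(j)[0]);
            int stop = Math.min(a.get(i)[1], b.get(j)[1]);
            if (start <= stop) {
                out.add(new int[] {start, stop});
            }
            // Advance whichever interval ends first.
            if (a.get(i)[1] < b.get(j)[1]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    /** Renders a list of intervals as text, e.g. for logging. */
    static String format(List<int[]> intervals) {
        return Arrays.deepToString(intervals.toArray());
    }
}
```

Running union over each source first gives intersection the sorted, non-overlapping inputs it requires.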
Mark DePristo
7fa943aef1
Renamed FragmentPileup to FragmentUtils
2011-10-26 14:01:45 -04:00
Mark DePristo
af3613cc5f
GATKSAMRecord commit branch summary
...
First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory. What's the best way to do this? Rebase?
Now, on to the changes here:
-- Picard added a SamRecordFactory that is used to create instances of the subclasses SamRecord or BAMRecord. This factory allows us to have low-level picard readers (SamFileReader) create objects of type GATKSamRecord. The abomination of the extends and contains GATKSamRecord is now gone. GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord. This results in up to a 2x performance improvement in writing BAMs and a ~10% improvement when reading BAM files.
-- As a consequence of this, we no longer officially support SAM records. Attempting to create SAMRecord objects with the factory will throw a user exception.
-- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value. The real BQSR (not the copy indel version) got the efficient code to use this. Please add all future platforms to this enum.
-- GATKSamRecord no longer supports using the OQ or defaultBaseQuality. This is performed in a wrapper iterator that's only added when these command line options are used.
-- ReducedRead code has been moved from ReadUtils into efficient caching accessors in GATKSamRecord.
-- ArtificialSamUtils now creates GATKSamRecords, not just SAMRecords. Added code here to create artificial pairs and used that code to create artificial ReadBackedPileups with specific properties
-- New smarter algorithm for FragmentPileup. This new code is up to 3x faster than the previous version, and is lazy so is more efficient when no overlapping pairs are actually in the pileup. Created extensive DataProvider driven UnitTest. Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms. A TODO still remains to make an efficient version that works for non-pileups for the HaplotypeCaller
2011-10-25 20:52:56 -04:00
Mark DePristo
2822f0dc27
Merge branch 'SamRecordFactory'
2011-10-25 20:34:47 -04:00
Mark DePristo
1b722c21cf
merge master
2011-10-25 16:08:39 -04:00
Ryan Poplin
56fdf0b865
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-25 15:58:56 -04:00
Ryan Poplin
4a34c1862e
misc cleanup. We now filter out haplotypes when it is obvious that the assembly has failed to find a parsimonious event rather than use haplotypes with large numbers of SNPs and small indels on them.
2011-10-25 15:22:28 -04:00
David Roazen
2794e5c1d4
Modified the VCFJarClassLoadingUnitTest to play nice with the packaged-jar test targets.
2011-10-25 14:47:15 -04:00
Guillermo del Angel
b559936b7a
a) New variant eval stratification module for indel size. b) Next iteration on indel caller runtime optimization: when computing the likelihood of each haplotype for a given read, many computations will be redundant since pieces of haplotypes will be common to both REF and ALT haplotypes. So, we keep HMM matrices from one haplotype to the next one and recompute starting at the part where either the haplotype is different or GOP/GCP are different.
2011-10-25 09:56:43 -04:00
Khalid Shakir
fac9932938
Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
...
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
Khalid Shakir
89a581a66f
Added ability to specify arguments in files via -args/--arg_file
...
Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()
2011-10-24 15:58:34 -04:00
Mark DePristo
502592671d
Cleanup FragmentPileup before main repo commit
...
-- removed intermediate functions. Now only original version and best optimized new version remain
-- Moved general artificial read backed pileup creation code into ArtificialSamUtils
2011-10-24 14:40:05 -04:00
Mark DePristo
166174a551
Google caliper example execution script
...
-- FragmentPileup with final performance testing
2011-10-24 14:04:53 -04:00
Mark DePristo
f6ccac889b
Merged bug fix from Stable into Unstable
2011-10-23 16:37:12 -04:00
Mark DePristo
585a45b7a3
Bug fix for ClipReadsWalker when stats output isn't provided
...
-- See http://getsatisfaction.com/gsa/topics/clipreadswalker?utm_content=topic_link&utm_medium=email&utm_source=reply_notification
2011-10-23 16:36:48 -04:00
Ryan Poplin
f5d910b8a5
Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes.
2011-10-23 13:29:08 -04:00
Mark DePristo
42bf9adede
Initial version of "fast" FragmentPileup code
...
-- Uses mayOverlapRoutine in ReadUtils
-- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations
-- PileupElement now comparable (sorts on offset, then on start)
-- Caliper microbenchmark to assess performance
2011-10-22 21:36:37 -04:00
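The two-key ordering mentioned for PileupElement in 42bf9adede (offset first, start second) can be expressed with a chained comparator. The Element class below is a hypothetical stand-in with only the two fields needed for the illustration, not the real PileupElement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a two-key ordering; Element is a stand-in, not the GATK PileupElement. */
final class PileupOrderSketch {
    static final class Element {
        final int offset;          // position within the read
        final int alignmentStart;  // where the read starts on the reference

        Element(int offset, int alignmentStart) {
            this.offset = offset;
            this.alignmentStart = alignmentStart;
        }
    }

    // Primary key: offset. Ties are broken by alignment start.
    static final Comparator<Element> BY_OFFSET_THEN_START =
        Comparator.comparingInt((Element e) -> e.offset)
            .thenComparingInt(e -> e.alignmentStart);

    /** Returns a sorted copy, leaving the input list untouched. */
    static List<Element> sorted(List<Element> elements) {
        List<Element> copy = new ArrayList<>(elements);
        copy.sort(BY_OFFSET_THEN_START);
        return copy;
    }
}
```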
Mauricio Carneiro
4913f8a60f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-21 17:45:07 -04:00
Mauricio Carneiro
86305a5dcf
Adjusting the memory limits of the MDCP
...
Indel caller needs more than 3G for large datasets.
2011-10-21 17:41:52 -04:00
Mauricio Carneiro
102dafdcbc
Validation of GATKSamRecord in read filters
...
Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.
2011-10-21 17:40:43 -04:00
Guillermo del Angel
f4b409fa0d
CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result
2011-10-21 14:07:20 -04:00
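The rule described in f4b409fa0d (drop AC and AF when the merged alleles differ and the values cannot be recomputed from genotypes) can be sketched as follows. The class, method, and parameter names are hypothetical, and the attribute map is simplified to a plain Map rather than the real VariantContext.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the stale AC/AF cleanup; not the CombineVariants implementation. */
final class StaleAlleleCounts {
    /**
     * Returns a copy of the attributes with AC and AF removed when the merged
     * alt alleles differ from the original ones and no genotypes are available
     * to recompute the per-allele values.
     */
    static Map<String, Object> dropStaleCounts(Map<String, Object> attributes,
                                               List<String> originalAlts,
                                               List<String> mergedAlts,
                                               boolean canRecomputeFromGenotypes) {
        Map<String, Object> result = new HashMap<>(attributes);
        boolean allelesChanged = !originalAlts.equals(mergedAlts);
        if (allelesChanged && !canRecomputeFromGenotypes) {
            // One value per alt allele is expected, so the old values no longer line up.
            result.remove("AC");
            result.remove("AF");
        }
        return result;
    }
}
```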
Mark DePristo
b863390cb1
Moving reduced read functionality into GATKSAMRecord
...
-- More functions take / produce GATKSAMRecords instead of SAMRecord
2011-10-21 13:28:05 -04:00
Mark DePristo
2403e96062
Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs.
2011-10-21 09:59:24 -04:00
Mark DePristo
110e13bc1e
Merge branch 'master' into SamRecordFactory
2011-10-21 09:43:52 -04:00
Mark DePristo
be797a8a1f
Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system
2011-10-21 09:39:21 -04:00
Mark DePristo
ed74ebcfa1
GATKSamRecords with efficient NGSPlatform method
2011-10-21 09:38:41 -04:00
Mark DePristo
94e1898d8f
A canonical set of NGS platforms as enums with convenient manipulation methods
2011-10-21 09:37:45 -04:00
Mauricio Carneiro
9f867d77ca
no sort order
...
subtle bug fixed.
2011-10-20 18:44:09 -04:00
Mauricio Carneiro
c9d8b22092
Added BWASW support to the pipeline
...
Data Processing Pipeline can now use BWASW for realigning the reads. Useful for Ion Torrent data.
2011-10-20 18:36:28 -04:00
Mauricio Carneiro
093cd95c5d
Merged bug fix from Stable into Unstable
2011-10-20 17:03:22 -04:00
Mauricio Carneiro
d7367c152a
Fixing 'revert' when not realigning
...
RevertSam was reverting the alignment information and that was screwing up the pipeline if you didn't want to run it with BWA. Fixed.
2011-10-20 17:01:54 -04:00
Mauricio Carneiro
558a7a81f0
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-20 16:23:32 -04:00
Mauricio Carneiro
ed402588cc
Adding the "gold standard NA12878" target
2011-10-20 16:19:13 -04:00
Mark DePristo
999a8998ae
Constructor for GATKSamRecord with header only, for unit testing
2011-10-19 17:51:48 -04:00
Mark DePristo
3227143a1c
Systematic test code for FragmentPileup
...
-- Creates all combinations of overlapping and non-overlapping read pair pileups in all orientations and first/second pairings to validate fragment detection.
2011-10-19 17:50:27 -04:00
Mark DePristo
bba69701b5
Now creates GATKSamRecords, not SamRecords
2011-10-19 17:49:17 -04:00
Christopher Hartl
cd8a6d62bb
You know how the wiki has a big section on committing local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that.
...
This commit contains:
- IntronLossGenotyper is brought into its current incarnation
- A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate)
- RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type.
+ the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there.
- MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added)
+ use this rather than a hard GQ threshold if you're doing MV analyses.
- Some miscellaneous QScripts
2011-10-19 17:42:37 -04:00
Mark DePristo
52345f0aec
Meaningful documentation string
2011-10-19 15:47:36 -04:00
Mark DePristo
1b38aa1a7e
Cleaning up reduced read code accessors
2011-10-19 15:46:44 -04:00
Eric Banks
d8d73fe4f2
Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.
2011-10-19 15:11:13 -04:00
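The effect of d8d73fe4f2 on accessors such as isHet and isHom can be sketched with a simplified classifier. The enum here is reduced to four hypothetical values and alleles are plain strings with "." as the no-call; the real Genotype class is richer than this.

```java
import java.util.List;

/** Hypothetical, simplified genotype classifier; alleles are strings and "." is a no-call. */
final class GenotypeTypeSketch {
    enum Type { NO_CALL, MIXED, HOM, HET }

    static final String NO_CALL_ALLELE = ".";

    static Type classify(List<String> alleles) {
        long noCalls = alleles.stream().filter(NO_CALL_ALLELE::equals).count();
        if (noCalls == alleles.size()) {
            return Type.NO_CALL;   // e.g. ./.
        }
        if (noCalls > 0) {
            return Type.MIXED;     // e.g. ./A: partly called, so neither het nor hom
        }
        boolean allSame = alleles.stream().distinct().count() == 1;
        return allSame ? Type.HOM : Type.HET;
    }

    // Contract: true only when every allele is called and at least two differ.
    static boolean isHet(List<String> alleles) {
        return classify(alleles) == Type.HET;
    }

    // Contract: true only when every allele is called and all are identical.
    static boolean isHom(List<String> alleles) {
        return classify(alleles) == Type.HOM;
    }
}
```

Giving partly called genotypes their own type keeps ./A from being reported as het (two distinct symbols) or as hom (one called allele).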
Mark DePristo
7928b287fc
GATKSamRecord now produced by SAMFileReaders by default
...
-- Removed all of the unnecessary caching operations in GATKSAMRecord
-- GATKSAMRecord renamed to GATKSamRecord for consistency
2011-10-19 13:15:27 -04:00
Eric Banks
5a6468c11e
Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.
2011-10-19 11:52:05 -04:00
Eric Banks
48c4a8cb33
Make error messages clearer (even I was confused)
2011-10-19 11:49:16 -04:00
Eric Banks
6cadaa84c9
Just use validate() from super class since it does the same thing
2011-10-19 11:48:23 -04:00
Mark DePristo
df3e4e1abd
First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader
2011-10-19 11:22:35 -04:00
Mauricio Carneiro
c27e2fb676
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-18 15:23:05 -04:00
Menachem Fromer
2125c4f38f
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-18 14:49:00 -04:00
Menachem Fromer
e5fc828546
With Khalid's implicit approval, I have removed this line that overrides the memory limit of the VCF-gathering function, so that the inherited limit remains
2011-10-18 14:47:39 -04:00
Mark DePristo
f77f2eeb7d
Fix for new ID structure
2011-10-18 13:04:43 -04:00
Mark DePristo
1a92ee3593
No longer adds a binding of ID -> . when the ID field is dot in the VCF
...
-- Really we should make ID a primary key in VariantContext. Putting it into the attributes is just annoying now
2011-10-18 10:57:02 -04:00
Ryan Poplin
e45fcb66eb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 15:56:19 -04:00
Ryan Poplin
1e6794c539
fixing typo in VariantsToTable docs
2011-10-17 15:56:02 -04:00
Mark DePristo
0de8550f17
Merged bug fix from Stable into Unstable
2011-10-17 15:29:53 -04:00
Mark DePristo
c1329c4dde
Fixing a binary to logical or
2011-10-17 15:29:45 -04:00
Mark DePristo
9e4963efc8
Merged bug fix from Stable into Unstable
2011-10-17 15:27:38 -04:00
Mark DePristo
ec911ce5bb
Even better error messages
2011-10-17 15:27:22 -04:00
Mark DePristo
d065bf1715
Merged bug fix from Stable into Unstable
2011-10-17 15:25:47 -04:00
Mark DePristo
a7cf9cdc67
Fixing error message typo
2011-10-17 15:25:35 -04:00
Ryan Poplin
589df6b7cf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 14:35:14 -04:00
Ryan Poplin
6b02354d84
Adding a new getter in VariantsToTable to extract the indel event length.
2011-10-17 14:34:52 -04:00
Mark DePristo
3550798c4c
Merged bug fix from Stable into Unstable
2011-10-17 13:58:56 -04:00
Mark DePristo
4108a294f7
Better error message when a RodBinding file doesn't exist
2011-10-17 13:58:46 -04:00
Mark DePristo
cc76826f78
Merged bug fix from Stable into Unstable
2011-10-17 13:38:11 -04:00
Mark DePristo
09a09cacef
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2011-10-17 13:38:00 -04:00