Mark DePristo
e2311294c0
Removed unused ManualSortingVCFWriter
2012-05-24 10:56:59 -04:00
Mark DePristo
93cef82637
BCF2 header encoding decoding at final spec
2012-05-24 10:56:58 -04:00
Mark DePristo
ce9e9eebb1
No dictionary in header. Now built dynamically from the header in the writer and codec
...
-- Created BCF2Utils and moved BCF2Constants and TypeDescriptor methods there
2012-05-24 10:56:58 -04:00
Mark DePristo
f0b081a85f
Update VCF.jar loading test
...
-- to reflect new path to VCFWriter
2012-05-24 10:56:58 -04:00
Mark DePristo
c3b8048e2e
Moving around classes in VCF and BCF2
...
-- Refactored VCF writers into vcf.writers package
-- Moved BCF2Writer to bcf2.writer
-- Updates to all of the walkers using VCFWriter to reflect new packages
-- A large number of files had their headers cleaned up because of this as well
2012-05-24 10:56:58 -04:00
Mark DePristo
679ffdd333
Move BCF2 from private utils to public codecs
2012-05-24 10:56:56 -04:00
Mark DePristo
450f098a61
BCF2 encoder / decoder implement new site / genotype block organization
...
-- Supports final organization of data blocks into sites data and genotypes data
2012-05-24 10:56:55 -04:00
Mark DePristo
27b51d4dea
Enable on the fly indexing of BCF2
2012-05-24 10:56:54 -04:00
Mark DePristo
81bd7646d6
Fix for MISSING floats
...
-- Restructured code to separate the MISSING value in java (currently everywhere a null) from the byte representation on disk (an int).
-- Now handles correctly MISSING qual fields
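The separation described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the class and method names are hypothetical, and the bit pattern 0x7F800001 is assumed to be the missing-float sentinel from the BCF2 spec. The in-memory value stays a null, and the sentinel exists only as an int, so it never has to round-trip through a Java float.

```java
// Hypothetical names; a sketch of "null in Java, int on disk", not the real BCF2 codec.
class MissingFloatSketch {
    // Assumption: the BCF2 sentinel bit pattern for a missing float.
    static final int MISSING_FLOAT_BITS = 0x7F800001;

    /** Converts an in-memory value (null means MISSING) to its on-disk int. */
    static int encode(Float value) {
        if (value == null) {
            return MISSING_FLOAT_BITS;
        }
        return Float.floatToRawIntBits(value);
    }

    /** Converts an on-disk int back to an in-memory value (null means MISSING). */
    static Float decode(int bits) {
        if (bits == MISSING_FLOAT_BITS) {
            return null;
        }
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(null)));   // prints "null"
        System.out.println(decode(encode(30.5f)));  // prints "30.5"
    }
}
```

Comparing the raw int against the sentinel avoids any dependence on how the JVM treats NaN payloads.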
2012-05-24 10:56:53 -04:00
Mark DePristo
3afbc50511
More BCF2 improvements
...
-- Refactored setting of contigs from VCFWriterStub to VCFUtils. Necessary for proper BCF working
-- Added VCFContigHeaderLine that manages the order for sorting, so we now emit contigs in the proper order.
-- Cleaned up VCFHeader operations
-- BCF now uses the right header files correctly when encoding / decoding contigs
-- Clean up unused tools
-- Refactored header parsing routines to make them more accessible
-- More minor header changes from Intellij
2012-05-24 10:56:52 -04:00
Mark DePristo
0799855479
Archiving GCF
...
-- Rider update to CramByPiece.scala
2012-05-24 10:56:51 -04:00
Guillermo del Angel
43919078cd
Merged bug fix from Stable into Unstable
2012-05-23 21:21:01 -04:00
Guillermo del Angel
4bc04e2a9e
Correct the way in which start/stop positions in a VC are computed when creating an indel VC. The old way was incorrect when GENOTYPE_GIVEN_ALLELES was specified with a complex record. The new way should work in general for all cases and is simpler.
2012-05-23 21:19:30 -04:00
Ryan Poplin
08dfd6cab6
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-21 16:47:07 -04:00
Ryan Poplin
04000d920c
Bug fix in BadCigar read filter for index out of bounds exception when used with a bam file that contains unmapped reads.
2012-05-21 16:46:59 -04:00
Eric Banks
666862af19
Added @Hidden option for GSA production use to cap the max alleles for indels at a lower number than for SNPs
2012-05-21 16:03:29 -04:00
Khalid Shakir
e57cd78bba
Killed two more resource leakers that ignored requests to close wrapped file pointers, and added Unit Tests for each.
...
This bug occurs in any adapter/wrapper class that is passed a resource but whose close method ignores requests to close the wrapped resource. The resource leaks when the adapter holds the only remaining reference to it.
Ex:
public Wrapper getNewWrapper(File path) {
    FileStream myStream = new FileStream(path); // This stream must be eventually closed.
    return new Wrapper(myStream);
}
public void close(Wrapper wrapper) {
    wrapper.close(); // If wrapper.close() does nothing, NO ONE else has a reference to close myStream.
}
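For contrast, a minimal sketch of a wrapper that does forward close() to the resource it was handed. All names here are hypothetical and are not the classes fixed in this commit; TrackedResource is a stand-in that only records whether it was closed.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical names throughout; a sketch of the pattern, not the classes touched by this commit.
class WrapperCloseSketch {
    /** Stand-in resource that records whether close() was called. */
    static final class TrackedResource implements Closeable {
        private boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Wrapper that forwards close() to the wrapped resource instead of ignoring it. */
    static final class ClosingWrapper implements Closeable {
        private final Closeable wrapped;

        ClosingWrapper(Closeable wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public void close() throws IOException {
            wrapped.close(); // the wrapper may hold the last reference, so it must close it
        }
    }

    public static void main(String[] args) throws IOException {
        TrackedResource resource = new TrackedResource();
        try (ClosingWrapper wrapper = new ClosingWrapper(resource)) {
            // work with the wrapper here
        }
        System.out.println("closed = " + resource.isClosed()); // prints "closed = true"
    }
}
```

A unit test for such a wrapper can follow the same shape: hand it a tracking resource, close the wrapper, and assert that the resource saw the close.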
2012-05-21 15:41:56 -04:00
Eric Banks
7f5ec17d22
Fixed up the comments in the GATKReportTable code and added some sanity checks to make sure that the user doesn't inconsistently add rows and corresponding IDs to the table.
2012-05-21 14:16:13 -04:00
Eric Banks
92d8aa3d4c
Don't exception out in these VE modules if the VCF has records that aren't just SNPs or indels
2012-05-21 09:38:52 -04:00
Eric Banks
3af3834d50
Fixing 2 bugs in the SAMRecord printing argument descriptor code (as reported by Kristian):
...
* For some reason, the original implementer decided to use Booleans instead of booleans and didn't always check for null, so we'd occasionally get an NPE. Switched over to booleans.
* We'd also generate an NPE if SAMRecord writing specific arguments (e.g. --simplifyBAM) were used while writing to stdout.
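The first failure mode is plain Java auto-unboxing and can be reproduced in a few lines. The class and field names below are hypothetical and do not come from the argument descriptor code; only the language behaviour is being shown.

```java
// Hypothetical class and field names; only the language behaviour is illustrated.
class BooleanUnboxingSketch {
    static final class BoxedArgs {
        Boolean simplifyBAM; // object field: null until someone assigns it
    }

    static final class PrimitiveArgs {
        boolean simplifyBAM; // primitive field: false until someone assigns it
    }

    /** Returns true if reading the unset boxed flag in a condition throws an NPE. */
    static boolean unboxingThrows(BoxedArgs args) {
        try {
            if (args.simplifyBAM) {
                // never reached when the field is null: unboxing throws first
            }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingThrows(new BoxedArgs()));  // prints "true"
        System.out.println(new PrimitiveArgs().simplifyBAM);  // prints "false"
    }
}
```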
2012-05-18 11:55:41 -04:00
Eric Banks
26968ae8eb
Forgot that the VCFStreamingIntegrationTest uses VE
2012-05-18 02:51:53 -04:00
Eric Banks
52c206d5db
Has anyone else ever noticed that the DiffEngine outputs were always doubled for some reason? That no longer happens with the new reports.
2012-05-18 02:32:20 -04:00
Eric Banks
03d40272c8
Removed old GATKReport code and moved the new stuff in its place.
2012-05-18 01:44:31 -04:00
Eric Banks
a26b04ba17
Extensive refactoring of the GATKReports. This was a beast.
...
The practical differences between version 1.0 and this one (v1.1) are:
* the underlying data structure now uses arrays instead of hashes, which should drastically reduce the memory overhead required to create large tables.
* no more primary keys; you can still create arbitrary IDs to index into rows, but there is no special cased primary key column in the table.
* no more dangerous/ugly table operations supported except to increment a cell's value (if an int) or to concatenate 2 tables.
Integration tests change because table headers are different.
Old classes are still lying around. Will clean those up in a subsequent commit.
2012-05-18 01:11:26 -04:00
Guillermo del Angel
5189b06468
New annotation for indels that describes whether they are STRs and, if so, their characteristics. If an indel is an STR, 3 fields are added to INFO: STR (boolean), RU = repeat unit (String), RPA = number of repetitions per allele. So, for example, if ATATAT* context gets changed to ATAT and ATATATAT, then RU=AT and RPA=3,2,4. Will be made a standard annotation shortly. Added unit tests for new functionality. Pending: refactor VariantContextUtils.isRepeat() to unify code, and fix VariantEval functionality.
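The RU/RPA example above can be reproduced with a small sketch. It assumes the repeat unit is already known and simply counts back-to-back copies of it at the start of each allele's context; the helper names are hypothetical and this is not the annotation code added by the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper names; this is not the annotation class added by the commit.
class StrAnnotationSketch {
    /** Counts back-to-back copies of repeatUnit at the start of sequence. */
    static int countLeadingRepeats(String sequence, String repeatUnit) {
        if (repeatUnit.isEmpty()) {
            throw new IllegalArgumentException("repeat unit must not be empty");
        }
        int count = 0;
        for (int pos = 0; sequence.startsWith(repeatUnit, pos); pos += repeatUnit.length()) {
            count++;
        }
        return count;
    }

    /** One repeat count per allele context, reference context first. */
    static List<Integer> repeatsPerAllele(String repeatUnit, List<String> alleleContexts) {
        List<Integer> counts = new ArrayList<>();
        for (String context : alleleContexts) {
            counts.add(countLeadingRepeats(context, repeatUnit));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Reference context ATATAT, alternate contexts ATAT and ATATATAT.
        List<Integer> rpa = repeatsPerAllele("AT", Arrays.asList("ATATAT", "ATAT", "ATATATAT"));
        System.out.println("RU=AT RPA=" + rpa); // prints "RU=AT RPA=[3, 2, 4]"
    }
}
```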
2012-05-17 15:28:19 -04:00
Eric Banks
0f7c917e7a
Better error checking and messages for bad alleles
2012-05-17 13:36:42 -04:00
Eric Banks
d44886d9e8
Very naughty bug: VE output is not at all gatherable but no one told this to Queue. Fixed.
2012-05-15 10:29:04 -04:00
Eric Banks
819c3d0c15
Adding to the Hrun docs
2012-05-15 10:27:52 -04:00
Guillermo del Angel
5fc3adbb04
One more VariantsToTable bug fix
2012-05-14 14:10:07 -04:00
Guillermo del Angel
04d691f04a
Forgot to update MD5's due to new Exact AF model in pool caller (all changes legit, minor QUAL/QD/SB differences). Fixed bug in VariantsToTable from previous commit
2012-05-14 14:01:29 -04:00
Guillermo del Angel
ae26f0fe14
a) Fully functional and working multiallelic exact model for pools. Needs cleanup/more testing. b) Better unit test for pool genotype likelihoods: it now optionally generates actual noisy pileups that can be used for assessing GL accuracy. c) Totally experimental, hidden option in VariantsToTable to output genotype fields. Specifying -GF will output columns of the form Sample.FieldName. Also needs more testing.
2012-05-14 10:55:35 -04:00
Ryan Poplin
c9dd0f3173
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-10 13:09:10 -04:00
Ryan Poplin
0cdadffe14
Committing the best of the frantic pre-CSHL experiments: Better algorithm for partitioning reads amongst the alleles they support. Require the read's original alignment to actually overlap the variant. QD uses the non-informative reads when calculating D. More HC-specific annotations for potential use in a statistical filtering strategy. Increasing the minimum kmer length in the assembly graphs. Misc minor bug fixes.
2012-05-10 13:09:03 -04:00
Guillermo del Angel
89f8a6b2e6
Revert bad part of last commit that shouldn't have been pushed
2012-05-10 10:41:08 -04:00
Guillermo del Angel
27b1aa5dd3
Don't allow N's in insertions when discovering indels. A better solution may be to use them as wildcards and merge them with compatible regular insertion alleles, but for now it's easier to ignore them. Minor refactoring of Allele.acceptableAlleleBases to support this. Added unit test for the consensus allele counter in the presence of N's
2012-05-10 10:29:19 -04:00
Eric Banks
4f37d6d399
Fixing docs
2012-05-10 00:56:00 -04:00
Mark DePristo
c81acfc15d
Working implementation of BCF2
...
-- Nearly complete on spec implementation. Slow but clean
-- Some refactoring of VariantContext to support common functions for BCF and VCF
2012-05-08 19:46:51 -04:00
Mark DePristo
a5193c2399
Mostly complete reference implementation of BCF2
...
-- Can run VariantEval on 3000 sample exome VCF and get the same output as the original VCF
2012-05-08 19:46:51 -04:00
Eric Banks
473d07b0c5
fixing up docs from previous Pool Caller commit
2012-05-08 11:02:55 -04:00
Eric Banks
b4999d14c1
updating docs
2012-05-08 10:58:46 -04:00
Guillermo del Angel
33a1dd2048
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-08 10:42:12 -04:00
Eric Banks
5cf4fd63c2
Catch malformed base qualities and throw as a User Error
2012-05-08 09:34:57 -04:00
Guillermo del Angel
a4f4b5007b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-08 09:34:33 -04:00
Guillermo del Angel
605984353f
Pool Caller improvements: a) New non-standard private annotation Heteroplasmy which measures mean heteroplasmy (pool AF) across called samples, meant for easier mtDNA calling. Pure homoplasmic variants (pool AF = 1 or 0) would have heteroplasmy=1. b) Don't output pool genotypes by default for large pool sizes because it makes file sizes explode and they're unreadable. c) Refactored classes ExactACCounts and ExactACSet and moved to superclass AlleleFrequencyCalculationModel because both Pool and Exact AF calculation models will use it. d) Initial refactorings and skeleton for linearized multi-allelic exact model (not done yet). e) Unit test for Pool AF calculation model.
2012-05-08 09:33:38 -04:00
Eric Banks
c40cda7e3c
Nope, loads of integration tests had to be changed.
2012-05-07 14:30:42 -04:00
Eric Banks
66838a073e
Very annoying: we have been emitting an extra TAB in the header of the VCF (which breaks some parsers) for sites-only files. Hopefully not too many integration tests will need to be fixed...
2012-05-07 12:20:11 -04:00
David Roazen
6b769e91d8
BCF2: third checkpoint
...
* writer mostly implemented
* walkers to convert BCF2 <-> VCF
* almost working for sites-only files; genotypes still need work
* initial performance tests this afternoon will be on sites-only files
2012-05-04 13:00:15 -04:00
Eric Banks
f3433201b1
Merged bug fix from Stable into Unstable
2012-05-03 11:11:00 -04:00
Eric Banks
557da77a1a
Don't compute QD if there is no QUAL; added integration test for this
2012-05-03 11:02:37 -04:00
Eric Banks
1fc7b5d58b
Merged bug fix from Stable into Unstable
2012-05-03 10:37:58 -04:00