hanna
fb5d595ef0
Disable VCF header output in the Beagle integrationtest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4327 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 16:50:03 +00:00
hanna
0c99c97685
The engine now automatically adds the command-line arguments to the header of every VCF, unless -NO_HEADER is specified.
...
Changed integration tests, adding the -NO_HEADER argument, for walkers that previously did not include the command-line
arg headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4326 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 15:27:58 +00:00
aaron
1af9ca6d45
enabling tests that now pass with the conitg length validation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4325 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 22:20:50 +00:00
depristo
522830fb01
Support for --assume-single-sample in UG, better malformated bam exceptions, and ignoring out of order contigs in seqdictutils. All for the CG bam file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4323 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 20:33:34 +00:00
aaron
3938d53738
one broken build short of the hat trick. Fixing the unix test which expects the sequence dictionary of the Tribble track to equal the reference; we actually return the sequence dictionary of the track iself, with each contig set to the length of the sequence dictionary contig entry.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4322 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:47:20 +00:00
aaron
b968af5db5
The tribble indexes are now updated with correct sequence lengths for each contig they have in their sequence dictionary. Also clean-up in the RMD track builder.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4321 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:21:22 +00:00
aaron
2586f0a1ca
fix for the build I broke - the original file got corrupted, which I replaced with a version that didn't have the header stripped off. Other integration tests passed, but this test relied on the header being stripped off.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4320 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 15:35:25 +00:00
rpoplin
547763b230
Better error message for Petr's null pointer exception. Also added an exception integration test because I'm certain this used to work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4319 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 13:44:40 +00:00
depristo
8719dde59d
Now prints out PASS when a variant is unfiltered
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4318 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 13:16:41 +00:00
delangel
205fc0b636
Cleanup: Use Tribble's version of createVariantContextWithPaddedAlleles (no real functional difference) to avoid duplicated code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4315 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 19:53:30 +00:00
delangel
a10cfe213b
Small bug fix in simple indel genotyper: Likelihood of case where best haplotype pair was (REF,REF) was not computed correctly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4314 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 17:04:39 +00:00
ebanks
f5a30d0248
I just spoke to Andrey & Kiran (the original authors of these tools), and they voted to kill these in favor of Picard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4313 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 13:27:35 +00:00
delangel
f64b6fddc1
Major changes/improvements to indel genotyper:
...
a) Redid way to compute path metrics in indel error model. Paper formulation where we have an anchor point in the alignemt between read and haplotype won't work in practice except in nice data sets that are perfectly indel-realigned and that are well mapped by aligner. New formulation doesn't assume this, and it's actually simpler and uses less code. It now resembles more a classic SW dynamic programming formulation but it still preserves the HMM probabilistic formulation.
b) Added a programmable call threshold, set by command line.
c) Use now sample name from BAM file, remove -sampleName argument.
d) Simplify loop to compute read-haplotype likelihoods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4311 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-19 23:47:31 +00:00
rpoplin
c6351a11d6
Clearer logger output when not using by-hapmap
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4308 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-18 16:10:42 +00:00
rpoplin
7e58d8ed61
CombineVariants now outputs the command line in the VCF header. Added a new hidden argument to VR walkers called --NoByHapMapValidationStatus to turn off the by-hapmap dbsnp rod behavior. Very useful for experimenting with which sets to use as training data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4307 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-18 16:06:50 +00:00
kshakir
a3f31e5df0
When QScript writers use the RodBind, then the File version of the same argument should be optional, i.e. should not always try to output the file, which when unpopulated will be null.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4305 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 18:22:07 +00:00
bthomas
c6c6d32b46
Quickly adding a new convenience method for retreiving a group of samples. The method is getSamples(Collection<String>) and returns a set of sample objects. There's also a test there.
...
Ryan is using this to modify VCF code today...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4303 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 15:55:17 +00:00
kshakir
a898908918
The output BAM file optional arguments of compression and whether to write an index are not outputs themselves.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4302 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 15:35:54 +00:00
bthomas
bc12055fcf
Quick patch to fix the sample code. It wasn't actually initializing the sample data source, so I added a call to initializeSampleDataSource() in GenomeAnalysisEngine. I think there was just an error resolving the versions of GenomeAnalysisEngine
...
Also added a new error message that I thought would be helpful...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4301 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 14:05:26 +00:00
ebanks
a10b2a00a5
Moving the util VariantContext 'modifying' routines into VC itself (as opposed to VCUtils) so that we can pass the genotype data directly into it and are no longer forced to decode the genotypes for no reason. This means that any walker that takes in a VCF and modifies the records without touching the genotypes never have to decode them. I've hooked this into the other two Variant Recalibrator walkers for Ryan. One side effect, though, is that we no longer can sort the sample names in the VCF (i.e. if the input VCF doesn't have samples in alphabetical order, then we used to sort them when writing a new VCF but no longer do that), because if we don't decode then we can't re-order the genotypes. I don't think this is a big concern given that the Unified Genotyper does emit sorted samples and that's the main source for most of the VCFs we use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4300 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 07:09:58 +00:00
bthomas
f66ef4626e
Fixing two minor issues: 1) adding a new error message if the user adds a fasta file in a directory that doesn't exist; 2) renaming my sample unit tests so they actually run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4299 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 20:45:51 +00:00
rpoplin
3a400e3dc0
Added CountCovariates integration test to ensure that it throws an exception if a variant mask isn't provided.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4298 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 19:18:38 +00:00
rpoplin
2eb5d9b2d2
CountCovariates makes sure that it sees a rod type that it expects for use as a variant mask (accepted types are dbsnp, vcf, and bed)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4296 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 18:53:42 +00:00
aaron
de56568ce4
Adding the appropriate DbSNP file to the performance tests so they don't exception out.
...
The exception: "org.broadinstitute.sting.utils.exceptions.UserException$CommandLineException: Invalid command line: This calculation is critically dependent on being able to skip over known variant sites. Please provide a dbSNP ROD or a VCF file containing known sites of genetic variation."
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4293 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 16:30:54 +00:00
aaron
782e0018e4
removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come.
...
*** Three integration tests had to change: ***
RecalibarationWalkersIntegrationTest:
One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates)
SequenomValidationConverterIntegrationTest:
relies on Plink ROD which we've removed.
PileupWalkerIntegrationTest:
we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 22:54:49 +00:00
delangel
c604ed9440
Several improvements to new indel genotyper (more to come soon):
...
a) Turns out previous change of centering haplotype around indel was a bad idea. Context to the left of indel is important but not as important as right one, because by definition all alleles start at the same location, so haplotype is the same to the left of indel regardless of allele. So, go back to having a constant size window to the left of event.
b) Expand reference context so we can test larger haplotypes.
c) Optimize computation of read likelihoods by doing them in linear array instead of in a matrix - no difference in biallelic sites but could be significantly faster in multiallelic sites.
d) Bug fix: read alignment wasn't being computed correctly if, a) we were at an insertion, b) read started right at the insertion, c) read CIGAR didn't include insertion - more of these corner conditions are lurking, so a revamped computation of how reads align to candidate haplotypes is in the works.
e) Add debug option not to use prior haplotype likelihoods.
f) Don't hard-code NA12878 for genotyping, now sample name is a required input argument.
g) Bug fix: if there are no reads covering a candidate indel event, just output NO_CALL (didn't notice this in HiSeq, but in P1 data it happens all the time). I need to add a confidence threshold for calling later on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4291 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 21:53:08 +00:00
depristo
fb6d7d19f9
Better window size error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4290 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 20:40:56 +00:00
rpoplin
b5d2e299d2
Make it more clear what is going on with the by-hapmap validation status in the dbSNP rod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4289 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 17:29:31 +00:00
rpoplin
0a06fbdb94
Adding header lines to output of VR walkers to settle validator warnings. Command lines are added to the VCF header. GATK version numbers will be added to the header lines by Matt.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4288 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:45:03 +00:00
depristo
41fa323e63
Added iterator for tribble, fixing GS bug report. Removed unnecessary tabix double wrapping. Intergation tests to ensure the BTI works with both vcfs and vcf.gz
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4287 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:38:04 +00:00
asivache
d7b5baf8e5
Now uses tagging of -I arguments. Multiple -I options (merging) is now allowed. In somatic mode 'tumor' and 'normal' tags are required for each input bam, the order does not matter anymore (since we use tags!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4286 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 13:58:51 +00:00
bthomas
e5f81d25d4
Adding the --sample-metadata (-SM) command line argument and associated functionality. This is something Matt and I have been working on for a while. Basically, it allows you to integrate sample metadata into an analysis, by including a sample file. More detailed documentation is on the wiki: http://www.broadinstitute.org/gsa/wiki/index.php/Adding_Sample_data_to_an_analysis
...
This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource.
This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit.
In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies.
Note that this also adds a new dependency on SnakeYaml, a YAML parser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 11:50:22 +00:00
ebanks
dd23f204ab
Making the UG args that allow users to proceed with insufficient bam headers (no SM or PL tags) @Hidden; removed them from wiki.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4283 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:54:50 +00:00
ebanks
514b28210e
Have VF write to sdout when no -o is supplied
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4282 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:48:33 +00:00
ebanks
1901e3208e
Oops, ran integration tests before Guillermo committed his change to the Beagle code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4281 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:41:02 +00:00
ebanks
4e83ba411f
We now do lazy loading for the genotype data in VCF. Practically, almost all walkers end of loading the genotype data because we need to be smarter about transfering the unparsed genotype string when modifying VariantContexts; however, this does solve the problem for VR's piece to generate clusters (shaved off 75% of runtime for Ryan's large case). That further optimization will happen later.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4279 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 00:18:17 +00:00
depristo
74d4f124b1
Bug fixes to allow us to generate GATKRunReports for very early errors that leave the engine in a corrupt state. Vastly better error handling of common command line problems. Analysis output now notes whether an exception is a a UserException or a StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4278 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 22:45:15 +00:00
delangel
2be5e862f1
forgot to commit change to MD5
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4277 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 19:28:03 +00:00
delangel
6d07181dc9
When processing Beagle output and creating new vcf, output the filtered records in the original input vcf as is, so that we don't lose the information on them when we run Beagle.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4276 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 19:18:45 +00:00
hanna
7fa6b2135b
Added a back door so that integration tests can reset the sequence dictionary
...
in the reference. Reset routine is not accessible to any class outside
GenomeLocParser's package.
We'll have to do something more intelligent with this when the GATK goes
distributed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4275 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 18:58:08 +00:00
depristo
dbb641280e
CycleCovariate now tolerates SOLEXA as machine type. Also, exception handling is now written to stderr.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4274 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 12:35:57 +00:00
ebanks
71d2d69b41
Better error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4273 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 05:04:26 +00:00
fromer
248cc308b2
ReadBackedPhasing silently ignores sites with ploidy != 2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4272 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 21:14:17 +00:00
fromer
528f6344af
Moved ReadBackedPhasingWalker to phasing sub-directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4271 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 19:36:41 +00:00
depristo
fa3be2209f
Improvements to the error display code to print out the SVN number in all messages. Fixes to CallableLoci and tests to check for that case
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4270 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 18:36:45 +00:00
depristo
4d0ff336c2
Missed update input
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4269 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:46:13 +00:00
depristo
7880863eb7
Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:07:38 +00:00
depristo
7ad8fbdd5a
Moved GATKException to exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:47:19 +00:00
depristo
1876c9856a
Moved stingexception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4265 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:39:22 +00:00
depristo
bccebf8899
Newly placed StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4264 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:38:46 +00:00