bthomas
f66ef4626e
Fixing two minor issues: 1) adding a new error message if the user adds a fasta file in a directory that doesn't exist; 2) renaming my sample unit tests so they actually run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4299 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 20:45:51 +00:00
rpoplin
3a400e3dc0
Added CountCovariates integration test to ensure that it throws an exception if a variant mask isn't provided.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4298 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 19:18:38 +00:00
rpoplin
2eb5d9b2d2
CountCovariates makes sure that it sees a rod type that it expects for use as a variant mask (accepted types are dbsnp, vcf, and bed)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4296 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 18:53:42 +00:00
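A minimal sketch of the rod-type check described in 2eb5d9b2d2, written in Python for brevity. Only the three accepted type names (dbsnp, vcf, bed) come from the commit message; the function name, the error class, and the case-insensitive matching are assumptions for illustration, not the GATK API.

```python
# Hypothetical sketch: names are illustrative, not the GATK API.
ACCEPTED_MASK_TYPES = {"dbsnp", "vcf", "bed"}  # types listed in the commit message


class UserInputError(Exception):
    """Stand-in for a user-facing error."""


def check_variant_mask(rod_types):
    """Return the rod types usable as a variant mask, or raise if there are none."""
    # Case-insensitive matching is an assumption made for this sketch.
    usable = [t for t in rod_types if t.lower() in ACCEPTED_MASK_TYPES]
    if not usable:
        raise UserInputError(
            "No variant mask found; expected a rod of type "
            + ", ".join(sorted(ACCEPTED_MASK_TYPES))
        )
    return usable
```

For example, check_variant_mask(["VCF", "refseq"]) keeps only the VCF binding, while check_variant_mask(["refseq"]) raises.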
aaron
de56568ce4
Adding the appropriate DbSNP file to the performance tests so they don't exception out.
...
The exception: "org.broadinstitute.sting.utils.exceptions.UserException$CommandLineException: Invalid command line: This calculation is critically dependent on being able to skip over known variant sites. Please provide a dbSNP ROD or a VCF file containing known sites of genetic variation."
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4293 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 16:30:54 +00:00
aaron
782e0018e4
removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come.
...
*** Three integration tests had to change: ***
RecalibrationWalkersIntegrationTest:
One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates)
SequenomValidationConverterIntegrationTest:
relies on Plink ROD which we've removed.
PileupWalkerIntegrationTest:
we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 22:54:49 +00:00
delangel
c604ed9440
Several improvements to new indel genotyper (more to come soon):
...
a) Turns out previous change of centering haplotype around indel was a bad idea. Context to the left of indel is important but not as important as right one, because by definition all alleles start at the same location, so haplotype is the same to the left of indel regardless of allele. So, go back to having a constant size window to the left of event.
b) Expand reference context so we can test larger haplotypes.
c) Optimize computation of read likelihoods by doing them in linear array instead of in a matrix - no difference in biallelic sites but could be significantly faster in multiallelic sites.
d) Bug fix: read alignment wasn't being computed correctly if, a) we were at an insertion, b) read started right at the insertion, c) read CIGAR didn't include insertion - more of these corner conditions are lurking, so a revamped computation of how reads align to candidate haplotypes is in the works.
e) Add debug option not to use prior haplotype likelihoods.
f) Don't hard-code NA12878 for genotyping, now sample name is a required input argument.
g) Bug fix: if there are no reads covering a candidate indel event, just output NO_CALL (didn't notice this in HiSeq, but in P1 data it happens all the time). I need to add a confidence threshold for calling later on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4291 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 21:53:08 +00:00
depristo
fb6d7d19f9
Better window size error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4290 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 20:40:56 +00:00
rpoplin
b5d2e299d2
Make it more clear what is going on with the by-hapmap validation status in the dbSNP rod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4289 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 17:29:31 +00:00
rpoplin
0a06fbdb94
Adding header lines to output of VR walkers to settle validator warnings. Command lines are added to the VCF header. GATK version numbers will be added to the header lines by Matt.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4288 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:45:03 +00:00
depristo
41fa323e63
Added iterator for tribble, fixing GS bug report. Removed unnecessary tabix double wrapping. Integration tests to ensure the BTI works with both vcf and vcf.gz files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4287 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:38:04 +00:00
asivache
d7b5baf8e5
Now uses tagging of -I arguments. Multiple -I options (merging) are now allowed. In somatic mode, 'tumor' and 'normal' tags are required for each input bam; the order no longer matters (since we use tags!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4286 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 13:58:51 +00:00
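A rough sketch of how tagged -I arguments (d7b5baf8e5) could be sorted into tumor and normal inputs regardless of order. It assumes tags are written as -I:tag file; that syntax, the function names, and the file names in the usage note are assumptions for illustration, and the real argument parser is not shown in this log.

```python
# Hypothetical sketch: assumes tags are attached as "-I:tag path".
def parse_tagged_inputs(argv):
    """Collect (tag, path) pairs from -I arguments; tag is None when absent."""
    inputs = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-I" or arg.startswith("-I:"):
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} is missing a file argument")
            tag = arg[3:] if arg.startswith("-I:") else None
            inputs.append((tag, argv[i + 1]))
            i += 2  # skip the file that belongs to this -I
        else:
            i += 1
    return inputs


def split_somatic(inputs):
    """In somatic mode every input must carry a 'tumor' or 'normal' tag."""
    tumor, normal = [], []
    for tag, path in inputs:
        if tag == "tumor":
            tumor.append(path)
        elif tag == "normal":
            normal.append(path)
        else:
            raise ValueError(f"{path}: expected a 'tumor' or 'normal' tag, got {tag!r}")
    return tumor, normal
```

With this sketch, "-I:normal n.bam -I:tumor t1.bam -I:tumor t2.bam" yields two tumor files and one normal file in whatever order they were given.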
bthomas
e5f81d25d4
Adding the --sample-metadata (-SM) command line argument and associated functionality. This is something Matt and I have been working on for a while. Basically, it allows you to integrate sample metadata into an analysis, by including a sample file. More detailed documentation is on the wiki: http://www.broadinstitute.org/gsa/wiki/index.php/Adding_Sample_data_to_an_analysis
...
This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource.
This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit.
In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies.
Note that this also adds a new dependency on SnakeYaml, a YAML parser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 11:50:22 +00:00
ebanks
dd23f204ab
Making the UG args that allow users to proceed with insufficient bam headers (no SM or PL tags) @Hidden; removed them from wiki.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4283 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:54:50 +00:00
ebanks
514b28210e
Have VF write to stdout when no -o is supplied
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4282 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:48:33 +00:00
ebanks
1901e3208e
Oops, ran integration tests before Guillermo committed his change to the Beagle code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4281 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:41:02 +00:00
ebanks
4e83ba411f
We now do lazy loading for the genotype data in VCF. In practice, almost all walkers still end up loading the genotype data, because we need to be smarter about transferring the unparsed genotype string when modifying VariantContexts; however, this does solve the problem for VR's piece to generate clusters (shaved off 75% of runtime for Ryan's large case). That further optimization will happen later.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4279 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 00:18:17 +00:00
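The lazy-loading idea in 4e83ba411f can be illustrated with a small sketch: keep the unparsed genotype text and parse it only on first access. The class and attribute names here are hypothetical and the parsing is deliberately simplistic (tab-separated sample columns); it is not the VCF codec.

```python
# Hypothetical sketch of lazy genotype parsing; not the actual VCF codec.
class LazyGenotypes:
    def __init__(self, raw, sample_names):
        self._raw = raw                  # unparsed genotype columns
        self._sample_names = sample_names
        self._parsed = None              # filled in on first access
        self.parse_count = 0             # for demonstration only

    @property
    def raw(self):
        """The unparsed text, which can be passed along without parsing."""
        return self._raw

    def get(self):
        """Parse on first use, then reuse the cached result."""
        if self._parsed is None:
            self.parse_count += 1
            fields = self._raw.split("\t")
            self._parsed = dict(zip(self._sample_names, fields))
        return self._parsed
```

A consumer that only looks at site-level fields never calls get(), so it never pays the parsing cost; passing raw along unchanged is the kind of transfer the commit says is still missing.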
depristo
74d4f124b1
Bug fixes to allow us to generate GATKRunReports for very early errors that leave the engine in a corrupt state. Vastly better error handling of common command line problems. Analysis output now notes whether an exception is a UserException or a StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4278 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 22:45:15 +00:00
delangel
2be5e862f1
forgot to commit change to MD5
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4277 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 19:28:03 +00:00
delangel
6d07181dc9
When processing Beagle output and creating new vcf, output the filtered records in the original input vcf as is, so that we don't lose the information on them when we run Beagle.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4276 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 19:18:45 +00:00
hanna
7fa6b2135b
Added a back door so that integration tests can reset the sequence dictionary
...
in the reference. Reset routine is not accessible to any class outside
GenomeLocParser's package.
We'll have to do something more intelligent with this when the GATK goes
distributed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4275 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 18:58:08 +00:00
depristo
dbb641280e
CycleCovariate now tolerates SOLEXA as machine type. Also, exception handling is now written to stderr.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4274 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 12:35:57 +00:00
ebanks
71d2d69b41
Better error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4273 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 05:04:26 +00:00
fromer
248cc308b2
ReadBackedPhasing silently ignores sites with ploidy != 2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4272 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 21:14:17 +00:00
fromer
528f6344af
Moved ReadBackedPhasingWalker to phasing sub-directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4271 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 19:36:41 +00:00
depristo
fa3be2209f
Improvements to the error display code to print out the SVN number in all messages. Fixes to CallableLoci and tests to check for that case
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4270 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 18:36:45 +00:00
depristo
4d0ff336c2
Missed update input
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4269 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:46:13 +00:00
depristo
7880863eb7
Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:07:38 +00:00
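The split between user-facing and internal errors that this refactoring series settles on (7880863eb7 and the commits below it) might look roughly like the following when reporting a failure. The class names mirror those in the commit messages, but the inheritance shown, the MalformedBam stand-in, and the message wording are assumptions for illustration only.

```python
# Hypothetical sketch; the inheritance relationships here are assumptions.
class StingException(Exception):
    """Internal error: something went wrong inside the toolkit."""


class ReviewedStingException(StingException):
    """Internal error that a developer has reviewed and deliberately kept."""


class UserException(Exception):
    """Problem with the user's inputs or command line."""


class MalformedBam(UserException):
    """Illustrative stand-in for a UserException subclass."""


def describe_failure(exc):
    """Label a failure so the output says whose problem it is."""
    if isinstance(exc, UserException):
        return f"ERROR (user input): {exc}"
    if isinstance(exc, StingException):
        return f"ERROR (internal, please report): {exc}"
    return f"ERROR (unexpected {type(exc).__name__}): {exc}"
```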
depristo
7ad8fbdd5a
Moved GATKException to exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:47:19 +00:00
depristo
1876c9856a
Moved stingexception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4265 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:39:22 +00:00
depristo
bccebf8899
Newly placed StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4264 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:38:46 +00:00
depristo
3964e02fb6
Newly placed StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4263 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:38:32 +00:00
depristo
595907e98e
Moving StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:34:15 +00:00
depristo
40e6179911
Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but also inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:02:43 +00:00
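The three sequence dictionary problems named in 40e6179911 (no overlap, inconsistent headers, order inconsistency) can be sketched as a single comparison. The function name, the result labels, and the representation of a dictionary as a list of (contig, length) pairs are assumptions made for this illustration; the engine's real checks may differ.

```python
# Hypothetical sketch; result labels and data layout are illustrative.
def compare_dictionaries(ref, other):
    """ref and other are lists of (contig_name, length) in file order."""
    ref_len = dict(ref)
    other_len = dict(other)

    # Contigs present in both, in the order the other file lists them.
    common = [name for name, _ in other if name in ref_len]
    if not common:
        return "NO_COMMON_CONTIGS"

    # Same name but different length suggests a different reference build.
    if any(ref_len[name] != other_len[name] for name in common):
        return "UNEQUAL_COMMON_CONTIGS"

    # Shared contigs must appear in the same relative order in both files.
    ref_order = [name for name, _ in ref if name in other_len]
    if ref_order != common:
        return "OUT_OF_ORDER"

    return "COMPATIBLE"
```

A caller could treat anything other than COMPATIBLE as a user error unless the user has explicitly opted in to tolerating it.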
delangel
da2e879bbc
Miscellaneous improvements to indel genotyper:
...
- Add a simple calculation model for Pr(R|H) that doesn't rely on Dindel's HMM model. MUCH faster, at a cost of slightly worse performance since we're more sensitive to bad reads coming from sequencing artifacts (add -simple to command line to activate).
- Add debug option to calculation model so that we can optionally output useful info on current read being evaluated. (add -debugout to commandline).
- Small performance improvement: instead of evaluating haplotype to the right of indel (just with a 5 base addition to the left), it seems better to center the indel and to add context evenly around event.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4257 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 13:50:28 +00:00
ebanks
61d511f601
Small memory performance improvement: remove the mapping from the hash instead of setting the value to null (i.e. remove the key too)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4256 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:19:09 +00:00
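The difference 61d511f601 describes, shown on a plain Python dict rather than the Java hash in question (the function names are hypothetical): setting a value to None leaves the key and its entry in the table, while removing the mapping frees both.

```python
# Hypothetical sketch using a Python dict as a stand-in for the Java hash.
def release_with_none(cache, key):
    """Old behaviour: the value is dropped but the key stays in the table."""
    cache[key] = None


def release_with_remove(cache, key):
    """New behaviour: the whole mapping goes away, key included."""
    cache.pop(key, None)
```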
ebanks
a0231f073f
Damnit. Enabling the Picard code to recalculate all of the relevant SAMRecord attribute tags means that I need to have reference bases over all read bases even after realignment (and there are some big indels in dbsnp). Fortunately, I have my trusty IndexedFastaSequenceFile reader handy! Re-enabling the previously broken performance test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4255 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:06:37 +00:00
hanna
87aca64716
Jumped the gun a bit on bam on-the-fly indexing -- Tim says it's not ready yet.
...
Turned it off by default and added a property to turn it back on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4254 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 21:16:03 +00:00
rpoplin
7b113a4886
Truncate the floating point numbers coming out of the variant recalibration walkers. Integration tests now work with both 1.6.0_16-b01 and 1.6.0_21-b06
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4253 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 18:37:49 +00:00
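One way to get the effect described in 7b113a4886 is to truncate values to a fixed number of decimal places before writing them, so that differences in the low-order digits between runtimes do not change the output. The sketch below uses Python's decimal module; the function name and the choice of four places are assumptions, since the commit does not say how many digits are kept.

```python
from decimal import Decimal, ROUND_DOWN


# Hypothetical sketch; the number of places kept is an assumption.
def truncate(value, places=4):
    """Truncate a float toward zero to a fixed number of decimal places."""
    quantum = Decimal(1).scaleb(-places)  # 1E-4 for places=4
    # repr() gives the shortest decimal text that round-trips the float,
    # which avoids binary artifacts such as 0.29 * 100 == 28.999999999999996.
    return str(Decimal(repr(value)).quantize(quantum, rounding=ROUND_DOWN))
```

Two values that differ only beyond the kept digits, such as 0.123456789012 and 0.123456789013, are written identically.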
depristo
8f1a32acae
All exceptions thrown by the GATK have been reviewed and UserErrors replaced where appropriate. Shazam. Another check-in will remove the GATKException and restore the StingException.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4252 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 15:25:30 +00:00
aaron
cf33614ddc
remove the test that's failing the performance tests, please don't release until this is figured out
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4251 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 06:30:40 +00:00
aaron
4adb07683d
all fixed..thanks Matt!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4250 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 06:18:59 +00:00
aaron
bd4bc84abd
comment out the broken aligner test again - I'll take a crack at fixing it tomorrow. Each software engineer is going to take a pass at fixing it, and we'll see who can do it with the most style.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4249 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 05:22:24 +00:00
rpoplin
61e848c4f0
It's clear from Sendu's calling and my own calling that -qScale 100.0 is a much better default value for low pass data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4248 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 01:47:21 +00:00
hanna
e183b6598c
- Fix our private repository of bwa reference support files.
...
- Update the test to point to our repository.
- Update the md5 to reflect new Picard tag ordering.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4247 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 00:29:26 +00:00
depristo
1de713f354
Massive review of maybe 50% of the exceptions in the GATK. GATKException is a tmp. tracker so that I can tell which StingExceptions I've reviewed. Please don't use it. If you are working on new code and are considering throwing exceptions, it's either UserError or StingException, please
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4246 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 23:21:17 +00:00
kshakir
4183e8805a
Fixed reference (via busted symlink) /broad/1KG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4245 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 20:34:51 +00:00
aaron
f5c295b6b2
add a little bit of documentation to the RMD track builder and wrap any exceptions thrown in tribble with the file source and line that caused the error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4243 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 17:56:36 +00:00
rpoplin
aeb897db7f
VR walkers look at by-hapmap validation status by default. Eric will be updating the syntax to allow for more flexibility here.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4242 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:40:56 +00:00
kshakir
d7f55574e2
Re-enabling aligner integration test now that we're back to having more than 1 or 2GB memory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4241 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:09:48 +00:00
rpoplin
d625186796
I think the VR integration tests are fine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4240 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:00:41 +00:00
depristo
6a30617a60
Initial implementation of UserError exceptions and error message overhaul. UserErrors and their subclasses (UserError.MalFormedBam, for example) should be used when the GATK detects errors on the part of the user. The output for errors is now much clearer and hopefully will reduce GS posts. Please start using UserError and its subclasses in your code. I've replaced some, but not all, of the StingExceptions in the GATK with UserError where appropriate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4239 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 11:32:20 +00:00
depristo
ca9c7389ee
Not useful
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4238 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:33:03 +00:00
depristo
8708753a6a
checkin for removal
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4237 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:32:46 +00:00
hanna
5119bdb55e
- Update DoC to support output to /dev/null.
...
- Add a release sanity check for DoC.
- Update release sanity checks with new command-line argument system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4236 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 23:43:18 +00:00
fromer
1b1ec7e52d
Changed default phasing window size to 10
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4235 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 21:28:36 +00:00
fromer
ce031b2f05
PhasingEvaluator prints out interesting sites (only 1 phased, or phases disagree)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4233 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:21:21 +00:00
ebanks
40283f6456
Success! TranscriptToGenomicInfo now works without the delicate hacks that Ben had put in.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4232 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:06:00 +00:00
ebanks
cd091d7309
This walker can NOT be tree-reducible (in its current state). Given that it's meant to be run just once for any given transcript set, this is not at all a problem.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4231 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 16:47:51 +00:00
ebanks
ae9cba1c73
After an epic battle with this code until 3am last night, I have discovered that it is tragically and fatally busted. Ben clearly didn't understand how the ROD system works when writing it and so it is unusable in its current state. I've ripped out all code and it now gracefully exits telling the user that we are actively working on a replacement for this tool. Sigh.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4230 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 16:39:41 +00:00
ebanks
29f7b1e6d6
Trivial update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4229 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 14:02:38 +00:00
ebanks
cd2bfb09ef
Change for Tim: invalidate the MD tag (temporarily) if it exists in a read that gets realigned
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4228 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 13:59:09 +00:00
ebanks
65edbced36
Addition for Tim: recalculate the NM and UQ tags after realignment. Also, don't fix the insert size calculation, since that's done by fix mate information.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4227 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 04:02:14 +00:00
chartl
71046e650e
Added a more robust check for Jishu -- am pretty sure the .bam header is busticated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4223 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 01:11:22 +00:00
fromer
ae3f7026a4
Corrected phasing quality evaluation to correctly account for hom sites that break phase
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4222 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 22:43:54 +00:00
hanna
501f6a0e14
Temporary hack to disable index creation when target BAM is /dev/null. Tim
...
promises me that Picard will put in a real solution next week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4220 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 16:57:51 +00:00
fromer
754c2c761e
Added minimum phasing quality for phasing evaluation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4219 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 14:29:11 +00:00
ebanks
5d0d9c7dce
My parallel version of TranscriptToInfo now emits 'chr start end' instead of 'chr:start-end' for records so that 1) they can be easily sorted in coordinate order (allowing me to emit records out of order if I choose) and 2) the file can be tabix indexed (when we stop finding 'critical' bugs in that code).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4218 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 05:20:40 +00:00
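Why the 'chr start end' form in 5d0d9c7dce sorts more easily than 'chr:start-end' can be seen in a short sketch. The helper names are hypothetical, and contigs are ordered lexically here for simplicity rather than in reference order.

```python
# Hypothetical sketch; contigs sort lexically here, not in reference order.
def to_columns(interval):
    """Convert 'chr:start-end' to 'chr start end'."""
    contig, _, span = interval.rpartition(":")
    start, _, end = span.partition("-")
    return f"{contig} {int(start)} {int(end)}"


def coordinate_key(record):
    """Sort key for 'chr start end' records: positions compare numerically."""
    contig, start, end = record.split()
    return (contig, int(start), int(end))
```

Sorting the raw strings 'chr1:1000-2000' and 'chr1:200-300' as text puts position 1000 before 200; once the fields are separate columns, a numeric sort on the second column gives coordinate order.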
ebanks
4d4ef5b42c
In the end, it's not worth rewriting TranscriptToInfo from scratch. I'm keeping the old one around for a bit so I can play with this new version which 1. doesn't store the records in memory so can be run in under 1Gb of memory, 2. actually emits all of the records (the original fails in some cases), and 3. is refactored to cut out ~20% of the code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4215 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 02:37:34 +00:00
kiran
0dd5a0990d
Now annotates sites marked as filtered out (this is important if sites are in a lower-quality tranche).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4214 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-04 00:36:55 +00:00
delangel
ef7454a241
Minor improvements to indel genotyper:
...
a) Ability to specify haplotype size from command line
b) Expand reference context window so we can form haplotypes for longer indel events.
c) small bug fix in temp output writer (to be removed once I can emit vcfs)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4212 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 22:52:08 +00:00
depristo
7eeabe534a
QSample walker for 1KG -- measures aggregate quality of sequencing. Includes misc. improvements throughout the code, including using the new Tribble GenotypeLikelihoods class for working with VCF GLs from the UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4211 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 18:21:43 +00:00
rpoplin
e3962c0d13
VR integration tests are longer but much more useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4210 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 15:50:19 +00:00
hanna
da11efa1a2
Automatically write BAM file indices for coordinate-sorted BAMs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4209 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 14:10:44 +00:00
fromer
529eecd4dc
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4208 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:38:46 +00:00
fromer
c0ce9ca8cc
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4207 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:32:30 +00:00
rpoplin
60003aeaca
Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4206 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:31:49 +00:00
fromer
c119f64514
Added phasing sub-directory to keep walkers directory clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4205 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:24:18 +00:00
depristo
3c9597d45a
OnTraversalDone writes output to out now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4203 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:55:03 +00:00
depristo
73d41bfa24
CountLoci now writes out to a file for Queue status tracking. VariantAnnotatorEngine has a special group None that doesn't add any annotations; useful for those who are testing UG performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4202 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:52:33 +00:00
ebanks
b59d62927e
Fix busted performance test (-outputBam has been deprecated in the BQ recalibrator in favor of -o)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4201 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:51:53 +00:00
hanna
70bb480939
The battle is over. Picard is revved.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 05:28:01 +00:00
ebanks
fdaac4aa78
As the VCF guru, I'll take this one for Andrey. Someone has actually found a deletion at the beginning of the chromosome. Instead of failing with an ArrayIndexOutOfBoundsException, just don't try to print out the record. Our VCF writer doesn't really support this case (yet).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4199 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:27:43 +00:00
ebanks
c45ffcdaed
Changing documentation (temporarily) to warn people that -U is not supported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4198 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:18:07 +00:00
delangel
8a7f5aba4b
First more or less sort of functional framework for statistical Indel error caller. Current implementation computes Pr(read|haplotype) based on Dindel's error model. A simple walker that takes an existing vcf, generates haplotypes around calls and computes genotype likelihoods is used to test this as first example. No attempt yet to use prior information on indel AF, nor to use multi-sample caller abilities.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4197 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 00:25:34 +00:00
fromer
a1cf3398a5
Added basic version of phasing evaluation: GenotypePhasingEvaluator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4196 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 22:09:50 +00:00
kshakir
fd5970fdd4
At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them.
...
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
rpoplin
0bb05fb472
Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:12:09 +00:00
chartl
3a4844ebde
Additional partition types into DepthOfCoverage:
...
- Sequencing Center
- Platform
- Sample by Center
- Sample by Platform
- Sample by Platform by Center <---- needed for analysis I'm doing
The fact that the latter three needed their own partition types, rather than being dictatable from the command line, combined with the new hierarchical traversal types, and new output formatting engine, suggest that DepthOfCoverageV3 is about ready to be retired in favor of a newer, sleeker version.
For now, this will do.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4193 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 19:30:03 +00:00
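The partition types listed in 3a4844ebde amount to grouping reads by different combinations of read group fields. The sketch below expresses each partition as a tuple of SAM read group tags (SM for sample, PL for platform, CN for sequencing center) so that new combinations need no new code. The table, the function name, and the use of simple read counts in place of per-locus depth are assumptions for illustration.

```python
from collections import Counter

# Hypothetical sketch: each partition is a tuple of read group header tags.
PARTITION_FIELDS = {
    "center": ("CN",),
    "platform": ("PL",),
    "sample_by_center": ("SM", "CN"),
    "sample_by_platform": ("SM", "PL"),
    "sample_by_platform_by_center": ("SM", "PL", "CN"),
}


def count_by_partition(read_groups, partition):
    """Count reads per partition key; read_groups holds one dict per read."""
    fields = PARTITION_FIELDS[partition]
    counts = Counter()
    for read_group in read_groups:
        # A missing tag is bucketed under "unknown" rather than dropped.
        key = tuple(read_group.get(field, "unknown") for field in fields)
        counts[key] += 1
    return counts
```

Driving the grouping from a table like this is one way partitions could become dictatable from the command line, as the commit message suggests a newer version should allow.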
chartl
590bb50d16
Test for missing read group
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4192 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 14:22:13 +00:00
kiran
acd6bd2430
Experimental tool to annotate indels that are provided in a VCF file based on RefGene. Specifies gene, transcript, strand, type (Non-frameshift, frameshift, 5'-UTR, 3'-UTR, SpliceSiteDisruption, Intron, or Unknown).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4191 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 23:30:28 +00:00
hanna
dc5f858d29
Replaced placeholder support for splitting by read group with real support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:24:50 +00:00
rpoplin
b28f63a948
Base recalibrator now uses -o and deprecates -outputBam
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:13:50 +00:00
kshakir
33400074fa
Updated tribble BED parsing code to use the official UCSC spec, and updated tests to match expected results.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4188 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 21:49:06 +00:00
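For context on 33400074fa: in the UCSC BED format the first three fields are chrom, chromStart and chromEnd, with chromStart 0-based and chromEnd exclusive. The sketch below parses one line and converts it to a 1-based inclusive interval. The function name and the returned tuple are hypothetical, and the commit does not say which parts of the tribble parser changed.

```python
# Hypothetical sketch of BED line parsing; not the tribble codec.
def parse_bed_line(line):
    """Return (chrom, start, end, name) as a 1-based inclusive interval,
    or None for blank, comment, 'track' and 'browser' lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "track", "browser")):
        return None
    fields = stripped.split("\t") if "\t" in stripped else stripped.split()
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 fields: {line!r}")
    chrom = fields[0]
    start0 = int(fields[1])  # 0-based, inclusive
    end = int(fields[2])     # 0-based, exclusive == 1-based, inclusive
    name = fields[3] if len(fields) > 3 else None
    return (chrom, start0 + 1, end, name)
```

Under this convention a line reading "chr1 0 100" covers the first 100 bases of chr1, that is, positions 1 through 100 in 1-based coordinates.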
depristo
995cfe34fe
You can have an error so early that some engine fields are uninitialized. Commit protects RunReport from these errors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4185 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 19:00:25 +00:00
rpoplin
a975db2c2e
Bug fix for the case of reads with no read bases!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4184 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:58:54 +00:00
rpoplin
469bbaa240
Added more integration tests for the variant quality score recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:31:24 +00:00
depristo
8c4009ee18
Oops, don't enable reporting in integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4179 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 22:56:18 +00:00
rpoplin
5b94c926c8
More precise language.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4178 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:44:22 +00:00
rpoplin
96040726ac
Better exception text for the common error of providing only dbsnp but giving dbsnp sites zero clustering weight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4177 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:36:43 +00:00
depristo
32c6b48106
Proper memory metrics in the file. Please use -et if at all possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4175 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:30:09 +00:00