hanna
8f75d88519
Fix for GATK run report ids:
...
mOVsxGfDiiSMxVs2PPTVjzYTVbizlD6e
f9kUHUADFsZ0LiTGxRL5zPmq9kZcA4cQ
8eGHWJFAlBVmgxwPi3sMd1RmiN2PwHOf
iLhvHWveypKb2F8vKS5irHylc3pYvlOb
HDttXKUMEVoPrvVeWrH7E0htxYyNydMx
plus a bit of cleanup of custom exceptions in the sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4330 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:49:25 +00:00
kshakir
20b38b38f3
Updated from SnakeYAML 1.6 to 1.7.
...
Added a pipeline java bean and YAML utility to serialize java beans.
Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format.
Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference.
More changes to come as this code gets tested out in the fullCallingPipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:47:49 +00:00
hanna
fb5d595ef0
Disable VCF header output in the Beagle integrationtest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4327 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 16:50:03 +00:00
hanna
0c99c97685
The engine now automatically adds the command-line arguments to the header of every VCF, unless -NO_HEADER is specified.
...
Changed integration tests, adding the -NO_HEADER argument, for walkers that previously did not include the command-line
arg headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4326 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 15:27:58 +00:00
aaron
1af9ca6d45
enabling tests that now pass with the contig length validation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4325 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 22:20:50 +00:00
aaron
3938d53738
one broken build short of the hat trick. Fixing the unix test which expects the sequence dictionary of the Tribble track to equal the reference; we actually return the sequence dictionary of the track itself, with each contig set to the length of the sequence dictionary contig entry.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4322 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:47:20 +00:00
aaron
b968af5db5
The tribble indexes are now updated with correct sequence lengths for each contig they have in their sequence dictionary. Also clean-up in the RMD track builder.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4321 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:21:22 +00:00
aaron
2586f0a1ca
fix for the build I broke - the original file got corrupted, which I replaced with a version that didn't have the header stripped off. Other integration tests passed, but this test relied on the header being stripped off.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4320 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 15:35:25 +00:00
rpoplin
547763b230
Better error message for Petr's null pointer exception. Also added an exception integration test because I'm certain this used to work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4319 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 13:44:40 +00:00
ebanks
f5a30d0248
I just spoke to Andrey & Kiran (the original authors of these tools), and they voted to kill these in favor of Picard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4313 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 13:27:35 +00:00
rpoplin
7e58d8ed61
CombineVariants now outputs the command line in the VCF header. Added a new hidden argument to VR walkers called --NoByHapMapValidationStatus to turn off the by-hapmap dbsnp rod behavior. Very useful for experimenting with which sets to use as training data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4307 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-18 16:06:50 +00:00
bthomas
c6c6d32b46
Quickly adding a new convenience method for retrieving a group of samples. The method is getSamples(Collection<String>) and returns a set of sample objects. There's also a test there.
...
Ryan is using this to modify VCF code today...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4303 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 15:55:17 +00:00
ebanks
a10b2a00a5
Moving the util VariantContext 'modifying' routines into VC itself (as opposed to VCUtils) so that we can pass the genotype data directly into it and are no longer forced to decode the genotypes for no reason. This means that any walker that takes in a VCF and modifies the records without touching the genotypes never has to decode them. I've hooked this into the other two Variant Recalibrator walkers for Ryan. One side effect, though, is that we no longer can sort the sample names in the VCF (i.e. if the input VCF doesn't have samples in alphabetical order, then we used to sort them when writing a new VCF but no longer do that), because if we don't decode then we can't re-order the genotypes. I don't think this is a big concern given that the Unified Genotyper does emit sorted samples and that's the main source for most of the VCFs we use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4300 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-17 07:09:58 +00:00
bthomas
f66ef4626e
Fixing two minor issues: 1) adding a new error message if the user adds a fasta file in a directory that doesn't exist; 2) renaming my sample unit tests so they actually run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4299 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 20:45:51 +00:00
rpoplin
3a400e3dc0
Added CountCovariates integration test to ensure that it throws an exception if a variant mask isn't provided.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4298 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 19:18:38 +00:00
aaron
de56568ce4
Adding the appropriate DbSNP file to the performance tests so they don't exception out.
...
The exception: "org.broadinstitute.sting.utils.exceptions.UserException$CommandLineException: Invalid command line: This calculation is critically dependent on being able to skip over known variant sites. Please provide a dbSNP ROD or a VCF file containing known sites of genetic variation."
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4293 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 16:30:54 +00:00
aaron
782e0018e4
removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come.
...
*** Three integration tests had to change: ***
RecalibarationWalkersIntegrationTest:
One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates)
SequenomValidationConverterIntegrationTest:
relies on Plink ROD which we've removed.
PileupWalkerIntegrationTest:
we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 22:54:49 +00:00
rpoplin
0a06fbdb94
Adding header lines to output of VR walkers to settle validator warnings. Command lines are added to the VCF header. GATK version numbers will be added to the header lines by Matt.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4288 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:45:03 +00:00
depristo
41fa323e63
Added iterator for tribble, fixing GS bug report. Removed unnecessary tabix double wrapping. Integration tests to ensure the BTI works with both vcfs and vcf.gz
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4287 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:38:04 +00:00
bthomas
e5f81d25d4
Adding the --sample-metadata (-SM) command line argument and associated functionality. This is something Matt and I have been working on for a while. Basically, it allows you to integrate sample metadata into an analysis, by including a sample file. More detailed documentation is on the wiki: http://www.broadinstitute.org/gsa/wiki/index.php/Adding_Sample_data_to_an_analysis
...
This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource.
This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit.
In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies.
Note that this also adds a new dependency on SnakeYaml, a YAML parser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 11:50:22 +00:00
ebanks
1901e3208e
Oops, ran integration tests before Guillermo committed his change to the Beagle code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4281 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:41:02 +00:00
ebanks
4e83ba411f
We now do lazy loading for the genotype data in VCF. Practically, almost all walkers end up loading the genotype data because we need to be smarter about transferring the unparsed genotype string when modifying VariantContexts; however, this does solve the problem for VR's piece to generate clusters (shaved off 75% of runtime for Ryan's large case). That further optimization will happen later.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4279 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 00:18:17 +00:00
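A minimal sketch of the lazy-loading idea described in the commit above, assuming the genotype columns arrive as one unparsed, tab-separated string. The class name, field layout, and parse counter are hypothetical and are not taken from the GATK's VariantContext code; they only illustrate deferring the decode until a caller asks for genotypes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical record that keeps the raw genotype text and decodes it on demand.
class LazyGenotypesSketch {
    private final String rawGenotypes;   // unparsed genotype columns, as read from the file
    private List<String> parsed;         // stays null until the first getGenotypes() call
    private int parseCount = 0;          // illustration only: counts how often we decode

    LazyGenotypesSketch(String rawGenotypes) {
        this.rawGenotypes = rawGenotypes;
    }

    // Callers that only copy or rewrite the record can pass the raw text through untouched.
    String getRawGenotypes() {
        return rawGenotypes;
    }

    // Decoding happens once, the first time the genotypes are actually needed.
    List<String> getGenotypes() {
        if (parsed == null) {
            parsed = Arrays.asList(rawGenotypes.split("\t"));
            parseCount++;
        }
        return parsed;
    }

    int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        LazyGenotypesSketch record = new LazyGenotypesSketch("0/1\t1/1\t0/0");
        System.out.println("decoded before access: " + record.getParseCount());
        System.out.println("genotypes: " + record.getGenotypes());
        System.out.println("decoded after access: " + record.getParseCount());
    }
}
```

A walker that never calls getGenotypes() in this sketch never pays the decoding cost, which is the saving the commit message reports for the cluster-generation step.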
delangel
2be5e862f1
forgot to commit change to MD5
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4277 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 19:28:03 +00:00
hanna
7fa6b2135b
Added a back door so that integration tests can reset the sequence dictionary
...
in the reference. Reset routine is not accessible to any class outside
GenomeLocParser's package.
We'll have to do something more intelligent with this when the GATK goes
distributed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4275 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 18:58:08 +00:00
depristo
fa3be2209f
Improvements to the error display code to print out the SVN number in all messages. Fixes to CallableLoci and tests to check for that case
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4270 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 18:36:45 +00:00
depristo
4d0ff336c2
Missed update input
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4269 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:46:13 +00:00
depristo
7880863eb7
Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:07:38 +00:00
depristo
7ad8fbdd5a
Moved GATKException to exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:47:19 +00:00
depristo
1876c9856a
Moved stingexception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4265 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:39:22 +00:00
depristo
595907e98e
Moving StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:34:15 +00:00
depristo
40e6179911
Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:02:43 +00:00
ebanks
a0231f073f
Damnit. Enabling the Picard code to recalculate all of the relevant SAMRecord attribute tags means that I need to have reference bases over all read bases even after realignment (and there are some big indels in dbsnp). Fortunately, I have my trusty IndexedFastaSequenceFile reader handy! Re-enabling the previously broken performance test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4255 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:06:37 +00:00
rpoplin
7b113a4886
Truncate the floating point numbers coming out of the variant recalibration walkers. Integration tests now work with both 1.6.0_16-b01 and 1.6.0_21-b06
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4253 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 18:37:49 +00:00
aaron
cf33614ddc
remove the test that's failing the performance tests, please don't release until this is figured out
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4251 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 06:30:40 +00:00
aaron
4adb07683d
all fixed..thanks Matt!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4250 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 06:18:59 +00:00
aaron
bd4bc84abd
comment out the broken aligner test again - I'll take a crack at fixing it tomorrow. Each software engineer is going to take a pass at fixing it, and we'll see who can do it with the most style.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4249 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 05:22:24 +00:00
rpoplin
61e848c4f0
It's clear from Sendu's calling and my own calling that -qScale 100.0 is a much better default value for low pass data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4248 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 01:47:21 +00:00
hanna
e183b6598c
- Fix our private repository of bwa reference support files.
...
- Update the test to point to our repository.
- Update the md5 to reflect new Picard tag ordering.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4247 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 00:29:26 +00:00
kshakir
4183e8805a
Fixed reference (via busted symlink) /broad/1KG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4245 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 20:34:51 +00:00
rpoplin
aeb897db7f
VR walkers look at by-hapmap validation status by default. Eric will be updating the syntax to allow for more flexibility here.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4242 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:40:56 +00:00
kshakir
d7f55574e2
Re-enabling aligner integration test now that we're back to having more than 1 or 2GB memory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4241 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:09:48 +00:00
rpoplin
d625186796
I think the VR integration tests are fine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4240 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:00:41 +00:00
depristo
6a30617a60
Initial implementation of UserError exceptions and error message overhaul. UserErrors and their subclasses (UserError.MalFormedBam, for example) should be used when the GATK detects errors on the part of the user. The output for errors is now much clearer and hopefully will reduce GS posts. Please start using UserError and its subclasses in your code. I've replaced some, but not all, of the StingExceptions in the GATK with UserError where appropriate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4239 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 11:32:20 +00:00
ebanks
65edbced36
Addition for Tim: recalculate the NM and UQ tags after realignment. Also, don't fix the insert size calculation, since that's done by fix mate information.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4227 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 04:02:14 +00:00
rpoplin
e3962c0d13
VR integration tests are longer but much more useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4210 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 15:50:19 +00:00
ebanks
b59d62927e
Fix busted performance test (-outputBam has been deprecated in the BQ recalibrator in favor of -o)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4201 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:51:53 +00:00
hanna
70bb480939
The battle is over. Picard is revved.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 05:28:01 +00:00
rpoplin
0bb05fb472
Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:12:09 +00:00
hanna
dc5f858d29
Replaced placeholder support for splitting by read group with real support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:24:50 +00:00
rpoplin
b28f63a948
Base recalibrator now uses -o and deprecates -outputBam
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:13:50 +00:00
kshakir
33400074fa
Updated tribble BED parsing code to use the official UCSC spec, and updated tests to match expected results.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4188 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 21:49:06 +00:00
rpoplin
469bbaa240
Added more integration tests for the variant quality score recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:31:24 +00:00
depristo
8c4009ee18
Oops, don't enable reporting in integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4179 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 22:56:18 +00:00
ebanks
3d6c4fc55f
Removing the obsolete --hapmap and --hapmap_chip options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:57:05 +00:00
depristo
b33873206a
GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:18:57 +00:00
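A minimal sketch of generating a random 32-character run ID like the one described in the commit above. The alphabet (upper- and lower-case letters plus digits), the use of SecureRandom, and the class and method names are assumptions made for illustration; the log does not say how GATKRunReport actually builds its ID.

```java
import java.security.SecureRandom;

// Hypothetical helper; not the GATKRunReport implementation.
class RunIdSketch {
    // Assumed alphabet: mixed-case letters and digits.
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds an ID by drawing each character independently from the alphabet.
    static String randomId(int length) {
        StringBuilder id = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            id.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return id.toString();
    }

    public static void main(String[] args) {
        // Prints a different 32-character string on every run.
        System.out.println(randomId(32));
    }
}
```

With 62 possible characters in each of 32 positions, collisions between runs are vanishingly unlikely, which is what lets the ID serve as a lookup key in a run repository.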
depristo
3fd2392090
Improved interface to getting command line options. Now fully traverses all objects to get all internal argument collections. Preliminary (but disabled) version of phoning home (see -et argument for more information). Captures correct and erroring out runs and writes out a gzipped XML report with lots of useful information. Needs a bit more information but is approximately working. Reports going to /humgen/gsa-hpprojects/GATK/reports/ in submitted directory that will be collated by some external tool. Only operating if -et STANDARD or -et STDOUT are provided currently and REPORT_DIR contains a file called ENABLE. WalkerTest now adds -et NO_ET to tests to avoid populating the reports with tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4155 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:53:32 +00:00
rpoplin
9c3f403307
Add the calculated lod value to the info field of each recalibrated VCF record.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4153 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 21:33:58 +00:00
ebanks
bfcac33e80
Cleaning up playground utils and tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:25:47 +00:00
hanna
d773b3264b
Eliminated -mrl option.
...
Eliminated -fmq0 option.
Eliminated read group hallucination.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 21:38:03 +00:00
ebanks
dfae48cee0
Moving supported tools to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 13:56:19 +00:00
ebanks
45d895dcf4
Remove the check in the Unified Genotyper for hitting the max reads at locus value. Instead, simply add a flag to the INFO field if any of the samples has been downsampled. 95% hooked up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4126 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:50:47 +00:00
ebanks
79cd716671
More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 04:43:52 +00:00
ebanks
dd7f136298
Office-mate courtesy: fixing Andrey's busted integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4123 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 02:00:06 +00:00
ebanks
4678613893
Significant fixes for the Genomic Annotator.
...
1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when
I saw this code). This will improve memory consumption when running with -nt.
2. Don't annotate indels or > bi-allelic sites.
3. Fix bug where not all records were making it into the output VCF.
4. General code clean up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:16:50 +00:00
rpoplin
5623e01602
GenerateVariantClusters and VariantRecalibrator now use hapmap and 1kg ROD bindings (in addition to dbsnp) to distinguish between knowns and novels. They no longer look at by-hapmap validation status, so providing hapmap is highly recommended. Example on the wiki. Input variants tracks now must start with input.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4113 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:33:40 +00:00
hanna
bf0b6bd486
Update integration tests to use the new ROD syntax.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:13:30 +00:00
hanna
3dc78855fd
Command-line argument tagging is in, and the ROD system is hacked slightly to support the new syntax
...
(-B:name,type file) as well as the old syntax. Also, a bonus feature: BAMs can now be tagged at the
command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 03:47:57 +00:00
rpoplin
85007ffa87
Some clean up for the variant recalibrator. Now uses @Input and @Output so that it can join the Queue party. Users now specify a -o, -clusterFile, -tranchesFile, and -reportDatFile. Example on the wiki. ApplyVariantCuts now has an integration test. Base quality recalibrator now requires a dbsnp rod or vcf file. Now that the base quality recalibrator is using @Output the PrintStream shouldn't be closed in OnTraversalDone.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4101 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:14:58 +00:00
ebanks
90aef66ec5
Minor fixes for my last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:25:29 +00:00
ebanks
ccda4f6ec1
More output consistency changes (updating wiki docs as I go along).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks
c9c6ff49c2
Deprecated 'O' in favor of 'o' in the cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:09:24 +00:00
aaron
2d3b6d89dc
adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the
...
most appropriate based on feature density, and the longest feature in the file. Also:
- Converted Tribble to TestNG; it has better features and is about 6x faster.
- As much code clean-up as I could get done. More to do, especially in the example code.
- Moved asserts in the code to throw exceptions.
- Added getBinSize to the index interface; both indexes already implemented this.
- Removed the abstract parts of the indexCreator interface; this is now simpler.
- Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 06:54:59 +00:00
hanna
8252494fa9
Forgot to update UG performance test to reflect the new -o argument.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4079 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 00:57:16 +00:00
hanna
c177801d81
Add deprecated command-line arguments, and switched over UG to output to
...
-o/--out instead of -varout. Let's watch as our intrepid support engineer
gracefully responds to all the incoming questions of the form: "the GATK told
me to use -o instead of -varout. What do I do?"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 21:01:44 +00:00
hanna
b80cf7d1d9
Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 14:27:05 +00:00
kiran
121b4f23b6
Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 00:01:48 +00:00
aaron
fa36731faf
fixes for VariantEval integration tests affected by the spaces to underscores change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4070 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 22:43:20 +00:00
ebanks
1ec305cd15
Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads.
...
Thanks to Matt for the advice.
Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 11:31:13 +00:00
rpoplin
8f15b2ba72
Memory optimization for the VariantRecalibrator. Only add variants to the list if they pass the novelty and qual filters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4051 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 21:57:28 +00:00
aaron
e632d9b83d
remove some dependencies on out of date methods from the tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4047 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 00:07:26 +00:00
aaron
c1df293feb
remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 23:52:01 +00:00
rpoplin
578e7fa36d
Don't output -0 as qual value in VariantRecalibrator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4044 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 16:47:58 +00:00
aaron
cc58a27b00
fix for broken unit test; make sure when we can't get an index off of disk, the internal method returns null
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4040 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:12:32 +00:00
kshakir
4710015c17
Disabled AlignerIntegrationTest while addressing build machine memory issues.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4033 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 01:23:21 +00:00
hanna
cb144734c0
Getting rid of GenotypeWriter interface. Of note:
...
- GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble.
- VCFWriter is now an interface, for easier redirection.
- VCFWriterImpl fleshes out the VCFWriter interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 16:33:22 +00:00
ebanks
71c4d3f33d
Moving pointer to b36 reference from /broad/1KG to /humgen/1kg
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4021 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 00:54:34 +00:00
kshakir
f39dce1082
Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
...
Added ability to skip up-to-date jobs where the outputs are newer than the inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00
hanna
41d57b7139
Massive cleanup of read filtering.
...
- Eliminate redundancy of filter application.
- Track filter metrics per-shard to facilitate per merging.
- Flatten counting iterator hierarchy for easier debugging.
- Rename Reads class to ReadProperties and track it outside of the Sting iterators.
Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics
classes are managed by the SAMDataSource when they should be managed by something more general. For now, we're hacking
the reads data source to manage the metrics; in the future, something more general should manage the metrics classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:17:11 +00:00
aaron
0a8ebcb4f9
moving tests over from the GATK to Tribble, and added a speed-up to the readNextRecord() that Mark suggested. Also removed the contained flag from the queries to Tribble in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4003 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 17:54:59 +00:00
ebanks
3ff6e3404e
Alleles are now returned in a consistent order, so we can deal with tri-allelic sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 15:21:10 +00:00
aaron
d514c424fd
adding tests for BTI in the ROD validation tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3997 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 06:05:40 +00:00
ebanks
ca5b274f16
Unit, integration, and performance tests are all busted, so this is a good time to make a big commit...
...
Major cleanup of the genotype writer code from the calling end. UG no longer supports making calls in anything but VCF, and that allows us to use the VCFWriter more generically now. Putting the ball in Matt's court to finish collapsing everything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3996 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 04:18:29 +00:00
aaron
0f78f70ed4
fix for feature source in Tribble; we need to check that the record coming back isn't null. Also in the GATK, added code to set the default logging level in integration tests to WARN; with the default level change they were spewing a bunch of text.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3995 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:57:23 +00:00
ebanks
419a36f74c
Starting the cleanup of the sting.utils.genotype code, all of which is either moving to Tribble, moving to sting.utils.vcf, or being removed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:16:05 +00:00
aaron
0f29f2ae3f
fixes for the Tree index, and some small clean-up in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:50 +00:00
rpoplin
3eee3183fd
Checking in the tiger team changes. LOD calculation modified. -qScale is back in case people need it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3990 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:03 +00:00
aaron
30178c05c5
providing a way to specify how you'd like -BTI combined with your -L options; set BTIMR to either UNION (default) or INTERSECTION.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3983 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 14:00:52 +00:00
kiran
e242a8f143
Put single quotes around the regex. This isn't strictly necessary through the integration test machinery, but *is* necessary at the console, and it's convenient to be able to cut and paste this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3977 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:56:57 +00:00
kiran
13f29660bb
Integration test for SelectVariants. Tests a complex case with an explicit sample selection, sample selection by regex, exclusion of non-variant and filtered loci, and JEXL selection on low allele-frequency variants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3976 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:49:47 +00:00
ebanks
bd6d5a8d51
Adding command-line header to VA and VF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:21:15 +00:00