kshakir
618c69f8dc
More updates to the CleanBamFile pipeline.
...
Added a CommandLineFunction.jobDependencies that will explicitly force a function to wait for a file, even if the value isn't otherwise listed on an @Input.
More bug fixes and refactoring of functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4048 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 14:59:42 +00:00
aaron
e632d9b83d
remove some dependencies on out of date methods from the tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4047 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 00:07:26 +00:00
aaron
c1df293feb
remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 23:52:01 +00:00
chartl
3a4977c75e
Re-add the 1KG trigger as a comp as well
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4045 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 18:19:47 +00:00
rpoplin
578e7fa36d
Don't output -0 as qual value in VariantRecalibrator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4044 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 16:47:58 +00:00
kiran
3d63302b70
Deprecated. Use SelectVariants instead.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4043 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 15:07:50 +00:00
depristo
20db00a3e8
Lazy reference loading; the engine doesn't fetch the reference bases until you actually call ref.getBases(). With the new hidden --dontUpdateUG to table recalibrator this is 2-3x faster than before. Enabled for locus, read, and rod walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4042 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:46:22 +00:00
aaron
9ab647b730
adding checks to the RefSeq rod for lines that contain fewer than the required number of columns (we expect there to be 16 columns)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4041 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:34:32 +00:00
aaron
cc58a27b00
fix for broken unit test; make sure when we can't get an index off of disk, the internal method returns null
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4040 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:12:32 +00:00
aaron
b23545fafa
re-enable the check for up-to-date versions in the Tribble index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4039 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 12:47:58 +00:00
ebanks
37586d3a43
Don't exception out when bad aligners emit wonky alignments; instead, just don't clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4038 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 02:36:04 +00:00
depristo
a36951f11a
@Output and @Input arguments for table recalibration for use with Q
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4037 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:36:28 +00:00
depristo
61064d7075
GenotypeConcordance log file -- if provided, GC module will write FN/FP information to this file by context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4036 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:35:57 +00:00
depristo
0d209d5442
Nicer printing out of clustering
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4035 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 16:02:13 +00:00
depristo
c85ab9db37
functional recalibrate script
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4034 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 16:01:37 +00:00
kshakir
4710015c17
Disabled AlignerIntegrationTest while addressing build machine memory issues.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4033 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 01:23:21 +00:00
kshakir
307c8ca027
Created a new playground script for cleaning bams in Firehose.
...
Some refactoring of Queue extensions for reusability in scripts.
Putting the extensions into the Queue.jar after building them.
More updates to GATK walker arguments specifying @Input and @Output for Queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 23:52:24 +00:00
fromer
dfe2922b5e
First working version of statistical haplotype phaser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4031 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 21:29:45 +00:00
ebanks
f36c0ed613
Stop building obsolete VCFTools and CGUtilities
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4030 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:28:36 +00:00
kshakir
8e46d5de04
Printing to INFO where to find the job output files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4029 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:26:53 +00:00
rpoplin
222f61df87
Bug fix for damoskow in TableRecalibration. Shouldn't try to update the reference mismatch rate tag for an unmapped read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4028 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 18:57:07 +00:00
kshakir
80a70ccf03
Repopulating rodsToSamples. Code reviewed by Eric.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4027 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 17:07:18 +00:00
hanna
cb144734c0
Getting rid of GenotypeWriter interface. Of note:
...
- GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble.
- VCFWriter is now an interface, for easier redirection.
- VCFWriterImpl fleshes out the VCFWriter interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 16:33:22 +00:00
kshakir
542d394e09
Cleaning up Queue debugging output.
...
-l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run.
More documentation in the examples with a new even simpler CountReads example.
Took out unused option to build Queue GATK extensions separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:54:08 +00:00
chartl
49a3db9dfe
A brief implementation of a QD calculation that is not quite so bimodal for known variants (multiplicatively penalizes QD by (n variant samples)/(n variant alleles) ). Not sure how helpful this will be (which is why it is in oneoffs). Seems nice on MCKD1, but I'm still playing with the optimization.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4024 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:42:37 +00:00
chartl
c6a8fba922
Occasionally if a JEXL expression results in no variants being captured (like "QD > 20.0" on filtered variants) the per-sample mapping from samples to eval objects can be empty. This semi-hacky fix prevents null pointer exceptions in setting up the resulting empty table (by jumping straight to it in this case)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4023 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:37:45 +00:00
ebanks
f874e548aa
Shame on us. FlagStat used ints instead of longs, so we ended up getting negative read counts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4022 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 03:00:57 +00:00
ebanks
71c4d3f33d
Moving pointer to b36 reference from /broad/1KG to /humgen/1kg
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4021 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 00:54:34 +00:00
kshakir
162febdef8
Added Queue packages, which must be run with 'ant queue package'.
...
To assist with the above, jars are no longer removed during a new build, so 'ant queue dist' will still have the Queue.jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4020 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-12 17:45:53 +00:00
kshakir
25a23218c6
Trying a build server fix via google: only running the ivy taskdef once.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4019 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-12 05:10:26 +00:00
kshakir
cd5d42618f
Killing old version of ivy jar.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4018 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-12 01:32:03 +00:00
kshakir
f39dce1082
Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
...
Added ability to skip up-to-date jobs where the outputs are newer than the inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00
chartl
8c08f47923
1) Make sure that the table size is set correctly in finalize()
...
2) Make sure variants are biallelic before asking for isTransversion()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4016 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:32:22 +00:00
hanna
41d57b7139
Massive cleanup of read filtering.
...
- Eliminate redundancy of filter application.
- Track filter metrics per-shard to facilitate merging.
- Flatten counting iterator hierarchy for easier debugging.
- Rename Reads class to ReadProperties and track it outside of the Sting iterators.
Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics
classes are managed by the SAMDataSource when they should be managed by something more general. For now, we're hacking
the reads data source to manage the metrics; in the future, something more general should manage the metrics classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:17:11 +00:00
ebanks
86bd55408e
no INFO output now that it's the default
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4014 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 18:56:28 +00:00
ebanks
7385cce494
Useful tool for calculating the percentage of misaligned reads at homozygous non-ref indel sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4013 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:57:44 +00:00
ebanks
cc9e6b4ad9
Moved into Tribble to be with VC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4012 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:14:32 +00:00
aaron
14e492fa80
fix for a problem in readNextRecord() of BFS, where we'd go looking for the next record far into the next contig because (f.getEnd() >= start) was never true once we cycled to a new contig. Added a check for contig identity. Also, removed duplicate HW calculation classes in the GATK and Tribble.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4011 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:01:38 +00:00
depristo
e0abb73fd7
plot now assumes 1 / 1000 is the min error rate, not 1/100
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4010 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 14:48:22 +00:00
kiran
6037443e55
Handle interactive and non-interactive modes more elegantly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4009 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 02:38:53 +00:00
kiran
a7409df1a6
Be more robust to missing or empty files in VariantEval output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4008 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 02:22:50 +00:00
kiran
23b5d71e76
A quickly hacked together replacement for AnnotateVCFwithMAF.py, which doesn't work anymore with Cancer's updated annotator. Takes an annotated MAF file and imports the annotations into the VCF file. For the MAF annotator's DNP and TNP annotations (which I think are likely to not be correct, given the lack of phasing information or even proper association to the same sample), just propagate the annotation from the previous annotated variant to which the multinucleotide polymorphism was associated.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4007 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 00:20:08 +00:00
flannick
cd4cd6db81
Added option to print out discordant sites in GenotypeConcordance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4006 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:55:19 +00:00
flannick
18fc5c8c3e
Initial implementation of annotator to compute allele balance for each sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4005 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:40:17 +00:00
flannick
1dc373b9d0
Initial implementation of evaluator to compute popgen theta statistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4004 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:36:34 +00:00
aaron
0a8ebcb4f9
moving tests over from the GATK to Tribble, and added a speed-up to the readNextRecord() that Mark suggested. Also removed the contained flag from the queries to Tribble in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4003 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 17:54:59 +00:00
ebanks
3ff6e3404e
Alleles are now returned in a consistent order, so we can deal with tri-allelic sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 15:21:10 +00:00
depristo
67063deb16
Removed coloring by mixture weight. Each cluster gets a distinct color, and the legend indicates which cluster has which id and its weight
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4001 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 14:28:24 +00:00
depristo
672bee295c
now plots tranches separately from optimizer
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4000 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:52 +00:00
depristo
cd2d051209
full path to Rscript
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3999 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:38 +00:00