Commit Graph

4548 Commits (cbce3e3c83c72a8c7dff7b8fec00f00a2b419e83)

Author SHA1 Message Date
depristo c85ab9db37 functional recalibrate script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4034 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 16:01:37 +00:00
kshakir 4710015c17 Disabled AlignerIntegrationTest while addressing build machine memory issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4033 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 01:23:21 +00:00
kshakir 307c8ca027 Created a new playground script for cleaning bams in Firehose.
Some refactoring of Queue extensions for reusability in scripts.
Putting the extensions into the Queue.jar after building them.
More updates to GATK walker arguments specifying @Input and @Output for Queue.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 23:52:24 +00:00
fromer dfe2922b5e First working version of statistical haplotype phaser
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4031 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 21:29:45 +00:00
ebanks f36c0ed613 Stop building obsolete VCFTools and CGUtilities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4030 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:28:36 +00:00
kshakir 8e46d5de04 Printing to INFO where to find the job output files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4029 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:26:53 +00:00
rpoplin 222f61df87 Bug fix for damoskow in TableRecalibration. Shouldn't try to update the reference mismatch rate tag for an unmapped read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4028 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 18:57:07 +00:00
kshakir 80a70ccf03 Repopulating rodsToSamples. Code reviewed by Eric.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4027 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 17:07:18 +00:00
hanna cb144734c0 Getting rid of GenotypeWriter interface. Of note:
- GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble.
- VCFWriter is now an interface, for easier redirection.
- VCFWriterImpl fleshes out the VCFWriter interface.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 16:33:22 +00:00
kshakir 542d394e09 Cleaning up Queue debugging output.
-l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run.
More documentation in the examples with a new even simpler CountReads example.
Took out unused option to build Queue GATK extensions separately.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:54:08 +00:00
chartl 49a3db9dfe A brief implementation of a QD calculation that is not quite so bimodal for known variants (multiplicatively penalizes QD by (n variant samples)/(n variant alleles) ). Not sure how helpful this will be (which is why it is in oneoffs). Seems nice on MCKD1, but I'm still playing with the optimization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4024 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:42:37 +00:00
chartl c6a8fba922 Occasionally if a JEXL expression results in no variants being captured (like "QD > 20.0" on filtered variants) the per-sample mapping from samples to eval objects can be empty. This semi-hacky fix prevents null pointer exceptions in setting up the resulting empty table (by jumping straight to it in this case)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4023 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:37:45 +00:00
ebanks f874e548aa Shame on us. FlagStat used ints instead of longs, so we ended up getting negative read counts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4022 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 03:00:57 +00:00
ebanks 71c4d3f33d Moving pointer to b36 reference from /broad/1KG to /humgen/1kg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4021 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 00:54:34 +00:00
kshakir 162febdef8 Added Queue packages, which must be run with 'ant queue package'.
To assist with the above, jars are no longer removed during a new build, so 'ant queue dist' will still have the Queue.jar.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4020 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-12 17:45:53 +00:00
kshakir 25a23218c6 Trying a build server fix via google: only running the ivy taskdef once.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4019 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-12 05:10:26 +00:00
kshakir cd5d42618f Killing old version of ivy jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4018 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-12 01:32:03 +00:00
kshakir f39dce1082 Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
Added ability to skip up-to-date jobs where the outputs are newer than the inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00
chartl 8c08f47923 1) Make sure that the table size is set correctly in finalize()
2) Make sure variants are biallelic before asking for isTransversion()



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4016 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:32:22 +00:00
hanna 41d57b7139 Massive cleanup of read filtering.
- Eliminate redundancy of filter application.
- Track filter metrics per-shard to facilitate merging.
- Flatten counting iterator hierarchy for easier debugging.
- Rename Reads class to ReadProperties and track it outside of the Sting iterators.
Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics
classes are managed by the SAMDataSource when they should be managed by something more general.  For now, we're hacking
the reads data source to manage the metrics; in the future, something more general should manage the metrics classes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:17:11 +00:00
ebanks 86bd55408e no INFO output now that it's the default
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4014 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 18:56:28 +00:00
ebanks 7385cce494 Useful tool for calculating the percentage of misaligned reads at homozygous non-ref indel sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4013 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:57:44 +00:00
ebanks cc9e6b4ad9 Moved into Tribble to be with VC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4012 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:14:32 +00:00
aaron 14e492fa80 fix for a problem in readNextRecord() of BFS, where we'd go looking for the next record far into the next contig because (f.getEnd() >= start) was never true once we cycled to a new contig. Added a check for contig identity. Also, removed duplicate HW calculation classes in the GATK and Tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4011 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:01:38 +00:00
depristo e0abb73fd7 plot now assumes 1 / 1000 is the min error rate, not 1/100
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4010 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 14:48:22 +00:00
kiran 6037443e55 Handle interactive and non-interactive modes more elegantly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4009 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 02:38:53 +00:00
kiran a7409df1a6 Be more robust to missing or empty files in VariantEval output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4008 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 02:22:50 +00:00
kiran 23b5d71e76 A quickly hacked together replacement for AnnotateVCFwithMAF.py, which doesn't work anymore with Cancer's updated annotator. Takes an annotated MAF file and imports the annotations into the VCF file. For the MAF annotator's DNP and TNP annotations (which I think are likely to not be correct, given the lack of phasing information or even proper association to the same sample), just propagate the annotation from the previous annotated variant to which the multinucleotide polymorphism was associated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4007 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 00:20:08 +00:00
flannick cd4cd6db81 Added option to print out discordant sites in GenotypeConcordance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4006 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:55:19 +00:00
flannick 18fc5c8c3e Initial implementation of annotator to compute allele balance for each sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4005 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:40:17 +00:00
flannick 1dc373b9d0 Initial implementation of evaluator to compute popgen theta statistics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4004 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:36:34 +00:00
aaron 0a8ebcb4f9 moving tests over from the GATK to Tribble, and added a speed-up to the readNextRecord() that Mark suggested. Also removed the contained flag from the queries to Tribble in the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4003 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 17:54:59 +00:00
ebanks 3ff6e3404e Alleles are now returned in a consistent order, so we can deal with tri-allelic sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 15:21:10 +00:00
depristo 67063deb16 Removed coloring by mixture weight. Each cluster gets a distinct color, and the legend indicates which cluster has which id and its weight
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4001 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 14:28:24 +00:00
depristo 672bee295c now plots tranches separately from optimizer
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4000 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:52 +00:00
depristo cd2d051209 full path to Rscript
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3999 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:38 +00:00
depristo 9b432d0801 1kg script now works
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3998 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:01:18 +00:00
aaron d514c424fd adding tests for BTI in the ROD validation tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3997 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 06:05:40 +00:00
ebanks ca5b274f16 Unit, integration, and performance tests are all busted, so this is a good time to make a big commit...
Major cleanup of the genotype writer code from the calling end.  UG no longer supports making calls in anything but VCF, and that allows us to use the VCFWriter more generically now.  Putting the ball in Matt's court to finish collapsing everything.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3996 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 04:18:29 +00:00
aaron 0f78f70ed4 fix for feature source in Tribble; we need to check that the record coming back isn't null. Also in the GATK, added code to set the default logging level in integration tests to WARN; with the default level change they were spewing a bunch of text.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3995 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:57:23 +00:00
ebanks 419a36f74c Starting the clean up of the sting.utils.genotype code which is all either moving to Tribble, moving to sting.utils.vcf, or being removed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:16:05 +00:00
depristo 2a4a4b0aab VariantRecalibrator now calls plot_Tranches directly so it works on the farm
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3993 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 23:17:16 +00:00
depristo c2c0c1f57c Removing unused --enable_overlap_filters argument; Eric assures me this won't break the currently broken tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3992 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 22:27:13 +00:00
aaron 0f29f2ae3f fixes for the Tree index, and some small clean-up in the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:50 +00:00
rpoplin 3eee3183fd Checking in the tiger team changes. LOD calculation modified. -qScale is back in case people need it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3990 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:03 +00:00
ebanks 0eeb659aa3 Useful utility function to print out the Allele as a String since toString prints out * for refs. It was annoying to keep seeing new String(Allele.getBases()).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3989 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:35:56 +00:00
chartl d0ecb8875a Added - a class to count functional annotations by sample (currently for the MAF annotation strings, soon to be migrated to genomic annotator once it is up and running)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3988 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:09:13 +00:00
aaron 5b0b9e79ba protect against nulls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3987 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 19:21:39 +00:00
depristo 8944800f60 Minor refactoring for Ryan
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3986 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 18:05:23 +00:00
kshakir c66e93d86e Fixed a problem when mixing queue with other targets, such as 'ant clean oneoffs queue' and the STING_BUILD_TYPE environment is set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3985 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 17:59:51 +00:00