Commit Graph

3346 Commits (dfe2922b5e8fd529ab819911a6cbd58f837ec3ca)

Author SHA1 Message Date
fromer dfe2922b5e First working version of statistical haplotype phaser
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4031 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 21:29:45 +00:00
ebanks f36c0ed613 Stop building obsolete VCFTools and CGUtilities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4030 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:28:36 +00:00
rpoplin 222f61df87 Bug fix for damoskow in TableRecalibration. Shouldn't try to update the reference mismatch rate tag for an unmapped read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4028 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 18:57:07 +00:00
kshakir 80a70ccf03 Repopulating rodsToSamples. Code reviewed by Eric.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4027 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 17:07:18 +00:00
hanna cb144734c0 Getting rid of GenotypeWriter interface. Of note:
- GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble.
- VCFWriter is now an interface, for easier redirection.
- VCFWriterImpl fleshes out the VCFWriter interface.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 16:33:22 +00:00
kshakir 542d394e09 Cleaning up Queue debugging output.
-l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run.
More documentation in the examples with a new even simpler CountReads example.
Took out unused option to build Queue GATK extensions separately.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:54:08 +00:00
chartl 49a3db9dfe A brief implementation of a QD calculation that is not quite so bimodal for known variants (multiplicatively penalizes QD by (n variant samples)/(n variant alleles) ). Not sure how helpful this will be (which is why it is in oneoffs). Seems nice on MCKD1, but I'm still playing with the optimization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4024 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:42:37 +00:00
chartl c6a8fba922 Occasionally if a JEXL expression results in no variants being captured (like "QD > 20.0" on filtered variants) the per-sample mapping from samples to eval objects can be empty. This semi-hacky fix prevents null pointer exceptions in setting up the resulting empty table (by jumping straight to it in this case)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4023 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:37:45 +00:00
ebanks f874e548aa Shame on us. FlagStat used ints instead of longs, so we ended up getting negative read counts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4022 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 03:00:57 +00:00
ebanks 71c4d3f33d Moving pointer to b36 reference from /broad/1KG to /humgen/1kg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4021 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 00:54:34 +00:00
kshakir f39dce1082 Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
Added ability to skip up-to-date jobs where the outputs are older than the inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00
chartl 8c08f47923 1) Make sure that the table size is set correctly in finalize()
2) Make sure variants are biallelic before asking for isTransversion()



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4016 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:32:22 +00:00
hanna 41d57b7139 Massive cleanup of read filtering.
- Eliminate reduncancy of filter application.
- Track filter metrics per-shard to facitate per merging.
- Flatten counting iterator hierarchy for easier debugging.
- Rename Reads class to ReadProperties and track it outside of the Sting iterators.
Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics
classes are managed by the SAMDataSource when they should be managed by something more general.  For now, we're hacking
the reads data source to manage the metrics; in the future, something more general should manage the metrics classes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:17:11 +00:00
ebanks 7385cce494 Useful tool for calculating the perentage of misaligned reads at homozygous non-ref indel sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4013 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:57:44 +00:00
ebanks cc9e6b4ad9 Moved into Tribble to be with VC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4012 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:14:32 +00:00
aaron 14e492fa80 fix for a problem in readNextRecord() of BFS, where we'd go looking for the next record far into in the next contig because (f.getEnd() >= start) was never true once we cycled to a new conitg. Added a check for contig identity. Also, removed duplicate HW calculation classes in the GATK and Tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4011 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:01:38 +00:00
flannick cd4cd6db81 Added option to print out discordant sites in GenotypeConcordance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4006 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:55:19 +00:00
flannick 18fc5c8c3e Initial implementation of annotator to compute allele balance for each sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4005 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:40:17 +00:00
flannick 1dc373b9d0 Initial implementation of evaluator to compute popgen theta statistics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4004 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:36:34 +00:00
aaron 0a8ebcb4f9 moving tests over from the GATK to Tribble, and added a speed-up to the readNextRecord() that Mark suggested. Also removed the contained flag from the queries to Tribble in the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4003 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 17:54:59 +00:00
ebanks 3ff6e3404e Alleles are now returned in a consistent order, so we can deal with tri-allelic sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 15:21:10 +00:00
aaron d514c424fd adding tests for BTI in the ROD validation tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3997 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 06:05:40 +00:00
ebanks ca5b274f16 Unit, integration, and performance tests are all busted, so this is a good time to make a big commit...
Major cleanup of the genotype writer code from the calling end.  UG no longer supports making calls in anything but VCF, and that allows us to use the VCFWriter more generically now.  Putting the ball in Matt's court to finish collapsing everything.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3996 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 04:18:29 +00:00
aaron 0f78f70ed4 fix for feature source in Tribble; we need to check that the record coming back isn't null. Also in the GATK added code to set the default logging level in integration tests to WARN, with the default level change they were spewing a bunch of text.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3995 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:57:23 +00:00
ebanks 419a36f74c Starting the clean up of the sting.utils.genotype code which is all either moving to Tribble, moving to sting.utils.vcf, or being removed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:16:05 +00:00
depristo 2a4a4b0aab VariantRecalibrator now calls plot_Tranches directly so it works on the farm
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3993 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 23:17:16 +00:00
depristo c2c0c1f57c Removing used --enable_overlap_filters argument; Eric assures me this won't break the currently broken tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3992 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 22:27:13 +00:00
aaron 0f29f2ae3f fixes for the Tree index, and some small clean-up in the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:50 +00:00
rpoplin 3eee3183fd Checking in the tiger team changes. LOD calculation modified. -qScale is back in case people need it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3990 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:03 +00:00
ebanks 0eeb659aa3 Useful utility function to print out the Allele as a String since toString prints out * for refs. It was annoying to keep seeing new String(Allele.getBases()).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3989 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:35:56 +00:00
chartl d0ecb8875a Added - a class to count functional annotations by sample (currently for the MAF annotation strings, soon to be migrated to genomic annotator once it is up and running)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3988 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:09:13 +00:00
aaron 5b0b9e79ba protect against nulls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3987 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 19:21:39 +00:00
depristo 8944800f60 Minor refactoring for Ryan
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3986 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 18:05:23 +00:00
kshakir 4f51a02dea Changed logging level to default at INFO instead of WARN.
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
aaron 30178c05c5 providing a way to specify how you'd like -BTI combined with your -L options; set BTIMR to either UNION (default) or INTERSECTION.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3983 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 14:00:52 +00:00
hanna 6b4a1e3b9f Reenabling code that was commented out after it was confirmed to work by many participating in this thread:
http://getsatisfaction.com/gsa/topics/error_thrown_when_reading_reference_file


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3981 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 00:12:09 +00:00
kiran 48e311a5ea Added copyright notice.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3980 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:11:51 +00:00
kiran 9aa70d9c7c Replaced by SelectVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3979 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:07:42 +00:00
kiran 758ab428f5 Better logging info for the samples being selected and the sample expressions being ignored.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3978 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:03:37 +00:00
kiran e242a8f143 Put single quotes around the regex. This isn't strictly necessary through the integration test machinery, but *is* necessary at the console, and it's convenient to be able to cut and paste this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3977 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:56:57 +00:00
kiran 13f29660bb Integration test for SelectVariants. Tests a complex case with an explicit sample selection, sample selection by regex, exclusion of non-variant and filtered loci, and JEXL selection on low allele-frequency variants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3976 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:49:47 +00:00
ebanks 637a1e5055 Updating to use the new VA interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3975 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:31:01 +00:00
ebanks bd6d5a8d51 Adding command-line header to VA and VF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:21:15 +00:00
kiran 64446f0ddf Avoid NaNs in the final output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3973 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:16:52 +00:00
ebanks 3f6e44dc71 Updated recalibrator and cleaner to output full command-lines in the bam header
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3972 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:39:18 +00:00
kiran 0da0dfa1da Cosmetic change - lower-case for all command-line arguments' short names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3971 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:12:01 +00:00
kiran eb1bb94d1c Moved the evaluation of the JEXL expressions to a point *after* the samples are subset and the INFO-field annotations are updated. I think this makes more sense than having the evaluations happen beforehand, since it seems jarring to have the JEXL expressions operate on the annotations before they're updated, and have the file contain the annotations after they're updated. Now, selecting on something like allele frequency will actually apply to the annotations that actually end up in the file, while selection on other annotations (which are carried over without modification) will act exactly the same regardless.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3970 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:09:02 +00:00
ebanks 594b7912f1 Added a generic method for returning the complete command-line used when calling a walker, to be used in the bam/vcf headers. As requested, every possible engine/walker argument is included. I've added it to the Unified Genotyper output, so people can try it out and let me know what they think. Something that needs to be discussed in group meeting: what happens when we merge VCFs? Do we keep all of the command-lines?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3969 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 03:53:07 +00:00
kiran 6e389059cf An improved version of VariantSubset and VariantSelect, meant to replace those walkers. Takes in a VCF and creates a subsetted VCF by sample(s), JEXL expressions, or both.
When subsetting by sample, the -SN argument is treated as a literal sample name and, if no match is found, as a regular expression.  This allows for a large number of samples to be selected at once (useful when, for instance, cases are given one sample name prefix and controls are given another).

After the subsetting procedure, the INFO-field annotations AC, AN, AF, and DP are all recalculated to properly reflect the new contents of the VCF.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3968 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 02:57:06 +00:00
ebanks ac4699a650 Re-enabling this test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:20:37 +00:00