gatk-3.8

Commit Graph

Author	SHA1	Message	Date
ebanks	0d71dff928	Small bug fix to the new UG (need to initialize the entire posteriors array) means that we also get identical results as old UG when calling with 60 samples in the pilot1 data. Now that I'm happier with UGv2, I've transitioned it to use the correct AF priors instead of the busted ones still in the old UG. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4379 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 14:24:50 +00:00
hanna	eee134baf2	Chris found a bug in the downsampler where, if the number of reads entering the pileup at the next alignment start is large, we don't add as many of those incoming reads as we should. No integration tests were affected. Thanks, Chris! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4378 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 11:18:12 +00:00
ebanks	0ec07ad99a	Initial version of refactored Unified Genotyper. Using SNP genotype likelihoods and GRID_SEARCH AF estimation models, achieves the exact same results as original UG on 1-2 samples with the exception of strand bias (not implemented yet); other than that I have no idea. Needs tons more testing. Do not use. For Guillermo only. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4377 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 08:42:25 +00:00
kshakir	6df7f9318f	For enums generate the full path to the Enum type to avoid collisions such as enum Model and enum Model used in the same class. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4376 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 05:28:59 +00:00
aaron	cfebe5c731	clean-up the docs a little git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4375 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 05:02:41 +00:00
aaron	702449d835	adding a Python script to roll back tribble to the correct version, for users who want to checkout historical versions of the GATK code. This code cross references the current checkout date with the Tribble logs, and pulls the closest (price-is-right style) revision. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4374 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 05:01:46 +00:00
fromer	e322e71c2f	Restored SVN history for phasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4373 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 00:02:02 +00:00
fromer	720aaca8a0	Trying to restore SVN history for phasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4372 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:50:28 +00:00
fromer	bf88117ead	Trying to restore SVN history for phasing directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4371 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:48:24 +00:00
fromer	dfb5143a41	Restore folder git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4370 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:46:07 +00:00
fromer	7c909bef82	Moved phasing classes out of playground! The code is still under production, though... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4369 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:21:28 +00:00
fromer	8d8980e8eb	Fixed phasing algorithm to: 1. More correctly weed out irrelevant reads and sites; 2. Crudely flag sites with large phase discrepancies between reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4368 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 23:02:53 +00:00
chartl	5a5c72c80d	Accidentally committed some debug output to PackageUtils, reverting change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4367 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 21:58:42 +00:00
kiran	a90fb64c03	Added 'Cron' to subject for easier message filtering git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4366 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 21:03:50 +00:00
chartl	862c94c8ce	Small change for Matt -- output partition types in lexicographic order. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4365 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 20:08:03 +00:00
ebanks	7ad87d328d	Make sure to uppercase ref bases since they aren't coming from the engine git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4364 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 19:05:46 +00:00
chartl	37613810bf	Tired of writing vcf_hg18_to_b36 over and over again when necessary. Added a -r flag to this script that does it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4363 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 14:51:57 +00:00
bthomas	96cccafb0d	Adding a few helper methods for accessing sample metadata, and associated unit tests. These are motivated by discussion with Ryan about how he'll use sample metadata in VariantEvalwalker - hopefully will make it easier for him. Methods are: -- getToolkit().subContextFromSampleProperty(): filters a VariantContext to genotypes that come from samples that have a given property value -- getToolkit().getSamplesWithProperty(): gets all samples with a given property -- getToolkit().getSamplesFromVariantContext(): sample objects that are referenced by name in a VariantContext git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4361 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-28 02:16:25 +00:00
kiran	51fdf9d701	Default memory limit is now 4g (apparently necessary when testing on full 100-sample Autism_Daly dataset) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4359 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-27 05:43:08 +00:00
kiran	a7815b4268	Nightly test for Queue-based pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4358 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-27 05:24:58 +00:00
ebanks	1034853a84	Adding 'solexa' to list of known/supported platforms git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4357 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-27 02:38:38 +00:00
kiran	bcc09f5d8c	Simplifications: removed command-line arguments to control SNP cluster filter parameters. Infer the number of contigs to scatter indel cleaning from the contig list (which we should get rid of too). Changed the PY argument to just Y for specifying the path to the YAML file. Cleaned up command-line argument documentation. See http://iwww.broadinstitute.org/gsa/wiki/index.php/Queue-based_pipeline for a list of remaining issues. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4356 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-26 22:50:30 +00:00
kiran	9820a12fa5	Removed unnecessary dbSNP big-table dependency. Ti/Tv is now required. Consistent downsampling level for all programs. Spelling corrections. VariantEval now generates R-style output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4355 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-26 16:55:58 +00:00
kiran	145fb0df8b	Changed the wait job's dispatch queue from short (which doesn't exist anymore) to hour git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4354 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 23:36:49 +00:00
kiran	9bfbc3b784	Commented out changes to ADPR and VariantEval modules that are causing this script to not compile. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4353 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 15:12:10 +00:00
aaron	70f03a7113	first pass of well-formatted tribble exceptions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4352 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 03:29:33 +00:00
kshakir	edaa278edd	Removed cases where various toolkit functions were accessing GenomeAnalysisEngine.instance. This will allow other programs like Queue to reuse the functionality. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4351 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 02:49:30 +00:00
hanna	497bcbcbb7	Recent changes to the build system make the build system complain loudly about pieces of core that depend on playground. Most of these have been eliminated by (temporarily) promoting Aaron's report system to core in this checkin. I'll follow up with other changes separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4350 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 22:09:12 +00:00
hanna	6ebca5d219	Enhancements to build external projects for walker sharing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4348 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 21:17:16 +00:00
corin	eb1fa4bff3	changes an argument to an output so I can use it to track dependencies in queue git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4347 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 21:07:09 +00:00
corin	9cf079e1bb	Ready for integration with queue script git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4346 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 19:46:01 +00:00
corin	3ec0e09edd	ADPR is now included in the full calling pipeline. The most up to date version of the ADPR is about to be committed and should be used with the script for now. The qscript now calls for two additional strings as inputs: the sequencing machines used and the sequencing protocol. In order for ADPR to finish successfully, a squid file for both the lane and sample level data needs to be produced, reformatted and named <projectBase>_lanes.txt or <projectBase>_samps.txt, respectively. These files need to be in the working directory. When database access is ready, this and the protocol and sequencer parameters of the r script will go away. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4345 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 19:28:43 +00:00
kshakir	0cc48d46ec	Escaping quotes in dot files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4344 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 17:13:12 +00:00
depristo	745b8cc6d3	GATK now detects and UserExceptions when human lexicographically sorted data is provided git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4343 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 15:19:48 +00:00
kshakir	67bcf3a7e4	Fixed VariantEval rod binding names. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4342 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 14:52:51 +00:00
rpoplin	1931b2e1bd	Three fixes for VariantFiltrationWalker: Trying to filter an empty VCF file will produce a well-formed VCF file with zero records instead of a blank file, needed for pipelines. The first record's genotype info fields are now in the same order as all the others. The VCF header lines are pulled from just the input variant rod instead of from all rods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4341 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 13:52:56 +00:00
chartl	c355afc320	Queue now does job tracking (replace -run with -status in the command line). Produces output that looks like: INFO 20:58:17,827 QCommandLine - Checking pipeline status INFO 20:58:23,234 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_MergeIndels [DONE] INFO 20:58:23,236 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_158.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,237 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_929.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,238 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_SNP_calls [NOT DONE] 5t/0d/0r/5p/0f INFO 20:58:23,239 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_HandFilter [NOT DONE] INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1122.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantRecalibrator [NOT DONE] INFO 20:58:23,241 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_913.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,242 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_2037.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,243 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantEval [NOT DONE] INFO 20:58:23,244 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster [NOT DONE] INFO 20:58:23,245 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_106.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,246 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster_and_Indel_filter [NOT DONE] INFO 20:58:23,247 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_ApplyVariantCuts [NOT DONE] INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_GenomicAnnotator [NOT DONE] INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1713.bam [DONE] 5t/5d/0r/0p/0f git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4340 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 00:59:09 +00:00
kshakir	4ed9f437e9	Sliced the GAE in half like a gordian knot to avoid the constant merge conflicts. The GAE half has all the walker specific code. The new "Abstract" GAE has the rest of the logic. More refactoring to come, with the end goal of having a tool that other java analysis programs (Queue, etc.) can use to read in genomic data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4339 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 23:28:55 +00:00
rpoplin	0c9fabb06f	Fix in AnalyzeAnnotations, somebody changed it to look for ID in the vc's info field. This dinosaur desperately needs integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4338 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 19:48:44 +00:00
hanna	0c781968fb	Tried to do a bit of pre-commit refactoring and screwed it up. Fixed. Thanks to Ryan for identifying the problem. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4336 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 18:17:29 +00:00
corin	d6bd1debeb	This is an updated version of the automated data processing report. Each page in the report is a stand alone function, which are linked together with a function which pulls all appropriate data (assuming a standard naming convention) and generates the pdf. This script still needs to respond appropriately when it doesn't find the data it needs, database access, and a way of getting some information from sequencing for the tearsheet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4335 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 18:08:16 +00:00
depristo	d081b9b352	Improvements to error messages about @Requires and @Allows git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4334 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-23 12:08:27 +00:00
hanna	7841b301c4	Added more diagnostics so that I have some idea of what a 'general' exception is. Required to fix bug ZjhCJAdwhtFq1x54ZlmlN8pFNcbrRpdJ and similar. We might want to change this particular case to a ReviewedStingException after we gain a bit more experience with it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4333 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 21:32:01 +00:00
fromer	44ccfc3531	Updated Phasing algorithm + evaluation module to properly implement haplotypes [including homozygous genotypes]; Implemented dynamic window phasing model for LARGE increase in efficiency git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4332 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 21:29:58 +00:00
kshakir	192757d1e0	Added the new pipeline classes to the StingUtils.jar so that ant test picks them up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4331 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 20:05:22 +00:00
hanna	8f75d88519	Fix for GATK run report ids: mOVsxGfDiiSMxVs2PPTVjzYTVbizlD6e f9kUHUADFsZ0LiTGxRL5zPmq9kZcA4cQ 8eGHWJFAlBVmgxwPi3sMd1RmiN2PwHOf iLhvHWveypKb2F8vKS5irHylc3pYvlOb HDttXKUMEVoPrvVeWrH7E0htxYyNydMx plus a bit of cleanup of custom exceptions in the sharding system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4330 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:49:25 +00:00
kshakir	20b38b38f3	Updated from SnakeYAML 1.6 to 1.7. Added a pipeline java bean and YAML utility to serialize java beans. Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format. Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference. More changes to come as this code gets tested out in the fullCallingPipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:47:49 +00:00
hanna	d9b8fa2acc	Up the memory required for integrationtests until we can figure out why memory isn't being freed correctly when multiple integration tests run as part of a single class. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4328 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 18:54:24 +00:00
hanna	fb5d595ef0	Disable VCF header output in the Beagle integrationtest. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4327 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 16:50:03 +00:00
hanna	0c99c97685	The engine now automatically adds the command-line arguments to the header of every VCF, unless -NO_HEADER is specified. Changed integration tests, adding the -NO_HEADER argument, for walkers that previously did not include the command-line arg headers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4326 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 15:27:58 +00:00

4339 Commits (0d71dff928928fecd2117ea4591c01025885e653)