gatk-3.8

Commit Graph

Author	SHA1	Message	Date
hanna	e0092bb160	Experimental feature: change the rate at which log messages appear on-the-fly and enable/disable performance logs from outside the JVM process. Making this available for the moment; we'll see whether it ends up being useful. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4983 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 04:20:53 +00:00
carneiro	9e93091e9a	-baqGOP now takes phred scaled scores instead of probabilities in the command line. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 00:06:38 +00:00
hanna	5736d2e2bb	Something I should have done a long time ago: attempt to detect whitespace after the line continuation backslash and enhance the error message if it appears. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4981 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 23:15:08 +00:00
hanna	edebbb5aa0	Fixed long-standing bug reported by Mauricio: @Arguments assigned to primitive types are now properly validated and throw the proper MissingArgumentValue UserException. Before this fix, the error reported was the infamous DePristo BSOD (Could not create module String because an exception of type NullPointerException occurred caused by exception null). Thanks Mauricio! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4980 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 22:18:24 +00:00
hanna	6d855041ec	Oops...forgot to commit the changes that allow primitive VCF streaming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4979 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 21:54:51 +00:00
delangel	8a6b126ea8	Several cleanups to IndelMetricsByAC: - No longer a standard eval module to keep integration tests happy - Remove class name overlaps with SimpleMetricsByAC so that modules don't overwrite each other's files, and to make it easier to grep results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4978 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 18:35:24 +00:00
depristo	8fe5641b2e	can explicitly set the now required ReferenceDataSource in unit tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4977 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 18:25:12 +00:00
aaron	7916ab0ed5	remove the index each run git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4976 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 17:38:22 +00:00
depristo	468ef382b7	vastly improved progress meter that estimates % of work done and time until the job finishes and time remaining. Reordered GATK core initialization order -- intervals are created before the scheduler. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4975 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 17:32:27 +00:00
delangel	bdd382198c	Necessary changes to enable HaplotypeScore annotation for indels git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4974 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 01:09:12 +00:00
delangel	23597a2bde	Variant Eval module that collects indel statistics (basic counts and event sizes) and partitions by AC (similar to SimpleMetricsByAC in the SNP case) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4973 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-12 01:08:09 +00:00
fromer	48052907a6	A hom genotype can always be considered phased git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4972 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-11 18:48:48 +00:00
fromer	c2dd956888	Moved PrintReferenceVariantsWalker to playground git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4971 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 22:07:41 +00:00
kshakir	8ba3a5a43f	Command lines for locally run Queue jobs no longer have to be escaped differently than bsub'ed jobs. GSA-410 Local job runs now can run command lines longer than 4096 on our linux machines. When determining if the help text and Queue extensions need to be rebuilt, use the .class files not the .java so that GATK oneoffs are picked up correctly. Added the most basic of all example QScripts for debugging, Hello World. Minor updates to copy/pasted LSF code to reduce ant javadoc warnings by a third. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4970 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 21:07:29 +00:00
ebanks	ee348ac9d4	Add a hidden mode to the realigner to turn off SW but still use indels other than known ones (i.e. those already in the reads) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4969 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 20:27:04 +00:00
fromer	01c2091cd9	A LocusWalker to print the haploid reference genome as a VCF file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4968 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 16:59:41 +00:00
delangel	9648399630	Boneheaded silly bug in indel caller - posterior probability computation was using priors gotten from SNP heterozygosity, not indel heterozygosity. Added the indel het. argument to the command line and hooked it up (not a radical change in calls though, just a few dubious calls around the edges fall off) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4967 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 14:56:28 +00:00
aaron	b24e1134f9	unfortunately samrecord pileup also uses zero length intervals to indicate deletions; this will have to be a BED specific exception. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4964 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 22:32:50 +00:00
kshakir	b34e2f733f	Removed stochasticity from IndelRealigner by random sampling using a seed based on the read list. Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set. Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found. Now building all scala code at the same time, just like all java code is compiled at the same time. Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile. Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles. Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified. Fixed a couple errors when the <javadoc> task is run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 22:03:36 +00:00
ebanks	60f45a7c49	Stupid me. Forgot to put this check in the last commit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4959 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 19:16:41 +00:00
aaron	56b87da8f9	a better error message for the situation where a RMD track generates a negative length interval; the user will now see a message like "Bad input: A feature produced by the reference metadata track named "bed" at position chr1:10434-10433 has a start greater than the stop; this is an invalid position " git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4958 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 19:06:04 +00:00
ebanks	4272b824d6	unused imports git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4957 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 18:33:12 +00:00
chartl	3e7802a3e0	Minor changes to a qscript and the GQ constants on PrivatePermutations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4956 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 18:26:21 +00:00
kiran	79fcff13ff	Fixed import statement that was erroneously referring to VE3 rather than VE2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4955 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 03:22:25 +00:00
ebanks	f3ca2cc9de	Add safety net to BAQ calculation: explicitly cast to byte/int and check for bad values git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4954 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-06 18:09:12 +00:00
ebanks	2ac5c52281	Better error message as per Mark git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4953 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-06 15:44:02 +00:00
ebanks	e0d091b3db	Die gracefully if the bam is malformed with quals that are too high git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4952 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-06 15:39:08 +00:00
kiran	3163970ad5	Updates that slipped from my last commit: fixed some imports and calls to super(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4951 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-06 15:34:40 +00:00
kiran	d88fd7212f	Changes to allow the primary key of a table to be hidden. Formatting changes to account for when that column is hidden. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4948 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-06 15:27:19 +00:00
kiran	307c41c128	Changes to allow the primary key of a table to be hidden. Formatting changes to account for when that column is hidden. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4947 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-06 15:26:38 +00:00
kiran	fdc514ded3	Intermediate commit for VariantEval 3.0. Among the changes: * Stratifications (by comp rod, by eval rod, novelty, filter status, etc.) have been generalized. They are very symmetric with evaluators now. Each stratification can have multiple states (e.g. known, novel, all). New stratifications can be added and optionally applied. Some new stratifications include: - by sample - by functional class - by CpG status * Output is to a single file in GATKReport format, rather than having the options of CSV, R, table, etc. * Rather than needing to state up front that the allowable variant type is a SNP or an indel, each eval record is inspected and the appropriate record type is fetched from the comp track. (This will require a bit more testing...) * Evaluation context (basically a single row in a VariantEval report) generation and retrieval has been overhauled. Now, every possible configuration of stratification state is generated recursively and stored in a HashMap. The key of the HashMap is a key that represents that exact state configuration. When examining a comp track and eval track, this key is computed based on the data, providing easy lookup for the appropriate evaluation context. When there are only a handful of stratification configurations, this isn't a big deal. But when operating on a file with hundreds of samples, multiplied by 3 states for novelty, 3 states for filtration, 3 states for CpG status, etc., it becomes a very big deal. There are still some known issues: * When the per-sample stratification is turned off, things are getting overcounted (too many variants are showing up when compared to the VariantEval 2.0 code). It's probably because I break out the VariantContext by sample even when not necessary, and those irrelevant contexts are still being counted. Or my recursion is overaggressively creating evaluation contexts, and they all get added up in a weird way. But that's why I'm committing now - so I can track down this issue without losing my work so far. * The Jexl expressions are sometimes throwing an exception that I don't yet understand (they complain of an incorrect specification on the command-line... after the program has made it through a few thousand records). * The request to have evaluations be smart enough to reject certain stratification states is not implemented yet. There's still some work to do before I can replace VariantEval 2.0 with VariantEval 3.0, but feel free to take a look. I'd love comments on the new code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4946 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-06 15:20:24 +00:00
kiran	e9201b81d1	A more general method for specifying samples to act on from the command-line. Supports samples specified individually on the console, a file of samples, or regular expressions to select multiple samples. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4945 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-06 14:54:56 +00:00
carneiro	5e9a8f9cb3	Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base. Adding the first version of the techdev pipeline (tdPipeline) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 22:25:08 +00:00
aaron	cba436fa2f	small fix for the table codec; if you see a header line, you know you've finished parsing the header. Also some changes to return the ref ordered data pool test to using MappedStreamSegment instead of EntireStream git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4942 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 21:20:26 +00:00
fromer	4b37710bcd	Added validator for phasing using read information, e.g., PacBio: ReadBasedPhasingValidationWalker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4940 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 20:05:56 +00:00
delangel	d203f5e39a	Experimental change in how we classify indels - up to now, an indel of say AA was counted as a 2-mer repeat expansion. But in reality, if the event is surrounded by A's it's really a multiple monomer expansion. So, we first reduce the indel bases in case they are made of repeated elements before classifying them. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4939 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 17:13:18 +00:00
rpoplin	4ac0590744	Fix for NaNs in the rank sum tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4938 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 15:21:30 +00:00
chartl	445ae06a7a	Re-add PrivatePermutations since ACTransitionTable is a little too memory-intensive to generate all the cuts that I need git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4937 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 06:11:18 +00:00
hanna	7cdaffbe5c	Create tmpdir if it doesn't exist. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4936 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 03:07:11 +00:00
hanna	0982d35f5b	Bug fixes in streaming in Tribble data via /dev/stdin. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4935 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 02:43:04 +00:00
rpoplin	23dbc5ccf3	HaplotypeScore is revamped. It now uses reads' Cigar strings when building the haplotype blocks to skip over soft-clipped bases and factor in insertions and deletions. The statistic now uses only the reads from the filtered context to build the haplotypes but it scores all reads against the two best haplotypes. The score is now computed individually for each sample's reads and then averaged together. Bug fixes throughout. The math for the base quality and mapping quality rank sum tests is fixed. The annotations remain as ExperimentalAnnotations pending more investigation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4934 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 00:28:05 +00:00
ebanks	85714621be	Better interface to Genotypelikelihoods class. Now you need to specify the format (GL vs PL) of the output string when calling getAsString(). All likelihoods are represented as GLs internally. QualByDepth no longer does its own conversion. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4933 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-04 21:48:14 +00:00
ebanks	96729acd0d	Optional argument to put the original position into the INFO field git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4930 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-04 19:22:44 +00:00
delangel	caedfed860	Fix bug where indels were being incorrectly classified in VariantEval module git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4929 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-04 18:01:48 +00:00
hanna	8d2c14b29c	Update Picard / sam-jdk at Tim's request. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4925 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-03 02:17:25 +00:00
depristo	d31c658c2e	Organized performance monitoring passes unit tests and is more efficient git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4924 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-03 02:09:08 +00:00
depristo	c51e745bae	The engine can be null in a unit test, so check for it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4923 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-03 01:00:52 +00:00
depristo	75a7d8a76e	Trivial formatting error git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4922 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-02 23:44:36 +00:00
depristo	5539c2d9f3	--performanceLog (-PF) X.dat argument now enabled. Writes out a table (R-friendly) of the performance of the GATK over time, exactly as a more detailed version of the INFO progress meter. R script for useful plotting of the performance of the GATK over time. Will be helpful for upcoming scalability testing and debugging of memory leaks and other incremental performance problems git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4921 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-02 23:34:21 +00:00
depristo	4c9746f463	Disabled performance log intermediate commit. Will be refactored and committed to the repository along with documentation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4919 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-02 22:18:12 +00:00
hanna	3fc9862964	Unit test fixed - Tribble codecs aren't designed to be stateless, but I was using one as though it was. Fixed, and debug code reverted. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4917 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-31 17:47:52 +00:00
hanna	b9cb57f4b9	A unit test is failing on bamboo in a way I can't reproduce (or even explain). Checking in some debugging info. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4916 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-31 16:35:04 +00:00
hanna	cba18116e4	A significant refactoring of the ROD system, done largely to simplify the process of streaming/piping VCFs into the GATK. Notable changes: - Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build RMDTracks and lookup codecs. - RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class are the walkers that feel the need to access the ROD system directly. (We need to stamp out this access pattern.) A few minor warts were introduced as part of this process, labeled with TODOs. These'll be fixed as part of the VCF streaming project. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-31 04:52:22 +00:00
ebanks	d70483c50a	Automatically filter out reads with consecutive indel operators in the CIGAR string git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4914 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-31 04:42:54 +00:00
ebanks	848977678d	No reason to convert the GLs to a String for formatting when they're just going to be converted to PLs later. That was 5% of the UG runtime... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4913 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-29 22:06:19 +00:00
aaron	85f2968104	add convenience methods for RODs-for-reads: the ability to get all the RODs covering the read, regardless of their type or position on the read. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4912 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-29 20:46:03 +00:00
depristo	d7e74f8be6	Temporary phasing evaluation walker that needs to be incorporated into the newest VariantEval, whenever it is available git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4911 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-29 20:43:15 +00:00
ebanks	a31f6e4e99	Need to check isBiallelic before calling getSNPSubstitutionType for the allele swap warning git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4909 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-27 20:17:14 +00:00
ebanks	8a0c07b865	Support for indels in hapmap. This was non-trivial because not only does hapmap not tell you whether the allele is an insertion or deletion, but it also has a completely different positioning strategy (rightmost base). I'll send out an email tomorrow when the new HapMap3.3 VCF is ready. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4908 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-27 07:37:46 +00:00
chartl	6ebf5b30de	Transposing the table, and fixing some null pointer exceptions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4906 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-23 16:22:57 +00:00
ebanks	cebfd01857	Properly output .bed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4905 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-23 14:49:24 +00:00
depristo	464d0e18e3	Bringing us back to passing integrationtests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4904 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-23 14:36:11 +00:00
depristo	8c583ea405	RBP now operates correctly at non-variant sites so we can phase hom-ref genotypes with -sampleToPhase git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4903 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-23 13:11:22 +00:00
delangel	376bc563d4	Trivial change to allow GenerateVariantClusters to be run on indels - not that VQSR now works on indels, far from it, but at least it's a first step and it allows us to generate cluster plots to see how well known/novel sites differentiate in their covariates (short answer: no difference/separation :( ). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4902 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 22:39:09 +00:00
hanna	e313eeede8	Push command-line expansions, such as BAM list unpacking and -B tag parsing, out into the CommandLine* classes. This makes it easier for external functionality (such as the VCF streamer) to use GenomeAnalysisEngine directly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4897 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 19:00:17 +00:00
depristo	66cca7de0f	renamed genotypesArePhased to isPhased, as the previous name was incorrect for several reasons. Added setPhase() to MutableGenotype. Other classes changed to reflect renaming to isPhased(). CombineVariants now supports an experimental MASTER mode where it consumes -B:master,vcf and -B:xi,vcf for any number i and updates the master with phasing information in xi. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4896 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 17:42:05 +00:00
chartl	2235245af0	PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 15:16:15 +00:00
delangel	a1653f0c83	Another major redo for indel genotyper: this time, add ability to do allele and variant discovery, and don't rely necessarily on external vcf's to provide candidate variants and alleles (e.g. by using IndelGenotyperV2). This has two major advantages: speed, and more fine-grained control of discovery process. Code is still under test and analysis but this version should be hopefully stable. Ability to genotype candidate variants from input vcf is retained and can be turned on by command line argument but is disabled by default. Code, by default, will build a consensus of the most common indel event at a pileup. If that consensus allele has a count bigger than N (=5 by default), we proceed to genotype by computing probabilistic realignment, AF distribution etc. and possibly emitting a call. Needed for this, also added ability to build haplotypes from list of alleles instead of from a variant context. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4893 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 02:38:06 +00:00
hanna	09c7ea879d	Merging GenomeAnalysisEngine and AbstractGenomeAnalysisEngine back together. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4889 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 02:09:46 +00:00
depristo	b3ac47812c	No longer emits records at filtered sites, in sub-sampling mode git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4883 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 16:43:50 +00:00
depristo	60880b925f	VC utils prune method now will keep genotype attributes as well as info keys. RBP now emits far reduced (NO INFO, only GT:GQ:PG) records, further reducing size of phasing output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4882 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 16:33:14 +00:00
depristo	8604335566	Minor improvements to further reduce debugging output. When running in -samplesToPhase mode, now only including the samples to phase in the output VCF, making it very much smaller. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4881 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 16:19:47 +00:00
depristo	ff90c24f28	RBP now supports operating on a subset of samples, outputting a much reduced VCF file appropriate for merging later. Also, general optimization to avoid printing enormous amounts of data to logger.debug by using a global static variable DEBUG that conditionally allows writing to the variable. Passes integration tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4880 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 16:03:28 +00:00
depristo	a3729bd59c	Now I call BeforeMethod correctly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4872 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 22:45:45 +00:00
depristo	b7e4a015c0	static thread cache reset in UnitTest git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4870 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 21:53:10 +00:00
depristo	3bbc6a0540	Slightly more thread safe CachingIndexedFastaSequenceFile.java. Likely passes parallel testing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4869 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 21:05:17 +00:00
depristo	5dd0e8388b	Fixed a bug in UnitTest git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4867 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 19:44:35 +00:00
depristo	4a54f3f230	ThreadLocal version of CachingIndexedFastaSequenceFile. More efficient support for shared memory BAQ calculations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4865 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 15:44:48 +00:00
depristo	32d5397c01	Experimental support for sided annotations. Currently not more/less valuable than two-tailed testing. Future experiments are needed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4864 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 15:08:31 +00:00
handsake	21dc05138a	Bug fixes for the bwa aligner and changes to support compiling against newer releases of the bwa code base. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4863 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 14:49:15 +00:00
chartl	2bd2667516	Another privately-owned class to add before re-checking out repository git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4858 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 18:14:51 +00:00
chartl	e406eb0f95	Adding a useful accessor method to TableFeature git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4856 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 18:11:51 +00:00
ebanks	8ab4704b4c	Adding a command-line argument to allow missing values to evaluate as false instead of true git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4854 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 05:18:12 +00:00
ebanks	9f3e56e487	VariantAnnotator shouldn't die when multiple records occur at the same position git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4853 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 04:05:47 +00:00
hanna	acfe83920b	'-L unmapped': adding integration tests for explicitly including (-L unmapped) unmapped reads and explicitly excluding (-XL unmapped) unmapped reads, augmenting the suite of unit tests already put in place. '-L unmapped' seems safe to use; go for it, but please validate results against samtools flagstat when the process finishes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4849 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 23:11:46 +00:00
ebanks	dabdeb729e	Eric broke the build. Eric broke the build. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4847 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 17:01:38 +00:00
ebanks	5c0b66cb7c	3 big changes that all kill the integration tests: 1. Don't cap the PLs by 255 anymore. 2. Move over to the 3state model as the only available base model for UG (no more base transition tables). 3. New QD implementation when GLs/PLs are available. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4846 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 16:24:28 +00:00
chartl	5a27d231fa	Rename it so that nobody else falls into the trap laid out (the test is VariantToTable, the walker is Variant[s]ToTable) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4844 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 11:43:00 +00:00
chartl	5e27e9162f	Huh? I thought we parsed out comma-separated command line arguments into list automatically...just change the syntax of the integration test, no need to update the md5 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4843 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 11:40:27 +00:00
chartl	3e75431bc8	Thanks to mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again. This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 03:33:36 +00:00
kshakir	01323447c6	Removed LibBat.SUB2_BSUB_BLOCK since the use of it exits the JVM. Fixed integration tests to wait on their own for the job to run instead of using SUB2_BSUB_BLOCK. Updated VariantRecalibrationIntegrationTests MD5s which were knocked out of sync while SUB2_BSUB_BLOCK was exiting in the middle of integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4840 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 19:57:20 +00:00
hanna	67c07d1a6a	Fixed recently introduced multiplexer issue where DoC couldn't be written directly to command-line. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4839 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 19:35:15 +00:00
hanna	526ae92093	Getting back to '-L unmapped': - basic unit tests for interval sorting and merging with mix of mapped/unmapped. - validation to ensure that locus walkers (really all non-read walkers) blow up with a user error when -L unmapped is specified. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4837 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 18:24:18 +00:00
ebanks	afd4655674	Use @Output instead of @Argument. As a side note, Chris I'm ready for this nightmare to go away... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4835 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 17:13:15 +00:00
ebanks	cf7d932a17	Fix for f***ed up BWA alignments that adhere to SAM specs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4834 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 17:12:25 +00:00
kshakir	d550fdfd60	Disabling integration test to see if this restores the full test suite. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4833 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 15:27:02 +00:00
delangel	a5008faca8	Bug fix: when getting variant contexts at a site, we need to get only variants that start at current location, otherwise we get duplicated records when filtering indels. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4830 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 19:23:10 +00:00
delangel	17db2e0e24	(forgot I hadn't committed this) - refactored IndelStatistics module and added a new inner class to compute Indel classification along with other statistics. So, we now get an extra table specifying, per sample, counts of whether indels are: - Repeat Expansions - Novel sequence And for indels of size <=2 we get a per-mononuc. or dinuc. breakdown of novels and expansions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4828 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 17:43:43 +00:00
chartl	cf75caf653	java changes: VariantEvalWalker's logger is made public, so that variant eval modules can access it through the parent object. DesignFileGenerator comment lists how best to bind things to it, and the feature accessor is better refined to grab the genome loc. (old change) scala changes: convenience addAll( List[CommandLineFunction] ) added to QScript class (and thus removed from the fCPV2) useful command line functions added to a new library package for command line functions (these are fast simple VCF command lines) bug fixed in ProjectManagement for the class where there's only one batch to be batch-merged (not really part of the use-case, but an edge-condition that came up during pipeline testing) first draft of a private mutations pipeline which will be elaborated in future git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4823 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-12 05:10:45 +00:00
depristo	abd6ce1c77	A TiTv-free approach for cutting variants! Apparently much better than previous approach, and will work for indels and SV with truly minor modifications to the code. Will discuss with methods group on Monday. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4822 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-11 23:08:13 +00:00

4080 Commits (7b92cd5008a3c58f17a7523abd07c1ed764d4e98)