Commit Graph

3548 Commits (e02aac07439a1e2e565fd235255afa28df36acdb)

Author SHA1 Message Date
chartl 5889138f4a *facepalm*
forgot to add the samples to the header. How could the VCFWriter let me get away with something so boneheaded?!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4513 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 05:36:29 +00:00
chartl 2bc5971ca1 Added - a tool to fix reference bases of a VCF. The OMNI had a couple of sites with incorrect reference bases (look to be legacy from other chips), and a few more that had ref and alt flipped. GAP should probably take care of it, but since I need results by monday, I'm doing it.
Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC. 

**IMPORTANT** I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed -- it's unclear what to do when the header specifies AC of being one class, but recalculating it casts to another class, and I'm not sure what to do.

I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 03:18:01 +00:00
ebanks 7aa030a9a4 Hmm. Apparently variants can get lifted over to different chromosomes. Who knew? Reverting changes from a couple of days ago. The only way to do this correctly (without requiring lots of memory) is to turn off on-the-fly indexing for this walker. Integration tests cover this now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4510 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 02:54:12 +00:00
chartl 8b2d387643 Added in an eval module that calculates the dispersion histograms between eval and comp (e.g. M_{i,j} = # of times eval observed to have AC i, comp AC j -- for af it's i/100 vs j/100 )
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4507 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 19:07:43 +00:00
ebanks f78ff08e2b This is less correct than my previous change but it's what UGv1 does and now is not the right time to start mucking with things.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4506 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 18:56:45 +00:00
ebanks 471c18054f Fix for SB calculation: the best overall AF might not have any mass when just looking at reads from a single strand. We need to compute the best AF for each stratification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4505 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 17:51:18 +00:00
asivache 42c3d74432 bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4503 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 16:27:40 +00:00
chartl c9d473edee More changes to Variant Eval and Genotype Concordance (passes all integration tests):
1: -sample can now include a file, which will be parsed for sample-name entries
2: If you request a sample to run analysis on, but it is not present in any of your RODs, VEW will exception out
3: Change added to parse Integer, String, and List<Integer> type Allele Count annotations (error otherwise)
4 [slightly problematic]: The count objects now maintain row-keys in order, as the keys were taking an inordinate amount of time in onTraversalDone (multiple calls to getRowKeys(), so many multiple sorts of the same underlying unsorted object, very bad)

There is a legacy comparison object which is unused which I will strip out soon.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4502 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 12:40:36 +00:00
ebanks 9f54170dff Hooking up the liftover tool to the new on-the-fly sorting VCF writer so that records can now get emitted in order.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4499 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 07:27:01 +00:00
ebanks d41c252b13 Looking over the calling results with Ryan, it's clear that while the grid search optimization (ignoring samples that are clearly ref) can work for assigning genotypes, it cannot be used for calculating P(AF>0). There's too much area under the likelihood curve that gets lost and the QUALs are negatively affected. However, testing showed that this only slightly affects runtime (~15 minutes per 1Mbase for the 1kg allpops). The optimization does remain for genotyping.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4498 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 19:06:32 +00:00
ebanks 2606e67cf1 Reverting Matt's change from yesterday which I accidentally blew away when trying to cope with the stupid svn update issues we've been plagued with recently.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4495 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 14:40:42 +00:00
ebanks cfb33d8e12 Filtering optimizations are now live for UGv2. Instead of re-computing filtered bases at every locus, they are computed just once per read and stored in the read itself. Eyeballing the results on the ~600 sample set from 1kg, we cut out ~40% of the runtime! QUALs are now sometimes different from UGv1 because I noticed a bug in v1 where samples with spanning deletions only were assigned ref calls instead of no-calls which ever so slightly affects the QUAL. Not a big deal though.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4494 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 05:04:28 +00:00
chartl 4ac636e288 Minor change: when tabulating concordance by AC, ignore sites with multiple segregating alleles in the population, at least for now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4493 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 01:35:33 +00:00
chartl 7c9ef59d65 This is simultaneously a minor and major change to VariantEval, so take heed:
The core walker has been modified so that when variant contexts (eval and comp) are subset to command-line-specified sample(s), the chromosome count annotations (AC/AN/AF) are altered to reflect the AC/AN/AF of only those samples involved in the comparison. No more getting AC500 when you're comparing a 10-sample overlap. Interestingly enough, this didn't break any integration tests.

GenotypeConcordance now has two additional tables: Allele Count Statistics, and Allele Count Summary Statistics. These work exactly identically to the Sample Statistics and Sample Summary Statistics tables, except that the partition being used is no longer the sample, but instead the allele count of the variant sites. These tables stratify by both eval and comp ACs, e.g.

evalAC0
evalAC1
evalAC2
compAC0
compAC1
compAC2

Differences with previous integration tests were verified to only be in the Allele Count tables (by grepping them out of the diff); a new test has been added for the simple case of an AC=1 site in the eval becoming an AC=2 site in the comp.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4491 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 22:26:15 +00:00
hanna 83b8676b69 Hack to fix mysterious disappearing read attributes. Ultimately caused
by the fact that the GATKSAMRecord, by design, needs to both inherit from 
SAMRecord and wrap a 'member' SAMRecord, and method calls that aren't
implemented as explicit passthroughs can compromise the content of the
SAMRecord in subtle ways.

Will be automatically fixed when Picard moves to a lightweight SAMRecord
interface rather than the current heavyweight implementation.  But in 
the short-term, there's no obvious fix.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4489 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 19:06:54 +00:00
depristo da29fcdb68 No longer writes the index to disk twice. But fixes for closing VCFWriters throughout the codebase
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4488 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 14:26:06 +00:00
aaron 28a1020c89 comment out debugging line that was clogging the performance test output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4487 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 03:26:55 +00:00
aaron 272ac2ae4a more fixes for tests broken by indexing-on-the-fly; I think this should do it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4486 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 01:54:32 +00:00
hanna ed39af53cd Fix for exception when trying to load reference segment for a read that aligns
to 0 bases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4485 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 23:50:51 +00:00
ebanks fe9f128631 Better fix for earlier bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4484 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 19:21:33 +00:00
aaron ff0df1a2da A fix for an integration test that was broken by on-the-fly indexing. Also, better reporting of Tribble exceptions in GATK integration tests. Trying to get the tests back up and running...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4483 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 18:39:56 +00:00
ebanks 69652e08c6 Bug fix for reads that completely fall within an insertion: the I cigar string element was 1 base too long.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4482 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 14:46:21 +00:00
kiran f348ca2976 Now processes VCF files with repeated loci without crashing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4481 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 04:36:07 +00:00
ebanks fd8351cd49 Get rid of useless test/'optimization' that was carried over from UGv1. New codde is (minimally) faster with same results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4478 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-11 04:04:07 +00:00
ebanks f28523e7de Implemented SB for UGv2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4477 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-11 03:56:01 +00:00
hanna 7008a469dc Update MalformedReadFilter to pass reads that have cigar strings like 40S36I
that have 0 aligned bases in the genome.  We'll have to fix walkers as faults
appear.

Also added JIRA GSA-406: finer-grained control of MalformedReadFilter: want
to exception out by default in these cases but pass them with a warning with
a corresponding -U flag.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4476 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-11 03:01:04 +00:00
ebanks 530875817f Experimental code for better filtering of bases in sam records. Not hooked up yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4475 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-11 02:19:51 +00:00
ebanks a0de269c4b Better message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4474 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-10 20:11:51 +00:00
rpoplin 0a4cf02a52 Fix for index out of bounds exception in VR.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4473 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-10 17:35:15 +00:00
depristo 116309b3c3 More test cases for UG integration test. We currently fail doing multi-threaded gzip output, FYI
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4472 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 20:22:12 +00:00
depristo 38a67fed63 High performance version of standard vcf writer. New general static Tribble class for common constants, including general .idx constant and functions to get standard index name for a given file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4471 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 19:53:21 +00:00
fromer bdd3a9752e Changed min MQ and BQ to 20 (for phasing)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4469 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 19:27:45 +00:00
asivache 05500d1a8d An iterator wrapper/adapter: takes GenomeLoc iterators 1 and 2 and traverses intersections of intervals from 1 with intervals from 2. Both 1 and 2 must be SORTED and NON_OVERLAPPING, but this iterator does NOT perfrom any checks, so if these conditions are not met, the behavior is unspecified
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4468 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 16:34:00 +00:00
asivache 253d528e49 not ready for commit yet
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4467 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:30:55 +00:00
asivache 4f2f33b42a fix method invocation to conform to new API; this version of the code will compile but new functionality is still not fully in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4466 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:30:26 +00:00
asivache cece19d4d2 not ready for commit yet
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4465 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:14:54 +00:00
asivache 39e373af6e deleting accidentally committed junk
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4464 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:13:01 +00:00
asivache b3d81984aa renaming MergingIterator to RODMergingIterator as it is more appropriate for this specialized implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4462 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 14:10:11 +00:00
asivache 77dddd0afa renaming MergingIterator to RODMergingIterator as it is more appropriate for this specialized implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4461 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 14:08:28 +00:00
chartl 21ec44339d Somewhat major update. Changes:
- ProduceBeagleInputWalker
 + Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present
 + Takes a bootstrap argument -- can use some given %age of the validation sites
 + Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap
-BeagleOutputToVCFWalker
 + Now filters sites where the genotypes have been reverted to hom ref
 + Now calls in to the new VCUtils to calculate AC/AN

-Queue
 + New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype
 + full calling pipeline v2 uses the above libraries
 + minor changes to some of my own scripts
 + no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 13:30:28 +00:00
ebanks 97b153f2fa Quick fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4457 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 06:10:52 +00:00
ebanks acd238f3f2 For Chris: pull out the chromosome counting code into VCUtils so that other tools can make use of it. Transitioned SelectVariants over to use it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4456 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 04:37:54 +00:00
delangel 3838823262 Two ugly hopefully temporary fixes for new genotyping model:
a) In Indel genotyper: we can't deal yet with extended events correctly and we are still triggering at each extended event which results in repeated records on a vcf. So, to avoid this, keep track of start position of candidate variantes we've visited and if we've visited a variant before we don't do it again.
b) Avoid infinite terms in QUAL and in genotype likelihoods which can happen if posterior AF happens to be exactly zero. For now, hard-code a minimum value of each term of the posterior AF likelihood to be -300 (ie 1e-300 in lin space). This can be solved with better and smarter log-to-lin conversions and some precision fixes in AF calculation.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4455 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 00:53:16 +00:00
rpoplin 0de658534d Removed the qScale arguments in VariantRecalibrator. It is smarter about how it tries to find a cut so the arbitrary scale factor hopefully is no longer necessary. Now the recalibrated variant quality score more accurately reflects our believed lod of the call.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4451 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-07 18:04:57 +00:00
fromer ee00dcb79d 1. Phasing now ignores bases without minimum base quality (BQ) and minimum mapping quality (MQ); 2. The probability of a non-called base is now divided by 3, to evenly split up the error probability over the non-called bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4450 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-07 17:40:59 +00:00
fromer f8f1cc45a3 Now ReadBackedPhasing caps Base Quality by Mapping Quality
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4445 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:48:57 +00:00
scalvo bda427f078 Change specification of AnnotationInputTable, and fix 2 bugs.
Previous output spec contained 3 columns:
 haplotypeReference,haplotypeAlternate,haplotypeStrand
where haplotypeReference was always on the + strand, and haplotypeAlternate was on the strand specified by haplotypeStrand.

The new specification contains 3 columns:
 haplotypeReference,haplotypeAlternate,transcriptStrand
where haplotypeRef and haplotypeAlt are required to be on the + strand.  transcriptStrand now specifies the strand of the transcript, which is needed for interpreting the haplotypes.

Bugfix #1: fix incorrect assignment of variantCodon and variantAA
(Previously variantCodon was incorrectly set to referenceCodon)

Bugfix #2: fix incorrect codingCoordStr values for - strands (bug reported by Giulio Genovese), and incorrect usage of "m." for mitochondrial transcripts (bug reported by Steve Hershman)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4444 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:46:09 +00:00
scalvo b5c127e643 Removed HAPLOTYPE_STRAND_COLUMN; Previously, GenomicAnnotation allowed a user to specify the strand of the haplotypeAlternate, and would reverseComplement the haplotypeAlternate if HAPLOTYPE_STRAND_COLUMN was "-". The new specification does not allow this functionality, and instead requires both the reference and the alternate haplotypes to be on the + strand (as in VCF format).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4443 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:37:41 +00:00
kshakir ca5db821ce Added the ability to Queue to run scala functions inside the JVM. NOTE: Extend from InProcessFunction instead of CommandLineFunction to use this functionality.
Queue now submits new LSF jobs only after previous functions have completed successfully.
When the Queue process is shutdown (ex: via Control-C) sends a bkill command for any running jobs.
Ported commands like creating directories and scatter/gather interval list to scala functions.
Updates to LSF status tracking by porting the python to internally generated bash scripts.
Temporarily disabled job name submission to LSF.  Plus side is that the full command is now available in "bjobs -w".  TODO: Put back jobName passing to LSF based on an option?
Changed BaseTest to allow scala to access paths to references.
Changed the extension generator to default the analysis name to the walker "name".

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 18:29:56 +00:00
ebanks 3c5dc675ab For Guillermo: only decide that something is a clear reference call if it is at least 10 times as likely as the next best genotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4441 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 15:16:41 +00:00
depristo 00491fcd2e Only see not writing GATK Run Report if you are running with debug enabled
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4437 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:09:21 +00:00
rpoplin 69485d6a7a Added command line argument for the max value of the allele count prior in VariantRecalibrator (--max_ac_prior). Default value increased to 0.99 from 0.95.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4436 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:00:53 +00:00
ebanks 3d564f4a29 reverting an accidental change from the dindel merge
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4434 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 03:08:09 +00:00
ebanks b5e148140b Officially fixed the UG priors; updated the default min MQ/BQs to pipeline values of q20 and min calling threshold to Q50
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4431 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 18:35:36 +00:00
fromer c6668bd49c Fixed bug in phasing, where mapping probability was incorrectly raised to the power of number of non-null bases [instead, it is just multiplied into phasing probability once]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4430 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 17:07:31 +00:00
hanna 250c18e679 Error message fixes for the following issues:
nvjpM4yOwQAu3fNGxi4oXLuVpKn6aAlf,1GL0OuXK2xKQfvbu34tWYgbojSVSLo0l,
ehEGBJOfgc4V7qj8W0Homf5ICuVK5Sm3,cZsreLm1CbY3aYKZhV7DOSvQNwur41zp,
GlrlyGEyP9kJDIRCQNFQp7BGJBXSzdDJ,hyz1uiHXr39ANmdZu9K1epOSX8EL3mDw,
q0n4EucZESCI4LZhQik306zD4VAuH2cb.  

Messages:
camrhG5tHzlY9WUSEVpVZGkU1tyJqKb5,s0OX2g7nYRctJxyFoQCa6clac9IsjHyi,
THIAtjllvYNlnTmiMnJEIHd2Ju4gqQIO,jwVk3JYZJNHloW7HO4LeGxFexknqro0v,
BFNRGOGmGGJNNPZqgeF1ikTNFfskbyLc,...

Were fixed in 4392.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4428 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 03:37:13 +00:00
ebanks aa00801108 remove reference to -mrl
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4423 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 17:27:01 +00:00
chartl f978c25b9d Perhaps both, Eric. Perhaps both.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4422 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 13:56:04 +00:00
chartl 0eb777612a Swap "." over to VCFConstants.MISSING_DEPTH_v3
Why v3, you ask? Why not? Simply because v2 was a String so old and clunky, the sun would fizzle out and grow cold before any VCF could be successfully parsed.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4421 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 13:41:41 +00:00
chartl 74087c44ae Fixed a bug which caused a parsing exception when there was a variant with a dp field of ".", e.g. "GT:DP 0/1:." -- which can happen when using imputation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4420 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 12:37:36 +00:00
ebanks 6448753cf7 Removed the SequenomValidationConvertor and renamed it VariantValidationAssessor since it no longer handles ped/sequenom files (but instead works on vcfs/variantcontexts). Updated all of the wiki docs, including adding instructions on how to convert ped files to vcf, a la Shaun Purcell. We now officially no longer support ped files everyone. Other misc cleanup in the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4419 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 02:11:38 +00:00
ebanks d8db48204e Fix typo and tell people not to post user errors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4415 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-03 18:58:03 +00:00
ebanks 490e5e1b0f Better error when bad ref bases are provided
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4414 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-03 05:40:37 +00:00
aaron 64b7b3f83b fix for a recent change to the indexing code where we ignore the results of locking the file (this is bad), and as a result don't write the index; this should fix the build.
Off to Yosemite in 4 hours, enjoy the week gsa folks!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4410 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 04:35:11 +00:00
depristo 7551ba8249 Trival refactoring in preparation for on-the-fly indexing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4409 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 22:32:59 +00:00
rpoplin 2f7892601c Useful debugging argument added to VariantRecalibrator to only use sites whose qual field is above --qual
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4406 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 21:08:55 +00:00
hanna 575c38fc04 Accidental fail to commit missing file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4405 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 20:26:51 +00:00
delangel d4398f2686 silly bug fix: if I'm to do a short term hack to avoid -infinity likelihoods I might as well do it right.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4403 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:39:45 +00:00
hanna 8d25a5f9f2 A mechanism for supplying attribution text -- mainly useful for external
walkers.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4402 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:31:19 +00:00
delangel e920badcc4 Temporary fix for case where genotype likelihoods are exactly (1,0,0) or (0,1,0) etc. at a site with new indel genotyper: this would make us blow up when converting to log space and try to assign genotypes at a site. A more robust solution is in the works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4401 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 17:43:43 +00:00
rpoplin b83fdf8a17 Bug fix in AnalyzeAnnotations. Be sure the site is a biallelic, unfiltered SNP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4400 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 13:09:46 +00:00
delangel fa9c21c020 More fixes for exact AF calculation model in new unified genotyper:
a) Fixed bugs in new dynamic programming-based genotyper
b) Fixed up temp hack that handles extended pileups for now.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4398 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 02:32:50 +00:00
delangel eb67aee732 bug fix: forgot to uncomment code to compute genotype likelihoods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4397 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 21:38:22 +00:00
delangel ece694d0af Next iteration on new UG framework:
- Brought over exact AF estimation from branch (which is now dead). Exact model is default in UnifiedGenotyperV2.
- Implemented completely new genotyping algorithm given best AF estimate using dynamic programming, which in theory should be better than both greedy search and any HWE-based genotyper.
- Integrated and added new Dindel likelihood estimation model.
- Corrected annotators that would call readBasePileup: since we can be annotating extended events, best way is to interrogate context for kind of pileup and either readBasePileup or readExtendedEventPileup.

All changes above except last one are still in playground since they require more testing.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4396 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 21:33:59 +00:00
hanna bf7fd08810 Fix newly-introduced bug in the PluginManager/DynamicClassResolutionException
where, when the system can't find a plugin of the correct name, the system
prefers to crap all over itself and throw an unintelligible NullPointerException
rather than displaying an intelligent error.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4393 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 19:07:05 +00:00
hanna 14e19f4605 (Slightly) better exception text when SAM/BAM output file can't be created.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4392 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 18:43:22 +00:00
hanna 1fb8c86f6d Looks like we've got two competing models for an empty interval list: null and
the empty list.  Score another victory for the integration tests.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4391 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 17:11:47 +00:00
hanna 78343be52c At some time in the recent past, we lost our ability to process the '-L all'
argument.  Brought it back, and added an integrationtest to make sure it
stays around.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4390 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 15:58:43 +00:00
delangel e80742e72f Use -o as argument for output file in ProduceBeagleInputWalker, to be consistent with other walkers (you're welcome, chartl :)).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4386 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 22:46:39 +00:00
hanna 732aa32758 Every Sting app from now on will be forced into the US English locale.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4385 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 21:55:21 +00:00
fromer 20ffe484bc Added detection and INFO field marking of phasing inconsistencies (and optional filtration using --filterInconsistentSites)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4384 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 19:28:56 +00:00
rpoplin a6c7de95c8 By using the AC info field instead of parsing the genotypes we cut 78% off the runtime of VariantRecalibrator. There is a new argument to force the parsing of genotypes if necessary. Various other optimizations throughout.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4383 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 18:56:50 +00:00
ebanks 2d1265771f Fix for G: make sure to generate the genotype conformations in the grid for the target frequency when not using grid search for anything except the conformations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4382 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 16:44:53 +00:00
delangel 4556e3b273 First iteration in filling up exact AF calculation with new refactored UG. Code computes EM iterations of exact AF spectrum and returns to caller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4381 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 16:21:54 +00:00
ebanks 0d71dff928 Small bug fix to the new UG (need to initialize the entire posteriors array) means that we also get identical results as old UG when calling with 60 samples in the pilot1 data. Now that I'm happier with UGv2, I've transitioned it to use the correct AF priors instead of the busted ones still in the old UG.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4379 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 14:24:50 +00:00
hanna eee134baf2 Chris found a bug in the downsampler where, if the number of reads entering
the pileup at the next alignment start is large, we don't add as many of those
incoming reads as we should.  No integration tests were affected.

Thanks, Chris!


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4378 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 11:18:12 +00:00
ebanks 0ec07ad99a Initial version of refactored Unified Genotyper. Using SNP genotype likelihoods and GRID_SEARCH AF estimation models, achieves the exact same results as original UG on 1-2 samples with the exception of strand bias (not implemented yet); other than that I have no idea. Needs tons more testing. Do not use. For Guillermo only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4377 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 08:42:25 +00:00
kshakir 6df7f9318f For enums generate the full path to the Enum type to avoid collisions such as enum Model and enum Model used in the same class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4376 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 05:28:59 +00:00
fromer e322e71c2f Restored SVN history for phasing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4373 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 00:02:02 +00:00
fromer 720aaca8a0 Trying to restore SVN history for phasing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4372 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:50:28 +00:00
fromer bf88117ead Trying to restore SVN history for phasing directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4371 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:48:24 +00:00
fromer dfb5143a41 Restore folder
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4370 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:46:07 +00:00
fromer 7c909bef82 Moved phasing classes out of playground! The code is still under production, though...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4369 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:21:28 +00:00
fromer 8d8980e8eb Fixed phasing algorithm to: 1. More correctly weed out irrelevant reads and sites; 2. Crudely flag sites with large phase discrepancies betweens reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4368 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 23:02:53 +00:00
chartl 5a5c72c80d Accidentally commited some debug output to PackageUtils, reverting change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4367 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 21:58:42 +00:00
chartl 862c94c8ce Small change for Matt -- output partition types in lexicographic order.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4365 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 20:08:03 +00:00
ebanks 7ad87d328d Make sure to uppercase ref bases since they aren't coming from the engine
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4364 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 19:05:46 +00:00
bthomas 96cccafb0d Adding a few helper methods for accessing sample metadata, and associated unit tests. These are motivated by discussion with Ryan about how he'll use sample metadata in VariantEvalwalker - hopefully will make it easier for him. Methods are:
-- getToolkit().subContextFromSampleProperty(): filters a VariantContext to genotypes that come from samples that have a given property value
-- getToolkit().getSamplesWithProperty(): gets all samples with a given property
-- getToolkit().getSamplesFromVariantContext(): sample objects that are referenced by name in a VariantContext



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4361 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-28 02:16:25 +00:00
ebanks 1034853a84 Adding 'solexa' to list of known/supported platforms
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4357 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-27 02:38:38 +00:00
aaron 70f03a7113 first pass of well-formatted tribble exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4352 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 03:29:33 +00:00