Commit Graph

10888 Commits (9c63cee9fcdb69a7a8e8d77a771ddb2afa18f7cd)

Author SHA1 Message Date
Mark DePristo 06687bfaf6 Intermediate commit on simplifying AFCalcResult
-- Renamed old class AFCalcResultTracker.  This object is now allocated by the AFCalc itself, since it is heavy-weight and was badly optimized in the UG with a thread-local variable. Now, since there's already an AFCalc thread-local there, we get that optimization for free.
-- Removed the interface to provide the AFCalcResultTracker to getlog10PNonRef.
-- Wrote new, clean but unused AFCalcResult object that will soon replace the tracker as the external interface to the AFCalc model results, leaving the tracker as an internal tracking structure.  This will allow me to (1) finally test things exhaustively, as the contracts on this class are clear, and (2) finalize the IndependentAllelesDiploidExactAFCalc class, as it can work with a meaningfully defined result across each object
2012-10-15 07:53:56 -04:00
Mark DePristo c82aa01e0e Generalize testing infrastructure to allow us to run specific n.samples calculation 2012-10-15 07:53:55 -04:00
Mark DePristo ec935f76f6 Initial implementation and tests for IndependentAllelesDiploidExactAFCalc
-- This model separates each of N alt alleles, combines the genotype likelihoods into the X/X, X/N_i, and N_i/N_i biallelic case, and runs the exact model on each independently to handle the multi-allelic case.  This is very fast, scaling at O(n.alt.alleles x n.samples)
-- Many outstanding TODOs in order to truly pass unit tests
-- Added proper unit tests for the pNonRef calculation, which all of the models pass
2012-10-15 07:53:55 -04:00
Mark DePristo 5a4e2a5fa4 Test code to ensure that pNonRef is being computed correctly for at least 1 genotype, bi and tri allelic 2012-10-15 07:53:55 -04:00
Mark DePristo ee2f12e2ac Simpler naming convention for AlleleFrequencyCalculation => AFCalc 2012-10-15 07:53:55 -04:00
Mark DePristo cf3f9d6ee8 Reorganize and cleanup AFCalculations
-- Now contained in a package called afcalc
-- Extracted standalone classes from private static classes in ExactAF
-- Most fields are now private, with accessors
-- Overall cleaner organization now
2012-10-15 07:53:55 -04:00
Mark DePristo 13211231c7 Restructure and cleanup ExactAFCalculations
-- Now there's no duplication between exact old and constrained models.  The behavior is controlled by an overloaded abstract function
-- No more static function to access the linear exact model -- you have to create the surrounding class.  Updated code in the system
-- Everything passes unit tests
2012-10-15 07:53:54 -04:00
Mark DePristo 99ad7b2d71 GeneralPloidyExact should use indel max alt alleles 2012-10-15 07:53:54 -04:00
Mark DePristo bf276baca0 Don't try to compute full exact model for > 100 samples 2012-10-15 07:53:54 -04:00
Mark DePristo b924e9ebb4 Add OptimizedDiploidExactAF to PerformanceTesting framework 2012-10-15 07:53:54 -04:00
Mark DePristo f800f3fb88 Optimized diploid exact AF calculation uses maxACs to stop the calculation at the max AC for each allele
-- Added unit tests to ensure the approximation isn't so far from our reference implementation (DiploidExactAFCalculation)
2012-10-15 07:53:54 -04:00
Mark DePristo efad215edb Greedy version of function to compute the max achievable AC for each alt allele
-- walks over the genotypes in VC, and computes for each alt allele the maximum AC we need to consider in that alt allele dimension.  Does the calculation based on the PLs in each genotype g, choosing to update the max AC for the alt alleles corresponding to that PL.  Only takes the first lowest PL, if there are multiple genotype configurations with the same PL value.  It takes values in the order of the alt alleles.
2012-10-15 07:53:54 -04:00
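The greedy rule described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not the GATK implementation: the function names `pl_index_to_alleles` and `greedy_max_ac` are hypothetical, samples are assumed to be diploid, and PLs are assumed to follow the standard VCF genotype ordering (index = k*(k+1)/2 + j for genotype j/k with j <= k).

```python
from typing import List, Tuple


def pl_index_to_alleles(index: int) -> Tuple[int, int]:
    """Decode a diploid PL index into alleles (j, k) with j <= k.

    Assumes the VCF ordering, where genotype j/k sits at k*(k+1)/2 + j.
    """
    k = 0
    while (k + 1) * (k + 2) // 2 <= index:
        k += 1
    j = index - k * (k + 1) // 2
    return j, k


def greedy_max_ac(pls_per_sample: List[List[int]], n_alt: int) -> List[int]:
    """Greedy upper bound on the allele count (AC) of each alt allele.

    For each sample, take the first lowest PL (ties go to the earliest
    index) and add one to the max AC of every alt allele copy in that
    genotype. Allele 0 is the reference and is not counted.
    """
    max_ac = [0] * n_alt
    for pls in pls_per_sample:
        best = pls.index(min(pls))  # list.index returns the first match
        for allele in pl_index_to_alleles(best):
            if allele > 0:
                max_ac[allele - 1] += 1
    return max_ac


# Hypothetical PLs for 5 samples and 2 alt alleles,
# ordered 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
example_pls = [
    [0, 10, 20, 10, 20, 30],   # best genotype 0/0
    [10, 0, 20, 30, 40, 50],   # best genotype 0/1
    [50, 40, 0, 30, 20, 10],   # best genotype 1/1
    [30, 0, 20, 0, 10, 40],    # tie between 0/1 and 0/2, first one wins
    [50, 40, 30, 20, 0, 10],   # best genotype 1/2
]
example_max_ac = greedy_max_ac(example_pls, n_alt=2)
```

Because ties resolve to the earliest PL index, the fourth sample counts toward the first alt allele only, which matches the "first lowest PL" behavior the commit message describes.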
Mark DePristo 7666a58773 Function to compute the max achievable AC for each alt allele
-- Additional minor cleanup of ExactAFCalculation
2012-10-15 07:53:53 -04:00
Mark DePristo b3cb33a416 simple script to run nano schedule main[] 2012-10-15 07:52:02 -04:00
Guillermo del Angel a4767a20be Bug fixes for temp mutect integration 2012-10-13 22:03:41 -04:00
Guillermo del Angel e3a8ed2151 Further bug fixes to merge cancer/germline fastq-bam pipelines 2012-10-13 11:16:14 -04:00
Guillermo del Angel b961f78f49 Temp fixes 2012-10-12 16:14:43 -04:00
Kristian Cibulskis 661fa5b98c added support for indel calling (with non-VCF format output) 2012-10-12 16:02:05 -04:00
Eric Banks a8efa5451a Protect against bad bases when users have screwy data (or try to use zipped references) 2012-10-12 15:05:03 -04:00
Guillermo del Angel 7e1657d243 Merge branch 'unstable' of github.com:broadinstitute/cmi-gatk into unstable 2012-10-12 14:49:37 -04:00
Mauricio Carneiro 274ac4836f Allowing the GATK to have non-required outputs
Modified the SAMFileWriterArgumentTypeDescriptor to accept output bam files that are null if they're not required (in the @Output annotation).

This change enables the nWayOut parameter for the IndelRealigner and ReduceReads to operate optionally while maintaining the original single way out.

[#DEV-10 transition:31 resolution:1]
2012-10-12 14:49:16 -04:00
Mauricio Carneiro 05111eeaef Making nContigs parameter hidden in ReduceReads
For now, the het reduction should only be performed for diploids (n=2). We haven't really tested it for other ploidies, so it should remain hidden until someone braves it out.
2012-10-12 14:49:15 -04:00
Guillermo del Angel 32e377a0db Fix bugs so that we can pass in 2 simultaneous samples in metadata (no co-cleaning yet but at least we don't need to run pipeline twice) to produce 2 bams. Pasted temp mutect so it's also run at the end of the run 2012-10-12 14:39:28 -04:00
David Roazen da1cffbfca Run performance tests in gsa-engineering queue on gsa4 rather than gsa queue
Running the performance tests on the farm wasn't working out very well --
it's been too long since they've run to completion. Switching back to
running them on gsa4 for now.
2012-10-12 14:21:27 -04:00
Guillermo del Angel dc03a09722 Merge branch 'develop' into unstable 2012-10-12 14:19:42 -04:00
Kristian Cibulskis c1706ef0ef upgraded mutation caller with VCF output
raw indel calls (non-filtered, non-VCF)
2012-10-12 14:18:12 -04:00
Guillermo del Angel 5971006678 Bug fix when running nondiploid mode in UG with EMIT_ALL_SITES: if site was reference-only, QUAL is produced OK but genotypes were being set to no-call because of unnecessary likelihood normalization. May change integration test md5 which I'll fix later today 2012-10-12 12:45:55 -04:00
Eric Banks 81532a0529 Missing files are user errors. 2012-10-12 09:48:12 -04:00
Eric Banks fa77a83783 Update the out of space error to include another permutation 2012-10-12 09:38:12 -04:00
Eric Banks 85525d9e6e Make Geraldine's life easier: from now on we treat problems where a temp file cannot be found when running the GATK with multiple threads as User Errors (since they are 99.9% of the time). This is an extremely large class of errors in Tableau and on the forums. Helpful error message tells users exactly what we tell them on the forums anyways (Geraldine: feel free to edit). 2012-10-12 09:19:50 -04:00
Eric Banks ad60300bee Catch malformed BAM files at the source since this is the largest class of errors in Tableau. 2012-10-12 09:07:57 -04:00
Eric Banks 593c8065d9 Fix docs for BadMateFilter 2012-10-12 08:35:45 -04:00
Christopher Hartl 6b9987cf1b Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2012-10-12 00:48:42 -04:00
Christopher Hartl c1211ad3a1 Full test suite of LD-corrected GRM calculation. The correctness of this code is now largely verified. Matches GCTA when no correction is used (up to 6 decimal places). Bed reading relies on a particular test directory that is still local. The rest is all generated in unit test fashion. 2012-10-12 00:46:02 -04:00
David Roazen 3861212dab Fix inefficiency in FilePointer GenomeLoc validation
Validation of GenomeLocs in the FilePointer class was extremely inefficient
when the GenomeLocs were added one at a time rather than all at once.

Appears to mostly fix GSA-604
2012-10-11 19:55:14 -04:00
Guillermo del Angel 47e9d967fe Merging in from cmi-develop branch - staying in this branch for now 2012-10-11 15:35:43 -04:00
Guillermo del Angel 77949ec740 Some fixes to QC commands in pipeline, and workaround for critical engine bug in GATK that makes it hang when doing small targeted BAMs with a whole exome interval list 2012-10-11 15:08:30 -04:00
Ami Levy Moonshine ef3882f439 PhaseByTransmission: small typo
variantCallQC_summaryTablesOnly.R: small changes (more to come)
GeneralCallingPipeline.scala: the new pipeline script. It is not as clean as I want it to be, but it works. I am still going to work on it a little more. It does not yet include: (1) the RR step, (2) a better eval step, (3) other targets (it currently works on the CEU Trio)
2012-10-11 14:51:41 -04:00
Guillermo del Angel af5a6fdace Resolve [DEV-7]: add single-sample VCF calling at end of FASTQ-BAM pipeline. Initial steps of [DEV-4]: queue extensions for Picard QC metrics 2012-10-11 11:09:49 -04:00
Mark DePristo 9b19f5ce99 No longer include stack traces for user exceptions in GATK logs
-- Was taking a shockingly large amount of space on the server, and slowing down Tableau so much that all stack traces had to be disabled
2012-10-10 20:41:03 -04:00
Ryan Poplin 08b8ce6903 Fixing merge conflicts related to the comment formatting in the BQSR. 2012-10-10 16:03:58 -04:00
Ryan Poplin 45717349dc Fixing BQSR bug reported on the forum for reads that begin with insertions. 2012-10-10 16:01:37 -04:00
David Roazen 40a3b5bfe2 Revert "Testing github auto-mirroring attempt #2; please ignore"
This reverts commit aacbe369446af8d7901820bf828ed15d72497005.
2012-10-10 15:28:50 -04:00
David Roazen fba6a084e4 Testing github auto-mirroring attempt #2; please ignore 2012-10-10 15:28:13 -04:00
David Roazen 267d1ff59c Revert "Testing the new github auto-mirroring; please ignore"
This reverts commit bd8b321132167f6f393f234ea0e93edcfd8701ff.
2012-10-10 15:07:48 -04:00
David Roazen 66ee3f230f Testing the new github auto-mirroring; please ignore 2012-10-10 15:06:50 -04:00
Mauricio Carneiro e9eaa33c0b adding some directories to gitignore 2012-10-10 13:26:13 -04:00
Mauricio Carneiro 29195cd3aa Removed the intellij files from the root and made an example package for new users. This allows users to start at the same page and then change it as they see fit without interfering with the repo (thanks guillermo!) 2012-10-10 13:25:38 -04:00
Mauricio Carneiro fdf29503fb removing annoying xml from IDEA configuration 2012-10-10 13:25:38 -04:00
Mauricio Carneiro e29bcab42e Updating Intellij environment and adding Scala 2012-10-10 13:25:38 -04:00