Mauricio Carneiro
a640afa995
adding some directories to gitignore
2012-09-27 11:09:41 -04:00
Mauricio Carneiro
3e68fee764
Removed the IntelliJ files from the root and made an example package for new users. This lets users start on the same page and then change it as they see fit without interfering with the repo (thanks Guillermo!)
2012-09-27 11:04:56 -04:00
Christopher Hartl
abbe757907
Merged bug fix from Stable into Unstable
2012-09-27 00:15:35 -04:00
Christopher Hartl
55cdf4f9b7
Commit changes in Variants To Binary Ped to the stable repository to be available prior to next release.
2012-09-27 00:13:32 -04:00
Mauricio Carneiro
b9dab068ee
New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning
2012-09-26 16:16:53 -04:00
Mauricio Carneiro
f8b954334e
Revised implementation of the RAWBAM => BAM pipeline
...
stripped out all the FQ pipeline and tumor/normal information.
2012-09-26 13:37:15 -04:00
Mark DePristo
33b2f65bbd
Script to evaluate SNP and indel calls for the experimental downsampler
...
-- Calls NA12878 with and without the expt. downsampler on chr1
-- Creates combined vcf, annotating sites as overlapping omni SNPs and Mills indels
-- Creates simple combined.table that has chr, pos, set, and type to easily ID missed good sites with the new downsampler
2012-09-26 11:33:06 -04:00
Ryan Poplin
f009424952
Adding Phase2 HC calling qscripts for both the original calls and the project consensus.
2012-09-26 11:26:24 -04:00
Ryan Poplin
e49fe74612
Adding some of the qscripts from my BQSR experiments.
2012-09-26 10:55:34 -04:00
Mauricio Carneiro
c9c2682f86
removing annoying xml from IDEA configuration
2012-09-25 17:18:44 -04:00
Mauricio Carneiro
9486131d17
First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
...
Not ready for prime time yet, need more work!
2012-09-25 17:15:42 -04:00
Mauricio Carneiro
cb8d4c97e1
First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
...
not ready for prime time yet!
2012-09-25 17:13:50 -04:00
Mauricio Carneiro
65b100f9b0
Reverting the DPP to the original version, going to create a new simplified version for CMI in private.
2012-09-25 12:02:34 -04:00
Mauricio Carneiro
4324bd72fd
Updating IntelliJ environment and adding Scala
2012-09-25 10:51:53 -04:00
Mark DePristo
e1524ebbc8
NA12878 HiSeq b37 20:10-11 mb test files
2012-09-25 08:54:02 -04:00
Eric Banks
caa431c367
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-24 21:46:36 -04:00
Eric Banks
11a71e0390
RR bug: when determining the most common base at a position, break ties by which base has the highest sum of base qualities. Otherwise, sites with 1 Q2 N and 1 Q30 C are ending up as Ns in the consensus. I think perhaps we don't even care about which base has the most observations - it should just be determined by which has the highest sum of base qualities - but I'm not sure that's what users would expect.
2012-09-24 21:46:14 -04:00
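
The tie-breaking rule described in the commit above can be sketched as follows. This is a minimal illustration, not the actual ReduceReads code: the class name ConsensusBaseSketch, the method mostCommonBase, and the array-based inputs are all hypothetical. It picks the base with the most observations and, on a tie, the base with the highest sum of base qualities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConsensusBaseSketch {
    /**
     * Hypothetical helper (not the GATK API): returns the most frequently
     * observed base, breaking ties by the highest sum of base qualities.
     */
    public static char mostCommonBase(char[] bases, int[] quals) {
        // base -> {observation count, sum of base qualities}
        Map<Character, int[]> stats = new LinkedHashMap<>();
        for (int i = 0; i < bases.length; i++) {
            int[] s = stats.computeIfAbsent(bases[i], b -> new int[2]);
            s[0]++;
            s[1] += quals[i];
        }
        char best = 'N';
        int bestCount = -1;
        int bestQualSum = -1;
        for (Map.Entry<Character, int[]> e : stats.entrySet()) {
            int count = e.getValue()[0];
            int qualSum = e.getValue()[1];
            // More observations wins; equal observations fall back to quality sum.
            if (count > bestCount || (count == bestCount && qualSum > bestQualSum)) {
                best = e.getKey();
                bestCount = count;
                bestQualSum = qualSum;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The case from the commit message: one Q2 N and one Q30 C.
        System.out.println(mostCommonBase(new char[]{'N', 'C'}, new int[]{2, 30}));
    }
}
```

With one Q2 N and one Q30 C the counts tie, so the quality sum decides and the sketch returns C rather than N. The alternative the commit message raises (ignoring counts and using only the quality sum) would amount to dropping the count comparison.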
Mauricio Carneiro
4aad135f8c
Generic input file name recognition (still need to implement support for FastQ, but it can now at least accept it)
2012-09-24 17:01:17 -04:00
Mauricio Carneiro
ca84586443
Adding default intellij configuration files
2012-09-24 16:15:57 -04:00
David Roazen
3f44b3e019
Update DataProcessingPipelineTest MD5s
2012-09-24 15:38:07 -04:00
David Roazen
0b488cce66
ExperimentalReadShardBalancer: close() exhausted iterators
...
Fixes a truly awful SAMReaders resource leak reported by Eric -- thanks Eric!
2012-09-24 14:52:59 -04:00
Mark DePristo
9fd30d6f1c
When writing the initial commit for nt + nct I realized this class was really just a ThreadGroupOutputTracker
...
-- The code is cleaner and the logic more obvious now.
2012-09-24 14:15:36 -04:00
Mark DePristo
3e8d992828
Remove bad error test from MicroScheduler, as it's no longer applicable.
2012-09-24 14:15:36 -04:00
Mark DePristo
a6b3497eac
Fixes GSA-515 Nanoscheduler GSA-577 -nt and -nct together appear to not close resources properly
...
-- Fixes monster bug in the way that traversal engines interacted with the NanoScheduler via the output tracker.
-- ThreadLocalOutputTracker is now a ThreadBasedOutputTracker that associates via a map from a master thread -> the storage map. Lookups occur by walking through threads in the same thread group, not just the thread itself (TBD -- should have a map from ThreadGroup instead)
-- Removed unnecessary debug statement in GenomeLocParser
-- nt and nct officially work together now
2012-09-24 14:15:35 -04:00
Mark DePristo
4749fc114f
Temp. disable -nt > 1 and -nct > 1 while bugs are worked out
2012-09-24 14:15:35 -04:00
Mark DePristo
847e79247d
Use SNP only model for NCT big exome test
2012-09-24 14:15:35 -04:00
Mark DePristo
f42e55c9df
Add NCT specific performance test for 5K exomes
2012-09-24 14:15:35 -04:00
Mark DePristo
09bbd2c4c3
Include exception in VCFWriter when one is found when rethrowing as ReviewedStingException
2012-09-24 14:15:35 -04:00
Mark DePristo
10a6b57be6
Fix thread name: should be master executor not input
2012-09-24 14:15:35 -04:00
Eric Banks
9464dfdbf2
Don't penalize the reduced reads for spanning deletions (when surrounding base quals are Q2s)
2012-09-24 14:06:07 -04:00
Ami Levy Moonshine
f98b1d38b5
just lines indentation
2012-09-24 13:47:09 -04:00
Eric Banks
6a73265a06
RR bug: we were adding synthetic reads from the header only before the variant region, which meant that reads that overlap the variant region but that weren't used for the consensus (because e.g. of low base quality for the spanning base) were never being used at all. Instead, add synthetic reads from before and spanning the variant region.
2012-09-24 13:29:37 -04:00
Eric Banks
ef680e1e13
RR fix: push the header removal all the way into the inner loops so that we literally remove a read from the general header only if it was added to the polyploid header. Add comments.
2012-09-24 11:14:18 -04:00
Eric Banks
1509153b4b
Adding my little walker to assess reduced bam coverage against the original bam because it's turning out to be very useful.
2012-09-23 00:47:40 -04:00
Eric Banks
0187f04a90
Proper fix for a previous RR bug fix: only remove reads from the header if they were actually used in the creation of the polyploid consensus.
2012-09-23 00:39:19 -04:00
Eric Banks
74bb4e2739
Fixing the VariantContextUtilsUnitTest
2012-09-22 23:24:55 -04:00
Eric Banks
344083051b
Reverting the fix to the generalized ploidy exact model since it cannot handle it computationally. Will file this in the JIRA.
2012-09-22 23:07:28 -04:00
Eric Banks
25e3ea879a
Oops, missed this test before when updating md5s
2012-09-22 22:16:35 -04:00
Eric Banks
ced652b3dd
RR bug: we need to call removeFromHeader() for reads that were used in creating a polyploid consensus or else they are reused later in creating synthetic reads. In the worst case, this bug caused the tool to create 2 copies of the reduced read.
2012-09-22 21:50:10 -04:00
Eric Banks
60b93acf7d
RR bug: we need to test that the mapping and base quals are >= the MIN values and not just >. This was causing us to drop Q20 bases.
2012-09-22 21:32:29 -04:00
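
A minimal sketch of the boundary condition described in the commit above. The class and method names are hypothetical, and the threshold of 20 in the example is an assumption inferred from the "Q20 bases" wording, not a value taken from the source.

```java
public class MinQualSketch {
    /**
     * Hypothetical filter: a base is kept when both qualities are at least
     * the minimum. Using '>' instead of '>=' would drop bases that sit
     * exactly on the threshold.
     */
    public static boolean passes(int baseQual, int mappingQual,
                                 int minBaseQual, int minMappingQual) {
        return baseQual >= minBaseQual && mappingQual >= minMappingQual;
    }

    public static void main(String[] args) {
        // Assumed thresholds for illustration only.
        System.out.println(passes(20, 20, 20, 20));
        System.out.println(passes(19, 20, 20, 20));
    }
}
```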
David Roazen
f6a22e5f50
ExperimentalReadShardBalancerUnitTest was being skipped; fixed
...
TestNG skips tests when an exception occurs in a data provider,
which is what was happening here.
This was due to an AWFUL AWFUL use of a non-final static for
ReadShard.MAX_READS. This is fine if you assume only one instance
of SAMDataSource, but with multiple tests creating multiple SAMDataSources,
and each one overwriting ReadShard.MAX_READS, you have a recipe for
problems. As a result of this the test ran fine individually, but not as
part of the unit test suite.
Quick fix for now to get the tests running -- this "mutable static"
interface should really be refactored away though, when I have time.
2012-09-22 01:56:39 -04:00
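
The "mutable static" hazard described in the commit above can be illustrated with a small sketch. The classes below are hypothetical stand-ins, not the real ReadShard or SAMDataSource: each data source overwrites a shared non-final static on construction, so whichever instance was created last silently changes the value seen by every other instance.

```java
public class MutableStaticSketch {
    /** Stand-in for a shard class with a non-final static limit. */
    public static class Shard {
        public static int MAX_READS = 10000; // shared by every user of the class
    }

    /** Stand-in for a data source that overwrites the shared static. */
    public static class DataSource {
        public DataSource(int maxReads) {
            Shard.MAX_READS = maxReads; // side effect visible to all other instances
        }

        public int effectiveMaxReads() {
            return Shard.MAX_READS;
        }
    }

    public static void main(String[] args) {
        DataSource first = new DataSource(100);
        DataSource second = new DataSource(5);
        // 'first' was configured with 100 but now observes the value set by 'second'.
        System.out.println(first.effectiveMaxReads());
        System.out.println(second.effectiveMaxReads());
    }
}
```

A test that passes on its own can fail inside a suite for exactly this reason: another test constructs its own data source first and leaves a different value behind. Holding the limit in a per-instance final field would be one way to avoid the interference.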
David Roazen
e077347cc2
Re-allow running the GATK with experimental downsampling
...
It's now possible to run with experimental downsampling enabled
using the --enable_experimental_downsampling engine argument.
This is scheduled to become the GATK-wide default next week after
diff engine output for failing tests has been examined.
2012-09-21 23:20:46 -04:00
David Roazen
34eed20aa6
PerSampleDownsamplingReadsIterator: fix for incorrect use of DOWNSAMPLER_POSITIONAL_UPDATE_INTERVAL
...
Notify all downsamplers in our pool of the current global genomic position every
DOWNSAMPLER_POSITIONAL_UPDATE_INTERVAL position changes, not every single
positional change after that threshold is first reached.
2012-09-21 22:43:39 -04:00
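
The intended notification cadence in the commit above can be sketched with a simple counter. This is an illustration under assumptions: the class name, the counter reset, and the notification stand-in are hypothetical and do not reproduce PerSampleDownsamplingReadsIterator. The point is that the counter is reset after each notification, so updates fire once per interval rather than on every change after the threshold is first crossed.

```java
public class PositionalUpdateSketch {
    private final int updateInterval;
    private int changesSinceLastUpdate = 0;
    private int notifications = 0;

    public PositionalUpdateSketch(int updateInterval) {
        this.updateInterval = updateInterval;
    }

    /** Call once for every change in the current genomic position. */
    public void onPositionChange() {
        changesSinceLastUpdate++;
        if (changesSinceLastUpdate >= updateInterval) {
            notifications++;            // stand-in for notifying every downsampler in the pool
            changesSinceLastUpdate = 0; // without this reset, every later change would notify
        }
    }

    public int getNotifications() {
        return notifications;
    }

    public static void main(String[] args) {
        PositionalUpdateSketch sketch = new PositionalUpdateSketch(3);
        for (int i = 0; i < 10; i++) {
            sketch.onPositionChange();
        }
        System.out.println(sketch.getNotifications());
    }
}
```

With an interval of 3 and 10 positional changes, the sketch notifies after changes 3, 6, and 9.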
David Roazen
133085469f
Experimental, downsampler-friendly read shard balancer
...
-Only used when experimental downsampling is enabled
-Persists read iterators across shards, creating a new set only when we've exhausted
the current BAM file region(s). This prevents the engine from revisiting regions discarded
by the downsamplers / filters, as could happen in the old implementation.
-SAMDataSource no longer tracks low-level file positions in experimental mode. Can strip
out all related code when the engine fork is collapsed.
-Defensive implementation that assumes BAM file regions coming out of the BAM Schedule
can overlap; should be able to improve performance if we can prove they cannot possibly
overlap.
-Tests a bit on the extreme side (~8 minute runtime) for now; will scale these back
once confidence in the code is gained
2012-09-21 22:17:58 -04:00
Guillermo del Angel
ab8fa8f359
Bug fix: AlleleCount stratification in VariantEval didn't support higher ploidy and was producing bad tables
2012-09-21 20:48:12 -04:00
Eric Banks
dcd31e654d
Turn off RR tests while I debug
2012-09-21 17:26:00 -04:00
Eric Banks
21251c29c2
Off-by-one error in sliding window manifests itself at end of a coverage region dropping the last covered base.
2012-09-21 17:22:30 -04:00
Mauricio Carneiro
2c3dc291c0
Added positive/negative strand to the synthetic reads
2012-09-21 10:00:48 -04:00
Mauricio Carneiro
51cb5098e4
Fixed the alignment issues with reads that started with empty consensus headers
2012-09-21 10:00:47 -04:00
Mauricio Carneiro
aa1d2f3a5b
Not every consensus is well aligned. Need to check more, but starting position has been fixed.
2012-09-21 10:00:45 -04:00