Commit Graph

1167 Commits (d0ca6f8a9c187e523b28b9265a52d931d60bb4de)

Author SHA1 Message Date
chartl 84c2c5d7e6 Stop running away from my commits, test modules.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5919 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 13:05:53 +00:00
chartl 092952db44 After verifying that the changes to these tests were all in the RankSum annotations, I'm committing fixes to the test md5s.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5918 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 13:01:18 +00:00
chartl 511cd48d7a There is an edge case ( |Set1| = 5, |Set2| = 4) where the exact p-value exceeds the range of the normal distribution we want to invert. For the edge cases, this happens exactly at the mean, and so this can be safely replaced with a z value of 0.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5915 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 17:30:09 +00:00
chartl a79967d9af After extensive testing of MannWhitneyU:
- Verified that exact calculations do agree with R's dwilcox()
- Verified that exact calculations do not agree with R's wilcox.test
   + This is because R does a correction, and calculates CDFs rather than PDFs (e.g. sums over dwilcox() values)
- Can now specify MWU to calculate cumulative exact tests, rather than point probabilities
- Z-scores are now calculated properly for exact tests
   + Previously, z-values were calculated by inverting the normal CDF from the U-statistic PDF
   + Now both inversions are done, with a smart heuristic (biased variance) to make the point-calculated Z-value more accurate
- Additional tests



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5911 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 15:51:27 +00:00
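
The difference between point probabilities and cumulative exact tests described in the MannWhitneyU commit above can be illustrated with a minimal Python sketch. It uses the textbook counting recursion for the U statistic; the function names are hypothetical and this is not the MannWhitneyU code from the repository. The point probability here corresponds to what R's dwilcox() reports, and the cumulative version sums those values.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(u, m, n):
    """Number of orderings of m + n distinct values whose U statistic equals u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Condition on the largest value: if it belongs to the first set it adds n
    # to U, otherwise it adds nothing.
    return count_arrangements(u - n, m - 1, n) + count_arrangements(u, m, n - 1)


def point_probability(u, m, n):
    """Exact P(U == u) under the null hypothesis (a probability mass, not a CDF)."""
    return count_arrangements(u, m, n) / comb(m + n, m)


def cumulative_probability(u, m, n):
    """Exact P(U <= u), i.e. the sum of point probabilities up to u."""
    return sum(point_probability(k, m, n) for k in range(u + 1))
```

For set sizes 5 and 4, the distribution is symmetric around the mean U of 10, so the cumulative probability at the mean is slightly above 0.5 while the point probability there is much smaller.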
rpoplin 2b5683909e Updated VQSR integration tests because of the new Omni file. Fixed overflow condition in FisherStrand when the depth is too high.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5910 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 14:20:37 +00:00
ebanks 44cb7e4980 Renaming to make grepping through the output less confusing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5908 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-31 19:54:44 +00:00
rpoplin 9e834391fe We now skip over all covering RODs in the BQSR as intended instead of just those which can be converted into a VariantContext. All the integration tests change because of subtleties in how certain dbsnp rod records are being converted into VCs. Added integration test which uses a bed file as the list of known polymorphic sites.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5892 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 16:32:17 +00:00
depristo 8ed82e5a08 The previous version of the UG was always creating BAQ'd pileups for the underlying site QUAL calculation, which slowed the code down. As far as I can tell, the code never actually applied the BAQ'd base quality anywhere when the BAQ field wasn't in the read, so skipping the BAQ subsystem when we don't actually want the BAQ'd base qualities saves us 20% of the runtime when BAQ isn't enabled.
Fixed minor problem with WalkerTest for "" (for parameterization) md5s.
Added an explicit integrationtest for BAQ NONE
Now only creates the BAQ'd pileup if the useBAQPileup parameter is provided in initializeAlternateAllele.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5891 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 14:00:52 +00:00
depristo 136c8c7900 ClipReads now supports HARDCLIP_BASES, though in fact this turned out to be not necessary for my desired tests. In the process of developing the HARDCLIP mode, I added some proper ReadUtils unit tests, which would ideally be expanded to include other ReadUtils functions as they are added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5890 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 11:42:22 +00:00
delangel f7298f4a7f First of many baby steps to redo the way in which we trigger events for indel calling and to eliminate extended events: get rid of SpanningDeletions annotation for indels. It's completely useless, and even more so once we no longer trigger at extended events (because we'll trigger by definition a base before a deletion starts, so deletions present in the current pileup are not informative).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5876 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 00:49:23 +00:00
depristo 1bd1404aa9 Sometimes md5s can be null
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5867 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 19:17:18 +00:00
depristo e582a92af6 WalkerTest now checks for valid md5s in the integrationtests themselves, so no more stray whitespace errors. Added a WalkerTestTest to ensure that bad MD5s are detected and an error is thrown
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5865 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 14:34:55 +00:00
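
A check of the kind described in the WalkerTest commit above might look like the following Python sketch. The name `validate_md5` and the choice to accept empty or missing values (suggested by the nearby commits about "" and null md5s) are assumptions for illustration, not the actual WalkerTest logic.

```python
import re

# An MD5 digest written as text is exactly 32 hexadecimal characters.
MD5_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def validate_md5(md5):
    """Raise ValueError if md5 is not a well-formed digest string.

    Assumption for this sketch: None and "" mean "no expected md5 yet"
    and are accepted without complaint.
    """
    if md5 is None or md5 == "":
        return
    if not MD5_PATTERN.fullmatch(md5):
        raise ValueError(f"Malformed md5: {md5!r}")
```

Because `fullmatch` must consume the whole string, a digest with a stray leading or trailing space is rejected rather than silently compared.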
hanna 06486c134a Kill extra space in the md5.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5863 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 12:00:31 +00:00
depristo 57e4693e4c Slightly better error message when failing to create the index on the fly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5861 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 11:04:08 +00:00
depristo cf3dbfee97 Renamed variantMergeOptions to filteredRecordsMergeType, as this is really what it does. Cleaned up the wiki so that it's clear what this does, as well as included an example of how to create an intersection with CombineVariants and SelectVariants. Added integrationtests of CombineVariants with OMNI and HapMap that deal with the two ways to merge filtered/unfiltered records at the same site.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5860 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 01:54:29 +00:00
hanna 4bfec4c55b Reenabling E.coli ValidatingPileup with MV1994 realigned using the BWA/C bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5856 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:32:53 +00:00
hanna 5dca1e4d2e Make IntervalIntegrationTest aware of the new alignments in the MV1994.bam testset.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5852 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 19:59:47 +00:00
chartl 7ff5375493 Removing build-killing dependency on a private package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5851 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 18:13:15 +00:00
chartl 0b07373909 Incorporating old feedback from eric: @deprecated methods should not be @deprecated, but rather protected, and the test's package moved to where it can access those test methods.
Also allows for the slightly more awesome name "MWUnitTest"



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5850 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 18:06:05 +00:00
chartl 480859db50 Contractified version of MannWhitneyU. Some behavior has been changed:
- Running a test when there are no observations of at least one of the sets now breaks the MWU contract
   + MWU returns Pair(Double.NaN,Double.NaN) in these instances to maintain the contract of never returning null
   + No more Double.Infinity values will appear
- RankSumTests now probe the return values for NaNs, and don't annotate if they appear
- For small sets where the probability is calculated recursively, the z-value is now the inversion of the error function and not the approximate z-value
- UG and Annotator integration tests updated to reflect changes



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5845 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 13:57:15 +00:00
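
The never-return-null contract from the commit above, and the way a caller probes for NaN before annotating, can be sketched in Python as follows. This uses the standard normal approximation for the z-value; `mwu_test`, `rank_sum_annotation`, and the annotation key are hypothetical names, not the repository's API.

```python
import math


def mwu_test(set1, set2):
    """Return (z, p) from a normal-approximation Mann-Whitney U test.

    Sketch of the contract: never return None; if either set is empty,
    return (nan, nan) instead.
    """
    n1, n2 = len(set1), len(set2)
    if n1 == 0 or n2 == 0:
        return (math.nan, math.nan)
    # U counts pairs where the first set's value wins; ties count one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in set1 for y in set2)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return (z, p)


def rank_sum_annotation(key, ref_values, alt_values):
    """Return {key: z}, or an empty dict when the test could not be run."""
    z, p = mwu_test(ref_values, alt_values)
    if math.isnan(z) or math.isnan(p):
        return {}
    return {key: z}
```

Returning a pair of NaNs keeps the return type uniform, so callers need one `isnan` check rather than a separate null check.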
depristo a18b0152df Contracts for SimpleTimer, as well as UnitTests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5841 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 19:45:31 +00:00
depristo f608ed6d5a Removed old (and unused) reporting system, now that Kiran's VE reporting system is working. Refactors dictionary creation error messages into UserExceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5836 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:42:52 +00:00
depristo 6a49e8df34 Significant change to the way subsetting by sample works with monomorphic sites. Now keeps the alt allele, even if a record is AC=0 after the subset. Previously, the system dropped the alt allele, which I don't think is the right behavior. If you really want a VCF without monomorphic sites, use the option to drop monomorphic sites after subsetting. See detailed information below.
Right now, if you select a multi-sample VCF file (or one with filters, I see) down to a smaller set of samples, and the site isn't polymorphic in that subgroup, then the alt allele is lost.  For example, when selecting down NA12878 from the OMNI, I previously received the following VCF:

1       82154   rs4477212       A       .       .       PASS    AC=0;AF=0.00;AN=2;CR=100.0;DP=0;GentrainScore=0.7826;HW=1.0     GT:GC   0/0:0.7205
1       534247  SNP1-524110     C       .       .       PASS    AC=0;AF=0.00;AN=2;CR=99.93414;DP=0;GentrainScore=0.7423;HW=1.0  GT:GC   0/0:0.6491
1       565286  SNP1-555149     C       T       .       PASS    AC=2;AF=1.00;AN=2;CR=98.8266;DP=0;GentrainScore=0.7029;HW=1.0   GT:GC   1/1:0.3471
1       569624  SNP1-559487     T       C       .       PASS    AC=2;AF=1.00;AN=2;CR=97.8022;DP=0;GentrainScore=0.8070;HW=1.0   GT:GC   1/1:0.3942

Where the first two records lost the ALT allele, because NA12878 is hom-ref at this site.  My change results in a VCF that looks like:

1       82154   rs4477212       A       G       .       PASS    AC=0;AF=0.00;AN=2;CR=100.0;DP=0;GentrainScore=0.7826;HW=1.0     GT:GC   0/0:0.7205
1       534247  SNP1-524110     C       T       .       PASS    AC=0;AF=0.00;AN=2;CR=99.93414;DP=0;GentrainScore=0.7423;HW=1.0  GT:GC   0/0:0.6491
1       565286  SNP1-555149     C       T       .       PASS    AC=2;AF=1.00;AN=2;CR=98.8266;DP=0;GentrainScore=0.7029;HW=1.0   GT:GC   1/1:0.3471
1       569624  SNP1-559487     T       C       .       PASS    AC=2;AF=1.00;AN=2;CR=97.8022;DP=0;GentrainScore=0.8070;HW=1.0   GT:GC   1/1:0.3942

The genotype remains unchanged, but the ALT allele is now preserved.  I think this is the correct behavior, as reducing samples down shouldn't change the character of the site, only the AC in the subpopulation.  This is related to the tricky issue of isPolymorphic() vs. isVariant().  

isVariant => is there an ALT allele?
isPolymorphic => is some sample non-ref in the samples?

In part this is complicated by the semantics of sites-only VCFs, where ALT = . is used to mean not-polymorphic.  Unfortunately, I just don't think there's a consistent convention right now, but it might be worth adopting a single approach to handling this at some point.  Wiki docs updated.

Does anyone have critical infrastructure that depends on the previous convention?  Let me know so we can coordinate the change.

There's a new function subContextFromGenotypes() that also takes a Set<Allele> to handle this type of behavior.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5832 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-21 13:59:16 +00:00
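
The isVariant() vs. isPolymorphic() distinction in the commit above can be made concrete with a small Python sketch. The `Site` class, the `subset` function, the `keep_alts` flag, and the second sample name are all hypothetical illustrations, not the VariantContext API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    ref: str
    alts: tuple       # ALT alleles listed at the site
    genotypes: dict   # sample name -> tuple of called alleles

    def is_variant(self):
        """Is there an ALT allele at the site?"""
        return len(self.alts) > 0

    def is_polymorphic(self):
        """Is some sample non-ref among the samples present?"""
        return any(a != self.ref for gt in self.genotypes.values() for a in gt)


def subset(site, samples, keep_alts=True):
    """Select a subset of samples; keep_alts=False mimics the previous behavior."""
    kept = {s: site.genotypes[s] for s in samples}
    if keep_alts:
        alts = site.alts
    else:
        called = {a for gt in kept.values() for a in gt}
        alts = tuple(a for a in site.alts if a in called)
    return Site(site.ref, alts, kept)
```

With the new behavior, a hom-ref sample selected from a variant site yields a record that is still variant but not polymorphic, which matches the AC=0 records shown in the commit message.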
depristo e234589240 Contracts for GenomeLocParser and GenomeLoc are now fully implemented.
GenomeLocs can officially have any start/stop values from -Inf - +Inf.  Bounds w.r.t. the reference are enforced, optionally, by GenomeLocParser.  General code cleanup throughout the subsystem.

All validation code for GLs is now centralized, and all I/O systems now validate their inputs.  Because of this, the Picard interval processing code has been changed to examine whether an interval is valid, and only keep the valid intervals.  Note that the scatter/gather test was changed, because the original hg18 chr20 interval files were actually malformed (all records for some reason were on chr20).  

Many interval processing routines were moved to IntervalUtils, as this is their natural home.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5830 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-21 02:01:59 +00:00
depristo e16bc2cbd9 Contracts for Java now written for GenomeLoc and GenomeLocParser. The semantics of GenomeLoc are now much clearer. It is no longer allowed to create invalid GenomeLocs -- you can only create them with well formed start, end, and contigs, with respect to the master dictionary. Where one previously created an invalid GenomeLoc, and asked is this valid, you must now provide the raw arguments to helper functions to assess this. Providing bad arguments to GenomeLoc generates UserExceptions now. Added utility functions contigIsInDictionary and indexIsInDictionary to help with this.
Refactored several Interval utilities from GenomeLocParser to IntervalUtils, where one might expect them to go

Removed GenomeLoc.clone() method, as this was not correctly implemented, and actually unnecessary, as GenomeLocs are immutable.  Several iterator classes have changed to remove their use of clone()

Removed misc. unnecessary imports

Disabled, temporarily, the validating pileup integration test, as it uses reads mapped to a different reference sequence for ecoli, and this now does not satisfy the contracts for GenomeLoc


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5827 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-20 15:43:27 +00:00
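
The two GenomeLoc commits above describe immutable locations whose validity is checked from raw arguments by the parser. A minimal Python sketch of that design follows, assuming 1-based inclusive coordinates and a dictionary of contig lengths; the method names and the `must_be_on_reference` flag are hypothetical, and `ValueError` stands in for the UserException mentioned in the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeLoc:
    """Immutable location, so no clone() is ever needed."""
    contig: str
    start: int
    stop: int


class GenomeLocParser:
    def __init__(self, contig_lengths):
        # Stand-in for the master sequence dictionary: contig name -> length.
        self.contig_lengths = dict(contig_lengths)

    def contig_is_in_dictionary(self, contig):
        return contig in self.contig_lengths

    def is_valid(self, contig, start, stop, must_be_on_reference=True):
        """Assess raw arguments without constructing a GenomeLoc."""
        if not self.contig_is_in_dictionary(contig) or start > stop:
            return False
        if must_be_on_reference:
            return start >= 1 and stop <= self.contig_lengths[contig]
        return True

    def create(self, contig, start, stop, must_be_on_reference=True):
        """Only well-formed locations can be created; bad arguments raise."""
        if not self.is_valid(contig, start, stop, must_be_on_reference):
            raise ValueError(f"Invalid location {contig}:{start}-{stop}")
        return GenomeLoc(contig, start, stop)
```

Keeping validation in the parser means a `GenomeLoc` in hand is always well formed, and the optional bounds check mirrors the note that reference bounds are enforced only when requested.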
hanna 03452c15c0 Cleanup GATKBAMIndex unit test to allow a more efficient access pattern for
FindLargeShards.  Runtime of FindLargeShards on papuan dataset is now 75min.
GATK proper should benefit as well, although the benefits might be so small
as to not be measurable.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5798 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 21:50:33 +00:00
rpoplin 40797f9d45 Ensuring a minimum number of variants when clustering with bad variants. Better error message when Matrix library fails to calculate inverse.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5793 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 01:48:37 +00:00
ebanks dfdef2d29b PLEASE READ ME! In order to prepare for the upcoming changes to VCF4, we felt it was best to split up the vcf3 and vcf4 codecs (vcf4 is not backwards compatible with vcf3 and certain changes are too complex to handle in both codecs). Using the 'VCF' rod type in the GATK will now throw a UserException for vcf3.2 or vcf3.3 files telling you to use the 'VCF3' type instead (and vice versa). Integration/unit tests have been updated. For programmers: note that there is currently a lot of code duplication in the two codecs (although I pulled out the easy stuff to a VCFCodecUtils class); however WE ARE FREEZING THE VCF3 CODEC AND WILL NO LONGER MAKE CHANGES TO IT. All updates/improvements will be targeted to the vcf4 codec only as vcf3 is there only to be able to read legacy files. People should really be using vcf4 files only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5787 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 12:07:44 +00:00
hanna f275be6968 A 'fat shard' finder. Cranks through the indices of a BAM file or list of
BAM files looking for outliers (outliers right now are defined naively  as 
shards whose sizes are more than 5 stddevs away from the mean).  Runs in
13 minutes per chromosome on 707 low pass whole genome BAMs -- not great, but
much faster than running UG on the same region to discover anomalies.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5782 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 12:56:47 +00:00
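
The naive outlier rule in the 'fat shard' commit above (sizes more than 5 stddevs from the mean) can be sketched in Python as follows. The function name and the dictionary input are assumptions for illustration; the actual tool reads shard sizes from BAM indices.

```python
from statistics import mean, pstdev


def find_fat_shards(shard_sizes, num_stddevs=5.0):
    """Return names of shards whose size is more than num_stddevs from the mean.

    shard_sizes maps a shard name to its size; the population standard
    deviation is used here, which is an assumption of this sketch.
    """
    sizes = list(shard_sizes.values())
    if len(sizes) < 2:
        return []
    mu = mean(sizes)
    sigma = pstdev(sizes)
    if sigma == 0:
        return []  # all shards are the same size
    return sorted(
        name for name, size in shard_sizes.items()
        if abs(size - mu) > num_stddevs * sigma
    )
```

One caveat of this rule: a single extreme value inflates the standard deviation itself, so it only flags outliers reliably when there are many shards.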
kshakir 7d21350a17 Fixed import.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5780 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 18:07:40 +00:00
ebanks 15c7bd82a5 Fix for IndelRealigner memory problem. Now the Constrained mate fixing writer is told whether a read has been modified and, if it wasn't, can dump it when the cache needs to get flushed at places with tons of coverage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5777 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-06 19:34:41 +00:00
hanna c2e8c460cb Factor out all testing dependencies into a separate test configuration and
only download that test configuration when running unit/integration tests.
This means that the build will (hopefully) never break because it can't
fetch a file that isn't required for the GATK to run.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5775 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 22:42:11 +00:00
delangel 7d7ce6cf00 Two embarrassing bug fixes:
a) Forgot to convert from phred to log-prob when computing gap penalties from recal table.
b) Forgot to uncomment code to correctly deal with hard-clipped bases in a read. But because of this, had to do a short term workaround to at least temporarily return class from hardClipAdaptorSequence to GATKSAMRecord. Otherwise, I get exceptions when casting because somehow some reads in HiSeq get to be SAMRecord (which GATKSAMRecord inherits from) but some reads get to be BAMRecords (which can't be cast into GATKSAMRecord), not sure why.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5771 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 17:08:34 +00:00
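
Fix (a) in the commit above concerns the standard conversion from a Phred-scaled quality Q to a probability, P = 10^(-Q/10). A minimal Python sketch, assuming "log-prob" means log base 10 and using hypothetical function names:

```python
def phred_to_log10_prob(q):
    """Convert a Phred-scaled quality to a log10 probability: -Q / 10."""
    return -q / 10.0


def phred_to_prob(q):
    """Convert a Phred-scaled quality to a probability: 10 ** (-Q / 10)."""
    return 10.0 ** phred_to_log10_prob(q)
```

For example, Q30 corresponds to a log10 probability of -3.0, so using the raw Phred value where a log-probability is expected gives the wrong sign and scale.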
kshakir 28b897d5de Fixed O(N^2) operation when scattering interval files.
Cleaned up intervals contig count function.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5768 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 03:32:35 +00:00
carneiro 3882d1b9c0 fixing the build \o/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5767 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 00:57:49 +00:00
kshakir 8ad547e6c2 Fixed another interval bug where dividing up N intervals into N parts wasn't working.
Minor updates to the FCPTest to match the changes due to using the old indel caller.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:49:35 +00:00
hanna 5c6965575e Some refactoring that Mauricio and I worked through together. Changed filters
to extend from org.broadinstitute.sting.gatk.filters.ReadFilter rather than
directly from net.sf.picard.filter.SamRecordFilter, which allows us to add
an initialize(GATKEngine) method so that filters can do any initialization
they'd like based on CL arguments, SAM headers, etc.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5760 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:29:08 +00:00
rpoplin 6c7a0adc76 Updating VariantGaussianMixtureModelUnitTest to use truth sensitivity cutting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5750 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 13:56:01 +00:00
rpoplin 23cd3a7a5d Moving VQSR v2 to core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5740 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:20:06 +00:00
rpoplin e73720c2db Updating VQSLOD annotation description
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5735 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:01:08 +00:00
ebanks d4cbd8691c Make the default that we only output SNPs (so that when I make another release we don't get flooded with questions about why the UG is all of a sudden so slow)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5729 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 16:38:55 +00:00
rpoplin 3224bbe750 New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use Apache Commons Math instead of approximations from Wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 19:14:42 +00:00
ebanks deed7c47a1 Continuing the epic fail, some of our existing integration tests were wrong because of the lazy loading failure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5712 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 17:54:41 +00:00
ebanks ab9ffb1a74 Epic failure on the lazy loading of genotypes: if the input VCF had its samples unsorted and we used a walker that didn't require genotypes, then we would sort the samples but not load genotypes (and therefore the genotypes wouldn't match the samples anymore). Added simple integration test to cover this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5711 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 16:03:45 +00:00
rpoplin b7334dcc1e Rank sum test annotations are the Z-scores from the test instead of the p-value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5707 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 14:35:00 +00:00
ebanks 45081c32d7 continuing from last night, the integration tests weren't covering the right behavior either
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5706 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 13:30:57 +00:00
droazen d650efd40a Fix for bug GSA-449: Intervals that are not in GATK format are not validated
to the same standard as GATK format intervals. Full validation against contig
bounds is now performed for all intervals, regardless of their source. Also
fixed a few tests for validation exclusions that were backwards.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5698 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 18:12:10 +00:00
chartl 7afeb1ab17 Removing broken imports (boo)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5692 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 18:55:25 +00:00
chartl bc3fd70b0a Removing the old association walker, switching test to just validate that MannWhitneyU is doing the right thing. Unit tests still pass.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5690 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 18:05:19 +00:00
kshakir f619dd3ca7 Refactored IntervalUtils used to parse and scatter intervals for Queue.
Scattering non-contig interval lists by number of loci in the intervals instead of just number of intervals.
Queue caches the list of locs and how to split them up instead of reloading them from disk repeatedly.
TODO: general purpose function to divide data evenly.
Skip over comments when parsing picard analysis files.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5687 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 00:06:00 +00:00
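
Scattering by number of loci rather than number of intervals, as described in the commit above, can be sketched in Python as below. This is a simple greedy split over contiguous runs of intervals, written under the assumption of inclusive (contig, start, stop) tuples; it is not the IntervalUtils algorithm, and `split_by_loci` is a hypothetical name. It guarantees that every part is non-empty, so dividing N intervals into N parts yields one interval each.

```python
def split_by_loci(intervals, parts):
    """Split intervals into `parts` contiguous groups with roughly equal loci."""
    if not 1 <= parts <= len(intervals):
        raise ValueError("parts must be between 1 and the number of intervals")
    sizes = [stop - start + 1 for _, start, stop in intervals]
    remaining_loci = sum(sizes)
    result = []
    i = 0
    for part in range(parts):
        parts_left = parts - part
        target = remaining_loci / parts_left
        # Leave at least one interval for each of the later parts.
        max_end = len(intervals) - (parts_left - 1)
        current = [intervals[i]]
        loci = sizes[i]
        i += 1
        # Add the next interval if at least half of it fits under the target;
        # the last part simply takes everything that remains.
        while i < max_end and (parts_left == 1 or loci + sizes[i] / 2 <= target):
            current.append(intervals[i])
            loci += sizes[i]
            i += 1
        result.append(current)
        remaining_loci -= loci
    return result
```

Recomputing the target from the loci still unassigned keeps later parts balanced even when an early part ends up smaller or larger than the overall average.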