Eric Banks
51eb95d638
Missed these tests before
2011-09-09 11:46:37 -04:00
Eric Banks
6ad8943ca0
CompOverlap no longer keeps track of the number of comp sites, since it wasn't (and cannot be) tracking them correctly.
2011-09-09 09:45:24 -04:00
Khalid Shakir
510d5e7730
Merged bug fix from Stable into Unstable
2011-09-09 01:34:55 -04:00
Khalid Shakir
367bbee25a
Fixed typo when printing the contents or last N lines of a file. Thanks to larryns.
2011-09-09 01:33:25 -04:00
Eric Banks
eaaba6eb51
Confirming that when stratifying by sample in VE the monomorphic sites for a given sample are not counted for the relevant metrics. Adding integration test to cover it.
2011-09-08 13:17:34 -04:00
Ryan Poplin
2636d216de
Adding indel vqsr integration test
2011-09-08 10:38:13 -04:00
Ryan Poplin
9cba1019c8
Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelic sites and overlapping indel records.
2011-09-08 09:25:13 -04:00
Ryan Poplin
e0020b2b29
Fixing PrintRODs. Now has input and only prints out one copy of each record
2011-09-08 08:58:37 -04:00
Ryan Poplin
29c968ab60
clean up
2011-09-08 08:42:43 -04:00
Ryan Poplin
59841f8232
Fixing genotype given alleles for indels. Only take the records that start at this locus.
2011-09-08 08:41:16 -04:00
Guillermo del Angel
45d54f6258
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 16:49:49 -04:00
Guillermo del Angel
9604fb2ba3
Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG, which is now busted.
2011-09-07 16:49:16 -04:00
Mark DePristo
2ded027762
Removed dysfunctional tranches support from VariantEval
2011-09-07 16:09:24 -04:00
Eric Banks
aa9e32f2f1
Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module so that it doesn't error out if the indel isn't simple (that would have been bad). The only integration test that needed to be updated was the tranches one, based on a separate commit from Mark.
2011-09-07 15:48:06 -04:00
Mark DePristo
d7e355b4b6
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 14:54:16 -04:00
Mark DePristo
9127849f5d
BugFix for unit test
2011-09-07 14:54:10 -04:00
Eric Banks
3a04955a30
We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now.
2011-09-07 14:01:42 -04:00
Mauricio Carneiro
ee9d599558
Just cleaning up
...
Cleaned up old commented-out code from the data processing pipeline.
2011-09-07 13:32:40 -04:00
Guillermo del Angel
743bf7784c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:21:26 -04:00
Guillermo del Angel
5f22ef9a8c
Added missing javadoc info to Beagle arguments
2011-09-07 13:21:11 -04:00
Mark DePristo
3bcbfa6e06
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:13:17 -04:00
Mark DePristo
430da23446
At least 2 minutes must pass before a status message is printed, further stabilizing time estimates
2011-09-07 13:13:07 -04:00
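The 2-minute rule above amounts to an elapsed-time check before the first status message. This is a minimal sketch that models only that initial delay; ProgressGate and mayPrintStatus are hypothetical names, not the GATK traversal engine code.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical status-message gate; not the GATK traversal engine code. */
final class ProgressGate {
    // Minimum runtime before the first status message, per the 2-minute rule.
    static final long MIN_ELAPSED_MS = TimeUnit.MINUTES.toMillis(2);

    private final long startMs;

    ProgressGate(long startMs) {
        this.startMs = startMs;
    }

    /** True once at least MIN_ELAPSED_MS has passed since the traversal started. */
    boolean mayPrintStatus(long nowMs) {
        return nowMs - startMs >= MIN_ELAPSED_MS;
    }
}
```

Early time estimates are based on very little work, so suppressing them until enough has been processed keeps the first reported ETA from swinging wildly.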
Mauricio Carneiro
6857d0324e
Merge branch 'master' into rr
2011-09-07 12:59:08 -04:00
Mark DePristo
7e9e20fed0
Forgot to delete previous call
2011-09-07 12:54:52 -04:00
Mark DePristo
d23d620494
Pushing traversal engine timer start to as close to actual start as possible
...
-- Should make initial timings more accurate
2011-09-07 12:52:33 -04:00
Mark DePristo
6ff432e1f2
BugFix for TF argument to VariantEval, actually making it work properly
2011-09-07 12:50:17 -04:00
Mauricio Carneiro
131cb7effd
Bringing Reduce Reads bug fixes to the main repository
2011-09-07 12:25:53 -04:00
Mark DePristo
a1920397e8
Major bugfix for per sample VariantEval
...
-- per sample stratification was not being calculated correctly. The alt allele was always retained, even if the genotype of the sample was hom-ref. Although conceptually fine, this breaks the assumptions of all of the eval modules, so every per-sample stratification actually included all variants. Eric is going to fix the system in general, so this commit may break the build.
2011-09-07 12:18:11 -04:00
Mark DePristo
d5641cfac5
Merge branch 'variantEvalST'
2011-09-07 10:44:23 -04:00
Mark DePristo
2f4cf82e3b
VariantEval cleanup. Added VariantType Stratification
...
-- ArrayLists are declared as List where possible
-- states refactored into VariantStratifier base class (reduces many lines of duplicate code)
-- Added VariantType stratification that partitions report by VariantContext.Type
2011-09-07 10:43:53 -04:00
Christopher Hartl
436f6eb52b
Reverting Eric's change and pushing in some command-line-option documentation.
2011-09-07 08:53:30 -04:00
Eric Banks
1ef8a1750a
I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon.
2011-09-06 21:07:49 -04:00
Eric Banks
da9c8ab386
Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.
2011-09-06 20:39:42 -04:00
Mark DePristo
9559115ad5
Bugfix for singleton runs. Now with histograms where possible
2011-09-06 16:54:01 -04:00
Mark DePristo
3db7ecb920
ReducedRead flag cached in GATKSAMRecord. 20% performance improvement
2011-09-06 15:11:38 -04:00
Roger Zurawicki
47607a7eff
Fixed bug where deletions messed up interval clipping
...
- Instead of using readLength, the ReadUtil functions are used to get the proper read coordinate
- Added debug info in interval clipping (with -dl)
NOTE: method might not be safe for production and checks need to be added to the ClippingOp code
2011-09-06 14:25:57 -04:00
Khalid Shakir
0adb388dee
Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input meaning they weren't tracked in Queue.
...
Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1, applying a cut at 98.5.
- If a whole_exome_agilent_1 project has fewer than 50 samples, also mix in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
2011-09-06 12:41:46 -04:00
Mauricio Carneiro
28d782b4c7
Allowing multiple dbsnp and indel files in the DPP
2011-09-02 13:38:47 -04:00
Mauricio Carneiro
08ae6c0c61
ReadClipper is now handling unmapped reads
2011-09-02 11:32:30 -04:00
Eric Banks
d241f0e903
Adding docs for the pcr error rate argument.
2011-09-01 21:57:02 -04:00
Mauricio Carneiro
ad4ea0b80b
Merged bug fix from Stable into Unstable
2011-09-01 18:14:45 -04:00
Mauricio Carneiro
e253f6f05d
Fixing typo in DPP
...
platform and library were exchanged when rebuilding the read group information
2011-09-01 18:13:52 -04:00
Mauricio Carneiro
d2a33beff7
Added WGS/WEX b37-decoy CEU trio datasets
2011-09-01 13:14:40 -04:00
Eric Banks
827fe6130c
Adding hidden printing option. Also, always run UG in mode GENOTYPE_GIVEN_ALLELES given that we don't actually test for the correct alleles (otherwise UG may choose a different allele and we may falsely validate the wrong one).
2011-09-01 11:40:35 -04:00
Mark DePristo
1aa4b12ff0
Reduced the number of combinations being tested here, which was overkill
2011-09-01 10:42:43 -04:00
Mark DePristo
ac49b8d26b
Conditional support for PerformanceTrackingQuerySource to measure Tribble / GATK bridge performance
...
-- Removed DEBUG option, instead use MEASURE_TRIBBLE_QUERY_PERFORMANCE in RMDTrackerBuilder
2011-09-01 10:41:55 -04:00
Mauricio Carneiro
4b5a7046c5
Making ReadLengthDistribution Public
...
Found this neat little walker Kiran wrote stashed in the private tree. Very useful. Generalized it a bit, added GATKDocs and moved it to public. I might include it as a QC step on the pacbio processing pipeline.
* generalized it so it works with non-paired-end reads.
* generalized it to work without read group information.
2011-08-31 15:52:28 -04:00
Mauricio Carneiro
7d79de91c5
Merge branch 'master' into rr
2011-08-30 02:50:19 -04:00
Mauricio Carneiro
0cd9438ac2
fixed soft unclipped calculation
...
* getRefCoordSoftUnclippedEnd was not resetting the shift when hitting insertions. Fixed.
* getReadCoordinateForReferenceCoordinateBeforeAlignmentEnd was returning the wrong read coordinate position. Fixed.
2011-08-30 02:45:29 -04:00
Mauricio Carneiro
fd540592ab
Added RMS calculation for consensus MQ
...
Consensus MQ is now the average of the RMS of the mapping qualities of the reads contributing to each site.
2011-08-30 02:45:20 -04:00
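The message leaves the aggregation slightly open. Under one reading, each site's value is the root mean square of the mapping qualities of the reads contributing to it, and the consensus MQ is the mean of those per-site values. A minimal Java sketch of that reading follows; ConsensusMappingQuality and its methods are hypothetical names, not the Reduce Reads code.

```java
/** Hypothetical sketch of an RMS-based consensus mapping quality. */
final class ConsensusMappingQuality {
    /** Root mean square of the mapping qualities observed at one site. */
    static double rms(int[] mappingQualities) {
        if (mappingQualities.length == 0) return 0.0;
        double sumSq = 0.0;
        for (int mq : mappingQualities) sumSq += (double) mq * mq;
        return Math.sqrt(sumSq / mappingQualities.length);
    }

    /** Mean of the per-site RMS values across all sites of the consensus read. */
    static double consensusMq(int[][] mqsPerSite) {
        if (mqsPerSite.length == 0) return 0.0;
        double total = 0.0;
        for (int[] site : mqsPerSite) total += rms(site);
        return total / mqsPerSite.length;
    }
}
```

RMS weights high-MQ reads more heavily than a plain mean would, so a single MQ-0 read pulls a site's value down less than it would under arithmetic averaging.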
Mauricio Carneiro
6f9264d2b3
Hard Clipping no longer leaves indels on the tails
...
The clipper could leave an insertion or deletion as the start or end of a read after hard clipping a read if the element adjacent to the clipping point was an indel. Fixed.
2011-08-30 02:44:58 -04:00
Mauricio Carneiro
943876c6eb
Added QUAL/MINVAR parameters to the walker
2011-08-30 02:44:46 -04:00
Mauricio Carneiro
7532be7f5a
Allowing clipping after AlignmentEnd if the end is soft clipped.
...
Read clipper now identifies and clips even if the requested coordinate is outside the alignment but the read contains soft clipped bases in that region.
2011-08-30 02:44:46 -04:00
Mauricio Carneiro
90a1f5e15c
Several bug fixes
...
* When hard clipping a read that had insertions in it, the insertion was being added to the cigar string's hard clip element. This way, the old UnclippedStart() was being modified and so was the calculation of the new AlignmentStart(). Fixed it by subtracting the number of insertions clipped from the total number of hard clipped bases.
* Walker was sending read instead of filtered read when deleting a read that contains only Q2 bases
* Sliding the window was causing reads that started on the new start position to be entirely clipped.
2011-08-30 02:44:19 -04:00
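The first fix above rests on the fact that inserted bases occupy read positions but no reference positions, so they must not shift the new alignment start. The sketch below generalizes that idea by walking the CIGAR and counting only the reference bases covered by the clipped read bases (which also accounts for deletions). It is an illustrative alternative under that assumption, not the GATK ReadClipper code; HardClipSketch and its methods are hypothetical, and it clips only from the left.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: how far the alignment start moves when clipping from the left. */
final class HardClipSketch {
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Reference bases spanned by the first readBasesToClip read bases of the CIGAR. */
    static int referenceBasesClipped(String cigar, int readBasesToClip) {
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        int remaining = readBasesToClip;
        int refBases = 0;
        while (remaining > 0 && m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            switch (op) {
                case 'M': case '=': case 'X': {
                    // Consumes both read and reference bases.
                    int take = Math.min(len, remaining);
                    refBases += take;
                    remaining -= take;
                    break;
                }
                case 'I': case 'S':
                    // Consumes read bases only: no reference shift.
                    remaining -= Math.min(len, remaining);
                    break;
                case 'D': case 'N':
                    // Consumes reference bases only, inside the clipped region.
                    refBases += len;
                    break;
                default:
                    // H and P consume neither read nor reference bases.
                    break;
            }
        }
        return refBases;
    }

    static int newAlignmentStart(int alignmentStart, String cigar, int readBasesToClip) {
        return alignmentStart + referenceBasesClipped(cigar, readBasesToClip);
    }
}
```

For example, clipping 7 read bases from a read aligned at 100 with CIGAR 5M2I10M moves the start to 105, not 107, because the 2 inserted bases have no reference footprint.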
Mauricio Carneiro
66a8b36cf5
Fixed most indexing bugs
...
* added bases and quals to consensus
* fixed consensus read cigar generation.
2011-08-30 02:43:41 -04:00
Mark DePristo
c6d8df8639
queueJobReport is a public feature of Queue
2011-08-29 17:20:54 -04:00
Mark DePristo
1e5001b447
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-29 17:04:21 -04:00
Mark DePristo
3af001fff2
Bugfix for file that must not exist on disk
2011-08-29 17:00:10 -04:00
Mark DePristo
3b09d42ed6
Now only prints 1 warning message about duplicate headers in simpleMerge
2011-08-29 14:41:29 -04:00
Eric Banks
c2f0db969b
Don't use the default deletion value from UG if not asking to have it set
2011-08-29 13:48:10 -04:00
Eric Banks
bb7a37e8f2
We need to allow reference calls in the input VCF for the GenotypeAndValidate walker when using the BAM as truth so that we can test supposed monomorphic calls against the truth.
2011-08-29 13:19:35 -04:00
Ryan Poplin
bc252a0d62
misc minor bug fixes in assembly. Increasing the minimum number of bad variants to be used in negative model training in the VQSR
2011-08-29 08:11:31 -04:00
Mark DePristo
61633c95a8
Default jobreport is now jobPrefix, so you see logs like Q-2508.jobreport.txt
2011-08-28 19:19:45 -04:00
Mark DePristo
a5c65fc133
Debugging information to print out the Query tracks
2011-08-28 18:54:49 -04:00
Mark DePristo
b38de1fa35
Now captures the exechost in the job report
...
-- Works for in process, shell, and LSF runners
-- Cleanup of debugging output
2011-08-28 12:05:56 -04:00
Mark DePristo
7bf006278d
Moved ResolveHostname to general utils as a static function
2011-08-28 12:04:16 -04:00
Mark DePristo
ccec0b4d73
AnalyzeCovariates uses the general RScript system now
...
-- Convenience constructor for collection for testing
-- callRScript() now accepts Objects not Strings, for convenience
2011-08-27 12:54:13 -04:00
Mark DePristo
1ceb020fae
UnitTests for RScript
2011-08-27 10:50:05 -04:00
Mark DePristo
e37a638e09
Fix for disallowed characters in GATKReportTable
...
-- Illegal characters are automatically replaced with _
2011-08-26 13:24:06 -04:00
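Replacing illegal characters can be done with a single regex substitution. This minimal sketch assumes the legal set is ASCII letters, digits and underscore; the actual rule in GATKReportTable may differ, and ReportNameSanitizer is a hypothetical name.

```java
/** Hypothetical sketch; the legal character set is an assumption. */
final class ReportNameSanitizer {
    // Every character outside [A-Za-z0-9_] is treated as illegal and replaced with '_'.
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9_]", "_");
    }
}
```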
Mark DePristo
0cb1605df0
Clean documentation for JobRunInfo
2011-08-26 09:22:58 -04:00
Mark DePristo
415d5d5301
LSF long times are in seconds; converting to milliseconds to meet the standard
2011-08-26 09:18:28 -04:00
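The unit conversion itself is a one-liner with java.util.concurrent.TimeUnit. This sketch assumes "standard" means the millisecond values Java itself uses (for example System.currentTimeMillis()); LsfTimes is a hypothetical name.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: normalizes LSF times (seconds) to milliseconds. */
final class LsfTimes {
    static long lsfSecondsToMillis(long lsfSeconds) {
        return TimeUnit.SECONDS.toMillis(lsfSeconds);
    }
}
```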
Mark DePristo
c0503283df
Spelling fix requires md5 updates
2011-08-26 07:40:44 -04:00
Mark DePristo
eef1ac415a
Merge branch 'master' into rodTesting
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java
2011-08-26 00:35:41 -04:00
Eric Banks
9b7512fd94
Just because there's a ref base doesn't mean the VC needs to be padded
2011-08-25 22:42:14 -04:00
Mark DePristo
e03dfdb0ab
Automatic iteration field addition works properly.
2011-08-25 16:59:02 -04:00
Mark DePristo
e01273ca7c
Queue now writes out queueJobReport.pdf
...
-- General purpose RScript executor in java (please use when invoking RScripts)
-- Removed groupName. This is now analysisName
-- Explicitly added capability to enable/disable individual QFunction
2011-08-25 16:57:11 -04:00
Eric Banks
09a729da3a
Removing incorrect comment
2011-08-25 15:42:52 -04:00
Eric Banks
8bbef79fc2
Create clipped alleles during allele parsing instead of creating a full VC, clipping alleles, and regenerating the VC from scratch.
2011-08-25 15:37:26 -04:00
Mark DePristo
0f4be2c4a4
Argument to disable queueJobReport entirely
...
-- Minor improvements to RodPerformanceGoals
2011-08-25 13:32:03 -04:00
Mark DePristo
d65faf509c
Default output name for Queue JobReport is queue_jobreport.gatkreport.txt
2011-08-25 13:15:20 -04:00
Mark DePristo
a7d6946b22
Refactored QJobReport and QFunction, which is now automatically tracked
...
-- All QFunctions, including sg ones, are tracked
-- Removed memory information
2011-08-25 13:13:55 -04:00
Mauricio Carneiro
16caca0822
BLASR BAMs and new BWA parameters
...
* Added the functions to turn a BLASR generated BAM file into a usable BAM file.
* Modified the bwa parameters according to test results from NA12878 pb2k dataset.
2011-08-24 17:04:07 -04:00
Mauricio Carneiro
e3f5d7067a
Added ReorderSam queue binding
2011-08-24 17:03:11 -04:00
Mark DePristo
08fb21f127
Removing hostname
2011-08-24 16:45:50 -04:00
Mauricio Carneiro
dc8398e165
fixing bai output for indel cleaning.
2011-08-24 15:58:34 -04:00
Mark DePristo
06e30a81d1
Fixes throughout for getting job information
...
-- no more hostname -- it's just not going to be important
2011-08-24 15:30:09 -04:00
Ryan Poplin
29c7b10f7b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-24 15:18:58 -04:00
Ryan Poplin
e5008aba00
Output the top two haplotypes as a variant call by running Smith-Waterman alignment against the reference and calling any difference as variation. This is the first version that runs end-to-end, taking in reads as a BAM file and writing out variant calls in VCF.
2011-08-24 15:18:44 -04:00
Mark DePristo
4918519a58
No more NPE in getRuntime() when you Ctrl-C out of Queue
2011-08-24 14:14:01 -04:00
Mark DePristo
16d8360592
QJobReport is now the official capability name
2011-08-24 13:59:14 -04:00
Mark DePristo
d047c19ad1
Writes output to file
2011-08-24 13:52:05 -04:00
Mark DePristo
3ae68e2397
JobLogging trait now writes out GATKReport log of jobs
2011-08-24 13:36:39 -04:00
Guillermo del Angel
e618cb1e79
a) Renamed/expanded SelectVariants arguments that choose particular kinds of variants and particular allelic types: instead of -Indels or -SNPs we can now specify, for example, -selectType [MIXED|INDEL|SNP|MNP|SYMBOLIC]. To select biallelic or multiallelic variants, use -restrictAllelesTo [BIALLELIC|MULTIALLELIC]. Corresponding gatkdocs changes.
...
b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk).
c) Added integration test for new SelectVariants commands
2011-08-24 12:25:50 -04:00
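Item b) above collapses a multiallelic AC value such as "3,1" to a single number in Java rather than leaving the parsing to R. A minimal sketch of that max-or-sum choice follows; AcLogging, collapse and the logSum parameter are hypothetical stand-ins (logSum plays the role of the -logACSum argument), and only integer AC values are handled.

```java
import java.util.Arrays;

/** Hypothetical sketch of collapsing a multiallelic AC field to one value. */
final class AcLogging {
    /** Returns the max AC by default, or the sum when logSum is true. */
    static int collapse(String acField, boolean logSum) {
        int[] counts = Arrays.stream(acField.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
        return logSum ? Arrays.stream(counts).sum() : Arrays.stream(counts).max().orElse(0);
    }
}
```

For "3,1,5" this yields 5 by default and 9 with logSum set; a biallelic site such as "2" is unaffected either way.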
Mauricio Carneiro
cd12f7f286
Fixed list dependency
...
Instead of creating a bam list file, I dynamically create a scala list and pass as parameters. This way the intermediate bam files don't get deleted before they should.
2011-08-24 11:12:46 -04:00
Mauricio Carneiro
219252a566
Adapting to the new RodBinding framework
2011-08-24 11:12:46 -04:00
Mark DePristo
28ee6dac41
Fixed spelling mistake
2011-08-24 10:14:45 -04:00
Ryan Poplin
f37875600a
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-24 09:02:44 -04:00
Khalid Shakir
1ecbf05aae
Avoid segfaults due to out of date and possibly abandoned LSF DRMAA implementation when use'ing LSF instead of .combined_LSF_SGE
2011-08-23 23:49:36 -04:00
Mark DePristo
b8bc03bb42
JobRunInfo improvements
...
-- dry-run now adds some info, for testing
-- InProcessRunner adds some, but not all, of the information we want
2011-08-23 17:11:22 -04:00
Mark DePristo
569e1a1089
Walker.isDone() aborts execution early
...
-- Useful if you want to have a parameter like MAX_RECORDS that wants the walker to stop after some number of map calls without having to resort to the old System.exit() call directly.
2011-08-23 16:53:06 -04:00
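The isDone() hook replaces a hard System.exit() with a check the traversal polls after each map call. The sketch below uses a heavily simplified, hypothetical walker contract (SimpleWalker, MaxRecordsWalker and Traversal are not the GATK Walker API) to show a MAX_RECORDS-style walker stopping early.

```java
/** Hypothetical, simplified walker contract; not the GATK Walker API. */
interface SimpleWalker<T> {
    void map(T record);

    /** Polled after each map call; returning true ends the traversal early. */
    default boolean isDone() {
        return false;
    }
}

/** Stops after a fixed number of map calls, like a MAX_RECORDS parameter. */
final class MaxRecordsWalker implements SimpleWalker<String> {
    private final int maxRecords;
    private int seen = 0;

    MaxRecordsWalker(int maxRecords) {
        this.maxRecords = maxRecords;
    }

    @Override
    public void map(String record) {
        seen++;
    }

    @Override
    public boolean isDone() {
        return seen >= maxRecords;
    }

    int seen() {
        return seen;
    }
}

final class Traversal {
    static <T> void run(Iterable<T> records, SimpleWalker<T> walker) {
        for (T record : records) {
            walker.map(record);
            // Stop cleanly instead of calling System.exit() from inside the walker.
            if (walker.isDone()) break;
        }
    }
}
```

Because the engine, not the walker, ends the loop, normal shutdown work (closing outputs, calling reduce/onTraversalDone-style hooks) can still run.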