Commit Graph

5057 Commits (cae4b9b0de177d68a3066bd00c9ef7d2de8ea130)

Author SHA1 Message Date
carneiro cae4b9b0de quick update with the correct CEU trio bam file and its final location.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5098 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 19:17:19 +00:00
depristo 5ed128f839 Slightly more tolerant timing setting. Main() method in GenomeLocProcessTracker to generate timing data for trackers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5097 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 15:16:07 +00:00
depristo 61c29d550d Fix for NullPointer where a run starts but there's nothing to do (no shards) and reduceInit() wasn't being called correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5096 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 15:15:10 +00:00
depristo f522eb2848 Previous tests were just too big...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5095 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 13:48:38 +00:00
kiran 2901299ff6 Sets the number of samples to all of the samples in the file when it's not specified explicitly on the command line. GenotypeConcordance is no longer a standard evaluation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5094 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-27 01:38:26 +00:00
hanna 4a33cdacde Some basic integration tests detecting breakage in OTF BAM index generation.
Doing it manually for the moment so that there's at least something testing
this capability; will follow up eventually with Mark to see whether we can
shape the VCF index generation code in such a way that it supports BAM index
testing as well.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5093 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 23:48:04 +00:00
fromer 466f8f8a3c Compares RBP phasing to a simple trio phasing model that can phase a child het iff both parental genotypes are known and at least one of them is not het [at EACH of the sites in the pair to be phased]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5092 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 23:43:29 +00:00
ebanks 68729045ca Always best to use the left-aligned version of the dbsnp vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5091 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 20:21:50 +00:00
asivache 43812a28fc If among all the multiple alignments for the given read we have 'unmapped' ones (can happen with bwa 0.5.7 and maybe later versions), then discard the latter and keep only the mapped ones. Keep 'unmapped' only if it's the only alignment available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5090 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 20:07:08 +00:00
asivache 63b709d992 When remapping the read, set MAPQ, CIGAR etc to 0/null for unmapped reads. This is not required according to spec but current samtools jdk otherwise dies in STRICT validation mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5089 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 19:49:07 +00:00
ebanks d33162145b Moving the --sites_only argument up into the VCFWriter itself so that any walkers that write VCFs can choose not to emit genotypes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5088 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 19:38:16 +00:00
kiran a97184fddf Frick! Changed to refer to the *playground* version of VariantEvaluator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5087 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 19:33:03 +00:00
corin 73e2942c62 Reformatted backdrop--removed the date
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5086 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 18:25:59 +00:00
kiran a9d0772516 When evaluating JEXL expressions, don't blow up if the eval VC is null
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5085 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 18:25:03 +00:00
kiran 22e599ec76 Fixed output report to properly handle evaluation modules with TableType objects. Promoted CpG to a standard stratification. Demoted Filter to a non-standard stratification. Now, if the filter stratification is not specified, VariantEval only evaluates PASSing sites.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5084 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 17:38:21 +00:00
ebanks 2dcce58279 oneoffs walker to assess GLs at truth sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5083 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:59:05 +00:00
ebanks dfc5a3d1f3 added integration test for --sites_only option
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5082 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:58:15 +00:00
ebanks 0429301536 Added ability to output just sites (no genotypes) from UG with the --sites_only argument. Note that we do still genotype in this mode so that the INFO annotations are identical, but we strip the genotypes out of the VC right before writing to output. In other words, this is not designed to make UG go faster; the point here is to allow downstream tools not to have to parse GTs if they don't want to. Here you go, Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5081 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:52:38 +00:00
ebanks 01e032e89c Missorted BAMs are User Exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5080 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 14:09:39 +00:00
depristo be697d96f9 An apparently robust implementation of the file locking for distributed computation, using Lucene's file creation locking approach. It is worth trying out for those with large-scale, high-cost data sets. Details and discussion at group meeting on Wednesday. Some cleanup still needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5079 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 13:45:40 +00:00
kshakir df2e7bd355 Disabled FCPTest whilst we figure out where the C426 bams went.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5078 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 05:11:57 +00:00
hanna 862b299b47 Fix Picard OTF index generation issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5077 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 03:42:46 +00:00
kshakir ce5b11317b Moved some shutdown logic from the LSF job runner into the QGraph.
Because of Java's type erasure, JobManagers must provide runtime access to the runner class to shutdown.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5076 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 20:28:54 +00:00
fromer 6ac888d26a Correct accounting for cases where first het in interval is phased
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5075 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 19:48:54 +00:00
fromer af79fa629f PROPERLY print out list of intervals and their stats
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5074 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 19:20:36 +00:00
delangel db2e2cb0ff Another trivial change to make VQSR work with indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5073 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 19:05:31 +00:00
corin b22f82d5dd Minor formatting updates to deal with long bait names, multiple sequencer types, and date formatting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5072 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 19:02:40 +00:00
fromer 17ba75e502 Can now print out list of intervals and their stats
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5071 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 18:36:59 +00:00
corin 32cdcc933c A quick python script to give the status of the projects in the humgen/gsa-pipeline/ directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5070 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 15:21:50 +00:00
kshakir b3c9b9bfbe +1 file that should have been with the last checkin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5069 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 05:31:17 +00:00
kshakir 9923e05e0a Moved MD5 utils from WalkerTest to BaseTest for use by PipelineTests.
Moved VariantEval validation from FCPTest to PipelineTest.
Cleaned up some duplicate code for writing temp files during tests.
Moved FCPTest to playground namespace to match move for FCP.q.
Added a basic HelloWorldPipelineTest for the HelloWorld QScript. 
Moved duplicated error handling from JobRunners into the FunctionEdge.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5068 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 04:11:49 +00:00
hanna 9db02059ac Fix for Ryan's issue: reads ending with indel distort the location of the
pileup, resulting in two map() calls for the same locus (and no map call for
the locus immediately following).
Fixed bug and added comprehensive unit tests.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5067 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 19:49:39 +00:00
kshakir 76ee57639d Updated FCPTest to match changes to UG in r5058.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5066 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 19:30:02 +00:00
depristo 7b92cd5008 Adding lucene dependency for file locking -- may be removed in the near future
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5065 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 18:59:42 +00:00
fromer 61fe409211 Basic walker to count the number of (phased) hets in each exome target
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5064 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 17:53:14 +00:00
depristo c50f39a147 V3 of the distributed GATK. High-efficiency implementation. Support for status tracking for debugging and display. Still not safe for production use due to NFS filelock problem. V4 will use alternative file locking mechanism
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5063 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 16:45:07 +00:00
delangel fd864e8e3a Minimal necessary (but most likely not sufficient) changes to run VQSR on indel data: don't fill Ti/Tv fields if non-SNP, request VC only at start of position, check if isSNP() before doing snp-specific operations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5062 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 02:36:36 +00:00
depristo a51061fd96 Improved distributed processing analytics. Still not 100% ready for prime-time. More improvements incoming. Iterator claim now supports requests to obtain in a single atomic claim (one lock) multiple sequential shards, which radically reduces overhead. However, deadlocking is still possible...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5061 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 16:17:25 +00:00
ebanks 2d4bcb60a1 Don't print out alt alleles for ref calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5060 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 06:33:31 +00:00
ebanks 2ba35dc7ba Bad chain files are user errors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5059 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 06:04:36 +00:00
ebanks 2bbcc9275a Committing the fragment-based calling code. Results look great in all datasets (will show this at 1000G this week with Ryan). Note that this is an intermediate commit. The code needs to be cleaned up and the fragmentation code needs to be moved up into LocusIteratorByState. This should all happen later this week, but I don't want Ryan to have to keep running from my own personal Sting directory. The current crappy implementation adds ~10% to the runtime, but that should all go away in the next iteration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5058 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 05:04:17 +00:00
ebanks bb6999b032 Better documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5057 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-23 03:36:09 +00:00
corin 1dcdebbc9e Updating the file path for proper inclusion of the background in the tearsheet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5056 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 19:15:33 +00:00
depristo c52d2d5f79 Bug fix for SimpleTimer that didn't always convert elapsed times from milliseconds to seconds
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5055 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 18:50:59 +00:00
depristo ff61aeb762 continuing to push to get right answers for long-running jobs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5054 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 15:02:02 +00:00
delangel a50d7f74fa Change to support plotting of indel quality as a function of covariates - for now, just call different R calling script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5053 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 14:09:23 +00:00
delangel fa0c476b82 Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 14:07:10 +00:00
depristo 9b1b8d46aa Performance tracking of GenomeLocProcessingTrackers, as well as a marker for where to put tracker in HierarchicalMicroScheduler
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5051 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:24:42 +00:00
rpoplin 95d6ddc38c lastProgressPrintTime should only be updated when a progress log is printed not when a performance log is printed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5050 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:23:14 +00:00
depristo 8ece2b9230 Distributed GATK analysis scripts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5049 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:09:07 +00:00