gatk-3.8

Commit Graph

Author	SHA1	Message	Date
ebanks	cbcdfc584d	Moving out of core and into playground git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5671 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-21 02:30:22 +00:00
depristo	cc78027bd3	Two optimizations. Even more aggressive printProgress meter optimization to only even consider doing work once every 1000 cycles through the engine. Second, GenomeLocParser now uses a single indirection around the contigInfo variable. This class uses a last used cache to retrieve efficiently contig information instead of always returning to the underlying SAMSequenceDictionary hashmap to make genome locs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5670 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-21 01:31:26 +00:00
depristo	29857f5ba6	Fix for instability in output of fasta alternative reference maker when snpmask and snp files are provided and have overlapping records. The order of the records changed due to optimization of the refmetadatatracker, and uncovered this non-determinism. Now preferentially masks out includes sites from snps before considering masking out sites in snpmask git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5669 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 21:54:09 +00:00
kshakir	8619f49d20	Added a utility method to retrieve the contig lengths for WG chunking. Added a rudimentary GATKReportParser for parsing VE3 results. Re-enabled the FCPTest using VE3, the GATKRP, and the PicardAggregationUtils. The tag type for .rod files is DBSNP, not ROD. More explicit return types on implicit methods. Added null checks for implicit string to/from file conversions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5668 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 19:22:21 +00:00
delangel	59dd79faab	One more optimization: don't use Math.round(), but do my own rounding/casting. UG now about 40% faster calling indels, 30-35% faster calling snp's+indels simultaneously. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5667 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 19:15:58 +00:00
delangel	246d8190b5	Round one of "easy" zero-effort optimizations to UG's indel caller. Mostly inline functions, avoid repeated computation and try to optimize SoftMaxPair() which is by far the biggest runtime hog. More to come... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5666 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 18:57:34 +00:00
depristo	d8b8f857f3	V2 -- now working -- of a core walker that creates the standard GATK resource bundle See https://www.broadinstitute.org/gsa/wiki/index.php/GATK_resource_bundle Which live locally in /humgen/gsa-hpprojects/GATK/bundle/current You use this following command to create the bundle: java -Djava.io.tmpdir=/broad/shptmp/depristo/tmp -jar dist/Queue.jar -S scala/qscript/core/GATKResourcesBundle.scala --gatkjarfile dist/GenomeAnalysisTK.jar -bsub -jobQueue gsa -svn 5660 $* Annoyingly, it must be run in the trunk directory, and requires an explicit svn version number to create the directory. It also must be run in two stages manually. First, the local bundle is created, and then with the -phase2 argument all of the files in the local bundle are compressed and pushed to the FTP server. I'm likely going to shift most of my processes over to using this location for data file access, especially for b37 data sets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5665 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 12:48:47 +00:00
depristo	a8f8077d7a	Simple optimizations for cases where there is no data or RODs at sites, such as with the FastaStats walker. private static immutable Lists and Maps in underlying data structures that have no associated data. Also, avoiding a double map.get() in the low-level genome loc parser. RefMetaDataTracker is now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5664 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 10:52:16 +00:00
hanna	54660a8c25	Fix requested by Lee Lichtenstein: first check to see whether it's time for a progress message, then aggregate metrics. Makes the overhead of printProgress in RealignerTargetCreator go from >20% to ~3%. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5663 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 03:22:48 +00:00
carneiro	d35c7d1029	- minor changes to the 'justclean' script to handle the Trio Cleaning. - fixing a bug on single ended BWA option of the data processing pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5662 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-19 16:35:24 +00:00
hanna	49550e257f	Fix for JamesP's issue. This issue appeared because of a design flaw in the interface between SAMDataSource and IntervalSharder that needs to stay around until the original BAM sharder is retired. Will add a JIRA to fix design flaw. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5661 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-19 00:52:13 +00:00
depristo	50e86cfee9	useful chain files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5660 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-18 19:47:49 +00:00
depristo	541c9109b3	V1 of GATK Resource Bundling system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5659 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-18 19:23:45 +00:00
ebanks	673772a522	Catch samtools exceptions and make them 'BAM Exceptions' asking the user to run Picard's validator and re-index the file before posting anything to the forum. Let's see whether this helps or not. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5658 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-18 03:52:43 +00:00
ebanks	e97a5ca161	Rename 'verbose' argument to 'debug_file'. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5657 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-18 03:17:13 +00:00
chartl	e28fc21642	Spurious associations can develop from including ambiguous reads in these tests. Perhaps MQ0 reads shouldn't be used for anything except MQ0, but the best way to do that is to restructure the code, so for now I'll put it off. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5656 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-17 23:17:03 +00:00
ebanks	49ea07acce	My fixes to Tribble yesterday revealed that some of the test VCFs for integration tests were actually malformed. Also, Guillermo updated the b37 dbSNP VCF and that broke some tests. Should be good for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5655 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-17 03:39:11 +00:00
chartl	23fac043d9	Fix the outputs so the proper files are gathered (not automatic due to multiplexer) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5654 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 23:55:12 +00:00
chartl	e5ef8388fc	BatchMerge - AlleleVCF --> AllelesVCF, this (combined with Eric's fix) will solve James P.'s forum issue. After viewing results on real case/control data from RAW -- it's really working quite well. ReadIndels, however, needs to use a T-test rather than a U-test, especially in deep coverage (at indel sites, the reads with indels will have mostly the same number of CIGAR indel elements -- one -- which doesn't really play nicely with the UTest when sample sets are large). Modified ReadsLargeInsertSize to be a two-way test (e.g. ReadsLarge and ReadsSmall). BaseQualityScore also suffers from the same issue as read indels, so switching over to a T-test in that case as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5653 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 22:03:16 +00:00
ebanks	1c32deb108	For some reason I wasn't allowing expressions to be used with the -all argument. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5652 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 20:59:10 +00:00
corin	2cf6a06503	Throwing an error if INFO fields arguments contain whitespace. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5651 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 20:52:55 +00:00
corin	fce6d25075	Moved the reference ID to a meta data field for validity declaration. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5650 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 20:28:56 +00:00
corin	59215dab48	Now writes results to a minimal vcf with annotations included in the INFO field. Must be run with -NO_HEADER to totally remove header for the most bare bones vcf; otherwise also includes command line meta data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5649 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 20:14:02 +00:00
ebanks	fe26954ac6	Minimal support for reading in VCF4.1 files. Added TODOs that need to be fixed or cleaned up to truly support this version. VCF constants updated. Lower-case bases permitted. Please let's make sure to refactor once we're ready to support it for good. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5648 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 18:59:37 +00:00
ebanks	7e9051ea25	The solution to James's bug was just to clean up the code and simplify it. What happened was that functionality that got put into UGCalcLikelihoods was then generalized into the UG engine but then never removed from UGCalcLikelihoods. This knowingly breaks the batch merger, but Chris said he'll take care of it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5647 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 18:05:10 +00:00
kshakir	798178b167	Another case of just because you can do something doesn't mean you should. Scala type inference for the implicit return types on implicit methods was a little too much for poor IntelliJ IDEA to handle, and it was breaking things like copy/paste, auto-complete, etc. Also updated the Queue package to include all Sting utils. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5646 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 15:39:56 +00:00
hanna	0d7cca169e	Sigh. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5645 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 14:37:24 +00:00
hanna	0965020804	Screwed up the doc string. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5644 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 14:30:20 +00:00
hanna	be3bad1f61	Low-memory sharding is now enabled by default. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5643 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-15 14:22:07 +00:00
ebanks	2830dc70b7	UG can still return null in certain nasty cases git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5642 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 20:11:17 +00:00
fromer	8e0f5bc5a5	Prevent NullPointerException in cases where SNP is filtered git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5641 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 19:59:59 +00:00
depristo	ee94af3539	Oops, left out of earlier commit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5640 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 18:21:16 +00:00
chartl	104d5515fe	Huh, somehow this change didn't make it through last time git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5639 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 17:09:37 +00:00
chartl	47fa7e2227	+ Added override to extractFileEntries + UG now doesn't care whether it's given SNPs or indels to genotype, it will do the right thing -- so remove the option to specify which GM user wants + Max mismatches argument removed integration test will follow git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5638 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 15:13:35 +00:00
kshakir	cad6722cf6	Emailing on function start. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5637 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 14:55:35 +00:00
depristo	8ed9c0f518	VariantsToTable now blows up by default if you ask for a field that isn't present in a record. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5636 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 14:42:43 +00:00
fromer	b3cd14d10a	Since GCcontentIntervalWalker no longer uses any ROD, turn it into a LocusWalker that traverses by REFERENCE git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5635 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 03:15:09 +00:00
kshakir	475ad1259d	Put a band-aid on the FCP by switching use of DINDEL to INDEL and explicitly running UG the old way with just indels and just snps. Switched YAML parser to new Broad parser which will additionally update picard cleaned bams to the latest version if the project and sample are specified. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5634 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 02:22:31 +00:00
aaron	2089c3bdef	removing; should have gone to the CGA repo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5633 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 22:17:45 +00:00
aaron	da6f2d3c9d	adding the capseg tools to the new walker repo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5632 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 22:11:08 +00:00
kshakir	4bb573b1f5	Centralizing a bunch of Broad specific utility functions from code scattered in GSA-Firehose, PipelineTest, custom QScripts, etc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5631 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 21:29:02 +00:00
ebanks	91d308fc6d	temporary patch until Picard (hopefully) fixes the NM calculation to deal with reads that align off the end of the contig git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5630 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 19:18:18 +00:00
ebanks	fa6468d167	Remove the adaptor sequence clipping read filter because it is dangerous (it breaks LocusIteratorByState). We'll bring it back to life when ReadTransformers are created. Instead, have the utility code return a new clipped SAMRecord (necessary so that we don't break SNP calling in UG when the indel caller tries to hard-clip the reads). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5629 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 18:47:47 +00:00
hanna	5849e112e1	Fix exception in block weighting minus function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5628 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 17:07:04 +00:00
corin	9ee30ce594	Whole genome pipeline script. currently chunks, cleans, calls, merges, selects and filters indels, recalibrates, and evals. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5627 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 16:59:48 +00:00
hanna	a36adf0c6b	Request from the cancer team -- guarantee via javadoc that the returned read metrics are actually a clone, which they can do with as they wish. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5626 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 15:10:46 +00:00
delangel	06b1497902	Corrected bad merge. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5625 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 15:02:09 +00:00
delangel	9134bf3129	Long-forgotten change I neglected to commit a while back: add ability for SelectVariants to extracts either SNPs or Indels from combined vcf file. Not the ideal place to do it but it's important to at least have something to split vcfs now that we call snp's and indels combined. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5624 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 14:58:44 +00:00
chartl	8e0d191a70	Added a walker to help sort out which samples in a region are giving signal. Lots of reused code that shouldn't be. Will refactor later. Also fixed an "issue" with InsertSizeDistribution -- apparently for mate pairs, the first mate (karyotypically) will have a POSITIVE insert size, and the second a NEGATIVE insert size -- thus the insert size distribution was being conflated with enrichment/depletion of first-in-pair or second-in-pair reads. Gah. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5623 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 13:53:31 +00:00
chartl	efe6c539ac	Re-enabling disabled test. Apparently T-tests are very picky about your using an unbiased variance. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5622 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 03:05:50 +00:00

5630 Commits (cbcdfc584d33206dce650c6234a7cf3602fa7a3f)