gatk-3.8

Commit Graph

Author	SHA1	Message	Date
kshakir	787e5d85e9	Added the ability to test pipelines in dry or live mode via 'ant pipelinetest' and 'ant pipelinetest -Dpipeline.run=run'. Added an initial test for genotyping chr20 on ten 1000G bams. Since tribble needs logging support too, for now setting the logging level and appending the console logger to the root logger, not just to "org.broadinstitute.sting". Updated IntervalUtilsUnitTest to output to a temp directory and not the SVN controlled testdata directory. Added refseq tables and dbsnps to validation data in BaseTest. Now waiting up to two minutes for gather parts to propagate over NFS before attempting to merge the files. Setting scatter/gather directories relative to the -run directory instead of the current directory that queue is running. Fixed a bug where escaping test expressions didn't handle delimiters at the beginning or end of the String. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4717 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-22 22:59:42 +00:00
kshakir	673fa841a4	Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader. Removed obsolete usages of PackageUtils with updated PluginManager. Ported Queue interval utilities written in scala over to Sting's java IntervalUtils. Added a very basic integration test to ensure that the fullCallingPipeline.q compiles. Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test). While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1". Upgraded to scala 2.8.1 and updated calls to deprecated functions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 20:14:28 +00:00
depristo	1de713f354	Massive review of maybe 50% of the exceptions in the GATK. GATKException is a tmp. tracker so that I can tell which StingExceptions I've reviewed. Please don't use it. If you are working on new code and are considering throwing exceptions, it's either UserError or StingException, please git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4246 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-09 23:21:17 +00:00
asivache	a47824d680	A couple of type specific implementations of a single extend() method: takes an array (byte[] or short[] currently) and "extends" it to the left or to the right by the specified number of elements. Returns newly allocated array, with the content of original array copied in (if we extend by n elements to the left, then the returned array will have n default-filled elements followed by the content of the old array). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3932 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 15:30:48 +00:00
depristo	dfc36c1e95	Restructuring of the mandatory read filters for traversals. Now everything uses ReadFilters, even for the required filters like being mapped for LocusWalkers. Statistics now tracked for each read filter used during the traversal and info emitted in INFO at the end. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3445 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-26 22:12:25 +00:00
depristo	727822adb4	BaseUtils has more clear distinction between byte and char routines. All char routines are @Deprecated now. Please use bytes. Better organization of reverse(), now in Utils not BaseUtils. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3400 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 14:05:13 +00:00
weisburd	2f3933148d	Added fast split(str, delimiter) method git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3384 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 03:37:26 +00:00
weisburd	8b2ce128b5	Optimized the join(..) method. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3280 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-30 15:55:07 +00:00
hanna	c1e53d407d	The copyright tag that I copied/pasted from a LaTeX document into IntelliJ had unicode quote characters embedded in it. These characters were invisible inside IntelliJ but cause compile warnings for Ryan and Aaron, who for whatever reason have a different default charset. Fixed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3203 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-20 15:26:32 +00:00
hanna	1bc26f69e9	An attempt to cleanup the Utils directory. Email to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3198 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-19 23:00:08 +00:00
ebanks	47e30aba92	Rods for reads hooked up into the cleaner git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3070 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-24 18:17:56 +00:00
depristo	076d21d394	Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 02:47:09 +00:00
depristo	934d4b93a2	VariantContext to VCF converter. BeagleROD, and phasing of VCF calls. Integration tests galore :-) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2814 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-09 19:02:25 +00:00
depristo	af8c47fc2f	Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes now stable using sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts now officially supported. Other important types can be added to the adaptor system in refdata package. Integration tests later today git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-05 15:42:54 +00:00
ebanks	9a658e6b18	-Fixed VCF header line bug -Added useful trim() method for Strings for characters other than whitespace git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2538 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-07 17:51:41 +00:00
depristo	e793e62fc9	minor code cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2189 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:57:20 +00:00
ebanks	be6a549e7b	Added the capability to allow expressions in an integration test command (i.e. -filter 'foo') by escaping them in the command. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2132 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 02:34:48 +00:00
aaron	aece7fa4c7	a convenience method to join a map into a single string, which I need for some VCF work. Added some documentation to the join method as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2057 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 16:50:01 +00:00
ebanks	4558375575	Stage 1 of the VariantFiltration refactoring is now complete. There now exists a parallel tool called VariantAnnotator which simply takes variant calls and annotates them with the same type of data that we used to use for filtering (e.g. DoC, allele balance). The output is a VCF with the INFO field appropriately annotated. VariantAnnotator can be called as a standalone walker or by another walker, as it is by the UnifiedGenotyper. UG now no longer computes any of this meta data - it relegates the task completely to the annotator (assuming the output format accepts it). This is a fairly all-encompassing check in. It involves changes to all of the UG code, bug fixes to much of the VCF code as things popped up, and other changes throughout. All integration tests pass and I've tediously confirmed that the annotation values are correct, but this framework could use some more rigorous testing. Stage 2 of the process will happen later this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2053 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 02:41:20 +00:00
aaron	ad1fc511b1	intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-23 06:31:15 +00:00
ebanks	52d2e0ca07	All walkers now use read.getReadGroup() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1839 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 19:27:40 +00:00
depristo	d9588e6083	bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:36:12 +00:00
hanna	ccdb4a0313	General-purpose management of output streams. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-23 00:56:02 +00:00
ebanks	4efe26c59a	Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found. Minor: refactor 454 filtering git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1300 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-23 19:53:51 +00:00
hanna	b18caa2052	Fix for GSA-90: System isn't failing with an error when you use the wrong reference. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1225 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-13 20:42:12 +00:00
hanna	d19366eaad	Cleanup emergency fixes for out-of-bounds issues in reference retrieval. Fix spelling mistakes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1173 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-06 15:41:30 +00:00
depristo	6684cb8bc9	copySamFileHeader() utility function git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1154 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-02 12:55:51 +00:00
ebanks	ea2426dcd0	one more change needed to commit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1093 348d0f76-0448-11de-a6fe-93d51630548a	2009-06-25 15:09:53 +00:00
depristo	819862e04e	major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversional bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a	2009-06-05 23:34:37 +00:00
aaron	199be46c36	changed the warning that is outputted when the GenomeLoc constructor can't find the given contig in the reference. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@913 348d0f76-0448-11de-a6fe-93d51630548a	2009-06-05 15:49:03 +00:00
aaron	37efd78c7e	fixed the logger call so we get output that indicates this class generated the message git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@911 348d0f76-0448-11de-a6fe-93d51630548a	2009-06-05 15:02:17 +00:00
ebanks	36fb6ca3c5	Allow user to specify the compression to be used when writing out BAM files. Updated most of the walkers to reflect this change. Now it won't take forever to write BAMs! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a	2009-06-05 08:48:34 +00:00
asivache	d601548d53	added reallocate(int[] orig_array, int new_size) and int[] indexOfAll(String s, int ch); the former is self-explanatory, while the latter returns array of indices of all occurrences of ch in the specified string git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@856 348d0f76-0448-11de-a6fe-93d51630548a	2009-05-29 20:15:00 +00:00
hanna	5e8c08ee63	Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a	2009-05-28 20:13:01 +00:00
depristo	d261459c48	Useful function to create a string with N copies of a same char git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@784 348d0f76-0448-11de-a6fe-93d51630548a	2009-05-21 22:23:52 +00:00
hanna	23e9e29964	Changed reads traversals from providing a LocusContext from which the reference sequence could be extracted to a char[] containing the reference bases. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a	2009-05-11 22:45:11 +00:00
depristo	e848f34896	countOccurances of char in string and max of a list of bytes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@622 348d0f76-0448-11de-a6fe-93d51630548a	2009-05-07 18:03:49 +00:00
jmaguire	6cef8bd76c	added k-best quality path enumeration. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@497 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-22 20:26:51 +00:00
jmaguire	af6788fa3d	Misc: 1. Added logGamma function to utils 2. Required asserts to be enabled in the allele caller (run with java -ea) 3. put checks and asserts of NaN and Infinity in AlleleFrequencyEstimate 4. Added option FRACTIONAL_COUNTS to the pooled caller (not working right yet) AlleleFrequencyWalker: 5. Made FORCE_1BASE_PROBS not static in AlleleFrequencyWalker (an argument should never be static! Jeez.) 6. changed quality_precision to be 1e-4 (Q40) 7. don't adjust by quality_precision unless the qual is actually zero. 8. added more asserts for NaN and Infinity 9. put in a correction for zero probs in P_D_q 10. changed pG to be hardy-weinberg in the presence of an allele frequency prior (duh) 11. rewrote binomialProb() to not overflow on deep coverage 12. rewrote nchoosek() to behave right on deep coverage 13. put in some binomialProb() tests in the main() routine (they come out right when compared with R) Hunt for loci where 4bp should change things: 14. added FindNonrandomSecondBestBasePiles walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@471 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-19 15:35:07 +00:00
asivache	df5aae5ed4	got rid of a couple of warnings and added percentage(x,base) methods git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@462 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-17 15:15:21 +00:00
depristo	72a3d84ed2	General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives identical output as samtools on a per base and per qual pileup. It's going to be a very useful validation tool. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@418 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-14 22:13:10 +00:00
kiran	998fad76c6	Some utility methods for creating pileups of secondary bases and secondary quals. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@397 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-14 13:57:54 +00:00
depristo	bb666ce392	Added mappingQualPileup function for use in the verbose mode of Pileup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@391 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-14 00:51:26 +00:00
jmaguire	f39092526d	Added function RandomSubset git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@379 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-13 12:14:53 +00:00
depristo	00722e19bc	The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@319 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-07 22:19:54 +00:00
ebanks	3f75fc4e83	Unfortunately, because BWA occasionally outputs crazy reads, we need to make sure not to have an ArrayIndexOutOfBoundsException thrown. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@297 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-06 03:51:35 +00:00
ebanks	42eb356782	1. modified by read traversals with indexes to be more general 2. GenomeLocs for reads should have ends spanning the read (moved it to GenomeLoc from Utils) 3. Got rid of those stupid unmappable characters from comments in various files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@289 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-03 18:24:08 +00:00
depristo	d952790258	GFF now parses attributes correctly and efficiently. Slightly better interface to Utils.join git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@253 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-01 22:54:38 +00:00
depristo	385736469c	High performance pileup code and utilities git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@242 348d0f76-0448-11de-a6fe-93d51630548a	2009-04-01 00:47:47 +00:00
depristo	d7c0bcc223	Reorganized GenomeLoc code to more clearly and better use the picard SequenceDictionary information. All GenomeLoc[] are now ArrayList<GenomeLoc> for clarity and consistency. Parsing now recursively merges contiguous elements: chr1:1-10;chr1:11-20 => chr1:1-20. Added support for TraversingByLoci over all reference positions specified by the provided location array. System dynamically determines which traversal system to use. Pileup now marks, very clearly, reference positions without covered reads. Made changes around the codebase to deal with new GenomeLoc structure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@218 348d0f76-0448-11de-a6fe-93d51630548a	2009-03-28 20:37:27 +00:00

53 Commits (8831ec3dce0f937b89660b7e303fd5e7c35e8a3f)