4548 Commits (cbce3e3c83c72a8c7dff7b8fec00f00a2b419e83)

Author SHA1 Message Date
asivache a47824d680 A couple of type specific implementations of a single extend() method: takes an array (byte[] or short[] currently) and "extends" it to the left or to the right by the specified number of elements. Returns newly allocated array, with the content of original array copied in (if we extend by n elements to the left, then the returned array will have n default-filled elements *followed* by the content of the old array).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3932 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:30:48 +00:00
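The extend() behaviour described in the commit above can be sketched as follows. This is only an illustration under stated assumptions, not the committed code: the class name `ArrayExtendSketch` and the method names `extendLeft` and `extendRight` are hypothetical, and only the byte[] variant is shown.

```java
import java.util.Arrays;

// Hypothetical sketch of the described extend() behaviour for byte[] only.
final class ArrayExtendSketch {

    // Returns a new array holding n default (zero) elements followed by the original content.
    static byte[] extendLeft(byte[] original, int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        byte[] extended = new byte[original.length + n];
        // Copy the old content so that it starts at offset n in the new array.
        System.arraycopy(original, 0, extended, n, original.length);
        return extended;
    }

    // Returns a new array holding the original content followed by n default (zero) elements.
    static byte[] extendRight(byte[] original, int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        // Arrays.copyOf pads the tail of the longer copy with zeros.
        return Arrays.copyOf(original, original.length + n);
    }

    public static void main(String[] args) {
        byte[] values = {1, 2, 3};
        System.out.println(Arrays.toString(extendLeft(values, 2)));  // [0, 0, 1, 2, 3]
        System.out.println(Arrays.toString(extendRight(values, 2))); // [1, 2, 3, 0, 0]
    }
}
```

In both directions the input array is left untouched and a newly allocated array is returned, matching the commit message.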
asivache 012a7cf0a5 mismatchCount now has a version that counts mismatches only along a part of the read (takes additional args start_on_read and length_on_read to specify the read's subsequence to be interrogated);
isMateUnmapped() convenience shortcut method added.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3931 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:27:35 +00:00
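A rough sketch of counting mismatches along only part of a read, as the commit above describes. The signature here is an assumption: the real method's parameters beyond start_on_read and length_on_read are not shown in the log, so this version takes plain byte arrays, assumes a gapless alignment in which read position i lines up with reference position refStart + i, and compares bases exactly (case-sensitive).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; class name, signature and the gapless-alignment assumption are illustrative.
final class MismatchCountSketch {

    // Counts mismatching bases in read[startOnRead, startOnRead + lengthOnRead).
    static int mismatchCount(byte[] read, byte[] ref, int refStart,
                             int startOnRead, int lengthOnRead) {
        if (startOnRead < 0 || lengthOnRead < 0 || startOnRead + lengthOnRead > read.length) {
            throw new IllegalArgumentException("requested window falls outside the read");
        }
        if (refStart < 0 || refStart + startOnRead + lengthOnRead > ref.length) {
            throw new IllegalArgumentException("requested window falls outside the reference");
        }
        int mismatches = 0;
        for (int i = startOnRead; i < startOnRead + lengthOnRead; i++) {
            if (read[i] != ref[refStart + i]) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        byte[] read = "ACGTACGT".getBytes(StandardCharsets.US_ASCII);
        byte[] ref = "ACGAACGA".getBytes(StandardCharsets.US_ASCII);
        System.out.println(mismatchCount(read, ref, 0, 0, 8)); // 2 (whole read)
        System.out.println(mismatchCount(read, ref, 0, 4, 4)); // 1 (second half only)
    }
}
```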
delangel e6e8a20a1e 1) Fix MyHaplotypeScore to ignore 454 reads, since all those pathological nonexistent indels make some sites' scores blow up. If a site is only covered by 454 reads, we (hopefully) detect this gracefully and just emit a score of 0.0 for the site.
2) New annotation SByDepth = log10(-StrandBias/Depth) (non-standard annotation, key name = "SBD"). If StrandBias/Depth happens to be positive (very rare but can happen), annotation gets value=-1000. 
3) Abstracted out new class AnnotationByDepth so that QD and SBD can share code.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3930 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:23:08 +00:00
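The SBD formula from item 2 of the commit above, written out as a minimal sketch. The class and method names are hypothetical, and the rejection of a non-positive depth is an assumption; the commit message only specifies the log10(-StrandBias/Depth) formula and the -1000 value for a positive ratio.

```java
// Hypothetical sketch of SByDepth = log10(-StrandBias/Depth).
final class SByDepthSketch {

    // Value emitted when StrandBias/Depth is positive, per the commit message.
    static final double POSITIVE_RATIO_VALUE = -1000.0;

    static double sByDepth(double strandBias, int depth) {
        if (depth <= 0) {
            // Assumption: the commit message does not say how zero depth is handled.
            throw new IllegalArgumentException("depth must be positive");
        }
        double ratio = strandBias / depth;
        if (ratio > 0) {
            // log10 of a negative number is undefined, so the fixed value is used instead.
            return POSITIVE_RATIO_VALUE;
        }
        // A ratio of exactly zero is not addressed by the commit message;
        // here it falls through to Math.log10, which returns negative infinity.
        return Math.log10(-ratio);
    }

    public static void main(String[] args) {
        System.out.println(sByDepth(-50.0, 10)); // log10(5)
        System.out.println(sByDepth(5.0, 10));   // -1000.0
    }
}
```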
ebanks bf60ed0b25 Needed it here too: warn user instead of dying if the R script cannot be executed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3929 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:11:27 +00:00
ebanks 40ffe34686 Warn user instead of dying if the R script cannot be executed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3928 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:08:15 +00:00
ebanks 17d5e89734 Now --list annotates which modules are Standard
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3927 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 21:00:37 +00:00
ebanks 72875cf717 Removing annoying printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3926 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 19:55:00 +00:00
ebanks 2307bed742 VariantEval now uses only the "standard" modules by default. You can add other modules with the -E argument, or turn off all of the standard ones with -noStandard (they can then be added back individually with -E).
Generalized some of the packaging code from VariantAnnotator.  Matt might want to take a look to make this nicer...?



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3925 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 16:51:10 +00:00
ebanks a7ff9caf54 Added sanity check against bad people and/or crazy big indels at edges of ref context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3918 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 05:37:17 +00:00
hanna 5f1b67c1de Copping out and forcing the entire GATK (and associated JVM) to use US English
locale.  Method to force JVM into proper locale exists in CommandLineProgram
and is disabled by default, but implementers of CommandLineProgram can opt in
to the forced US locale by calling a static method.

Question for the VCF developers: I removed the code to explicitly output doubles
in US locale.  Do you / how do you want to handle this in applications that use
Tribble outside the GATK?


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3917 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 03:48:26 +00:00
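Forcing a JVM into the US locale, as the commit above describes, can be done with the standard `Locale.setDefault` call. The sketch below is not the CommandLineProgram code: the class name `LocaleSketch` and the method name `forceUSLocale` are hypothetical stand-ins for the opt-in static method the message mentions without naming.

```java
import java.util.Locale;

// Hypothetical sketch of an opt-in static method that forces the JVM default locale to US English.
final class LocaleSketch {

    static void forceUSLocale() {
        // Changes the default locale for the whole JVM, including number formatting.
        Locale.setDefault(Locale.US);
    }

    public static void main(String[] args) {
        // With an explicit German locale the decimal separator is a comma.
        System.out.println(String.format(Locale.GERMANY, "%.2f", 1.5)); // 1,50

        forceUSLocale();
        // After forcing the US locale, default formatting uses a decimal point.
        System.out.println(String.format("%.2f", 1.5)); // 1.50
    }
}
```

Once the default is set this way, code that formats doubles without naming a locale produces US-style output, which is presumably why the explicit US-locale formatting could be removed from the VCF output code.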
hanna b5b2c19124 Updated resources package descriptor with dbsnp 129 for b37.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3916 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 02:42:48 +00:00
chartl 2bc69572cb Make transcript2info capable of handling b37/hg19 contigs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3915 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 17:32:08 +00:00
depristo c203e0fb02 Added JEXL support for hetCount, homRefCount, and homVarCount in VCs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3914 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 12:24:11 +00:00
depristo 7fab5c0a8f Support for -singleton_fp_rate arguments to variant recalibrator instead of the pop.gen. AF prior. Worth experimenting with Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3913 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-31 21:17:47 +00:00
chartl 6dcb63888d Be smart about the headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3912 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 16:11:16 +00:00
chartl eeb767a012 Remove legacy "out"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3911 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 15:01:02 +00:00
chartl f20cdbe60a Modified to work with MT containing VCFs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3910 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:59:59 +00:00
ebanks 6d91cd587e Be explicitly clear about which options are for debugging purposes only and shouldn't be used if your username is not ebanks@broad. If only we had a @hidden annotation option for args...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3909 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:18:31 +00:00
depristo ac8048f17b Support for automated selects for tranches in variant eval -- use -tf to make tranche-specific VE outputs. ApplyVariantCuts with tranche reading functions for general use, along with a todo for Ryan. CombineVariants now has --filteredAreUncalled, which treats filtered SNPs in input VCFs as uncalled and so won't emit -filteredInOther set features
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3908 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:16:43 +00:00
chartl 9231d13252 Minor modification: adding an argument to make slightly more general.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3907 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 05:20:20 +00:00
chartl db54d63fc7 Hahaha yes, ownage. This now works.
BTW, Eric, thanks for forwarding the DepthOfCoverage thread to gsamembers. I'd forgotten about reduce by interval. Mighty helpful in this case!




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3906 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 04:23:02 +00:00
chartl 3e3f8c7692 Simple count intervals walker, as per my recent email to GSAMembers. Never use this. It doesn't behave the way you think it does.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3905 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 03:39:23 +00:00
chartl 9132c98eec Slightly smarter interval list handling (whole exome intervals are .interval_list, whole genome are .interval.list). Also use BTI with the Genomic Annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3904 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 22:04:02 +00:00
chartl 54d93f63d2 Hacky fix for LSF confusion -- submitted jobs check to see if their directory exists, despite depending on the job which creates said directory. Filter strings now have escaped quotes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3903 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 21:35:50 +00:00
chartl 0f9baa2e94 Ha ha ha ha ha
:(



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3902 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 20:48:35 +00:00
delangel ba1a330293 Corrected location and made more explicit the error message thrown if someone tries to read a VCF 3.3 file with indels, which is not supported.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3901 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 20:02:47 +00:00
kshakir 735ef19dc8 Added option to sleep after creating temporary directories.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3900 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 19:53:17 +00:00
kshakir 82c37fceb5 Create intermediate directories and don't error if the directory already exists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3899 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 19:45:12 +00:00
chartl 7a5ee485d2 Full pipeline now works through DAG creation. First draft; more work to do to make it cleaner and better command-line input handling (and properties handling); but the DAG is rendered and looks good.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3898 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 19:36:17 +00:00
delangel 5af986e0c1 Add an integration test for Beagle (one for ProduceBeagleInput and one for BeagleOutputToVCFWalker)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3897 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 18:49:22 +00:00
chartl 4d4cf6e1dc Updates to calling pipeline
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3896 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 18:37:20 +00:00
chartl 52f24c86fa Script to split a provided interval list into contigs. Excesses will be dropped into the last provided file. Works like splitIntervals.sh. This is for Queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3895 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 16:42:53 +00:00
chartl 62a9217a61 A brute-force exome/genome independent end-to-end cleaning/calling pipeline using Queue
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3894 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 13:17:14 +00:00
chartl f35e6d73b4 Actually name the class the name of the file. (Clearly created by cp)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3892 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-28 15:18:04 +00:00
chartl cd9395fa14 Since Picard's FixMateInformation merges, fixes mates, and sorts, allow it to be used as a gather function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3891 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-28 15:09:19 +00:00
delangel e1a34685fd Add back MyHaplotypeScore as a new implementation for HaplotypeScore, this time as a non-standard annotation. Implementation is also better: it computes better consensus haplotypes and ranks them by sum of quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3890 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 21:23:19 +00:00
hanna 6c93b13428 A Java sizeof, implemented using the Java instrumentation API. Can get the memory consumed either by a single
object alone or by a single object and all the references it contains.  Requires adding a Java agent to
the command line; see the Sizeof.java javadoc for details.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3889 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 18:44:15 +00:00
rpoplin f5566a6593 Knocking out some quick findBugs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3887 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 14:10:59 +00:00
delangel 894623858d OK, bad idea to add new temporary annotation - revert to keep integration tests happy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3886 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 12:07:13 +00:00
delangel 71bfb1ee35 First redesign of HaplotypeScore - now, a different approach is taken to build possible haplotypes at a site: first, all possible haplotypes consistent with reads are formed (reference is not used). After this list has been formed, it is ranked according to the number of reads that are consistent with each haplotype, and the two most popular haplotypes are chosen.
This reduces to the old method in typical cases, but it builds haplotypes correctly if there are two variants close together within a context window.

Annotation is temporarily named MyHaplotypeScore so it can be run in parallel with the old one; it will be renamed soon, after some more testing.
 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3885 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 10:54:56 +00:00
delangel cffebcc867 Small utility walker used for production of the Beagle data processing paper section. Walker will print out to output file, for every site common to a reference vcf and an eval vcf, a given sample's depth, hapmap AC and AF and pre/post Beagle genotype as well as corresponding reference (e.g. Hapmap) genotype.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3884 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 03:00:17 +00:00
ebanks 1d9ed1e214 Cleanup of old VCFRecord code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3883 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 02:56:47 +00:00
ebanks 7dd55fbf13 Archiving
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3882 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 02:47:18 +00:00
aaron 9667942e52 fix for Ryan's issue: we also need to sync when we store a resource.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3881 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-26 22:17:47 +00:00
hanna 8b072b59e2 Returning index dumping functionality in BAMFileStat to a useable state.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3880 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-26 20:03:50 +00:00
depristo 19ad44d332 Minor improvements to CombineVariants to handle the complex case from Chris. IntegrationTest of complex case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3876 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 13:46:11 +00:00
ebanks 7c5a3836db Trivial changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3875 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 04:00:47 +00:00
ebanks 56de475f11 Based on feedback from non-GSA users, who claim that our exceptions are 'scary and overwhelming,' I've cleaned up the error message to first describe the error and what users should do and then ask them to copy the subsequent stack trace into their GetSatisfaction posting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3874 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 03:57:44 +00:00
ebanks 9bd8a2685b Because the performance tests were busted on LSF, no one caught this error until now: when Matt changed over the contract for the AlignmentContext, this line needed to get updated too. All is well now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3873 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 02:53:01 +00:00
depristo b551eaf8fd Actually commit the code that makes variant eval run in a reasonable amount of time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3872 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-24 17:32:03 +00:00