3635 Commits (cc8d8eaedbaba69e6d271e1dc9409f62ca30138c)

Author SHA1 Message Date
asivache cc8d8eaedb Now that we always reserve space for two read ends when collecting stats stratified by libraries, we need to check that the second end was indeed present; otherwise the pointer is null and this was causing an exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3656 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:40:16 +00:00
kiran 26ef1f84bf Updates to not depend on an environment variable to figure out where libraries are (helpful for installation at the Sanger).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3655 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 15:46:41 +00:00
ebanks 9a24598a98 By default, don't clean reads with mates mapped to other chromosomes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3654 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 15:14:20 +00:00
weisburd e7939f7036 Fixed error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3653 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 14:50:28 +00:00
kiran b2127e59c4 A first draft of scripts and LaTeX templates required to automatically generate slides for the 1,000 Genomes Automated Data Processing Report.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3652 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 14:30:50 +00:00
ebanks bf5cbad04c Make the target creator a rod walker (that allows reads) so that we can easily trigger the cleaner on only known indel sites. Adding an integration test to cover this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3651 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 13:28:37 +00:00
ebanks 464ac63a22 Allowing N's in ALT field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3650 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 11:41:32 +00:00
hanna 3a9d426ca8 Added hasPileupBeenDownsampled() boolean to ReadBackedPileup, so that a pileup can report whether or not (but not how much) it's been downsampled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3649 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 04:56:33 +00:00
ebanks 8e848ccd84 SAMFileWriters can now write to /dev/null without throwing exceptions, so we can remove the try/catch blocks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3648 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-27 03:59:10 +00:00
aaron 09ccdf83b2 fixing a broken test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3647 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:59:00 +00:00
depristo d6cbe4d0ad Bug fixes to support haploid genotypes, optimization for indexing, now tracks the line of the VCF and catches errors to tell you the line no and line when a parsing error occurred.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3646 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:08:41 +00:00
aaron 5f8a3f95ef The GT field once again reigns supreme (it must be the first genotype field). Thanks for the catch, Eric.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3645 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:03:05 +00:00
kshakir 894ad354fa Fixed typo in the name of the shell directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3644 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:59:40 +00:00
kshakir 75c98c42b8 Started path of deprecation of Sting's @Argument by splitting the annotation into @Output and @Input. Anything that's not an @Output should be an @Input.
Checked in example qscripts that are basically todo integration tests.
Replaced use of queue @Input/@Output with Sting's new @Input/@Output. This means you'll now have to document the annotations.
More work on dependency resolution cycles being created in the graph during scatter/gather.
Filtering nulls to avoid NPE exceptions in scala's 'Collection'.hashCode.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3643 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:51:13 +00:00
weisburd 147ba68441 Fixed bug with mrnaCoord field - made it count exon positions only, rather than introns & exons
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3642 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 19:53:32 +00:00
kshakir ce27ed0d60 Added missing @ClassType to memory limits.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3641 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:56:35 +00:00
aaron dff4c06763 Rev'ing Tribble with a special version that has excluded VCF 3.3
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3640 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:51 +00:00
aaron d3848745ab moving VCF 3.3 back into the GATK so Guillermo can make changes for VCF 4 output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3639 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:06 +00:00
aaron b3edb7dc08 two fixes for the VCF 4 parser:
- Allow the "GT" field in genotypes at any point in the genotype string (before we required they be the first key-value pair).
- Fix a bug with the phasing value put into the VariantContext, thanks for the catch, Guillermo!

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3638 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:01:23 +00:00
weisburd e15fe6858e Disabling test - will need to update big-tables soon; will re-enable after updating md5
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3637 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 15:43:41 +00:00
aaron efa60e5de5 and add changes to the vcf used in testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3636 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 02:56:02 +00:00
aaron f9c7803d4e this got left off my last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3635 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 02:42:44 +00:00
weisburd 1cb8f51f8c Fixed -t arg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3634 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:44:10 +00:00
weisburd 3cd0570c1e Now can run with multiple processes, multiple threads, or both
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3633 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:25:01 +00:00
weisburd dae3ce2c0f changed log dir
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3632 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:08:13 +00:00
weisburd fea8054e9e Updated long name for -l to --run-locally
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3631 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:26:45 +00:00
aaron 682f9b46c6 Two fixes together:
1) Some improvements to the VCF4 parsing, including disabling validation.
2) Reimplemented RefSeq in the new Tribble-style rod system.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:17:03 +00:00
weisburd 72e669538e Updated arg description for -s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3629 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:04:01 +00:00
aaron 62bc7651a8 fix for PSPW with DbSNP mask. Added an integration test for this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3628 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:31:32 +00:00
hanna 4840ef6d3e Another rev of picard for /dev/null writing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3627 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:22:47 +00:00
hanna c32f9d78ae Rev picard again, this time for error writing to /dev/null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3626 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 04:08:26 +00:00
corin bcab0eba01 This replaces tearsheet.r, neatens up graphics, and allows the script to be used in R's interactive environment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3625 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 01:02:58 +00:00
aaron 8a9b2f4256 removing the GLF ROD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3624 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 22:51:45 +00:00
asivache 17d2043354 bug fix: now contigs not present in the sequence dictionary are registered properly and do not cause the script to break
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3623 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:59:38 +00:00
aaron 611d834092 a couple of VCF 4 improvements:
-Validation of INFO and FORMAT fields.
-Conversion to the correct type for info fields (e.g. allele frequency is now stored as a float instead of a string).
-Checks for CNV-style alternate allele encodings (e.g. <INS:ME:L1>); right now we exception out. Maybe we should just warn the user?
-Tests for the multiple-base polymorphism allele case.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3622 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:21:43 +00:00
aaron 54ae0b8e4e some updates to tribble for the svn commit that will follow
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3621 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:20:07 +00:00
ebanks f0fc34bb8e Bug fix: N's are allowed in the ref so don't fail when e.g. dbsnp has an N!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3620 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 17:49:14 +00:00
ebanks b6bceb39b0 Fixing up output for performance tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3619 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 17:00:17 +00:00
chartl 75d4736600 Committing changes to comp overlap for indels. Passes all integration tests; minor changes to MVC walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3618 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 15:49:13 +00:00
ebanks 9b8775180e Turn on the memory improvement by default (assume the target interval list is sorted, since it is 99.9% of the time). Make the user throw a flag when it's specifically not sorted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3617 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 15:44:55 +00:00
hanna 26d51bbe14 Another round of optimizations from Alec. Switching the header merger to
an IdentityHashMap provides another 10x+ performance boost over his previous
optimization for us.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3616 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 14:54:58 +00:00
hanna 003dd4de3e Rev Picard with performance enhancements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3615 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 22:54:23 +00:00
aaron 0cafd3d642 clip VCF alleles for indels: only a single left base, and as many right bases as align before converting to variant context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3614 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 22:42:38 +00:00
aaron 9872b65803 clip to the null allele on the reference string in VCF 4, instead of stopping to preserve one reference base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3613 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 20:52:19 +00:00
ebanks b5df2705c9 -Remove Nway output option
-Remove in-memory sorting
-Default to name-sorting (although we allow coordinate sorting with the --sortInCoordinateOrderEvenThoughItIsHighlyUnsafe flag).

Cleaner, faster code.  Wiki has been updated (including how to use FixMateInformation.jar from Picard).  More changes coming soon.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3612 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 20:31:55 +00:00
kshakir 30cf78fdc0 Refactoring for a first version of scatter gather api with basic shell script implementations.
Modified build script so that queue is cleaned during "ant clean".

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3611 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 18:39:20 +00:00
aaron 18f62a346d fixing the tests; Bamboo captured the failure in the logs correctly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3610 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 18:38:03 +00:00
aaron a6d3e4bd47 Add code to allow reference alleles with 'N' in VariantContext, but not in the alternate allele(s). Also more updates to the VCF 4 code (fixed parsing for files without genotypes).
This check-in will temporarily break the build (I need to see if Bamboo is correctly returning the log file for the failed builds).

Will be fixed once Bamboo starts building.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3609 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 18:26:37 +00:00
ebanks 824c2bbac0 Finishing previous checkin
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3608 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 17:21:38 +00:00
ebanks 4727bcda24 Removing Beagle output from UG. Use ProduceBeagleInput walker instead (since it can be run post-filtration and respects the FILTER column).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3607 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 16:56:37 +00:00