3640 Commits (dce2c174046bee04247d942cf6f8ffee55c84ada)

Author SHA1 Message Date
kshakir dce2c17404 Added "-bsubWait" where Queue waits for all the jobs to exit before exiting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3661 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 19:52:17 +00:00
chartl 20f5fdbcf7 Changes to MVC to make the header of its output VCF compliant with spec (give expected # of values for info field annotations)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3660 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 18:33:23 +00:00
kshakir c047232b18 Using picard for bam merging.
Properties now propagate to scatter/gather functions.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3659 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 17:59:09 +00:00
aaron 62d22ff1aa adding the original allele list to a variant context (as the annotation ORIGINAL_ALLELE_LIST), in the case where the set alleles are the result of clipping. Added tests for both cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3658 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 17:23:46 +00:00
ebanks 1292c96e29 The cleaner now adds the OC (original cigar) and OS (original alignment start) tags as appropriate to reads that get realigned; this feature can be turned off. Also, improved integration tests (sorry, Kiran!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3657 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:46:47 +00:00
asivache cc8d8eaedb Now that we always reserve space for two read ends when collecting stats stratified by libraries, we need to check that the second end was indeed present; otherwise the pointer is null and this was causing an exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3656 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:40:16 +00:00
kiran 26ef1f84bf Updates to not depend on an environment variable to figure out where libraries are (helpful for installation at the Sanger).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3655 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 15:46:41 +00:00
ebanks 9a24598a98 By default, don't clean reads with mates mapped to other chromosomes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3654 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 15:14:20 +00:00
weisburd e7939f7036 Fixed error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3653 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 14:50:28 +00:00
kiran b2127e59c4 A first draft of scripts and LaTeX templates required to automatically generate slides for the 1,000 Genomes Automated Data Processing Report.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3652 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 14:30:50 +00:00
ebanks bf5cbad04c Make the target creator a rod walker (that allows reads) so that we can easily trigger the cleaner on only known indel sites. Adding an integration test to cover this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3651 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 13:28:37 +00:00
ebanks 464ac63a22 Allowing N's in ALT field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3650 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 11:41:32 +00:00
hanna 3a9d426ca8 Added hasPileupBeenDownsampled() boolean to ReadBackedPileup, so that a pileup can report whether or not (but not how much) it's been downsampled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3649 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 04:56:33 +00:00
ebanks 8e848ccd84 SAMFileWriters can now write to /dev/null without throwing exceptions, so we can remove the try/catch blocks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3648 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-27 03:59:10 +00:00
aaron 09ccdf83b2 fixing a broken test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3647 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:59:00 +00:00
depristo d6cbe4d0ad Bug fixes to support haploid genotypes, optimization for indexing, now tracks the line of the VCF and catches errors to tell you the line no and line when a parsing error occurred.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3646 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:08:41 +00:00
aaron 5f8a3f95ef The GT field once again reigns supreme (it must be the first genotype field). Thanks for the catch Eric.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3645 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:03:05 +00:00
kshakir 894ad354fa Fixed typo in the name of the shell directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3644 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:59:40 +00:00
kshakir 75c98c42b8 Started path of deprecation of Sting's @Argument by splitting the annotation into @Output and @Input. Anything that's not an @Output should be an @Input.
Checked in example qscripts that are basically todo integration tests.
Replaced use of queue @Input/@Output with Sting's new @Input/@Output. This means you'll now have to document the annotations.
More work on dependency resolution cycles being created in the graph during scatter/gather.
Filtering nulls to avoid NPE exceptions in scala's 'Collection'.hashCode.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3643 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:51:13 +00:00
weisburd 147ba68441 Fixed bug with mrnaCoord field - made it count exon positions only, rather than introns & exons
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3642 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 19:53:32 +00:00
kshakir ce27ed0d60 Added missing @ClassType to memory limits.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3641 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:56:35 +00:00
aaron dff4c06763 Rev'ing Tribble with a special version that has excluded VCF 3.3
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3640 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:51 +00:00
aaron d3848745ab moving VCF 3.3 back into the GATK so Guillermo can make changes for VCF 4 output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3639 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:06 +00:00
aaron b3edb7dc08 two fixes for the VCF 4 parser:
- Allow the "GT" field in genotypes at any point in the genotype string (before we required they be the first key-value pair).
- Fix a bug with the phasing value put into the VariantContext, thanks for the catch Guillermo!

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3638 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:01:23 +00:00
weisburd e15fe6858e Disabling test - Will need to update big-tables soon.. will re-enable after updating md5
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3637 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 15:43:41 +00:00
aaron efa60e5de5 and add changes to the vcf used in testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3636 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 02:56:02 +00:00
aaron f9c7803d4e this got left off my last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3635 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 02:42:44 +00:00
weisburd 1cb8f51f8c Fixed -t arg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3634 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:44:10 +00:00
weisburd 3cd0570c1e Now can run with multiple processes, multiple threads, or both
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3633 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:25:01 +00:00
weisburd dae3ce2c0f changed log dir
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3632 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:08:13 +00:00
weisburd fea8054e9e Updated long name for -l to --run-locally
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3631 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:26:45 +00:00
aaron 682f9b46c6 Two fixes together:
1) Some improvements to the VCF4 parsing, including disabling validation.
2) Reimplemented RefSeq in the new Tribble-style rod system.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:17:03 +00:00
weisburd 72e669538e Updated arg description for -s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3629 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:04:01 +00:00
aaron 62bc7651a8 fix for PSPW with DbSNP mask. Added an integration test for this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3628 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:31:32 +00:00
hanna 4840ef6d3e Another rev of picard for /dev/null writing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3627 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:22:47 +00:00
hanna c32f9d78ae Rev picard again, this time for error writing to /dev/null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3626 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 04:08:26 +00:00
corin bcab0eba01 This replaces tearsheet.r, neatens up graphics, and allows the script to be used in R's interactive environment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3625 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 01:02:58 +00:00
aaron 8a9b2f4256 removing the GLF ROD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3624 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 22:51:45 +00:00
asivache 17d2043354 bug fix: now contigs not present in the sequence dictionary are registered properly and do not cause the script to break
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3623 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:59:38 +00:00
aaron 611d834092 a couple of VCF 4 improvements:
-Validation of INFO and FORMAT fields.
-Conversion to the correct type for info fields (i.e. allele frequency is now stored as a float instead of a string).
-Checks for CNV style alternate allele encodings (i.e. <INS:ME:L1>), right now we exception out. Maybe we should just warn the user?
-Tests for the multiple-base polymorphism allele case.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3622 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:21:43 +00:00
aaron 54ae0b8e4e some updates to tribble for the svn commit that will follow
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3621 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:20:07 +00:00
ebanks f0fc34bb8e Bug fix: N's are allowed in the ref so don't fail when e.g. dbsnp has an N!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3620 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 17:49:14 +00:00
ebanks b6bceb39b0 Fixing up output for performance tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3619 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 17:00:17 +00:00
chartl 75d4736600 Committing changes to comp overlap for indels. Passes all integration tests; minor changes to MVC walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3618 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 15:49:13 +00:00
ebanks 9b8775180e Turn on the memory improvement by default (assume the target interval list is sorted, since it is 99.9% of the time). Make the user throw a flag when it's specifically not sorted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3617 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 15:44:55 +00:00
hanna 26d51bbe14 Another round of optimizations from Alec. Switching the header merger to
an IdentityHashMap provides another 10x+ performance boost over his previous
optimization for us.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3616 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 14:54:58 +00:00
hanna 003dd4de3e Rev Picard with performance enhancements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3615 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 22:54:23 +00:00
aaron 0cafd3d642 clip VCF alleles for indels: only a single left base, and as many right bases as align before converting to variant context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3614 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 22:42:38 +00:00
aaron 9872b65803 clip to the null allele on the reference string in VCF 4, instead of stopping to preserve one reference base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3613 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 20:52:19 +00:00
ebanks b5df2705c9 -Remove Nway output option
-Remove in-memory sorting
-Default to name-sorting (although we allow coordinate sorting with the --sortInCoordinateOrderEvenThoughItIsHighlyUnsafe flag).

Cleaner, faster code.  Wiki has been updated (including how to use FixMateInformation.jar from Picard).  More changes coming soon.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3612 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 20:31:55 +00:00
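
Every entry above carries a git-svn-id trailer of the form "git-svn-id: <repository URL>@<SVN revision> <repository UUID>". As a minimal sketch, assuming that layout holds for each trailer, the SVN revision can be recovered from a commit message line as shown below. The names SvnId and parse_git_svn_id are hypothetical helpers introduced for illustration; they are not part of this repository.

```python
import re
from typing import NamedTuple, Optional


class SvnId(NamedTuple):
    """Fields of one git-svn-id trailer (hypothetical helper type)."""
    url: str
    revision: int
    uuid: str


# Assumed layout: "git-svn-id: <url>@<revision> <uuid>", as seen in the log above.
_TRAILER = re.compile(
    r"^git-svn-id:\s+(?P<url>\S+)@(?P<rev>\d+)\s+(?P<uuid>[0-9a-fA-F-]{36})$"
)


def parse_git_svn_id(line: str) -> Optional[SvnId]:
    """Return the parsed trailer, or None if the line is not a git-svn-id trailer."""
    match = _TRAILER.match(line.strip())
    if match is None:
        return None
    return SvnId(match.group("url"), int(match.group("rev")), match.group("uuid"))


# Trailer copied from the most recent commit in the listing.
example = (
    "git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3661 "
    "348d0f76-0448-11de-a6fe-93d51630548a"
)
parsed = parse_git_svn_id(example)
```

Lines that are ordinary message text, such as "Fixed error message", return None, so the function can be applied to every line of a commit body without pre-filtering.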