663 Commits (e0803eabd914dc31f44c4f3ce112cee79d011744)

Author SHA1 Message Date
depristo e0803eabd9 enabled underlying filtering of zero mapping quality reads, vastly improves system performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@853 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 14:51:08 +00:00
hanna 1f93545c70 Always opt to merge dictionaries when creating a SAMFileHeaderMerger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@852 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 22:38:16 +00:00
hanna 0cf90b6f8a Tie into sequence merging code in the latest version of picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@851 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 21:48:35 +00:00
aaron b43deda6c9 iterative changes to GLF files; also a test of checking-in over sshfs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@850 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:24:30 +00:00
hanna 5e8c08ee63 Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
ebanks 19f9ac2b05 Realign existing indels (from the aligner) to leftmost position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@848 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 04:56:51 +00:00
hanna aa17c4a468 Farewell, functionalj. You promised much, but you could not deliver.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@847 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 01:35:49 +00:00
aaron d275c18e58 adding some objects we need for the GLF format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@846 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 22:32:25 +00:00
depristo ce6a0f522b First incarnation of the population-based SNP analysis tool. Also bug fixes throughout the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@845 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 22:02:24 +00:00
hanna a11bf0f43e Basic unit tests for ReferenceOrderedView, ShardDataProvider. Addressing GSA-25.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@844 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 21:15:01 +00:00
ebanks e533c64b8f Walker to pull out the reference for given intervals and emit them in fasta format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@843 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:39:09 +00:00
aaron 5c6163ecbf Removing the old reads traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@842 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:36:11 +00:00
aaron c7b032cc88 missed a file in the add.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@841 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:27:38 +00:00
aaron 3c3cd5bb64 Moving some of the data sharding around. A new shard category now exists, INTERVAL. This saved a lot of code that was mirroring the same approach in both the read and locus shard strategies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@840 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:24:31 +00:00
asivache 99524ab6d0 package name corrected
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@839 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:20:43 +00:00
asivache b76f8c4eb5 moved from playground to gatk
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@838 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:18:33 +00:00
asivache c3678c7bb9 moved from playground to gatk
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@837 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:18:08 +00:00
asivache 5b310e48f5 changed to use factored out Transcript class; some docs added (not much)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@836 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:17:23 +00:00
asivache ae0bac5696 'made public' implies the 'public' keyword, actually...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@835 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:57:01 +00:00
asivache 41c1a62ac4 formerly private class, factored out and made public. Represents a transcript annotation (transcript id, genomic location, genomic intervals for all exons present in this transcript, etc)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@834 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:52:38 +00:00
hanna 8edba13ded Unit tests for the reference views. Partially addresses GSA-25.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@833 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:49:45 +00:00
ebanks 9bd6489f8e Output indels in the format appropriate for low-coverage indel submission
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@832 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:32:15 +00:00
ebanks 919e995b7f -Moved my walkers to indels directory
-Removed entropy walker and replaced it with mismatch (column) walker
-Some improvements to the cleaner (more to come)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@830 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 16:34:24 +00:00
hanna 864a1e81e3 Delete stale class from previous rethink of the traversal engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@828 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 13:52:03 +00:00
aaron 6fab1a64fa Started work on GLF input / output basics. Do not use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@827 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 22:49:59 +00:00
asivache b81135c606 bug fixed; this rod seems to work now...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@826 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 22:25:34 +00:00
hanna a488d2dbb2 Lazy creation of output streams. Only create output streams when absolutely necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@824 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:56:57 +00:00
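
Note on a488d2dbb2: the commit body is not shown here, so the following is only a minimal sketch of what lazy creation of an output stream can look like. The class name `LazyOutputStream`, the `isOpened()` helper, and the use of `java.util.function.Supplier` are all assumptions made for illustration and are not taken from the GATK source.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.Supplier;

// Hypothetical wrapper: the real stream is only created on the first write.
final class LazyOutputStream extends OutputStream {
    private final Supplier<OutputStream> factory; // knows how to open the real stream
    private OutputStream delegate;                // stays null until something is written

    LazyOutputStream(Supplier<OutputStream> factory) {
        this.factory = factory;
    }

    // True once the underlying stream has actually been created.
    boolean isOpened() {
        return delegate != null;
    }

    private OutputStream target() {
        if (delegate == null) {
            delegate = factory.get();
        }
        return delegate;
    }

    @Override
    public void write(int b) throws IOException {
        target().write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        target().write(buf, off, len);
    }

    // flush() and close() never force creation of the underlying stream.
    @Override
    public void flush() throws IOException {
        if (delegate != null) delegate.flush();
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) delegate.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        LazyOutputStream out = new LazyOutputStream(() -> sink);
        System.out.println(out.isOpened()); // nothing written yet
        out.write('x');
        System.out.println(out.isOpened()); // first write created the stream
    }
}
```

With a file-backed factory, a walker that never writes anything would leave no empty output file behind, which is one plausible motivation for the change.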
asivache ab7bb5800a forgot to remove debug print statement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@823 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:38:27 +00:00
asivache 568a0d3c27 exon coordinates are now parsed correctly (?). IF DELIMITER IS THE LAST CHARACTER IN A STRING, String.split() DOES NOT return empty field as the last one; instead, the last field returned will be the one immediately before such delimiter! Wicked.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@822 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:36:50 +00:00
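
Note on 568a0d3c27: the `String.split()` behavior described in the message is standard Java. The one-argument form discards trailing empty strings, while a negative limit keeps them. The sketch below illustrates this; the class name and the sample `exonStarts` value are made up for the example and do not come from the commit.

```java
import java.util.Arrays;

final class SplitTrailingDelimiter {
    // One-argument split: trailing empty strings are removed from the result.
    static String[] splitDefault(String s) {
        return s.split(",");
    }

    // A negative limit keeps trailing empty strings.
    static String[] splitKeepTrailing(String s) {
        return s.split(",", -1);
    }

    public static void main(String[] args) {
        String exonStarts = "100,200,300,"; // hypothetical field ending in a delimiter

        String[] dropped = splitDefault(exonStarts);
        String[] kept = splitKeepTrailing(exonStarts);

        System.out.println(dropped.length + " " + Arrays.toString(dropped)); // 3 [100, 200, 300]
        System.out.println(kept.length + " " + Arrays.toString(kept));       // 4 [100, 200, 300, ]
    }
}
```

Either form works for a parser as long as the code that consumes the fields agrees on whether a trailing empty field is expected.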
asivache f4119c17de still working on it...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@821 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:07:38 +00:00
asivache d73f2e95cc refseq added to the list of known rod types
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@820 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:06:44 +00:00
asivache 23b7a28015 simple walker that works off pre-computed tumor/normal genotyping calls (e.g. samtools pileup). Collects overall stats and also writes somatic variants into IGV-compatible bed file if asked to. NOT finished. NOT tested.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@819 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:05:47 +00:00
asivache 8f1cabd33d cmd line args changed - again; internally uses VariantType enum
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@818 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:58 +00:00
asivache 9ef1a21112 minor changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@817 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:06 +00:00
aaron d994544c47 Added back-end code support for sharding based on genomic location for reads. Changed the sharding
code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simpler
and cleaner test cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
asivache 4edcdffe45 refseq annotation track: should be able to provide (multiple) transcript annotations available over a given genomic position. NOT finished and NOT tested!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@815 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:07:15 +00:00
andrewk 149cc9989b spaces!!!!!!!!!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@814 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 19:40:25 +00:00
ebanks c2df35b7fe - get leftmost position of indel correct
- don't try to clean reads with mapping quality of 0
- un-deprecate
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@813 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 17:24:58 +00:00
hanna 54bb643d19 Validated Mark's assertion that GSA-27 is fixed. Also did some cleanup on the pileup walker so that it doesn't output to System.out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@812 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 15:58:21 +00:00
hanna 008d677bea Fixed ValidatingPileup to work with Andrey's new rodSAMPileup -> GenotypeList type hierarchy.
Fixed reference-ordered data validation system to validate class hierarchies instead of specific class types.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@811 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-23 20:50:28 +00:00
aaron d056f9f3e8 Changed the name to reflect the sorted nature of the set, added some fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@810 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 22:34:24 +00:00
aaron 831d430025 Added a collection for storing GenomeLocs, that also has functions for removing by genomic region (that may span multiple GenomeLoc's in the collection), and adding regions, which are then merged with any overlapping regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@809 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:52:40 +00:00
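
Note on 831d430025: the behavior described in the message (adding a region merges it with any overlapping regions; removing a region may cut across several stored ones) can be sketched as below. This is not the GATK collection itself. `IntervalSetSketch` and `Interval` are hypothetical stand-ins, restricted to closed intervals on a single contig, and a linear scan is used for clarity rather than speed.

```java
import java.util.ArrayList;
import java.util.List;

final class IntervalSetSketch {
    // Closed interval [start, stop]; a hypothetical single-contig stand-in for GenomeLoc.
    static final class Interval {
        final long start, stop;
        Interval(long start, long stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return start + "-" + stop; }
    }

    // Invariant: sorted by start, no two intervals overlap.
    private final List<Interval> intervals = new ArrayList<>();

    // Add a region, merging it with every stored interval it overlaps.
    void add(long start, long stop) {
        List<Interval> merged = new ArrayList<>();
        boolean placed = false;
        for (Interval cur : intervals) {
            if (cur.stop < start) {
                merged.add(cur);                          // entirely to the left
            } else if (cur.start > stop) {
                if (!placed) {                            // first interval to the right
                    merged.add(new Interval(start, stop));
                    placed = true;
                }
                merged.add(cur);
            } else {                                      // overlap: grow the new region
                start = Math.min(start, cur.start);
                stop = Math.max(stop, cur.stop);
            }
        }
        if (!placed) merged.add(new Interval(start, stop));
        intervals.clear();
        intervals.addAll(merged);
    }

    // Remove a region; it may trim, split, or delete several stored intervals.
    void remove(long start, long stop) {
        List<Interval> kept = new ArrayList<>();
        for (Interval cur : intervals) {
            if (cur.stop < start || cur.start > stop) {
                kept.add(cur);                            // untouched
                continue;
            }
            if (cur.start < start) kept.add(new Interval(cur.start, start - 1));
            if (cur.stop > stop) kept.add(new Interval(stop + 1, cur.stop));
        }
        intervals.clear();
        intervals.addAll(kept);
    }

    @Override public String toString() { return intervals.toString(); }

    public static void main(String[] args) {
        IntervalSetSketch set = new IntervalSetSketch();
        set.add(5, 10);
        set.add(20, 30);
        set.add(8, 22);          // bridges both stored intervals
        System.out.println(set); // [5-30]
        set.remove(10, 15);      // splits the merged interval
        System.out.println(set); // [5-9, 16-30]
    }
}
```

A real implementation would also key on the contig and would likely use a sorted structure to avoid rebuilding the list on every call.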
hanna 34413362fd Bugfix: handle case where queue is empty.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@808 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:45:22 +00:00
hanna ec2e8d5726 Fixes for getting ValidatingPileup running in parallel.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@807 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:20:24 +00:00
kiran cd80e3f372 Replaced dumb training function with a version that creates a training set slightly more sensibly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@806 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:34:33 +00:00
kiran 02c0afdb85 Added the ability to specify the sorted, unaligned bam and/or the sorted, aligned bam such that broken computations can be restarted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@805 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:33:34 +00:00
kiran 454a6d1df7 Fixed an egregious error in simpleReverseComplement wherein the RC'd string would be composed entirely of the last base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@804 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:32:20 +00:00
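
Note on 454a6d1df7: the fixed code is not shown in this listing. For reference, a straightforward reverse complement can be sketched as follows. The method name matches the one in the message, but the body, the class name, and the choice to map anything other than uppercase A/C/G/T to `N` are assumptions.

```java
final class ReverseComplementSketch {
    // Complement of a single uppercase base; anything else becomes 'N' (an assumption).
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N';
        }
    }

    // Walk the input from the last base to the first, appending each complement.
    // Indexing with the loop variable is what keeps every output position distinct.
    static String simpleReverseComplement(String bases) {
        StringBuilder rc = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            rc.append(complement(bases.charAt(i)));
        }
        return rc.toString();
    }

    public static void main(String[] args) {
        System.out.println(simpleReverseComplement("AACG")); // CGTT
    }
}
```

A test on a non-palindromic input such as `AACG` would catch the failure mode described in the message, where every output base was derived from the last input base.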
hanna 2a5be1debe Cleanup in datasources.providers namespace. Make it easier for others writing traversal engines to use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@803 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:12:00 +00:00
asivache 02fc4f145f refactoring: a couple of general purpose (hopefully useful?) methods/classes extracted into a standalone utils class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@802 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 18:54:40 +00:00
asivache 4b718688d5 no changes, really, just synchronizing (instead of reversing) to increase the amount of entropy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@801 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:27:28 +00:00