5884 Commits (3d628f06f0ed9e931b7fa7828aab96d8b8c086ba)

Author SHA1 Message Date
depristo 3d628f06f0 moved to playground
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5925 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 21:25:26 +00:00
depristo 429833c05a Intermediate commit (DVCS, where are you?) of a fully operational ReducedRead walker. Now results in minor differences in the raw calls (filtering is a different matter) in an exome but 20x less disk space than the full exome data. Changes to the UG necessary to process reduced reads are not yet committed, as they are being tested. This code is being moved to playground now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5924 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 21:13:31 +00:00
ebanks dd6d61c031 Adding integration test to cover the case of a read that only covers an insertion (i.e. no M in the CIGAR string).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5923 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 21:02:47 +00:00
ebanks d0ca6f8a9c Patch for case that a read spans only an insertion (i.e. no Ms in the CIGAR string): the end position should not be less than the start position (which is how Picard defines it) but instead should be equal to it. This is just a patch; we'll get a proper solution in at some point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5922 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 20:40:56 +00:00
carneiro 355be57539 fixing the pipeline so that it still works while I'm adding support for BWA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5921 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 19:32:28 +00:00
ebanks 3302a733ef Fixed docs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5920 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 16:02:14 +00:00
chartl 84c2c5d7e6 Stop running away from my commits, test modules.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5919 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 13:05:53 +00:00
chartl 092952db44 After verifying that the changes to these tests were all in the RankSum annotations, I'm committing fixes to the test md5s.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5918 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 13:01:18 +00:00
ebanks c7fe062cb7 Refactored the VCF codec classes to minimize code duplication (which happened during the VCF3/4 split). Now, both codecs extend the AbstractVCFCodec class and all shared functionality exists there. Only methods that differ between the various codecs (e.g. because FILTER strings are encoded differently) are defined in the actual codecs. While I was in there, I put in checks for invalid empty inputs in the ID, FILTER, and INFO fields.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5917 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 19:40:47 +00:00
ebanks 81d9808eea Next version of test output for non-determinism
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5916 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 19:36:56 +00:00
chartl 511cd48d7a There is an edge case ( |Set1| = 5, |Set2| = 4) where the exact p-value exceeds the range of the normal distribution we want to invert. For the edge cases, this happens exactly at the mean, and so this can be safely replaced with a z value of 0.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5915 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 17:30:09 +00:00
carneiro dcd13060e1 created wiki page for Print Reads and changed help to match wiki.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5914 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 16:26:32 +00:00
droazen 8f6af299d8 Remove what is hopefully the last of the evil core -> playground dependencies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5913 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 16:22:35 +00:00
carneiro 8f3e8f934d added a quick option to print the first n reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5912 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 16:16:50 +00:00
chartl a79967d9af After extensive testing of MannWhitneyU:
- Verified that exact calculations do agree with R's dwilcox()
- Verified that exact calculations do not agree with R's wilcox.test
  + This is because R does a correction, and calculates CDFs rather than PDFs (e.g. sums over dwilcox() values)
- Can now specify MWU to calculate cumulative exact tests, rather than point probabilities
- Z-scores are now calculated properly for exact tests
  + Previously, z-values calculated by inverting normal CDF from U-statistic PDF
  + Now both inversions are done, with a smart heuristic (biased variance) to make the point-calculated Z-value more accurate
  + Additional tests



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5911 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 15:51:27 +00:00
rpoplin 2b5683909e Updated VQSR integration tests because of the new Omni file. Fixed overflow condition in FisherStrand when the depth is too high.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5910 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 14:20:37 +00:00
hanna 6cc84c3ce2 Make the set of VariantContextAdaptors dynamic so that Andrey's MafFeature can
continue to exist and live in playground (and thus outside of the normal release
 / git release branch).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5909 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-01 02:54:55 +00:00
ebanks 44cb7e4980 Renaming to make grepping through the output less confusing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5908 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-31 19:54:44 +00:00
ebanks b75583a90b Adding debug statements for David to aid in testing the non-determinism problem. I wouldn't recommend running with --stats temporarily (or ever in fact, which is why it's @Hidden).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5907 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-31 19:53:59 +00:00
droazen c50d290133 Removing printf's used for debugging -- they have served their purpose.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5906 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-31 14:06:37 +00:00
delangel 0aef5c0074 Totally experimental, possibly useless annotation that logs # of MQ0 reads / total depth, TBD if VQSR can use it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5905 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-30 14:05:39 +00:00
kshakir 8d294dd6e6 For the SNPs used to create the combined SNPs and filtered indels, now using a VCF with just SNPs instead of a VCF with SNPs plus unfiltered indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5904 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-29 04:17:18 +00:00
kiran b4d379584c Commented out the generation of the GATKReport that I was using for debugging.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5903 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:15:09 +00:00
kiran 2a9c75c5ba Throw an exception if the programmer tries to access a column that doesn't exist.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5902 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:08:48 +00:00
kiran f3b38c0d3e Fixed a bug in my math where I assumed the genotype likelihoods were normalized to 1.0 when they in fact are not. *Now* genotypes get altered when a different genotype configuration leads to a more consistent answer with regards to inheritance constraints. There's the question of what to do when two configurations are almost equally likely - I should probably filter those events out. But currently there is no threshold on the transmission probability.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5901 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:08:05 +00:00
carneiro 5974675b43 Two intermediate commits, to work over the weekend.
ReplicationValidationWalker: Just the skeleton of what will be the implementation of the replication/validation model.
dataProcessingV2: Committing an UNTESTED implementation of BWA alignment. I am running tests on it over the weekend.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5900 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:03:08 +00:00
carneiro 69d9b5989f documenting this walker as it may be useful to others in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5899 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 21:58:51 +00:00
carneiro 2524216d4b Added the R script for VQSR
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5898 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 21:56:56 +00:00
kshakir 77cae39c8e Step towards tribble precompiled jar, support in build.xml for source with fallback to the checked in jar.
Current tribble-129M.jar in SVN does not work with current version of GATK code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5897 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 21:04:27 +00:00
droazen a50c40ed05 Temporary commit to aid in investigation of recent intermittent
IndelRealignerIntegrationTest failures -- yes, it's the classic printf()
debugging technique. Will revert in a day or two once I get the data I need :)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5896 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 20:01:57 +00:00
carneiro 260301016a cleaned up the scripts and created an interval library to facilitate future reuse.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5895 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 19:35:36 +00:00
carneiro 0048f1f6d3 Lots of interval_list file utility scripts
1. findGenes : Parses the Genetic Association Database (from NIH) into an annotated 'genes of interest' interval_list file.
 
2. sortGenesByCoverage : combines the interval_list of the genes of interest with the report from GATK's DepthOfCoverage, generating an annotated interval_list with total and average coverages on each gene.
  
3. hasTheseTargets : Give it a list of targets (example: exon targets) and any interval_list (example: genes of interest) and it will generate an annotated interval_list of all the exons that are contained in the list of genes. 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5894 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 18:31:07 +00:00
rpoplin 2227f49220 misc cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5893 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 16:49:20 +00:00
rpoplin 9e834391fe We now skip over all covering RODs in the BQSR as intended instead of just those which can be converted into a VariantContext. All the integration tests change because of subtleties in how certain dbsnp rod records are being converted into VCs. Added integration test which uses a bed file as the list of known polymorphic sites.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5892 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 16:32:17 +00:00
depristo 8ed82e5a08 The previous version of the UG was always creating BAQ'd pileups for the underlying site QUAL calculation. This resulted in some slowdown in the code. But as far as I can tell, the code actually didn't apply the BAQ'd base quality anywhere when the BAQ field wasn't in the read, so this just saves us 20% of the runtime when BAQ isn't enabled from heading into the BAQ subsystem when we don't actually want to get the BAQ'd base qualities.
Fixed minor problem with WalkerTest for "" (for parameterization) md5s.
Added an explicit integrationtest for BAQ NONE
Now only creates the BAQ'd pileup if the useBAQPileup parameter is provided in initializeAlternateAllele.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5891 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 14:00:52 +00:00
depristo 136c8c7900 ClipReads now supports HARDCLIP_BASES, though in fact this turned out to be not necessary for my desired tests. In the process of developing the HARDCLIP mode, I added some proper ReadUtils unit tests, which would ideally be expanded to include other ReadUtil functions, as added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5890 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 11:42:22 +00:00
depristo 549172af10 removing dependence on jobQueue == gsa
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5889 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 10:12:09 +00:00
hanna a77ca2d36a Incorporating Guillermo's patch to eliminate compile-time dependency of (core) UG indel model
on oneoffs.  Thanks Guillermo!  We'll polish the patch when you free up a bit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5888 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 02:22:19 +00:00
kshakir fd21c5d100 Minor update so the debug messages don't show temp files as chromosome 208799060637697164972
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5887 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 22:56:33 +00:00
fromer b4af28c7df Handle case where -L argument (intervals) not given
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5886 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 20:24:56 +00:00
corin a561d3adc7 Utility function for plotting post-run qc metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5885 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 20:14:25 +00:00
corin fccd5517a0 Generates post run QC plots with by sample metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5884 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 20:13:59 +00:00
corin 59495c7f03 Updated tearsheet function utility
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5883 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 19:58:57 +00:00
corin 8a57c52005 Produces a more thorough tearsheet with detailed metrics and information
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5882 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 19:57:49 +00:00
corin 1e2892a35d Preliminary QC script in R, which checks coverage, fingerprints, library duplication, total SNPs, dbSNP%, and availability of sample data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5881 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 19:57:00 +00:00
delangel 6ecbfa9013 OK, this time REALLY fix cut and paste error
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5880 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 19:47:12 +00:00
kshakir dab269160b Added cofoja to the Queue package. Although BCEL doesn't think they're needed the scala compiler respectfully disagrees.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5879 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 17:42:34 +00:00
delangel efe6602827 Fix copy-paste error from previous commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5878 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 16:02:08 +00:00
delangel 7a43673599 Bug fix: also enclose fetching FS or HRun in a try/catch block or else code will blow up if an annotation is absent (e.g. when there is no evidence for a variant in a vc)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5877 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 15:00:36 +00:00
delangel f7298f4a7f First of many baby steps to redo the way in which we trigger events for indel calling and to eliminate extended events: get rid of SpanningDeletions annotation for indels. It's completely useless, and even more so once we no longer trigger at extended events (because we'll trigger by definition a base before a deletion starts, so deletions present in the current pileup are not informative).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5876 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 00:49:23 +00:00