6476 Commits (a2d90a35903d28ea4e7ab7017e712a008bc12712)

Author SHA1 Message Date
Guillermo del Angel a2d90a3590 Bug fix: reverted logic so that default behavior skips over sample lookup 2011-07-20 10:23:10 -04:00
Guillermo del Angel e8409c80fa Further protection vs null pointers in PrintReadsWalker 2011-07-19 21:59:24 -04:00
Guillermo del Angel fb2d475c22 Bug fix to prevent null pointer 2011-07-19 20:13:56 -04:00
Guillermo del Angel 6181d1e4cb Fixed integration test for VariantsToTable: now the * in REF column is not output 2011-07-19 14:42:11 -04:00
Guillermo del Angel e6d306458c Merge bug fixes 2011-07-19 14:36:20 -04:00
Guillermo del Angel 989dd17f95 a) Add ability in PrintReads to specify a sample file to easily subset samples, useful for IGV visualization, b) VariantsToTable is more R-friendly with Indels when printing ref/alt columns, c) Changes to SelectVariants ability to specify a mask to randomly sample from a given AF distribution 2011-07-19 14:29:07 -04:00
Matt Hanna 005adf377f Derive MEDIAN_INSERT_SIZE plot from base plot with additional faceting. 2011-07-19 10:48:45 -04:00
Matt Hanna 9a1394d7e7 Clean up MEDIAN_INSERT_SIZE plot for consistency with other plots. 2011-07-19 10:34:50 -04:00
Matt Hanna 5d3112c665 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-19 09:32:01 -04:00
Matt Hanna 0cec2c6759 When sorting samples by date, only use filtered samples to avoid discontinuities
in the plot.  Add brief documentation for running the R script.
2011-07-19 09:28:51 -04:00
Mauricio Carneiro 9ad5c7dfa4 Resolving simple conflicts in the data processing pipeline.
Conflicts:
	public/scala/qscript/org/broadinstitute/sting/queue/qscripts/DataProcessingPipeline.scala
2011-07-19 08:05:11 -04:00
Mauricio Carneiro 7688bda1a6 better progress report for the DPP 2011-07-18 23:39:47 -04:00
Mauricio Carneiro 2b465ab43b * added optional 'no validation' for the Data Processing pipeline.
* some simplifications on the picard classes
2011-07-18 23:30:31 -04:00
Mauricio Carneiro 4cf7a2af23 Removed broad specific default paths so people from outside the broad can use it. 2011-07-18 23:25:21 -04:00
Khalid Shakir 9b446020f9 Using picard implementations for accessing aggregation directories.
Added more utilities to PicardPrivate.
Revved picard.
2011-07-18 21:49:03 -04:00
Matt Hanna 0ef37979cc Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 21:30:51 -04:00
Matt Hanna d5d107856c Subselect based on bait set. 2011-07-18 18:42:21 -04:00
Mauricio Carneiro 1837da37f6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 17:59:26 -04:00
Mauricio Carneiro 916c0c9489 some quick & dirty debug info for the replication validation walker. 2011-07-18 17:57:12 -04:00
Matt Hanna 044f5faa4d Support for numeric columns. 2011-07-18 17:44:49 -04:00
Matt Hanna 9729d61e2d Use geom_text() instead of geom_point() when outputting data for new project
only.
2011-07-18 17:29:00 -04:00
Mauricio Carneiro f1e3c3356b Merge branch 'rbam' 2011-07-18 17:26:07 -04:00
Mauricio Carneiro c618a5b54c commented out wrong MD5s 2011-07-18 17:25:45 -04:00
Mauricio Carneiro a9f956c80c Fixed several bugs in the pooled caller. Creating a good dataset to test its accuracy now. 2011-07-18 16:04:11 -04:00
Mark DePristo 4e78f0b064 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 15:45:23 -04:00
Mark DePristo 8f0badc52b Updating md5s, as the diffobjects walker now emits the summary in reverse order. 2011-07-18 15:44:21 -04:00
Mark DePristo c05451047c Support for multiple records at the same site. The first record gets chr:start, and subsequent records get chr:start_2, chr:start_3, etc. 2011-07-18 15:43:52 -04:00
Mark DePristo 782a05e9b5 Support for sorting the diff output in reverse order. 2011-07-18 15:43:01 -04:00
Mark DePristo 45702d3084 Now supports a mode where the primary key isn't sorted. In this case the records are displayed in the order in which they are added to the table. 2011-07-18 15:40:15 -04:00
Matt Hanna 15b44ac2c3 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 14:56:41 -04:00
Matt Hanna e5e7523f8b Modify to support either bam list format files or tsv formatted files. The
latter provide a major advantage when dealing with samples with spaces in the
names.
2011-07-18 14:56:00 -04:00
Matt Hanna adce37774a Add functionality for tsv output. 2011-07-18 14:12:01 -04:00
Eric Banks 6d5e87da10 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 13:59:10 -04:00
Eric Banks 83ba2c066a Making it deterministic 2011-07-18 13:59:02 -04:00
Eric Banks 92fa410450 Check that it's a valid bam file before parsing or bad things can happen 2011-07-18 13:43:34 -04:00
Eric Banks 80b5c5261a CombineVariants no longer combines records of different types. So now when combining SNP and indel callsets, overlapping calls get their own records. Useful for Khalid in the pipeline. For those interested, it turns out the previous behavior was doing the wrong thing occasionally (and this was even captured in the integration tests). 2011-07-18 13:42:45 -04:00
Menachem Fromer 4adead3099 Fixed import conflict 2011-07-18 13:23:20 -04:00
Menachem Fromer d8ba4ab835 Only maintain an unbroken haplotype chain if the current is phased relative to previous (by RBP), or both previous and current are parentally phased 2011-07-18 13:14:39 -04:00
Eric Banks bc8b5da698 Added docs while I was reading through the code to understand it 2011-07-18 12:25:54 -04:00
Mauricio Carneiro 5493a4dd99 Added annotations to filter out:
 * unmapped reads
 * failed vendor quality reads
 * duplicate reads
 * not primary alignment reads
2011-07-18 12:06:08 -04:00
Matt Hanna d8517a000a Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 11:07:18 -04:00
Matt Hanna f15357c2e1 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 10:52:31 -04:00
Matt Hanna 95c776bf59 Updated documentation. 2011-07-18 10:52:06 -04:00
Matt Hanna cb9bef6847 Updated documentation. 2011-07-18 10:51:22 -04:00
Mark DePristo 51b0dd01c3 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 10:47:29 -04:00
Mark DePristo 449bf1b539 Testdata for diffObjects.
PipelineTest updated to point to MD5DB.java
2011-07-18 10:47:03 -04:00
Mark DePristo d6e2e89f99 Walker test system refactoring. All MD5DB related functions are now in MD5DB.java.
System has the concept of a local and a global MD5 db.  The local one is like it operated previously.  The global one lives in /humgen/gsa-hpprojects/GATK/data/integrationtests.  If the system can find this directory then MD5s will also be read / written to this location.  This means that gsabamboo will print differences as appropriate.  And all users will in effect have access to a complete history of MD5 file results.
A few minor code reshuffles changed VariantRecalibration and VCFHeader test files.
2011-07-18 10:46:01 -04:00
Mark DePristo 6f26c07b85 Removed the SpecificDifference class. Now Difference classes always have the option to remember specific master and test values. This means that all summarized differences carry with them specific examples of their differences. Consequently, now even summarized differences give at least one example of the specific difference, even when the count of the difference is > 1. Unit tests updated. Added DiffObjects integrationtest. VCFDiffableReader now specifically reads the first line of the VCF file to capture the version number. 2011-07-18 10:42:35 -04:00
Matt Hanna 1f538d2add Place the preQC database in /humgen/gsa-scr1/GATK_Data.
Rework the way data outside the center 95% is trimmed out.
Cleanup some documentation.
2011-07-18 10:33:57 -04:00
Mark DePristo 837a91b85d No more ls to stdout unless verbose is true [manageGATKS3Logs.py]
Fully qualified paths now work properly.  Moved script into git [downloadGATKReportsFromS3.csh]
Correct path to files in runGATKReport.csh
2011-07-18 08:31:08 -04:00