11656 Commits (a3b98daf1a4cff4d57f93ce1002e2bf6f99ed700)

Author SHA1 Message Date
Chris Hartl a3b98daf1a Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2013-01-23 14:49:34 -05:00
Chris Hartl 7fcfa4668c Since GenotypeConcordance is now a standalone walker, remove the old GenotypeConcordance evaluation module and the associated integration tests. 2013-01-23 14:47:23 -05:00
Mauricio Carneiro fc54a5da55 Adding the new bash script
GSATDG-9
2013-01-23 12:14:34 -07:00
Mauricio Carneiro 6588b4bacd tcsh -> bash
David is convinced that the error is because I'm using tcsh instead of bash. Let's see if he's right :-)

GSATDG-9
2013-01-23 12:10:34 -07:00
Mauricio Carneiro 8e8993da27 oops... forgot to change sys.argv to filename
GSATDG-9
2013-01-23 12:01:06 -07:00
Mauricio Carneiro 820bec5572 Dropping xargs
- continuing the effort to reduce blob size

GSATDG-9
2013-01-23 11:54:20 -07:00
Mark DePristo ee8039bf25 Fix trivial call in unit test 2013-01-23 13:51:58 -05:00
Mark DePristo 09edc6baeb TraverseActiveRegions now writes out very nice active region and activity profile IGV formatted files 2013-01-23 13:46:01 -05:00
Mark DePristo 8026199e4c Updating md5s for CountReadsInActiveRegions and HaplotypeCaller to reflect new activity profile mechanics
-- In this process I discovered a few missed sites in the old code.  The new approach actually produces better HC results than the previous version.
2013-01-23 13:46:01 -05:00
Mark DePristo 8e8126506b Renaming IncrementalActivityProfile to ActivityProfile
-- Also adding a work in progress functionality to make it easy to visualize activity profiles and active regions in IGV
2013-01-23 13:46:01 -05:00
Mark DePristo e917f56df8 Remove old ActivityProfile and old BandPassActivityProfile 2013-01-23 13:46:01 -05:00
Mark DePristo 7fd27a5167 Add band pass filtering activity profile
-- Based on the new incremental activity profile
-- Unit Tested!  Fixed a few bugs with the old band pass filter
-- Expand IncrementalActivityProfileUnitTest to test the band pass filter as well for basic properties
-- Add new UnitTest for BandPassIncrementalActivityProfile
-- Added normalizeFromRealSpace to MathUtils
-- Cleanup unused code in new activity profiles
2013-01-23 13:46:01 -05:00
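As a rough illustration of the band pass idea in this commit, the sketch below smooths per-locus activity probabilities with a Gaussian kernel whose weights are normalized to sum to one. It is written in Python rather than the project's Java, and the function names, the kernel shape, the default widths, and the `normalize_from_real_space` helper are all assumptions made for illustration, not the code added in this commit.

```python
import math


def normalize_from_real_space(values):
    """Scale non-negative real-space values so they sum to 1 (hypothetical helper)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def gaussian_kernel(half_width, sigma):
    """Build a normalized Gaussian kernel covering 2 * half_width + 1 loci."""
    raw = [
        math.exp(-(i * i) / (2.0 * sigma * sigma))
        for i in range(-half_width, half_width + 1)
    ]
    return normalize_from_real_space(raw)


def band_pass(profile, half_width=2, sigma=1.0):
    """Smooth a list of per-locus activity probabilities with the kernel."""
    kernel = gaussian_kernel(half_width, sigma)
    smoothed = []
    for pos in range(len(profile)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = pos + k - half_width
            # Loci outside the profile contribute nothing.
            if 0 <= j < len(profile):
                acc += weight * profile[j]
        smoothed.append(acc)
    return smoothed
```

For example, `band_pass([0.0, 0.0, 1.0, 0.0, 0.0])` spreads the single active locus over its neighbours while keeping the peak at the original position.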
Mark DePristo eb60235dcd Working version of incremental active region traversals
-- The incremental version now processes active regions as soon as they are ready to be processed, instead of waiting until the end of the shard as in the previous version.  This means that ART walkers will now take much less memory than previously.  On chr20 of NA12878 the majority of regions are processed with as few as 500 reads in memory.  Over the whole chr20 only 5K reads were ever held in ART at one time.
-- Fixed bug in the way active regions worked with shard boundaries.  The new implementation no longer sees shard boundaries in any meaningful way, and that uncovered a problem that active regions were always being closed across shard boundaries.  This behavior was actually encoded in the unit tests, so those needed to be updated as well.
-- Changed the way that preset regions work in ART.  The new contract ensures that you get exactly the regions you requested.  The isActive function is still called, but its result has no impact on the regions.  With this functionality it should be possible to use the HC as a generic assembler by forcing it to operate over very large regions.
-- Added a few misc. useful functions to IncrementalActivityProfile
2013-01-23 13:46:00 -05:00
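The incremental behaviour described in this commit can be sketched as follows: states are added one locus at a time, and a region is handed back as soon as it can no longer grow, instead of at the end of a shard. This is a minimal Python sketch under stated assumptions. The class name, the probability threshold, and the maximum region size are hypothetical and do not reflect the actual ART code.

```python
class IncrementalProfile:
    """Toy activity profile that emits regions as soon as they are complete."""

    def __init__(self, threshold=0.5, max_region_size=300):
        self.threshold = threshold            # assumed cutoff for "active"
        self.max_region_size = max_region_size
        self._start = None                    # start locus of the open region
        self._active = None                   # activity flag of the open region
        self._next = 0                        # index of the next locus to be added

    def add(self, prob):
        """Add one state; return the regions (start, end_exclusive, is_active) now ready."""
        active = prob > self.threshold
        ready = []
        if self._start is None:
            self._start, self._active = self._next, active
        elif active != self._active or self._next - self._start >= self.max_region_size:
            # The open region cannot grow any further, so release it immediately.
            ready.append((self._start, self._next, self._active))
            self._start, self._active = self._next, active
        self._next += 1
        return ready

    def flush(self):
        """Release the final open region once the input is exhausted."""
        if self._start is None:
            return []
        region = (self._start, self._next, self._active)
        self._start = None
        return [region]
```

Because completed regions leave the profile right away, a caller only needs to hold the reads that overlap the currently open region, which is the memory saving the commit message describes.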
Mark DePristo ce160931d5 Optimize creation of reads in ArtificialBAMBuilder
-- Now caches the reads so subsequent calls to makeReads() don't reallocate the reads from scratch each time
2013-01-23 13:46:00 -05:00
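The caching change can be pictured with a small sketch like the one below: the first call builds the reads, and later calls return the stored list. This is Python rather than the project's Java, and the class, its fields, and the string stand-in for a read are hypothetical. A real builder would also need to discard the cache whenever its parameters change.

```python
class ArtificialReadBuilder:
    """Toy builder that creates its reads once and reuses them afterwards."""

    def __init__(self, n_reads, read_length):
        self.n_reads = n_reads
        self.read_length = read_length
        self._cached_reads = None   # filled on the first make_reads() call
        self.build_count = 0        # how many times reads were actually built

    def make_reads(self):
        if self._cached_reads is None:
            self.build_count += 1
            # A string of 'A' bases stands in for a real read object.
            self._cached_reads = ["A" * self.read_length for _ in range(self.n_reads)]
        return self._cached_reads
```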
Mark DePristo e050f649fd IncrementalActivityProfile, complete with extensive unit tests
-- This is an activity profile compatible with fetching its implied active regions incrementally, as activity profile states are added
2013-01-23 13:45:21 -05:00
Mark DePristo 8d9b0f1bd5 Restructure ActivityProfiler into root class ActivityProfile and derived class BandPassActivityProfile
-- Required before I jump in and redo the entire activity profile so it can be run incrementally
-- This restructuring makes the differences between the two functionalities clearer, as almost all of the functionality is in the base class. The only functionality provided by the BandPassActivityProfile is isolated to a finalizeProfile function overridden from the base class.
-- Renamed ActivityProfileResult to ActivityProfileState, as this is a clearer indication of its actual functionality.  Almost all of the misc. walker changes are due to this name update
-- Code cleanup and docs for TraverseActiveRegions
-- Expanded unit tests for ActivityProfile and ActivityProfileState
2013-01-23 13:45:21 -05:00
Mark DePristo 42b807a5fe Unit tests for ActivityProfileResult 2013-01-23 13:45:20 -05:00
Mauricio Carneiro 5c16f57690 Reduce the size of the shell blob
- checkAllLicenses was concatenating all tests in one big && statement, which is good for coding style but bad for the shell blob size limitation.

GSATDG-9
2013-01-23 11:41:18 -07:00
Eric Banks 002c0085a7 More reviews of false positive sites incorrectly annotated as polymorphic (mostly by HM3).
(for some reason I can't get reviews to work in IGV)
2013-01-23 13:21:30 -05:00
Mauricio Carneiro c5e1bb678b Refrain from pushing symlinks into the repo... not all filesystems treat them correctly 2013-01-22 15:18:19 -07:00
Chris Hartl c500e1d8ac Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2013-01-22 15:31:30 -05:00
Chris Hartl d33c755aea Adding docs. 2013-01-22 15:29:33 -05:00
Chris Hartl 7060e01a8e Fix for broken unit test plus some minor changes to comments. Unit tests were broken by my pulling the site status utility function into the enum. Thankfully the unit tests caught my silly duplication of a line. 2013-01-22 15:14:41 -05:00
Eric Banks 3f6cb609c9 Bug fix for AssessNA12878: don't NPE out when using -typesToInclude 2013-01-22 10:42:52 -05:00
Mauricio Carneiro e939e0d9b3 Small improvement to the license update scripts
- launch it once per file type, not license type (was unnecessary).
   - renamed ParseLicense to UpdateLicense for clarity

GSATDG-5
2013-01-21 16:24:33 -05:00
Mauricio Carneiro 35e3939dca Adding bamboo script to check all licenses
- CheckLicense.py script checks license of sources against a provided license file
   - checkAllLicenses.csh script runs CheckLicense for all files in the repo with the appropriate license for each

GSATDG-9
2013-01-21 16:09:38 -05:00
Mauricio Carneiro 35f8dc7426 Accept spaces before comments or package line
This allows us to be a bit more lenient before erroring out in the license script. Feature suggested by Yossi.

GSATDG-5
2013-01-18 16:51:01 -05:00
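A check that tolerates leading whitespace before a comment or package line might look like the sketch below. The regular expression and the function name are assumptions for illustration only. The actual license script is not shown in this log.

```python
import re

# Optional leading whitespace, then a comment opener, a comment
# continuation, or the "package" keyword (hypothetical pattern).
HEADER_LINE = re.compile(r"^\s*(/\*|\*|//|package\b)")


def is_header_line(line):
    """Return True if the line looks like a comment or a package declaration."""
    return HEADER_LINE.match(line) is not None
```

With this pattern, `"   package org.example;"` is accepted just like `"package org.example;"`, while an `import` line is not.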
Mauricio Carneiro 7b8b064165 Last manual license update (hopefully)
if everyone updates their git hook accordingly, this will be the last time I have to manually run the script.

GSATDG-5
2013-01-18 16:13:07 -05:00
Mauricio Carneiro 02d1b87326 Better error handling for the license scripts
- Your commit will now fail gracefully with an error message if you mess up the license system
   - Your file will be preserved (unmodified) if you fail the commit process
   - Error message should be indicative of the error you need to fix (usually missing package information)

Set your pre-commit hook as a symlink to be automatically updated by new pushes with :

	ln -s ../../private/shell/pre-commit .git/hooks/pre-commit

GSATDG-18 #resolve
2013-01-18 16:13:07 -05:00
Ami Levy-Moonshine 0fb7b73107 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2013-01-18 15:03:42 -05:00
Ami Levy-Moonshine 826c29827b change the default VCFs gatherer of the GATK (not just the UG) 2013-01-18 15:03:12 -05:00
Mauricio Carneiro 63e1a377cc Automatic license information git-hook
This hook will automatically add / fix the license information in all files you commit to the repo.
To activate it, copy it to your hooks directory :

	cp private/shell/pre-commit .git/hooks/

Now every time you commit, you will have all your Java and Scala files automatically updated.

GSATDG-5 GSATDG-7 GSATDG-8 #resolve
2013-01-18 12:47:32 -05:00
Eric Banks cac439bc5e Optimized the Allele Biased Downsampling: now it doesn't re-sort the pileup but just removes reads from the original one.
Added a small fix that slightly changed md5s.
2013-01-18 11:17:31 -05:00
Chris Hartl 08d2da9057 Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2013-01-18 10:28:45 -05:00
Chris Hartl bf5748a538 Forgot to actually put in the md5. Also with the new change to record pairing and filtering, the multiple-records integration test changed: the indel records (T/TG | T/TGACA) are matched up (rather than left separate) resulting in properly identifying mismatching alleles, rather than HET-UNAVAILABLE and UNAVAILABLE-HET. Very nice. 2013-01-18 10:25:36 -05:00
Mauricio Carneiro f99bd5be6a Small fix to make the script more generic (thanks Yossi)
GSA-710 #resolve
2013-01-18 10:06:50 -05:00
Chris Hartl 91030e9afa Bugfix: records that get paired up during the resolution of multiple-records-per-site were not going into genotype-level filtering. Caught via testing.
Testing for moltenized output, and for genotype-level filtering. This tool is now fully functional. There are three todo items:

1) Docs
2) An additional output table that gives concordance proportions normalized by records in both eval and comp (not just total in eval or total in comp)
3) Code cleanup for table creation (putting a table together the way I do takes -way- too many lines of code)
2013-01-18 09:49:48 -05:00
Eric Banks 39c73a6cf5 1. Ryan and I noticed that the FisherStrand annotation was completely busted for indels with reduced reads; fixed.
2. While making the previous fix and unifying FS for SNPs and indels, I noticed that FS was slightly broken in the general case for indels too; fixed.
3. I also fixed a minor bug in the Allele Biased Downsampling code for reduced reads.
2013-01-18 03:35:48 -05:00
Eric Banks 6a903f2c23 I finally gave up on trying to get the Haplotype/Allele merging to work in the HaplotypeCaller.
I've resigned myself instead to creating a mapping from Allele to Haplotype.  It's cheap so not a big deal, but really shouldn't be necessary.
Ryan and I are talking about refactoring for GATK2.5.
2013-01-18 01:21:08 -05:00
Eric Banks 6db3e473af Better error message for bad qual 2013-01-17 10:30:04 -05:00
Eric Banks 953592421b I think we got out of sync with the HC tests as we were clobbering each other's changes. Only differences here are to some RankSumTest values. 2013-01-17 09:19:21 -05:00
Eric Banks ded659232b Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2013-01-16 22:49:56 -05:00
Eric Banks a623cca89a Bug fix for HaplotypeCaller, as reported on the forum: when reduced reads didn't completely overlap a deletion call,
we were incorrectly trying to find the reference position of a base on the read that didn't exist.
Added integration test to cover this case.
2013-01-16 22:47:58 -05:00
Ami Levy-Moonshine fcb3c6dc2a fix small bugs in scala files 2013-01-16 22:42:20 -05:00
Eric Banks dbb69a1e10 Need to use ints for quals in HaplotypeScore instead of bytes because of overflow (they are summed when haplotypes are combined) 2013-01-16 22:33:16 -05:00
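The overflow this commit refers to is easy to reproduce. The Python sketch below imitates a signed 8-bit accumulator (the range of a Java `byte`) with `ctypes.c_int8` and compares it with a plain integer sum. The function names and the sample quality values are illustrative assumptions, not code from HaplotypeScore.

```python
import ctypes


def sum_as_signed_byte(quals):
    """Sum qualities while wrapping the running total to a signed 8-bit range."""
    total = 0
    for q in quals:
        # c_int8 keeps only the low 8 bits, like a cast back to a Java byte.
        total = ctypes.c_int8(total + q).value
    return total


def sum_as_int(quals):
    """Sum qualities in a wide integer, which cannot wrap at 127."""
    return sum(quals)
```

Three qualities of 60 sum to 180 as integers, but the 8-bit accumulator wraps past 127 and ends up negative.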
Chris Hartl e15d4ad278 Addition of moltenize argument for moltenized tabular output. NRD/NRS not moltenized because there are only two columns. 2013-01-16 18:00:23 -05:00
Mark DePristo 738c24a3b1 Add tests to ensure that all insertion reads appear in the active region traversal 2013-01-16 16:25:36 -05:00
Mark DePristo 3c476a92a2 Add dummy functionality (currently throws an error) to allow HC to include unmapped reads during assembly and calling 2013-01-16 16:25:36 -05:00
Eric Banks 79bc818022 Bug fix for VariantsToVCF: old dbSNP files can have '-' as reference base and those records always need to be padded. 2013-01-16 16:15:58 -05:00
Eric Banks 4cf34ee9da Bug fix to FisherStrand: do not let it output INFINITY. This all needs to be unit tested, but that's coming on the horizon. 2013-01-16 15:35:04 -05:00
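One common way to keep a Phred-scaled value finite is to clamp the p-value to a small positive floor before taking the logarithm. The sketch below shows that approach in Python. The floor value and the function name are assumptions, and the log does not say how the actual fix was implemented.

```python
import math

# Assumed floor for the p-value: tiny, but still a positive double.
MIN_PVALUE = 1e-320


def phred_scaled(p_value):
    """Return -10 * log10(p), clamping p so the result is always finite."""
    clamped = max(p_value, MIN_PVALUE)
    return -10.0 * math.log10(clamped)
```

Without the clamp, a p-value of exactly zero is the problem case: Java's `Math.log10(0)` returns negative infinity, so the scaled value becomes infinite, and Python's `math.log10(0)` raises a `ValueError`.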