Mark DePristo
3d4f0e9dd7
Now supports the case where you have multiple AC values in the info field.
2011-07-07 17:21:15 -04:00
Mark DePristo
0df31f3d78
Simple usability improvements to the DiffEngine and DiffObjectsWalker. Can specify min count for summarized diff objects for display.
2011-07-07 11:16:42 -04:00
Mark DePristo
ccf34f7e45
(1) Added very useful helper class TestDataProvider to BaseTest that makes creating data providers for TestNG far easier
(2) DiffEngine now officially working with summaries. Extensive UnitTests all around!
2011-07-06 21:57:22 -04:00
Mark A. DePristo
1f1231f47a
Implementation of key summarizing algorithm and support routines. UnitTests for support routines. Almost ready to test the summarizer on a real difference set.
2011-07-05 23:23:49 -04:00
Mark A. DePristo
080875d5da
Refactored DiffNode/DiffElement/DiffValue class structure. DiffElement is now a pair of Name -> Value, where value is either a DiffValue or its subclass DiffNode. Code cleaned up, more tests added. DiffEngine is now working, with tests. DiffObjectWalker can now take two VCFs and itemize the difference between the two files correctly and concisely.
2011-07-05 16:13:39 -04:00
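The structure described in the commit above (a DiffElement pairing a name with a value, where the value is either a DiffValue or its subclass DiffNode) can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Java code from the repository: the field names, the `itemize` function, and the wording of the messages it returns are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiffValue:
    """A leaf: holds a single atomic value (hypothetical field name)."""
    value: object = None


@dataclass
class DiffNode(DiffValue):
    """A DiffValue whose content is a list of named child elements."""
    elements: List["DiffElement"] = field(default_factory=list)


@dataclass
class DiffElement:
    """A name -> value pair; the value is a DiffValue or a DiffNode."""
    name: str
    value: DiffValue


def itemize(master: DiffElement, test: DiffElement, prefix: str = "") -> List[str]:
    """Recursively list the differences between two elements of the same name."""
    path = prefix + master.name
    mv, tv = master.value, test.value
    if isinstance(mv, DiffNode) and isinstance(tv, DiffNode):
        m_children = {e.name: e for e in mv.elements}
        t_children = {e.name: e for e in tv.elements}
        out: List[str] = []
        for name in sorted(set(m_children) | set(t_children)):
            if name not in t_children:
                out.append(f"{path}.{name}: missing in test")
            elif name not in m_children:
                out.append(f"{path}.{name}: missing in master")
            else:
                out.extend(itemize(m_children[name], t_children[name], path + "."))
        return out
    if isinstance(mv, DiffNode) or isinstance(tv, DiffNode):
        # One side is a node and the other a leaf: report a structural mismatch.
        return [f"{path}: node vs. leaf"]
    return [] if mv.value == tv.value else [f"{path}: {mv.value} != {tv.value}"]
```

Because DiffNode is itself a DiffValue, a single DiffElement type can name both leaves and subtrees, which is the factoring the commit message describes.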
Mark A. DePristo
60b9aa7c59
Intermediate commit. Not working, but last changes are now logged before revisiting the DiffNode/DiffElement/DiffLeaf hierarchy
2011-07-05 09:10:34 -04:00
Mark A. DePristo
3a8710b7de
Parsing and printing via simple oneLineString representations. X=Y, X=(A=B C=D), for example. Diff algorithm implementation, but no testing. DiffEngineUnitTest implemented, and testing framework nearly ready to actually evaluate the correctness of the diff algorithm.
2011-07-04 23:43:49 -04:00
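The one-line representation mentioned in the commit above (X=Y for a leaf, X=(A=B C=D) for a nested item) can be illustrated with a short sketch. This is only the printing direction, written in Python with plain dicts standing in for nested nodes; the function name `one_line_string` and the dict-based input are assumptions for the example, not the repository's API.

```python
def one_line_string(name, value):
    """Render a named value as NAME=VALUE, or NAME=(child child ...) for a nested dict."""
    if isinstance(value, dict):
        # Children are rendered recursively and separated by single spaces.
        inner = " ".join(one_line_string(k, v) for k, v in value.items())
        return f"{name}=({inner})"
    return f"{name}={value}"


print(one_line_string("X", "Y"))
print(one_line_string("X", {"A": "B", "C": "D"}))
```

The two calls print `X=Y` and `X=(A=B C=D)`, matching the examples given in the commit message.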
Mark A. DePristo
527fbeaf3c
Extensive unit tests for the DiffNodes, DiffElements, and DiffLeafs data structures. The lack of unity in these three data structures is a bit gross, to be honest, but it may not be a significant factor when I reach implementing the generic diff functions. The problem is that ideally these would look like the Scheme structures:
(A B (C D E))
which is a nested list containing the items A and B and a sublist of C D E. In Scheme there are only two classes: lists and everything else. Right now we have three. DiffNodes contain both atomic fields (A B) and subnodes ((C D E) here). There's a specific class for DiffLeaf, which is really just a pair mapping name=value. And DiffElement contains a named item, since all objects in the hierarchy have a name. It just doesn't feel right to me right now. Ultimately the problem is that you want the objects to be self-describing, so DiffElement and DiffLeaf are a clean factoring of the need for names in both the values and the nodes.
2011-07-04 19:34:15 -04:00
Mark A. DePristo
38740b0ff5
First working version of the DiffNode readers for VCF and BAM files. Unit tests confirm the readers are approximately working. Skeleton of a working DiffObjects walker that will be able to provide detailed information about how exactly two files of the same type differ, so long as the files are supported by the DiffNode structure.
2011-07-04 16:11:42 -04:00
Mark A. DePristo
983670e6ac
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-03 15:57:34 -04:00
Ryan Poplin
06a1ab1820
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 18:42:27 -04:00
Ryan Poplin
fb315b5f8c
Merge branch 'incoming'
2011-07-02 18:10:48 -04:00
Ryan Poplin
41d46059e7
fixing bad format statement
2011-07-02 18:09:17 -04:00
Ryan Poplin
3804afeb8a
Merge branch 'incoming'
2011-07-02 17:55:39 -04:00
Ryan Poplin
781c0c33a4
Use the worst X% of calls in addition to the bad training sites list. Don't include the already added calls in the calculation of X%
2011-07-02 17:55:10 -04:00
Ryan Poplin
4f821c081b
Merged bug fix from Stable into Unstable
2011-07-02 17:42:45 -04:00
Ryan Poplin
14eb7873a0
Reorganizing private oneoff qscripts
2011-07-02 17:41:19 -04:00
Ryan Poplin
6b8af6afd8
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 17:15:56 -04:00
Ryan Poplin
fdc2ebb321
Adding ability to specify in VQSR a list of bad sites to use when training the negative model. Just add bad=true to the list of rod tags for your bad sites track.
2011-07-02 17:15:13 -04:00
Guillermo del Angel
09af6bbc6c
Ugh - backed out experimental code, not for public consumption, that was unintentionally committed
2011-07-02 16:58:57 -04:00
Guillermo del Angel
c6c0dba040
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 16:45:34 -04:00
Guillermo del Angel
b66581dc45
More changes on consensus script
2011-07-02 16:45:08 -04:00
Ryan Poplin
4532a84314
Merged bug fix from Stable into Unstable
2011-07-02 10:48:55 -04:00
Ryan Poplin
14375c3ba9
Moving my very first walkers into the archive.
2011-07-02 10:45:12 -04:00
Ryan Poplin
5faf40b79d
Moving AnalyzeAnnotations into the archive because it has outlived its usefulness.
2011-07-02 10:39:53 -04:00
Ryan Poplin
43959e6780
Moving old R scripts into the archive
2011-07-02 10:33:27 -04:00
Ryan Poplin
17ff5bb094
Variant records coming out of the VQSR are now annotated with which input annotation was most divergent from the Gaussian mixture model. This gives a general sense for why each variant was removed from the callset.
2011-07-02 09:55:35 -04:00
Guillermo del Angel
635dc5de4b
New hyper-parallel structure for indel consensus: each 3 MB chunk is divided into 100 subchunks so I can fit in the hour queue. Got rid of the indel realignment and SNP parts; use BTI to compute only at input sites.
2011-07-01 20:51:01 -04:00
Khalid Shakir
c65e52f88a
Merged bug fix from Stable into Unstable
2011-07-01 20:50:56 -04:00
Khalid Shakir
b6bc64a0c8
Cleanup of the utils.broad package.
Using Picard IoUtils on sample names.
2011-07-01 20:47:03 -04:00
Eric Banks
0c9105ca22
Minor fix of description
2011-07-01 18:07:35 -04:00
Eric Banks
358cf98b67
Merged bug fix from Stable into Unstable
2011-07-01 17:27:07 -04:00
Eric Banks
444eae316c
Moving these supported perl scripts to public
2011-07-01 17:26:25 -04:00
Guillermo del Angel
95abb2e14a
Merged bug fix from Stable into Unstable
2011-07-01 17:12:21 -04:00
Guillermo del Angel
cfdece6dd8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-07-01 17:11:32 -04:00
Guillermo del Angel
1c1cd373d8
Fix to make liftOver.pl work with new file structure
2011-07-01 17:07:22 -04:00
David Roazen
a025f85c08
Merged bug fix from Stable into Unstable
2011-07-01 16:44:33 -04:00
David Roazen
546e7777fa
Re-fixing paths in pipeline tests after example qscripts got moved.
2011-07-01 16:39:10 -04:00
David Roazen
b19e22aed9
Merged bug fix from Stable into Unstable
2011-07-01 16:20:25 -04:00
David Roazen
e9030a7bfd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-07-01 16:19:35 -04:00
Mauricio Carneiro
d631f645c4
Merged bug fix from Stable into Unstable
2011-07-01 16:16:31 -04:00
Mauricio Carneiro
b0fb63e20a
moving the example scala scripts to the qscripts package.
2011-07-01 16:14:59 -04:00
Mauricio Carneiro
04971aecc9
Merged bug fix from Stable into Unstable
2011-07-01 16:04:50 -04:00
David Roazen
d647ea4fdc
Long-delayed change to CachingIndexedFastaSequenceFile. Made the cache
non-static to avoid problems when multiple references are used within the same
thread (e.g., during integration tests). This should kill the intermittent
IndelRealignerIntegrationTest failures.
2011-07-01 16:04:30 -04:00
Mauricio Carneiro
d19351f71a
Added capability of running multiple bam files in the same directory.
2011-07-01 16:02:28 -04:00
David Roazen
350a76d5dd
Merged bug fix from Stable into Unstable
2011-07-01 13:53:46 -04:00
David Roazen
11d4af0e75
Path-related fixes to the private queue pipeline tests.
2011-07-01 13:41:34 -04:00
David Roazen
9644f104c4
Fixes to the queue pipeline tests to account for the new directory structure.
2011-07-01 13:13:24 -04:00
Matt Hanna
7159b14e76
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-01 10:53:49 -04:00
Matt Hanna
bf29ca21cd
Some bugfixes for yesterday's sparse data population enhancements.
2011-07-01 10:53:07 -04:00