aaron
653f70efa2
added methods to validate an interval before you try to make a GenomeLoc: boolean validGenomeLoc().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2846 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 20:35:35 +00:00
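A minimal sketch of what a `validGenomeLoc()`-style check might do. It assumes 1-based, closed coordinates and a map of contig lengths taken from the reference dictionary. The `IntervalValidator` class and its signature are hypothetical and are not the GATK method.

```java
import java.util.Map;

// Hypothetical sketch: contig lengths would normally come from the reference's sequence dictionary.
final class IntervalValidator {
    private final Map<String, Integer> contigLengths;

    IntervalValidator(Map<String, Integer> contigLengths) {
        this.contigLengths = contigLengths;
    }

    /** True if the contig exists and 1 <= start <= stop <= contig length (1-based, closed). */
    boolean validGenomeLoc(String contig, long start, long stop) {
        Integer length = contigLengths.get(contig);
        if (length == null) {
            return false; // unknown contig
        }
        return start >= 1 && start <= stop && stop <= length;
    }
}
```

Checking first and constructing afterwards lets callers reject a bad interval without catching an exception from the constructor.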
kshakir
fc810a1800
Updated VCF Reader to parse VCFs according to the VCFv3.3 spec. Column headers are tab separated since sample names might have spaces.
...
Updated test files in /humgen/gsa-scr1/GATK_Data/Validation_Data/*.vcf to remove spaces except for when they are supposed to be in the sample name.
Added @Test before VCFReaderTest.testHeaderNoRecords()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2809 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 22:55:59 +00:00
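The point of splitting the `#CHROM` header line on tabs only can be sketched as follows. The eight fixed VCF columns are checked first, an optional FORMAT column is skipped, and everything after it is a sample name, spaces included. The `VcfHeaderColumns` class is illustrative and is not the VCF Reader's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of tab-only splitting of a VCF "#CHROM" header line.
final class VcfHeaderColumns {
    // The eight fixed columns that precede FORMAT and the sample columns.
    static final List<String> FIXED = Arrays.asList(
            "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO");

    /** Returns the sample names, splitting on tabs only so names may contain spaces. */
    static List<String> sampleNames(String headerLine) {
        String[] cols = headerLine.split("\t", -1);
        for (int i = 0; i < FIXED.size(); i++) {
            if (i >= cols.length || !cols[i].equals(FIXED.get(i))) {
                throw new IllegalArgumentException("Unexpected header column " + i);
            }
        }
        // Index 8 (0-based) is FORMAT when genotypes are present; samples follow it.
        List<String> samples = new ArrayList<>();
        for (int i = FIXED.size() + 1; i < cols.length; i++) {
            samples.add(cols[i]);
        }
        return samples;
    }
}
```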
depristo
956b570c8e
V5 improvements to VariantContext. Now fully supports genotypes. Filtering enabled. Significant testing throughout the system. Support for rebuilding variant contexts from subsets of genotypes. Some code cleanup around the repository.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2721 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 18:37:17 +00:00
aaron
db9570ae29
Looks bigger than it is:
...
* Moved GATKArgumentCollection into gatk.arguments folder to clean up the main folder, also added some associated argument classes (most of the changes).
* Added code to the argument parsing system for enums with default values. We needed this so we could preserve the current unsafe flag and, at the same time, allow finer-grained control of unsafe operations. You can now specify:
"-U" (for all unsafe operations), "-U ALLOW_UNINDEXED_BAM" (only allow unindexed BAMs), "-U NO_READ_ORDER_VERIFICATION", etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2586 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 00:14:35 +00:00
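A minimal sketch of an enum-valued flag with a default, assuming that a bare `-U` (no value) means "all unsafe operations". `ALLOW_UNINDEXED_BAM` and `NO_READ_ORDER_VERIFICATION` come from the message above. The `ALL` constant and the `UnsafeFlagParser` helper are hypothetical and are not the GATK argument parser.

```java
// Constants taken from the commit message; ALL is a hypothetical stand-in for "every unsafe operation".
enum UnsafeOperation { ALLOW_UNINDEXED_BAM, NO_READ_ORDER_VERIFICATION, ALL }

final class UnsafeFlagParser {
    /** Maps the value after -U to a constant; a bare -U (null or empty) falls back to ALL. */
    static UnsafeOperation parse(String value) {
        if (value == null || value.isEmpty()) {
            return UnsafeOperation.ALL;
        }
        return UnsafeOperation.valueOf(value); // throws IllegalArgumentException on unknown names
    }

    /** True if the granted setting allows the requested unsafe operation. */
    static boolean permits(UnsafeOperation granted, UnsafeOperation requested) {
        return granted == UnsafeOperation.ALL || granted == requested;
    }
}
```

This keeps the old `-U` behavior intact, because the default reproduces "allow everything", while each named value narrows the permission.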
asivache
cff8b705c0
Oh, and the test would not work anymore...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2585 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:47:09 +00:00
ebanks
040fdfee61
Cleaned up the interface to VCFRecord. It's now possible (and easy) to create records and then write them with a VCFWriter.
...
I've updated HapMap2VCF to use the new interface; Chris agreed to take care of Sequenom2VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2558 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 21:42:12 +00:00
ebanks
b643a513bb
Minor interface change for VCFGenotypeRecord.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2537 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 16:48:09 +00:00
ebanks
6c739e30e0
1. Removing an old version of the Genotype interface which is no longer being used. Needed to do this now so that the naming conflicts would cease.
...
2. Adding a preliminary version of the new Genotype/Allele interface (putting it into refdata/ as the VariantContext really only applies to rods) with updates to VariantContext. This is by no means complete - further updates coming tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2533 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 05:51:10 +00:00
aaron
a34c2442c0
moved hard-coded file paths to the oneKGLocation, validationDataLocation, and seqLocation variables setup in the BaseTest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2460 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 07:40:48 +00:00
hanna
4617052b3c
For Alec, and others at the Broad who want to run our unit/integration tests off of gsa1/gsa2: put a ceiling on the amount of memory that integration tests can use. Reduce the memory footprint of the fasta reader test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2457 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:42:46 +00:00
aaron
b134e0052f
added changes to the code to allow different types of interval merging,
...
1: all overlapping and abutting intervals merged (ALL),
2: just overlapping, not abutting intervals (OVERLAPPING_ONLY),
3: no merging (NONE). This option is not currently allowed; it will throw an exception. Once we're more certain that unmerged lists are going to work in all cases in the GATK, we'll enable it.
The command line option is --interval_merging or -im
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2437 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:59:14 +00:00
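The difference between the merging modes can be sketched on a single contig with 1-based, closed intervals. `ALL` also merges intervals that abut, such as `1-10` and `11-20`. `OVERLAPPING_ONLY` merges only those that share a base. `NONE` throws, mirroring the restriction described above. The `IntervalMergingRule` enum and `IntervalMerger` class here are illustrative and are not the GATK implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative enum mirroring the three modes in the commit message.
enum IntervalMergingRule { ALL, OVERLAPPING_ONLY, NONE }

final class IntervalMerger {
    /** Closed interval [start, stop] on one contig. */
    static final class Interval {
        final long start, stop;
        Interval(long start, long stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return start + "-" + stop; }
    }

    static List<Interval> merge(List<Interval> input, IntervalMergingRule rule) {
        if (rule == IntervalMergingRule.NONE) {
            throw new UnsupportedOperationException("NONE is not supported yet");
        }
        List<Interval> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingLong((Interval i) -> i.start));
        List<Interval> out = new ArrayList<>();
        for (Interval cur : sorted) {
            if (out.isEmpty()) { out.add(cur); continue; }
            Interval last = out.get(out.size() - 1);
            boolean overlaps = cur.start <= last.stop;     // share at least one base
            boolean abuts = cur.start == last.stop + 1;    // adjacent, no gap
            if (overlaps || (rule == IntervalMergingRule.ALL && abuts)) {
                out.set(out.size() - 1, new Interval(last.start, Math.max(last.stop, cur.stop)));
            } else {
                out.add(cur);
            }
        }
        return out;
    }
}
```

For the input `1-10, 11-20, 15-30, 40-50`, `ALL` yields `[1-30, 40-50]` and `OVERLAPPING_ONLY` yields `[1-10, 11-30, 40-50]`.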
ebanks
cf303810d3
VCF reader now creates the correct type of header line for each header type
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2423 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 20:39:06 +00:00
aaron
7e0f69dab5
Changed the GLF record to store its contig name and position in each record instead of in the Reader. Integration tests all stay the same.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2410 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:54:56 +00:00
ebanks
4ea31fd949
Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 19:16:41 +00:00
ebanks
94f5edb68a
1. Fixed VCFGenotypeRecord bug (it needs to emit fields in the order specified by the GenotypeFormatString)
...
2. isNoCall() added to Genotype interface so that we can distinguish between ref and no calls (all we had before was isVariant())
3. Added Hardy-Weinberg annotation; still experimental - not working yet so don't use it.
4. Moved the 'output type' argument out of the UnifiedArgumentCollection and into the UnifiedGenotyper, in preparation for parallelization.
5. Improved some of the UG integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2398 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 04:14:14 +00:00
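Item 1 above, emitting genotype fields in the order the FORMAT string specifies, can be sketched like this. The field order comes from splitting the format string (e.g. `GT:GQ:DP`), not from the iteration order of the map holding the values. Writing a missing key as `.` is an assumption of this sketch. `GenotypeFieldEmitter` is hypothetical and is not VCFGenotypeRecord.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the FORMAT string, not the map, determines field order.
final class GenotypeFieldEmitter {
    /** Encodes one sample's fields in FORMAT order; keys absent from the map are written as ".". */
    static String encode(String formatString, Map<String, String> fields) {
        List<String> values = new ArrayList<>();
        for (String key : formatString.split(":")) {
            values.add(fields.getOrDefault(key, "."));
        }
        return String.join(":", values);
    }
}
```

A `HashMap` iterates in an arbitrary order. Driving the loop from the format string is therefore what keeps each value aligned with its column label.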
ebanks
97618663ef
Refactored and generalized the VCF header info code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2346 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 21:02:45 +00:00
aaron
09811b9f34
Now that we always output the VCF header, make sure that we correctly handle the situation where there are no records in the file. Added unit tests as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2333 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:51:05 +00:00
ebanks
ee691b8899
Added a whole bunch of unit tests for VCF reading.
...
We could still use more, but this is a good start.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2303 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 03:31:23 +00:00
ebanks
e8822a3fb4
Stage 3 of Variation refactoring:
...
We are now VCF3.3 compliant.
(Only a few more stages left. Sigh.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2287 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 21:43:28 +00:00
hanna
9e2f831206
A bit of cleanup in preparation for Picard patch.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2286 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 16:09:04 +00:00
ebanks
b6f8e33f4c
Stage 2 of Variation refactoring:
...
VCFRecord now implements Variation, VCFGenotypeRecord now implements Genotype.
Because of this change, RodVCF is now just a wrapper around the VCFRecord and does nothing else. Also, one can call toVariation on the VCFGenotypeRecord and it returns the VCFRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2271 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 06:48:03 +00:00
aaron
b3bdcd0e60
make sure we close the error log stream in CommandLineProgram if it's opened; unit tests and clean-up for BasicVariation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2241 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 06:59:27 +00:00
aaron
8fbc0c8473
fix for bug GSA-234: fasta index files couldn't handle anything but letters, numbers, or spaces in the contig name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2147 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 19:19:47 +00:00
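One way to avoid restricting contig names is to split each `.fai` line on tabs rather than matching the name with a letters/digits/spaces pattern. A samtools-style `.fai` line has five tab-separated fields: name, length, byte offset of the first base, bases per line, and bytes per line. This sketch is not the actual GSA-234 fix, and `FastaIndexEntry` is a hypothetical class.

```java
// Hypothetical sketch: tab splitting lets contig names contain any non-tab character.
final class FastaIndexEntry {
    final String contig;
    final long length;
    final long offset;
    final int basesPerLine;
    final int bytesPerLine;

    private FastaIndexEntry(String contig, long length, long offset, int basesPerLine, int bytesPerLine) {
        this.contig = contig;
        this.length = length;
        this.offset = offset;
        this.basesPerLine = basesPerLine;
        this.bytesPerLine = bytesPerLine;
    }

    /** Parses NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH separated by tabs. */
    static FastaIndexEntry parse(String line) {
        String[] f = line.split("\t");
        if (f.length != 5) {
            throw new IllegalArgumentException("Expected 5 tab-separated fields: " + line);
        }
        return new FastaIndexEntry(f[0], Long.parseLong(f[1]), Long.parseLong(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]));
    }
}
```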
rpoplin
7f947f6b60
Updated recalibrator integration tests to use all three platforms as well as a bam with multi-platform reads intermingled. CountCovariates v2.0.1: Once again uses a read filter to filter out zero mapping quality reads. Added --sorted_output option to output the table recalibration file in sorted order
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2122 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 19:51:36 +00:00
rpoplin
1d46de6d34
The old recalibrator is replaced with the refactored recalibrator. Added a version message to the logger output. These walkers start at version 2.0.0
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2117 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 14:58:33 +00:00
aaron
6ba1f3321d
Fixed the sample mix-up bug Kiran discovered, and added a unit test in the VCF reader class (thanks for the good example files, Kiran). Also renamed the toStringRepresentation function to toStringEncoding, and added a matching method in VCFGenotypeRecord.
...
Updated the integration tests that were failing due to different ordering of genotyping entries in VCF; I'll check in the VCF diff tool I wrote when I get a cycle or two.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2092 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 18:17:47 +00:00
aaron
aece7fa4c7
a convenience method to join a map into a single string, which I need for some VCF work. Added some documentation to the join method as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2057 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 16:50:01 +00:00
ebanks
5cdbdd9e5b
now that the design is stable, pull the setReference and setLocation methods back out of Genotype and stick them into constructors of implementing classes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1931 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:27:37 +00:00
ebanks
3091443dc7
Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron.
...
Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 03:46:41 +00:00
depristo
449a6ba75a
Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 12:23:36 +00:00
aaron
cfa86d52c2
ensure that in the indel case we don't allow identification as both an insertion and deletion at the same location in the VCF ROD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1875 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:21:00 +00:00
aaron
8aacc43203
VCF output now emits no calls as ./.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1863 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 18:51:31 +00:00
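A minimal sketch of writing a diploid no-call as `./.` and distinguishing it from a homozygous-reference `0/0`, which is the ref vs. no-call distinction that `isNoCall()` later made explicit. It assumes allele indices where 0 is the reference, 1 and up are alternates, and a negative value marks a missing allele. That encoding convention and the `DiploidGenotypeEncoder` class are hypothetical.

```java
// Hypothetical sketch: negative allele index = missing allele, printed as ".".
final class DiploidGenotypeEncoder {
    /** Encodes two allele indices as an unphased VCF GT string, e.g. "0/1" or "./.". */
    static String encodeGT(int allele1, int allele2) {
        return encodeAllele(allele1) + "/" + encodeAllele(allele2);
    }

    private static String encodeAllele(int index) {
        return index < 0 ? "." : Integer.toString(index);
    }

    /** True only when every allele in the GT string is missing. */
    static boolean isNoCall(String gt) {
        for (String allele : gt.split("[/|]")) {
            if (!allele.equals(".")) {
                return false;
            }
        }
        return true;
    }
}
```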
aaron
96972c3a5c
a fix for a bug Eric found: if your first call contained fewer samples than calls at other loci, your VCFHeader got set up incorrectly.
...
Also moved a bunch of Lists over to Sets for consistency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1859 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:57:50 +00:00
aaron
a69ea9b57c
Cleaning up the VCF code, adding lots of tests for a variety of edge cases. Two issues are still outstanding: updating the no call string with the standard 1000g decided on today, and fixing Eric's issue where not all the VCF sample names are present initially.
...
also: their, I hope your happy Eric, from now on I'll try not to flout my awesomest grammer in the future accept when I need to illicit a strong response :-)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1858 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:11:34 +00:00
hanna
32d55eb2ff
Fix issue Eric was seeing with java.lang.Error in unmap0.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1804 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:46:56 +00:00
hanna
f4b6afb42c
JVM issue id 5092131 ( http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5092131 )
...
was causing OOM issues with the new mmapping fasta file reader during large jobs.
Temporarily reverting the reader until a workaround can be found.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1801 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:45:46 +00:00
hanna
fcb6a992c8
Switched IndexedFastaSequenceFile over to use memory mapping to load data rather than
...
the loop with a small block size. Performance improvements in loading refs are extreme;
segments can be loaded in <1ms. chr1 in its entirety can be loaded in 1.5sec (down
from 30sec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1781 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 00:07:15 +00:00
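The memory-mapping approach can be sketched with the standard `FileChannel.map` call. A byte range of the file is mapped read-only and copied out, with no fixed-size read loop. This is not IndexedFastaSequenceFile: a real FASTA reader would also use the index to translate a base position into a byte offset and strip newlines. The `MappedRegionReader` class is hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of reading a byte range through a read-only memory map.
final class MappedRegionReader {
    /** Returns bytes [offset, offset + length) of the file as ASCII text. */
    static String readRegion(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] bytes = new byte[length];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }
}
```

One caveat relates to the JVM bug cited in the entry just above this one. Closing the channel does not release the mapping. The mapped region is freed only once the buffer is garbage-collected, so creating many large mappings can exhaust memory or address space.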
aaron
3aec76136f
Removing the AllelicVariant interface, which is replaced by the Variation interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 17:44:24 +00:00
aaron
66fc8ea444
GSA-182: Adding support for BED interval files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1767 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 02:45:31 +00:00
aaron
7fc4472e6d
A big fix for MergingSamRecordIterator, where we weren't handling the comparisons of SAMRecords correctly (we weren't applying the new reference index first, so the MT contig would sometimes be ID 23 and sometimes 24 in different records).
...
Also a fix to the GLF tests, and a correction to PrintReadsWalker to remove the close() on the output source, the source handles that itself (and you get a double close).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1758 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:35:35 +00:00
aaron
e885cc4b21
changes for corrected GLF likelihood output, along with better tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 20:45:05 +00:00
hanna
70e1aef550
Better integrate the @ArgumentCollection into the command-line argument parser. Walkers can now specify their own @ArgumentCollections. Also cleaned up a bit of the CommandLineProgram template method pattern to minimize duplicate code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1746 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 22:23:19 +00:00
andrewk
5662a88ee1
Cosmetic change to list sampling functions: the typical usage of n and k was reversed. No change in the functionality of the classes has been made, and unit tests still pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1736 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 18:12:32 +00:00
aaron
eeb14ec717
a couple of light changes to GenomeLocSortedSet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1708 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:38:53 +00:00
aaron
83a9eebcc4
fixed a bug I checked in that Eric found, for intervals with no start or stop coordinate. Now I owe Eric a cookie, and Milk Street is so far away. Damn.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1679 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 04:34:18 +00:00
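Handling intervals with no start or stop coordinate can be sketched as a parser for three forms: `chr1` (the whole contig), `chr1:100` (one base), and `chr1:100-200`. The result object has only final fields, in the spirit of the later immutable GenomeLoc. Both the single-base reading of `chr1:100` and the `IntervalStringParser` class are assumptions of this sketch, not GenomeLocParser behavior.

```java
import java.util.Map;

// Hypothetical sketch; 1-based, closed coordinates assumed.
final class IntervalStringParser {
    /** Immutable parsed interval. */
    static final class Loc {
        final String contig;
        final long start, stop;
        Loc(String contig, long start, long stop) {
            this.contig = contig; this.start = start; this.stop = stop;
        }
        @Override public String toString() { return contig + ":" + start + "-" + stop; }
    }

    static Loc parse(String s, Map<String, Long> contigLengths) {
        Long whole = contigLengths.get(s);
        if (whole != null) {
            return new Loc(s, 1, whole);                      // "chr1": whole contig
        }
        int colon = s.lastIndexOf(':');
        if (colon < 0) throw new IllegalArgumentException("Unknown contig: " + s);
        String contig = s.substring(0, colon);
        if (!contigLengths.containsKey(contig)) throw new IllegalArgumentException("Unknown contig: " + contig);
        String range = s.substring(colon + 1);
        int dash = range.indexOf('-');
        if (dash < 0) {
            long pos = Long.parseLong(range);                 // "chr1:100": one base
            return new Loc(contig, pos, pos);
        }
        long start = Long.parseLong(range.substring(0, dash)); // "chr1:100-200"
        long stop = Long.parseLong(range.substring(dash + 1));
        return new Loc(contig, start, stop);
    }
}
```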
aaron
7bfb5fad27
fixing the dbSNP test. Also removing unnecessary comments from the GenomeLocParser, adding some tests, and commenting out the performance test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1676 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 23:32:24 +00:00
aaron
39a47491a9
changes to make GenomeLoc string parsing 25% faster
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1675 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 22:37:47 +00:00
aaron
7b39aa4966
Adding the VCF ROD. Also changed the VCF objects to be much more user-friendly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 20:19:34 +00:00
aaron
542d817688
more cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1631 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:42:03 +00:00
aaron
b401929e41
incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 04:48:42 +00:00
aaron
e03fccb223
Changes to switch Variant Eval over to the new Variation system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 05:34:33 +00:00
depristo
17ab1d8b25
General purpose merging iterator implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1593 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:06:15 +00:00
aaron
811503d67b
VCF changes from Richard's comments, fixed a test case
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1456 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 14:32:16 +00:00
aaron
6313c465fb
we want the RMS of the reads' qualities, not the RMS of the RMS of the read qualities.
...
Also the VCF version tag seems to be standardized as VCR. Updated the VCF code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 21:56:29 +00:00
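The distinction in the first line of this message can be shown with a small sketch over quality values grouped by read. Pooling every value gives one RMS. Taking the RMS of per-read RMS values gives each read equal weight regardless of how many values it contributes, so the two generally differ. `RmsQuality` is an illustrative name and is not the GATK code.

```java
import java.util.List;

// Illustrative comparison of pooled RMS vs. RMS of per-read RMS values.
final class RmsQuality {
    /** RMS over every quality value pooled across all reads. */
    static double pooledRms(List<int[]> reads) {
        double sumSq = 0;
        long n = 0;
        for (int[] quals : reads) {
            for (int q : quals) {
                sumSq += (double) q * q;
                n++;
            }
        }
        return Math.sqrt(sumSq / n);
    }

    /** RMS of per-read RMS values: a 1-value read weighs as much as a 100-value read. */
    static double rmsOfPerReadRms(List<int[]> reads) {
        double sumSq = 0;
        for (int[] quals : reads) {
            double readRms = pooledRms(List.of(quals));
            sumSq += readRms * readRms;
        }
        return Math.sqrt(sumSq / reads.size());
    }
}
```

For reads with qualities `{10, 10, 10, 10}` and `{40}`, the pooled RMS is 20, while the RMS of the per-read RMS values is about 29.15.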
aaron
5725de56dc
fixes in VCF, some changes to get it ready to move out of the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1441 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 23:31:03 +00:00
aaron
4cf9110468
Adding a lot of changes to the VCF code, plus a new basic validator. Also removing an extra copy of the Artificial SAM generator that got checked in at some point.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1437 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 05:08:28 +00:00
aaron
63d90702d6
another iteration of the VCFReader and VCFRecord, introducing the VCFWriter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1429 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 22:17:34 +00:00
aaron
8403618846
the start to the VCF implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1425 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 04:34:15 +00:00
andrewk
8eeb87af2a
Tests for downsampling related utilities in ListUtils class that didn't get checked in earlier
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1352 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:09:35 +00:00
aaron
bca894ebce
Adding the initial changes for the new Genotyping interface. The bullet points are:
...
- SSG is much simpler now
- GeliText has been added as a GenotypeWriter
- AlleleFrequencyWalker will be deleted when I untangle the AlleleMetric's dependence on it
- GenotypeLikelihoods now implements GenotypeGenerator, but could still use cleanup
There is still a lot more work to do, but this is a good initial check-in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1335 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 19:43:59 +00:00
aaron
b4adb5133a
GLF rod as an AllelicVariant object.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1282 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 00:55:52 +00:00
aaron
9ecb3e0015
adding GLFRods with tests and some other code changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1251 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:30:19 +00:00
aaron
36819ed908
Initial changes to the SSG to output GLF by default
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1231 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 08:46:04 +00:00
aaron
e4152af387
added a big speed-up for interval list input processing. With large interval sets this was taking way too long...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1227 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 22:00:00 +00:00
hanna
b61f9af4d7
Cleaning up, preparing to incorporate a better fix for Eric's problems with validation stringency in BAM files opened directly from the walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1222 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 01:42:13 +00:00
aaron
8ee5c7de8e
GLF reader and writer check in.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1202 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 23:06:37 +00:00
hanna
da4d26b1ea
Enum support for command-line argument system, and some cleanup for hacks to the CleanedReadInjector that were required because Enum support was missing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1199 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:26:16 +00:00
hanna
433ad1f060
Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1196 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 18:49:08 +00:00
hanna
194b75613b
Fix compile problem with unit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1187 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 20:29:31 +00:00
aaron
f6a273a537
other fixes for some broken unit tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1181 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 05:53:13 +00:00
hanna
d19366eaad
Cleanup emergency fixes for out-of-bounds issues in reference retrieval. Fix spelling mistakes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1173 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 15:41:30 +00:00
aaron
d4d3af20f2
made a fake fasta generator, so we can now generate a complete bam / fasta combo of made up data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1150 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 21:35:34 +00:00
aaron
f5cba5a6bb
Fixed GenomeLoc to be immutable; the only way to change its values now is through the GenomeLocParser.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
aaron
d7d4298917
Some files to support generic genotype outputting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1112 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:43:41 +00:00
aaron
5b1c23a7f2
changes to fix and test the interval based traversals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1095 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:54:15 +00:00
aaron
bcb64d92e9
Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a GenomeLoc (with the reference setup) into a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
depristo
8ac40e8e2d
Updated version of the recalibration tool
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00
aaron
6ee64c7e43
added changes to support Alec's toUnmappedRead seek. Huge improvements (orders of magnitude) in unmapped read performance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1021 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:15:56 +00:00
hanna
71e3825fa1
First pass of a walker for Eric that searches through an input BAM file for unclean reads, injecting the cleaned reads in their place and outputting the composite result.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@989 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:18:13 +00:00
aaron
195b4ea7b4
a rename of Sam to SAM for consistency, creating a genotype utils dir, and moving the GLF code into it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@984 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 17:46:06 +00:00
aaron
36c98b9d6c
added tools to test read based traversals using the artificial in-memory SAM file tools, and testing of the PrintReadsWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@957 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 01:52:25 +00:00
aaron
eb962fe52a
adding an artificial SAM file writer, used to unit test some of the walkers (mainly the PrintReadsWalker)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@956 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 21:47:49 +00:00
kiran
af0b03a257
Added tests for mostFrequentBaseFraction() and reverseComplementString()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@944 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:53:45 +00:00
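A minimal sketch of the reverse-complement operation that the `reverseComplementString()` tests exercise. It walks the string backwards and complements each base. Mapping any non-ACGT character to `N` and upper-casing the output are choices made for this sketch. `ReverseComplementSketch` is hypothetical and is not the GATK utility.

```java
// Hypothetical sketch; non-ACGT input becomes 'N', output is upper case.
final class ReverseComplementSketch {
    static String reverseComplement(String bases) {
        StringBuilder sb = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            sb.append(complement(bases.charAt(i)));
        }
        return sb.toString();
    }

    private static char complement(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return 'N';
        }
    }
}
```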
aaron
109bef6c08
We're no longer in the read-dropping business.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@901 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:37:51 +00:00
aaron
82aa0533b8
added some more documentation to the GLF writer and its supporting classes, and some other fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@875 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 14:53:58 +00:00
aaron
e712d69382
GLF writing support
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@872 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 21:30:18 +00:00
aaron
b43deda6c9
iterative changes to GLF files; also a test of checking-in over sshfs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@850 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:24:30 +00:00
hanna
5e8c08ee63
Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
aaron
d275c18e58
adding some objects we need for the GLF format.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@846 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 22:32:25 +00:00
aaron
d994544c47
Added back end code support for Sharding based on genomic location for reads. Changed the sharding
...
code to take GenomeLocSortedSet instead of a List<GenomeLoc>, and added a bunch of much simpler
and cleaner test cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
aaron
d056f9f3e8
Changed the name to reflect the sorted nature of the set, added some fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@810 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 22:34:24 +00:00
aaron
831d430025
Added a collection for storing GenomeLocs that also has functions for removing by genomic region (which may span multiple GenomeLocs in the collection) and for adding regions, which are then merged with any overlapping regions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@809 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:52:40 +00:00
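Removing a genomic region that spans several stored locs can be sketched on a single contig. Locs are kept as a sorted, non-overlapping list of closed intervals. Any loc the region only partly covers is trimmed or split into left and right remainders. `SortedLocSetSketch` is hypothetical and is not GenomeLocSortedSet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: single contig, sorted non-overlapping closed intervals.
final class SortedLocSetSketch {
    static final class Loc {
        final long start, stop;
        Loc(long start, long stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return start + "-" + stop; }
    }

    /** Removes [start, stop] from the list, splitting any loc the region cuts through. */
    static List<Loc> removeRegion(List<Loc> locs, long start, long stop) {
        List<Loc> out = new ArrayList<>();
        for (Loc l : locs) {
            if (l.stop < start || l.start > stop) {   // no overlap: keep unchanged
                out.add(l);
                continue;
            }
            if (l.start < start) {                    // part left of the removed region
                out.add(new Loc(l.start, start - 1));
            }
            if (l.stop > stop) {                      // part right of the removed region
                out.add(new Loc(stop + 1, l.stop));
            }
        }
        return out;
    }
}
```

Removing `5-45` from `[1-10, 20-30, 40-50]` trims the first loc to `1-4`, drops `20-30` entirely, and trims the last loc to `46-50`.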
depristo
7a979859a9
Intermediate check-in for evaluation: now supports transition / transversion evaluation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@793 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:05:06 +00:00
kiran
bdf772f017
Added test for determining the fraction of a sequence that's taken up by the most frequent base (quick-and-dirty homopolymer testing).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@780 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:08 +00:00
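The quick-and-dirty homopolymer measure described above can be sketched as counting A, C, G, and T (case-insensitive) and dividing the largest count by the sequence length. Characters other than ACGT count toward the length but not toward any base, and an empty string returns 0. Both choices belong to this sketch, and `HomopolymerCheck` is hypothetical.

```java
// Hypothetical sketch of a most-frequent-base fraction.
final class HomopolymerCheck {
    static double mostFrequentBaseFraction(String bases) {
        if (bases.isEmpty()) {
            return 0.0;
        }
        int a = 0, c = 0, g = 0, t = 0;
        for (char ch : bases.toUpperCase().toCharArray()) {
            switch (ch) {
                case 'A': a++; break;
                case 'C': c++; break;
                case 'G': g++; break;
                case 'T': t++; break;
                default: break; // N and other symbols are not counted as a base
            }
        }
        int max = Math.max(Math.max(a, c), Math.max(g, t));
        return (double) max / bases.length();
    }
}
```

A value near 1.0 flags a sequence dominated by one base, such as `AAAAC` at 0.8.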
kiran
324ef9cbd1
Test class for PathUtils.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@773 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:31:22 +00:00
hanna
7161b8f927
Disable support for short name values directly abutting their arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@740 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 16:09:32 +00:00
hanna
d14cab0be7
Added IterableLocusContextQueue and test. Cleaned up tests, adding BaseTest where it didn't exist. Enhanced test runner to run only classes ending in ...Test.java, so that utility classes can sit alongside the tests but won't be run by JUnit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@693 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 21:32:05 +00:00
aaron
4ce3feba4d
my move ended up being a copy, so this is to delete duplicate files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@651 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 02:10:26 +00:00
aaron
bae4256574
Started the process to make the GATK engine into a runnable object so we can call it from other processes. Step 1: make a configuration object that can serialize to and from an XML file. This way we can store the information everyone uses shell scripts for. Also we can now pull the list of params out of the GenomeAnalysisTK.java. More to come...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@636 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 01:25:26 +00:00
hanna
7f8850a8a2
Argument validation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@631 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 20:28:56 +00:00
hanna
d725c6cf1c
Added unit tests for parsing failures that I encountered during integration testing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@618 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 14:01:54 +00:00
hanna
4177560543
Mutually exclusive options.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@616 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 13:27:48 +00:00
hanna
98716138e9
Cleanup: add support for non-public fields. Track matches as state of parsing engine as well as definitions.
...
Made fields of command-line argument system non-public by default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@606 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 19:38:05 +00:00
hanna
ef211f96b1
Remove old Apache CLI-based arg system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@604 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:37:51 +00:00
hanna
521aa40baa
Bring new command-line argument parsing system live.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@603 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:16:11 +00:00
hanna
bfd6dfe36c
Added real-world tests and tests for conditional validation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@601 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 13:38:46 +00:00
hanna
2ee9374975
Check for proper error output in case of boolean args with parameter specified.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@599 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 23:08:48 +00:00
hanna
b0cdba8bb3
Acting on Kiran's suggestion to make the doc tag in the @Argument annotation required.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@598 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:43:40 +00:00
hanna
ec0261275b
Lots of command line argument validation. Catches all common validation problems, including missing required arguments, invalid arguments, and several types of misplaced argument value errors.
...
Still pending:
- Help system.
- Mutually exclusive arguments.
- Design includes too many classes per file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@597 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:08:00 +00:00
hanna
6550fe6f97
Another pass at command-line argument parsing. The revised parser supports all types of arguments that the existing parser supports, but does a poor job with validation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@591 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 22:41:23 +00:00
hanna
4f2ccda56a
Interface skeleton for a new command-line argument parser. Nowhere near the point of being a drop-in replacement for Apache CLI yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@588 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 00:11:42 +00:00
depristo
7ed496b859
JUnit test for RefHanger
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@584 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 20:11:14 +00:00
kiran
9800d09608
A more thorough test for multinomialProbability.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@577 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:27:05 +00:00
aaron
3bf3c21ddd
Changed the assert code in GenomeLoc to throw exceptions, and deleted a function no one seems to be using.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@569 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 13:54:51 +00:00
hanna
ba9a0b5da8
Break some of the weird inner classes out of the HierarchicalMicroScheduler.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@566 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 21:07:07 +00:00
kiran
eeb0b78cce
Added another assert to testBinomialProbability() and added a test method for testMultinomialProbability().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@544 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 14:59:11 +00:00
hanna
e50ae97fe1
Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@531 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:40:21 +00:00
kiran
305584b69e
Test class for MathUtils with a test for binomialProbability().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@519 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:31:02 +00:00
hanna
45d962e491
I understood the contig index incorrectly when I initially wrote this code. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@517 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 22:31:43 +00:00
hanna
56f6847456
Changed the interface from (contig, pos, length) to the more common (contig, start, stop).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@441 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 00:04:41 +00:00
hanna
26e84d7fd6
Added index iteration for ReferenceSequenceFile interface compatibility.
...
Added better error checking for querying past the end of a contig.
Lots more testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@429 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 17:17:11 +00:00
hanna
182626576f
Basic indexed fasta POC in place. Requires a more complete implementation of the ReferenceSequenceFile interface, and much more testing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@425 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:46:56 +00:00
aaron
96248cdec4
Added some output to all the classes, including built-in runtime analysis.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@411 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:14:53 +00:00
hanna
0629f79049
Moved fasta support files into their own package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@408 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 18:13:23 +00:00
hanna
7a4a5a17c0
Made sequence index compatible with Aaron's junit changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@407 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:53:20 +00:00
hanna
186c799ffc
Class to read an .fai file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@405 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:37:18 +00:00
aaron
4b3578e1de
Added the base test case and fixed the rest of the test cases to follow suit. Added more verbose output to Ant for JUnit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@403 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:11:38 +00:00
hanna
cf929a8275
Get rid of test case's dependence on transient methods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@381 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:16:42 +00:00
hanna
9c37400c4f
Added basic performance testing so I can make sure concurrent access doesn't slow down overall fasta access.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@367 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 18:05:56 +00:00
kcibul
ce72932a45
* refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes
...
* added basic unit test for GenomeLoc
* fixed a bug when parsing genome locations like chr1:5000: the start position was being left as maxint rather than being set to the same value as the stop position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@365 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:25:17 +00:00
hanna
49fd951d8c
Initial test suite for FastaSequenceFile2, so I can add parallelism support with abandon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@364 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-11 21:10:42 +00:00
hanna
b17a03abbd
Fix argument parser test case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@215 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 16:05:18 +00:00
aaron
c2b2ed8e1d
added our first JUnit test, for the argument parser
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@176 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 21:14:30 +00:00