aaron
05c164ec69
changing the default behavior to allow any sized read pile-up (which may exceed the memory limit); the user can then select their own read limit. The default of 100K was arbitrary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1498 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 14:46:00 +00:00
ebanks
54c0b6c430
Allow this ROD to consist of just the positions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1497 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 12:43:18 +00:00
aaron
4a1d79cd7b
added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads.
...
The user is warned if a locus exceeds this threshold, and no more reads are added.
Also CombineDup walker had an incorrect package name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 04:21:58 +00:00
ebanks
0addae967a
IndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1495 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 03:34:39 +00:00
ebanks
8e3c3324fa
Added filter for SNPs cleaned out by the realigner.
...
It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1489 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 04:32:32 +00:00
ebanks
8bc7afe781
Smarter SW penalties
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1488 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 04:29:19 +00:00
ebanks
1a299dd459
Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1486 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 03:31:37 +00:00
ebanks
215e908a11
Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters.
...
This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1484 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 02:16:39 +00:00
depristo
813a4e838f
Removing old code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1482 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:27:11 +00:00
depristo
49a7babb2c
Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1481 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:16:30 +00:00
depristo
522e4a77ae
Caching support across multiple technologies
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1480 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 18:10:14 +00:00
depristo
5af4bb628b
Intermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1479 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 17:34:43 +00:00
depristo
bde67428fd
Better formatting of the code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1477 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-29 21:46:47 +00:00
aaron
8331c195fb
changed the full name of maximum_reads to maximum_iterations for consistancy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1475 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 16:03:46 +00:00
depristo
8e129d76fd
Support for original quality scores OQ flag. pQ flag in TableRecalibation to preserve quality scores below a threshold (defaulting to 5)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1474 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 14:14:21 +00:00
depristo
bf60980653
Experitmental support for empirical P(B_true | B_miscall). --useEmpiricalTransitions flag to SSG enables this support. Much better implementation of Genotype likelihoods -- the system should scream along now. Continuing progress towards deleting old model
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1469 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 00:17:24 +00:00
depristo
7cf9a54b64
change for new char/byte in BaseUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1467 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 23:47:56 +00:00
hanna
e5115409fa
Force columnSpacing to be at least one. We need a general-purpose, working tool for outputting columnar data to a PrintStream; will add JIRA.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1457 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 19:54:54 +00:00
hanna
ccdb4a0313
General-purpose management of output streams.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
aaron
cd711d7697
Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric.
...
also fixed some verbage in the GAEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:35:49 +00:00
aaron
6313c465fb
we want the RMS of the reads qualities not the RMS of the RMS of the read qualities.
...
Also the VCF version tag seems to be standardized as VCR. Updated the VCF code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 21:56:29 +00:00
ebanks
ed8c92a12a
make isReference do the right thing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1439 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 20:32:29 +00:00
ebanks
b3fe566c0c
Fix descriptions of walker args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1436 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 19:46:48 +00:00
ebanks
53153fcd79
Allow RODs to specify that incomplete records are okay (i.e. that they allow optional fields)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1433 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 15:26:10 +00:00
ebanks
b2a18a9d61
- first pass at a basic indel filter (for now, based on size and homopolymer runs)
...
- fix simple indel rod printout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 03:04:12 +00:00
jmaguire
1e8b97b560
quietly skip empty intervals files rather than crash.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1428 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 20:19:14 +00:00
jmaguire
92c63fb530
It's just "lod" not discovery_lod now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1427 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 18:44:09 +00:00
ebanks
df5744bcd3
update this walker so any variants can be passed in
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1426 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 16:30:39 +00:00
aaron
d101c20b30
added the ability to pass in a csv file of ROD triplets (one triplet per line) to the -B option
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1412 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 22:10:20 +00:00
ebanks
2c3f56cb8d
fix length calculation (it was including +/- char when it shouldn't)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1410 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 20:28:24 +00:00
aaron
fc1c76f1d2
fixing a bug where reads in overlapping interval based locus traversals could get assigned to only one of two the regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1407 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 17:50:16 +00:00
ebanks
ecae619a1b
warn user when dbSNP rod looks suspicious
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1400 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:20:20 +00:00
asivache
2841e151d0
javadoc comments only
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1399 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 18:44:35 +00:00
ebanks
02f1af0743
Don't die when a readgroup is absent from the covariates table - it could
...
happen when all reads are unmapped (or have MQ0); instead, just don't alter
the quals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1394 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 03:10:33 +00:00
depristo
6d3ef73868
Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1392 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:37:07 +00:00
depristo
20baa80751
Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1391 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:01:04 +00:00
depristo
bbd7bec5db
Continuing cleanup of SSG. GenotypeLikelihoods now have extensive testing routines. DiploidGenotype supports het, homref, etc calculations. SSG has been cleaned up to remove old garbage functionality. Also now supports output to standard output by simply omitting varout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1387 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 22:25:30 +00:00
hanna
48713e154c
Windowed access to the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1383 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:29:15 +00:00
depristo
65e9dcf5b7
Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum, (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc, this is just for calculating genotype likelihoods now; (3) Now performs extensive error checking with validate() to ensure the system is behaving properly. (4) fixed incorrect treatment of N bases, which we being counted against everyone (5) likely found a stats bug in which heterozyosity was being applied incorrectly to the genotype priors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1382 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 01:00:55 +00:00
depristo
4dc23f2763
Trivial formatting changes as I moved more legacy code into this system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1381 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:54:26 +00:00
depristo
34af669dbb
Explicit ENUM representation of the diploid genotypes. Please use this from now on to represent strings like AA or AT
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1380 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:53:43 +00:00
hanna
21d1eba502
Cleaned division of responsibilities between arguments to map function. Reference has been changed
...
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00
depristo
20ff603339
New hotness and old and Busted genotype likelihood objects are now in the code base as I work towards a bug-free SSG along with a cleaner interface to the genotype likelihood object
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1372 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 23:07:53 +00:00
depristo
4986b2abd6
Fixing bug in SSG -- genotyping and discovery were mixed up by name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1371 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 22:13:35 +00:00
depristo
3485397483
Reorganization of the genotyping system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1370 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:55:31 +00:00
depristo
d840a47b11
Slight reorganization of genotype interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:17:15 +00:00
ebanks
4366ce16e0
Made sure all RODs have a (good) toString() method - and use it in the Venn walker. (thanks, Mark)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1339 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 14:53:27 +00:00
ebanks
feb7238f10
Wasn't always returning the correct alt base
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1337 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 03:08:04 +00:00
hanna
5429b4d4a8
A bit of reorganization to help with more flexible output streams. Pushed construction of data
...
sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler
to just microschedule.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 23:00:15 +00:00
hanna
7a13647c35
Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
...
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
ebanks
3c4410f104
-add basic indel metrics to variant eval
...
-variants need a length method (can't assume it's a SNP)!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 03:25:03 +00:00
aaron
f1109e9070
Added the interator to SAMDataSource to prevent seeing dupplicate reads, only in a byReads traversal. The iterator discards any reads in the current interval that would have been seen in the previous interval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1317 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-25 22:36:29 +00:00
asivache
a361e7b342
SAMDataSource is now exposed by GATK engine; SamFileHeaderMerger is exposed from Resources all the way up to SAMDataSource, so now we can see underlying individual readers should we need them; GATK engine has new methods getSamplesByReaders(), getLibrariesByReaders(), and getMergedReadGroupsByReaders(): each of these methods returns a list of sets, with each element (set) holding, respectively, samples, libraries, or (merged) read groups coming from an individual input bam file (so now when using multiple -I options we can still find out which of the input bams each read comes from)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1315 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:59:49 +00:00
hanna
2024fb3e32
Better division of responsibilities between sources and type descriptors.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1314 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:15:57 +00:00
ebanks
59f0c00d77
-set indel cleaning walkers to be in core package
...
-move Andrey's alignment utility classes to core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1307 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 05:23:29 +00:00
aaron
0b16253db3
an iterator to fix the problem where read-based interval traversals are getting duplicate reads because reads span the two intervals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1305 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 23:59:48 +00:00
ebanks
477502338f
moved major indel cleaning pieces to core (yippee!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1301 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:59:51 +00:00
ebanks
4efe26c59a
Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found.
...
Minor: refactor 454 filtering
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1300 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:53:51 +00:00
ebanks
ee8ed534e0
print full genotype for alt allele
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1297 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 01:35:23 +00:00
depristo
9c12c02768
AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1294 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 17:54:44 +00:00
hanna
6e4fd8db4a
Better formatting of available walkers, and only output them along with help. Cleanup JVMUtils.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1290 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 22:23:28 +00:00
depristo
761d70faa1
Better printing of multiple rods -- now produces a comma-separated set of values
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1289 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 21:58:27 +00:00
depristo
8588f75eb6
Better printing with toSimpleString() -- now prints out chip-genotype string
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1288 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 21:57:59 +00:00
hanna
1843684cd2
Cleanup: GATKEngine no longer needs to be lazy loaded, b/c the plugin directory no longer exists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1287 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:50:51 +00:00
hanna
b43925c01e
Switched to Reflections ( http://code.google.com/p/reflections/ ) project for
...
inspecting the source tree and loading walkers, rather than trying to roll
our own by hand.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1286 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:32:22 +00:00
kiran
436a196e2b
Bug fixes to support hapmap genotyping concordance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1285 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 16:20:10 +00:00
aaron
f13a1e8591
adding a couple of small changes to support contract with VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1283 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 03:49:15 +00:00
aaron
b4adb5133a
GLF rod as a AllelicVariant object.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1282 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 00:55:52 +00:00
ebanks
54fce98056
duh, don't print newline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1280 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-20 03:04:27 +00:00
ebanks
1d2b545608
add FLT toString method (to be used in PrintRODs) and add it to ROD list
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1279 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-20 02:47:50 +00:00
ebanks
387316ebe1
added indel rod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1276 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 16:05:51 +00:00
ebanks
da4af3b620
print indels in the format required for 1KG submissions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1275 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 15:59:18 +00:00
ebanks
d45c90b166
ROD to represent simple output from IndelGenotyper
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1274 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 14:36:12 +00:00
hanna
df1c61e049
Re-add the plugin path.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1271 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 22:48:44 +00:00
hanna
7c30c30d26
Cleaned up some duplicate code in preparation for making plugin dir configurable.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1270 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 22:02:21 +00:00
depristo
107f42a01e
Hacks for getting GLFs support in the Rod system working
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1268 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 21:03:47 +00:00
ebanks
88ffb08af4
Need to return real values for some of the AllelicVariant methods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1264 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 02:31:10 +00:00
ebanks
ba349e8d52
add FLT ROD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1257 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 19:40:50 +00:00
ebanks
800f7e6360
make AllelicVariant extend ReferenceOrderedDatum (not Comparable) since ROD itself is Comparable. Then we can generalize RMD tags.
...
Blame Matt if this doesn't work - he said it wouldn't break anything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1256 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 19:25:06 +00:00
ebanks
5be5e1d45f
added conversion from iupac format and new rod to deal with FLT file format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1254 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 18:34:41 +00:00
aaron
d36e232ed3
adding GLF rods to the module list
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1252 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:42:34 +00:00
aaron
9ecb3e0015
adding GLFRods with tests and some other code changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1251 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:30:19 +00:00
hanna
c25f84a01c
Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why its there.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1248 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:41:37 +00:00
ebanks
513d43b5f3
now implements AllelicVariant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1246 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:06:25 +00:00
ebanks
d369136bda
depricate this ROD yet again
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1245 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 13:33:03 +00:00
ebanks
efcbb16688
un-deprecate this ROD and make it implement Genotype
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1240 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 19:45:41 +00:00
depristo
84d407ff3f
Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:53:27 +00:00
hanna
76b09a879b
Display a more intelligent error message if the user runs a locus traversal across an unmapped reads file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1238 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:36:09 +00:00
hanna
99f9cd84ed
Warning for possibly mismatched reads / reference was very aggressive. Relax
...
the criteria a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1234 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:21:22 +00:00
hanna
12b5d9c70c
The number of loci can easily overflow an int. Change reduce type to a Long.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1233 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:07:00 +00:00
depristo
5bf7647498
0.2.3 -- now preserves Q0 bases throughout the reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1232 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 12:27:31 +00:00
hanna
0f6bfaaf73
Skip validation in case of no reads aligning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1230 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 02:03:36 +00:00
hanna
bfe90af5e2
Some quick and dirty fixes to support querying unmapped BAM files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1228 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 01:25:20 +00:00
hanna
9f0fb9f3aa
Fix for GSA-90: GATK banner and error messages should point to the wiki website.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1226 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 21:56:41 +00:00
hanna
b18caa2052
Fix for GSA-90: System isn't failing with an error when you use the wrong reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1225 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 20:42:12 +00:00
hanna
5c321f9630
Oops! Accidentally deactivated the ArgumentFactory, needed by the CleanedReadInjector, while refactoring last night.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1223 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 16:41:55 +00:00
ebanks
0070b8ea6a
Until 454 goes far, far away, at least we can completely ignore it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1219 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 18:31:53 +00:00
asivache
b08b121756
synchronyzing; debug statements commented out, so nothing changed really
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1215 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 16:38:33 +00:00
asivache
a1eb128377
few more detailed debug printouts conditioned on if (DEBUG), so no real changes...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1214 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 16:36:57 +00:00
hanna
03e1713988
Better support for specifying read filters to apply directly from the walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1212 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 23:59:53 +00:00
aaron
ce08f5f0c3
Removed some unused variables, fixed some javadoc. The usual.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1211 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 22:10:22 +00:00
aaron
9cfd89c54f
a small refactoring, and some documentation cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1210 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 22:03:45 +00:00
aaron
d86717db93
Refactoring of the traversal engine base class, I removed a lot of old code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1209 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 21:57:00 +00:00
ebanks
3519323156
Output the correct geli text format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1208 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 19:45:18 +00:00
ebanks
99631cdaa1
fix and then deprecate the rodGELI class (GELIs suck)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1207 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 19:18:13 +00:00
hanna
5e26770634
Hack the MicroScheduler to be tolerant of RefWalkers. We need to implement a longer-term solution to make it easier for datasources to report problems they've encountered along the way (GSA-103).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1205 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 17:26:59 +00:00
ebanks
3fe7104963
Added walker to filter out clustered SNPs from a call set
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1203 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 03:16:27 +00:00
hanna
3f0304de5a
Get rid of unused iterator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1200 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:39:16 +00:00
hanna
da4d26b1ea
Enum support for command-line argument system, and some cleanup for hacks to the CleanedReadInjector that were required because Enum support was missing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1199 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:26:16 +00:00
ebanks
aacec3aeb0
rod for binary GELI files (still needs to be tested)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1198 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:25:56 +00:00
hanna
433ad1f060
Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1196 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 18:49:08 +00:00
hanna
d8fbb2b62c
Refactoring; make a better home for the MalformedReadFilteringIterator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1194 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:54:20 +00:00
hanna
4ba2194b5e
Filter reads whose alignment starts past the end of the contig to which it allegedly aligns.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1188 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 22:27:44 +00:00
hanna
5d7393d7cb
Temporary fix for Eric's problems with SOLiD reads: make sure the command-line argument system takes the --validation-strictness command-line argument into account when creating SAMFileReaders.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1183 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 15:18:05 +00:00
hanna
5735c87581
Basic infrastructure for filtering malformed reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1178 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:50:22 +00:00
hanna
31313481f6
Temporary patch to filter out bad alignments that aren't quite fully reported as bad.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1176 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 18:41:55 +00:00
hanna
d19366eaad
Cleanup emergency fixes for out-of-bounds issues in reference retrieval. Fix spelling mistakes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1173 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 15:41:30 +00:00
jmaguire
4019cd2bd7
Added ROD for parsing hapmap3 genotype files.
...
Tweak to TabularROD to allow HapMapGenotypeROD to work.
Added HapMapGenotypeROD to list of RODs in ReferenceOrderedData.java.
Modified MultiSampleCaller to return a single object with most of the relvant information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1169 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 16:28:24 +00:00
ebanks
e5e249d4ac
temporary fix to deal with screwy SOLiD reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1168 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 03:25:57 +00:00
depristo
cf1854b339
Fix for monsterous problems with solid data -- now can dynamically expand recalibration tables on the fly as reads declare additional read groups -- use assumeFaultyHeader flag
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1167 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 17:15:49 +00:00
depristo
bcda66d2db
Simple performance improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1166 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 16:45:23 +00:00
hanna
0d00823332
Fix for performance bug in extending the read with X's in cases where the read is aligned off the end of the contig.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1165 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 16:17:38 +00:00
hanna
62807139fc
Cleanup pileup and depth of coverage in preparation for release. Add pileup, depth of coverage, and print reads to package for distribution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1159 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:54:01 +00:00
aaron
1c83b4d949
forgot to take out some test code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1157 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:18:37 +00:00
aaron
bc17ff567a
When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions. Now with a test case!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1156 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:15:50 +00:00
depristo
47cb9f169e
Stable tool that's the reverse of merging -- splits a file into individual BAM files, one for each sample ID in the SAM header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1155 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 12:56:46 +00:00
aaron
bb92eb8b1c
added a fix for overlapping reads in the locus context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1153 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 02:08:59 +00:00
hanna
9b182e3063
Prep for documenting command-line arguments: delete some arguments that don't make sense any more given
...
the state of the traversals and GATK input requirements: all_loci (replaced by walker annotation), max
OTF sorts (bam files must be sorted and indexed), threaded io (replaced by data sharding framework).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1144 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:23:35 +00:00
aaron
d58eeb7539
Don't cry wolf: only one warning is now emitted, instead of tons of warnings.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1139 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:50:37 +00:00
hanna
a3e0ec20c4
Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1138 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:46:35 +00:00
hanna
e93f751bd7
First step in replacing the Hello, World! document. Revamped the HelloWalker and checked it into the source tree, created a special build file for it, and added it to the packaging tool.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1135 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 21:59:54 +00:00
aaron
f5cba5a6bb
Fixed genome loc to be immutable, the only way to now change it's values is through the GenomeLocParser.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
depristo
9fca79ed62
Read groups are now sorted in the output data, for convenience
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1129 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:50:44 +00:00
aaron
03f8177a53
When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1121 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 20:51:55 +00:00
depristo
7ecc43e9a7
Fixed subtle null ptr exception discovered by Kiran. Now deals with the rare situation where you have only say Q28 bases at dbSNP sites, so you fail in the Table recalibration step with a null pointer error into the data structure indexed by quality score. If you are Q score above those seen before you aren't modified in any way.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1118 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 18:57:42 +00:00
ebanks
95e2ae0171
Deal with reads whose ends are aligned off the end of a chromosome.
...
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
jmaguire
65a788f18a
Added a ROD (SangerSNP) for parsing the Sanger's chr20 pilot1 SNP calls.
...
Some doodling around with indel calling in an EM context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1116 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:32:12 +00:00
aaron
d7d4298917
Some files to support generic genotype outputing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1112 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:43:41 +00:00
hanna
491ed70b44
TraverseByLocusWindow -- asstd bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1109 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:51:38 +00:00
depristo
5289230eb8
Version 0.2.1 (released) of the TableRecalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
hanna
ad3a3aa350
First pass at passing lists of files / lists of interval arguments work. Note that the interval
...
ROD system will throw up its hands and not deal with intervals at all if multiple interval files
are passed in (see JIRA GSA-95).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1105 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:44:23 +00:00
ebanks
83816fb801
Stop using the annoying refIterator (temp change until new traversal is green lighted)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1103 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:05:39 +00:00
ebanks
0d9041380d
remove printouts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1100 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:54:14 +00:00
hanna
102b38c055
Sketch of new version of TraverseByLocusWindow, and a flag to conditionally turn it on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1097 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:20:56 +00:00
aaron
5b1c23a7f2
changes to fix and test the interval based traversals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1095 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:54:15 +00:00
ebanks
347608cfe0
remove hacked traversal in preparation for move to Matt's new one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1091 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:32:05 +00:00
depristo
0a50f2e160
Updated and near final version of tabular recalibration system. Uses 'yates' correction for low-occupancy quality bins. Faster and more robust handling of input and output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1082 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 03:52:12 +00:00
hanna
ef546868bf
Pooling of unmapped reads -- improves runtime of files with tons of unmapped reads by an order of magnitude.
...
Desperately needs cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1080 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 23:48:06 +00:00
aaron
8b4d0412ca
Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1072 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:11:18 +00:00
aaron
bcb64d92e9
Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, seperating the creation of a genome loc (with the reference setup) to a parser class. GenomeLoc now just represents the actual genomic postion. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
depristo
26eb362f52
Added novel / known split to variant eval. That is, emits all of the standard analyses on SNP partitioned into those known in the provided known db and those novel. Also fixed problem with counting bases within subsets
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1068 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 21:27:40 +00:00
depristo
d3f0c51944
longer update times so we don't overwhelm when running genome-wide
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1067 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 14:10:02 +00:00
depristo
9e26550b0d
Apprach v2. Added python analysis script, so java no longer must be used to analyses quality score data. About to refactor out lots of unneeded code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1063 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-20 16:00:23 +00:00
hanna
dde52e33eb
Cleanup of the cleaned read injector based on Eric's feedback.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1062 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 22:04:47 +00:00
depristo
8ac40e8e2d
Updated version of the recalibration tool
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00
depristo
d748c85dc4
Cleaned code and reorganized -- moving in the right direction for v2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1052 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:28:34 +00:00
depristo
1bca144119
Moving things around
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1049 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:06:46 +00:00
depristo
3c40db260d
Added REFERENCE_BASES required annotation for performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1047 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:03:57 +00:00
kiran
7a921c908c
Can now adjust the genotype likelihoods of a variant returned from the rod. This automatically causes the lodBtr, lodBtnb, and genotype to be recomputed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1041 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:26:37 +00:00
kiran
c4d9058f32
Added module rodVariants.class to the list of allowable RODs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1037 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:33:13 +00:00
kiran
ab2a80f3ea
A new ROD type that allows one to input a geli.calls file back into a walker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1036 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:32:21 +00:00
hanna
cba9025983
More package-level documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1030 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 16:28:45 +00:00
hanna
43a28750e0
Package level documentation -- helps new users get acclimated to the codebase more quickly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1029 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 16:27:48 +00:00
aaron
6ee64c7e43
added changes to support alec toUnmappedRead seek. Huge improvements (orders of magnitude) in unmapped read performance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1021 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:15:56 +00:00
aaron
7db4497013
fixing the readTraversal output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1019 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 19:44:38 +00:00
ebanks
647b8a1ab0
Fix TabularROD printing and testing so Aaron stops nagging me.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1016 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 15:49:26 +00:00
aaron
a0a549557f
added a check of the sort ordering to the query methods, so that we detect if a file is unsorted much earlier. Also added some verbosity to the exception; it now contains an information about the raw attribute we saw for 'SO', the sort order of the bam file.
...
Also fixed a bunch of documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1015 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 22:15:03 +00:00
ebanks
11aa715630
added capability for filtering by platform
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1011 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 19:19:50 +00:00
ebanks
8f4bc8cb6e
Move filtering functionality into the PrintReadsWalker. More to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1010 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 16:38:08 +00:00
kiran
0583459839
Another formatting change to make Hapmap sites more clearly visible.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1004 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:53:21 +00:00
kiran
e9be2a9c60
Changed a formatting issue.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1002 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:40:32 +00:00
ebanks
032d0436e6
Added ROD for 1KG SNP calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@988 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 19:53:51 +00:00
ebanks
ffffe3b2f6
-Support for 1KG SNP calls in RODs
...
-Minor bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@987 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:56:37 +00:00
aaron
63b5c12cbd
Changed dataSources to datasources, to be consistant with the rest of our package names. Also, this makes me champion in the largest check-in contest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@985 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:13:22 +00:00
hanna
678ddd914f
Stopgap fixes GFF, DbSNP being half-open rather than half-closed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@980 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 21:38:57 +00:00
aaron
94b0e46d12
checked in a sample xml file used to store the defaults for the SomaticCoverage tool, and added it to the SomaticCoverage.jar in build.sml. Also added a inputStream marshalling method to the GATKArgumentCollection.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@979 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:46:16 +00:00
aaron
72a81f8f25
removed the requirement that a bam file list be present in the XML version of the command line arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@975 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:01:13 +00:00
aaron
f304803811
initial check-in of an easy way to create command line tools based on the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@970 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:34:02 +00:00
kiran
b0cc763eb5
Added some methods to format bases such that read bases on the forward strand are in uppercase, while those on the negative strand are lowercase. This does *not* affect the default functionality of the standard PileupWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@969 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:31:00 +00:00
hanna
ad80894afa
Bumped picard to latest svn version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@965 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:36:34 +00:00
aaron
ec2f015447
fixed a bunch of comments and license headers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@964 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:10:46 +00:00
hanna
dc6a9ca196
Pooling resources to lower memory consumption.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@962 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 13:39:32 +00:00
kiran
3adb4239e4
Same as regular Pileup, but also allows you to see flanking region around locus. This will be useful in determining that some SNPs are spurious due to being at the ends of homopolymer regions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@959 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:19:31 +00:00
depristo
7fa84ea157
10x speedup of recalibration walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 15:39:40 +00:00
aaron
a62bc6b05d
fixed some documentation and attached a correct license
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@953 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:44:27 +00:00
aaron
bf6190b471
cleaned up the PrintReadsWalker, and added a lot of documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@952 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:28:32 +00:00
kiran
fecba2cae5
Disabled option to show secondary quals as the definition has changed to conform to the spec and thus this printout is non-sensical.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@950 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 03:21:14 +00:00
aaron
a8a2d0eab9
added support for the -M option in traversals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@935 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 15:12:24 +00:00
asivache
9f35a5aa32
Insidious bug: clipped sequences (S cigar elements) where a) processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such sequence occured at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates interval reference stretch up to rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases, at the boundary of the interval the clipped bases would stick outside the passed reference stretch and index-out-of-bound exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@933 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 05:20:29 +00:00
depristo
98396732ba
Bug fixes for Andrey
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@930 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 18:19:51 +00:00
depristo
819862e04e
major restructuring of generalized variant analysis framework. Now trivally easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversional bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absense in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system autoatmically makese this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
aaron
199be46c36
changed the warning that is outputted when the GenomeLoc constructor can't find the given contig in the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@913 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:49:03 +00:00
aaron
b323c58ef2
add a place to store the walker return value, along with a method to retrieve it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@910 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 14:41:42 +00:00
ebanks
36fb6ca3c5
Allow user to specify the compression to be used when writing out BAM files.
...
Updated most of the walkers to reflect this change.
Now it won't take forever to write BAMs!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 08:48:34 +00:00
ebanks
45eeefbb80
Deal with randomly occurring unmapped reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@906 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 02:55:53 +00:00
aaron
109bef6c08
We're no longer in the read-dropping business.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@901 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:37:51 +00:00
depristo
13be846c2a
qualsAsInt argument for Pileup -- fixing stupid bug [again]
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@898 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:52:12 +00:00
depristo
97c8ff75dd
qualsAsInt argument for Pileup -- fixing stupid bug
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@897 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:51:17 +00:00
depristo
9de3e58aa8
qualsAsInt argument for Pileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@896 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:37:39 +00:00
asivache
bcc7bacba1
added List<Transcript> getTranscripts(); also more comments added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@894 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 16:25:14 +00:00