depristo
c5f6ab3dd5
CoverageHistogram now sees 0 coverage sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1266 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 20:58:41 +00:00
ebanks
8bc0832215
Generate chip concordance table.
...
This should work, although I need to test it with some real GLFs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1265 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 17:44:47 +00:00
ebanks
88ffb08af4
Need to return real values for some of the AllelicVariant methods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1264 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 02:31:10 +00:00
kcibul
e1055bcc4c
moving to new external repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1261 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 20:46:08 +00:00
kcibul
4a730adfc1
committing latest changes before moving repositories
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1260 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 20:44:02 +00:00
ebanks
692b1e206f
stop throwing an exception here: we don't always have allele counts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1259 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 20:34:01 +00:00
ebanks
a245ee32fa
A walker to split 2 call sets into their intersection/union/disjoint (sub)sets.
...
Yes, the name is retarded, but I'm under pressure here...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1258 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 20:20:47 +00:00
ebanks
ba349e8d52
add FLT ROD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1257 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 19:40:50 +00:00
ebanks
800f7e6360
make AllelicVariant extend ReferenceOrderedDatum (not Comparable) since ROD itself is Comparable. Then we can generalize RMD tags.
...
Blame Matt if this doesn't work - he said it wouldn't break anything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1256 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 19:25:06 +00:00
kcibul
00d49976fb
committing latest changes before moving repositories
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1255 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 18:41:52 +00:00
ebanks
5be5e1d45f
added conversion from iupac format and new rod to deal with FLT file format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1254 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 18:34:41 +00:00
aaron
d36e232ed3
adding GLF rods to the module list
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1252 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:42:34 +00:00
aaron
9ecb3e0015
adding GLFRods with tests and some other code changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1251 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:30:19 +00:00
hanna
c25f84a01c
Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why it's there.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1248 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:41:37 +00:00
depristo
1798aff01b
VariantEval now understands the difference between a population-level analysis and a genotype analysis, and handles both. All analyses annotated as supporting one or the other or both. Preparation for genotype chip concordance calculations as well as called sites, etc analyses
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1247 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:07:13 +00:00
ebanks
513d43b5f3
now implements AllelicVariant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1246 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:06:25 +00:00
ebanks
d369136bda
deprecate this ROD yet again
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1245 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 13:33:03 +00:00
ebanks
efcbb16688
un-deprecate this ROD and make it implement Genotype
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1240 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 19:45:41 +00:00
depristo
84d407ff3f
Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:53:27 +00:00
hanna
76b09a879b
Display a more intelligent error message if the user runs a locus traversal across an unmapped reads file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1238 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:36:09 +00:00
aaron
99ddd8ab15
bug fix for transitioning between chromosomes in GLF output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1237 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 17:58:04 +00:00
aaron
7d755a4c90
GenotypeLikelihoods doesn't emit metrics, they don't make sense
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1236 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 17:22:28 +00:00
aaron
01fc8da270
adding the GenotypeLikelihoodsWalker, which generates GLF genotype likelihoods that are pretty much identical to the samtools calls.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1235 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:57:18 +00:00
hanna
99f9cd84ed
Warning for possibly mismatched reads / reference was very aggressive. Relax
...
the criteria a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1234 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:21:22 +00:00
hanna
12b5d9c70c
The number of loci can easily overflow an int. Change reduce type to a Long.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1233 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:07:00 +00:00
depristo
5bf7647498
0.2.3 -- now preserves Q0 bases throughout the reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1232 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 12:27:31 +00:00
aaron
36819ed908
Initial changes to the SSG to output GLF by default
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1231 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 08:46:04 +00:00
hanna
0f6bfaaf73
Skip validation in case of no reads aligning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1230 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 02:03:36 +00:00
ebanks
a1d33f8791
-Added walker to dump strand test results to file
...
-Refactored strand filter to handle calls from the walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1229 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 01:56:50 +00:00
hanna
bfe90af5e2
Some quick and dirty fixes to support querying unmapped BAM files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1228 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 01:25:20 +00:00
aaron
e4152af387
added a big speed-up for interval list input processing. With large interval sets this was taking way too long...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1227 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 22:00:00 +00:00
hanna
9f0fb9f3aa
Fix for GSA-90: GATK banner and error messages should point to the wiki website.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1226 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 21:56:41 +00:00
hanna
b18caa2052
Fix for GSA-90: System isn't failing with an error when you use the wrong reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1225 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 20:42:12 +00:00
ebanks
52659d02d4
ignore unmapped reads in all the indel walkers (since they're giving me overhead issues)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1224 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 16:51:11 +00:00
hanna
5c321f9630
Oops! Accidentally deactivated the ArgumentFactory, needed by the CleanedReadInjector, while refactoring last night.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1223 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 16:41:55 +00:00
hanna
b61f9af4d7
Cleaning up, preparing to incorporate a better fix for Eric's problems with validation stringency in BAM files opened directly from the walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1222 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 01:42:13 +00:00
ebanks
4c02607297
genotyper also needs to have 454 reads filtered out
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1221 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 23:19:28 +00:00
ebanks
dea72c576e
use the filter to ignore 454 reads in the traversal to speed up cleaning
...
(since there's less area to actually clean against)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1220 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 18:34:44 +00:00
ebanks
0070b8ea6a
Until 454 goes far, far away, at least we can completely ignore it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1219 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 18:31:53 +00:00
asivache
1401606344
move warning about strictly adjacent intervals in a contig from 'remap' to 'read', so it is issued only once
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1218 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 17:58:11 +00:00
hanna
aa4f60d980
Make sure that only reads marked as 'mapped' are filtered based on validity of alignment.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1217 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 17:44:06 +00:00
asivache
e01d37024a
now updates mapping quality (to an arbitrarily chosen value of 37 if the resulting mapping is unique) and X0, X1 tags after remapping (in REDUCE mode)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1216 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 16:40:52 +00:00
asivache
b08b121756
synchronizing; debug statements commented out, so nothing changed really
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1215 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 16:38:33 +00:00
asivache
a1eb128377
few more detailed debug printouts conditioned on if (DEBUG), so no real changes...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1214 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 16:36:57 +00:00
hanna
03e1713988
Better support for specifying read filters to apply directly from the walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1212 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 23:59:53 +00:00
aaron
ce08f5f0c3
Removed some unused variables, fixed some javadoc. The usual.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1211 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 22:10:22 +00:00
aaron
9cfd89c54f
a small refactoring, and some documentation cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1210 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 22:03:45 +00:00
aaron
d86717db93
Refactoring of the traversal engine base class, I removed a lot of old code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1209 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 21:57:00 +00:00
ebanks
3519323156
Output the correct geli text format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1208 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 19:45:18 +00:00
ebanks
99631cdaa1
fix and then deprecate the rodGELI class (GELIs suck)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1207 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 19:18:13 +00:00
hanna
60a86fb34a
Better handling of fasta files with non-standard extensions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1206 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 18:18:48 +00:00
hanna
5e26770634
Hack the MicroScheduler to be tolerant of RefWalkers. We need to implement a longer-term solution to make it easier for datasources to report problems they've encountered along the way (GSA-103).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1205 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 17:26:59 +00:00
kcibul
bc44e08225
refactored output logic
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1204 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 16:13:01 +00:00
ebanks
3fe7104963
Added walker to filter out clustered SNPs from a call set
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1203 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 03:16:27 +00:00
aaron
8ee5c7de8e
GLF reader and writer check in.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1202 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 23:06:37 +00:00
andrewk
c8fcecbc6f
Added ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1201 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 22:04:26 +00:00
hanna
3f0304de5a
Get rid of unused iterator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1200 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:39:16 +00:00
hanna
da4d26b1ea
Enum support for command-line argument system, and some cleanup for hacks to the CleanedReadInjector that were required because Enum support was missing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1199 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:26:16 +00:00
ebanks
aacec3aeb0
rod for binary GELI files (still needs to be tested)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1198 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:25:56 +00:00
aaron
e106cf73d8
A quick change to provide more verbose output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1197 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 19:08:19 +00:00
hanna
433ad1f060
Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1196 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 18:49:08 +00:00
jmaguire
0a67386525
.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1195 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:59:36 +00:00
hanna
d8fbb2b62c
Refactoring; make a better home for the MalformedReadFilteringIterator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1194 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:54:20 +00:00
kiran
c78a72e775
Applies Fisher's Exact Test to determine whether there's a strand bias and, if so, filters the call out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1193 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:14:11 +00:00
kiran
b211f500a3
Applies secondary base feature to variants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1192 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:13:29 +00:00
kiran
6e31057e6b
Some changes involving output of marginal calls to different, per-filter files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1191 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:12:57 +00:00
ebanks
787c84d68b
only compare pair position for paired end reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1190 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 04:07:08 +00:00
andrewk
d3daecfc4d
Added unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, update indices in CoverageEval.py, and made a lot of changes to CoverageWalker biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1189 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 02:05:40 +00:00
hanna
4ba2194b5e
Filter reads whose alignment starts past the end of the contig to which it allegedly aligns.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1188 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 22:27:44 +00:00
jmaguire
1db15ee468
made some things protected so that I can inherit them in MultiSampleCallerAccuracyTest
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1185 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 15:50:28 +00:00
jmaguire
1fa71aa31d
Now outputs stats. Doesn't do the downsampling thing because I think I'll have enough counts.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1184 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 15:29:31 +00:00
hanna
5d7393d7cb
Temporary fix for Eric's problems with SOLiD reads: make sure the command-line argument system takes the --validation-strictness command-line argument into account when creating SAMFileReaders.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1183 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 15:18:05 +00:00
aaron
033bafe7a1
fixed sam by reads test for the new filtering code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1180 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 05:45:50 +00:00
aaron
2a86f2f833
an initial pass at the GLF reader, and some other genotype changes to phase out the LikelihoodObject I created.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1179 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 04:30:27 +00:00
hanna
5735c87581
Basic infrastructure for filtering malformed reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1178 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:50:22 +00:00
depristo
b9d533042e
Two-tailed HardyWeinberg test implemented. VariantEval now separates violations from summary outputs for clarity; Fixing problems with CovariateCounterTest and TabularRodTest
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1177 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:02:04 +00:00
hanna
31313481f6
Temporary patch to filter out bad alignments that aren't quite fully reported as bad.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1176 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 18:41:55 +00:00
mmelgar
6580211c2a
First version of depth of coverage filter. Right now it takes in a maximum coverage threshold given by the user.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1175 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 18:22:46 +00:00
ebanks
fac7ac5142
Don't print out 0 coverage (which is always 0)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1174 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 17:44:32 +00:00
hanna
d19366eaad
Cleanup emergency fixes for out-of-bounds issues in reference retrieval. Fix spelling mistakes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1173 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 15:41:30 +00:00
kcibul
000d92a545
added gc calculation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1172 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 13:07:04 +00:00
ebanks
338cdbebad
deal with screwy solid reads in the cleaner (no cigar strings)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1171 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 16:49:58 +00:00
jmaguire
8bcbf7f18a
First draft of multi sample caller accuracy test.
...
Doesn't do its job yet but the pieces are in place.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1170 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 16:29:13 +00:00
jmaguire
4019cd2bd7
Added ROD for parsing hapmap3 genotype files.
...
Tweak to TabularROD to allow HapMapGenotypeROD to work.
Added HapMapGenotypeROD to list of RODs in ReferenceOrderedData.java.
Modified MultiSampleCaller to return a single object with most of the relevant information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1169 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 16:28:24 +00:00
ebanks
e5e249d4ac
temporary fix to deal with screwy SOLiD reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1168 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 03:25:57 +00:00
depristo
cf1854b339
Fix for monstrous problems with solid data -- now can dynamically expand recalibration tables on the fly as reads declare additional read groups -- use assumeFaultyHeader flag
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1167 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 17:15:49 +00:00
depristo
bcda66d2db
Simple performance improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1166 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 16:45:23 +00:00
hanna
0d00823332
Fix for performance bug in extending the read with X's in cases where the read is aligned off the end of the contig.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1165 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 16:17:38 +00:00
kcibul
be2f8478c0
added suppression of failure messages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1164 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 15:19:37 +00:00
kcibul
25c30b12bb
added MAF-style output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1163 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 15:10:19 +00:00
andrewk
dcb8892568
Lot of code for coverage evaluation tools including first version of python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement) which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1162 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 08:07:02 +00:00
asivache
d603145cb0
Meaning of input arguments has CHANGED: minFraction is now a minimum fraction of CONSENSUS indel observation, out of all reads covering the site, required to make the call. minConsensusFraction is still the minimum fraction of CONSENSUS indel observation out of all indel observations at the site
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1160 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 20:38:10 +00:00
hanna
62807139fc
Cleanup pileup and depth of coverage in preparation for release. Add pileup, depth of coverage, and print reads to package for distribution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1159 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:54:01 +00:00
kcibul
6a25f0b9c5
refactored into new package
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1158 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:37:54 +00:00
aaron
1c83b4d949
forgot to take out some test code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1157 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:18:37 +00:00
aaron
bc17ff567a
When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions. Now with a test case!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1156 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:15:50 +00:00
depristo
47cb9f169e
Stable tool that's the reverse of merging -- splits a file into individual BAM files, one for each sample ID in the SAM header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1155 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 12:56:46 +00:00
depristo
6684cb8bc9
copySamFileHeader() utility function
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1154 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 12:55:51 +00:00
aaron
bb92eb8b1c
added a fix for overlapping reads in the locus context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1153 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 02:08:59 +00:00
aaron
d4d3af20f2
made a fake fasta generator, so we can now generate a complete bam / fasta combo of made up data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1150 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 21:35:34 +00:00
asivache
c2e5a68aaf
output format changed in --verbose --somatic mode: now also prints the <#reads with indels>/<coverage> for normal samples, rather than only for the tumor; also, code cleaned up a little
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1149 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:56:16 +00:00
andrewk
4cbf069de1
First version of coverage evaluation tool
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1148 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:52:25 +00:00
asivache
7462f3f344
Bug in setContig() fixed: sequence dictionary's .getSequences().contains() and .getSequences().indexOf() do NOT work when applied to contig names (Strings), since getSequences() returns a list of SAMSequenceRecord's; changed to querying the dictionary directly for specified contig name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1147 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:50:09 +00:00
ebanks
76fd4b3848
deal with different contigs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1146 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 19:17:27 +00:00
ebanks
20fab507a8
Choose the REF if it scores equal to consensus!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1145 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:54:27 +00:00
hanna
9b182e3063
Prep for documenting command-line arguments: delete some arguments that don't make sense any more given
...
the state of the traversals and GATK input requirements: all_loci (replaced by walker annotation), max
OTF sorts (bam files must be sorted and indexed), threaded io (replaced by data sharding framework).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1144 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:23:35 +00:00
ebanks
5a5103cfd2
Heads up, everyone: command-line args no longer need to be public.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1143 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 16:09:22 +00:00
hanna
b43d4d909e
Fix CleanedReadInjectorTest to work with new CleanedReadInjector.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1142 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 15:48:06 +00:00
aaron
d58eeb7539
Don't cry wolf: only one warning is now emitted, instead of tons of warnings.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1139 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:50:37 +00:00
hanna
a3e0ec20c4
Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1138 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:46:35 +00:00
hanna
93da64db10
Update naming for consistency.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1136 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 22:03:21 +00:00
hanna
e93f751bd7
First step in replacing the Hello, World! document. Revamped the HelloWalker and checked it into the source tree, created a special build file for it, and added it to the packaging tool.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1135 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 21:59:54 +00:00
ebanks
8d3dc57c3d
Commit to emit in sorted order so we don't have to use /tmp
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1133 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:47:15 +00:00
aaron
f5cba5a6bb
Fixed genome loc to be immutable; the only way to change its values now is through the GenomeLocParser.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
asivache
177d6d00b8
added setContigIndex(). NOTE: both setContig() and setContigIndex are UNSAFE as one does not automatically involve updating the other, and there's also no validation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1130 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 17:40:37 +00:00
depristo
9fca79ed62
Read groups are now sorted in the output data, for convenience
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1129 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:50:44 +00:00
ebanks
08df4771c8
count X/N/etc. as mismatches for the NM attribute in the BAMs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1127 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:08:55 +00:00
kiran
d412c5dc2f
Updated to use SecondaryBaseAnnotator class.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1126 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:08:43 +00:00
kiran
e3cdf7ef4b
A single class that can be handed reads for training and basecalling. When in training mode, we accumulate no more than 10000 reads and always replace the lowest-quality reads with superior quality reads. Thus, the training set always contains 10000 of the best reads available. After training is complete, the class can be interrogated to return the SQ tag for a given RawRead object.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1125 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:03:15 +00:00
ebanks
8aa3b65e7f
fix to guarantee emission in sorted order
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1122 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 13:48:41 +00:00
aaron
03f8177a53
When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1121 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 20:51:55 +00:00
jmaguire
a17bf145f6
fix to respond to the change in IndelLikelihood constructor.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1119 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 19:05:33 +00:00
depristo
7ecc43e9a7
Fixed subtle null ptr exception discovered by Kiran. Now deals with the rare situation where you have only, say, Q28 bases at dbSNP sites, so you fail in the Table recalibration step with a null pointer error into the data structure indexed by quality score. Bases with a Q score above those seen before aren't modified in any way.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1118 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 18:57:42 +00:00
ebanks
95e2ae0171
Deal with reads whose ends are aligned off the end of a chromosome.
...
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
jmaguire
65a788f18a
Added a ROD (SangerSNP) for parsing Sanger's chr20 pilot1 SNP calls.
...
Some doodling around with indel calling in an EM context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1116 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:32:12 +00:00
asivache
ceeeec13b8
Computes a vector of read counts for successive intervals of a specified length (e.g. the number of reads per 1 Mbase)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1115 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:12:21 +00:00
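Note on ceeeec13b8: the per-interval counting described in that message can be sketched as a simple binning pass. This Python example is illustrative only; the function name `reads_per_interval`, the use of 0-based read start positions to assign a read to an interval, and the choice to skip out-of-range positions are assumptions, since the commit message does not say how reads are assigned.

```python
def reads_per_interval(read_starts, contig_length, interval_length=1_000_000):
    """Count reads in successive fixed-length intervals along one contig.

    `read_starts` are 0-based start positions (an assumption of this sketch).
    Returns one count per interval; the last interval may be shorter.
    """
    if interval_length <= 0:
        raise ValueError("interval_length must be positive")
    n_bins = -(-contig_length // interval_length)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < contig_length:  # ignore positions outside the contig
            counts[pos // interval_length] += 1
    return counts
```

For example, with a 25-base contig and 10-base intervals, reads starting at 0, 5, 10, 19 and 24 fall into three bins.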
ebanks
eb74b16e39
updated what constitutes removing entropy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1113 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 18:29:00 +00:00
aaron
d7d4298917
Some files to support generic genotype outputting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1112 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:43:41 +00:00
asivache
1a97c86f95
don't crash when an unmapped read is encountered; just write it into the output file, which should be ok
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1111 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:33:59 +00:00
hanna
491ed70b44
TraverseByLocusWindow -- assorted bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1109 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:51:38 +00:00
depristo
5289230eb8
Version 0.2.1 (released) of the TableRecalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
asivache
73caf5db15
This is, strictly speaking, NOT a GATK module. It is a standalone, Picard-level executable, except that it uses a couple of GATK utils (GenomeLoc). Remaps alignments from a custom reference (such as a transcriptome, hyb-sel, etc.) onto the 'master' reference
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1107 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:04:18 +00:00
kiran
ee2af3b423
I committed this too soon... reverting...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1106 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:49:12 +00:00
hanna
ad3a3aa350
First pass at passing lists of files / lists of interval arguments work. Note that the interval
...
ROD system will throw up its hands and not deal with intervals at all if multiple interval files
are passed in (see JIRA GSA-95).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1105 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:44:23 +00:00
kiran
23680a9a16
Replaced an expensive sort with an inexpensive direct computation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1104 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:25:12 +00:00
ebanks
83816fb801
Stop using the annoying refIterator (temp change until new traversal is green lighted)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1103 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:05:39 +00:00
aaron
0c3aabd1c5
logger output should be less verbose by default. Also fixed a printout in my read validation walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1102 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:47:29 +00:00
kcibul
11d83ac7d0
pushing up to test on unix box
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1101 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:00:48 +00:00
ebanks
0d9041380d
remove printouts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1100 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:54:14 +00:00
jmaguire
2c97c5e873
Compute a simple histogram of depth of coverage.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1098 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:30:11 +00:00
hanna
102b38c055
Sketch of new version of TraverseByLocusWindow, and a flag to conditionally turn it on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1097 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:20:56 +00:00
aaron
4e04370f14
forgot a file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1096 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:56:17 +00:00
aaron
5b1c23a7f2
changes to fix and test the interval based traversals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1095 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:54:15 +00:00
kcibul
3b24264c2b
incorporating skew check, further output of metrics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1094 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 16:01:07 +00:00
ebanks
ea2426dcd0
one more change needed to commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1093 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 15:09:53 +00:00
ebanks
347608cfe0
remove hacked traversal in preparation for move to Matt's new one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1091 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:32:05 +00:00
ebanks
940d75171a
Big cleaner changes:
...
1. Added a Walker to merge intervals before cleaning
2. (Almost) all Walkers can filter out 454 reads (and do by default)
3. Got rid of -all command and related pieces (time to switch to CleanedReadsInjector)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1090 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:31:24 +00:00
asivache
3cb6d7048e
don't freak out if two reference intervals that a custom contig is built from are strictly adjacent; instead, politely warn the user that her data suck and proceed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1089 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 19:08:10 +00:00
asivache
d4f3ca1a10
A utility class for keeping the mapping from 'custom' reference (e.g. transcriptome) onto the 'master' reference (e.g. whole genome), and for remapping SAM records from the former onto the latter. It's Arachne's BaitMultiMap, pretty much
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1088 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 18:16:15 +00:00
kiran
69dc502174
I forgot that this depends on BoundedScoringSet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1087 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 17:18:53 +00:00