ebanks
95e2ae0171
Deal with reads whose ends are aligned off the end of a chromosome.
...
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
jmaguire
65a788f18a
Added a ROD (SangerSNP) for parsing the Sanger's chr20 pilot1 SNP calls.
...
Some doodling around with indel calling in an EM context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1116 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:32:12 +00:00
asivache
ceeeec13b8
Computes a vector of numbers of reads falling into successive intervals of specified length (e.g. numbers of reads per every 1Mbase)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1115 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:12:21 +00:00
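A minimal sketch of the per-interval read counting described in the commit above, assuming reads are reduced to 1-based alignment start positions on a single contig. The class name IntervalReadCounter and the method countPerBin are hypothetical and are not taken from the GATK source.

```java
import java.util.Arrays;

/** Hypothetical sketch: counts read starts falling into successive fixed-length intervals. */
final class IntervalReadCounter {

    /**
     * @param readStarts   1-based alignment start positions on one contig (assumed input form)
     * @param contigLength contig length in bases
     * @param binSize      length of each successive interval, e.g. 1_000_000 for 1 Mbase
     * @return one count per interval, in order along the contig
     */
    static int[] countPerBin(long[] readStarts, long contigLength, long binSize) {
        if (contigLength <= 0 || binSize <= 0) {
            throw new IllegalArgumentException("contigLength and binSize must be positive");
        }
        // Round up so that a trailing partial interval still gets its own bin.
        int numBins = (int) ((contigLength + binSize - 1) / binSize);
        int[] counts = new int[numBins];
        for (long start : readStarts) {
            if (start < 1 || start > contigLength) {
                continue; // ignore reads that start outside the contig
            }
            counts[(int) ((start - 1) / binSize)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] starts = {1, 999_999, 1_000_000, 1_000_001, 2_500_000};
        // prints [3, 1, 1]
        System.out.println(Arrays.toString(countPerBin(starts, 3_000_000L, 1_000_000L)));
    }
}
```

Binning by start position is only one possible convention; a read that spans an interval boundary is counted once, in the interval where it starts.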
ebanks
eb74b16e39
updated what constitutes removing entropy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1113 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 18:29:00 +00:00
asivache
1a97c86f95
don't crash when an unmapped read is encountered, just write it into the output file, it should be ok
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1111 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:33:59 +00:00
depristo
5289230eb8
Version 0.2.1 (released) of the TableRecalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
asivache
73caf5db15
This is, strictly speaking, NOT a GATK module. Standalone, picard-level executable except that it uses a couple of gatk utils (GenomeLoc). Remaps alignments from custom reference (such as transcriptome, hyb-sel etc) onto the 'master' reference
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1107 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:04:18 +00:00
hanna
ad3a3aa350
First pass at making lists of files / lists of interval arguments work. Note that the interval ROD system will throw up its hands and not deal with intervals at all if multiple interval files are passed in (see JIRA GSA-95).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1105 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:44:23 +00:00
aaron
0c3aabd1c5
logger output should be less verbose by default. Also fixed a printout in my read validation walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1102 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:47:29 +00:00
kcibul
11d83ac7d0
pushing up to test on unix box
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1101 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:00:48 +00:00
ebanks
0d9041380d
remove printouts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1100 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:54:14 +00:00
jmaguire
2c97c5e873
Compute a simple histogram of depth of coverage.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1098 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:30:11 +00:00
kcibul
3b24264c2b
incorporating skew check, further output of metrics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1094 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 16:01:07 +00:00
ebanks
940d75171a
Big cleaner changes:
...
1. Added a Walker to merge intervals before cleaning
2. (Almost) all Walkers can filter out 454 reads (and do by default)
3. Got rid of -all command and related pieces (time to switch to CleanedReadsInjector)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1090 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:31:24 +00:00
asivache
3cb6d7048e
don't freak out if two reference intervals a custom contig is built of are strictly adjacent; instead politely warn user that her data suck and proceed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1089 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 19:08:10 +00:00
asivache
d4f3ca1a10
A utility class for keeping the mapping from 'custom' reference (e.g. transcriptome) onto the 'master' reference (e.g. whole genome), and for remapping SAM records from the former onto the latter. It's Arachne's BaitMultiMap, pretty much
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1088 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 18:16:15 +00:00
kiran
69dc502174
I forgot that this depends on BoundedScoringSet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1087 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 17:18:53 +00:00
asivache
a9c30c5fcc
added -nosort cmdline flag; if specified, the output writer does not attempt to sort reads on the fly (sorting involves use of sorting collection backed up by temporary disk storage and can lead to crashes if temp size is low and/or filesystem is not behaving). Output can be later sorted externally by samtools
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1085 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:58:00 +00:00
kiran
3112302ec9
A priority-queue-like container that allows you to add a specified number of elements. When the limit has been reached, new additions replace the lower scoring elements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1083 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:39:47 +00:00
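The bounded container described in the commit above can be sketched with a min-heap, under the assumption that "lower scoring" is defined by a caller-supplied comparator. BoundedTopScores and its methods are hypothetical names for illustration; this is not the BoundedScoringSet class from the repository.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: keeps at most `capacity` elements, evicting the lowest-scoring one. */
final class BoundedTopScores<E> {
    private final int capacity;
    private final Comparator<? super E> byScore;
    private final PriorityQueue<E> heap; // min-heap: lowest-scoring element sits on top

    BoundedTopScores(int capacity, Comparator<? super E> byScore) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.byScore = byScore;
        this.heap = new PriorityQueue<>(capacity, byScore);
    }

    /** Adds the element if there is room or if it outscores the current minimum. */
    boolean add(E element) {
        if (heap.size() < capacity) {
            heap.add(element);
            return true;
        }
        if (byScore.compare(element, heap.peek()) > 0) {
            heap.poll();       // drop the lowest-scoring element
            heap.add(element); // and keep the new one instead
            return true;
        }
        return false; // new element scores no higher than anything retained
    }

    /** Returns the retained elements, highest score first. */
    List<E> toSortedList() {
        List<E> out = new ArrayList<>(heap);
        out.sort(byScore);
        Collections.reverse(out);
        return out;
    }

    public static void main(String[] args) {
        BoundedTopScores<Integer> top = new BoundedTopScores<>(3, Comparator.<Integer>naturalOrder());
        for (int score : new int[]{5, 1, 9, 3, 7}) {
            top.add(score);
        }
        System.out.println(top.toSortedList()); // prints [9, 7, 5]
    }
}
```

With a heap, each addition costs O(log n) in the capacity, and the element to replace is always available at the top.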
asivache
dfa2efbcf5
not crashing when refseq annotation track is not requested is a nice added feature
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1079 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 22:52:40 +00:00
kcibul
eb999f880a
incorporating skew check
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1078 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 19:51:51 +00:00
asivache
1339f3f3e3
make refseq annotation file an optional argument; if specified, indels will be annotated as genomic/utr/intron/coding (accidentally appearing 'unknowns' probably mean that there's something wrong with refseq annotations?)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1077 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 18:17:03 +00:00
aaron
9c0dba6979
Some quick documentation and typo changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1076 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 13:40:13 +00:00
ebanks
cb9c6f18ef
spelling fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1074 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 01:46:35 +00:00
kiran
630d9e6a37
Fixed a typo.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1073 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:37:46 +00:00
aaron
8b4d0412ca
Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1072 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:11:18 +00:00
ebanks
9e25229014
use better entropy threshold and don't print out "new" SNPs (since they're just an artifact of the low (arbitrary) threshold)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1070 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 15:30:08 +00:00
aaron
bcb64d92e9
Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a genome loc (with the reference setup) into a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
depristo
26eb362f52
Added novel / known split to variant eval. That is, emits all of the standard analyses on SNP partitioned into those known in the provided known db and those novel. Also fixed problem with counting bases within subsets
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1068 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 21:27:40 +00:00
ebanks
a21c2a7e48
don't make mapping quality too high
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1066 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 04:51:42 +00:00
ebanks
686c8133ed
massive change in the way the cleaner works, mostly revolving around the fact that we no longer trust indels from the alignments (although we do use them as a good alternate consensus possibility).
Other changes include better "greedy mode" performance and allowing the user to have just the cleaned reads themselves be printed out (mostly for Matt's CleanedReadInjector).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1065 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 03:56:59 +00:00
hanna
dde52e33eb
Cleanup of the cleaned read injector based on Eric's feedback.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1062 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 22:04:47 +00:00
kiran
a0a3cf2f9f
VariantFiltrationWalker can now apply specified exclusion tests after the feature tests. For a given variant, all reasons for exclusions are printed to screen.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1061 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 21:12:01 +00:00
jmaguire
58b132ee10
Eliminate redundant computation.
...
Still room for more optimization, but I called chr20 (60Mb) in a couple hours on the queue this morning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1058 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:31:57 +00:00
jmaguire
3a1b58ca65
remove unused argument lodThreshold.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1057 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 12:40:12 +00:00
kiran
9a0151b7e1
Added an option to list all available feature classes and exit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1056 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 00:00:12 +00:00
kiran
ed7afd8b70
Added javadocs. Now throws an exception if an unknown feature is specified. General cleanup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1055 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 23:28:38 +00:00
kiran
284fd6a5fb
VariantFiltrationWalker now inspects its parent package and determines the list of features that can be applied. Command-line specification of filters to run look at the simple names of these features and do a case-insensitive match to determine which features to apply. A new verbose mode allows the user to see how the likelihoods are changing with the application of each subsequent feature.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1054 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:45:36 +00:00
hanna
af7a759ba4
Convert the somatic coverage tool to output from the packaging tool rather than from the dist target.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1050 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:29:30 +00:00
depristo
1bca144119
Moving things around
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1049 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:06:46 +00:00
depristo
ca8a3bd85e
Another temp checkin for rearranging things
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1048 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:04:36 +00:00
kiran
a4fa02f11c
Moved output outside of for loop so I don't have 10 different versions of the same variant (though, now that I think of it, that's not necessarily a terrible thing for debugging...)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1045 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 19:59:26 +00:00
kiran
768a16e791
An experimental, tile-parallel version of the secondary base annotator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1044 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 19:58:09 +00:00
kiran
e26df45e8e
Different features can now be specified by repeatedly supplying the -F "featurename:arguments" option.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1043 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 18:45:03 +00:00
kiran
7a921c908c
Can now adjust the genotype likelihoods of a variant returned from the rod. This automatically causes the lodBtr, lodBtnb, and genotype to be recomputed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1041 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:26:37 +00:00
kiran
9a7cec7d2e
Directory to house variant calling and filtration tools.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1040 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:20:38 +00:00
jmaguire
5992d88409
skip N's in the reference (rather than crash. doh!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1039 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 23:22:35 +00:00
kiran
9ef391706c
Added outputting of genotype posteriors to geli.calls file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1035 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:31:46 +00:00
kcibul
615572ea06
output to out... not System.out...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1034 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 20:43:10 +00:00
kcibul
673205ed5f
additional output tweaking
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1028 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 15:37:38 +00:00
depristo
7d281296a7
Finishing checking for building
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1027 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 14:12:40 +00:00
depristo
d1e25bfe88
Intermediate checkin for safety -- now compiles
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1026 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:16:55 +00:00
depristo
2250769a42
Intermediate checkin for safety -- do not use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1025 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:07:19 +00:00
depristo
86c8c08375
Intermediate checkin for safety -- do not use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1024 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:06:24 +00:00
aaron
6ee64c7e43
added changes to support alec toUnmappedRead seek. Huge improvements (orders of magnitude) in unmapped read performance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1021 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:15:56 +00:00
jmaguire
4f6d26849f
Behold MultiSampleCaller!
...
Complete re-write of PoolCaller algorithm, now basically beta quality code.
Improvements over PoolCaller include:
- more correct strand test
- fractional counts from genotypes (which means no individual lod threshold needed)
- significantly cleaner code; first beta-quality code I've written since BaitDesigner so long ago.
- faster, less likely to crash!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1020 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 20:03:24 +00:00
aaron
b11c5a7cd5
doing some read validation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1018 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 19:25:43 +00:00
asivache
010304fe44
bug: printing incorrect coordinates into output, finally fixed (?)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1017 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 18:08:56 +00:00
asivache
2259dc3a8f
added filtering out indels with large levels of noise (mismatches) remaining in the close proximity; also a bug in recording deletion coordinates is fixed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1014 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 21:13:28 +00:00
ebanks
a6477df6d1
Now optionally outputs whether "SNPs" are maintained/cleaned out/introduced by cleaning
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1013 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 20:02:02 +00:00
ebanks
8f4bc8cb6e
Move filtering functionality into the PrintReadsWalker. More to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1010 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 16:38:08 +00:00
kiran
161c74716c
Forgot to change some direct references to variables in SSG. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1009 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 14:16:18 +00:00
kiran
9eeb5f79d4
Various refactoring to achieve hapmap and dbsnp awareness, the ability to set pop-gen and secondary base priors from the command-line, and general code cleanup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1008 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 07:21:08 +00:00
kiran
f2946fa3e8
Various refactoring to achieve hapmap and dbsnp awareness, the ability to set pop-gen and secondary base priors from the command-line, and general code cleanup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1007 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 07:20:22 +00:00
ebanks
f6af190b74
ignore clipped reads for realigning indel positions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1006 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 01:01:27 +00:00
asivache
811f560efb
add refseq annotations to single sample calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1003 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:43:30 +00:00
asivache
ca09a10b76
refseq annotation rod is now manually bound to tell coding indels from non-coding ones
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1001 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:27:37 +00:00
hanna
5859948e80
Fixed bugs in CleanedReadInjector arising from integration testing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@999 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 17:37:33 +00:00
depristo
fb7ba47fff
Now does real neighbor distance calculation, as well as true snp cluster counting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@998 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 16:29:26 +00:00
jmaguire
dbf2cc037c
don't have a null-pointer hissy fit when the reference is N.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@997 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 13:59:16 +00:00
asivache
4eda040e0f
what used to be internal cutoff values are now exposed as cmdline parameters: minCoverage, minNormalCoverage, minFraction, minConsensusFraction
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@995 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:22:52 +00:00
kiran
41687d5237
Added accessors for the prior probabilities.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@994 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:16:10 +00:00
kiran
12dd18cdba
Now aware of Hapmap and dbSNP sites. We *can* change the priors there, but we don't yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@993 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:15:34 +00:00
asivache
d5cd883b99
bug fixed when a read with alignment end exactly at the window boundary and with last cigar element being an indel would cause index-out-of-bounds exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@992 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:03:15 +00:00
kiran
a12009e9e7
Added a new constructor in which priors for hom-ref, het, and hom-var can be specified. Otherwise, it uses the default values of 0.999, 1e-3, and 1e-5 respectively.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@991 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:33:45 +00:00
kiran
909fefa40a
Argumentized priors for hom-ref, het, and hom-var.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@990 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:32:44 +00:00
hanna
71e3825fa1
First pass of a walker for Eric that searches through an input BAM file for unclean reads, injecting the cleaned reads in their place and outputting the composite result.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@989 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:18:13 +00:00
ebanks
ffffe3b2f6
-Support for 1KG SNP calls in RODs
...
-Minor bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@987 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:56:37 +00:00
ebanks
599ceeddd8
Better method for downsampling deep regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@983 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 16:57:40 +00:00
ebanks
4d9a88153a
Update inferred insert size of cleaned reads when they are paired
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@982 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 16:29:13 +00:00
ebanks
3796654069
Added walker to emit intervals of clustered SNP calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@981 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 00:57:14 +00:00
aaron
94b0e46d12
checked in a sample xml file used to store the defaults for the SomaticCoverage tool, and added it to the SomaticCoverage.jar in build.sml. Also added an inputStream marshalling method to the GATKArgumentCollection.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@979 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:46:16 +00:00
asivache
8d25f1a105
should be a little faster
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@978 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:33:45 +00:00
aaron
026f68fb41
a couple of quick name changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@976 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:02:52 +00:00
ebanks
b1f90635c1
1. downsample when there are too many mismatching reads (needs perfecting)
...
2. allow user to specify that no reads be emitted
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@974 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 19:55:42 +00:00
asivache
39dcd4f11f
an attempt to bail out when unmapped reads are reached at the end of the file(s). still testing...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@973 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 19:53:50 +00:00
asivache
030efc468f
added naive ad-hoc cutoff for the pile size the cleaner will attempt to process; use --maxPileSize argument to force any pile larger than specified cutoff to be directly written to the output without cleaning
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@972 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:52:35 +00:00
ebanks
f9be175f44
Be smart about trying alternate consensuses:
...
try prior indels first and only 1 instance of them
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@971 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:43:22 +00:00
aaron
f304803811
initial check-in of an easy way to create command line tools based on the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@970 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:34:02 +00:00
depristo
9ebcd6546d
Convenience printing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@968 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:07:38 +00:00
asivache
06e5a765f8
now has two modes: one sample - just call indel sites; two samples - call somatic-looking variants only. Still uses heuristic count-based cutoffs, cutoffs are hardcoded and are pretty conservative...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@967 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 16:41:38 +00:00
ebanks
5451bbfd5a
-move final vars to command-line args
...
-Per Andrey: ignore indels from aligner when testing against alt consensus
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@966 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 16:39:00 +00:00
kiran
6bb7f7e9d8
Commented some stuff out so that things compile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@963 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:06:33 +00:00
kiran
87ba8b3451
Removed some useless code. Don't apply second-base test if the coverage is too high, since the binomial probs explode and return NaN or Infinite values.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@961 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:27:06 +00:00
kiran
a12ed404ce
Changed method name from applyFourBaseDistributionPrior to applySecondBaseDistributionPrior. 'Cause that's how I roll.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@960 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:21:22 +00:00
hanna
e77dfe9983
Allow script to be easily modified to support different platforms.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@955 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 16:06:57 +00:00
depristo
7fa84ea157
10x speedup of recalibration walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 15:39:40 +00:00
ebanks
b45b1d5f2b
border case bug fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@951 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 04:33:15 +00:00
asivache
13eb868536
helper class. array-like random access and fast shift. good for sliding windows (e.g. keeping coverage over last 100 bases while sliding along the reference)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@942 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:11:57 +00:00
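One way to get array-like random access together with a cheap shift, as the commit above describes, is a circular buffer whose logical start moves instead of its contents. The sketch below is written under that assumption; SlidingIntWindow and its methods are hypothetical names, not the helper class from the repository.

```java
import java.util.Arrays;

/** Hypothetical sketch: fixed-size int window with random access and O(n) shift by n. */
final class SlidingIntWindow {
    private final int[] data;
    private int head; // physical index that holds logical position 0

    SlidingIntWindow(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.data = new int[size];
    }

    int size() {
        return data.length;
    }

    int get(int i) {
        return data[physical(i)];
    }

    void set(int i, int value) {
        data[physical(i)] = value;
    }

    /** Slides the window right by n: the first n positions fall off, n zeroed positions appear at the end. */
    void shift(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (n >= data.length) {
            Arrays.fill(data, 0); // everything fell off the window
            head = 0;
            return;
        }
        for (int k = 0; k < n; k++) {
            data[(head + k) % data.length] = 0; // clear the slots being recycled
        }
        head = (head + n) % data.length;
    }

    private int physical(int i) {
        if (i < 0 || i >= data.length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return (head + i) % data.length;
    }

    public static void main(String[] args) {
        SlidingIntWindow coverage = new SlidingIntWindow(4);
        for (int i = 0; i < 4; i++) {
            coverage.set(i, (i + 1) * 10); // 10, 20, 30, 40
        }
        coverage.shift(2);
        // prints 30 40 0 0
        System.out.println(coverage.get(0) + " " + coverage.get(1) + " " + coverage.get(2) + " " + coverage.get(3));
    }
}
```

Shifting by n touches only the n recycled slots, so sliding one base at a time along the reference costs constant time per step regardless of window size.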
asivache
3d6e738a60
still under development. does not genotype yet, but walks and talks (counts overall coverage and indel variant occurrences at every reference position)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@941 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:10:31 +00:00
ebanks
58f7ae8628
better filtering, plus deal with case where user doesn't input maxlength
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@939 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 18:44:29 +00:00
asivache
b4ef16ced2
extractIndels() now should deal correctly with soft- and hard-clipped bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@936 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:04:49 +00:00
hanna
e2ed56dc96
Add a MAX_READ_GROUPS sanity parameter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@934 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 13:57:43 +00:00
asivache
9f35a5aa32
Insidious bug: clipped sequences (S cigar elements) were a) processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such a sequence occurred at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates the interval reference stretch up to the rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases; at the boundary of the interval the clipped bases would stick outside the passed reference stretch and an index-out-of-bound exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only the meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@933 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 05:20:29 +00:00
ebanks
3a8219a469
use knowledge from other reads to find a consensus
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@932 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 21:22:17 +00:00
hanna
596773e6c6
Cleanup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@931 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 20:25:08 +00:00
depristo
98396732ba
Bug fixes for Andrey
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@930 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 18:19:51 +00:00
asivache
b48508a226
indelRealignment() signature changed. The only difference about consensus sequences is that they are passed along with alignment cigars that start inside the sequence, while for 'conventional' reads cigar always starts at position 0 on the read. Logically, indelRealignment() should not know what 'consensus' is. Instead, now it receives an additional int parameter, start of the cigar on the 'read' sequence
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@929 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 17:42:19 +00:00
asivache
9eb38c0222
mostly synchronizing with the main branch. Based on anecdotal evidence (too few examples in the data), realignment (shifting indel left across a repeat) works correctly on non-homonucleotide repeats
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@928 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 16:39:16 +00:00
ebanks
c6634e3121
cleaned up some code and minor bug fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@927 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 03:14:21 +00:00
asivache
99c105790b
Now indelRealignment should be correct... The old version could only condense to the left homo-nucleotide indels. New version should be able to detect and shift left arbitrary repeated sequence (e.g. deletion of ATA after ATAATAATA will be shifted left to the first occurrence of ATA on the ref)! NOT THOROUGHLY TESTED YET, will test tonight../somaticIndels.pl --dir . --cutoff 100 -filter EXON --mode SOMATIC --condense 5 --format bed > 0883.indel.somatic.exon.100.bed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@926 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 23:54:07 +00:00
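The left-shifting behaviour described in the commit above can be illustrated with the usual left-normalization rule for a deletion: the deleted window may move one base left whenever the base entering it on the left equals the base leaving it on the right. This is a sketch of that general technique on a plain reference string, not the repository's indelRealignment() code; IndelLeftShift and leftShiftDeletion are hypothetical names.

```java
/** Hypothetical sketch: left-normalizes a deletion across an arbitrary repeat. */
final class IndelLeftShift {

    /**
     * @param ref      reference sequence
     * @param delStart 0-based index of the first deleted reference base
     * @param delLen   number of deleted bases
     * @return the leftmost equivalent 0-based start of the deletion
     */
    static int leftShiftDeletion(String ref, int delStart, int delLen) {
        if (delLen <= 0 || delStart < 0 || delStart + delLen > ref.length()) {
            throw new IllegalArgumentException("deletion does not fit on the reference");
        }
        int start = delStart;
        // Moving the window one base left leaves the resulting sequence unchanged
        // exactly when the base gained on the left matches the base released on the right.
        while (start > 0 && ref.charAt(start - 1) == ref.charAt(start + delLen - 1)) {
            start--;
        }
        return start;
    }

    public static void main(String[] args) {
        // Four copies of ATA flanked by CC and GG; delete the last copy (indices 11..13).
        String ref = "CCATAATAATAATAGG";
        System.out.println(leftShiftDeletion(ref, 11, 3)); // prints 2, the first ATA on the ref
    }
}
```

The same loop also covers homo-nucleotide runs, since a run is just a repeat with a unit of length one.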
hanna
40ac3b7816
Inject read group into covars_out file's toString output. Continue fixing systematic bug in the code where flattenData is not joined to the read group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@924 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 20:43:28 +00:00
asivache
0bb4565798
added AlignmentUtils.getNumAlignmentBlocks(read) - a faster alternative to read.getAlignmentBlocks().size(); IntervalCleaner updated accordingly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@923 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 19:35:21 +00:00
asivache
92b054b71b
moved another variant of numMismatches to AlignmentUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@922 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 18:07:48 +00:00
asivache
7018dd1469
moved another variant of numMismatches to AlignmentUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@921 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 18:05:29 +00:00
hanna
ac5b7dd453
Fixed order-of-operations bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@919 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 03:22:56 +00:00
depristo
819862e04e
major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversion bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
asivache
400399f1b8
fixed (?) a bug in insertion realignment
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@917 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 22:04:37 +00:00
hanna
34bb43a6c8
Saw that one of the offsets needed to be changed from - 1 to -2 and changed the wrong damn offset. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@915 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 19:18:34 +00:00
ebanks
4623a34ad3
Fix bug in realigning insertion cigar strings
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@914 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 18:46:41 +00:00
ebanks
092a754071
Make sure indel position from SW alignment is leftmost possible
...
(and improve printouts)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@912 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:36:10 +00:00
ebanks
36fb6ca3c5
Allow user to specify the compression to be used when writing out BAM files.
...
Updated most of the walkers to reflect this change.
Now it won't take forever to write BAMs!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 08:48:34 +00:00
ebanks
c1792de44f
First pass at fixing the incorrect border-case behavior of the cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@908 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 07:55:06 +00:00
hanna
9da04fd9ac
Cleaned up error warning in case no PL groups are present.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@907 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 03:14:17 +00:00
hanna
fdfc3abf80
Better handling for case where PL attribute is missing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@905 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 02:52:30 +00:00
hanna
9689bb3331
Very early draft of script integrating the covariant counting / logistic regression. Deleted some unused code and spurious debug info.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@902 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:52:11 +00:00
ebanks
4d880477d6
Deal with ends of contigs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@900 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 20:09:53 +00:00
hanna
40bc4ae39a
The building blocks for segmenting covariate counting data by read group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@899 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 19:55:24 +00:00
depristo
b492192838
Pairwise SNP distance metrics now enabled
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@892 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 00:11:29 +00:00
hanna
8672ae6019
Now seeing results from the training data. There are still some critical problems in the quality of the output, but we're at least getting training output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@891 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 20:41:07 +00:00
ebanks
4e41646c88
print out stats for Andrey
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@890 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 17:45:35 +00:00
andrewk
dfe464cd81
Updated CovariateCounterWalker to be read group aware
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@889 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 10:06:06 +00:00
aaron
107b5d73b5
The flagStatReadWalker generates the exact same statistical output as the samtools flagstat command, so the two outputs can be diff'ed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@883 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 21:23:56 +00:00
kcibul
a1218ef508
changed default value for failure output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@880 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 19:32:29 +00:00
depristo
7e7c83ddca
fixing insidious bugs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@879 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 18:33:45 +00:00
kcibul
ad5b057140
parameterized a bit more
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@877 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 17:58:26 +00:00
andrewk
587d07da00
Merged functionality of two python scripts into LogRegression.py, some clarity updates to covariate and regression java files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@876 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 16:55:05 +00:00
kcibul
c4cb867d74
basic clustering of reads to reduce artifacts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@873 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 02:54:21 +00:00
jmaguire
417f5b145e
Strand test and misc touch-ups
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@871 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 17:13:21 +00:00
depristo
f19d7abba9
Added geli compatibility mode to SingleSampleGenotyper, to enable easy linking to the geli2popsnps.py script
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@866 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 14:32:12 +00:00
kcibul
4d6398cef9
a lot of people have been asking me for the equivalent of the old "PrintCoverage" command from Arachne. Even though I show them the pileup, and they agree that's more accurate/complete, they don't want to modify their scripts and/or write a translator. It was simple enough to write, so here it is.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@863 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-31 01:45:23 +00:00
asivache
c8347c3c94
set proper package name (...walkers.indels), remove a couple of unused import statements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@861 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 22:02:14 +00:00
asivache
c549c34caa
still in development and testing; kinda works
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@860 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 21:59:03 +00:00
depristo
e0803eabd9
enabled underlying filtering of zero mapping quality reads, vastly improves system performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@853 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 14:51:08 +00:00
hanna
5e8c08ee63
Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
ebanks
19f9ac2b05
Realign existing indels (from the aligner) to leftmost position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@848 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 04:56:51 +00:00
depristo
ce6a0f522b
First incarnation of the population-based SNP analysis tool. Also bug fixes throughout the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@845 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 22:02:24 +00:00
ebanks
e533c64b8f
Walker to pull out the reference for given intervals and emit them in fasta format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@843 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:39:09 +00:00
asivache
c3678c7bb9
moved from playground to gatk
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@837 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:18:08 +00:00
asivache
5b310e48f5
changed to use factored out Transcript class; some docs added (not much)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@836 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:17:23 +00:00
ebanks
9bd6489f8e
Output indels in the format appropriate for low-coverage indel submission
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@832 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:32:15 +00:00
ebanks
919e995b7f
-Moved my walkers to indels directory
...
-Removed entropy walker and replaced it with mismatch (column) walker
-Some improvements to the cleaner (more to come)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@830 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 16:34:24 +00:00
asivache
b81135c606
bug fixed; this rod seems to work now...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@826 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 22:25:34 +00:00
asivache
ab7bb5800a
forgot to remove debug print statement
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@823 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:38:27 +00:00
asivache
568a0d3c27
exon coordinates are now parsed correctly (?). IF DELIMITER IS THE LAST CHARACTER IN A STRING, String.split() DOES NOT return empty field as the last one; instead, the last field returned will be the one immediately before such delimiter! Wicked.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@822 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:36:50 +00:00
asivache
f4119c17de
still working on it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@821 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:07:38 +00:00
asivache
23b7a28015
simple walker that works off pre-computed tumor/normal genotyping calls (e.g. samtools pileup). Collects overall stats and also writes somatic variants into IGV-compatible bed file if asked to. NOT finished. NOT tested
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@819 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:05:47 +00:00
asivache
8f1cabd33d
cmd line args changed - again; internally uses VariantType enum
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@818 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:58 +00:00
asivache
4edcdffe45
refseq annotation track: should be able to provide (multiple) transcript annotations available over a given genomic position. NOT finished and NOT tested!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@815 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:07:15 +00:00
andrewk
149cc9989b
spaces!!!!!!!!!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@814 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 19:40:25 +00:00
ebanks
c2df35b7fe
- get leftmost position of indel correct
...
- don't try to clean reads with mapping quality of 0
- un-deprecate
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@813 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 17:24:58 +00:00
asivache
02fc4f145f
refactoring: a couple of general purpose (hopefully useful?) methods/classes extracted into a standalone utils class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@802 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 18:54:40 +00:00
asivache
4b718688d5
no changes, really, just synchronizing (instead of reversing) to increase the amount of entropy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@801 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:27:28 +00:00
asivache
893f1b6427
updated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@800 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:25:50 +00:00
asivache
a9dfbfb309
internal changes and some refactoring. slightly different final report. Now can take tracks that implement either Genotype or GenotypeList; takes an arg specifying what variants to look for (POINT - aka snp - or INDEL); takes an arg specifying whether default ref/ref call of one type (INDEL/POINT) should be implicitly assumed if another call (POINT/INDEL respectively) was made at the same position [this is probably most useful for indels and only (?) for sam pileups: if we have only point mutation call at a given position, it does mean that we do have coverage, and that there was no evidence whatsoever for an indel, so we have an implicit 'no-indel' call]
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@799 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:25:09 +00:00
depristo
d9fc84f1e3
actually checking in the first pass
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@795 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:13:27 +00:00
depristo
7a979859a9
Intermediate checking for evaluation -- now supports transition / transversion evaluation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@793 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:05:06 +00:00
jmaguire
9902ce8073
properly flush the gzip output stream. this was a subtle inheritance bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@791 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 13:57:58 +00:00
asivache
63caca31bf
minor update in report printout format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@790 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 13:56:09 +00:00
asivache
7afc10fd6f
updated, reports more stuff now, including stats for external consistency checks
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@789 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:28:18 +00:00
depristo
30c63daf89
More improvements to the duplicate quality combiner, making progress towards a clean system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@788 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:26:57 +00:00
jmaguire
b5ad5176f7
stick headers on the output tables
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@782 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:50 +00:00
ebanks
0d58e4ccc9
-check original alignments for indels when computing mismatch score
...
-move logging to debug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@778 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:55:42 +00:00
ebanks
e6200fe5b5
don't ignore reads when maxReadLength isn't set
...
also, print out LOD score for cleaning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@771 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:24:10 +00:00
andrewk
0219d33e10
QualityUtils: added reverse function to reverse an array of bytes (and not complement it), BaseUtils: split qualToProb into itself and qualToErrProb, CovariateCounterWalker and LogisticRecalibrationWalker: several changes including proper accounting (only partly complete) for reversing AND complementing bases that are on the negative strand, PrintReadsWalker: created option to output reads to a BAM file rather than just to the screen (useful for creating a downsampled BAM file)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@770 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 18:30:45 +00:00
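The distinction drawn in r770 above, reversing quality scores versus reversing and complementing bases, and the split between a correctness probability and an error probability, can be sketched in Python. The function names below are illustrative stand-ins rather than the Java signatures, and the conversion uses the standard Phred definition, which is an assumption about what qualToErrProb computes.

```python
# Translation table for complementing DNA bases; other characters pass through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_quals(quals: bytes) -> bytes:
    """Reverse quality scores only; qualities have no complement."""
    return quals[::-1]


def reverse_complement(bases: str) -> str:
    """Reverse AND complement bases, as needed for negative-strand reads."""
    return bases.translate(_COMPLEMENT)[::-1]


def qual_to_err_prob(qual: int) -> float:
    """Phred-scaled quality to probability that the base call is wrong."""
    return 10.0 ** (-qual / 10.0)


def qual_to_prob(qual: int) -> float:
    """Phred-scaled quality to probability that the base call is correct."""
    return 1.0 - qual_to_err_prob(qual)
```

For a negative-strand read, the bases would go through `reverse_complement` and the matching qualities through `reverse_quals` so that each quality stays paired with its base.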
asivache
7e77c62b49
auxiliary class, a simple struct to keep together info like numbers of covered, assessed, ref/variant bases across the sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@769 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 16:30:16 +00:00
ebanks
34f9820299
update mapping quality score and edit distance attribute for reads when they are cleaned
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@763 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 17:51:31 +00:00
hanna
01a3cb27c7
@Required / @Allows flags for main arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@751 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 23:26:17 +00:00
jmaguire
3441795d9c
better handling of edge cases (zero coverage, reference mistakes, etc.)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@747 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 18:04:37 +00:00
asivache
a39c8839c8
print percentage sign!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@745 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 14:38:20 +00:00
jmaguire
94e324b844
Write N for the alt allele when we're hom-ref.
...
Stop EM loop when we've converged (likelihood[t-1] == likelihood[t]).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@737 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 13:58:11 +00:00
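The stopping rule in r737 above, ending the EM loop once the likelihood no longer changes between iterations, can be sketched with a toy example. The model here is an assumption chosen for illustration (allele frequency estimated from per-sample genotype likelihoods under Hardy-Weinberg priors), and `em_allele_frequency` is a hypothetical name; it is not the caller's actual model.

```python
import math


def em_allele_frequency(likelihoods, p=0.5, max_iter=1000):
    """Estimate the alt allele frequency by EM.

    likelihoods -- one (L_homref, L_het, L_homalt) tuple per sample, each with
                   at least one non-zero entry (hypothetical input format)
    p           -- starting allele frequency, strictly between 0 and 1
    """
    prev_ll = None
    for _ in range(max_iter):
        q = 1.0 - p
        prior = (q * q, 2.0 * p * q, p * p)  # Hardy-Weinberg genotype priors
        ll = 0.0
        expected_alt = 0.0
        for lik in likelihoods:
            # E-step: joint weight of each genotype for this sample.
            joint = [l * pr for l, pr in zip(lik, prior)]
            total = sum(joint)
            ll += math.log(total)
            expected_alt += (joint[1] + 2.0 * joint[2]) / total
        # Converged: likelihood[t-1] == likelihood[t], as in the commit message.
        if prev_ll is not None and ll == prev_ll:
            break
        prev_ll = ll
        # M-step: expected alt allele count over total allele count.
        p = expected_alt / (2.0 * len(likelihoods))
    return p
```

Exact floating-point equality may take many iterations to reach on real data, so the sketch keeps `max_iter` as a safety bound; comparing against a small tolerance is a common alternative.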
kcibul
bd53bc18f9
added new required annotations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@736 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 12:24:06 +00:00
ebanks
81fac73c01
LOD checks for normal and brute force versions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@732 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 02:56:03 +00:00
jmaguire
527df6e57b
Massive speed-up, clean-up and tabular output.
...
This program is going to rule.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@731 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 16:52:40 +00:00
jmaguire
3b57a35009
don't be tricked by multiple read groups with the same sample id!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@730 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 15:28:55 +00:00
jmaguire
947bac5cdc
vast speedup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@729 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 15:27:58 +00:00
ebanks
f33f3c0434
added LOD threshold for determining when to clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@725 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:23:59 +00:00
kcibul
d1f3000afa
bed-style output for IGV
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@721 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 17:58:44 +00:00
jmaguire
641afc4e76
fix a crash in the event that the input file has no read groups!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@714 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 19:27:41 +00:00
ebanks
7a1f85ff86
option to print out the indels found by the cleaner to a file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@709 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:50:08 +00:00
ebanks
5dda448ae0
1. Add printouts for the cleaner
...
2. First pass at the entropy interval walker (still needs work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@696 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 13:59:48 +00:00
asivache
7b59f63f12
and don't forget to close sam writer after we are done...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@692 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 20:46:36 +00:00
asivache
de0cce87ea
new optional arg added that allows specifying a separate bam file to which all piles that fail to realign are sent; plus minor fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@691 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 20:24:23 +00:00
jmaguire
7084ecdeb6
a few changes; checked in to allow debugging.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@688 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 15:50:48 +00:00
kiran
4e4767e5de
Moved to org.broadinstitute.sting.secondarybase
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@682 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:26:43 +00:00
kiran
219eb60716
Added newly-required documentation to arguments so that build can complete successfully.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@681 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:26:10 +00:00
kiran
688358190c
Moved secondary base stuff out of playground for the purpose of making it a core utility. Modified package names and imports such that things would build properly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@680 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:24:18 +00:00
kcibul
8079acb1d3
basic step0 implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@679 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:49:39 +00:00
kiran
57ecb7fbf1
Nicer reporting functions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@678 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:48:30 +00:00
hanna
ee99320c83
Removed at Mark's request.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@677 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:48:21 +00:00
kiran
f1de3d6366
Minor tweaks to how probs are supplied.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@676 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:47:41 +00:00
kiran
095dacd154
Experimental refactoring.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@675 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:46:50 +00:00
kiran
758f8aa89b
Experimental refactoring.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@674 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:46:34 +00:00
andrewk
1518f8f9bf
Update training data creation in CovariateCounterWalker to output much smaller files by counting the number of occurrences of each data point combination rather than outputting a line for each data point (i.e. each base). Also fixed bug in LogisticRecalibrationWalker where a null SAMHeader was being pulled from a function that is now marked deprecated.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@673 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:23:14 +00:00
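The size reduction described in r673 above comes from emitting one line per distinct covariate combination with a count, rather than one line per base. A minimal Python sketch of that aggregation follows; the tuple layout (read group, cycle, quality, mismatch flag) and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_covariates(observations):
    """Collapse per-base observations into counts per distinct combination."""
    return Counter(observations)


def format_counts(counts):
    """Render one tab-separated line per combination, with its count last."""
    return [
        "\t".join(map(str, key)) + "\t" + str(n)
        for key, n in sorted(counts.items())
    ]


# Hypothetical observations: (read_group, cycle, quality, is_mismatch)
observations = [
    ("rg1", 1, 30, 0),
    ("rg1", 1, 30, 0),
    ("rg1", 2, 20, 1),
]
lines = format_counts(count_covariates(observations))
```

The output grows with the number of distinct combinations instead of the number of bases, which is what keeps the training file small.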
ebanks
4c12df372c
Dumb, dumb bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@672 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:21:33 +00:00
ebanks
630066cc0a
1. Merge LocusWindows whose reads overlap.
...
2. Fix bug (we weren't clearing the "to emit" list)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@670 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 17:33:23 +00:00
jmaguire
c4d89997ca
put in a dummy sample_name so it'll compile
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@668 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:12:42 +00:00
jmaguire
c8d7223789
do pooled calling properly for 1kg
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@667 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:12:13 +00:00
jmaguire
313a6d0fb5
lots of changes to facilitate calling indels and 1kG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@666 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:11:42 +00:00
jmaguire
add7b6cf65
add sample_name to constructor, misc bug fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@665 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:10:17 +00:00
jmaguire
0267ccae7f
add code for computing indel genotype likelihoods
...
make reference lods negative
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@664 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:09:29 +00:00
hanna
ee9077fc69
LocusIterator iterated through LocusContexts, which was fine until now when we need something
...
that iterates through loci (GenomeLocs). Rename LocusIterator to LocusContextIterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@662 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:54:57 +00:00
hanna
0bca588629
Botched some boolean logic.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@658 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:53:52 +00:00
hanna
23e9e29964
Changed reads traversals from providing a LocusContext from which the reference sequence
...
could be extracted to a char[] containing the reference bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:45:11 +00:00
hanna
052819bed5
Switched dependencies of GenomeAnalysisTK to depend on GenomeAnalysisEngine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@656 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:33:00 +00:00
ebanks
009e71fcd9
We need to sort cleaned reads ourselves (instead of letting SAMFileWriter
...
do it) because the SAM headers are often screwed up and claim to be
"unsorted". While here, I broke off the module from the SortSamIterator
in case someone else wants to use it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@654 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 15:43:42 +00:00
ebanks
3aabc144c6
Added functionality to allow for a contract between LocusWindowTraversalEngine and LocusWindowWalker which allows the Walker to act upon reads outside of the provided intervals.
...
(Really, all we want to do is spit out all reads, but this allows the Walker to do other things with the reads if it wants)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@641 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 17:28:16 +00:00
hanna
226edbdef6
Hyphen-style xml output. Much sexier.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@635 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 01:04:40 +00:00
aaron
21536df308
Change the sample XML marshalling code over to simple XML, and take out the castor lines in the ivy.xml
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@633 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:08:25 +00:00
depristo
5a6892900e
fixing oddities in duplicates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@628 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:55:45 +00:00
depristo
4a26f35caa
new default syntax
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@627 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:16:53 +00:00
ebanks
283a4d1b54
Fix some special-case cleaner issues.
...
We now do the same as brute force in all examples to date.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@626 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:16:35 +00:00
depristo
2204be43eb
System for traversing duplicate reads, along with a walker to compute quality scores among duplicates and a smarter method to combine quality scores across duplicates -- v1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@624 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:06:02 +00:00
hanna
752928df94
Switch to better mechanism for supplying a default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@615 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 01:22:01 +00:00
asivache
072808858e
added COUNT_CUTOFF arg: it is now possible to tell the code to try to realign all read piles over trains of nearby indels with at least one indel observed in COUNT_CUTOFF or more different alignments (set the arg to 1 to realign around all indels); also, some diagnostic printouts added to the output (time spent on loading the reference, time spent on scrolling through the input bam file, counts of discarded reads)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@611 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:59:33 +00:00
ebanks
5be75e0ae6
First version of indel cleaner walker that works on intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@607 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 20:20:48 +00:00
hanna
521aa40baa
Bring new command-line argument parsing system live.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@603 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:16:11 +00:00
hanna
b0cdba8bb3
Acting on Kiran's suggestion to make the doc tag in the @Argument annotation required.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@598 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:43:40 +00:00
depristo
8925df2e1e
More information from the duplicate combiner quality metrics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@590 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 21:51:01 +00:00
kcibul
2b6466ea00
coverage calculator based on Gabor's Pilot 3 Coverage Metrics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@589 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 14:18:16 +00:00
kiran
df88c4d6b0
Added some code to determine the on-genotype and off-genotype secondary base distributions (which, at the moment, is commented out).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@582 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:48:19 +00:00
kiran
e7534b292f
Optionally applies secondary base distribution priors to normal single-sample genotyper posteriors.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@581 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:36:32 +00:00
kiran
58c80d8d87
For on and off-genotype primary bases, optionally compute the concordance of the secondary bases to their expected distributions. Each genotype has slightly different profiles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@580 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:33:48 +00:00
depristo
84dae06d5a
Initial version of ByDuplicates traversal, as well as a duplicate quality score estimator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@576 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:16:21 +00:00
andrewk
b630f2f2f1
More tables output by CovariateCounterWalker AND made CovariateCounterWalker and LogisticRecalibration aware of positive and negative strandedness of data which changes the regression output significantly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@568 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 01:22:50 +00:00
kiran
0a707a887b
Added ability to evaluate best + random base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@564 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 20:05:36 +00:00
kcibul
334f158e5a
added parameters for mapping quality and duplicate filters
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@563 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 18:05:34 +00:00
ebanks
7de5da7065
Start getting the cleaner working in Walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@561 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 14:59:53 +00:00
kcibul
f557da0a78
Calculate interval-based statistics for Hybrid Selection
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@558 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 04:01:24 +00:00
andrewk
58b2578c44
Several changes to CovariateCounter walker to print more tables (called vs. observed Q scores), bug fixes to LogisticRecalibrationWalker and LogisticRegressor, and print string functionality added to Pair.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@550 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 00:37:48 +00:00
ebanks
a0a581171b
print out the last interval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@549 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 20:43:06 +00:00
aaron
a343f3eab7
Fixed bug where we weren't setting the read group correctly. Also added code to set the printMetrics field of the singleSampleGenotyper from the Pool caller; without that set, I was getting a null pointer exception.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@548 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:17:20 +00:00
kiran
1daf8e0987
A utility to compare the results of the SingleSampleGenotyper in 1-base and 4-base mode.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@547 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:10:08 +00:00
kiran
444bc18183
Removed binomialProb() method. Set better values for qHom, qHet, and qHomNonRef and allowed those to be set from the command-line.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@546 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:09:02 +00:00
ebanks
0c76a70313
Renamed traversal by "interval" to "locusWindow"
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@537 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 02:26:08 +00:00
depristo
40a2b3eeb3
Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@529 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:09:50 +00:00
andrewk
061f4328b1
Covariate counter now outputs files used by R to do logistic regression.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@527 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 17:11:57 +00:00
jmaguire
4e4fd33584
First draft of actual pooled EM caller.
...
Produces sane looking output on region of 1kG pilot1:
CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084
Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@526 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:43:41 +00:00
jmaguire
dd408a2a9a
First draft of actual pooled EM caller.
...
Produces sane looking output on region of 1kG pilot1:
CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084
Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@525 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:42:15 +00:00
ebanks
13d4692d2e
1. Added a by-interval traversal.
...
2. Added a shell for the indel cleaner walker (it's currently being used to test the interval traversal).
3. Fixed small bug in downsampling (make sure to downsample the offsets too)
4. GenomeAnalysisTK.execute => anyone object to my change to "instanceof" instead of trying to catch a ClassCastException (yuck)?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@524 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 04:33:35 +00:00
kiran
1984bb2d13
Made num_loci_total public because I'm lazy. I'll change it back later.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@523 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:57:23 +00:00
kiran
7ce11e152b
Simplified. Added option to perform four-base retest of a putative variant.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@522 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:56:15 +00:00
aaron
3dc2afd7ab
Added the ability to get a merged header in a LociByReference traversal
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@514 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:34:52 +00:00
andrewk
32715a6c47
First check-in of walker that produces tables showing covariation of read cycle, and dinucleotide with quality score in a format usable for R analysis and for doing logistic regression.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@510 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 18:58:25 +00:00
ebanks
cae54ec52d
Walker for creating intervals to be used in the indel cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@508 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:58:19 +00:00
kiran
96db1477d4
I meant for default lod threshold to be 5.0, not 0.0.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@507 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:46:08 +00:00
kiran
11e85f1969
Four-base mode now estimates the genotype using the one-base method and retests the site if the one-base method suggests the site is a het.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@503 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:23:24 +00:00
kiran
bd719f9c06
When checking that values are not infinite, also prints out the position so that I know which site was giving the error and I can just go there and debug it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@502 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:21:58 +00:00
kiran
efba30f1a1
Added a constructor in which the lod threshold can be set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@501 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:20:48 +00:00
jmaguire
8c1905c7d9
Simple walker to print all of the sample names present in a merged bam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@500 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 12:26:56 +00:00
kiran
a3a1c9dae8
Suppressed emission of duplicate paths through a four-base pileup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@498 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 21:08:45 +00:00
jmaguire
6cef8bd76c
added k-best quality path enumeration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@497 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 20:26:51 +00:00
ebanks
d99d67d51c
Refactored to clean it up a bit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@495 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 19:18:46 +00:00
kiran
ffcd672c1c
Intermediate commit while working on getting four-base probs to work in the single sample genotyper. Has infrastructure for the new combinatorial approach and just choosing the best base more intelligently given a probability distribution over bases and the reference base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@492 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:06:50 +00:00
asivache
5f37ba8f26
now can be asked to log at INFO level all concordant or discordant sites, or both
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@480 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:03:44 +00:00
asivache
1f84b9647d
auxiliary data structure for mendelian concordance reporting; it's nice to have the latest version checked in in order for the code to compile...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@479 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:02:40 +00:00
asivache
ece3e9969e
one trivial walker to filter reads; bam in -> filter -> bam out
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@478 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 20:39:29 +00:00
asivache
61e855200d
latest version...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@477 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 20:38:37 +00:00
kcibul
64b2fd866f
* extracted core quality-score based genotype likelihood code
...
* precompute expensive operations (log/pow) based on Picard experience
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@476 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 18:58:43 +00:00
jmaguire
11c520b283
completed my old draft of the old school single sample genotype walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@475 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 05:38:04 +00:00
depristo
b8233d92c8
Simple IO walker to test / crush file systems and evaluate I/O performance in general
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@474 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-20 14:07:14 +00:00
jmaguire
bf76eab955
whoops; fix a comment line.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@473 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-19 17:54:54 +00:00
jmaguire
bcba1ff424
Fix a minor rounding bug and putz around with fractional counts in the pooled caller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@472 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-19 17:52:24 +00:00
jmaguire
af6788fa3d
Misc:
...
1. Added logGamma function to utils
2. Required asserts to be enabled in the allele caller (run with java -ea)
3. put checks and asserts of NaN and Infinity in AlleleFrequencyEstimate
4. Added option FRACTIONAL_COUNTS to the pooled caller (not working right yet)
AlleleFrequencyWalker:
5. Made FORCE_1BASE_PROBS not static in AlleleFrequencyWalker (an argument should never be static! Jeez.)
6. changed quality_precision to be 1e-4 (Q40)
7. don't adjust by quality_precision unless the qual is actually zero.
8. added more asserts for NaN and Infinity
9. put in a correction for zero probs in P_D_q
10. changed pG to be Hardy-Weinberg in the presence of an allele frequency prior (duh)
11. rewrote binomialProb() to not overflow on deep coverage
12. rewrote nchoosek() to behave right on deep coverage
13. put in some binomialProb() tests in the main() routine (they come out right when compared with R)
Hunt for loci where 4bp should change things:
14. added FindNonrandomSecondBestBasePiles walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@471 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-19 15:35:07 +00:00
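The overflow fix mentioned in items 11 and 12 of the entry above can be illustrated with a minimal sketch that works in log space. This is not the committed code: the class name `LogBinomial` is hypothetical, and a plain sum of logs stands in for the `logGamma` utility the commit mentions, since the Java standard library has no log-gamma function.

```java
// Hypothetical illustration, not the actual AlleleFrequencyWalker code.
final class LogBinomial {
    private LogBinomial() {}

    /** log(n choose k), accumulated as a sum of logs so no factorial is ever formed. */
    static double logChoose(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        int m = Math.min(k, n - k); // use the symmetry C(n, k) == C(n, n - k)
        double sum = 0.0;
        for (int i = 1; i <= m; i++) {
            sum += Math.log(n - m + i) - Math.log(i);
        }
        return sum;
    }

    /** P(X = k) for X ~ Binomial(n, p), computed in log space and exponentiated once. */
    static double binomialProb(int n, int k, double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need 0 <= p <= 1");
        }
        double logC = logChoose(n, k); // also validates k
        // Handle the degenerate cases explicitly to avoid 0 * log(0) = NaN.
        if (p == 0.0) return k == 0 ? 1.0 : 0.0;
        if (p == 1.0) return k == n ? 1.0 : 0.0;
        double logProb = logC + k * Math.log(p) + (n - k) * Math.log1p(-p);
        return Math.exp(logProb);
    }
}
```

With this form, a pile of several thousand reads yields a finite probability, whereas a direct factorial-based n choose k overflows a double well before that depth.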
ebanks
758db73b98
Fixed SLOWNESS issue.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@469 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 20:10:34 +00:00
asivache
2a937fa8d3
set SAM file header's sorting order to unsorted, hopefully it will help to speed things up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@468 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 19:32:24 +00:00
asivache
03ec3452f2
a first, simplest version of a walker that filters out reads based on user-specified criteria and writes remaining reads into a new bam file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@467 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 18:51:39 +00:00
asivache
55537c0d1e
change class name, now it compiles...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@451 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 16:51:00 +00:00
asivache
4f9bc7206f
some cleanup, also ensuring that all reads get written into output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@450 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 16:49:25 +00:00
asivache
e8a6cdb386
renamed standalone main
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@449 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:56:46 +00:00
asivache
832afd3d60
renamed standalone main
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@448 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:56:27 +00:00
asivache
85308f4ddc
resurrected indel tool's standalone main
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@447 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:55:52 +00:00
kcibul
6f56938d42
* added a bit more debugging output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@446 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:20:26 +00:00
asivache
240eb18564
fix a few related issues when not all the reads were written into the output files. now cleaned output still contains all reads either with modified alignments or untouched
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@444 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 03:56:47 +00:00
kcibul
7e05b43f40
* added some error checking for read groups
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@442 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 03:22:49 +00:00
kcibul
3fda8613c3
* minor formatting changes
...
* support for "extended" output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@428 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 15:11:05 +00:00
kiran
7949e377e4
Intermediate commit. Refactored some simple base manipulation stuff into BaseUtils.java. Generalized some likelihood computation logic to make future possible EM-ing easier.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@424 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 04:18:07 +00:00
kiran
d0b8d311e6
Can now optionally print the read and the alignment region of the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@423 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 04:10:30 +00:00
kcibul
d4aaa1bef4
* fixed (with Matt's help) the argument parsing
...
* outputting UCSC wiggle format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@422 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 02:17:39 +00:00
depristo
24722a442e
Slight code cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@421 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:21:36 +00:00
asivache
baae98c6d5
and don't allocate new 200M string every time please, just pass byte array!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@417 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 21:55:33 +00:00
asivache
9d56355abe
bug fixed when reference name was passed as a string instead of actual reference bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@416 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 21:46:27 +00:00
kiran
222c4e5865
Commented out some debugging lines
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@415 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:15:41 +00:00
kiran
49d76014d1
Commented out a debugging line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@414 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:15:11 +00:00
kiran
b39e584787
Primary or secondary bases that got a quality score of literally zero led to unfortunate infinities. Added an epsilon (1e-5) to every prob.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@413 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:04:49 +00:00
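As a rough illustration of the infinity problem described in the entry above: a Phred quality of 0 corresponds to an error probability of 1, so the probability that the base is correct is exactly 0 and its logarithm is negative infinity. The sketch below assumes the epsilon is added to the probability before the log is taken; the class and method names are hypothetical and the committed code may apply the epsilon elsewhere.

```java
// Hypothetical illustration; where the real code adds the epsilon is an assumption.
final class QualEpsilon {
    private QualEpsilon() {}

    static final double EPSILON = 1e-5; // value taken from the commit message

    /** Probability that a base call is correct, given its Phred-scaled quality. */
    static double probCorrect(int phredQual) {
        return 1.0 - Math.pow(10.0, -phredQual / 10.0);
    }

    /** log10 of the correctness probability, kept finite by adding EPSILON first. */
    static double logProbCorrect(int phredQual) {
        return Math.log10(probCorrect(phredQual) + EPSILON);
    }
}
```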
jmaguire
d28e9f9b98
search over q's for finding argmax[q] p(D|q)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@412 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:15:45 +00:00
ebanks
647827b18c
Transitioned indel code to use GATK and Walkers
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@410 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:14:15 +00:00
jmaguire
961dbbd4ef
Now output bases and qhat and qstar into the GFF.
...
Quals coming soon (four-base)
QHAT : Most likely alt allele freq (unconstrained by number of chromosomes).
QSTAR : Most likely alt allele freq (constrained by number of chromosomes).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@402 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 15:23:00 +00:00
kiran
dafdff1974
All bases are now indexed as A:0, C:1, G:2, T:3.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@401 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:49:43 +00:00
asivache
bc43c0eefc
there are really cases when we cannot merge until we get just two piles, and now we do not crash in those cases but print a warning and just show the resulting n piles even when n>2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@390 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:45:47 +00:00
kiran
f838a5e511
Changed some double comparisons of the form a == b to abs(a - b) <= precision. Now we shouldn't be passing or failing some if conditions due to floating-point precision.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@388 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 20:05:46 +00:00
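The change described in the entry above follows a common pattern, sketched here under assumptions: the helper name `approxEqual` and the class `DoubleCompare` are hypothetical, and the tolerance is passed in by the caller rather than taken from the walker's own precision setting.

```java
// Hypothetical helper showing the a == b -> abs(a - b) <= precision pattern.
final class DoubleCompare {
    private DoubleCompare() {}

    /** True when a and b differ by no more than the given absolute tolerance. */
    static boolean approxEqual(double a, double b, double precision) {
        return Math.abs(a - b) <= precision;
    }
}
```

For instance, `0.1 + 0.2 == 0.3` is false in double arithmetic, while `DoubleCompare.approxEqual(0.1 + 0.2, 0.3, 1e-9)` is true. An absolute tolerance suits values on a known scale such as probabilities; values of widely varying magnitude may need a relative tolerance instead.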
asivache
d44c30154a
added MAX_READ_LENGTH - now we can ignore long reads (454?); a bad idea in general, but the performance hit is too hard to take, at least for preliminary testing runs...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@384 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 16:53:12 +00:00
jmaguire
6652f13a17
more verbose gff output!
...
EVEN MORE verbosity to come!
Tremble in anticipation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@382 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:21:23 +00:00
jmaguire
6e180ed44e
Unified caller is go.
...
AlleleFrequencyWalker and related classes work equally well for 2 or 200 chromosomes.
Single Sample Calling:
Allele Frequency Metrics (LOD >= 5)
-------------------------------------------------
Total loci : 171575
Total called with confidence : 168615 (98.27%)
Number of variants : 111 (0.07%) (1/1519)
Fraction of variant sites in dbSNP : 87.39%
-------------------------------------------------
Hapmap metrics are coming up all zero. Will fix.
Pooled Calling:
AAF r-squared after EM is 0.99.
AAF r-squared after EM for alleles < 20% (in pools of ~100-200 chromosomes) is 0.95 (0.75 before EM)
Still not using fractional genotype counts in EM. That should improve r-squared for low frequency alleles.
Chores still outstanding:
- make a real pooled caller walker (as opposed to my experiment framework).
- add fractional genotype counts to EM cycle.
- add pool metrics to the metrics class? *shrug* we don't really have truth outside of a contrived experiment...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@380 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:29:51 +00:00
asivache
b4136b6d6e
a few tweaks to make it more robust: ignore reads with cigars containing anything but I,D,M; don't set up contig ordering manually, rely upon reference sequence and its dictionary; don't die if a record does not have NM tag, but fall back to direct counting instead; now requires reference as a cmdline arg
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@378 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 04:49:19 +00:00
kiran
c51f51f255
Make sure we always write at least 1000 points per base in each cycle's scatterplot. Print the disagreement rate between Bustard and FourBaseRecaller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@375 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:49:41 +00:00
kiran
35fc002d5d
Debugging information is now written in such a way to make it easier to import into R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@372 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:45:33 +00:00
kiran
6ee4fe5a20
Fixed a Bustard/Firecrest file synchronization bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@371 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:44:07 +00:00
kiran
817278be46
If a SAMRecord is on the negative strand, reverse complement the SQ tag.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@370 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:42:24 +00:00
kiran
1d5a22cacf
Extracts a Fastq file and the SQ tags to a separate file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@369 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:41:44 +00:00
kiran
e410c005c0
A debugging tool to ensure the SQ tag in a four-prob SAM file matches the SAMRecord strand orientation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@368 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:40:42 +00:00
kcibul
ce72932a45
* refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes
...
* added basic unit test for GenomeLoc
* fixed bug when parsing genome locations like chr1:5000 the start position was being left as maxint rather than being set to the same as the stop position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@365 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:25:17 +00:00
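The parsing bug fixed in the entry above concerns single-position locations such as chr1:5000, where start and stop should both be 5000. A minimal sketch of that behaviour follows; `SimpleLoc` is a hypothetical stand-in and does not reproduce the real GenomeLoc class or its contigIndex lookup.

```java
// Hypothetical stand-in for illustration; not the real GenomeLoc implementation.
final class SimpleLoc {
    final String contig;
    final long start;
    final long stop;

    SimpleLoc(String contig, long start, long stop) {
        this.contig = contig;
        this.start = start;
        this.stop = stop;
    }

    /** Parses "contig:pos" or "contig:start-stop"; a single position sets start == stop. */
    static SimpleLoc parse(String text) {
        int colon = text.lastIndexOf(':');
        if (colon <= 0 || colon == text.length() - 1) {
            throw new IllegalArgumentException("expected contig:pos or contig:start-stop: " + text);
        }
        String contig = text.substring(0, colon);
        String range = text.substring(colon + 1);
        int dash = range.indexOf('-');
        if (dash < 0) {
            long pos = Long.parseLong(range);
            return new SimpleLoc(contig, pos, pos); // start is the same as stop
        }
        long start = Long.parseLong(range.substring(0, dash));
        long stop = Long.parseLong(range.substring(dash + 1));
        return new SimpleLoc(contig, start, stop);
    }
}
```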
kiran
2b59110dca
CombineSamAndFourProbs is better.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@358 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:19:53 +00:00
kiran
56aa98ad30
Ignore null values.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@357 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:18:20 +00:00
kiran
2ef2c9e121
Fixed an issue wherein the SQ field was only being pulled from the first read of the pileup, no matter what. Fixed an issue wherein Andrew enumerates his bases as A:0, C:1, T:2, G:3, and Kiran's QualityUtils methods enumerate bases as A:0, C:1, G:2, T:3 (we should standardize this). Fixed an issue wherein the remaining probability was being divided by 3 rather than 2 when four-base probs are enabled.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@356 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:17:53 +00:00
depristo
17b3d5b554
New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to change GenomeAnalysisTK.java
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@355 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 22:04:59 +00:00
kiran
f5cc2d8b0b
Commented out import of IlluminaParser.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@354 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:30:29 +00:00
kiran
c5220c0822
Four-base probs are now decoded with the relevant method in QualityUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@351 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:52:17 +00:00
kiran
9bc763a835
A better (aka 'working') tool for combining four-base probs with an aligned sam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@350 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:51:37 +00:00
kiran
b7a2e82b46
Can optionally process raw or corrected intensities.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@349 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:50:11 +00:00
kiran
6cdad10dd1
Make output type identical to the bustard parser so the values can be easily swapped for one another.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@348 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:49:34 +00:00
kiran
d0ce56e018
Remember to take the strand flag into account when calculating error rate per cycle as a surrogate for instrument performance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@347 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:48:45 +00:00
kcibul
c556a97f17
Skeleton of Somatic Coverage tool
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@342 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 02:34:03 +00:00
kiran
089bf30cf4
Send things to the out file via the logger.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@339 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 21:49:03 +00:00
kiran
6db9a00a0b
SAMFileWriter doesn't appear to flush the buffer when its destructor is called. You have to call the close() method. Also, choose a random base for Ns in the forward and reverse strands so that samtools doesn't pitch a fit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@338 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 21:48:24 +00:00
kiran
eb2f0ebd62
If the first base of a read is 'N', and the alignment cigar says every base matches, samtools calls shenanigans. Now I just output an A, but the real way to do this is to modify the cigar string accordingly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@337 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 19:58:18 +00:00
kiran
0e7d962eca
Oops. Slight twiddle of the math here so that I'm not asking if bestBase == nextBestBase.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@336 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 19:56:54 +00:00
kiran
62ac7366ed
A quick hack to ensure that the sequence, qualities, and secondary qualities are in accordance with the strand flag.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@331 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 15:57:28 +00:00
kiran
25474ebe7e
Computes the read error rate for a bam file. Ignores reads with indels, treats low-quality and high-quality reference bases the same. Does not count ambiguous reference bases as mismatches. Optionally allows for best two bases in read to be used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@330 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 15:56:10 +00:00
asivache
8d48bdc9ec
it walks... the version committed actually counts snps only
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@328 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 02:00:41 +00:00
asivache
62d75ced3c
nothing fancy, just a wrapper (aka struct) to pass around a bunch of counts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@327 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 01:58:57 +00:00
hanna
202c501939
Added a sample xml marshaller / unmarshaller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@322 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:28:16 +00:00
kiran
99579a1ef8
Math correction.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@310 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 02:18:13 +00:00
kiran
9be978e006
Intermediate commit (debugging info).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@309 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 01:20:15 +00:00
kiran
5a5c6d1276
Added some debugging stuff (writes model parameters to one file per cycle).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@304 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 22:00:58 +00:00
ebanks
3f75fc4e83
Unfortunately, because BWA occasionally outputs crazy reads, we need
...
to make sure not to have an ArrayIndexOutOfBoundsException thrown.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@297 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 03:51:35 +00:00
kiran
f12d40dde8
Simplified SAMRecord construction and emission.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@296 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-05 04:48:31 +00:00
depristo
4eac3193f7
Added RefMetaDataTracker system as a replacement for the List<ReferenceOrderedData> going into walkers. This system allows you to more easily get a tracker for processing using the lookup(name, default) system. See Pileup for an example.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@292 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:54:54 +00:00
kiran
ef06924f73
JavaDocs!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@290 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:19:17 +00:00
andrewk
bef475778f
- Updated --hapmap switch to --hapmap-chip to reflect the data being chip data for an individual rather than population allele frequency data in Hapmap
...
- Corrected some bugs to get metrics logging working
- Added a switch --force_1base_probs to ignore 4-base probabilities if they exist
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@287 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 17:32:31 +00:00
depristo
edc44807af
RODs now have names. Use getName() to access them. Next step is a better interface for accessing RODs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@286 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 16:41:33 +00:00
kiran
5019971290
Now outputs four-base SAM record (read name prefixed with KIR) and bustard SAM record (prefixed with BUS) for easy debugging.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@285 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 15:48:51 +00:00
kiran
15151ac125
Corrected the use of the prior.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@284 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 15:47:47 +00:00
kcibul
9bbce32064
Basic dbSNP and HapMap frequency aware SNP caller... still in progress
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@282 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 14:24:09 +00:00
depristo
f031d882c6
ByReference traversals!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@281 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 13:23:18 +00:00
andrewk
e3ac0cb500
- A lot of code cleaned up; separated metrics code from AlleleFrequencyMetricsWalker into AlleleMetrics and eliminated the former class. AFMW (aside from being a name so long that it warrants an acronym) can now be implemented by passing an option to AlleleFrequencyWalker that logs metrics to a file.
...
- AlleleMetrics and AlleleMetricsWalker are now ready to take a list of classes that implement the AllelicVariant interface
- Switched a genome location in AlleleFrequencyEstimate from String to GenomeLoc which makes way more sense.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@280 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 02:09:10 +00:00
kiran
7d889c0661
Refactored into oblivion.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@276 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:12:15 +00:00
kiran
dffc879240
Should now be appropriately using Bustard data to call bases (there are some mathematical subtleties that arise when no longer using ICs as initialization data). Also writes some more relevant fields in the SAM records. WAAAAAY simpler than old version. Like, super way.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@275 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:10:13 +00:00
kiran
59334b0270
A convenience class for manipulating base probability distributions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@274 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:08:31 +00:00
kiran
399d9b8c1e
A class that represents the model parameters for all of the Gaussian models for all cycles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@273 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:08:10 +00:00
kiran
f0f94b6c72
A class that represents the model parameters for all of the Gaussian models at a given cycle. Handles the accumulation of parameter initialization data and provides for efficient computation of base probability distribution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@272 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:07:47 +00:00
jmaguire
8ce4dabd7c
Print coverage per reference base for each sample in a merged BAM file.
...
This is a good example for how to untangle a merged BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@269 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 21:35:31 +00:00
asivache
5d9b068b8b
generic declarations added here and there to eliminate a few annoying warnings; no consequential changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@268 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 20:53:01 +00:00
kcibul
c192a95998
changes in three files to make the HapMap RODs work:
...
- HapMapAlleleFrequenciesROD.java - the referenceOrderedDatum implementation
- PrepareROD.java - has a static block that loads the known ROD classes, had to add the above
- GenomeAnalysisTK.java - when supplied a hapmap argument... loads the ROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@265 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 19:55:19 +00:00
jmaguire
d202264b23
initial add of pooled calling experiment walker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@262 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 17:55:40 +00:00
depristo
24e8581c30
Slight improvements to allele caller interface; fixed problem with printing progress
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@260 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:44:12 +00:00
asivache
20d4bcbb2e
I said - delete!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@259 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:21:21 +00:00
jmaguire
25ace306b9
GenomeAnalysisTK: better documentation of validation option.
...
AlleleFrequencyWalker: output the last reference interval if it's left hanging open.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@258 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:11:20 +00:00
asivache
f26055c926
interface representing allele variants/genotype calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@256 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 15:57:19 +00:00
jmaguire
f42b75da72
restore GFF_OUTPUT_FILE to a required argument.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@255 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 14:34:08 +00:00
depristo
2cd9a1597f
Simple improvements to allele caller
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@254 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 14:09:14 +00:00
jmaguire
4faacac315
Now handle the case where we don't actually SEE all of the positions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@248 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 19:50:07 +00:00
jmaguire
675505646d
now makes confident reference intervals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@247 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 18:46:14 +00:00
jmaguire
ede52f7359
- take command line arguments
...
- output GFF lines to a file (specified by a command line argument)
- improve the GFF output string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@240 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 18:43:00 +00:00
ebanks
907c183242
update walkers so that onTraversalDone works (it now takes an arg)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@235 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 15:05:33 +00:00
ebanks
3896cc8f17
Moved avg depth of coverage functionality into the core depth of coverage walker.
...
Used new command line args for walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@234 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 05:02:33 +00:00
ebanks
007ecc8616
Added a stateless walker to give the average depth of coverage for given reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@233 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 02:33:59 +00:00
jmaguire
875802e8fc
print output as a GFF line.
...
still need to add printing GFF intervals for stretches of confident reference calls.
does the GFF ROD class handle intervals?? We'll find out. >:)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@225 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-30 17:47:35 +00:00
jmaguire
b752960586
rearranged some stuff and eliminated the binomial prior in the N!=2 case. Much faster.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@224 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-30 17:26:05 +00:00
depristo
d7c0bcc223
Reorganized GenomeLoc code to more clearly and better use the picard SequenceDictionary information.
...
All GenomeLoc[] are now ArrayList<GenomeLoc> for clarity and consistency
Parsing now recursively merges contiguous elements chr1:1-10;chr1:11-20 => chr1:1-20
Added support for TraversingByLoci over all reference positions specified by the provided location array. System dynamically determines which traversal system to use.
Pileup now marks, very clearly, reference positions without covered reads.
Made changes around the codebase to deal with new GenomeLoc structure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@218 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-28 20:37:27 +00:00
hanna
4a6be896b9
Provide out and err PrintStreams to the walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@213 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 15:03:32 +00:00
asivache
c6d9848d08
synchronizing latest changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@212 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 14:15:44 +00:00
hanna
53fe9acf65
Make command-line arguments available in walker constructor, provide back door from walker into GATK itself, do some cleanup of output messages, and add some bug fixes.
...
Command-line arguments in walkers are now feature-complete, but still a bit messy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@203 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 20:45:27 +00:00
hanna
5f9010116a
Collapse the walker hierarchy, in preparation for in-walker output streams and less hokey walker args.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@201 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 16:22:35 +00:00
depristo
7cad3acc61
Support for dynamically merging data files. Preliminary only -- everything in these systems is still being tested
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@200 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 14:40:50 +00:00
asivache
f47a214f96
massive changes everywhere; lots of bugs fixed; methods moved around; computation and printout of overall stats added; now decides whether to accept or reject 'improvement'; writes alignments into two output sam files (unmodified reads/failed piles into one, realigned piles into the other); special treat for paranoids: writes third sam file with all the analyzed reads, unmodified
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@197 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 02:26:17 +00:00
andrewk
0331cd8e95
Updated AlleleFrequency* classes to calculate separate lods for VarVsRef and BestVsNextBest mixture (qstar) theories; AFWMetrics now reports single sample performance w.r.t. Hapmap chip using the appropriate lod for genotyping (BestVsNextBest) or variant / reference calling (VarVsRef).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@196 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 02:10:18 +00:00
andrewk
c88a17dfee
AlleleFrequencyWalker now can parse 4-base probs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@195 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 20:33:05 +00:00
jmaguire
2ed63fe17c
a bunch of changes that support pools.
...
they don't appear to break single sample:
Allele Frequency Metrics (LOD >= 5)
-------------------------------------------------
Total loci : 9000
Total called with confidence : 8138 (90.42%)
Number of variants : 11 (0.14%) (1/739)
Fraction of variant sites in dbSNP : 81.82%
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@192 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 18:52:42 +00:00
kiran
607731da91
Fixed a harmless (but annoying) bug wherein the read name for the SAMRecords increases by two on every iteration rather than one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@189 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 15:20:29 +00:00
jmaguire
44acc358b7
Add a "notes" member to the AlleleFrequencyEstimate, e.g. for hapmap metadata.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@188 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 15:18:10 +00:00
asivache
4c29dca70d
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@186 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 09:23:42 +00:00
asivache
71d3e8e99b
fixed another bug in gapped alignment computation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@185 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 08:33:57 +00:00
asivache
40f45c2333
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@184 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 05:48:10 +00:00
andrewk
30babbf5b9
Restructured AlleleFrequencyMetricsWalker to correctly report Hapmap concordance numbers for genotyping and added reporting for Hapmap reference/variant calling. Also, tiny bugfix in interval code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@181 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 01:12:05 +00:00
kiran
28c1330b4b
Fixed a bug wherein the loop variable for the second end of the pair was actually looping over the entire raw read (first and second ends combined).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@178 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 21:59:25 +00:00
kiran
499c422de6
A version of the four-base caller that computes the probability distribution over base call space by initializing off the Bustard calls rather than the ICs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@173 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 20:11:39 +00:00
asivache
4222016bf5
stop printing sw matrix and other debug info
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@171 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 18:15:52 +00:00
asivache
8ea8a74fbf
fixed bug in calculation of alignment start offset for negative offsets; toString() added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@170 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 18:05:28 +00:00
asivache
9aa1ccd9b7
fixed some bugs in calling the optimal path; parameters adjusted (?)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@169 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 17:27:51 +00:00
kiran
88d94d407a
Fixed a bug in the parsing of the second end of the pair.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@168 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 14:34:37 +00:00
asivache
786a7845dd
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@167 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 14:06:44 +00:00
asivache
3d1e0bf079
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@166 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 14:06:24 +00:00
asivache
908065125f
computes Smith-Waterman pairwise alignment
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@164 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 05:36:37 +00:00
andrewk
9dee9ab51c
Added Hapmap data track (using rodGFF class for GFF file format) to toolkit as a command line option, Hapmap metrics to AlleleFrequencyMetricsWalker, and a python Geli2GFF file converter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@163 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 03:58:03 +00:00
hanna
63cd1fe201
Push core / playground lower into the tree.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@160 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-23 23:19:54 +00:00