1155 Commits (d0cef5ff9dba1bafdb611846726ff5995aa2875e)

Author SHA1 Message Date
hanna d0cef5ff9d Oops. Specified incorrect class name in package for depth of coverage walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1161 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 21:40:40 +00:00
asivache d603145cb0 Meaning of input arguments has CHANGED: minFraction is now a minimum fraction of CONSENSUS indel observation, out of all reads covering the site, required to make the call. minConsensusFraction is still the minimum fraction of CONSENSUS indel observation out of all indel observations at the site
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1160 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 20:38:10 +00:00
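The two thresholds described in the commit above can be illustrated with a small sketch. The class, method, and variable names here are hypothetical and are not taken from the tool's source; only the two ratios follow the commit message (consensus observations over all covering reads, and consensus observations over all indel observations).

```java
// Hypothetical sketch of the two thresholds described in the commit message.
// None of these names come from the actual indel caller source.
class IndelCallThresholds {
    /**
     * @param consensusCount reads supporting the consensus indel at the site
     * @param indelCount     reads showing any indel at the site
     * @param coverage       all reads covering the site
     */
    static boolean passes(int consensusCount, int indelCount, int coverage,
                          double minFraction, double minConsensusFraction) {
        if (coverage <= 0 || indelCount <= 0) {
            return false; // no reads or no indel observations: nothing to call
        }
        // minFraction: consensus indel observations out of ALL reads at the site
        double fractionOfCoverage = (double) consensusCount / coverage;
        // minConsensusFraction: consensus indel observations out of all INDEL observations
        double fractionOfIndels = (double) consensusCount / indelCount;
        return fractionOfCoverage >= minFraction
                && fractionOfIndels >= minConsensusFraction;
    }

    public static void main(String[] args) {
        // 6 consensus observations, 8 indel observations in total.
        System.out.println(passes(6, 8, 20, 0.25, 0.7)); // coverage 20
        System.out.println(passes(6, 8, 40, 0.25, 0.7)); // coverage 40
    }
}
```

With the same indel observations, doubling the coverage halves the first ratio while leaving the second unchanged, which is why the two arguments are not interchangeable.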
hanna 62807139fc Cleanup pileup and depth of coverage in preparation for release. Add pileup, depth of coverage, and print reads to package for distribution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1159 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:54:01 +00:00
kcibul 6a25f0b9c5 refactored into new package
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1158 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:37:54 +00:00
aaron 1c83b4d949 forgot to take out some test code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1157 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:18:37 +00:00
aaron bc17ff567a When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions. Now with a test case!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1156 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:15:50 +00:00
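A minimal sketch of the masking behaviour described in the commit above, assuming 1-based inclusive coordinates and an ungapped alignment. The class and method names are hypothetical; the actual utility in the repository may differ in signature and in how it handles gapped reads.

```java
// Hypothetical sketch: returns the reference bases under a read, writing 'X'
// wherever the read extends past either end of the contig.
class ReferenceMasker {
    /**
     * @param contig         full contig sequence
     * @param alignmentStart 1-based position of the first read base (assumed convention)
     * @param readLength     number of bases in the read (no indels assumed)
     */
    static String referenceForRead(String contig, int alignmentStart, int readLength) {
        StringBuilder sb = new StringBuilder(readLength);
        for (int i = 0; i < readLength; i++) {
            int pos = alignmentStart + i; // 1-based contig position for this read base
            if (pos >= 1 && pos <= contig.length()) {
                sb.append(contig.charAt(pos - 1));
            } else {
                sb.append('X'); // no corresponding reference position
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A 5-base read starting at position 6 of an 8-base contig overhangs by 2.
        System.out.println(referenceForRead("ACGTACGT", 6, 5)); // prints CGTXX
    }
}
```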
depristo 47cb9f169e Stable tool that's the reverse of merging -- splits a file into individual BAM files, one for each sample ID in the SAM header
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1155 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 12:56:46 +00:00
depristo 6684cb8bc9 copySamFileHeader() utility function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1154 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 12:55:51 +00:00
aaron bb92eb8b1c added a fix for overlapping reads in the locus context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1153 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 02:08:59 +00:00
aaron 6570ce0b5b added the example files to the distro
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1152 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 23:16:42 +00:00
aaron 9d659199f3 bam and fasta files generated with the artificial tools. These will be included in the GATK distro.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1151 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 22:49:31 +00:00
aaron d4d3af20f2 made a fake fasta generator, so we can now generate a complete bam / fasta combo of made up data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1150 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 21:35:34 +00:00
asivache c2e5a68aaf output format changed in --verbose --somatic mode: now also prints the <#reads with indels>/<coverage> for normal samples, rather than only for the tumor; also, code cleaned up a little
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1149 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:56:16 +00:00
andrewk 4cbf069de1 First version of coverage evaluation tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1148 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:52:25 +00:00
asivache 7462f3f344 Bug in setContig() fixed: sequence dictionary's .getSequences().contains() and .getSequences().indexOf() do NOT work when applied to contig names (Strings), since getSequences() returns a list of SAMSequenceRecord's; changed to querying the dictionary directly for specified contig name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1147 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:50:09 +00:00
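The pitfall described in the commit above comes from `List.contains(Object)` and `List.indexOf(Object)` accepting any object: passing a `String` to a list of record objects compiles but never matches. The sketch below uses hypothetical stand-in classes (`SeqRecord`, `SeqDictionary`) rather than the real SAM classes, so it only illustrates the shape of the bug and of a name-based lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a sequence record; not the real SAM library class.
class SeqRecord {
    final String name;
    final int index;
    SeqRecord(String name, int index) { this.name = name; this.index = index; }
}

// Hypothetical stand-in for a sequence dictionary with a lookup by contig name.
class SeqDictionary {
    private final List<SeqRecord> records = new ArrayList<>();
    private final Map<String, SeqRecord> byName = new HashMap<>();

    void add(String name) {
        SeqRecord r = new SeqRecord(name, records.size());
        records.add(r);
        byName.put(name, r);
    }

    List<SeqRecord> getSequences() { return records; }

    // Query the dictionary directly by contig name; null if unknown.
    SeqRecord getSequence(String name) { return byName.get(name); }
}

class ContigLookupDemo {
    public static void main(String[] args) {
        SeqDictionary dict = new SeqDictionary();
        dict.add("chr1");
        dict.add("chr2");

        // Compiles, but compares a String against SeqRecord objects: never matches.
        System.out.println(dict.getSequences().contains("chr2")); // false
        System.out.println(dict.getSequences().indexOf("chr2"));  // -1

        // Asking the dictionary for the name finds the record.
        SeqRecord r = dict.getSequence("chr2");
        System.out.println(r != null ? r.index : -1);             // 1
    }
}
```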
ebanks 76fd4b3848 deal with different contigs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1146 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 19:17:27 +00:00
ebanks 20fab507a8 Choose the REF if it scores equal to consensus!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1145 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:54:27 +00:00
hanna 9b182e3063 Prep for documenting command-line arguments: delete some arguments that don't make sense any more given the state of the traversals and GATK input requirements: all_loci (replaced by walker annotation), max OTF sorts (bam files must be sorted and indexed), threaded io (replaced by data sharding framework).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1144 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:23:35 +00:00
ebanks 5a5103cfd2 Heads up, everyone: command-line args no longer need to be public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1143 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 16:09:22 +00:00
hanna b43d4d909e Fix CleanedReadInjectorTest to work with new CleanedReadInjector.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1142 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 15:48:06 +00:00
aaron 891f4c2bd9 up on the wiki, removing it from the repo. Ignore my last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1141 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 15:42:02 +00:00
aaron 05c5659053 This document is now up as content on the wiki, so I'm removing it from svn.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1140 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 15:40:32 +00:00
aaron d58eeb7539 Don't cry wolf: only one warning is now emitted, instead of tons of warnings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1139 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:50:37 +00:00
hanna a3e0ec20c4 Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1138 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:46:35 +00:00
hanna 74e9bb46b4 Contents of the Hello World doc are now in the wiki.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1137 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 22:32:56 +00:00
hanna 93da64db10 Update naming for consistency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1136 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 22:03:21 +00:00
hanna e93f751bd7 First step in replacing the Hello, World! document. Revamped the HelloWalker and checked it into the source tree, created a special build file for it, and added it to the packaging tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1135 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 21:59:54 +00:00
ebanks fdff233d70 new injector args and address Kiran's question
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1134 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:49:22 +00:00
ebanks 8d3dc57c3d Commit to emit in sorted order so we don't have to use /tmp
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1133 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:47:15 +00:00
aaron f5cba5a6bb Fixed genome loc to be immutable; the only way to change its values now is through the GenomeLocParser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
hanna 455275996f Added contents to the wiki.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1131 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 18:29:46 +00:00
asivache 177d6d00b8 added setContigIndex(). NOTE: both setContig() and setContigIndex are UNSAFE as one does not automatically involve updating the other, and there's also no validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1130 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 17:40:37 +00:00
depristo 9fca79ed62 Read groups are now sorted in the output data, for convenience
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1129 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:50:44 +00:00
hanna fe421e5712 All IntelliJ best practices info is now on the wiki.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1128 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:45:52 +00:00
ebanks 08df4771c8 count X/N/etc. as mismatches for the NM attribute in the BAMs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1127 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:08:55 +00:00
kiran d412c5dc2f Updated to use SecondaryBaseAnnotator class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1126 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:08:43 +00:00
kiran e3cdf7ef4b A single class that can be handed reads for training and basecalling. When in training mode, we accumulate no more than 10000 reads and always replace the lowest-quality reads with superior quality reads. Thus, the training set always contains 10000 of the best reads available. After training is complete, the class can be interrogated to return the SQ tag for a given RawRead object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1125 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:03:15 +00:00
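One common way to keep a bounded set of the best items seen so far, as the commit above describes for the training set, is a min-heap capped at the limit. The sketch below is an assumption about how such a buffer could work, not the class from the commit: reads are reduced to a single quality score, and `BestReadsBuffer` is a hypothetical name.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: retains at most `capacity` items, always the highest-quality
// ones offered so far. Reads are represented only by a quality score here.
class BestReadsBuffer {
    private final int capacity;
    // Min-heap on quality, so the lowest-quality retained read sits at the head.
    private final PriorityQueue<Double> heap = new PriorityQueue<>();

    BestReadsBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    void offer(double quality) {
        if (heap.size() < capacity) {
            heap.add(quality);              // still filling up: accept everything
        } else if (quality > heap.peek()) {
            heap.poll();                    // evict the lowest-quality read
            heap.add(quality);              // and keep the better one
        }
    }

    int size() { return heap.size(); }

    // Lowest quality currently retained; call only when the buffer is non-empty.
    double worstRetained() { return heap.peek(); }

    public static void main(String[] args) {
        BestReadsBuffer buffer = new BestReadsBuffer(3); // 10000 in the commit message
        for (double q : new double[] {10, 30, 20, 5, 40}) {
            buffer.offer(q);
        }
        System.out.println(buffer.size() + " retained, worst = " + buffer.worstRetained());
    }
}
```

Each offer costs O(log n) in the buffer size, and the buffer never grows past its capacity regardless of how many reads are seen.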
hanna 74cc7136f7 All info from the user manual is now in the wiki. Deleting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1124 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 15:29:59 +00:00
hanna ddf4003536 Updates to picard public / private and sam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1123 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 14:50:55 +00:00
ebanks 8aa3b65e7f fix to guarantee emission in sorted order
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1122 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 13:48:41 +00:00
aaron 03f8177a53 When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1121 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 20:51:55 +00:00
aaron 1dcababad1 a fix to make the test run
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1120 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 20:24:32 +00:00
jmaguire a17bf145f6 fix to respond to the change in IndelLikelihood constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1119 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 19:05:33 +00:00
depristo 7ecc43e9a7 Fixed subtle null pointer exception discovered by Kiran. Now deals with the rare situation where you have only, say, Q28 bases at dbSNP sites, so that the table recalibration step fails with a null pointer error in the data structure indexed by quality score. Bases with a Q score above those seen before are not modified in any way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1118 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 18:57:42 +00:00
ebanks 95e2ae0171 Deal with reads whose ends are aligned off the end of a chromosome.
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
jmaguire 65a788f18a Added a ROD (SangerSNP) for parsing the Sanger's chr20 pilot1 SNP calls.
Some doodling around with indel calling in an EM context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1116 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:32:12 +00:00
asivache ceeeec13b8 Computes a vector of the numbers of reads falling into successive intervals of a specified length (e.g. the number of reads per 1 Mbase)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1115 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:12:21 +00:00
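The per-interval counting described in the commit above can be sketched as a simple binning step. This is an illustration under stated assumptions, not the committed code: each read is assigned to an interval by its 1-based start position, and `ReadIntervalCounter` and its parameters are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch: counts reads per successive fixed-length interval of a contig.
class ReadIntervalCounter {
    /**
     * @param readStarts     1-based start positions of the reads (assumed convention)
     * @param contigLength   length of the contig in bases
     * @param intervalLength length of each interval, e.g. 1000000 for 1 Mbase
     */
    static int[] countPerInterval(int[] readStarts, int contigLength, int intervalLength) {
        // Round up so a final partial interval still gets its own bin.
        int nBins = (contigLength + intervalLength - 1) / intervalLength;
        int[] counts = new int[nBins];
        for (int start : readStarts) {
            if (start < 1 || start > contigLength) {
                continue; // skip reads that start outside the contig
            }
            counts[(start - 1) / intervalLength]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] starts = {1, 5, 10, 11, 25};
        // Contig of 30 bases split into intervals of 10 bases.
        System.out.println(Arrays.toString(countPerInterval(starts, 30, 10))); // [3, 1, 1]
    }
}
```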
ebanks 3bacb3db03 updated some defaults
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1114 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 19:28:05 +00:00
ebanks eb74b16e39 updated what constitutes removing entropy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1113 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 18:29:00 +00:00
aaron d7d4298917 Some files to support generic genotype output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1112 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:43:41 +00:00