Commit Graph

552 Commits (53153fcd79bd08a78868e21998bdda009b83a8d3)

Author SHA1 Message Date
ebanks 9b1d7921e8 added filter based on concordance to another call set
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1432 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 15:16:30 +00:00
ebanks b2a18a9d61 - first pass at a basic indel filter (for now, based on size and homopolymer runs)
- fix simple indel rod printout


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 03:04:12 +00:00
ebanks 78439f7305 Modify Sequenom input format based on official documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1430 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 01:42:57 +00:00
ebanks d4808433a1 Added option to output the locations of indels in the alternate reference
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1424 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-16 03:46:36 +00:00
ebanks 4b6ddc55bd Merge our 2 fastq writers into 1: incorporate Kiran's secondary-base file writer into the fasta/fastq writers
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1423 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-14 20:55:23 +00:00
ebanks 0ec581080c Refactoring the code; also, now it prints continuously instead of potentially storing one long string.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1421 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-13 01:32:46 +00:00
asivache 2a01e71277 A very simple standalone filter for fooling around with the data: can extract only mapped or only unmapped reads, only reads with mapping quals > X, reads with average base qual > Y, reads with min base qual > Z, reads with edit distance from the ref > MIN and/or < MAX
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1420 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:28:51 +00:00
asivache ebec0ec171 A standalone companion to BamToFastqWalker: does the same thing but without calling in gatk's heavy artillery (does not "require" a reference either). Extracts seqs and quals and places them into fastq; along the way it also reverse complements reads that align to the negative strand (so that fastq contains reads as they come from the machine).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1419 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:24:37 +00:00
asivache 112a283f54 be nice, don't forget to close the reader when done
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1418 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:19:56 +00:00
asivache ba2a3d8a58 Reverse qualities when read seq. is reverse complemented
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1417 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:17:35 +00:00
ebanks 143f8eea4e option to output in sequenom input format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1415 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 16:50:37 +00:00
ebanks 7f1159b6a9 Added option to mask out SNP sites with "N"s in the new reference.
This is useful when producing Sequenom input files for validating indels...


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1414 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 15:17:45 +00:00
ebanks 43f63b7530 Added a walker to convert a bam file to fastq format (including the option to re-reverse the negative strand reads).
Picard has such a tool but it is geared towards their pipeline and requires intimate knowledge of the lanes/flowcells,etc.  This is just easy.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1413 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 15:10:40 +00:00
asivache e4acd14675 Now GenomicMap maps (and RemapAlignment outputs) regions between intervals on the master reference as 'N' cigar elements, not 'D'. 'D' is now used only for bona fide deletions.
Also: do not die if alignment record does not have NM tags (but mapping quality will not be recomputed after remapping/reducing for the lack of required data)

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1411 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 21:10:17 +00:00
ebanks 5fab934f4e - moved the reference maker to its own directory
- added first version of a more complicated reference maker which takes in RODs and creates an alternative reference based on the variants (indels and/or SNPs)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1409 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 18:01:06 +00:00
sjia 1851613de4 Now using larger database of HLA alleles
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1405 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 03:11:14 +00:00
asivache 3208eaabcc A standalone picard-level tool for breaking individual reads into "pairs" of first/last N bases. Supports:
* splitting off only start or end of the read, or both; the output will contain 
     chopped sequences AND corresponding base qualities
   * splitting arbitrary number of bases off each end (different numbers
     for left and right segments can be specified; segments can overlap)
   * splitting only unmapped reads, ignoring mapped ones
   * writing splitted ends into separate sam/bam files, or into a single output file
   * decorating original read names with user-specified suffixes for each end
     (e.g. _1 and _2 for left and right parts of the read); default: no decoration, 
     original read names are used
   *  when mapped reads are split, the alignment cigars are chopped appropriately
     and the alignment start positions are adjusted (for the right end) to correctly 
    specify the alignment of the selected part of the read


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1402 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:42:49 +00:00
asivache 36312ae4b2 tiny cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1401 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:26:52 +00:00
asivache 921d4f4e95 RemapAlignments is a standalone picard-level tool that does not use gatk engine; moved to 'tools'
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1396 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 15:41:07 +00:00
depristo 089dab00e2 Was discordance rate, now concordance rate
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1393 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:37:52 +00:00
depristo 6d3ef73868 Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1392 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:37:07 +00:00
depristo a864c2f025 Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1390 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:00:06 +00:00
ebanks db250f8d3e Don't print if not in learning mode
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1389 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 06:08:02 +00:00
ebanks 4c1fa52ddf -Added mapping quality zero filter
-Set some reasonable defaults (based on pilot2)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1388 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 03:18:02 +00:00
sjia d60d5aa516 Fixed bug: previously reset likelihoods after each region/exon.
Better comments/documentation added

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1386 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 18:44:46 +00:00
kcibul 0d47798721 made booster distance a parameter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1385 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 18:29:21 +00:00
ebanks 3b74b3ba74 print out ref/alt ratio, not major/minor
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1384 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:36:25 +00:00
depristo 65e9dcf5b7 Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum, (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc, this is just for calculating genotype likelihoods now; (3) Now performs extensive error checking with validate() to ensure the system is behaving properly. (4) fixed incorrect treatment of N bases, which we being counted against everyone (5) likely found a stats bug in which heterozyosity was being applied incorrectly to the genotype priors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1382 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 01:00:55 +00:00
sjia 68309408e4 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1378 348d0f76-0448-11de-a6fe-93d51630548a 2009-08-04 21:23:01 +00:00
sjia 45ab212f22 Post-presentation update
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1377 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:21:12 +00:00
hanna 21d1eba502 Cleaned division of responsibilities between arguments to map function. Reference has been changed
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00
kcibul a5a7d7dab8 added "booster" metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1375 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 20:53:45 +00:00
ebanks 3a8d923785 minor output changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1374 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 20:12:16 +00:00
mmelgar 939b19e715 Committing the first version of the homopolymer filter. Removes SNPs that occur at the edges of homopolymer runs and whose nonref allele matches the repeated base in the homopolymer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1373 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 14:35:51 +00:00
depristo 20ff603339 New hotness and old and Busted genotype likelihood objects are now in the code base as I work towards a bug-free SSG along with a cleaner interface to the genotype likelihood object
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1372 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 23:07:53 +00:00
depristo 3485397483 Reorganization of the genotyping system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1370 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:55:31 +00:00
ebanks 9f1d3aed26 -Output single filtration stats file with input from all filters
-move out isHet test to GenotypeUtils so all can use it


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1369 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:44:21 +00:00
depristo d840a47b11 Slight reorganization of genotype interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:17:15 +00:00
depristo 20986a03de cleanup before moving files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1365 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:08:24 +00:00
ebanks e3b08f245f Pull out RMS calculation into MathUtils for all to use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1364 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 17:00:20 +00:00
ebanks e495b836d3 - added mapping quality filter
- make the filters brainless in that they strictly have thresholds and filter based on them; require user to calculate and input these thresholds.
- update filters in preparation for migration to new output format



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1363 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 16:46:51 +00:00
kiran 8bc925a216 Commit on the behalf of Mark: cleaning up some old and busted code in GenotypeLikelihood and associated objects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1361 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 21:18:30 +00:00
aaron 9dfee7a75c the "-genotype" option now acts correctly as a discovery mode caller in SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1359 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 18:31:45 +00:00
sjia 9dada95ec3 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1357 348d0f76-0448-11de-a6fe-93d51630548a 2009-07-31 16:21:16 +00:00
andrewk 678c2533ca Removed custom output stream for file and replaced with the standard out PrintStream
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1350 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 22:36:42 +00:00
andrewk 44673b2dce Removed a debugging println that was accidentally checked in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1348 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 22:23:27 +00:00
andrewk 845488ff94 VariantEval now decides whether a variant is not confidently called using BestVsNetxBest if genotypes are being evaluated and BestVsRef if not (variant discovery only). Also, the absolute value of the BestVsRef LOD (getVariantionConfidence) is used so that confident reference calls (if the GELI has output them) will show up in the final table as reference calls rather than no calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1347 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:54:06 +00:00
andrewk fdc7cc555b Removed extra column name from geliHeaderString that was mislabeling the 10 genotype likelihoods by shifting them over by onex
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1345 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:42:02 +00:00
aaron 0087234ed7 small code cleanup, a couple of little changes to SSGGenotypeCall
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1343 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 19:47:37 +00:00
ebanks fbc7d44bc7 don't allow users to input priors anymore; they should be using heterozygosity and having the SSG calculate priors.
Note that nothing was changed for dnSNP/hapmap priors (not sure what we want to do with these yet - any thoughts?)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1342 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 19:10:33 +00:00