asivache
8f1cabd33d
cmd line args changed - again; internally uses VariantType enum
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@818 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:58 +00:00
asivache
9ef1a21112
minor changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@817 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:06 +00:00
aaron
d994544c47
Added back end code support for Sharding based on genomic location for reads. Changed the sharding
...
code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simplier
and cleaner test cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
asivache
4edcdffe45
refseq annotation track: should be able to provide (multiple) transcript annotations available over a given genomic position. NOT finished and NOT tested!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@815 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:07:15 +00:00
andrewk
149cc9989b
spaces!!!!!!!!!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@814 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 19:40:25 +00:00
ebanks
c2df35b7fe
- get leftmost position of indel correct
...
- don't try to clean reads with mapping quality of 0
- un-deprecate
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@813 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 17:24:58 +00:00
hanna
54bb643d19
Validated Mark's assertion that GSA-27 is fixed. Also did some cleanup on the pileup walker so that it doesn't output to System.out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@812 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 15:58:21 +00:00
hanna
008d677bea
Fixed ValidatingPileup to work with Andrey's new rodSAMPileup -> GenotypeList type hierarchy.
...
Fixed reference-ordered data validation system to validate class hierarchies instead of specific class types.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@811 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-23 20:50:28 +00:00
aaron
d056f9f3e8
Changed the name to reflect the sorted nature of the set, added some fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@810 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 22:34:24 +00:00
aaron
831d430025
Added a collection for storing GenomeLocs, that also has functions for removing by genomic region (that may span multiple GenomeLoc's in the collection), and adding regions, which are then merged with any overlapping regions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@809 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:52:40 +00:00
hanna
34413362fd
Bugfix: handle case where queue is empty.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@808 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:45:22 +00:00
hanna
ec2e8d5726
Fixes for getting ValidatingPileup running in parallel.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@807 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:20:24 +00:00
kiran
cd80e3f372
Replaced dumb training function with a version that creates a training set slightly more sensibly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@806 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:34:33 +00:00
kiran
02c0afdb85
Added the ability to specify the sorted, unaligned bam and/or the sorted, aligned bam such that broken computations can be restarted.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@805 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:33:34 +00:00
kiran
454a6d1df7
Fixed an egregious error in simpleReverseComplement wherein the RC'd string would be composed entirely of the last base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@804 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:32:20 +00:00
hanna
2a5be1debe
Cleanup in datasources.providers namespace. Make it easier for others writing traversal engines to use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@803 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:12:00 +00:00
asivache
02fc4f145f
refactoring: a couple of general purpose (hopefully useful?) methods/classes extracted into a standalone utils class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@802 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 18:54:40 +00:00
asivache
4b718688d5
no changes, really, just synchronizing (instead of reversing) to increase the amount of entropy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@801 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:27:28 +00:00
asivache
893f1b6427
updated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@800 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:25:50 +00:00
asivache
a9dfbfb309
internal changes and some refactoring. slightly different final report. Now can take tracks that implement either Genotype or GenotypeList; takes an arg specifying what variants to look for (POINT - aka snp - or INDEL); takes an arg specifying whether default ref/ref call of one type (INDEL/POINT) should be implicitly assumed if another call (POINT/INDEL respectively) was made at the same position [this is probably most useful for indels and only (?) for sam pileups: if we have only point mutation call at a given position, it does mean that we do have coverage, and that there was no evidence whatsoever for an indel, so we have an implicit 'no-indel' call]
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@799 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:25:09 +00:00
asivache
d5bb4d9ba9
Auxiliary class that can read one line from samtools pileup file. Used by rodSAMPileup to read pairs of lines as needed. NOTE: this class implements Genotype and (a trivial) GenotypeList, but it is NOT a rod!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@798 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:20:01 +00:00
asivache
732fed9aad
ALERT, ALERT! rodSAMPileup is now a GenotypeList, not a Genotype! Now it can intelligently read full samtools pileup files (containing, in general, both point and indel genotypes at the same position). No need to split/synchronize pileups from different individuals anymore, hooray!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@797 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:17:59 +00:00
asivache
26633957d9
Genotype interface is extended: now it requires implementing object to be able to tell whether it isPointGenotype() or isIndelGenotype() (and the contract requires, e.g. alleles to be represented differently)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@796 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:14:46 +00:00
depristo
d9fc84f1e3
actually checking in the first pass
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@795 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:13:27 +00:00
asivache
8773b3a430
a trivial wrapper interface for the objects capable of holding 'full' genotype, i.e. both point (as in ref/snp) and indel variants at the same reference position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@794 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:12:01 +00:00
depristo
7a979859a9
Intermediate checking for evaluation -- now supports transition / transversion evaluation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@793 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:05:06 +00:00
ebanks
f2ea193149
For some reason the apostraphes in the comments were throwing annoying
...
compile-time warnings: "unmappable character for encoding UTF8"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@792 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 14:07:07 +00:00
jmaguire
9902ce8073
properly flush the gzip output stream. this was a subtle inheritance bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@791 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 13:57:58 +00:00
asivache
63caca31bf
minor update in report printout format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@790 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 13:56:09 +00:00
asivache
7afc10fd6f
updated, reports more stuff now, including stats for external consistency checks
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@789 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:28:18 +00:00
depristo
30c63daf89
More improvements to the duplicate quality combiner, making progress towards a clean system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@788 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:26:57 +00:00
depristo
65995887fc
Releasable version of the Pileup walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@786 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:37 +00:00
depristo
dc17a5661d
Better accessors for dealing with second base prob pileups
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@785 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:16 +00:00
depristo
d261459c48
Useful function to create a string with N copies of a same char
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@784 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:23:52 +00:00
kiran
287bb52e81
Refreshes the mount points that we'll be using (so that the program will play nicely with LSF).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@783 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:36:12 +00:00
jmaguire
b5ad5176f7
stick headers on the output tables
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@782 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:50 +00:00
kiran
83e1454a11
Added a method to determine the fraction of a sequence that's taken up by the most frequent base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@781 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:31 +00:00
kiran
bdf772f017
Added test for determining the fraction of a sequence that's taken up by the most frequent base (quick-and-dirty homopolymer testing).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@780 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:08 +00:00
hanna
d61a5261c1
Better integration of reference-ordered data into the data sharding system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@779 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:09:32 +00:00
ebanks
0d58e4ccc9
-check original alignments for indels when computing mismatch score
...
-move logging to debug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@778 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:55:42 +00:00
kiran
5f67914b08
Added loads of documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@777 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:40:47 +00:00
kiran
1a9d5cea29
Added a method to reverse-complement a String object, preserving 'N' and '.' bases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@776 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:39:39 +00:00
kiran
a687c6bc03
Added a method to refresh an NFS mount point (necessary to prevent NFS flakiness when running on the LSF farm.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@774 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:31:54 +00:00
kiran
324ef9cbd1
Test class for PathUtils.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@773 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:31:22 +00:00
aaron
8515247575
Adding some functions I keep reinventing, especially for testing purposes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@772 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:30:44 +00:00
ebanks
e6200fe5b5
don't ignore reads when maxReadLength isn't set
...
also, print out LOD score for cleaning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@771 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:24:10 +00:00
andrewk
0219d33e10
QualityUtils: added reverse function to reverse an array of bytes (and not complement it), BaseUtils: split qualToProb into itself and qualToErrProb, CovariateCounterWalker and LogisticRecalibrationWalker: several changes including a properly acocunting (only partly complete) for reversing AND complementing bases that are negative strand, PrintReadsWalker: created option to output reads to a BAM file rather than just to the sceern (useful for creating a downsampled BAM file)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@770 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 18:30:45 +00:00
asivache
7e77c62b49
auxiliary class, a simple struct to keep together info like numbers of covered, assessed, ref/variant bases across the sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@769 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 16:30:16 +00:00
asivache
7e5e422591
ReferenceOrdereData now inspects the ROD class using reflection. If the ROD declares a static Iterator<ROD> createIterator(String rodName, File rodFile) factory method, it is wrapped and used by the ReferenceOrderedData to read records from rodFile. If the ROD does not provide such factory method, the old behavior is the default: ReferenceOrderedData uses its own simple default iterator to read the file line by line (assuming there is only one line per record/position).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@768 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 15:23:22 +00:00
hanna
26dd3cd50e
Cleanup. Move filtering functions closer to where they're used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@767 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 21:42:48 +00:00
hanna
e7a6f8cdc4
Removed evidence of a previous incarnation of data sharding.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@766 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 20:48:33 +00:00
hanna
3cad580655
Catch and rethrow the walker's required argument, so that command-line arguments will be displayed when the GATK throws an argument exception.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@765 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:17:16 +00:00
hanna
dc748d9c9c
Integrate more feedback on command-line argument system. Focus on help
...
formatter: separate required from optional but otherwise keep ordering
the same, reorder GATK arguments by usage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@764 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:01:25 +00:00
ebanks
34f9820299
update mapping quality score and edit distance attribute for reads when they are cleaned
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@763 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 17:51:31 +00:00
ebanks
57918de753
add the @Requires for this walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@762 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 17:03:12 +00:00
kiran
747521c849
Fixed the simplest of typos.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@761 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 16:00:30 +00:00
kiran
e48078b476
Updated to reflect change to BasecallingReadModel constructor.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@760 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 15:43:26 +00:00
kiran
505f588768
Forgot to say that the mate is unmapped too. This is necessary to prevent SAM-JDK from yelling at me about an invalid SAM file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@759 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 15:38:51 +00:00
hanna
96e73e496a
Delete deprecated old-school traversals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@758 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 14:57:17 +00:00
aaron
b840dd1320
Added some code to change the instrumentation for tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@756 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 05:15:27 +00:00
kiran
6c5fbb988b
Now basecalls an entire read (both ends of the pair, barcode... everything) at once. After, RawRead and FourProbRead can be asked to return a specified subset (corresponding to the ranges specified for each end of the read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@754 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 00:09:20 +00:00
kiran
e293d65ede
Refactored to allow the user to specify the range of cycles they wish to call. Simply specify a single range (i.e. '0-75') or two ranges ('0-75,76-151'). This allows single and paired-end read processing to coexist happily. Also implements annotation of an aligned bam file (which should hopefully fit in under two gigs now, but I'm waiting on a bug fix or a clarification from the Picard team.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@753 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 00:07:24 +00:00
kiran
08c9f4d86b
Renamed to BasecallingTrainer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@752 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 00:03:46 +00:00
hanna
01a3cb27c7
@Required / @Allows flags for main arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@751 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 23:26:17 +00:00
kiran
40dbc21df7
Moved ParseException to it's own file and made it public.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@750 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 14:42:44 +00:00
hanna
ff798fe483
Reintroduce support for interval-based traversals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@749 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 22:54:18 +00:00
jmaguire
3441795d9c
better handling of edge cases (zero coverage, reference mistakes, etc.)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@747 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 18:04:37 +00:00
kiran
7c615c8fb0
Some changes to the system for annotating a pre-aligned bam file. Doesn't fit within 2gigs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@746 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 17:42:08 +00:00
asivache
a39c8839c8
print percentage sign!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@745 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 14:38:20 +00:00
hanna
c10741e9f5
Rename TraverseLociByReference to TraverseLoci to represent its new function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@743 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 01:31:57 +00:00
hanna
e6ce80c8e3
Fix for GSA-44...don't throw exception when user specifies -h.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@742 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 00:42:00 +00:00
hanna
d35e20ce21
Better error checking for missing .dict file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@741 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 21:57:12 +00:00
hanna
7161b8f927
Disable support for short name values directly abutting their arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@740 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 16:09:32 +00:00
hanna
d152c2b911
New GATKArgumentCollection caused a subtle bug with argument grouping and the help system. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@738 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 14:54:25 +00:00
jmaguire
94e324b844
Write N for the alt allele when we're hom-ref.
...
Stop EM loop when we've converged (likelihood[t-1] == likelihood[t]).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@737 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 13:58:11 +00:00
kcibul
bd53bc18f9
added new required annotations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@736 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 12:24:06 +00:00
kiran
28bf7ec8ad
Aesthetic cleanup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@735 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 04:09:23 +00:00
kiran
a0464633fd
Whoops. Changed denominator from reads to bases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@734 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 03:42:25 +00:00
kiran
5d60efc498
Factored out some simple stats accumulation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@733 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 03:37:57 +00:00
ebanks
81fac73c01
LOD checks for normal and brute force versions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@732 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 02:56:03 +00:00
jmaguire
527df6e57b
Massive speed-up, clean-up and tabular output.
...
This program is going to rule.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@731 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 16:52:40 +00:00
jmaguire
3b57a35009
don't be tricked by multiple read groups with the same sample id!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@730 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 15:28:55 +00:00
jmaguire
947bac5cdc
vast speedup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@729 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 15:27:58 +00:00
kiran
6f1559bd77
Cleaned up a bit. Added some documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@728 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:22:24 +00:00
hanna
2c4de7b5c5
Switch TraverseByLoci over to new sharding system, and cleanup some code in passing read files along
...
the pathway from command line to traversal engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@727 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:02:12 +00:00
ebanks
f33f3c0434
added LOD threshold for determining when to clean
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@725 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:23:59 +00:00
aaron
99d4ebc26d
Added functionality to return the final accumulator of a traversal, so external tools can get the result of a walker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@724 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:20:27 +00:00
kiran
dae77bf14a
Fixed a typo in a comment.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@723 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:07:31 +00:00
kiran
bfc40f54f0
Nicer output when training off of perfect reads. Not that that works yet...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@722 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:07:08 +00:00
kcibul
d1f3000afa
bed-style output for IGV
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@721 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 17:58:44 +00:00
kiran
36db44620b
Improved output. Can optionally limit the number reads actually called.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@720 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 00:07:57 +00:00
depristo
7834b969b4
Better interface to the tabular ROD, now makes writing files easier. Also has corresponding test files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@719 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 23:20:11 +00:00
aaron
50f32b7f61
Added a shard strategy for the reduce-by-interval traversals. Also fixed bugs that I found along the way.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@718 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:20:18 +00:00
depristo
0f8e6061b6
Simple interface improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@717 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:08:09 +00:00
depristo
8e9e2f4502
Revised ROD system. Split the system in Basic type and interface. Enabled more control over rod accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular rod, which has a named columns and supports a map<String,String> interface. Comes with shiny new Junit system for RODs. Also, added simple python script for accessing picard data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@716 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:06:28 +00:00
hanna
67293168e7
Support periods in sequence names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@715 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 20:17:57 +00:00
jmaguire
641afc4e76
fix a crash in the event that the input file has no read groups!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@714 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 19:27:41 +00:00
aaron
d8c1b010f1
Fixing the naming of the function I checked in earlier.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@713 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 19:27:10 +00:00
kiran
5858f20902
Documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@712 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 18:58:43 +00:00
kiran
68c9455c0f
Moved the base complement method to BaseUtils.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@711 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 18:57:48 +00:00
kiran
3761c0900b
Added Bustard vs. Four-prob percent bases consistent output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@710 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 18:01:41 +00:00
ebanks
7a1f85ff86
option to print out the indels found by the cleaner to a file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@709 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:50:08 +00:00
ebanks
b62bddee42
The header was never being set.
...
Added this hack for now and will alert the authorities ASAP...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@708 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:18:51 +00:00
kiran
959cf09d4b
Removed some debugging print statements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@707 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:12:42 +00:00
kiran
2f42a643a8
A new, much simpler (and now, complete) driver program for four-base probs. Serves as a model for anyone who wants to write their own driver program that trains and calls with data from a different source than the raw Illumina data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@706 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:58:22 +00:00
kiran
5824dea0c1
Trains and calls a read at a time rather than a base at a time (which, given it's name, it should have done in the first place)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@705 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:57:00 +00:00
kiran
e4770885fd
The four-probs for all bases in a single read. Some utility functions for generating the primary and secondary base strings, as well as generating the SQ tag byte array in a manner that's consistent with the Bustard base calls (meaning the primary Bustard call and the secondary Four-Prob call are not permitted to be the same).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@704 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:55:49 +00:00
kiran
fdd123fe16
A parser the raw Illumina data. Allows one to arbitrarily jump from one tile to another.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@703 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:53:07 +00:00
aaron
7aa90757ac
Moved the iterators over to the StingSAMIterator interface. This will help us ensure that iterators that need to be closed get closed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@702 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:52:18 +00:00
kiran
6d98234555
Holds raw intensities, sequence, and quality scores.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@701 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:52:03 +00:00
kiran
241de0b235
A class that implements multiple training strategies and presents the training data in a common form.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@700 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:51:29 +00:00
kiran
64c65c7751
New methods to generated compressed SQ quality elements in line with the SAM spec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@699 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:50:31 +00:00
aaron
c3b2c66911
The GATK doesn't need the rest
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@698 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:20:45 +00:00
aaron
0215905bb6
Added an adapter class, that will adapt plain iterators and closeable iterators of SAMRecords into STingSAMIterators. Also unit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@697 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 15:17:32 +00:00
ebanks
5dda448ae0
1. Add printouts for the cleaner
...
2. First pass at the entropy interval walker (still needs work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@696 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 13:59:48 +00:00
hanna
80c13f7127
Added a getter for command-line arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@695 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 13:55:52 +00:00
hanna
307c6e4ecf
Oops. Forgot to add new file to svn.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@694 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 00:52:30 +00:00
hanna
d14cab0be7
Added IterableLocusContextQueue and test. Cleaned up tests, adding BaseTest where it didn't exist. Enhanced test runner to run only classes ending in ...Test.java, so that utility classes can sit alongside the tests but won't be run by JUnit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@693 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 21:32:05 +00:00
asivache
7b59f63f12
and don't forget to close sam writer after we are done...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@692 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 20:46:36 +00:00
asivache
de0cce87ea
new optional arg added that allows to specify a separate bam file to send all piles that fail to realign to; plus minor fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@691 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 20:24:23 +00:00
hanna
12ae3a22b6
Break locus context data access providers into modular components in preparation for traverse by loci.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@689 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 18:51:16 +00:00
jmaguire
7084ecdeb6
a few changes; checked in to allow debugging.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@688 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 15:50:48 +00:00
depristo
5b47c5ab6c
fixing kiran's busted build
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@686 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 21:29:04 +00:00
kiran
4f2c8bf0a3
Fixed an import statement that broke when all the files were moved to this directory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@685 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:43:16 +00:00
kiran
cedc4c9ccb
Refactored into oblivion.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@684 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:33:07 +00:00
kiran
4e4767e5de
Moved to org.broadinstitute.sting.secondarybase
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@682 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:26:43 +00:00
kiran
219eb60716
Added newly-required documentation to arguments so that build can complete successfully.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@681 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:26:10 +00:00
kiran
688358190c
Moved secondary base stuff out of playground for the purpose of making it a core utility. Modified package names and imports such that things would build properly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@680 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:24:18 +00:00
kcibul
8079acb1d3
basic step0 implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@679 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:49:39 +00:00
kiran
57ecb7fbf1
Nicer reporting functions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@678 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:48:30 +00:00
hanna
ee99320c83
Removed at Mark's request.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@677 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:48:21 +00:00
kiran
f1de3d6366
Minor tweaks to how probs are supplied.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@676 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:47:41 +00:00
kiran
095dacd154
Experimental refactoring.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@675 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:46:50 +00:00
kiran
758f8aa89b
Experimental refactoring.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@674 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:46:34 +00:00
andrewk
1518f8f9bf
Update training data creation in CovariateCounterWalker to output much smaller files by counting the number of occurences of each data point combination rather than outputting a line for each data point (i.e. each base). Also fixed bug in LogisticRecalibrationWalker where a null SAMHeader was being pulled from a function that is now marked deprecated.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@673 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:23:14 +00:00
ebanks
4c12df372c
Dumb, dumb bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@672 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:21:33 +00:00
aaron
6e69193e3c
Deprecated calls to getSamReader on both the GenomeAnalysisEngine and the TraversalEngine. This call fails in the new style traversals, but it won't disapear until the cut-over to the new traversals is complete.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@671 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 18:52:42 +00:00
ebanks
630066cc0a
1. Merge LocusWindows whose reads overlap.
...
2. Fix bug (we weren't clearing the "to emit" list)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@670 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 17:33:23 +00:00
aaron
9f942fdfa0
Added code to correct the violation of the parsing interface. Now the analysis type resides in the command line arg, but is stored into the argument collection before it's passed to the genomeAnalysisEngine.
...
Also fixed a bug where we'd exception-out if we didn't provide a interval region.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@669 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:33:55 +00:00
jmaguire
c4d89997ca
put in a dummy sample_name so it'll compile
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@668 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:12:42 +00:00
jmaguire
c8d7223789
do pooled calling properly for 1kg
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@667 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:12:13 +00:00
jmaguire
313a6d0fb5
lots of changes to facilitate calling indels and 1kG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@666 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:11:42 +00:00
jmaguire
add7b6cf65
add sample_name to constructor, misc bug fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@665 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:10:17 +00:00
jmaguire
0267ccae7f
add code for computing indel genotype likelihoods
...
make reference lods negative
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@664 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:09:29 +00:00
jmaguire
11723fbcc2
added method indelPileup. Generates a pileup of indel alleles given reads and ofsets (as from a locus walker).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@663 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:08:24 +00:00
hanna
ee9077fc69
LocusIterator iterated through LocusContexts, which was fine until now when we need something
...
that iterates through loci (GenomeLocs). Rename LocusIterator to LocusContextIterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@662 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:54:57 +00:00
hanna
608948210c
Check for a reference before extraction.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@661 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:29:44 +00:00
hanna
32696b13f5
Fixed method override issue with old-style traversals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@660 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:22:18 +00:00
hanna
862b8a6787
intervals_file + genome_loc => intervals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@659 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:04:18 +00:00
hanna
0bca588629
Botched some boolean logic.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@658 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:53:52 +00:00