Eric Banks
64aad67b5f
Fixing dbSNP adaptor for complex indels (wasn)
2011-07-27 16:13:45 -04:00
Mark DePristo
15be383d5b
Merge branch 'master' into rodRefactor
2011-07-27 15:36:49 -04:00
Mark DePristo
38a2518668
Merge branch 'master' into rodRefactor
2011-07-27 15:34:54 -04:00
Mark DePristo
60db6cc836
Warnings for old ROD system use.
...
Removed unused class GATKRODFeature
2011-07-27 12:39:12 -04:00
Mark DePristo
097828a466
ParsingEngine now maintains the list of rodBindings
...
No longer tries to reparse objects to find the right fields
Direct support in RodBinding for getTags()
2011-07-27 11:36:53 -04:00
Mauricio Carneiro
20a3b31b61
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-26 19:29:45 -04:00
Mauricio Carneiro
321afac4e8
Updates to the help layout.
...
*New style.css, new template for the walker auto-generated html. Short description is no longer repeated in the long description of the walker.
*Updated DiffObjectsWalker and ContigStatsWalker as "reference" documented walkers.
2011-07-26 19:29:25 -04:00
Kiran V Garimella
405e521d44
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-26 17:56:48 -04:00
Kiran V Garimella
92a11ed8dc
Updated MD5 for PhaseByTransmissionIntegrationTest
2011-07-26 17:52:25 -04:00
Kiran V Garimella
412c466de6
Bug fix, wherein triple-hets after genotype refinement need to be left unphased, not just prior to refinement
2011-07-26 17:43:43 -04:00
Mark DePristo
81f8e05bfa
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-26 17:35:46 -04:00
Mark DePristo
f6a5e0e36a
Go for global integrationtest path first, if possible.
2011-07-26 17:35:30 -04:00
Matt Hanna
fec495e292
Fix a nasty little bug in the sharding system: if the last shard in contig n
...
overlaps exactly on disk with the first shard in contig n+1, the shards
would be merged together to avoid duplicate extraction. Unfortunately,
the interval overlap filter couldn't handle shards spanning contigs, and
was choosing to filter out reads from contig n+1 which should have been
included.
I'm not completely sure why the BAM indexing code would ever specify that the
end of one chromosome had the same on-disk location as the start of the next
one. I suspect that this is an indexer performance bug.
2011-07-26 15:43:20 -04:00
Mark DePristo
9dfb57168a
RodBinding source is no longer assumed to be a file
2011-07-26 13:59:44 -04:00
Mark DePristo
d0badd5bd6
RodBinding subclassed to VariantContextRodBinding for easy access to VariantContext providing RODs
2011-07-26 13:54:55 -04:00
Mark DePristo
7ab8b53339
Support for List<RodBinding> argument type
2011-07-26 11:37:31 -04:00
Mark DePristo
38969b9783
Prototype of RODBinding @Arguments instead of -B syntax
...
Initial version of RodBinding class.
Flow from walker RodBinding @Arguments -> RMDTriplet (old system) -> GATK engine (standard). Will need refactoring.
2011-07-26 11:09:06 -04:00
Matt Hanna
088fc39308
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 15:54:56 -04:00
Eric Banks
a53aeb75ab
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 15:10:35 -04:00
Eric Banks
a29554e565
Removing the Genomic Annotator and its supporting classes
2011-07-25 15:10:25 -04:00
Mark DePristo
3afcb3415d
Max of 1000 records will be loaded and compared to avoid heap size problem.
2011-07-25 14:58:31 -04:00
Mark DePristo
2a51543693
Actually should have been gone...
2011-07-25 13:27:42 -04:00
Mark DePristo
ebfd8df06c
Restoring accidentally deleted unit test
2011-07-25 13:25:30 -04:00
Mark DePristo
f3049fba63
refdata directory cleanup
...
Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest
Refactored dbSNP and refseq utilities to be closer to the other files implementing these features
2011-07-25 13:21:52 -04:00
Matt Hanna
8014fad6ff
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 13:20:44 -04:00
Matt Hanna
2ac490dbdf
Fix improper detection of command-line arguments with missing values.
2011-07-25 13:20:00 -04:00
Mark DePristo
90947ab359
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 12:53:56 -04:00
Mark DePristo
44bd9ae703
Restoring UninstantiableWalker, as it is not going to be possible to run ant test; ant gatkdocs without ant clean in between
2011-07-25 12:53:06 -04:00
Mark DePristo
acda8eb09c
Commented out test that causes new CommandLineGATK() to fail
2011-07-25 12:43:27 -04:00
Mauricio Carneiro
95b48eface
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable into repval
2011-07-25 12:09:09 -04:00
Kiran V Garimella
357f503a21
Merge branch 'desktop'
2011-07-25 11:36:27 -04:00
Kiran V Garimella
0b43ee117c
Added the required=false tag to the -noST and -noEV arguments so the auto-help output doesn't look weird (i.e. listing arguments as required when their value has already been specified by default).
2011-07-25 11:35:34 -04:00
Kiran V Garimella
bbb8473f03
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 10:59:00 -04:00
Mark DePristo
1a268ff1fd
Refactor so that GenotypeAnnotation and InfoFieldAnnotation share common superclass VariantAnnotatorAnnotation
2011-07-25 10:55:09 -04:00
Mark DePristo
7f8e6a97ee
InfoFieldAnnotation now an abstract class extended by annotations so doc system works
2011-07-25 10:47:11 -04:00
Mauricio Carneiro
4c6c16f895
Documented following the new gatkdoc framework
2011-07-25 00:25:08 -04:00
Mark DePristo
2039ce6102
Default values now displayed in arguments
...
DiffEngine fixed so that newInstance() would work. Pretty quickly encountered a situation where newInstance() failed. Debug output is now written to the log when this occurs.
Logger now used instead of standard out, with INFO the default level.
2011-07-24 22:56:55 -04:00
Mark DePristo
c43b5981f2
Hidden variables are hidden by default. Settable by command line option
...
DiffObjectsWalker test arguments removed.
Minor refactoring of GATKDoclet
2011-07-24 20:52:44 -04:00
Mark DePristo
1c1f1da349
Fixing compilation
2011-07-24 20:01:59 -04:00
Mark DePristo
9f06f6c493
Split GATKDoclet from ResourceBundleDoclet. Refactored GATKDocWorkUnit
2011-07-24 20:00:04 -04:00
Mark DePristo
ff85687679
Merge branch 'master' into help
2011-07-24 18:14:32 -04:00
Mark DePristo
83996f7951
Enumerated types are working.
2011-07-24 18:14:21 -04:00
Mark DePristo
3c34e9fa65
Clean up enums and tables
2011-07-24 17:45:58 -04:00
Mark DePristo
c620d96c96
Inline enum documentation is working
2011-07-24 17:22:14 -04:00
Mark DePristo
793e7d3d1d
Improved header and argument details
...
Argument detail structure cleaned up. Only relevant pieces of information are shown now, and in a cleaner layout.
Misc. cleanup in the code.
2011-07-24 16:36:25 -04:00
Mark DePristo
c6af4efcdc
Implemented see also and version header
2011-07-24 16:10:17 -04:00
Mark DePristo
5e0fe2d0f9
Support for style.css via refactored common.html included in all files
2011-07-24 15:42:39 -04:00
Mark DePristo
d0ab6bf7a9
Now links to sub and superclass documentation, where possible.
2011-07-24 09:56:17 -04:00
Mark DePristo
e2dabb70b8
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-24 08:57:47 -04:00
Mauricio Carneiro
1ef964c92c
Merge branch 'contig'
2011-07-24 02:40:42 -04:00
Mauricio Carneiro
7ffedf211c
Contig comparator -- sorting contigs like Picard
...
This is very useful if you want to output your text files or manipulate data in the usual chromosome ordering:
1
2
3
...
21
22
X
Y
GL???
...
Just use this comparator in any SortedSet class constructor and your data will be sorted like in the BAM file.
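A minimal sketch of how such a comparator might look, assuming the ordering listed above (numeric contigs in numeric order, then X, then Y, then everything else alphabetically). The class name ContigOrderSketch and its helpers are hypothetical and are not the code added in this commit.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; not the comparator from this commit.
class ContigOrderSketch implements Comparator<String> {
    // Rank groups: 0 = numeric contigs, 1 = X, 2 = Y, 3 = everything else.
    private static int group(String contig) {
        if (isNumeric(contig)) return 0;
        if (contig.equals("X")) return 1;
        if (contig.equals("Y")) return 2;
        return 3;
    }

    // True for short, all-digit names that Integer.parseInt can handle safely.
    private static boolean isNumeric(String s) {
        if (s.isEmpty() || s.length() > 9) return false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) return false;
        }
        return true;
    }

    @Override
    public int compare(String a, String b) {
        int ga = group(a);
        int gb = group(b);
        if (ga != gb) return Integer.compare(ga, gb);
        if (ga == 0) {
            int c = Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
            if (c != 0) return c;
        }
        // Same group (or numerically equal): fall back to plain string order.
        return a.compareTo(b);
    }

    // Convenience helper: build a SortedSet that uses this ordering.
    static SortedSet<String> sorted(String... contigs) {
        SortedSet<String> set = new TreeSet<>(new ContigOrderSketch());
        for (String c : contigs) set.add(c);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(sorted("X", "GL000191", "10", "2", "Y", "1", "22"));
    }
}
```

Passing the comparator to the TreeSet constructor, as in sorted(), is the usage the message describes.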
2011-07-24 02:33:19 -04:00
Mark DePristo
6b501e267b
Includes non-concrete classes in docs
...
CommandLineGATK has extraDocs to ReadFilter and UserException as well
2011-07-23 22:15:01 -04:00
Mark DePristo
7420ed098e
Semi-working version of extraDocs tag in annotation to refer to one capability being accessible in another
...
Required a significant refactoring of the GATKDoclet, which now has a unified place where the ClassDoc, class, annotation, and handler are all stored together.
2011-07-23 22:07:30 -04:00
Mark DePristo
999acacfa1
Merge branch 'master' into help
2011-07-23 20:19:33 -04:00
Mark DePristo
1d3bcce2c4
Merge branch 'master' into NoDistributedGATK
2011-07-23 20:04:50 -04:00
Mark DePristo
e262f4e10b
gatkdoc now generalized to use @Annotation. Multiple subsystems now use annotation to receive docs
...
Index expanded to use summary() annotation field
UserExceptions, ReadFilters, GATK engine all use the system to generate docs
Doclet expanded to handle lots of new cases
2011-07-23 20:00:35 -04:00
Kiran V Garimella
0b36b6540f
Merge branch 'laptop'
2011-07-23 01:44:54 -04:00
Kiran V Garimella
e23cb27451
Modified MD5 to account for the triple hets that shouldn't be phased
2011-07-23 01:44:44 -04:00
Kiran V Garimella
1dba8b768c
Merge branch 'laptop'
2011-07-23 01:39:15 -04:00
Kiran V Garimella
57e3d136eb
Don't try to phase triple-hets either.
2011-07-23 01:38:58 -04:00
Kiran V Garimella
f366124778
Merge branch 'laptop'
2011-07-23 01:25:36 -04:00
Kiran V Garimella
45f2ca8d99
Changed MD5 to reflect latest changes to PhaseByTransmission.
2011-07-23 01:21:07 -04:00
Kiran V Garimella
5af9d50183
Merge branch 'laptop'
2011-07-23 01:12:06 -04:00
Kiran V Garimella
5521919cc9
Fixed bug where variants to phase were not being selected properly.
2011-07-23 01:11:28 -04:00
Kiran V Garimella
7da99388ac
Merge branch 'laptop'
2011-07-23 01:01:11 -04:00
Kiran V Garimella
58eed20b83
Copy all entries from the attributes map, rather than attempting to modify an unmodifiable map.
2011-07-23 01:00:46 -04:00
Kiran V Garimella
b5deff48e6
Merge branch 'laptop'
2011-07-23 00:56:50 -04:00
Kiran V Garimella
5638017137
Removed the nofilters argument specification in the integrationtest
2011-07-23 00:56:23 -04:00
Kiran V Garimella
ffa361f57f
Merge branch 'laptop'
2011-07-23 00:50:38 -04:00
Kiran V Garimella
9417ba8c2c
Modified to accept multi-sample VCFs, removed the application of filters, and changed transmission probability field to be a genotype field rather than an INFO field.
2011-07-23 00:48:26 -04:00
Mark DePristo
28b9432d26
Docs for read filters, the engine, and the UserExceptions.
2011-07-22 16:09:21 -04:00
Kiran V Garimella
051c1dc639
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-22 15:59:00 -04:00
Mark DePristo
f0be7348be
Generalized handler to allow it to be used with any arbitrary class structure.
...
DocumentedGATKFeature now includes a field for the group name.
Build.xml works with public / private now.
2011-07-22 14:07:40 -04:00
Matt Hanna
f50145b872
Reinitialize random seed in the bwa bindings from the fixed seed stored in the
...
BWA support files every time the support files are loaded.
2011-07-22 13:41:53 -04:00
Mark DePristo
453954182e
Generalized the documentation system to use a class-specific annotation and processor.
...
Need to generalize and bug fix the system. But at a high level it's working now.
2011-07-22 13:18:33 -04:00
Kiran V Garimella
b8a0fd2a8d
Multiply fractionRandom by 100.0 so that the line that indicates the percentage of variants that will be output says (for instance) 90%, not 0.9%
2011-07-22 11:54:59 -04:00
Mark DePristo
9e88d51db9
Removed now unused @version tags from walker docs.
2011-07-22 09:57:03 -04:00
Mark DePristo
421b70ca4f
Removed previous, and largely unused, help system extensions.
...
This involved deleting the utils/help/*Taglet.java classes, which parsed out these fields unnecessarily
This also involved removing the few uses of these from the codebase. For these uses, though, almost all were an identical copy of the first line of the docs, which is the default javadoc behavior anyway.
2011-07-22 09:42:44 -04:00
Mark DePristo
172b35372b
Moved all of the distributed GATK code to archive.
2011-07-22 09:20:32 -04:00
Khalid Shakir
8b8f121cfb
Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-21 23:01:11 -04:00
Khalid Shakir
59eb1f4663
Memory limits changed from Int to Double.
...
Updated LSF calls to read memory units from config along with tweaks to select hosts.
Moved some common code from GridEngine and LSF to super classes.
2011-07-21 22:57:18 -04:00
Mauricio Carneiro
8d7ef1bb51
Complete refactor of the ReplicationValidation framework, plus the following new functionality:
...
* merges all pools in a lane.
* merges all lanes in a site.
2011-07-21 21:39:00 -04:00
Mark DePristo
81d0cab27e
Walker index html now emitted.
2011-07-21 16:01:54 -04:00
Mark DePristo
e892489696
V2 of the document system.
...
Now uses GATKDoc class to organize documentation for arguments.
Arguments now listed by feature (required, optional, hidden, etc) and link to detailed information about the argument in the html
Lots of code moving between Class and ClassDoc objects. Should be refactored into a single static utility class.
2011-07-21 15:20:34 -04:00
Christopher Hartl
2f5d10d16b
Fix bug wherein aligner could be closed prior to its being used to lowercase sequences.
2011-07-21 13:21:48 -04:00
Matt Hanna
7054c5342f
When using the BWA bindings, you have to explicitly call close() to get the
...
bindings to release memory.
It may or may not be possible to close implicitly when triggered by the GC; I'll add a JIRA.
2011-07-21 12:13:29 -04:00
Christopher Hartl
15610ce0c3
Per Matt's request, disabling BWA-based integration tests so he can assess bamboo memory usage.
2011-07-21 11:04:22 -04:00
Mark DePristo
6fa17d86ae
Completely hacked together version of a FreeMarker + javadoc + custom doclet walker documentation generator
2011-07-21 00:18:07 -04:00
Mark DePristo
45c73ff0e5
Runs and emits an HTML document
2011-07-20 17:16:33 -04:00
Mark DePristo
d31b176e15
Removed GATK use of distributed parallelism framework.
...
Moved distributed GATK prototype code into distributedutils, separating from threading package
2011-07-20 16:26:09 -04:00
Guillermo del Angel
0a1d2df8cb
Merged bug fix from Stable into Unstable
2011-07-20 13:19:35 -04:00
Guillermo del Angel
f15023b7d2
Bad bug fix: output GLs in multiallelic records were in incorrect order (misread spec)
2011-07-20 12:10:48 -04:00
Guillermo del Angel
b9c9e0e952
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-20 10:45:16 -04:00
Guillermo del Angel
7140280bf6
Further bug fixes/cleanups for PrintReadsWalker
2011-07-20 10:44:37 -04:00
Guillermo del Angel
a2d90a3590
Bug fix: reverted logic so that default behavior skips over sample lookup
2011-07-20 10:23:10 -04:00
Guillermo del Angel
e8409c80fa
Further protection vs null pointers in PrintReadsWalker
2011-07-19 21:59:24 -04:00
Christopher Hartl
5d706c9e92
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
...
Removing PSP and CSM
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/sequenom/CreateSequenomMask.java
public/java/src/org/broadinstitute/sting/gatk/walkers/sequenom/PickSequenomProbes.java
2011-07-19 20:25:33 -04:00
Guillermo del Angel
fb2d475c22
Bug fix to prevent null pointer
2011-07-19 20:13:56 -04:00
Christopher Hartl
92c7cfa1c8
BWA bindings and tests moved to public (was required for ValidationAmplicons)
...
Integration tests for ValidationAmplicons. New argument to disable BWA, lowercase letters only for repetitiveness instead.
2011-07-19 20:11:31 -04:00
David Roazen
baae381acb
Revert "Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable"
...
This reverts commit 039a6bb01f345322ce2be50ae3634308bb24e77e, reversing
changes made to b9c9973d1c638dfc9f8c19b5eb845e99844f9d29.
2011-07-19 18:38:53 -04:00
Christopher Hartl
07e716d23a
PickSequenomProbes2 expanded functionality: lowercasing based on sequence uniqueness, preserving reference base prior to indel (not a part of the VC as I thought it was), masking deletion bases with 'N's, flanking insertion with 'N's, output is a fasta formatted file. Renamed to ValidationAmplicons since this is really not for picking sequenom probes, but for generating amplicon sequence from which other applications (like sequenom) can choose PCR primers. Moved from private to public.
2011-07-19 15:21:47 -04:00
Guillermo del Angel
6181d1e4cb
Fixed integration test for VariantsToTable: now the * in REF column is not output
2011-07-19 14:42:11 -04:00
Guillermo del Angel
e6d306458c
Merge bug fixes
2011-07-19 14:36:20 -04:00
Guillermo del Angel
989dd17f95
a) Add ability in PrintReads to specify a sample file to easily subset samples, useful for IGV visualization, b) VariantsToTable is more R-friendly with Indels when printing ref/alt columns, c) Changes to SelectVariants' ability to specify a mask to randomly sample from a given AF distribution
2011-07-19 14:29:07 -04:00
Mark DePristo
8f0badc52b
Updating md5s, as the diffobjects walker now emits the summary in reverse order.
2011-07-18 15:44:21 -04:00
Mark DePristo
c05451047c
Support for multiple records at the same site. The first record gets chr:start, and subsequent records get chr:start_2, chr:start_3, etc.
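The naming scheme described above could be sketched as follows. This is an illustration under assumptions, not the code from this commit: the class SiteKeySketch and the method nextKey are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the implementation from this commit.
class SiteKeySketch {
    // How many records have been seen so far at each chr:start.
    private final Map<String, Integer> seen = new HashMap<>();

    // The first record at a site gets chr:start; later ones get chr:start_2, chr:start_3, ...
    String nextKey(String chr, int start) {
        String base = chr + ":" + start;
        int count = seen.merge(base, 1, Integer::sum);
        return count == 1 ? base : base + "_" + count;
    }

    public static void main(String[] args) {
        SiteKeySketch keys = new SiteKeySketch();
        System.out.println(keys.nextKey("chr1", 100));
        System.out.println(keys.nextKey("chr1", 100));
        System.out.println(keys.nextKey("chr1", 100));
    }
}
```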
2011-07-18 15:43:52 -04:00
Mark DePristo
782a05e9b5
Support for sorting the diff output in reverse order.
2011-07-18 15:43:01 -04:00
Mark DePristo
45702d3084
Now supports a mode where the primary key isn't sorted. In this case the records are displayed in the order in which they are added to the table.
2011-07-18 15:40:15 -04:00
Eric Banks
83ba2c066a
Making it deterministic
2011-07-18 13:59:02 -04:00
Eric Banks
92fa410450
Check that it's a valid bam file before parsing or bad things can happen
2011-07-18 13:43:34 -04:00
Eric Banks
80b5c5261a
CombineVariants no longer combines records of different types. So now when combining SNP and indel callsets, overlapping calls get their own records. Useful for Khalid in the pipeline. For those interested, it turns out the previous behavior was doing the wrong thing occasionally (and this was even captured in the integration tests).
2011-07-18 13:42:45 -04:00
Eric Banks
bc8b5da698
Added docs while I was reading through the code to understand it
2011-07-18 12:25:54 -04:00
Mark DePristo
51b0dd01c3
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-18 10:47:29 -04:00
Mark DePristo
d6e2e89f99
Walker test system refactoring. All MD5DB related functions are now in MD5DB.java.
...
System has the concept of a local and a global MD5 db. The local one is like it operated previously. The global one lives in /humgen/gsa-hpprojects/GATK/data/integrationtests. If the system can find this directory then MD5s will also be read / written to this location. This means that gsabamboo will print differences as appropriate. And all users will in effect have access to a complete history of MD5 file results.
A few minor code reshuffles changed VariantRecalibration and VCFHeader test files.
2011-07-18 10:46:01 -04:00
Mark DePristo
6f26c07b85
Removed the SpecificDifference class. Now Difference classes always have the option to remember specific master and test values. This means that all summarized differences carry with them specific examples of their differences. Consequently, now even summarized differences give at least one example of the specific difference, even when the count of the difference is > 1. Unit tests updated. Added DiffObjects integrationtest. VCFDiffableReader now specifically reads the first line of the VCF file to capture the version number.
2011-07-18 10:42:35 -04:00
Kiran V Garimella
b2b7d27fed
Merge branch 'laptop'
2011-07-18 00:25:46 -04:00
Kiran V Garimella
497721a799
Added class documentation string.
2011-07-18 00:25:21 -04:00
Kiran V Garimella
ac9c66138d
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-18 00:20:33 -04:00
Kiran V Garimella
824100e57f
Corrected typo in MergeAndMatchHaplotypes integration test
2011-07-17 22:50:54 -04:00
Kiran V Garimella
8167aba601
Moved (poorly named) MergeAndMatchHaplotypes to public. Added integration test
2011-07-17 22:47:32 -04:00
Kiran V Garimella
afb506e128
Added MD5s for PhaseByTransmission integration tests
2011-07-17 21:55:33 -04:00
Kiran V Garimella
558e197989
Integration test for PhaseByTransmission
2011-07-17 21:25:08 -04:00
Mark DePristo
9992c373be
Optimize imports run on the whole project, public and private. I just got too tired of all of the unused imports floating around. Confirmed that the system builds after the changes.
2011-07-17 20:29:58 -04:00
Kiran V Garimella
4ea433f8e1
Moved PhaseByTransmission to public
2011-07-17 19:42:00 -04:00
Mark DePristo
9ca9cf52ac
Uncommenting a stray commented test.
2011-07-17 15:38:33 -04:00
Mark DePristo
4db2b13e9e
Rev tribble.
...
Just added more documentation for diffEngine and pointer to new wiki:
http://www.broadinstitute.org/gsa/wiki/index.php/DiffEngine
2011-07-17 13:05:04 -04:00
Mark DePristo
92a1c0c278
Moved the varianteval/tags/DataPoint.java and varianteval/tags/Analysis.java to varianteval/utils. This allows rsync to see these files with the -C option, as tags is some kind of reserved CVS keyword.
2011-07-17 10:14:23 -04:00
Mark DePristo
eacf205f40
Tests needed to be updated to reflect the code reorg of tribble.
2011-07-16 09:22:34 -04:00
Menachem Fromer
72f4cf9c0e
Walker to perform deterministic annotation of phasing by transmission (to be compatible with RBP's definition of consecutive pairwise phasing)
2011-07-15 17:44:31 -04:00
Guillermo del Angel
9d59c2cb61
a) Made indel VQSR consensus script operational again, b) Made VariantsToTable more indel-friendly when printing out REF and ALT fields: strip out * from REF and print out alleles in the same way as the VCF so that offline processing is easier
2011-07-15 10:13:02 -04:00
Guillermo del Angel
10cf9245d7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-14 19:18:05 -04:00
Mark DePristo
c0bbeb23ba
Now providing more information when the index on the fly isn't equal to the one created by reading the file from disk.
2011-07-14 15:12:28 -04:00
Mark DePristo
5ffeddd3b1
better to use _ instead of ., as this is a special case later.
2011-07-14 14:45:16 -04:00
Eric Banks
9540df6998
Oops, forgot to update unit test
2011-07-14 14:00:19 -04:00
Eric Banks
ed6beae1f3
Adding headers to diffable reading for VCFs
2011-07-14 13:55:35 -04:00
Eric Banks
66c652d687
Added some extra error checks in the VCF codec. Now that we've moved this back into the GATK, changed some of the standard exceptions to be UserErrors (instead of TribbleExceptions).
2011-07-14 11:56:10 -04:00
Eric Banks
0c54c796ed
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-13 14:57:33 -04:00
Eric Banks
bb0e3a26fc
Added integration test for VCF writing. Also, bug fix for writing the GT-free records.
2011-07-13 14:57:21 -04:00
Eric Banks
6a431da554
Don't output source and ref header lines anymore. Short-term motivation for this is that I'd like this tool when run on a VCF to emit the exact same VCF. Long-term motivation is that these tags should be output by the VCF writer itself for all tools.
2011-07-13 14:40:01 -04:00
Menachem Fromer
74aa49e423
Merged bug fix from Stable into Unstable
2011-07-13 12:12:42 -04:00
Menachem Fromer
fa3ff53508
Filters should only be applied to the new VC if the old VC had filters applied
2011-07-13 11:58:16 -04:00
Eric Banks
969227c657
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-13 10:01:28 -04:00
Eric Banks
797c50e689
Fixing integration tests I broke yesterday; removing batch merging test since we don't support that anymore.
2011-07-13 10:01:23 -04:00
Eric Banks
6007eea3ff
Allowing VCF records without GTs in VCF 4.1
2011-07-13 09:56:08 -04:00
Guillermo del Angel
1e81d521c0
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-12 20:12:29 -04:00
Ryan Poplin
837fb8f689
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-12 15:39:26 -04:00
Ryan Poplin
5077c94d85
Adding MappingQualityUnavailableReadFilter to the SNP and indel CountCovariates
2011-07-12 15:39:07 -04:00
Mark DePristo
01fd6a6949
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-12 15:20:44 -04:00
Mark DePristo
ccedd6ff4c
Difference is now the general form -- used to be SummarizedDifference. The old Difference class is now a subclass of Difference that includes pointers to the specific master and test DiffElements.
...
Added a size() function that calculates the number of elements in the tree rooted at a DiffElement.
2011-07-12 15:20:28 -04:00
Eric Banks
a2597e7f00
This commit incorporates several different changes that each pretty much break all the VCF-based integration tests, so I bunched them all together. We now officially emit VCF4.1 files (woo hoo), which means that the VCF headers are now all different (header version is 4.1 plus counts for some of the annotations are 'A' or 'G'). Also, I've added a Read Filter for reads with MQ=255 ('unavailable' in the SAM spec) and have applied this to the UG and the RMS MQ annotation.
2011-07-12 14:11:53 -04:00
Ryan Poplin
329c3d8050
Merged bug fix from Stable into Unstable
2011-07-12 13:55:51 -04:00
Ryan Poplin
73735863b0
Fix for the case of requesting genotype for a sample that doesn't exist in a VariantContext
2011-07-12 13:55:21 -04:00
Guillermo del Angel
c4c145afb9
Merged bug fix from Stable into Unstable
2011-07-12 13:44:48 -04:00
Guillermo del Angel
cfe43e3971
Bug fix for Genotype given alleles: if we are in INDEL mode ignore SNPs and MNPs instead of emitting an empty site with alleles but no annotations
2011-07-12 13:43:46 -04:00
Guillermo del Angel
bfbca8b194
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-12 12:11:58 -04:00
Mark DePristo
05212aea62
reader now takes an argument for the maximum number of elements to read from the file.
2011-07-12 08:53:19 -04:00
Mark DePristo
8056a3fe89
getElement() now uses O(1) get from hash instead of linear O(n) search. Enables us to read large files easily.
2011-07-12 08:52:31 -04:00
Mark DePristo
f313e14e4e
Now deletes the dump directory on ant clean
...
Moving diffengine tests from private to public
2011-07-12 08:50:58 -04:00
Eric Banks
d7d15019dd
Adding support for other simple header line types (e.g. ALT) and cleaning up the interface a bit.
2011-07-12 01:16:21 -04:00
Eric Banks
400b0d4422
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-11 23:38:57 -04:00
Mark DePristo
d5056ad899
Merge branch 'master' into diffit
2011-07-11 23:16:15 -04:00
Mark DePristo
893cc2e103
Making the package public, so there are no dependencies from public -> private
2011-07-11 23:15:08 -04:00
Mark DePristo
5e593793af
DiffEngine utility function simpleDiffFiles
...
printSummaryReport now uses GATKReport for nice formatting
Moved print formatting arguments into inner class provided to printing functions themselves, not the class
BAMDiffableReader only reads 1000 entries to avoid performance issues. Workaround for BAM files with non-unique names
Uncommented all of the incorrectly commented out CombineVariants integrationtests
BaseTest now uses DiffEngine to provide inline differences to VCF and BAM files
2011-07-11 23:10:27 -04:00
Eric Banks
e3748675db
Support for VCF 4.1 header counts
2011-07-11 17:40:45 -04:00
Guillermo del Angel
f54c2ae3b4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-11 16:26:27 -04:00
Christopher Hartl
d6517adb42
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-11 16:16:37 -04:00
Christopher Hartl
86890c6357
N and K (in binomial probability) got switched in RFA Walker with the last commit. No longer will NaNs be produced.
...
Added: TableToVCF. Kind of a longer-term project, but there are lots of variant calls available in a weird tabular format. I used this to convert Ju et al. small indels to VCF. I'll check against the 1000G ASN superpopulation calls to see if we see a good amount of recapitulation, and if so, I'll put them in unvalidated comparisons. Minor changes to the TableCodec and TableFeatures to allow for this (the codec can sometimes drop a column, and the feature now allows you to grab on to its header).
2011-07-11 16:16:15 -04:00
Guillermo del Angel
d587856f2d
Private feature to input a list of family descriptions from a file and to look for MV's on all of these. Feature can also output a detailed description of the violation into a separate file
2011-07-11 14:17:59 -04:00
Guillermo del Angel
6e7b5e1e7a
Merged bug fix from Stable into Unstable
...
Merge branch 'master' into unstable
2011-07-08 21:19:45 -04:00
Guillermo del Angel
7fbc5987d0
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-07-08 21:17:32 -04:00
Mark DePristo
bd29236684
Merge branch 'master' into diffengine
2011-07-08 14:08:17 -04:00
Guillermo del Angel
224574424e
Bug fix: if we're genotyping a very long indel (>100 bp) fail gracefully instead of with an array out of bounds exception
2011-07-08 12:48:49 -04:00
Ryan Poplin
2a4b3ae4a2
Cleaning up / removing most of the monkeying around with annotation values that happens in VariantDataManager
2011-07-08 12:48:33 -04:00
Mark DePristo
8add2a3866
Merge branch 'master' into diffengine
2011-07-08 09:15:54 -04:00
Eric Banks
cc143493e3
Merged bug fix from Stable into Unstable
2011-07-07 23:01:24 -04:00
Eric Banks
4cfe0dd857
Test for bad alleles so that we don't generate IndexOutOfBoundsExceptions
2011-07-07 23:01:03 -04:00
Mark DePristo
3d4f0e9dd7
Now supports the case where you have multiple AC values in the info field.
2011-07-07 17:21:15 -04:00
Ryan Poplin
212e9a1a0c
Fixing unstable build after stable commit
2011-07-07 15:18:57 -04:00
Ryan Poplin
11d9a0473a
Merged bug fix from Stable into Unstable
2011-07-07 15:03:58 -04:00
Ryan Poplin
50111db2b7
Fixing non-determinism in single-threaded VQSR by moving references to cern.Normal over to the static random generator available in GenomeAnalysisEngine
2011-07-07 15:02:48 -04:00
Guillermo del Angel
4d565b0811
Merge branch 'incoming'
2011-07-07 06:21:05 -04:00
Guillermo del Angel
55c8c05060
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-07 06:18:29 -04:00
Guillermo del Angel
5ab2e83904
a) Cosmetic modifications to IndelType annotation. b) Add ability to select samples from a file in PrintReads, c) fixes to shaped AF random selection in SelectVariants
2011-07-07 06:15:10 -04:00
Mark DePristo
ccf34f7e45
(1) Added very useful helper class TestDataProvider to BaseTest that makes creating data providers for TestNG far easier
...
(2) DiffEngine now officially working with summaries. Extensive UnitTests all around!
2011-07-06 21:57:22 -04:00
Eric Banks
52f6f9fdcc
Merged bug fix from Stable into Unstable
2011-07-06 16:05:48 -04:00
Eric Banks
54121eb082
Catch malformed bams that cause the writer to run in infinite loops
2011-07-06 16:05:08 -04:00
Eric Banks
76a01a7453
Merged bug fix from Stable into Unstable
2011-07-06 12:53:09 -04:00
Eric Banks
14fee4ccbd
Patch from Bob to deal with symbolic alleles: these weren't getting padded but they should be.
2011-07-06 12:51:44 -04:00
Ryan Poplin
bdef233d4d
Merged bug fix from Stable into Unstable
2011-07-06 10:05:02 -04:00
Ryan Poplin
e8ed6b7f0f
Adding more comments to main VQSR walker. Fixing copyright lines. Bug fix for default paths to now point to public/R/ instead of R/. Bug fix in VQSR for the path to the R scripts not ending in a slash.
2011-07-06 10:01:14 -04:00
Guillermo del Angel
8e8b901d12
Merged bug fix from Stable into Unstable
...
Merge branch 'master' into unstable
2011-07-06 09:57:55 -04:00
Guillermo del Angel
81a4d18468
Mark several indel-related arguments as @Hidden
2011-07-06 09:56:38 -04:00
Guillermo del Angel
9124c84a7c
bug fixes
2011-07-04 21:10:44 -04:00
Guillermo del Angel
bb85f232b9
bug fixes
2011-07-04 21:04:49 -04:00
Guillermo del Angel
f26ffeaea0
bug fixes
2011-07-04 20:48:45 -04:00
Guillermo del Angel
04df153f47
bug fixes
2011-07-04 20:45:10 -04:00
Guillermo del Angel
7a04872a3f
bug fixes
2011-07-04 20:33:59 -04:00
Guillermo del Angel
08bc843d4c
SelectVariants can get a table to boost AF when choosing randomly
2011-07-04 20:23:22 -04:00
Guillermo del Angel
fac082de64
Report only highest AF and AC in multiallelic records in VariantsToTable or else R can't parse table
2011-07-03 14:32:12 -04:00
Guillermo del Angel
abe9480c6d
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 21:19:15 -04:00
Ryan Poplin
fb315b5f8c
Merge branch 'incoming'
2011-07-02 18:10:48 -04:00
Ryan Poplin
41d46059e7
fixing bad format statement
2011-07-02 18:09:17 -04:00
Ryan Poplin
3804afeb8a
Merge branch 'incoming'
2011-07-02 17:55:39 -04:00
Ryan Poplin
781c0c33a4
Use the worst X% of calls in addition to the bad training sites list. Don't include the already added calls in the calculation of X%
2011-07-02 17:55:10 -04:00
Ryan Poplin
6b8af6afd8
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 17:15:56 -04:00
Ryan Poplin
fdc2ebb321
Adding ability to specify in VQSR a list of bad sites to use when training the negative model. Just add bad=true to the list of rod tags for your bad sites track.
2011-07-02 17:15:13 -04:00
Guillermo del Angel
09af6bbc6c
Ugh - backed out experimental code, not for public consumption, that was unintentionally committed
2011-07-02 16:58:57 -04:00
Guillermo del Angel
c6c0dba040
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 16:45:34 -04:00
Ryan Poplin
4532a84314
Merged bug fix from Stable into Unstable
2011-07-02 10:48:55 -04:00
Ryan Poplin
5faf40b79d
Moving AnalyzeAnnotations into the archive because it has outlived its usefulness.
2011-07-02 10:39:53 -04:00
Ryan Poplin
17ff5bb094
Variant records coming out of the VQSR are now annotated with which input annotation was most divergent from the Gaussian mixture model. This gives a general sense for why each variant was removed from the callset.
2011-07-02 09:55:35 -04:00
Khalid Shakir
c65e52f88a
Merged bug fix from Stable into Unstable
2011-07-01 20:50:56 -04:00
Khalid Shakir
b6bc64a0c8
Cleanup of the utils.broad package.
...
Using Picard IoUtils on sample names.
2011-07-01 20:47:03 -04:00
Eric Banks
0c9105ca22
Minor fix of description
2011-07-01 18:07:35 -04:00
David Roazen
d647ea4fdc
Long-delayed change to CachingIndexedFastaSequenceFile. Made the cache
...
non-static to avoid problems when multiple references are used within the same
thread (e.g., during integration tests). This should kill the intermittent
IndelRealignerIntegrationTest failures.
2011-07-01 16:04:30 -04:00
Eric Banks
761347b8d5
The VariantContext utility method used by SelectVariants wasn't checking the filter status (unfiltered vs. passing filters) and always returned a VC that was passing filters. This is fixed and the md5 from the VCF Streaming test has been re-updated.
2011-06-30 15:26:09 -04:00
Mauricio Carneiro
867056af51
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-06-30 15:03:18 -04:00
Mark A. DePristo
defa3cfe85
Moved around private walkers into appropriate directories in private gatk.walkers. Moved a few public walkers into private qc package, and some private qc walkers into the public directory. Removed several obviously broken and/or unused walkers.
2011-06-30 14:59:58 -04:00
Mauricio Carneiro
2cb1376ed0
VCFStreaming was failing integration tests because SelectVariants now outputs the samples in alphabetical order instead of in random order as before. Fixed the MD5.
2011-06-30 14:55:39 -04:00
Eric Banks
804d5f22d5
Reverting previous change, as promised.
2011-06-30 13:18:30 -04:00
Eric Banks
9e234cf5d6
This is a temporary commit for Picard. It will absolutely break integration tests, but I'm going to revert it in 1 minute. Because we don't want them in unstable, I need to push this into stable.
2011-06-30 13:17:14 -04:00
Eric Banks
352c38fc0b
Updated to reflect dbsnp conversion fix
2011-06-30 11:55:56 -04:00
Guillermo del Angel
331b47afbd
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-06-30 08:29:11 -04:00
Guillermo del Angel
50c32ce52e
VariantsToTableFix
2011-06-29 21:39:53 -04:00
Guillermo del Angel
9b134f3b96
VariantsToTableFix
2011-06-29 21:33:41 -04:00
David Roazen
f18fffd625
Fixing broken paths to the testdata directory throughout the codebase.
2011-06-29 17:36:47 -04:00
Guillermo del Angel
2b88033ef4
Enable considering 454 reads, just lower GOP by 15
2011-06-29 16:12:55 -04:00
Guillermo del Angel
dc4f63a1a8
a) consensus goes to week queue
...
b) New experimental TechnologyComposition annotation
c) SelectVariants fixes
2011-06-29 16:00:23 -04:00
Eric Banks
70ba851478
Might as well check for the illegal state and throw an exception
2011-06-29 15:59:10 -04:00
Eric Banks
1f19afe1d9
Fixed bug in the IndelRealigner: now that variants are correctly typed in VariantContext, it is possible that a variant can be an indel but neither an insertion nor a deletion; added an isComplexIndel() method and now we check for such an event in the realigner (we don't use them to generate alternate consensuses). Also, added an isMNP() method while I was there so that it would be consistent with other variant types.
2011-06-29 15:54:09 -04:00
Guillermo del Angel
e91ae6b265
AF matching when selecting random variants
2011-06-29 15:00:26 -04:00
Eric Banks
33c67a139c
Wrong package; this should have been moved when VC got moved in from Tribble
2011-06-29 14:56:02 -04:00
Guillermo del Angel
dee10140dd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-06-29 13:58:04 -04:00
Eric Banks
8586c86bc4
My commit from last week to fix the old dbsnp rod conversion only worked for locus traversals. Updated now to work for all traversals.
2011-06-29 13:56:37 -04:00
Guillermo del Angel
f736a1d61b
Updated md5's from previous checkin
2011-06-29 13:37:15 -04:00
Guillermo del Angel
5b6d279a2e
Two bug fixes:
...
a) Modified the way clipped bases are dealt with in ReadPosRankSumTest when annotating indels. Cigar string cannot be trusted because BWA can clip good high quality bases and some sites get incorrect ReadPos annotations if BWA systematically clips at an indel breakpoint.
b) PL header needs to specify "." as length. Otherwise we fail VCF validation if multiallelic sites are present.
2011-06-29 10:21:27 -04:00
David Roazen
139c6b84a1
Modified build.xml and the help extractor doclet to use the output of "git
...
describe" as an absolute version number (if the repository has at least one
tag), using the raw SHA-1 hash value as a fallback version number in the case
where there are no tags.
2011-06-28 08:37:05 -04:00
David Roazen
3c9497788e
Reorganized the codebase beneath top-level public and private directories,
...
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00