Commit Graph

4497 Commits (b47bebb5a33ca70ff1dfd06cc09ec213fc139c9e)

Author SHA1 Message Date
Geraldine Van der Auwera dfa18a8fc6 Merge pull request #887 from broadinstitute/pd_vcf_cmdline_hdr
Fixed logging of 'out' command line parameter in VCF headers
2015-03-25 00:48:55 -04:00
Ami Levy-Moonshine c5fc5c4f8c create 2 new tools:
- ASEReadCounter (public tool) replaces Tuuli's script to produce the input to Manny's tool.
   It counts the number of reads that support the ref allele and the alt allele, filtering low qual reads and bases and keeping only properPaired reads
- ASECaller (private tool) takes both RNA and DNA, and produces contingency tables ** still under development **

minor changes in other tools:
- update RNA HC variant calling scala script
- expose FS method pValueForContingencyTable to be able to call it from ASEcaller

In ASEReadCounter:
- allow different options to deal with overlapping reads from the same fragment
- add option to ignore or include indels in the pileups
- add option to disable DuplicateRead

add ASEReadCounterIntegrationTest.java and files for the test
2015-03-21 16:56:00 -04:00
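The counting step described for ASEReadCounter above can be sketched as follows. This is a minimal illustration, not the tool's code: the class `AlleleReadCounter`, the `PileupBase` type, and the two quality thresholds are hypothetical names introduced here, and the sketch ignores the overlapping-read and indel options listed in the commit.

```java
import java.util.List;

public class AlleleReadCounter {
    /** One read's base at a single site (hypothetical type for this sketch). */
    public static final class PileupBase {
        final char base;
        final int baseQuality;
        final int mappingQuality;
        final boolean properPair;

        public PileupBase(char base, int baseQuality, int mappingQuality, boolean properPair) {
            this.base = base;
            this.baseQuality = baseQuality;
            this.mappingQuality = mappingQuality;
            this.properPair = properPair;
        }
    }

    /**
     * Returns {refCount, altCount, otherBaseCount} over the reads that pass the filters.
     * The thresholds are assumptions of this sketch, not documented defaults.
     */
    public static int[] count(List<PileupBase> pileup, char ref, char alt,
                              int minBaseQuality, int minMappingQuality) {
        int[] counts = new int[3];
        for (PileupBase p : pileup) {
            // Drop improperly paired reads, low mapping quality reads, and low quality bases.
            if (!p.properPair
                    || p.mappingQuality < minMappingQuality
                    || p.baseQuality < minBaseQuality) {
                continue;
            }
            if (p.base == ref) {
                counts[0]++;
            } else if (p.base == alt) {
                counts[1]++;
            } else {
                counts[2]++;
            }
        }
        return counts;
    }
}
```

The ref and alt counts per site are the kind of input a downstream contingency-table test would consume.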
Phillip Dexheimer 3b567d7a98 Fixed logging of 'out' command line parameter in VCF headers 2015-03-18 23:12:13 -04:00
Geraldine Van der Auwera a75e1d4ce4 Fixes the test that was failing due to gsalib build failure 2015-03-17 04:26:03 -04:00
Phillip Dexheimer 4d4d33404e Added gsa.reshape.concordance.table function to gsalib 2015-03-16 22:52:27 -04:00
Geraldine Van der Auwera 1d39ed9156 Merge pull request #814 from broadinstitute/biocyberman_maven_patches
Biocyberman maven patches
2015-03-13 16:26:02 -04:00
Geraldine Van der Auwera 39a972f348 Merge pull request #872 from broadinstitute/eb_create_rgq_format_field
Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Fixes #870
2015-03-13 13:59:53 -04:00
Geraldine Van der Auwera 7681e89454 Merge pull request #869 from broadinstitute/gg_fix_vqsr_plots_GSA-860
Switched VQSR tranches plot ordering rule
2015-03-13 10:46:55 -04:00
Eric Banks 1ff9463285 Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs.
Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ.
This is extremely useful for people who want to know how confident the hom ref genotype calls are.
Perhaps this is just what CRSP needs for pertinent negatives.

Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default since
it was adding some seemingly unnecessary annotations (like mean GQ now that we keep the GQ around and
number of no-calls).  Let me know if this was a mistake (although Laura gave me a thumbs up).
2015-03-13 10:27:20 -04:00
Phillip Dexheimer 6ffa295963 Regression: The new 'includeUnmapped' PartitionBy annotation was incorrectly set for HC
Fixes #828
2015-03-13 00:24:57 -04:00
Geraldine Van der Auwera aa4084d42f Switched VQSR tranches plot ordering rule 2015-03-12 19:57:03 -04:00
Geraldine Van der Auwera f8a081a262 Updated readme in public/doc to just point to the website 2015-03-12 11:52:48 -04:00
Ron Levine bee7f655b7 Log a warning if using incompatible arguments in DepthOfCoverage
Add reference gene list file
2015-03-10 18:14:21 -04:00
Ron Levine 71d68c3d93 Fix NotPrimaryAlignmentFilter documentation 2015-03-05 20:30:46 -05:00
biocyberman ff6e288241 Upgrade SLF4J to allow new convenient logging syntaxes
Signed-off-by: David Roazen <droazen@broadinstitute.org>
2015-03-02 17:01:10 -05:00
Ron Levine 44e5965a4b Change GC Content value type from Integer to Float 2015-02-25 13:56:42 -05:00
Geraldine Van der Auwera f3a57a6b07 Merge pull request #811 from broadinstitute/seru71_fix_MateSameStrandFilter
Corrected logical expression in MateSameStrandFilter
2015-02-23 17:57:10 -05:00
Ron Levine 2cbaef2fb2 Throw exception for -dcov argument given to ActiveRegionWalkers 2015-02-19 08:24:39 -05:00
seru71 3ee0311fdb corrected logical expression in MateSameStrandFilter
Signed-off-by: David Roazen <droazen@broadinstitute.org>
2015-02-12 12:21:44 -05:00
Phillip Dexheimer 92c7c103c1 GenotypeConcordance: monomorphic sites in truth are no longer called "Mismatching Alleles" when the comp genotype has an alternate allele
* PT 84700606
2015-02-07 15:54:38 -05:00
rpoplin b8b23b931e Merge pull request #807 from broadinstitute/rhl_handle_cigar
Process X and = CIGAR operators
2015-02-01 11:09:52 -05:00
Phillip Dexheimer 3354c07b1c Added optional element "includeUnmapped" to the PartitionBy annotation
* The value of this element (default true) determines whether Queue will explicitly run this walker over unmapped reads
* This patch fixes a runtime error when FindCoveredIntervals was used with Queue
* PT 81777160
2015-01-31 15:47:57 -05:00
Ron Levine 9d4b876ccd Process X and = CIGAR operators
Add simple BaseRecalibrator integration test for CIGAR = and X operators
2015-01-29 17:00:00 -05:00
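As background for the X and = change above: in the SAM format, `=` (sequence match) and `X` (sequence mismatch) consume both read and reference bases, exactly like `M`. A minimal sketch of computing alignment lengths with that rule follows; the class `CigarSpan` and its methods are hypothetical and are not GATK or htsjdk API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CigarSpan {
    // One CIGAR element: a length followed by one SAM operator.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Number of reference bases covered: M, =, X, D and N consume the reference. */
    public static int referenceLength(String cigar) {
        return sum(cigar, "M=XDN");
    }

    /** Number of read bases covered: M, =, X, I and S consume the read. */
    public static int readLength(String cigar) {
        return sum(cigar, "M=XIS");
    }

    private static int sum(String cigar, String countedOperators) {
        Matcher m = ELEMENT.matcher(cigar);
        int total = 0;
        int parsedUpTo = 0;
        while (m.find()) {
            if (m.start() != parsedUpTo) {
                throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
            }
            parsedUpTo = m.end();
            if (countedOperators.indexOf(m.group(2).charAt(0)) >= 0) {
                total += Integer.parseInt(m.group(1));
            }
        }
        if (parsedUpTo != cigar.length()) {
            throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
        }
        return total;
    }
}
```

For example, `5=1X4=` and `10M` describe alignments of the same length, so code that only recognizes `M` would miscount the first form.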
Khalid Shakir 1808c90d2a Added introductory CRAM support.
Replaced usage of GATKSamRecordFactory with calls to wrapper GATKSAMRecord extending SAMRecord.
Minor other updates for test changes.
Added exampleCRAM.cram generated by GATK, with .bai and .crai indexes generated by CRAMTools.
CRAM-to-CRAM test disabled due to https://github.com/samtools/htsjdk/issues/148
Using exampleBAM.bam input, outputs of GATK's generated CRAM match CRAMTools generated CRAM, but not samtools/PrintReads SAM output, as things like insert sizes are different.
If required for other tools, CRAM indexes must be generated via CRAMTools until we can generate them via CRAMFileWriter.

Generation of exampleCRAM.cram:
* java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o public/gatk-utils/src/test/resources/exampleCRAM.cram
* java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram
* java -jar cramtools-2.1.jar index -I public/gatk-utils/src/test/resources/exampleCRAM.cram --bam-style-index

CRAM generation by existing tools:
* samtools view -C -T public/gatk-utils/src/test/resources/exampleFASTA.fasta -o testSamtools.cram public/gatk-utils/src/test/resources/exampleBAM.bam
* java -jar cramtools-2.1.jar cram --ignore-md5-mismatch --capture-all-tags -Q -n -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -O testCRAMTools.cram
* java -jar target/executable/GenomeAnalysisTK.jar -T PrintReads -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleBAM.bam -o testGATK.cram

CRAMTools view of the above:
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I public/gatk-utils/src/test/resources/exampleCRAM.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testSamtools.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testCRAMTools.cram | tail -n 1
* java -jar cramtools-2.1.jar bam --skip-md5-check -R public/gatk-utils/src/test/resources/exampleFASTA.fasta -I testGATK.cram | tail -n 1
2015-01-26 14:47:39 -03:00
Khalid Shakir de3ca65232 Bumping HTSJDK version to pickup a bug fix for CRAM. 2015-01-26 14:47:39 -03:00
Phillip Dexheimer 72f76add71 Added -trimAlternates argument to SelectVariants
* PT 84021222
* -trimAlternates removes all unused alternate alleles from variants.  Note that this is pretty aggressive for monomorphic sites
2015-01-21 21:33:35 -05:00
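The idea behind removing unused alternate alleles can be sketched as below. This is an illustration under stated assumptions, not the SelectVariants implementation: the class `TrimAlternates`, the method name, and the representation of genotypes as allele-index arrays (0 for the reference, i for the i-th alternate, negative for a no-call) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TrimAlternates {
    /**
     * Returns the alternate alleles that at least one genotype refers to.
     * Index 0 is the reference allele, index i >= 1 is alts.get(i - 1),
     * and negative indices stand for no-calls (assumptions of this sketch).
     */
    public static List<String> usedAlternates(List<String> alts, List<int[]> genotypes) {
        boolean[] used = new boolean[alts.size()];
        for (int[] genotype : genotypes) {
            for (int alleleIndex : genotype) {
                if (alleleIndex >= 1 && alleleIndex <= alts.size()) {
                    used[alleleIndex - 1] = true;
                }
            }
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < alts.size(); i++) {
            if (used[i]) {
                kept.add(alts.get(i));
            }
        }
        return kept;
    }
}
```

Under this reading, a site where every remaining genotype is hom-ref keeps no alternate alleles at all, which may be what the note about monomorphic sites refers to. A real implementation would also have to remap genotype indices and per-allele annotations after trimming.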
Joel Thibault 5ce34d81b8 Allows users to disable specific read filters from the command line
- enable this for DuplicateReadFilter only
- enable the @DisabledReadFilters annotation to do this at the Walker level
2015-01-21 13:17:29 -05:00
Ron Levine 804b2a36b7 Fix SplitNCigar reads exception by making the list of RNAReadTransformer non-abstract, add test for -fixNDN
Includes documentation changes for -fixNDN argument and the read transformer documentation.

Documentation changes to CombineVariants
2015-01-14 22:22:05 -05:00
Phillip Dexheimer 6190d660e0 Edits to work with the latest htsjdk release:
* TextCigarCodec.decode() is now static, and the getSingleton() method is gone
* MergingSamRecordIterator now wants a Collection<SamReader> rather than Collection<SAMFileReader> in the constructor
* SeekableBufferedStream now correctly reads the requested number of bytes, removed workaround in GATKBAMIndex
2015-01-13 21:32:10 -05:00
Phillip Dexheimer b73e9d506a Added GATKVCFConstants and GATKVCFHeaderLines to consolidate the GATK-specific VCF annotations
* Removed unused annotations (CCC and HWP)
* Renamed one of the two GC annotations to "IGC" (for Interval GC)
* Revved picard & htsjdk (GATK constants are now removed from htsjdk)
* PT 82046038
2015-01-13 21:32:09 -05:00
Ryan Poplin 2e5f9db758 Raising per-sample limits on the number of reads in ART and HC.
-- Active Region Traversal was using per sample limits on the number of reads that were too low, especially now that we are running one sample at a time. This caused issues with high confidence variants being dropped in high coverage data.
-- HaplotypeCallerGVCFIntegrationTest PL/annotation changes due to using more reads in those tests
-- Removed a CountReadsInActiveRegionsIntegrationTest test for excessive coverage because the read coverage no longer goes over the limits in ART
2015-01-09 11:21:42 -05:00
rpoplin 03203e249e Merge pull request #792 from broadinstitute/rhl_pairhmm_log_stderr
Rhl pairhmm log stderr
2015-01-07 12:41:10 -05:00
Ron Levine 7d58544f17 Do not use logger, write to stderr, could not get the correct logger dependency in pom.xml 2015-01-06 10:32:11 -05:00
Ryan Poplin 10b23bfb04 Adding Axiom_Exome_Plus.sites_only.all_populations.poly.vcf to the resource bundle because it is used in the v3.3 best practices 2015-01-05 14:52:31 -05:00
Ron Levine 26c46ae05e Change logger.info to logger.error 2015-01-05 14:14:02 -05:00
Ron Levine b4fda38922 Use logging system instead of stderr 2015-01-05 14:04:10 -05:00
rpoplin 3240b3538a Merge pull request #794 from broadinstitute/rhl_read_backed_phasing
Rhl read backed phasing
2015-01-05 09:47:25 -05:00
Ron Levine 64375f6341 Messages that were going to stdout now going to stderr
Make PairHMM outputs go to stderr instead of stdout

Change output from stdout to stderr in close()

Updated lib with output going to stderr
2014-12-23 11:03:29 -05:00
Menachem Fromer 11cd0080c3 Add option to genotype additional user-defined interval lists
Add Qscript 'ONLY_GENOTYPE_xhmmCNVpipeline' to genotype additional user-defined interval lists

Add Qscript 'ONLY_GENOTYPE_xhmmCNVpipeline' to genotype additional user-defined interval lists (and similar option to Qscript 'xhmmCNVpipeline')
2014-12-21 13:02:17 -05:00
Ron Levine 069398ad46 Added more tests and documentation 2014-12-19 12:57:43 -05:00
Ron Levine 08790e1dab Fix multiallelic info field annotation for VariantAnnotator
Add multi-allele test for info field annotations

Fix to process all types of INFO annotations

roll back to previous version, removes INFO and FORMAT

Correct @return for VariantAnnotatorEngine.getNonReferenceAlleles()

Enhance comments and clean up multi-allelic logic, handle header info number = R

only parse counts of A & R

Add INFO for AC

update MD5

Performance enhancement, only parse multiallelic with a count A or R

Make argument final in getNonReferenceAlleles()

Code cleanup, add exceptions for bad expression/allele size mismatch and missing header info for an expression

Change exception to warning for expression value/number of alleles check

remove advertised exceptions
2014-12-17 22:21:00 -05:00
Phillip Dexheimer 71bdfbe465 Fix VariantsToTable output of FORMAT record lists when -SMA is specified
* PT 84242218
* Note that FORMAT fields behave the same as INFO fields - if the annotation has a count of A (one entry per Alt Allele), it is split across the multiple output lines.  Otherwise, the entire list is output with each field
2014-12-10 21:41:15 -05:00
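The splitting rule in the note above can be sketched as follows: with one output line per alternate allele, a field with a count of A contributes its i-th entry to line i, while any other list is repeated whole on every line. The class `SplitMultiAllelic`, the method name, and the comma used to join the list are assumptions of this sketch, not VariantsToTable code.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitMultiAllelic {
    /**
     * Returns the value this field shows on each of the numAlts output lines.
     * onePerAlt is true when the header declares the field with a count of A.
     */
    public static List<String> valuePerLine(List<String> values, boolean onePerAlt, int numAlts) {
        if (onePerAlt && values.size() != numAlts) {
            throw new IllegalArgumentException(
                    "Expected " + numAlts + " values, found " + values.size());
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < numAlts; i++) {
            // Count-A fields are split; everything else is emitted as the full list.
            lines.add(onePerAlt ? values.get(i) : String.join(",", values));
        }
        return lines;
    }
}
```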
Geraldine Van der Auwera 45eddb4ecb Updated gsalib version to 2.1 for resubmitting with updated license to CRAN 2014-12-09 17:07:48 -05:00
Phillip Dexheimer a5dee8a42e Fix NPE in SplitSamFile
* PT 82892316
* Added integration test
* Fixed similar error in debug output of HC
2014-12-07 10:37:30 -05:00
Alec Wysoker 4fe6ccec98 Add -output-file-extension option to GATKDoclet to produce html instead of php. 2014-12-01 18:06:36 -05:00
Alec Wysoker 62e5d42380 Fix code to filter current directory from paths passed to Reflection library. 2014-12-01 17:45:46 -05:00
Ron Levine 386aeda022 Add HaplotypeCaller argument so integration tests can specify the hardware dependent PairHMM sub-implementation 2014-11-25 21:53:53 -05:00
rpoplin 00027e1555 Merge pull request #774 from broadinstitute/ldg_makeSelectVariantsTrimAlleles
Add -trim argument to SelectVariants to trim alleles to minimal represen...
2014-11-13 13:58:13 -05:00
Ron Levine 67656bab23 Resolved conflict during rebasing
Add more logging to annotators, change loggers from info to warn

Add comments to testStrandBiasBySample()

Clarify comments in testStrandBiasBySample

remove logic for not processing an indel if strand bias (SB) was not computed

remove per variant warnings in annotate()

Log warnings if using the wrong annotator or missing a pedigree file

Log test failures once in annotate(), because HaplotypeCaller does not call initialize(). Avoid using exceptions

Fix so only log once in annotate(), Hardy-Weinberg does not require pedigree files, fix test MD5s so they pass

Check if founderIds == null

Update MD5s from HaplotypeCaller integration tests and clean up code

Change logic so SnpEff does not throw exceptions, change engine to utils in imports

Update test MD5s, return immediately if cannot annotate in SnpEff.initialization()

Post peer review, add more logging warnings

Update MD5 for testHaplotypeCallerMultiSampleComplex1, return null if PossibleDeNovo.annotate() is not called by VariantAnnotator
2014-11-12 02:45:49 -05:00
Laura Gauthier 783a4fd651 Change default behavior of SelectVariants to trim remaining alleles when samples are subset. -noTrim argument preserves original alleles. Add test for trimming. 2014-11-11 16:32:25 -05:00