12826 Commits (b603e6674df65a75a1a6eeffc5d3505f2d45f4ce)

Author SHA1 Message Date
amilev b603e6674d Merge pull request #425 from broadinstitute/ami_moleculo_project_changes
Ami moleculo project changes (add 'final's based on review)
2013-11-18 14:33:49 -08:00
Ami Levy-Moonshine e6ef37de1d Add an option to filter the read bases that are taken into account for the covered intervals. For that, two new arguments were added: minBaseQuality and minMappingQuality 2013-11-18 17:29:32 -05:00
Ami Levy-Moonshine 6ad841cec5 Rewrite ReadLengthDistribution to count the read lengths in a hash table first and produce a GATK report table only at the end.
Before this fix, the tool couldn't work with more than one RG.
- Address all review comments
2013-11-18 17:29:31 -05:00
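The count-first, report-last approach described in the commit above can be illustrated with a minimal sketch. This is not the GATK Java code: the function names, the `(read_group, read_length)` input shape, and the `readLength` column name are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def count_read_lengths(reads):
    """Accumulate read-length counts per read group in a hash table.

    `reads` is assumed to be an iterable of (read_group, read_length) pairs.
    """
    counts = defaultdict(Counter)
    for read_group, length in reads:
        counts[read_group][length] += 1
    return counts


def build_report(counts):
    """Build the table only after every read has been counted.

    One row per observed read length, one column per read group.
    """
    groups = sorted(counts)
    lengths = sorted({length for per_group in counts.values() for length in per_group})
    header = ["readLength"] + groups
    rows = [[length] + [counts[group][length] for group in groups] for length in lengths]
    return header, rows
```

Because the columns are derived from the finished hash table, any number of read groups can be handled without knowing them in advance.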
amilev cc85e373d0 Merge pull request #426 from broadinstitute/ami_fix_MoleculoPipeline_suffix_bug
fix a (ugly) weird error from last commit that changed all the scala fil...
2013-11-18 08:55:46 -08:00
Ami Levy-Moonshine 9c1023c933 fix a (ugly) weird error from last commit that changed all the scala files to end with MoleculoPipeline.scala 2013-11-18 11:44:24 -05:00
amilev 6448cc53f5 Merge pull request #422 from broadinstitute/ami-NPE-fix-NA12878
fix missed NPE error in AssessNA12878. Add a proper check and error mess...
2013-11-14 12:06:16 -08:00
Ami Levy-Moonshine 2ff0c23b53 fix missed NPE error in AssessNA12878. Add a check and a clear error message.
add unitTest for that case (when a file has genotypes but does not have NA12878 as a sample)
2013-11-14 14:59:47 -05:00
MauricioCarneiro 7f08250870 Merge pull request #417 from broadinstitute/bt_pairhmm_api_cleanup2
Improve the PairHMM API for better FPGA integration
2013-11-14 10:47:07 -08:00
bradtaylor e40a07bb58 Improve the PairHMM API for better FPGA integration
Motivation:
The API was different between the regular PairHMM and the FPGA-implementation
via CnyPairHMM. As a result, the LikelihoodCalculationEngine had
to account for this. The goal is to change the API to be the same
for all implementations, and make it easier to access.

PairHMM
PairHMM now accepts a list of reads and a map of alleles/haplotypes and returns a PerReadAlleleLikelihoodMap.
Added a new primary method that loops over the reads and haplotypes, extracts qualities,
and passes them to the computeReadLikelihoodGivenHaplotypeLog10 method.
Did not alter that method, or its subcompute method, at all.
PairHMM also now handles its own (re)initialization, so users don't have to worry about that.

CnyPairHMM
Added that same new primary access method to this FPGA class.
Method overrides the default implementation in PairHMM. Walks through a list of reads.
Individual-read quals and the full haplotype list are fed to batchAdd(), as before.
However, instead of waiting for every read to get added, and then walking through the reads
again to extract results, we just get the haplotype-results array for each read as soon as it
is generated, and pack it into a perReadAlleleLikelihoodMap for return.
The main access method is now the same no matter whether the FPGA CnyPairHMM is used or not.

LikelihoodCalculationEngine
The functionality to loop through the reads and haplotypes and get individual log10-likelihoods
was moved to the PairHMM, and so removed from here. However, this class does need to retain
the ability to pre-process the reads, and post-process the resulting likelihoods map.
Those features were separated from running the HMM and refactored into their own methods
Commented out the (unused) system for finding best N haplotypes for genotyping.

PairHMMIndelErrorModel
Changes similar to those in the LCE were made. However, in this case the haplotypes are modified
based on each individual read, so the read-list we feed into the HMM only has one read.
2013-11-14 09:45:33 -05:00
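The unified API described in the commit above can be pictured with a minimal sketch: one primary method with a default looping implementation, overridden by a batch-style subclass that packs results per read as soon as they are available. This is not the GATK Java code. All class and function names are hypothetical, and `toy_log10_score` is a placeholder mismatch count, not a real pair HMM.

```python
def toy_log10_score(read, haplotype):
    """Placeholder score (assumption): minus the number of mismatched positions."""
    mismatches = sum(1 for a, b in zip(read, haplotype) if a != b)
    return -float(mismatches)


class PairHMMSketch:
    """Default implementation: loop over reads and haplotypes one pair at a time."""

    def compute_likelihoods(self, reads, haplotypes):
        # haplotypes maps allele name -> haplotype sequence.
        # Returns {read: {allele: log10 likelihood}}.
        return {
            read: {
                allele: self.log10_likelihood(read, hap)
                for allele, hap in haplotypes.items()
            }
            for read in reads
        }

    def log10_likelihood(self, read, haplotype):
        return toy_log10_score(read, haplotype)


class BatchPairHMMSketch(PairHMMSketch):
    """Overrides the primary method: score each read against the full haplotype list."""

    def compute_likelihoods(self, reads, haplotypes):
        alleles = list(haplotypes)
        hap_list = [haplotypes[allele] for allele in alleles]
        result = {}
        for read in reads:
            # Fetch the per-haplotype results for this read immediately,
            # instead of walking the reads a second time afterwards.
            scores = self._score_batch(read, hap_list)
            result[read] = dict(zip(alleles, scores))
        return result

    def _score_batch(self, read, hap_list):
        return [toy_log10_score(read, hap) for hap in hap_list]
```

A caller uses `compute_likelihoods` in the same way for either class, which is the point of making the access method identical across implementations.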
Geraldine Van der Auwera f22ab033f6 Merge pull request #424 from broadinstitute/gg_yetanothergatkdocfix
Yet another gatkdoc fix
2013-11-13 11:35:59 -08:00
Geraldine Van der Auwera dac3dbc997 Improved gatkdocs for InbreedingCoefficient, ReduceReads, ErrorRatePerCycle
Clarified caveat for InbreedingCoefficient
Cleaned up docstrings for ReduceReads
Brushed up doc for ErrorRatePerCycle
2013-11-13 14:33:04 -05:00
jmthibault79 34c8ad529f Merge pull request #423 from broadinstitute/jt_pdexheimer_patch
Changed name of jobs submitted to cluster job runners
2013-11-13 04:38:00 -08:00
Phillip Dexheimer 296bcc7fb1 Changed name of jobs submitted to cluster job runners
-- Added 'jobRunnerJobName' definition to QFunction, defaults to value of shortDescription
-- Edited Lsf and Drmaa JobRunners to use this string instead of description for naming jobs in the scheduler

Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2013-11-12 14:34:56 -05:00
Eric Banks adb939521a Merge pull request #420 from broadinstitute/rp_add_exome_chips_to_knowledge_base
Adding QC'ed versions of the Affy Exome Plus and Broad Exome LOF genotyp...
2013-11-08 08:19:17 -08:00
MauricioCarneiro cd5beca794 Merge pull request #419 from broadinstitute/mc_dpp_updates_part2
Mc dpp updates part2
2013-11-07 13:42:32 -08:00
Mauricio Carneiro 725656ae7e Generalizing the FullProcessingPipeline Qscript
We have generalized the processing script to be able to handle multiple scenarios. Originally it was
designed for PCR-free data only; we added all the steps necessary to start from fastq and process
RNA-seq as well as non-human data. This is our go-to script in TechDev.

   * add optional "starting from fastq" path to the pipeline
   * add mark duplicates (optionally) to the pipeline
   * add an option to run with the mouse data (without dbsnp and with single ended fastq)
   * add option to process RNA-seq data from topHat (add RG and reassign mapping quality if necessary)
   * add option to filter or include reads with N in the cigar string
   * add parameter to allow keeping the intermediate files
2013-11-07 16:34:29 -05:00
Ryan Poplin 1769bcacbc Adding QC'ed versions of the Affy Exome Plus and Broad Exome LOF genotyping results to the knowledgebase. These files include both SNPs and indels. 2013-11-07 16:21:23 -05:00
Eric Banks f15355856a Merge pull request #418 from broadinstitute/eb_fix_liftover_script
Fixing the liftover script to not require strict VCF header validation.
2013-11-07 06:04:56 -08:00
Eric Banks 2fc40a0aed Fixing the liftover script to not require strict VCF header validation.
Apparently no one has used the liftover script for a while (which I guess is a good thing)...
2013-11-07 09:02:17 -05:00
amilev f67c3a1b4e Merge pull request #415 from broadinstitute/eb_new_path_for_NIST_in_kb_upload
Various updates to the knowledge base
2013-11-06 13:00:45 -08:00
Eric Banks 23c9c24adc Various updates to the knowledge base
1. Updated NIST path to its proper place on the file system (and updated the NIST calls to the latest, v2.17).

2. Don't assess genotype concordance for multi-allelic sites because we mess up the GTs when we break
them into their component parts (and therefore the GTs look wrong when they really aren't).

3. Add an argument to control the minimum GQ for a GT to be considered called.
This improves genotyping accuracy assessments which were unfairly penalizing low confidence GT calls.
Delivers PT #59846158.
2013-11-06 14:05:07 -05:00
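Item 3 of the commit above can be illustrated with a minimal sketch of a concordance count that honors a minimum GQ. This is not the knowledge base code: the function name, the `min_gq` parameter name, and the dictionary shapes are assumptions. It also assumes that a genotype below the threshold is simply left out of the assessment.

```python
def genotype_concordance(calls, truth, min_gq=0):
    """Count matching genotypes among confidently called sites.

    calls: {site: (genotype, gq)}; truth: {site: genotype} (hypothetical shapes).
    Returns (matches, assessed).
    """
    matches = 0
    assessed = 0
    for site, (genotype, gq) in calls.items():
        if site not in truth:
            continue
        if gq < min_gq:
            # Below the threshold: treated as not called, so not scored.
            continue
        assessed += 1
        if genotype == truth[site]:
            matches += 1
    return matches, assessed
```

With `min_gq=0` every overlapping site is scored; raising the threshold removes low-confidence genotypes from both the numerator and the denominator.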
Eric Banks 0e3d83d1ef Merge pull request #413 from broadinstitute/rp_qd_and_qual_updates_in_ref_model_pipeline
Improvements to the reference model pipeline.
2013-11-05 06:33:17 -08:00
Eric Banks 09dfaf1a68 Merge pull request #416 from broadinstitute/mc_quick_fixes_to_cser_pipeline
Add interpretation to QualifyMissingIntervals
2013-11-05 06:08:13 -08:00
Eric Banks a372c0f074 Merge pull request #414 from broadinstitute/eb_update_dbsnp_in_bundle
Update the dbsnp version in the bundle from 137 to 138; resolves PT #59771004.
2013-11-04 07:04:23 -08:00
Eric Banks 96024403bf Update the dbsnp version in the bundle from 137 to 138; resolves PT #59771004. 2013-11-04 10:01:22 -05:00
Ryan Poplin b22c9c2cb4 Improvements to the reference model pipeline.
-- We use the RegenotypeVariants walker to recompute the qual field. (instead of the discussed idea of adding this functionality to CombineVariants)
-- QualByDepth will now be recomputed even if the stratified contexts are missing. This greatly improves the QD estimate for this pipeline. Doesn't work for multi-allelics since the qual can't be recomputed.
2013-11-01 17:58:25 -04:00
Eric Banks cafcb34855 Merge pull request #411 from broadinstitute/eb_add_exome_intervals_to_bundle_script
Updated the GATK bundle script to:
2013-10-29 07:38:44 -07:00
Eric Banks 209f2a61aa Updated the GATK bundle script to:
1. Include exome target list for b37
2. Not delete the 'current' link unless -run is applied to the command line!  (sorry, Ryan)
2013-10-29 10:33:51 -04:00
kshakir 0ad23c09d5 Merge pull request #410 from lbergelson/lb_missing_script_error
Improving the error message for a missing Queue script.
2013-10-25 08:57:06 -07:00
Mauricio Carneiro 5ed47988b8 Changed the parameter names from cds to baits
Making the usage more clear since the parameter is being used over and over to define baited
regions. Updated the headers accordingly and made it more readable.
2013-10-24 17:15:56 -04:00
Louis Bergelson 9498950b1c Adding more specific error message when one of the scripts doesn't exist.
--Previously it gave a cryptic message:
----IO error while decoding blarg.script with UTF-8
----Please try specifying another one using the -encoding option
2013-10-21 14:57:42 -04:00
David Roazen 5a2ef37ead Tweak dcov documentation to help prevent user confusion
Geraldine-approved!
2013-10-16 15:24:33 -04:00
Ryan Poplin 1b10d5467a Merge pull request #409 from broadinstitute/rp_postQC_titv_ratio_smoothing
Smooth over ti/tv bins with small numbers of counts in the PostCallingQC...
2013-10-16 07:41:37 -07:00
Ryan Poplin e9d0d6ada0 Smooth over ti/tv bins with small numbers of counts in the PostCallingQC script 2013-10-16 10:40:50 -04:00
MauricioCarneiro 93d525b56c Merge pull request #407 from broadinstitute/mc_dwn_exome
Qscript to Downsample and analyze an exome BAM
2013-10-10 11:56:12 -07:00
MauricioCarneiro b248a61db6 Merge pull request #408 from broadinstitute/gda_more_pool_caller_stuff
PooledCaller paper scripts
2013-10-10 11:55:17 -07:00
Guillermo del Angel 36e92a3323 More script committing for posterity -
Pool Caller scripts with last minute fixes. Also committed script that plotted 1000G FDR that I used in ASHG2012.
Also added a README.txt file at /humgen/gsa-hpprojects/dev/validationExperiments/largeScaleValidation/finalPaperData/README.txt
in case things need to get run again.
2013-10-10 14:40:12 -04:00
Mauricio Carneiro efbfdb64fe Qscript to Downsample and analyze an exome BAM
this script downsamples an exome BAM several times and makes a coverage distribution
analysis (of bases that pass filters) as well as haplotype caller calls with a NA12878
Knowledge Base assessment with comparison against multi-sample calling
with the UG.

This script was used for the "downsampling the exome" presentation
2013-10-10 14:37:33 -04:00
Chris Hartl 9d932e8c60 Merged bug fix from Stable into Unstable 2013-10-10 14:31:33 -04:00
Chris Hartl 6f46d1187a Remember to copy the integration test changes *as well as* the walker changes 2013-10-10 14:30:37 -04:00
Chris Hartl 55bab9fa87 Merged bug fix from Stable into Unstable 2013-10-10 13:01:12 -04:00
Chris Hartl 06d28c7f8b VariantsToBinaryPed: Move .fam file writing to initialize to ensure ordering matches the ordering of the VCF. Change the documentation to clarify that the fam files are not directly copied, but subset and re-ordered. 2013-10-10 12:53:15 -04:00
Mauricio Carneiro 5d6421494b Fix mismatching number of columns in report
Quick fix for the missing column header in the QualifyMissingIntervals
report.

Adding a QScript for the tool as well as a few minor updates to the
GATKReportGatherer.
2013-10-09 14:38:15 -04:00
Eric Banks e9a93d23c3 Merge pull request #405 from broadinstitute/eb_fix_ugcallvariants
Don't keep the original QUAL.  The whole point of this private walker is to regenerate it!
2013-10-08 08:36:06 -07:00
Eric Banks b71691f5a3 Don't keep the original QUAL. The whole point of this private walker is to regenerate it! 2013-10-08 11:34:14 -04:00
David Roazen 8ebb288014 Auto-restart Bamboo on gsa4 in the event of a crash 2013-10-07 16:10:17 -04:00
David Roazen 385c8d4a70 Place a copy of gsa-engineering's gsa4 crontab under version control 2013-10-07 14:44:56 -04:00
David Roazen d6fcebf948 Make cron daemon-spawning scripts less chatty
Cuts down on gsa-engineering cron emails
2013-10-07 14:35:59 -04:00
Eric Banks 49d489cbf8 Merge pull request #393 from broadinstitute/mc_qualify_updates_for_cser
Length metric updates to QualifyMissingIntervals
2013-10-04 09:50:39 -07:00
Mauricio Carneiro 63ace685c9 add unit tests 2013-10-04 11:44:07 -04:00