Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1 and applying cut at 98.5.
- If a whole_exome_agilent_1 project has less than 50 samples also mixing in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
Found this neat little walker Kiran wrote stashed in the private tree. Very useful. Generalized it a bit, added GATKDocs and moved it to public. I might include it as a QC step on the pacbio processing pipeline.
* generalize it so it works with non pair ended reads.
* generalize it to work with no read group information
* getRefCoordSoftUnclippedEnd was not resetting the shift when hitting insertions. Fixed.
* getReadCoordinateForReferenceCoordinateBeforeAlignmentEnd was returning the wrong read coordinate position. Fixed.
The running consensus accepts deletions as long as they are homozygous and (with the current parameters) there is only one read in the pileup. It was generating the cigar string correctly but adding the "D" base to the read bases. Fixed.
The clipper could leave an insertion or deletion as the start or end of a read after hardclipping a read if the element adjacent to the clipping point was an indel. Fixed.
This makes it more in line with the insertion treatment and avoids the following problems:
* consensus reads starting with deletions
* variant regions ending abruptly and turning into one consensus with a long insertion, making it difficult to call the deletion in that location.
Read clipper now identifies and clips even if the requested coordinate is outside the alignment but the read contains soft clipped bases in that region.
* When hard clipping a read that had insertions in it, the insertion was being added to the cigar string's hard clip element. This way, the old UnclippedStart() was being modified and so was the calculation of the new AlignmentStart(). Fixed it by subtracting the number of insertions clipped from the total number of hard clipped bases.
* Walker was sending read instead of filtered read when deleting a read that contains only Q2 bases
* Sliding the window was causing reads that started on the new start position to be entirely clipped.