Commit Graph

35 Commits (77ef2be9b8738f8bf8ee5455fece59647b1b20b1)

Author SHA1 Message Date
Mauricio Carneiro 9427ada498 Fixing no cigar bug
empty GATKSAMRecords will have a null cigar. Treat them accordingly.
2011-11-09 14:39:20 -05:00
Mauricio Carneiro e89ff063fc GATKSAMRecord refactor
The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...).

* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Mauricio Carneiro 3837aa45b4 Fixing conflicts
Conflicts:
	public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java
2011-10-03 19:07:59 -07:00
Mauricio Carneiro 9508220157 fixed hard clipping both ends inside deletion
If both ends of the interval falls within a deletion in the read then hardClipBothEnds would cut the right tail first including the entire deletion, then fail to cut the left tail because there would not be any bases there anymore. Fixed.
2011-09-29 15:36:49 -04:00
Mauricio Carneiro fc86cd6fd8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr 2011-09-29 00:12:15 -04:00
Roger Zurawicki 4fd5630f6a Added ReadClipper Unit Test
* Includes tests that include HardClip to Read and Reference Coords.
* Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing
2011-09-28 23:13:50 -04:00
Mauricio Carneiro 5c9b659c02 clipping both ends of the reads was modifying the original read
This goes against the ReadClipper contract, and was affecting the second part of the read that spans over multiple intervals. Fixed.
2011-09-28 16:07:34 -04:00
Mauricio Carneiro 3b6e43b7c4 Use reads that span multiple intervals
* RR will now compress reads that span across multiple intervals correctly and output them in the correct order.
* Fixed bug in getReadCoordinateForReferenceCoordinate where if the requested reference coordinate fell inside a deletion in the read the read would be clipped up to one element past the deletion.
2011-09-27 18:39:06 -04:00
Mauricio Carneiro c31f4cb2f6 Cleaning leading insertions
With the current implementation, a read cannot start with a deletion or an insertion. Maybe this will change in the future, but for now, chop the leading insertion off.
2011-09-24 14:33:32 -04:00
Mauricio Carneiro 7cac75ae1d Merged bug fix from Stable into Unstable 2011-09-23 19:00:43 -04:00
Mauricio Carneiro fbe3c1e0b3 Adding warning on HardClipping
Hard Clipping is still under heavy development and should not be used by anyone less prepared than MacGyver.
2011-09-23 19:00:19 -04:00
Mauricio Carneiro 9ea40f2e41 Deletions/Insertions in hard clip and bug fixes
* Deletions now count as hard clipped bases in order to recover the original alignment start of a clipped read.
* Insertions do not  count as hard clipped bases for the same reason.
* This created a bug in the previous cigar cleaning function. Fixed.
2011-09-23 16:37:08 -04:00
Mauricio Carneiro 39b54211d0 Fixed hard clipping soft clipped bases after hard clips
if soft clipped bases were after a hard clipped section of the read, the hard clip was clipping the left soft clip tail as if it were a right tail. Mayhem.
2011-09-22 15:46:55 -04:00
Mauricio Carneiro 1acf7945c5 Fixed hard clipped cigar and alignment start
* Hard clipped Cigar now includes all insertions that were hard clipped and not the deletions.
* The alignment start is now recalculated according to the new hard clipped cigar representation
2011-09-22 14:51:14 -04:00
Mauricio Carneiro 4e9020c9f7 Fixed alignment start for hard clipping insertions 2011-09-22 13:28:25 -04:00
Mauricio Carneiro 70335b2b0a Hard clipping soft clipped reads to fix misalignments.
Pre-softclipped reads (with high qual) are a complicated event to deal with in the Reduced Reads environment. I chose to hard clip them out for now and added a todo item to bring them back on in the future, perhaps as a variant region.
2011-09-21 17:12:01 -04:00
Roger Zurawicki 091c7197cd Fixed memory leak and bug with deletions in clipping
The ClippingOp clip cigar function would run into a endless loop if the parameter were out of the reads range, I stopped the bug.
* There is no check to make sure the read coordinate are covered by the read though
When Hard clipping to interval, I added a check for deletions.
NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized
2011-09-18 19:21:51 -04:00
Mauricio Carneiro 131cb7effd Bringing Reduce Reads bug fixes to the main repository 2011-09-07 12:25:53 -04:00
Roger Zurawicki 47607a7eff Fixed bug where deletions messed up interval clipping
- Instead of using readLength, the ReadUtil function are used to get a proper read coordinate
 - Added debug info in interval clipping ( with -dl)

  NOTE: method might not be safe for production and checks need to be added to the ClippingOp code
2011-09-06 14:25:57 -04:00
Mauricio Carneiro 08ae6c0c61 ReadClipper is now handling unmapped reads 2011-09-02 11:32:30 -04:00
Mauricio Carneiro 6f9264d2b3 Hard Clipping no longer leaves indels on the tails
The clipper could leave an insertion or deletion as the start or end of a read after hardclipping a read if the element adjacent to the clipping point was an indel. Fixed.
2011-08-30 02:44:58 -04:00
Mauricio Carneiro 90a1f5e15c Several bug fixes
* When hard clipping a read that had insertions in it, the insertion was being added to the cigar string's hard clip element. This way, the old UnclippedStart() was being modified and so was the calculation of the new AlignmentStart(). Fixed it by subtracting the number of insertions clipped from the total number of hard clipped bases.
* Walker was sending read instead of filtered read when deleting a read that contains only Q2 bases
* Sliding the window was causing reads that started on the new start position to be entirely clipped.
2011-08-30 02:44:19 -04:00
Mauricio Carneiro 66a8b36cf5 Fixed most indexing bugs
* added bases and quals to consensus
* fixed consensus read cigar generation.
2011-08-30 02:43:41 -04:00
Roger Zurawicki ac36271457 Fixed extra reads showing up in Variable Sites
Reads that were not hard clipped for the variable site no longer show up in output file
Walker now uses unclippedStart of Read to determine position in the sliding Window
2011-08-23 11:26:00 -04:00
Mauricio Carneiro feeab6075f Merging ReduceReads development with unstable repo
It is time to bring the ReadClipper class to the main repo. Read Clipper has tested functionality for soft and hard clipping reads. I will prepare thorough documentation for it as it will be very useful for the assembler and the GATK in general.
2011-08-22 23:03:03 -04:00
Mark DePristo b08d63a6b8 Documentation and code cleanup for ClipReads, CallableLoci, and VariantsToTable
-- Swapped -o [summary] and -ob [bam] for more standard -o [bam] and -os [summary] arguments.
-- @Advanced arguments
2011-08-19 15:06:37 -04:00
Mauricio Carneiro 6ef01e40b8 Complete rewrite of Hard Clipping (ReadClipper)
Hard clipping is now completely independent from softclipping and plows through previously hard or soft clipped reads.
2011-08-18 18:35:45 -04:00
Mauricio Carneiro b135565183 Added low quality clipping
Clips both tails of a read if the tails are below a given quality threshold (default Q2).
*Added special treatment for reads that get completely clipped.
2011-08-16 13:51:25 -04:00
Mauricio Carneiro 993ecb85da Added Hard Clipping Tail Ends
Added functionality to hard clip the low quality tail ends of reads (lowQual <= 2)
2011-08-15 15:22:54 -04:00
Mauricio Carneiro 0d976d6211 Fixed second time clipping
When a read is clipped once, and then in the second operation, because of indels, it doesn't reach the coordinate initially set for hard clipping, the indices were wrong. This should fix it.
2011-08-15 12:04:53 -04:00
Mauricio Carneiro 489c15b99d Fixed indexing issue in coordinate conversion
When a read had been previously soft clipped, the UnclippedEnd could not be used directly as Reference Coordinate for clipping , because the read does not go that far.
2011-08-15 01:42:34 -04:00
Mauricio Carneiro 8a51732049 Fixes to ReadClipper and added Reference Coordinate clipping.
* Added reference coordinate based hard clipping functions. This allows you to set a hard cut on where you need the read to be trimmed despite indels.
* soft clipping was messing up cigar string if there was already a hard clip at the beginning of the read. Fixed.
* hard clipping now works with previously hard clipped reads.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro 291d8c7596 Fixed HardClipping and Interval containment
* Hard clipping was wrongfully hard clipping unmapped reads while soft clipping then hard clipping mapped reads. Now we throw exception if we try to hard/soft clip unmapped reads and use the soft->hard clip procedure fore every mapped read.

 * Interval containment needed a <= and >= to make sure it caught the borders right.
2011-08-14 14:54:33 -04:00
Mark DePristo 9992c373be Optimize imports run on the whole project, public and private. I just got too tired of all of the unused imports floating around. Confirmed that the system builds after the changes. 2011-07-17 20:29:58 -04:00
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00