Re-added import java.io.File for BamGatherFunction.
Other cleanup to resolve Scala syntax warnings from IntelliJ.
Moved Example UG script from protected to public.
- This change means that BamGatherFunction will now have an @Output field for the BAM index, which will allow the bai to be deleted for intermediate functions
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
This commit consists of 2 main changes:
1. When the strand table gets too large, we normalize it down to values that are more reasonable.
2. We don't include a particular sample's contribution unless the total ref and alt counts are at least 2 each;
this is a heuristic method for dealing only with hets.
MD5s change as expected.
Hopefully we'll have a more robust implementation for GATK 3.1.
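A rough sketch of the two strand-table changes above, for reviewers. This is not the committed code: the class name, the method names, the target table size of 200, and the proportional scaling are all assumptions made for illustration. Only the minimum count of 2 comes from this message, and "total ref and alt counts" is read here as the per-allele sum over both strands.

```java
/** Hypothetical sketch; names and the size threshold are illustrative only. */
final class StrandTableSketch {
    // Assumed cap on the sum of the table; the real value is not stated in this message.
    static final double TARGET_TABLE_SIZE = 200.0;
    // Minimum total ref count and total alt count for a sample to contribute.
    static final int MIN_COUNT = 2;

    /** Scales a 2x2 table {{refFwd, refRev}, {altFwd, altRev}} down when its sum is too large. */
    static int[][] normalize(int[][] table) {
        long sum = 0;
        for (int[] row : table) {
            for (int count : row) {
                sum += count;
            }
        }
        if (sum <= TARGET_TABLE_SIZE) {
            return table; // already small enough, leave untouched
        }
        double factor = sum / TARGET_TABLE_SIZE;
        int[][] scaled = new int[2][2];
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                scaled[i][j] = (int) (table[i][j] / factor); // truncates toward zero
            }
        }
        return scaled;
    }

    /** Per-sample counts are {refFwd, refRev, altFwd, altRev}; keeps only het-like samples. */
    static boolean contributes(int[] sampleCounts) {
        int refTotal = sampleCounts[0] + sampleCounts[1];
        int altTotal = sampleCounts[2] + sampleCounts[3];
        return refTotal >= MIN_COUNT && altTotal >= MIN_COUNT;
    }
}
```

Proportional scaling preserves the ratios between cells, which is what a strand-bias test depends on, at the cost of rounding in small cells.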
The slicePrefix method functionality was broken.
Story:
https://www.pivotaltracker.com/story/show/64595624
Changes:
1. Fixed the bug.
2. Added unit test to check on the method functionality.
3. Added an integration test to verify that the bug has been fixed, using a reproducible case from empirical data.
Story:
https://www.pivotaltracker.com/s/projects/1007536
Changes:
1. HC's GenotypingEngine now invokes reverseAlleleTrimming on GVCF variant output lines.
2. GenotypeGVCFs also reverse trims after regenotyping, as some alt alleles are dropped (observed in real data).
The writer was never resetting the pointer to the end of the last non-ref VariantContext that it saw.
This was fine except when it jumped to a new contig - and a lower position on that contig - where it
thought that it was still part of that previous non-ref VariantContext so wouldn't emit a reference
block. Therefore, ref blocks were missing from the beginnings of all chromosomes (except chr1).
Added unit test to cover this case.
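The contig-jump case described above can be sketched as follows. This is an illustration of the bookkeeping only, not the writer's actual code; the class and method names are hypothetical, and it assumes records arrive sorted by contig and position.

```java
/** Hypothetical sketch of tracking the end of the last non-ref record per contig. */
final class NonRefTrackerSketch {
    private String contigOfPrevNonRef = null;
    private int endOfPrevNonRef = -1;

    /** Remember where the most recent non-ref record ends. */
    void sawNonRef(String contig, int end) {
        contigOfPrevNonRef = contig;
        endOfPrevNonRef = end;
    }

    /** True if a position is still inside the previous non-ref record, so no ref block is needed. */
    boolean isCoveredByPrevNonRef(String contig, int start) {
        if (!contig.equals(contigOfPrevNonRef)) {
            // New contig: reset the pointer so a low position is not mistaken
            // for part of a record on the previous contig.
            contigOfPrevNonRef = null;
            endOfPrevNonRef = -1;
            return false;
        }
        return start <= endOfPrevNonRef;
    }
}
```

Without the contig comparison, a position such as chr2:10 would compare as lower than an end of chr1:1000 and the reference block would be skipped.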
Bug uncovered by some untrimmed alleles in the single sample pipeline output.
Note, however, that this does not fix the untrimmed alleles in general.
Story:
https://www.pivotaltracker.com/story/show/65481104
Changes:
1. Fixed the bug itself.
2. Fixed non-working tests (silently skipped due to an exception in the dataProvider).
This script publishes GATK/Queue jars for internal GSA use to the following locations
whenever tests pass:
/humgen/gsa-hpprojects/GATK/private_unstable_builds/GenomeAnalysisTK_latest_unstable.jar
/humgen/gsa-hpprojects/Queue/private_unstable_builds/Queue_latest_unstable.jar
These jars include private code, and so are for internal use only.
Note that this tool is still a work in progress and very experimental, so isn't 100% stable. Most of
the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented
it right before he left, and we're going to need to add tests at some point soon. I added a first
integration test in this commit, but it's just a start.
The fixes include:
1. Stop having the genotyping code strip out AD values. It doesn't make sense that it should do this so
I don't know why it was doing that at all.
Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore.
This also helps CalculateGenotypePosteriors which was losing the AD values.
Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before.
Updated the integration test for that walker to include such data.
2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe.
Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it.
Also, renamed the variable to posteriorProbabilities (and not likelihoods).
3. Have CGP update the AC/AF/AN counts after fixing GTs.
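To illustrate point 2 above: raising 10 to a very negative log10 value underflows to zero, so the scale conversion is safer inside a normalization routine that first shifts by the maximum. The sketch below is a generic log10 normalization and not the project's own routine; the class name, method name, and boolean flag are hypothetical.

```java
/** Hypothetical sketch of normalizing log10 values without underflow. */
final class Log10NormalizeSketch {
    /**
     * Normalizes log10-scaled values so they sum to 1 in linear space.
     * Assumes at least one finite input value.
     *
     * @param keepInLog10 if true, return the normalized values in log10 scale
     */
    static double[] normalizeFromLog10(double[] log10Values, boolean keepInLog10) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        // Shift by the maximum so the largest term becomes 10^0 = 1.
        double shiftedSum = 0.0;
        for (double v : log10Values) {
            shiftedSum += Math.pow(10.0, v - max);
        }
        double log10ShiftedSum = Math.log10(shiftedSum);
        double[] out = new double[log10Values.length];
        for (int i = 0; i < out.length; i++) {
            double normalizedLog10 = log10Values[i] - max - log10ShiftedSum;
            out[i] = keepInLog10 ? normalizedLog10 : Math.pow(10.0, normalizedLog10);
        }
        return out;
    }
}
```

For inputs such as {-1000, -1001}, calling Math.pow(10.0, v) on the raw values gives 0.0 for both, whereas the shifted version still yields probabilities that sum to 1.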
commit 5e73b94eed3d1fc75c88863c2cf07d5972eb348b
Merge: e12593a d04a585
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Fri Feb 14 09:25:22 2014 +0000
Merge pull request #1 from broadinstitute/checkpoint
SimpleTimer passes tests, with formatting
commit d04a58533f1bf5e39b0b43018c9db3302943d985
Author: kshakir <github@kshakir.org>
Date: Fri Feb 14 14:46:01 2014 +0800
SimpleTimer passes tests, with formatting
Fixed getNanoOffset() to offset nano to nano, instead of nano to seconds.
Updated warning message with comma separated numbers, and exact values of offsets.
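A small sketch of the nano-to-nano offset and the comma-separated message. The helper names and the message wording are hypothetical and do not reproduce the real getNanoOffset() or its warning text; the sketch only shows both clocks being converted to nanoseconds before subtracting.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the actual SimpleTimer code. */
final class NanoOffsetSketch {
    /** Offset between wall-clock time and the JVM nano clock, both in nanoseconds. */
    static long nanoOffset(long currentTimeMillis, long nanoTime) {
        return TimeUnit.MILLISECONDS.toNanos(currentTimeMillis) - nanoTime;
    }

    /** Formats both offsets exactly, with comma grouping for readability. */
    static String describeOffsetChange(long oldOffsetNano, long newOffsetNano) {
        return String.format(Locale.US,
                "Clock offset changed from %,d ns to %,d ns", oldOffsetNano, newOffsetNano);
    }
}
```

In real use the inputs would be System.currentTimeMillis() and System.nanoTime(); they are passed as parameters here so the arithmetic can be checked with fixed values.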
commit e12593ae66a5e6f0819316f2a580dbc7ae5896ad
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Wed Feb 12 13:27:07 2014 +0000
Remove instance of 'Timer'.
commit 47a73e0b123d4257b57cfc926a5bdd75d709fcf9
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Wed Feb 12 12:19:00 2014 +0000
Revert a couple of changes that survived somehow.
- CheckpointableTimer,Timer -> SimpleTimer
commit d86d9888ae93400514a8119dc2024e0a101f7170
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Mon Jan 20 14:13:09 2014 +0000
Revised commits following comments.
- All utility merged into `SimpleTimer`.
- All tests merged into `SimpleTimerUnitTest`.
- Behaviour of `getElapsedTime` should now be consistent with `stop`.
- Use 'TimeUnit' class for all unit conversions.
- A bit more tidying.
commit 354ee49b7fc880e944ff9df4343a86e9a5d477c7
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Fri Jan 17 17:04:39 2014 +0000
Add a new CheckpointableTimerUnitTest.
Revert SimpleTimerUnitTest to the version before any changes were made.
commit 2ad1b6c87c158399ededd706525c776372bbaf6e
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Tue Jan 14 16:11:18 2014 +0000
Add test specifically checking behaviour under checkpoint/restart.
Slight alteration to the checkpointable timer based on observations
during the testing - it seems that there's a fair amount of drift
between the sources anyway, so each time we stop we resynchronise the
offset. Hopefully this should avoid gradual drift building up and
presenting as checkpoint/restart drift.
commit 1c98881594dc51e4e2365ac95b31d410326d8b53
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Tue Jan 14 14:11:31 2014 +0000
Should use consistent time units
commit 6f70d42d660b31eee4c2e9d918e74c4129f46036
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Tue Jan 14 14:01:10 2014 +0000
Add a new timer supporting checkpoint mechanisms.
The issue with this is that the current timer is locked to JVM nanoTime. This can be reset after
a checkpoint/restart and result in negative elapsed times, which causes an error.
This patch addresses the issue in two ways:
- Moves the check on timer information in GenomeAnalysisEngine.java to only occur if a time limit has been
set.
- Creates a new timer (CheckpointableTimer) which keeps track of the relation between system and nano time. If
this changes drastically, then the assumption is that there has been a JVM restart owing to checkpoint/restart.
Any time straddling a checkpoint/restart event will not be counted towards total running time.
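The idea behind the new timer can be sketched roughly as below. This is a simplified illustration and not the code in this patch: the class name, the injectable clocks, and the 10 second drift tolerance are assumptions (the message only says the relation must change "drastically").

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a timer that ignores intervals straddling a checkpoint/restart. */
final class CheckpointAwareTimerSketch {
    // Assumed tolerance for the wall-clock vs nano-clock offset.
    static final long MAX_DRIFT_NANO = TimeUnit.SECONDS.toNanos(10);

    private final LongSupplier wallClockMillis; // e.g. System::currentTimeMillis
    private final LongSupplier nanoClock;       // e.g. System::nanoTime
    private long startNano;
    private long startOffsetNano;
    private long elapsedNano = 0L;
    private boolean running = false;

    CheckpointAwareTimerSketch(LongSupplier wallClockMillis, LongSupplier nanoClock) {
        this.wallClockMillis = wallClockMillis;
        this.nanoClock = nanoClock;
    }

    private long offsetNano() {
        return TimeUnit.MILLISECONDS.toNanos(wallClockMillis.getAsLong()) - nanoClock.getAsLong();
    }

    void start() {
        startNano = nanoClock.getAsLong();
        startOffsetNano = offsetNano(); // resynchronise on every start
        running = true;
    }

    void stop() {
        if (!running) {
            return;
        }
        long drift = Math.abs(offsetNano() - startOffsetNano);
        if (drift <= MAX_DRIFT_NANO) {
            elapsedNano += nanoClock.getAsLong() - startNano;
        }
        // Otherwise the nano clock was probably reset by a restart; drop this interval.
        running = false;
    }

    long getElapsedNano() {
        return elapsedNano;
    }
}
```

Passing the clocks in as suppliers makes the restart case testable without an actual checkpoint: a test can reset the fake nano clock while advancing the fake wall clock.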
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>