delangel
3c364279f4
Add simple ability to create "X out of N" combined files: if a site is present in at least X input rods, it gets output, otherwise it's skipped, controlled with argument -minN.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5783 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 15:27:18 +00:00
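The "X out of N" rule in the commit above can be illustrated with a minimal Python sketch. This is not the GATK implementation: the function name `sites_in_at_least`, the `(contig, position)` site tuples, and the `min_n` parameter (standing in for `-minN`) are assumptions made for the example.

```python
from collections import Counter


def sites_in_at_least(inputs, min_n):
    """Return the sites that occur in at least min_n of the given inputs.

    inputs: a list of iterables of hashable sites, one iterable per input file.
    min_n:  hypothetical stand-in for the -minN argument.
    """
    counts = Counter()
    for sites in inputs:
        # set() ensures a site counts at most once per input.
        counts.update(set(sites))
    return sorted(site for site, n in counts.items() if n >= min_n)


rod_a = [("1", 100), ("1", 200)]
rod_b = [("1", 100)]
rod_c = [("1", 100), ("1", 200), ("1", 300)]
kept = sites_in_at_least([rod_a, rod_b, rod_c], min_n=2)
```

With `min_n=2`, a site seen in only one of the three inputs is skipped, while sites seen in two or three are kept.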
hanna
f275be6968
A 'fat shard' finder. Cranks through the indices of a BAM file or list of
BAM files looking for outliers (outliers right now are defined naively as
shards whose sizes are more than 5 stddevs away from the mean). Runs in
13 minutes per chromosome on 707 low pass whole genome BAMs -- not great, but
much faster than running UG on the same region to discover anomalies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5782 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 12:56:47 +00:00
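The naive outlier definition in the commit above (sizes more than 5 standard deviations from the mean) might look like the following sketch. It assumes the shard sizes are already available as a plain list of numbers and uses the population standard deviation; the name `find_fat_shards` is hypothetical and the real tool reads BAM indices, which is not shown here.

```python
from statistics import mean, pstdev


def find_fat_shards(sizes, threshold=5.0):
    """Return indices of sizes more than `threshold` stddevs from the mean."""
    if not sizes:
        return []
    mu = mean(sizes)
    sigma = pstdev(sizes)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all shards are the same size, so nothing is an outlier
    return [i for i, s in enumerate(sizes) if abs(s - mu) > threshold * sigma]


shard_sizes = [100] * 99 + [10000]
outliers = find_fat_shards(shard_sizes)
```

Because a single extreme value also inflates the mean and the standard deviation, this rule needs a reasonably large number of shards before any one of them can exceed 5 standard deviations.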
kshakir
3ffc2ccd81
Implemented Broad-specific LSF requirement in the LSF job runner ahead of GridEngine check-in by dheiman.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5781 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 22:14:04 +00:00
kshakir
7d21350a17
Fixed import.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5780 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 18:07:40 +00:00
asivache
0861451726
Print on multiple rows in standalone command line mode when the sequences are too long
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5779 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 13:51:00 +00:00
ebanks
bf40351094
Minor update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5778 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-08 03:48:37 +00:00
ebanks
15c7bd82a5
Fix for IndelRealigner memory problem. Now the Constrained mate fixing writer is told whether a read has been modified and, if it wasn't, can dump it when the cache needs to get flushed at places with tons of coverage.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5777 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-06 19:34:41 +00:00
rpoplin
d8a761bbbd
Warn the user if trying to train with too few variants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5776 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-06 17:47:58 +00:00
hanna
c2e8c460cb
Factor out all testing dependencies into a separate test configuration and
only download that test configuration when running unit/integration tests.
This means that the build will (hopefully) never break because it can't
fetch a file that isn't required for the GATK to run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5775 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 22:42:11 +00:00
rpoplin
1d11e88899
Adding another example call set to GATK resource bundle for use in VQSR wiki tutorial
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5774 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 21:16:33 +00:00
rpoplin
b94d8dae17
Removing requirement of providing known track in VQSR for the non-humans. Updating placement of legend on tranche plot.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5773 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 20:24:06 +00:00
fromer
04f156d86b
Removed extraneous import
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5772 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 18:51:03 +00:00
delangel
7d7ce6cf00
Two embarrassing bug fixes:
a) Forgot to convert from phred to log-prob when computing gap penalties from recal table.
b) Forgot to uncomment code to correctly deal with hard-clipped bases in a read. But because of this, had to do a short term workaround to at least temporarily return class from hardClipAdaptorSequence to GATKSAMRecord. Otherwise, I get exceptions when casting because somehow some reads in HiSeq get to be SAMRecord (which GATKSAMRecord inherits from) but some reads get to be BAMRecords (which can't be cast into GATKSAMRecord), not sure why.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5771 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 17:08:34 +00:00
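Fix (a) in the commit above concerns converting a Phred-scaled quality to a log probability. A minimal sketch of that conversion, assuming "log-prob" means the base-10 logarithm of the error probability (the standard Phred definition is Q = -10 * log10(p)); the function name is hypothetical and this is not the code from the commit:

```python
def phred_to_log10_prob(q):
    """Convert a Phred-scaled quality Q to log10(p), using Q = -10 * log10(p)."""
    return -q / 10.0


log10_p_q30 = phred_to_log10_prob(30)    # Q30 means p = 10**-3
p_q20 = 10 ** phred_to_log10_prob(20)    # back to a plain probability
```

Using a Phred value directly where a log probability is expected would be off by a factor of -10, which is the kind of error the commit message describes.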
hanna
45d8634522
Intermediate commit: bring Google Caliper into our private repository (even
though sonatype is back up). This will tide us over until I figure out how
to add caliper to test configuration, so that it's only swapped in when we
actually run our unit / performance tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5770 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 14:33:14 +00:00
kshakir
4d08d39849
Moved some of the java to scala conversions from production to test code as it's not needed in production and slows down the code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5769 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 04:11:15 +00:00
kshakir
28b897d5de
Fixed O(N^2) operation when scattering interval files.
Cleaned up intervals contig count function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5768 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 03:32:35 +00:00
carneiro
3882d1b9c0
fixing the build \o/
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5767 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 00:57:49 +00:00
kshakir
8ad547e6c2
Fixed another interval bug where dividing up N intervals into N parts wasn't working.
Minor updates to the FCPTest to match the changes due to using the old indel caller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:49:35 +00:00
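The edge case named in the commit above, dividing N intervals into N parts, can be illustrated with a small sketch that splits a list into contiguous, near-equal parts in a single pass. This is an assumption about the general technique, not Queue's actual scatter logic; `split_into_parts` and the interval strings are hypothetical.

```python
def split_into_parts(items, parts):
    """Split items into `parts` contiguous, non-empty, near-equal chunks."""
    if parts < 1 or parts > len(items):
        raise ValueError("parts must be between 1 and len(items)")
    base, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


intervals = ["chr1:1-100", "chr1:101-200", "chr2:1-100", "chr2:101-200", "chr3:1-100"]
n_into_n = split_into_parts(intervals, len(intervals))
five_into_two = split_into_parts(intervals, 2)
```

Each item is visited once, so the split is linear in the number of intervals, and asking for as many parts as there are intervals yields one interval per part.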
rpoplin
825682f58c
oops, putting the script back into a sensible state
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:17:05 +00:00
rpoplin
b5ab2274f6
Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:13:26 +00:00
corin
b4654b0f47
Status messages to user added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5763 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:10:47 +00:00
corin
bcc688c1e9
small formatting change
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5762 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:01:20 +00:00
corin
1410327901
Cmd line argument reference fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5761 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:00:14 +00:00
hanna
5c6965575e
Some refactoring that Mauricio and I worked through together. Changed filters
to extend from org.broadinstitute.sting.gatk.filters.ReadFilter rather than
directly from net.sf.picard.filter.SamRecordFilter, which allows us to add
an initialize(GATKEngine) method so that filters can do any initialization
they'd like based on CL arguments, SAM headers, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5760 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:29:08 +00:00
carneiro
b66c6dced1
- No longer prints out non-confident calls (they were leading to tables that don't add up and confusing some PacBio folk).
- Added sensitivity and specificity to the report.
- With the changes in genotype likelihoods, the indel analysis only happens if the BAM file also has an extended event. Not great, but at least it's not broken.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5759 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:26:55 +00:00
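For reference, the two metrics named in the commit above have standard definitions in terms of confusion-matrix counts. The sketch below uses those textbook formulas; the function names and the example counts are made up for illustration and do not reflect how the report itself tallies calls.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else float("nan")


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    total = tn + fp
    return tn / total if total else float("nan")


sens = sensitivity(tp=90, fn=10)
spec = specificity(tn=80, fp=20)
```

Both functions return NaN when the denominator is zero, so an empty category does not raise an error.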
kshakir
4d251fb91f
Why won't you die?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5758 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:13:39 +00:00
kshakir
f7d9f0a1f3
Removing QPipeline directory as there's no one to support it at the moment.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5757 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 18:36:02 +00:00
carneiro
7ed8b4ddb0
Making sure CalculateLikelihoodsAndGenotypes returns an empty variant context when 'EMIT_ALL_SITES' and 'GENOTYPE_GIVEN_ALLELES' are being used, now for indels too!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5756 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 18:04:56 +00:00
corin
3e8fc71743
Missing parenthesis for database access commands added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5755 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:59:17 +00:00
corin
23efd66d31
Updated Tearsheet with by sample QC metrics, bugfix for misnamed variables
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5754 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:57:48 +00:00
corin
72a07e4553
Updated Tearsheet with by sample QC metrics, bugfix for misnamed variables
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5753 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:54:05 +00:00
corin
2e1c09c03b
Updated tearsheet drop
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5752 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:47:47 +00:00
corin
f386cad58c
Updated Tearsheet with by sample QC metrics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5751 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:47:26 +00:00
rpoplin
6c7a0adc76
Updating VariantGaussianMixtureModelUnitTest to use truth sensitivity cutting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5750 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 13:56:01 +00:00
kshakir
08f0509a5c
Disabling the queue/pipeline package by default so that scala code can build. If it's not going to be fixed the package should be removed. If it is going to be fixed this patch to build.xml should be reverted.
Also added the old model of indel calling to the FCP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5749 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 12:17:33 +00:00
delangel
a19389528d
Bring back from the dead the old likelihoods model for indels, which has worse performance but is about 4x faster. Enabled with argument -GSA_PRODUCTION_ONLY in UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5748 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 22:38:33 +00:00
carneiro
f35d955490
recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:43:09 +00:00
carneiro
9f2a8033ff
justRecalibrate now recalibrates one sample, fully, without splitting intervals (naming makes more sense)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:42:23 +00:00
carneiro
c2f8536e02
removing old GATK options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:40:39 +00:00
carneiro
8bb92160b5
Script to identify Mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:19:42 +00:00
carneiro
e2b9227d8d
script to test BQSR on good/bad regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:16:37 +00:00
carneiro
e5cc0f4eec
Added 'specificity' to variant eval's Validation Report evaluator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5742 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:48:30 +00:00
rpoplin
b88dec387c
clean up from VQSR movement
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5741 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:35:30 +00:00
rpoplin
23cd3a7a5d
Moving VQSR v2 to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5740 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:20:06 +00:00
rpoplin
44a717f63a
Good bye VQSR v1. This commit will break the build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5739 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:09:52 +00:00
hanna
2dacf1b2b2
Better header support when running R's read.table(...,header=T).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5738 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:56:20 +00:00
hanna
ad8c786b2d
Now more easily R-parseable.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5737 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:30:50 +00:00
rpoplin
5bade81c6d
Adding tranche plot generation back to VQSR
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5736 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:26:26 +00:00
rpoplin
e73720c2db
Updating VQSLOD annotation description
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5735 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 19:01:08 +00:00
rpoplin
11052918d9
Better exception text for common error in VQSR.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5734 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:37:25 +00:00