kshakir
ec443e89cf
Added pass-throughs for -Djava.io.tmpdir to javac and testng.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5791 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 20:56:35 +00:00
carneiro
fb1be2653c
A succinct walker that reports GC content by interval. Taking down two old implementations of the same thing from oneoffs. Documentation added to the wiki.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5790 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 18:53:11 +00:00
depristo
9a1d0d7076
Simple bug fix to allow multiple records at same site when genotyping given alleles. Takes only the first record (respecting filters, SNP type, etc), and issues a warning if there is more than one valid record at a site
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5789 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 14:17:14 +00:00
dheiman
16db86e6cb
Grid Engine backend to GATK-Queue, initial commit of implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5788 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 13:21:45 +00:00
ebanks
dfdef2d29b
PLEASE READ ME! In order to prepare for the upcoming changes to VCF4, we felt it was best to split up the vcf3 and vcf4 codecs (vcf4 is not backwards compatible with vcf3 and certain changes are too complex to handle in both codecs). Using the 'VCF' rod type in the GATK will now throw a UserException for vcf3.2 or vcf3.3 files telling you to use the 'VCF3' type instead (and vice versa). Integration/unit tests have been updated. For programmers: note that there is currently a lot of code duplication in the two codecs (although I pulled out the easy stuff to a VCFCodecUtils class); however WE ARE FREEZING THE VCF3 CODEC AND WILL NO LONGER MAKE CHANGES TO IT. All updates/improvements will be targeted to the vcf4 codec only, as vcf3 is there only to be able to read legacy files. People should really be using vcf4 files only.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5787 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 12:07:44 +00:00
delangel
852e555c00
Fix broken functionality from previous commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5786 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 18:38:25 +00:00
ebanks
8d47d2e813
Fix for Tim. It was possible for the constrained mate fixer to dump its cache in the middle of a given realignment (so the IndelRealigner was playing by the rules). No longer possible.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5785 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 16:48:24 +00:00
ebanks
fbe7974094
Renaming for consistency
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5784 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 16:36:39 +00:00
delangel
3c364279f4
Add simple ability to create "X out of N" combined files: if a site is present in at least X input rods, it gets output, otherwise it's skipped, controlled with argument -minN.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5783 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 15:27:18 +00:00
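The "X out of N" rule described in the commit above can be sketched as follows. This is an illustrative Python sketch, not the GATK implementation: the function name `sites_in_at_least`, the representation of each input rod as a set of (contig, position) tuples, and the `min_n` parameter (standing in for -minN) are all assumptions made for the example.

```python
from collections import Counter


def sites_in_at_least(rods, min_n):
    """Return the sorted sites that appear in at least min_n input rods.

    rods: iterable of collections of (contig, position) tuples
          (a hypothetical stand-in for the real input rods).
    min_n: minimum number of rods a site must appear in to be kept.
    """
    counts = Counter()
    for rod in rods:
        # Count each site at most once per rod, even if the rod repeats it.
        counts.update(set(rod))
    return sorted(site for site, n in counts.items() if n >= min_n)
```

With three input rods and `min_n=2`, a site found in only one rod is skipped, while a site found in two or three rods is kept.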
hanna
f275be6968
A 'fat shard' finder. Cranks through the indices of a BAM file or list of BAM files looking for outliers (outliers right now are defined naively as shards whose sizes are more than 5 stddevs away from the mean). Runs in 13 minutes per chromosome on 707 low pass whole genome BAMs -- not great, but much faster than running UG on the same region to discover anomalies.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5782 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 12:56:47 +00:00
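The naive outlier rule from the 'fat shard' commit above (sizes more than 5 standard deviations from the mean) might look roughly like this. It is a sketch under assumptions: the function name `fat_shards` and the plain list of shard sizes are hypothetical, the commit does not say whether a population or sample standard deviation is used (population is assumed here), and the real tool reads sizes from BAM indices, which this example does not attempt.

```python
from statistics import mean, pstdev


def fat_shards(shard_sizes, num_stddevs=5.0):
    """Return indices of shards whose size is an outlier.

    A shard is flagged when its size differs from the mean by more than
    num_stddevs population standard deviations (assumed; see lead-in).
    """
    mu = mean(shard_sizes)
    sigma = pstdev(shard_sizes)
    if sigma == 0:
        # All shards are the same size, so nothing can be an outlier.
        return []
    return [i for i, size in enumerate(shard_sizes)
            if abs(size - mu) > num_stddevs * sigma]
```

One property of this rule worth keeping in mind: with a population standard deviation, a single shard among n can never be more than sqrt(n - 1) deviations from the mean, so a 5-stddev threshold can only fire when there are more than 26 shards.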
kshakir
3ffc2ccd81
Implemented Broad-specific LSF requirement in the LSF job runner ahead of GridEngine check-in by dheiman.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5781 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 22:14:04 +00:00
kshakir
7d21350a17
Fixed import.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5780 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 18:07:40 +00:00
asivache
0861451726
Print on multiple rows in standalone command line mode when the sequences are too long
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5779 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 13:51:00 +00:00
ebanks
bf40351094
Minor update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5778 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-08 03:48:37 +00:00
ebanks
15c7bd82a5
Fix for IndelRealigner memory problem. Now the Constrained mate fixing writer is told whether a read has been modified and, if it wasn't, can dump it when the cache needs to get flushed at places with tons of coverage.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5777 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-06 19:34:41 +00:00
rpoplin
d8a761bbbd
Warn the user if trying to train with too few variants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5776 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-06 17:47:58 +00:00
hanna
c2e8c460cb
Factor out all testing dependencies into a separate test configuration and only download that test configuration when running unit/integration tests. This means that the build will (hopefully) never break because it can't fetch a file that isn't required for the GATK to run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5775 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 22:42:11 +00:00
rpoplin
1d11e88899
Adding another example call set to GATK resource bundle for use in VQSR wiki tutorial
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5774 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 21:16:33 +00:00
rpoplin
b94d8dae17
Removing requirement of providing known track in VQSR for the non-humans. Updating placement of legend on tranche plot.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5773 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 20:24:06 +00:00
fromer
04f156d86b
Removed extraneous import
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5772 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 18:51:03 +00:00
delangel
7d7ce6cf00
Two embarrassing bug fixes:
...
a) Forgot to convert from phred to log-prob when computing gap penalties from recal table.
b) Forgot to uncomment code to correctly deal with hard-clipped bases in a read. But because of this, had to do a short term workaround to at least temporarily return class from hardClipAdaptorSequence to GATKSAMRecord. Otherwise, I get exceptions when casting because somehow some reads in HiSeq get to be SAMRecord (which GATKSAMRecord inherits from) but some reads get to be BAMRecords (which can't be cast into GATKSAMRecord), not sure why.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5771 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 17:08:34 +00:00
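For reference, the phred-to-log-probability conversion mentioned in fix (a) above is a one-line formula. The sketch below assumes base-10 logs, which the commit does not state, and the function name `phred_to_log10_prob` is made up for illustration; it is not the code from the commit.

```python
def phred_to_log10_prob(q):
    """Convert a phred-scaled quality to a log10 error probability.

    By definition Q = -10 * log10(p), so log10(p) = -Q / 10.
    """
    return -q / 10.0
```

Using a phred value directly where a log probability is expected overstates the penalty by a factor of ten and flips its sign, which is the kind of mismatch the fix describes.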
hanna
45d8634522
Intermediate commit: bring Google Caliper into our private repository (even though sonatype is back up). This will tide us over until I figure out how to add caliper to test configuration, so that it's only swapped in when we actually run our unit / performance tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5770 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 14:33:14 +00:00
kshakir
4d08d39849
Moved some of the java to scala conversions from production to test code as it's not needed in production and slows down the code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5769 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 04:11:15 +00:00
kshakir
28b897d5de
Fixed O(N^2) operation when scattering interval files.
...
Cleaned up intervals contig count function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5768 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 03:32:35 +00:00
carneiro
3882d1b9c0
fixing the build \o/
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5767 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 00:57:49 +00:00
kshakir
8ad547e6c2
Fixed another interval bug where dividing up N intervals into N parts wasn't working.
...
Minor updates to the FCPTest to match the changes due to using the old indel caller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:49:35 +00:00
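The edge case named in the commit above, dividing N intervals into N parts, is easy to get wrong with rounding-based chunk sizes. The following is a generic sketch of an even contiguous split, not Queue's scatter logic (which the commit does not describe); the function name `split_into_parts` and the list-of-intervals input are assumptions.

```python
def split_into_parts(intervals, parts):
    """Split a list into `parts` contiguous, non-empty, near-equal chunks.

    The first (len(intervals) % parts) chunks receive one extra element,
    so N intervals split into N parts yields exactly one interval each.
    """
    n = len(intervals)
    if not 1 <= parts <= n:
        raise ValueError("parts must be between 1 and the number of intervals")
    base, extra = divmod(n, parts)
    result = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        result.append(intervals[start:start + size])
        start += size
    return result
```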
rpoplin
825682f58c
oops, putting the script back into a sensible state
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:17:05 +00:00
rpoplin
b5ab2274f6
Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:13:26 +00:00
corin
b4654b0f47
Status messages to user added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5763 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:10:47 +00:00
corin
bcc688c1e9
small formatting change
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5762 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:01:20 +00:00
corin
1410327901
Cmd line argument reference fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5761 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:00:14 +00:00
hanna
5c6965575e
Some refactoring that Mauricio and I worked through together. Changed filters to extend from org.broadinstitute.sting.gatk.filters.ReadFilter rather than directly from net.sf.picard.filter.SamRecordFilter, which allows us to add an initialize(GATKEngine) method so that filters can do any initialization they'd like based on CL arguments, SAM headers, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5760 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:29:08 +00:00
carneiro
b66c6dced1
- No longer prints out non confident calls (they were leading to tables that don't add up and confusing some Pacbio folk).
...
- Added sensitivity and specificity to the report.
- With the changes in genotype likelihoods, the indel analysis only happens if the BAM file also has an extended event. Not great, but at least it's not broken.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5759 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:26:55 +00:00
kshakir
4d251fb91f
Why won't you die?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5758 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:13:39 +00:00
kshakir
f7d9f0a1f3
Removing QPipeline directory as there's no one to support it at the moment.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5757 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 18:36:02 +00:00
carneiro
7ed8b4ddb0
Making sure CalculateLikelihoodsAndGenotypes returns an empty variant context when 'EMIT_ALL_SITES' and 'GENOTYPE_GIVEN_ALLELES' are being used, now for indels too!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5756 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 18:04:56 +00:00
corin
3e8fc71743
Missing parenthesis for database access commands added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5755 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:59:17 +00:00
corin
23efd66d31
Updated Tearsheet with by sample QC metrics, bugfix for misnamed variables
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5754 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:57:48 +00:00
corin
72a07e4553
Updated Tearsheet with by sample QC metrics, bugfix for misnamed variables
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5753 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:54:05 +00:00
corin
2e1c09c03b
Updated tearsheet drop
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5752 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:47:47 +00:00
corin
f386cad58c
Updated Tearsheet with by sample QC metrics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5751 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 16:47:26 +00:00
rpoplin
6c7a0adc76
Updating VariantGaussianMixtureModelUnitTest to use truth sensitivity cutting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5750 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 13:56:01 +00:00
kshakir
08f0509a5c
Disabling the queue/pipeline package by default so that scala code can build. If it's not going to be fixed the package should be removed. If it is going to be fixed this patch to build.xml should be reverted.
...
Also added the old model of indel calling to the FCP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5749 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 12:17:33 +00:00
delangel
a19389528d
Bring back from the dead the old likelihoods model for indels, which has worse performance but is about 4x faster. Enabled with argument -GSA_PRODUCTION_ONLY in UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5748 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 22:38:33 +00:00
carneiro
f35d955490
recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:43:09 +00:00
carneiro
9f2a8033ff
justRecalibrate now recalibrates one sample fully, without splitting intervals (naming makes more sense)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:42:23 +00:00
carneiro
c2f8536e02
removing old GATK options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:40:39 +00:00
carneiro
8bb92160b5
Script to identify Mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:19:42 +00:00
carneiro
e2b9227d8d
script to test BQSR on good/bad regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:16:37 +00:00
carneiro
e5cc0f4eec
Added 'specificity' to variant eval's Validation Report evaluator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5742 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:48:30 +00:00