gatk-3.8

Commit Graph

Author	SHA1	Message	Date
depristo	e16bc2cbd9	Contracts for Java now written for GenomeLoc and GenomeLocParser. The semantics of GenomeLoc are now much clearer. It is no longer allowed to create invalid GenomeLocs -- you can only create them with well formed start, end, and contigs, with respect to the master dictionary. Where one previously created an invalid GenomeLoc, and asked is this valid, you must now provide the raw arguments to helper functions to assess this. Providing bad arguments to GenomeLoc generates UserExceptions now. Added utility functions contigIsInDictionary and indexIsInDictionary to help with this. Refactored several Interval utilities from GenomeLocParser to IntervalUtils, as one might expect they go. Removed GenomeLoc.clone() method, as this was not correctly implemented, and actually unnecessary, as GenomeLocs are immutable. Several iterator classes have changed to remove their use of clone(). Removed misc. unnecessary imports. Disabled, temporarily, the validating pileup integration test, as it uses reads mapped to a different reference sequence for ecoli, and this now does not satisfy the contracts for GenomeLoc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5827 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-20 15:43:27 +00:00
depristo	0095aa2627	Contracts for java now enabled by default in GATK build. The contract checking is automatically enabled when running tests and integrationtests. If you want to run the GATK with Contract checking enabled, add -javaagent:lib/cofoja.jar to your jvm args git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5826 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-20 02:53:42 +00:00
kshakir	6c6e52def9	Renamed FCP to HybridSelectionPipeline. Reviewed pipelines with dev team. HSP updates: - Calling SNPs and Indels at the same time then using SelectVariants to separate them for filtering - Moved logs next to the files like in WGP - Flattened outputs into one directory - The file names for the final outputs are now <projectName>.vcf and <projectName>.eval - Updated test to pass the chr20 intervals instead of a boolean - Removed MultiFCP WGP updates: - Only cleaning and calling chromosomes 1-22, X, Y, MT - Splitting SNPs from indels, filtering indels, then merging the selected SNPs and selected Indels back together to make sure there are no collisions in CombineVariants - Still running VQSR on the recombined SNPs plus hard filtered indels - Using hard indel filters from delangel - Reduced number of tranches with rpoplin - Changed prior for dbsnp from 10 to 8 with rpoplin - Assuming identical samples on both CombineVariants - Explicitly using variant merge option UNION even though it's the default - Not setting the default genotype merge option PRIORITIZE - Generating a vcf and eval for each tranche git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5825 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 22:47:02 +00:00
kiran	d896a4a9d3	Given genotypes for a trio, phases child by transmission. Computes probability that the determined phase is correct given that the genotypes for mom and dad are correct (useful if you want to use this to compare phasing accuracy, but want to break that comparison down by phasing confidence in the truth set). Optionally filters out sites where the phasing is indeterminate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5824 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 21:27:37 +00:00
rpoplin	fe4b40ac2c	Adding new InbreedingCoeff and PercentNBases annotations for Guillermo to use. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5823 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 19:50:39 +00:00
carneiro	76c87c9f1d	trio WGS was creating trio WEX filenames. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5822 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 17:45:45 +00:00
ebanks	bc98ac1e74	Adding a TODO for future consideration git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5821 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-19 15:02:23 +00:00
hanna	0bb6b9a91a	Locus iterators were implemented in a peekable style, which meant that a locus and its three or four nearest neighbors could be in memory at once. Tweaking the iterators to ensure that previous AlignmentContexts don't have strong references which means that the garbage collector can work effectively to help us trundle through these regions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5820 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 21:40:40 +00:00
hanna	a38b2be329	Fix for old, broken invariant where unmapped reads are represented by null rather than an empty BAMFileSpan. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5819 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:57:38 +00:00
carneiro	ebcd333ed8	Quick small updates: SelectVariants: typo MethodsDevelopmentPipeline: Added CEU Trio WGS dataset git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5818 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:08:39 +00:00
carneiro	b5b8cb959a	Added VQSR to the downsampling script and changed memory limits for the clean script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5817 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 20:07:42 +00:00
rpoplin	4b00fd2688	Adding User Exception to VQSR for the case of trying to cluster with an annotation that doesn't exist in the input VCF git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5816 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 19:47:51 +00:00
depristo	218354e338	Contracts for Java (http://code.google.com/p/cofoja/) infrastructure enabled. No piece of code actually uses this, so it's possible to remove easily. Does not build by default (you must modify build.xml). Really an intermediate commit so I can play around with the system in my java classes and revert safely. Very much looking forward to DVCS git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5815 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 18:05:59 +00:00
kshakir	83e207d9dd	Added option to exclude intervals during chunk calling. Removed job priority as temp space isn't as tight at the moment and planning on changing the priority interface. Updated chunk calling with ebanks: - Using "the bundle" of resources. - Using dbsnp 132 and 1000G indel RODs for both RTC & IR. - Using the default maxIntervalSize in RTC. - Removed use of UG.exactCalculation argument. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5814 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-18 03:48:02 +00:00
rpoplin	d698c87bbf	More UserExceptions and warnings in VQSR. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5813 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-16 19:03:21 +00:00
kshakir	541b5f7a80	Somehow checked in a version that was building extensions for everything ("") instead of selected packages. Fixed. Also added more logging when extension generation fails. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5812 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-16 16:58:37 +00:00
delangel	a27e8b1dc6	Bug fix - use correct variable to retrieve from map. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5811 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-16 15:32:58 +00:00
rpoplin	d925f76edc	Cutting down on the number of info lines in VQSR so that I can read the warning messages git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5810 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-16 13:35:51 +00:00
delangel	5a7444e186	First step in refactoring UG way of storing indel likelihoods - main motive is that rank sum annotations require per-read quality or likelihood information, and even the question "what allele of a variant is present in a read" which is trivial for SNPs may not be that straightforward for indels. This step just changes storage of likelihoods so now we have, instead of an internal matrix, a class member which stores, as a hash table, a mapping from pileup element to an (allele, likelihood) pair. There's no functional change aside from internal data storage. As a bonus, we get for free a 2-3x improvement in speed in calling because redundant likelihood computations are removed. Next step will hook this up to, and redefine annotation engine interaction with UG for indel case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5809 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-15 23:04:11 +00:00
depristo	3ccc08ace4	Now emits siteType = {SNP,INDEL}. Doesn't work (and may never actually work) for indels under current extended event system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5808 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-15 19:16:09 +00:00
depristo	75db4705ab	Added splitContextByReadGroup() and fixed bug in getPileupForReadGroup() that resulted in a NPE when no reads were present for a read group. Added doc string for getNBoundRodTracks(). Intermediate commit for CalibrateGenotypeLikelihoods and GenotypeConcordanceTable, so I have a record of my work. Not ready for public consumption. Really looking forward to making local commits so I can track my progress without needing to push incomplete functionality up to the server. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5807 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-15 17:36:07 +00:00
depristo	9423652ad8	Computes how well a genotype chip covers a reference panel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5806 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-14 15:07:28 +00:00
depristo	5e9c0d00c6	Simple R script to visualize genotype likelihood accuracy git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5805 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-14 15:05:55 +00:00
delangel	fa75efb6ac	Backing off - need to change pileup interface for rank sum tests before indels can be annotated with them git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5804 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-13 21:54:54 +00:00
asivache	befbcd274b	Computes additional stats we want to use later for filtering: median and mad for indel position with respect to starts and ends of all the reads that support it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5803 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-13 21:19:58 +00:00
asivache	5c889580c4	Change of logic: if "read" (sequence 2) sticks out beyond the boundary of the ref (sequence 1) it is aligned to, the extra bases on the left or on the right will be softclipped in the cigar generated for such an alignment, rather than added to the first/last M block. This also affects alignment offset: if read starts before the ref (used to be represented by a negative offset), the cigar now will start with S, and the returned offset (alignment start) will be 0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5802 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-13 21:12:54 +00:00
delangel	d4ca8d94fa	Trivial change to allow indels to be annotated by rank sum tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5801 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-13 20:24:08 +00:00
kshakir	95fc6c0a83	Changed VR tranches from old 0.1-10 to new 100 to 90. Using hapmap training and truth based on wiki. Explicitly setting the ts_filter_level even though 99.0 is the default. Recal file path now ends with .recal. Added ar's vcf input. Omni rod name now omni instead of 1kg. The VR RodBind tags had spaces in them. Was passing both the full intervals and the chunk intervals to chunk jobs. Switched back to chr20 for default since the VR crashes on small intervals sets with "MESSAGE: Matrix is singular." Log file names based on the file paths + .out. Added eval stratifications by sample based on the Hybrid Selection / Whole Exome pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5800 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-13 14:38:56 +00:00
kshakir	08c13f3944	Using embedded GATK. Hardcoded the reference and dbsnp since the training rods are also hardcoded, for now. Changed freeze/chr20 to wg/chr20/cent1 to also test the heaviest known shard. Other cleanup. TODO: Memory command line options or have the script figure it out using FLS or similar. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5799 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 23:19:49 +00:00
hanna	03452c15c0	Cleanup GATKBAMIndex unit test to allow a more efficient access pattern for FindLargeShards. Runtime of FindLargeShards on papuan dataset is now 75min. GATK proper should benefit as well, although the benefits might be so small as to not be measurable. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5798 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 21:50:33 +00:00
dheiman	9e08a699c6	Corrected memory handling and jobName formatting issues git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5797 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 17:47:56 +00:00
depristo	db1f9af679	Now supports multiple records in allele at sites that genotype as reference git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5796 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 17:36:27 +00:00
chartl	66c8fa5c48	James P says this change worked for him, so I'm committing it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5795 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 16:55:18 +00:00
rpoplin	a22e98a2c4	Yikes. Fixing the build git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5794 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 01:52:35 +00:00
rpoplin	40797f9d45	Ensuring a minimum number of variants when clustering with bad variants. Better error message when Matrix library fails to calculate inverse. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5793 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 01:48:37 +00:00
kshakir	a20d257773	Generating extensions for org.broadinstitute.sting.gatk.datasources.reads.utilities, including FindLargeShards. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5792 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-12 00:49:31 +00:00
kshakir	ec443e89cf	Added pass-throughs for -Djava.io.tmpdir to javac and testng. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5791 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-11 20:56:35 +00:00
carneiro	fb1be2653c	A succinct walker that reports GC content by interval. Taking down two old implementations of the same thing from oneoffs. Documentation added to the wiki. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5790 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-11 18:53:11 +00:00
depristo	9a1d0d7076	Simple bug fix to allow multiple records at same site when genotyping given alleles. Takes only the first record (respecting filters, SNP type, etc), and issues a warning if there is more than one valid record at a site git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5789 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-11 14:17:14 +00:00
dheiman	16db86e6cb	Grid Engine backend to GATK-Queue, initial commit of implementation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5788 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-11 13:21:45 +00:00
ebanks	dfdef2d29b	PLEASE READ ME! In order to prepare for the upcoming changes to VCF4, we felt it was best to split up the vcf3 and vcf4 codecs (vcf4 is not backwards compatible to vcf3 and certain changes are too complex to handle in both codecs). Using the 'VCF' rod type in the GATK will now throw a UserException for vcf3.2 or vcf3.3 files telling you to use the 'VCF3' type instead (and vice versa). Integration/unit tests have been updated. For programmers: note that there is currently a lot of code duplication in the two codecs (although I pulled out the easy stuff to a VCFCodecUtils class); however WE ARE FREEZING THE VCF3 CODEC AND WILL NO LONGER MAKE CHANGES TO IT. All updates/improvements will be targeted to the vcf4 codec only as vcf3 is there only to be able to read legacy files. People should really be using vcf4 files only. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5787 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-11 12:07:44 +00:00
delangel	852e555c00	Fix broken functionality from previous commit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5786 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-10 18:38:25 +00:00
ebanks	8d47d2e813	Fix for Tim. It was possible for the constrained mate fixer to dump its cache in the middle of a given realignment (so the IndelRealigner was playing by the rules). No longer possible. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5785 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-10 16:48:24 +00:00
ebanks	fbe7974094	Renaming for consistency git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5784 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-10 16:36:39 +00:00
delangel	3c364279f4	Add simple ability to create "X out of N" combined files: if a site is present in at least X input rods, it gets output, otherwise it's skipped, controlled with argument -minN. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5783 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-10 15:27:18 +00:00
hanna	f275be6968	A 'fat shard' finder. Cranks through the indices of a BAM file or list of BAM files looking for outliers (outliers right now are defined naively as shards whose sizes are more than 5 stddevs away from the mean). Runs in 13 minutes per chromosome on 707 low pass whole genome BAMs -- not great, but much faster than running UG on the same region to discover anomalies. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5782 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-10 12:56:47 +00:00
kshakir	3ffc2ccd81	Implemented broad specific LSF requirement in the LSF job runner ahead of GridEngine check in by dheiman. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5781 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-09 22:14:04 +00:00
kshakir	7d21350a17	Fixed import. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5780 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-09 18:07:40 +00:00
asivache	0861451726	Print on multiple rows in standalone command line mode when the sequences are too long git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5779 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-09 13:51:00 +00:00
ebanks	bf40351094	Minor update git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5778 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-08 03:48:37 +00:00


5786 Commits (e16bc2cbd9ff13bdb96e90c2d083908099aa82e8)