gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mauricio Carneiro	e939e0d9b3	Small improvement to the license update scripts - launch it once per file type, not license type (was unnecessary). - renamed ParseLicense to UpdateLicense for clarity GSATDG-5	2013-01-21 16:24:33 -05:00
Mauricio Carneiro	35e3939dca	Adding bamboo script to check all licenses - CheckLicense.py script checks license of sources against a provided license file - checkAllLicenses.csh script runs CheckLicense for all files in the repo with the appropriate license for each GSATDG-9	2013-01-21 16:09:38 -05:00
Mauricio Carneiro	35f8dc7426	Accept spaces before comments or package line. This allows us to be a bit more lenient before erroring out in the license script. Feature suggested by Yossi. GSATDG-5	2013-01-18 16:51:01 -05:00
Mauricio Carneiro	7b8b064165	Last manual license update (hopefully) if everyone updates their git hook accordingly, this will be the last time I have to manually run the script. GSATDG-5	2013-01-18 16:13:07 -05:00
Mauricio Carneiro	02d1b87326	Better error handling for the license scripts - Your commit will now fail gracefully with an error message if you mess up the license system - Your file will be preserved (unmodified) if you fail the commit process - Error message should be indicative of the error you need to fix (usually missing package information) Set your pre-commit hook as a symlink to be automatically updated by new pushes with : ln -s private/shell/pre-commit .git/hooks/ GSATDG-18 #resolve	2013-01-18 16:13:07 -05:00
Ami Levy-Moonshine	0fb7b73107	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-18 15:03:42 -05:00
Ami Levy-Moonshine	826c29827b	change the default VCFs gatherer of the GATK (not just the UG)	2013-01-18 15:03:12 -05:00
Mauricio Carneiro	63e1a377cc	Automatic license information git-hook This hook will automatically add / fix the license information in all files you commit to the repo. To activate it, copy it to your hooks directory : cp private/shell/pre-commit .git/hooks/ Now every time you commit, you will have all your java and scala files automatically updated. GSATDG-5 GSATDG-7 GSATDG-8 #resolve	2013-01-18 12:47:32 -05:00
Eric Banks	cac439bc5e	Optimized the Allele Biased Downsampling: now it doesn't re-sort the pileup but just removes reads from the original one. Added a small fix that slightly changed md5s.	2013-01-18 11:17:31 -05:00
Chris Hartl	08d2da9057	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2013-01-18 10:28:45 -05:00
Chris Hartl	bf5748a538	Forgot to actually put in the md5. Also with the new change to record pairing and filtering, the multiple-records integration test changed: the indel records (T/TG | T/TGACA) are matched up (rather than left separate) resulting in properly identifying mismatching alleles, rather than HET-UNAVAILABLE and UNAVAILABLE-HET. Very nice.	2013-01-18 10:25:36 -05:00
Mauricio Carneiro	f99bd5be6a	Small fix to make the script more generic (thanks Yossi) GSA-710 #resolve	2013-01-18 10:06:50 -05:00
Chris Hartl	91030e9afa	Bugfix: records that get paired up during the resolution of multiple-records-per-site were not going into genotype-level filtering. Caught via testing. Testing for moltenized output, and for genotype-level filtering. This tool is now fully functional. There are three todo items: 1) Docs 2) An additional output table that gives concordance proportions normalized by records in both eval and comp (not just total in eval or total in comp) 3) Code cleanup for table creation (putting a table together the way I do takes -way- too many lines of code)	2013-01-18 09:49:48 -05:00
Eric Banks	39c73a6cf5	1. Ryan and I noticed that the FisherStrand annotation was completely busted for indels with reduced reads; fixed. 2. While making the previous fix and unifying FS for SNPs and indels, I noticed that FS was slightly broken in the general case for indels too; fixed. 3. I also fixed a minor bug in the Allele Biased Downsampling code for reduced reads.	2013-01-18 03:35:48 -05:00
Eric Banks	6a903f2c23	I finally gave up on trying to get the Haplotype/Allele merging to work in the HaplotypeCaller. I've resigned myself instead to create a mapping from Allele to Haplotype. It's cheap so not a big deal, but really shouldn't be necessary. Ryan and I are talking about refactoring for GATK2.5.	2013-01-18 01:21:08 -05:00
Eric Banks	6db3e473af	Better error message for bad qual	2013-01-17 10:30:04 -05:00
Eric Banks	953592421b	I think we got out of sync with the HC tests as we were clobbering each other's changes. Only differences here are to some RankSumTest values.	2013-01-17 09:19:21 -05:00
Eric Banks	ded659232b	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-16 22:49:56 -05:00
Eric Banks	a623cca89a	Bug fix for HaplotypeCaller, as reported on the forum: when reduced reads didn't completely overlap a deletion call, we were incorrectly trying to find the reference position of a base on the read that didn't exist. Added integration test to cover this case.	2013-01-16 22:47:58 -05:00
Ami Levy-Moonshine	fcb3c6dc2a	fix small bugs in scala files	2013-01-16 22:42:20 -05:00
Eric Banks	dbb69a1e10	Need to use ints for quals in HaplotypeScore instead of bytes because of overflow (they are summed when haplotypes are combined)	2013-01-16 22:33:16 -05:00
Chris Hartl	e15d4ad278	Addition of moltenize argument for moltenized tabular output. NRD/NRS not moltenized because there are only two columns.	2013-01-16 18:00:23 -05:00
Mark DePristo	738c24a3b1	Add tests to ensure that all insertion reads appear in the active region traversal	2013-01-16 16:25:36 -05:00
Mark DePristo	3c476a92a2	Add dummy functionality (currently throws an error) to allow HC to include unmapped reads during assembly and calling	2013-01-16 16:25:36 -05:00
Eric Banks	79bc818022	Bug fix for VariantsToVCF: old dbSNP files can have '-' as reference base and those records always need to be padded.	2013-01-16 16:15:58 -05:00
Eric Banks	4cf34ee9da	Bug fix to FisherStrand: do not let it output INFINITY. This all needs to be unit tested, but that's coming on the horizon.	2013-01-16 15:35:04 -05:00
Mark DePristo	2a42b47e4a	Massive expansion of ActiveRegionTraversal unit tests, resulting in several bugfixes to ART -- UnitTests now include combinational tiling of reads within and spanning shard boundaries -- ART now properly handles shard transitions, and does so efficiently without requiring hash sets or other collections of reads -- Updating HC and CountReadsInActiveRegions integration tests	2013-01-16 15:30:00 -05:00
Mark DePristo	ddcb33fcf8	Cache result of getLocation() in Shard so we don't perform the expensive calculation over and over	2013-01-16 15:30:00 -05:00
Mark DePristo	4d0e7b50ec	ArtificialBAMBuilder utility class for creating streams of GATKSAMRecords with a variety of properties -- Allows us to make a stream of reads or an indexed BAM file with reads having the following properties (coming from n samples, of fixed read length and aligned to the genome with M operator, having N reads per alignment start, skipping N bases between each alignment start, starting at a given alignment start) -- This stream can be handed back to the caller immediately, or written to an indexed BAM file -- Update LocusIteratorByStateUnitTest to use this functionality (which was refactored from LIBS unit tests and ArtificialSAMUtils)	2013-01-16 15:29:59 -05:00
Eric Banks	ec1cfe6732	Oops, forgot to add 1 of my files	2013-01-16 15:05:49 -05:00
Eric Banks	e47a389b26	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-16 14:59:11 -05:00
Eric Banks	d18dbcbac1	Added tests for changing IUPAC bases to Ns, for failing on bad ref bases, and for the HaplotypeCaller not failing when running over a region with an IUPAC base. Out of curiosity, why does Picard's IndexedFastaSequenceFile allow one to query for start position 0? When doing so, that base is a line feed (-1 offset to the first base in the contig) which is an illegal base (and which caused me no end of trouble)...	2013-01-16 14:55:33 -05:00
Khalid Shakir	4ffb43079f	Re-committing the following changes from Dec 18: Refactored interval specific arguments out of GATKArgumentCollection into IntervalArgumentCollection such that it can be used in other CommandLinePrograms. Updated SelectHeaders to print out full interval arguments. Added RemoteFile.createUrl(Date expiration) to enable creation of presigned URLs for download over http: or file:.	2013-01-16 12:43:15 -05:00
Eric Banks	445735a4a5	There was no reason to be sharing the Haplotype infrastructure between the HaplotypeCaller and the HaplotypeScore annotation since they were really looking for different things. Separated them out, adding efficiencies for the HaplotypeScore version.	2013-01-16 11:10:13 -05:00
Eric Banks	392b5cbcdf	The CachingIndexedFastaSequenceFile now automatically converts IUPAC bases to Ns and errors out on other non-standard bases. This way walkers won't see anything except the standard bases plus Ns in the reference. Added option to turn off this feature (to maintain backwards compatibility). As part of this commit I cleaned up the BaseUtils code by adding a Base enum and removing all of the static indexes for each of the bases. This uncovered a bug in the way the DepthOfCoverage walker counts deletions (it was counting Ns instead!) that isn't covered by tests. Fortunately that walker is being deprecated soon...	2013-01-16 10:22:43 -05:00
Eric Banks	4fb3e48099	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-16 00:13:38 -05:00
Eric Banks	0d282a7750	Bam writing from HaplotypeCaller seems to be working on all my test cases. Note that it's a hidden debugging option for now. Please let me know if you notice any bad behavior with it.	2013-01-16 00:12:02 -05:00
Mark DePristo	c8de9b92d0	Updating CountReadsInActiveRegionsIntegrationTest integration tests due to new ART	2013-01-15 15:41:33 -05:00
Chris Hartl	327169b283	Refactor the method that identifies the site overlap type into the type enum class (so it can be used elsewhere potentially). Completed todo item: for sites like (eval) 20 12345 A C 20 12345 A AC (comp) 20 12345 A C 20 12345 A ACCC the records will be matched by the presence of a non-empty intersection of alleles. Any leftover records are then paired with an empty variant context (as though the call was unique). This has one somewhat counterintuitive feature, which is that normally (eval) 20 12345 A AC (comp) 20 12345 A ACCC would be classified as 'ALLELES_DO_NOT_MATCH' (and not counted in genotype tables), in the presence of the SNP, they're counted as EVAL_ONLY and TRUTH_ONLY respectively. + integration test	2013-01-15 12:13:45 -05:00
Eric Banks	d3baa4b8ca	Have Haplotype extend the Allele class. This way, we don't need to create a new Allele for every read/Haplotype pair to be placed in the PerReadAlleleLikelihoodMap (very inefficient). Also, now we can easily get the Haplotype associated with the best allele for a given read.	2013-01-15 11:36:20 -05:00
Mark DePristo	3c37ea014b	Retire original TraverseActiveRegion, leaving only the new optimized version -- Required some updates to MD5s, which was unexpected, and will be sorted out later with more detailed unit tests	2013-01-15 10:24:45 -05:00
Eric Banks	94800771e3	1. Initial implementation of bam writing for the HaplotypeCaller with -bam argument; currently only assembled haplotypes are emitted. 2. Framework is set up in the VariantAnnotator for the HaplotypeCaller to be able to call in to annotate dbSNP plus comp RODs. Until the HC uses meta data though, this won't work.	2013-01-15 10:19:18 -05:00
Mark DePristo	39bc9e999d	Add a test to LocusIteratorByState to ensure that we aren't holding reads anywhere -- Run an iterator with 100Ks of reads, each carrying MBs of byte[] data, through LIBS, all starting at the same position. Will crash with an out-of-memory error if we're holding reads anywhere in the system. -- Is there a better way to test this behavior?	2013-01-14 16:30:16 -05:00
Mark DePristo	b8b2b9b2de	ManagingReferenceOrderedView optimization: don't allow a fresh RefMetaDataTracker in the frequent case where there's no reference meta data	2013-01-14 16:30:16 -05:00
Mark DePristo	7eea6b8f92	ReservoirDownsampler optimizations -- Add an option to not always allocate ArrayLists of targetSampleSize, but rather the previous size + MARGIN. This helps for LIBS as most of the time we don't need nearly so much space as we allow -- consumeFinalizedItems returns an empty list if the reservoir is empty, which is often true for our BAM files with low coverage -- Allow empty sample lists for SamplePartitioner as these are used by the RefTraversals and other non-read based traversals Make the reservoir downsampler use a linked list, rather than a fixed sized array list, in the expectFewOverflows case	2013-01-14 16:30:16 -05:00
Mark DePristo	c7f0ca8ac5	Optimization for LIBS: PerSampleReadStateManager now uses a simple LinkedList of AlignmentStateMachine -- Instead of storing a list of list of alignment starts, which is expensive to manipulate, we instead store a linear list of alignment starts. Not grouped as previously. This enables us to simplify iteration and update operations, making them much faster -- Critically, the downsampler still requires this list of list. We convert back and forth between these two representations as required, which is very rarely for normal data sets (WGS NA12878 on chr20 is 0.2%, 4x WGS is even less).	2013-01-14 16:30:16 -05:00
Mark DePristo	5a5422e4f8	Refactor PerSampleReadStates into a separate class -- No longer update the total counts in each per-sample state manager, but instead return delta counts that are updated by the overall ReadStateManager -- One step on the way to improving the underlying representation of the data in PerSampleReadStateManager -- Make LocusIteratorByState final	2013-01-14 16:30:16 -05:00
Mark DePristo	5c2799554a	Refactor updateReadStates into PerSampleReadStateManager, add tracking of downsampling rate	2013-01-14 16:30:16 -05:00
Mark DePristo	a4334a67e0	SamplePartitioner optimizations and bugfixes -- Use a linked hash map instead of a hash map since we want to iterate through the map fairly often -- Ensure that we call doneSubmittingReads before getting reads for samples. This function call fell out before and since it wasn't enforced I only noticed the problem while writing comments -- Don't make unnecessary calls to contains for map. Just use get() and check that the result is null -- Use a LinkedList in PassThroughDownsampler, since this is faster for add() than the existing ArrayList, and we weren't using random access to any resulting	2013-01-14 16:30:16 -05:00
Mark DePristo	19288b007d	LIBS bugfix: kept reads now only (correctly) includes reads that at least passed the reservoir -- Added unit tests to ensure this behavior is correct	2013-01-14 16:30:16 -05:00
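
Commit 392b5cbcdf above describes reference reading that converts IUPAC ambiguity codes to N, errors out on other non-standard bases, and can be switched off for backwards compatibility. The following is a minimal Python sketch of that behaviour, not the GATK code itself: the function name normalize_reference, the convert_iupac flag, and the exact set of accepted codes are assumptions made for illustration.

```python
# Illustrative sketch only; names and the accepted-code set are assumptions,
# not taken from CachingIndexedFastaSequenceFile.
STANDARD_BASES = frozenset("ACGTN")
# IUPAC ambiguity codes beyond the standard bases (assumed set).
IUPAC_AMBIGUITY_CODES = frozenset("RYSWKMBDHV")


def normalize_reference(bases: str, convert_iupac: bool = True) -> str:
    """Return bases with IUPAC ambiguity codes replaced by 'N'.

    Standard bases (A, C, G, T, N, either case) pass through unchanged.
    Any other character raises ValueError. With convert_iupac=False the
    input is returned untouched, mimicking the opt-out in the commit.
    """
    if not convert_iupac:
        return bases
    out = []
    for offset, base in enumerate(bases):
        upper = base.upper()
        if upper in STANDARD_BASES:
            out.append(base)
        elif upper in IUPAC_AMBIGUITY_CODES:
            out.append("N")
        else:
            raise ValueError(
                f"illegal reference base {base!r} at offset {offset}"
            )
    return "".join(out)
```

With this sketch, a caller would only ever see the standard bases plus N, and a stray character such as the line feed mentioned in commit d18dbcbac1 would be rejected instead of being passed along as a base.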


11632 Commits (e939e0d9b3ddd224728209d38e32a7768dc886d3)
