6079 Commits (5786b51bcfd992afad8e71c8c85514ae030220c8)

Author SHA1 Message Date
Eric Banks 5786b51bcf Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-30 15:26:28 -04:00
Eric Banks 761347b8d5 The VariantContext utility method used by SelectVariants wasn't checking the filter status (unfiltered vs. passing filters) and always returned a VC that was passing filters. This is fixed and the md5 from the VCF Streaming test has been re-updated. 2011-06-30 15:26:09 -04:00
Mauricio Carneiro 64048a67e8 Cleaning up ghost Scala scripts. Deleting clearly unused ones and moving others to qscripts.archive 2011-06-30 15:20:43 -04:00
Mauricio Carneiro 867056af51 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-30 15:03:18 -04:00
Mark A. DePristo defa3cfe85 Moved around private walkers into appropriate directories in private gatk.walkers. Moved a few public walkers into private qc package, and some private qc walkers into the public directory. Removed several obviously broken and/or unused walkers. 2011-06-30 14:59:58 -04:00
Mauricio Carneiro 75f93882c7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-30 14:59:10 -04:00
Mauricio Carneiro 2cb1376ed0 VCFStreaming was failing integration tests because SelectVariants now outputs the samples in alphabetical order, instead of in random order as before. Fixed the MD5. 2011-06-30 14:55:39 -04:00
Ryan Poplin f4ae6edb92 Moving some of the released R scripts into public from private 2011-06-30 14:55:25 -04:00
Mauricio Carneiro 197b7141c1 Added an optional argument -bt <num_threads> for BWA to run multithreaded. 2011-06-30 14:41:57 -04:00
Mauricio Carneiro f4463d38ca BWA requires paired-end reads to be sorted by read name when operating over BAM files, but Picard sorts by coordinate, so when we use BWA on paired-end reads, the pipeline now re-sorts the BAM in read name order, realigns it, then sorts it in coordinate order. 2011-06-30 14:29:21 -04:00
Mauricio Carneiro bb6b1d615b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-30 13:50:16 -04:00
Eric Banks 804d5f22d5 Reverting previous change, as promised. 2011-06-30 13:18:30 -04:00
Eric Banks 9e234cf5d6 This is a temporary commit for Picard. It will absolutely break integration tests, but I'm going to revert it in 1 minute. Because we don't want them in unstable, I need to push this into stable. 2011-06-30 13:17:14 -04:00
Eric Banks 352c38fc0b Updated to reflect dbsnp conversion fix 2011-06-30 11:55:56 -04:00
Eric Banks f51478595b I guess I don't want to lose these 2011-06-30 11:43:46 -04:00
Mauricio Carneiro efd99c3c11 new home for the core qscripts 2011-06-30 11:32:06 -04:00
Mauricio Carneiro a26a793532 moving the oneoffs queue scripts to their new home. 2011-06-30 11:29:33 -04:00
Mauricio Carneiro 36620c62be Saving the walkers I 'care about'. 2011-06-30 00:20:26 -04:00
Mauricio Carneiro a3c17a38bb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 18:26:11 -04:00
Mauricio Carneiro 0d1978756e Removing the replication/validation files and package from the stable repo. It should only exist in private/unstable. 2011-06-29 18:25:47 -04:00
David Roazen f18fffd625 Fixing broken paths to the testdata directory throughout the codebase. 2011-06-29 17:36:47 -04:00
Mauricio Carneiro 2ad4d2f0bb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 16:06:14 -04:00
Mauricio Carneiro 1085df8b7b Making the BQSR pipeline publicly available and supported.
This is for all the PacBio validation that is going on right now in the cancer group. They are all using this script, and I'm happy to support it.
2011-06-29 16:05:32 -04:00
Mauricio Carneiro ec9d8313ee Giving the BQSR pipeline script a more appropriate name since more people are using it now. 2011-06-29 16:03:26 -04:00
Eric Banks 70ba851478 Might as well check for the illegal state and throw an exception 2011-06-29 15:59:10 -04:00
Eric Banks 1f19afe1d9 Fixed bug in the IndelRealigner: now that variants are correctly typed in VariantContext, it is possible that a variant can be an indel but neither an insertion nor a deletion; added an isComplexIndel() method and now we check for such an event in the realigner (we don't use them to generate alternate consensuses). Also, added an isMNP() method while I was there so that it would be consistent with other variant types. 2011-06-29 15:54:09 -04:00
Eric Banks f018c27050 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 14:56:22 -04:00
Eric Banks 33c67a139c Wrong package; this should have been moved when VC got moved in from Tribble 2011-06-29 14:56:02 -04:00
Matt Hanna 9a22d78e48 Fix up gsalib target to work with new public/private directory structure. 2011-06-29 14:48:10 -04:00
Guillermo del Angel dee10140dd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 13:58:04 -04:00
Eric Banks 8586c86bc4 My commit from last week to fix the old dbsnp rod conversion only worked for locus traversals. Updated now to work for all traversals. 2011-06-29 13:56:37 -04:00
Guillermo del Angel f736a1d61b Updated md5's from previous checkin 2011-06-29 13:37:15 -04:00
Guillermo del Angel 5b6d279a2e Two bug fixes:
a) Modified the way clipped bases are dealt with in ReadPosRankSumTest when annotating indels. Cigar string cannot be trusted because BWA can clip good high quality bases and some sites get incorrect ReadPos annotations if BWA systematically clips at an indel breakpoint.
b) PL header needs to specify "." as length. Otherwise we fail VCF validation if multiallelic sites are present.
2011-06-29 10:21:27 -04:00
David Roazen 643458d7db Updated the tribble jar -- this should fix most of the integration test
failures we've been seeing.

Note that with tribble's new svn repository the revision numbers have reset,
hence "revision 3"
2011-06-29 01:11:03 -04:00
David Roazen 0a96f53772 One last test... 2011-06-28 19:18:17 -04:00
David Roazen efff6060b7 Revert "Final test commits before going live."
This reverts commit ba1443e54b829dd97af636415ed0d5cdaf256ad3.
2011-06-28 18:58:55 -04:00
David Roazen 755b80dc74 Final test commits before going live. 2011-06-28 18:56:00 -04:00
David Roazen 7e243fae6e Bug fixes for build.xml related to the public/private restructuring. 2011-06-28 12:55:18 -04:00
David Roazen 139c6b84a1 Modified build.xml and the help extractor doclet to use the output of "git
describe" as an absolute version number (if the repository has at least one
tag), using the raw SHA-1 hash value as a fallback version number in the case
where there are no tags.
2011-06-28 08:37:05 -04:00
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00
carneiro b46279d62e required RODs are now checked by annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6080 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-25 06:38:19 +00:00
ebanks 3879b02cdd updating a package
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6079 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:13:28 +00:00
ebanks 86aa82caf8 Missed this integration test during my move of VC from Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6078 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:07:25 +00:00
ebanks c2ec2891d1 Other people besides Mark also wanted VariantContext moved to the GATK, so I listened. I am moving VariantContext and all codecs that rely on it (VCF, SoapSNP, HapMap, and CGvar) to the GATK - including relevant unit tests and data files. Additionally, Matt has modified build.xml to generate the necessary jar files so that people can use our VCF codec with Tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6077 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 16:56:04 +00:00
carneiro be123d1399 missed a check for null on sampleNames. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6076 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 22:42:00 +00:00
carneiro 9c1b8ea796 Updated BQSR script to be more general and work with the new PacBio BAM files - for Kristian Cibulskis
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6075 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 21:05:28 +00:00
carneiro 087a25d9e3 quick memory upgrade to BWA classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6074 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:53:32 +00:00
carneiro fbe157137f removing the old processing pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6073 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:19:13 +00:00
carneiro 91fb664135 Many updates to SelectVariants:
1) There is now a different parameter for sample name (-sn), sample file (-sf) or sample expression (-se). The unexpected behavior of the previous implementation was way too tricky to leave unchecked. (if you had a file or directory named after a sample name, SV wouldn't work)

1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1)

2) Discordance and Concordance now work in combination with all other parameters.

3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation. 

4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information.

5) When samples are provided (-sn, -se or -sf), discordance and concordance will go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, a concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has different genotype information for that sample.

6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagrees.

---

Integration tests:

1) Discordance and concordance test added
2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples.

---

Methods for handling sample expressions and files with lists of samples were added to SampleUtils. I recommend *NOT USING* the old getSamplesFromCommandLineInput, as its mixing of sample names with expressions and files creates a rogue error that can be challenging to catch.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:18:45 +00:00
droazen 48055d45cb Added support for PICARD functions to QUEUE after following Khalid's pointers on where to do it. I have added the 6 functions used by the Data Processing Pipeline, but from now on it should be a matter of seconds to copy/paste and create bindings to more functions.
Updated the Data Processing Pipeline to use the new Picard classes and reorganized the pre-processing of the pipeline accordingly.

Will only update the wiki once this change goes live.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6071 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:14 +00:00