depristo
2bdb011865
Improvements to 1KG processing pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3807 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:33:47 +00:00
asivache
1dd8a28a5d
Added new query: isMNP(feature); returns true if the dbSNP feature is a multi-nucleotide polymorphism (e.g. a di-nuc TA -> CC)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3806 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:32:10 +00:00
corin
74f705d943
fixed silly syntax errors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3805 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:08:46 +00:00
corin
917469ef43
This script produces information for a firehose job-finished email
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3804 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:44:52 +00:00
aaron
ec94cfdf05
remove unit test for VCF writer, it's not applicable now that we produce only VCF4. Guillermo, it's up to you if you want to adapt this or remove it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3803 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:33:25 +00:00
depristo
b29eda83bb
Parallelized CountCovariates! percent_ref_called_var is now a standard genotype concordance module (for validation!). Much smarter merging of headers for combineVariants. VCF codecs now actually look at the file version and blow up if it is the wrong version. Added setHeaderVersion() in VCFHeaderLine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3802 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:10:18 +00:00
ebanks
f293eb7de1
Fix for Kim: for some ungodly reason, I was initializing the bins that were maintaining counts to 1 instead of 0.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3801 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 03:40:29 +00:00
ebanks
e7e58d7129
The SAM spec has now officially reserved my new tags for original cigar and original alignment start... except that OS has been named OP ('original POS')
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3800 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 00:09:36 +00:00
depristo
81eef0d993
DOT visualization with Queue. More sophisticated recalibration queue script with scatter/gather
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3799 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 22:32:48 +00:00
ebanks
ab84ed8c68
Fix for Mark: get rid of old program tags whose IDs clash with the recalibrator/realigner tag (including if the id has a .1 at the end, etc.). Keeping them around is dangerous because we don't know which one refers to the latest run of the tool on the bam.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3798 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 19:13:50 +00:00
depristo
6bf5df4eb5
Better merge command
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3797 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 17:02:47 +00:00
hanna
dfddf8fd75
- Bring the PaperGenotyper up to code.
- Remove some old debugging cruft regarding handling of threaded engine exceptions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3796 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 22:31:21 +00:00
bthomas
f65cba6b9a
Adding support for shared file locking via a new class for file locking, FSLockWithShared. This will eventually take over for FSLock, the current file locking class - I'll work with Aaron to merge the tribble code that uses FSLock right now.
FYI: creating an exclusive lock on a file that does not exist will create that file as an empty file, and will NOT delete that file after the program terminates. So watch out if it's possible that the file you're locking does not exist - could end up leaving extra files that confuse users.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3795 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 20:45:51 +00:00
hanna
a8caa20378
Previously the hierarchical microscheduler defensively coded around and reported exceptions of
the walker itself, but didn't do a great job of catching framework exceptions. This became extremely
unfortunate in the case where walkers caused exceptions that manifested themselves in the framework,
such as when the walker opens more files than file handles are available.
Reworked the exception handling so that framework errors are treated like walker errors and the resulting
exception bubbles out of the walker. Stack traces for threaded walkers are still convoluted and nasty.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3794 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 20:34:43 +00:00
ebanks
bf384f48e1
Reverting previous change because it won't always work. More investigation needed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3793 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:13:17 +00:00
ebanks
e4bfb06888
Check header type instead of rod type, since rod type will now be VC and not VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3792 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:10:09 +00:00
ebanks
0226412b11
Add GQ to list of genotype attributes for reg exp
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3791 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:01:11 +00:00
ebanks
78a4d8ec3d
Removing more references to VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3790 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 16:34:15 +00:00
ebanks
af23762778
Removing more references to VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3789 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:54:23 +00:00
ebanks
a4f8d70d8d
oops, forgot to update this integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3788 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:38:33 +00:00
ebanks
460283f6d2
No more manually converting VariantContexts to VCFRecords. You should be utilizing VCs and not VCFRecords.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3787 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 05:21:28 +00:00
ebanks
6b5c88d4d6
The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 04:56:58 +00:00
depristo
530a320f28
Intermediate commit of scatter/gather recalibration pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3785 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 22:46:08 +00:00
chartl
19a5830186
Restore "type" annotation (but not genomechange or cDNA change, which are already encoded in the VCF)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3784 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 17:33:15 +00:00
chartl
9d2a485532
Update to AminoAcidTransition eval module
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3783 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 17:12:03 +00:00
chartl
9cc1a411b2
Altering the formatting of the annotation to work better with VariantEval's AminoAcidTransition
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3782 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 16:31:14 +00:00
rpoplin
3db7fbb5e9
Fix for added EOF in csv file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3781 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 16:09:48 +00:00
ebanks
9a05e8143d
Move to 4.0 and away from VCFRecord.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3780 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:54:54 +00:00
ebanks
6442dabf94
Deleting/archiving as instructed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3779 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:23:50 +00:00
ebanks
7e7da75d27
Moving over to 4.0 and away from VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3778 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 14:07:10 +00:00
ebanks
d896d03554
Moving VF to vcf 4.0. Still need to fix genotype filters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3777 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 11:39:51 +00:00
ebanks
76b3b39720
Technically, Mark broke this with his commit earlier. But since I had an outstanding broken test, I lose and have to fix this one too...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3776 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 03:58:38 +00:00
ebanks
1bef7dd170
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3775 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 00:56:12 +00:00
depristo
de969f7cc7
logger != null check
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3774 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 23:07:14 +00:00
depristo
2e445262f2
Promotion to . for variable numbers of arguments
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3773 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 22:53:53 +00:00
delangel
297f15a60c
Protect ProduceBeagleInputWalker against evil users who feed it VCFs with indels, non-variant sites, or other interesting markers: write to Beagle input only at biallelic SNP sites, since that's the only thing Beagle can handle.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3772 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:54:42 +00:00
chartl
80a5ddfa2f
Memory string added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3771 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:43:19 +00:00
ebanks
52c534a8f2
Updating to VCF 4.0
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3770 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:18:30 +00:00
delangel
5992b79159
a) Simplify normalization code in ProduceBeagleInputWalker so that it always normalizes, using MathUtils.normalizeFromLog10 to do so.
b) Several improvements to BeagleOutputToVCFWalker:
1. If a HapMap input track is provided (e.g. -B comp,VCF,file), HapMap sites will be annotated with the HapMap allele count and allele frequency (keys ACH, AFH).
2. If the probability of a correct genotype is lower than ncthr (optional argument provided by the user, default = 0.0), the walker will keep the original calls instead of using Beagle calls.
3. Instead of annotating just whether Beagle had modified a site, annotate HOW MANY genotypes in a site were actually changed by Beagle.
All three improvements are mostly for debugging and analysis only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3769 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 19:54:58 +00:00
chartl
46c39f2d53
Quick Python scripts for going from genotype VCFs to site-only VCFs, and one to fix BC VCF files (which had "het" genotypes at non-variant sites)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3768 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 19:13:32 +00:00
ebanks
e50627a49e
1. Updated tests and added integration test for liftover code.
2. Updated liftover code (and scripts) to emit vcf 4.0 and no longer depend on VCFRecord.
3. Beagle walker now also emits vcf 4.0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3767 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:58:18 +00:00
ebanks
2a7112302a
More archiving
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3766 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:04:41 +00:00
ebanks
221e01fb27
deleting/archiving as instructed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3765 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:59:45 +00:00
ebanks
8086ab1f75
Pulled sample/header merging routines out of CombineVariants and into util classes. Added more generalized methods for retrieving samples. Updated the Beagle walkers to use these methods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3764 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:51:54 +00:00
ebanks
0c4a32843c
No longer uses VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3763 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 13:57:39 +00:00
ebanks
f130d29318
No longer uses VCFRecord.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3762 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 13:34:10 +00:00
ebanks
e75b3e13bd
updating unit test for previous fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3761 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 03:23:53 +00:00
ebanks
0427f3554b
Bug fix: valid fields were being stripped off the FORMAT for samples because String.matches was used instead of String.equals. Also, please use VCFConstants from now on instead of hard-coding values (e.g. missing values) into the code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3760 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 03:06:51 +00:00
ebanks
fb717fe128
First pass needed to remove old VCF code: moving all VCF-related constants into a single unified class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3759 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 07:19:16 +00:00
ebanks
6b960bd9c5
Fix for Steve: genotype filters still want to see the values from the VC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3758 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 04:30:15 +00:00