rpoplin
095fc1922a
By popular demand I'm adding the qscript I used to do the 660 bamfile 1000G calling for ASHG. It does cleaning, BAQing, and merging in 3mb chunks genome-wide then calls SNPs on those temporary bams.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4866 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 18:49:03 +00:00
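Note on the 3mb chunking mentioned above: a minimal sketch of how a contig could be split into fixed-size chunks. This is an illustration only, not the committed qscript; the function name, the 1-based inclusive coordinates, and the exact chunk size of 3,000,000 bases are assumptions.

```python
def chunk_intervals(contig, contig_length, chunk_size=3_000_000):
    """Split a contig into consecutive 1-based, inclusive (contig, start, end) chunks.

    Hypothetical helper: the name and coordinate convention are assumptions.
    """
    chunks = []
    start = 1
    while start <= contig_length:
        # The last chunk is truncated at the end of the contig.
        end = min(start + chunk_size - 1, contig_length)
        chunks.append((contig, start, end))
        start = end + 1
    return chunks
```

Each chunk can then be cleaned, merged, and called independently, with the per-chunk results combined afterwards.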
depristo
4a54f3f230
ThreadLocal version of CachingIndexedFastaSequenceFile. More efficient support for shared memory BAQ calculations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4865 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 15:44:48 +00:00
depristo
32d5397c01
Experimental support for sided annotations. Currently not more/less valuable than two-tailed testing. Future experiments are needed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4864 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 15:08:31 +00:00
handsake
21dc05138a
Bug fixes for the bwa aligner and changes to support compiling against newer releases of the bwa code base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4863 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 14:49:15 +00:00
chartl
0d18bd1011
Now that addAll() is in the superclass, no longer need this definition (which, without override, prevents the script from compiling anyway)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4862 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 05:36:31 +00:00
chartl
fd1d817d45
Cryptic implementation of base-string entropy. I suspect this scales ~linearly with length, so I may choose to normalize in the future.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4861 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 22:25:05 +00:00
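Note on base-string entropy: the commit does not show its formula, so the following is only a sketch under the assumption that it is a Shannon entropy over base composition. Both function names are hypothetical. The unnormalized total grows linearly with length for a fixed composition, which is one way the scaling mentioned above could arise; dividing by length gives a per-base value bounded by 2 bits for a four-letter alphabet.

```python
import math
from collections import Counter


def per_base_entropy(bases):
    """Shannon entropy (bits per base) of the base composition of a string."""
    if not bases:
        return 0.0
    n = len(bases)
    counts = Counter(bases.upper())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_entropy(bases):
    """Unnormalized entropy: per-base entropy multiplied by string length."""
    return len(bases) * per_base_entropy(bases)
```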
kshakir
3a6d1dbcef
Fixed a class initializer crash on shutdown when the graph has nothing to run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4860 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:56:55 +00:00
kshakir
8101f8f301
Simplified the Queue package definition so that BCEL doesn't miss any dependent queue classes or those loaded via reflection.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4859 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:17:48 +00:00
chartl
2bd2667516
Another privately-owned class to add before re-checking out repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4858 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:14:51 +00:00
chartl
08710fc71e
A successor to the Design File Generator and GCContent walkers; allows for refseq/other metadata annotation of intervals, and calculates reference GC content and entropy of the interval. Compiles, but as yet untested and incomplete (but my repositories are kinda messed up, so I'm committing this to blow 'em away and re-checkout).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4857 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:14:03 +00:00
chartl
e406eb0f95
Adding a useful accessor method to TableFeature
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4856 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:11:51 +00:00
fromer
6310a524d9
Do not abort integration (over het-het distances) on errors, but warn about them
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4855 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 17:20:25 +00:00
ebanks
8ab4704b4c
Adding a command-line argument to allow missing values to evaluate as false instead of true
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4854 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 05:18:12 +00:00
ebanks
9f3e56e487
VariantAnnotator shouldn't die when multiple records occur at the same position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4853 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 04:05:47 +00:00
fromer
b1f0df0047
Handle case where read lengths are longer than fragment size
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4852 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 02:19:55 +00:00
fromer
a5e1854b3a
Forgot to pass correct parameters to calcPhasingProbsForWindowDistances()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4851 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 01:58:06 +00:00
fromer
2b0dc8625c
Updated RBP theoretical model as per Mark's insights regarding the correct understanding of insert sizes being calculated post-hoc from the distance between read lengths. The correct way to think of it is: 1) There's a fragment of length F. 2) Each of its two ends is read for L bases. 3) The insert size = i = F - 2 * L, after the fragment's assumed identity is determined by mapping the read mates to a reference sequence. Therefore, the external user-defined distribution is on the FRAGMENT SIZES.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4850 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 01:45:20 +00:00
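Note on the insert-size relation i = F - 2 * L described above: a one-line sketch of the arithmetic. The function name is hypothetical, and how the RBP model itself treats overlapping mates is not shown here.

```python
def insert_size(fragment_length, read_length):
    """Insert size i = F - 2 * L for a fragment of length F read for L bases from each end.

    The result is negative when the two reads overlap (2 * L > F).
    """
    return fragment_length - 2 * read_length
```

For example, a 300-base fragment read for 100 bases from each end leaves an insert of 100 bases, while a 150-base fragment with the same read length gives -50, meaning the mates overlap by 50 bases.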
hanna
acfe83920b
'-L unmapped': adding integration tests for explicitly including (-L unmapped) unmapped reads and explicitly excluding (-XL unmapped) unmapped reads, augmenting the suite of unit tests already put in place.
...
'-L unmapped' seems safe to use; go for it, but please validate results against
samtools flagstat when the process finishes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4849 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 23:11:46 +00:00
fromer
aea481ae01
Trivial bug fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4848 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 18:29:15 +00:00
ebanks
dabdeb729e
Eric broke the build. Eric broke the build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4847 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 17:01:38 +00:00
ebanks
5c0b66cb7c
3 big changes that all kill the integration tests: 1. Don't cap the PLs by 255 anymore. 2. Move over to the 3state model as the only available base model for UG (no more base transition tables). 3. New QD implementation when GLs/PLs are available.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4846 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 16:24:28 +00:00
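Note on the PL cap: a sketch of what capping means, assuming the common VCF convention that PLs are phred-scaled genotype likelihoods normalized so the most likely genotype has PL 0. The function name and the `cap` parameter are hypothetical; this is not the UG code.

```python
def gls_to_pls(log10_likelihoods, cap=None):
    """Convert log10 genotype likelihoods to normalized phred-scaled PLs.

    cap=255 mimics a capped encoding; cap=None leaves values uncapped.
    """
    best = max(log10_likelihoods)
    # Phred scale relative to the best genotype, so the best one gets PL 0.
    pls = [int(round(10 * (best - gl))) for gl in log10_likelihoods]
    if cap is not None:
        pls = [min(pl, cap) for pl in pls]
    return pls
```

With log10 likelihoods of 0, -3 and -40, the uncapped PLs are 0, 30 and 400, while a cap of 255 would report the last value as 255 and lose the distinction between very unlikely genotypes.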
depristo
46cd227613
Stability improvements to GATK run report system. R code is now robust. XML parser uses the C backend in python, 10x faster. Added shell script that runs the daily reports, and linked the /humgen/ runme.csh to this script. Script now emails the daily PDFs to the group (gsamembers)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4845 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 14:56:12 +00:00
chartl
5a27d231fa
Rename it so that nobody else falls into the trap laid out (the test is VariantToTable, the walker is Variant[s]ToTable)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4844 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 11:43:00 +00:00
chartl
5e27e9162f
Huh? I thought we parsed out comma-separated command line arguments into list automatically...just change the syntax of the integration test, no need to update the md5
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4843 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 11:40:27 +00:00
chartl
3e75431bc8
Thanks to Mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again). This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 03:33:36 +00:00
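Note on repeated -F arguments: the idea of letting the command-line layer collect a list, rather than splitting a joined string in the tool, can be sketched with Python's argparse. This is an analogy only; it is not the GATK argument system, and `parse_fields` is a hypothetical name.

```python
import argparse


def parse_fields(argv):
    """Collect every -F occurrence into a list, in the order given."""
    parser = argparse.ArgumentParser()
    # action="append" makes the parser build the list, so callers never
    # have to join fields into one string and split them again.
    parser.add_argument("-F", dest="fields", action="append", default=[])
    return parser.parse_args(argv).fields
```

Calling `parse_fields(["-F", "CHROM", "-F", "POS"])` yields the list `["CHROM", "POS"]` directly.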
chartl
2217837845
Commit for Khalid -- should be a scala version of vcf2table but for some reason the run method isn't getting called.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4841 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 00:44:15 +00:00
kshakir
01323447c6
Removed LibBat.SUB2_BSUB_BLOCK since the use of it exits the JVM.
...
Fixed integration tests to wait on their own for the job to run instead of using SUB2_BSUB_BLOCK.
Updated VariantRecalibrationIntegrationTests MD5s which were knocked out of sync while SUB2_BSUB_BLOCK was exiting in the middle of integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4840 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 19:57:20 +00:00
hanna
67c07d1a6a
Fixed recently introduced multiplexer issue where DoC couldn't be written directly to command-line.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4839 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 19:35:15 +00:00
fromer
c167c6f9eb
Calculate the phasing probabilities for particular intra-het distances
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4838 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:44:59 +00:00
hanna
526ae92093
Getting back to '-L unmapped':
...
- basic unit tests for interval sorting and merging with mix of mapped/unmapped.
- validation to ensure that locus walkers (really all non-read walkers) blow up with a user error when -L unmapped is specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4837 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:24:18 +00:00
fromer
4dbdf7a13d
Added ability to sample from intra-het distance distribution
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4836 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:09:03 +00:00
ebanks
afd4655674
Use @Output instead of @Argument. As a side note, Chris I'm ready for this nightmare to go away...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4835 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 17:13:15 +00:00
ebanks
cf7d932a17
Fix for f***ed up BWA alignments that adhere to SAM specs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4834 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 17:12:25 +00:00
kshakir
d550fdfd60
Disabling integration test to see if this restores the full test suite.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4833 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 15:27:02 +00:00
fromer
4403b9d276
Added probability bound on phasing paths, which slightly speeds up calculations. It seems that a real speed-up can only be achieved by considering fewer paths by doing some form of caching of sub-problems (e.g., dynamic programming or matrix multiplication, as Mark suggested)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4832 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 23:53:56 +00:00
chartl
f36861eeee
One more little bugfix -- the issue was not the grep command, but instead the NFS in the awk; I changed it to ++count in the last commit which was really responsible for the fix. Then this ultra-escaping semi-broke the grep again.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4831 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 20:36:14 +00:00
delangel
a5008faca8
Bug fix: when getting variant contexts at a site, we need to get only variants that start at current location, otherwise we get duplicated records when filtering indels.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4830 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 19:23:10 +00:00
chartl
d34c5640d2
Bugfix for clf version of extract samples. Due to dynamic shell creation and bsubs and whatnot, the OR pipe for grep ("a|b") needs to be super-escaped ("a\\\\\\\\|b").
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4829 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 19:06:30 +00:00
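Note on the super-escaping: a toy model of why backslashes multiply, assuming each intermediate layer (dynamic shell creation, bsub, and so on) collapses every pair of backslashes into one. The number of layers and their exact quoting rules are assumptions; real shells have more involved rules, and both function names are hypothetical.

```python
def strip_one_layer(text):
    """Model one unescaping layer: each pair of backslashes becomes one."""
    return text.replace("\\\\", "\\")


def after_layers(text, layers):
    """Apply the toy unescaping model the given number of times."""
    for _ in range(layers):
        text = strip_one_layer(text)
    return text


# Eight backslashes before the pipe, as in the commit message.
escaped = "a" + "\\" * 8 + "|b"
```

Under this model eight backslashes survive three layers as a single one, so grep would finally receive a pattern with one backslash before the pipe.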
delangel
17db2e0e24
(forgot I hadn't committed this) - refactored IndelStatistics module and added a new inner class to compute Indel classification along with other statistics. So, we now get an extra table specifying, per sample, counts of whether indels are:
...
- Repeat Expansions
- Novel sequence
And for indels of size <=2 we get a per-mononuc. or dinuc. breakdown of novels and expansions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4828 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 17:43:43 +00:00
chartl
f795b25c47
In-process versions of sample extraction and interval-list conversion for VCF files. Required an in-process-function branch of the queue library.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4827 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 17:36:53 +00:00
depristo
e219f6a4b5
Q script to run VQSR on a whole variety of common data sets. To be used as a basis for general methods development pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4826 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 16:55:52 +00:00
depristo
a6397ed8c3
Default R script now plots sensitivity/specificity curve
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4825 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 16:55:11 +00:00
chartl
7bc2049031
Updates and bug fixes to private mutations qscript and pipeline libraries. Hand filter strings are now not busted (boo to having to escape quotes); convenience method added to VariantCalling to propagate standard trait data to a given GATK command line -- should be made more scala-esque in the future.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4824 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 04:55:13 +00:00
chartl
cf75caf653
java changes:
...
VariantEvalWalker's logger is made public, so that variant eval modules can access it through the parent object.
DesignFileGenerator comment lists how best to bind things to it, and the feature accessor is better refined to grab the genome loc. (old change)
scala changes:
convenience addAll( List[CommandLineFunction] ) added to QScript class (and thus removed from the fCPV2)
useful command line functions added to a new library package for command line functions (these are fast simple VCF command lines)
bug fixed in ProjectManagement for the case where there's only one batch to be batch-merged (not really part of the use-case, but an edge-condition that came up during pipeline testing)
first draft of a private mutations pipeline which will be elaborated in future
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4823 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-12 05:10:45 +00:00
depristo
abd6ce1c77
A TiTv-free approach for cutting variants! Apparently much better than previous approach, and will work for indels and SVs with truly minor modifications to the code. Will discuss with methods group on Monday.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4822 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-11 23:08:13 +00:00
chartl
81290d238d
Restructuring my qscripts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4821 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-11 20:58:45 +00:00
depristo
974aaa134d
Trivial fix to broken build
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4820 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-11 13:56:03 +00:00
kshakir
895cb39f41
Thanks to Platform Computing tech support, found the magical environment variable BSUB_QUIET.
...
Minor refactoring to add more of the CLibrary including setenv().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4819 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 21:27:12 +00:00
depristo
5b46a900b3
Final version of BAQ calculation. default gap open is 1e-4, a good sensitive value. Useful timer class SimpleTimer added. BAQ is now live.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4818 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 19:35:12 +00:00
ebanks
491a599b59
Minor optimization
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4817 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 18:56:35 +00:00