Commit Graph

146 Commits (7b452ea2b9ad2d2f3e8bbfcbfb0e818f0a87f18d)

Author SHA1 Message Date
depristo aaecc271e8 Misc changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5324 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-26 15:35:49 +00:00
depristo 356eb264ab Now says FNR, not FDR. We really need to clean up VQSR
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5243 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 12:28:09 +00:00
corin 9fc45e1234 Use the yaml as an argument to get out squid numbers
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5186 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 18:32:05 +00:00
corin 027c91871f Commenting out something I meant to leave commented out
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5163 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 15:47:52 +00:00
depristo 393df46055 updates to handle only reporting on a specific SVN revision. Updated the R script to show the domain name of the runner, now that S3 logging is working
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5157 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-01 12:02:12 +00:00
corin 1753f9d864 Properly handles multiplexed and non-multiplexed lane data from the picard database.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5146 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-31 21:14:32 +00:00
kshakir 23578b7402 Pipeline tests will only start from scratch after "ant clean", making it faster to debug downstream issues when re-running "ant pipelinetest -Dpipeline.run=run".
Updated the FCP, the test, and the ADPR to handle an issue with the ADPR locating the yaml generated by the FCPTest.
Does not solve the ADPR error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5126 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-29 19:44:03 +00:00
corin afeb0f63c3 Further, smarter modifications to R script for correctly accessing database data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5124 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-29 03:32:20 +00:00
corin 4f2882c546 Fixing the way that lanes are pulled in from the database so that multiplexed lanes and older sequence data ids are properly handled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5123 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-29 02:40:10 +00:00
corin 1d8412d652 Updating a few graphical parameters and making sure everything fits together
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5111 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 18:44:23 +00:00
corin 88ea60b864 Updates formatting and combines plots into the tearsheet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5109 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 17:45:23 +00:00
corin 73e2942c62 Reformatted backdrop--removed the date
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5086 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-26 18:25:59 +00:00
corin b22f82d5dd Minor formatting updates to deal with long bait names, multiple sequencer types, and date formatting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5072 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 19:02:40 +00:00
corin 1dcdebbc9e Updating the file path for proper inclusion of the background in the tearsheet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5056 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 19:15:33 +00:00
delangel a50d7f74fa Change to support plotting of indel quality as a function of covariates - for now, just call different R calling script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5053 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-22 14:09:23 +00:00
corin 22582960be Adding the backdrop to the current version of the tearsheet so it is always available
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5026 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:56:11 +00:00
corin dfcd45181a Some minor tweaks to the tearsheet generator that incorporate the gsalib more universally and create a more accurate output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5021 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-19 20:27:02 +00:00
corin 6b5474a00a This updates the script to produce a more tearsheet-like output for sample set statistics. Formatting will be updated for aesthetic improvements. There are also several database options that currently pull out misleading information because of changes in sequencing methodology that will be updated to show correct information. Eventually, plot formatting will be updated as well and additional informative plots will be added.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4988 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 15:59:06 +00:00
depristo 91824f478e FASTQ directory is gone
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4986 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 15:16:06 +00:00
depristo 3362f0c280 Private mutation simulator and analysis routines for EOMI paper
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4960 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 21:23:29 +00:00
kiran 3b76034d50 Namespace changes to avoid conflicts with other packages.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4950 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:33:26 +00:00
kiran ab143c82af Selects only the project requested via the Oracle command, rather than selecting everything and then subsetting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4949 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:32:32 +00:00
depristo 5583fa179b Now uses PNGs and a very high downsampling value to more clearly display the information
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4928 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 13:57:51 +00:00
depristo dba30c4118 Subsampling of points, for the case where we have enormous numbers of points
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4927 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 13:48:11 +00:00
depristo 54adbd2581 Added instantaneous rate plotting routine to performance log plotter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4926 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 13:33:35 +00:00
depristo 5539c2d9f3 --performanceLog (-PF) X.dat argument now enabled. Writes out a table (R-friendly) of the performance of the GATK over time, exactly as a more detailed version of the INFO progress meter. R script for useful plotting of the performance of the GATK over time. Will be helpful for upcoming scalability testing and debugging of memory leaks and other incremental performance problems
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4921 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:34:21 +00:00
depristo 586d3f05d9 Better cumhist function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4920 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:32:20 +00:00
depristo b024b85798 misc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4899 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 20:16:41 +00:00
fromer cc909602c7 Vectorize() pDirectlyPhaseHetPairAtDistanceUsingDepth; deal with minor precision issues
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4890 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-21 17:18:26 +00:00
fromer a829284d84 Return analyzed window sizes as well
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4884 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-20 17:00:17 +00:00
fromer 6310a524d9 Do not abort integration (over het-het distances) on errors, but warn about them
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4855 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 17:20:25 +00:00
fromer b1f0df0047 Handle case where read lengths are longer than fragment size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4852 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 02:19:55 +00:00
fromer a5e1854b3a Forgot to pass correct parameters to calcPhasingProbsForWindowDistances()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4851 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 01:58:06 +00:00
fromer 2b0dc8625c Updated RBP theoretical model as per Mark's insights regarding the correct understanding of insert sizes being calculated post-hoc from the distance between read lengths. The correct way to think of it is: 1) There's a fragment of length F. 2) Each of its two ends is read for L bases. 3) The insert size = i = F - 2 * L, after the fragment's assumed identity is determined by mapping the read mates to a reference sequence. Therefore, the external user-defined distribution is on the FRAGMENT SIZES
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4850 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 01:45:20 +00:00
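The relation stated in the commit message above (insert size i = F - 2 * L) can be sketched as follows. This is only an illustration in Python, not the code from the commit; the function and parameter names are hypothetical.

```python
def insert_size(fragment_length: int, read_length: int) -> int:
    """Insert size i = F - 2 * L, following the commit message's definition.

    fragment_length (F) and read_length (L) are in bases. Both names are
    hypothetical and chosen for this sketch only.
    """
    return fragment_length - 2 * read_length


# A 300-base fragment with two 100-base reads leaves 100 unread bases.
typical = insert_size(300, 100)

# When the two reads together are longer than the fragment, the result is
# negative under this definition, i.e. the read mates overlap.
overlapping = insert_size(150, 100)
```

Under this definition a negative value means the two reads overlap, which appears to be the situation the later commit "Handle case where read lengths are longer than fragment size" addresses.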
fromer aea481ae01 Trivial bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4848 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 18:29:15 +00:00
depristo 46cd227613 Stability improvements to GATK run report system. R code is now robust. XML parser uses the C backend in python, 10x faster. Added shell script that runs the daily reports, and linked the /humgen/ runme.csh to this script. Script now emails the daily PDFs to gsamembers
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4845 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 14:56:12 +00:00
fromer c167c6f9eb Calculate the phasing probabilities for particular intra-het distances
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4838 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:44:59 +00:00
fromer 4dbdf7a13d Added ability to sample from intra-het distance distribution
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4836 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:09:03 +00:00
fromer 4403b9d276 Added probability bound on phasing paths, which slightly speeds up calculations. It seems that a real speed-up can only be achieved by considering fewer paths by doing some form of caching of sub-problems (e.g., dynamic programming or matrix multiplication, as Mark suggested)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4832 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 23:53:56 +00:00
depristo a6397ed8c3 Default R script now plots sensitivity/specificity curve
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4825 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 16:55:11 +00:00
fromer 2bf4fc94f0 Try to use more sampling to get a "correct" estimate of multivariate integral
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4815 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 02:58:46 +00:00
fromer c64bf80b57 Added theoretical model of read-backed phasing (RBP)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4814 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 01:46:09 +00:00
delangel 2ac938fe4e 1) Minor fixes to avoid crashes vs CG indel files:
- Add count for complex events, not just insertions and deletions
- Correctly handle large indels falling out of bounds of the histogram array: added a count of out-of-bounds indels and avoid exceptions.

2) Cosmetic fix for R script assessing UG calling performance: draw red y=x line on top of Simulated vs Estimated AC to get a better view of under/over-estimation of AC.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4758 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 21:08:25 +00:00
kiran 9cca14acc5 Changed VCF subsetting procedure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4742 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 00:46:29 +00:00
kiran ecd496cf51 Modifications to reflect changes to gsalib. Smarter about figuring out the names of the filtered parts of the callset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4739 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:26:03 +00:00
kiran 247f33a553 Prefixed all the functions with gsa. in order to distinguish the methods from other possible methods of the same name in the namespace.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4738 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:24:42 +00:00
depristo 8768e1a240 Useful profiling tool that reads in a single rod and evaluates the time it takes to read the file by byte, by line, into pieces, just the sites of the vcf, and finally the full vcf. Emits a useful table for plotting with the associated R script that can be run like Rscript R/analyzeRodProfile.R table.txt table.pdf titleString
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4728 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 14:59:16 +00:00
kiran d2fc30d188 Added a debugging statement to plot.venn
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4718 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 01:19:31 +00:00
kiran d492eb94ad Actually subsets the resulting table now, like it was supposed to all along.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4696 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:18:23 +00:00
kiran 50dbbdb8ab Retrieves per-sample or per-lane metrics from the SQUID database and populates a dataframe with the results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4693 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 22:46:07 +00:00