WGS Extract WWW home
WGS Extract is a desktop tool for verifying, analyzing and manipulating your personal 30x WGS test result. It can also be used with any human-genome-based BAM or CRAM file, including WES and Y-only test results.
WGS Extract User Manual: v4 User Manual (Google Doc)
Latest Releases you can install on the supported platforms are:
Track | Version | Date | md5 hash signature |
---|---|---|---|
BETA v4 | 44.5 | 13 Jun 2024 | fbe59361caaf8cdb6f23df16a249c552 |
ALPHA v4 | 44.6 | 20 Jun 2024 | c3c6a283dec9fa0dce66a6210adfb04d |
Dev(eloper) v4+ | 44.9 | 30 Sep 2024 | f69b432ef8ebe2364b7c66283bc314bb |
These are just the installer scripts. You need to download, unpack and run the installer for your OS. See the Installation Section in the user manual for details about installing on your platform. See the v4 Release Notes in the installation directory for more information about the updates in the current release. You should periodically re-run the installer to update to the latest release in the track you select. There is information available on how to use hashes to verify the Installer you download.
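As a rough illustration of the hash check mentioned above, here is a minimal Python sketch that computes the MD5 of a downloaded file and compares it with a value from the table. The file name in the usage comment is hypothetical, and this is not necessarily the procedure the linked instructions describe; the standard-library `hashlib` module is the only dependency.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installer_matches(path, expected_md5):
    """True if the file's MD5 equals the published value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example use (the file name below is hypothetical; the hash is the BETA v4
# entry from the table above):
#   installer_matches("WGSExtract-installer.zip",
#                     "fbe59361caaf8cdb6f23df16a249c552")
```

A mismatch usually means an incomplete download or a file from a different track or version than the row you compared against.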
- With MacOS Sonoma 14.5 and later, Apple regressed and turned off the ability to download and run unsigned apps from outside its store. You can no longer enable this directly in settings: the “anywhere” option disappeared from system settings / privacy & security / security / “allow applications from”. To re-enable this in Sonoma and allow apps like ours to run with a GUI click, you must open a terminal and use the command `sudo spctl --master-disable`. For older releases, the first time you run the app, follow the Ctrl-Click process from before as described in the manual.
- With MacOS Sequoia 15.0, Apple has permanently removed the “anywhere” option completely. The only way to run apps that are neither from the app store nor Apple “notarized” is as follows. Within 30 minutes of downloading and trying to open the app the first time (and clicking Done in the pop-up that comes when you try to run it), you must navigate to system settings / privacy & security and scroll all the way down to the Security section. You should see “Install_macos.command” was blocked to protect your Mac. with a button next to it that says “Open Anyway”. Once you click it, you will get the old “open anyway” pop-up to approve the program. You will also be required to enter your password in a follow-on pop-up to finally approve the action. You must do this each time you download and install an update as well. Even this option is expected to go away, so MacOS will no longer allow most bioinformatics tools to run on its platform, including this one.
- Ubuntu 24, MacOS 14 and MacOS 15 require release 44.6 or later.
This tool is geared toward the needs of genetic genealogy and Ancient DNA (aDNA) studies but can be helpful for those looking into health-related uses of WGS tests. The personal, sub-$500, Direct-to-Consumer (DTC), 30x Whole Genome Sequence (WGS) tests are delivered with basic data files and reports. This tool serves to bridge the gap between the WGS data files delivered and the present-day genetic genealogy community tools. Many health analysis sites accept the microarray and VCF files generated by this tool.
Still waiting for your WGS test results? Want to get started today? See the International Genome Sample Resource (1K Genome archive) for BAM or CRAM files that you can download and play with to learn the tool while waiting for your results.
This tool is designed to be a simple, push-button manipulation of WGS files from any source. It hides the installation and scripting of complex bioinformatic tools and automatically adapts based on the data within your files. For more control over your pipeline, either learn to use the underlying tools or seek out a Galaxy server (such as UseGalaxy).
Dante Labs, Nebula Genomics, Sequencing, and YSEQ are the providers whose test results are most commonly used with this tool. Full Genomes Corp, GeneDX, Sano Genetics and Veritas (historical) are other test providers whose output is processed here. These are all results from Illumina and MGI next generation sequencers (NGS). Results from Oxford Nanopore and PacBio HiFi CCS third generation, long-read sequencers can also be used, as can FamilyTreeDNA’s BigY output. (This is not an endorsement of any company or service; simply reporting what is commonly used with the tool.)
The tool acronym is WGSE and is pronounced “wig-see”. We encourage that use in conversation.
We encourage the use of the Facebook group Personal WGS for discussions on how to make use of your personal, sub-$500, DTC 30x WGS test results. Bugs, use cases and announcements about the Beta release tool happen there. In that group’s Files section, you will find a number of useful companion documents and tool references. In particular, see Bioinformatics for Newbies. We also maintain a number of corollary documents.
User issues, if not brought up in the Personal WGS Facebook group, should be raised in the local user issues section of this GitHub site. The issues section is preferred so code bugs, use limitations and suggested improvements can be tracked within the development project.
There is a separate Facebook group for WGSE Developers and Alpha testers where bleeding edge issues are discussed and tested before wider availability. Developers should visit the main GitHub WGS Extract Developers Code Repository as well. Development issues, Alpha code bugs and limitations should be raised in the development issues section so they are tracked until resolved in a release. If you want to take a stab at modifying and improving the code, the manual contains many suggested improvements. If the latest release is not checked in to GitHub, simply download and install the latest DEVelopers release and start from there. Look in the Program folder for the Python source files.
There is a separate Reference Genome Library manager that can be run to check and update the library. The WGSE program will check and determine when it needs a genome and prompt you to install any missing file then. The tool now verifies checksums of all library downloads and of the final, installed versions. These downloads, along with installer and startup activities that check for the latest versions of files (and possibly update them), are the only network access requirements to run the tool. Otherwise the tool is standalone. There is an uninstaller to unwind all the changes made by the installer. You should never locate your data files within the installation folder.
v4 entered Alpha on 1 April 2022 and was formally Beta released on 6 November 2022. v5 entered pre-Developer mode release on 10 March 2023 and had its first real releases in July and then November 2023. Old releases are documented in the historical release section. (v5’s release in Dev has been delayed.)
The tool home page is WGSE.bio, with the developers and delivery platform using WGSE.io (note the slight difference; they will cross reference each other). Currently, both point to this page located at https://WGSExtract.github.io/.
64 bit OS and processor platforms tested as part of the release process are:
The tool has the potential to be a simple install in a Bioconda environment as it is mostly just a Python package. But a majority of our users are on Microsoft Windows 10/11 systems, and neither Bioconda nor the bioinformatic tools are supported there. So we currently deliver the tool with our own installer everywhere, along with Windows executables. This may change going forward after we find a Windows package manager (pacman) to separately supply the bioinformatic tool ports we currently create. Our release is the only source of recent bioinformatic tool releases on a Windows system (that we are aware of). Docker packages are either not usable across all the platforms or too inefficient for these large file needs, but could play a role in the future.
Some have downloaded the WGS Extract tool solely to gain access to the MS Windows native executables of the Bioinformatic Tools we include. You can use these Bioinformatic tools independent of the WGS Extract program. Look in the cygwin64/usr/local folder for the bioinformatic tools. Just add cygwin64/bin and cygwin64/usr/local/bin to your PATH to make the programs available from the Terminal command line of CMD, Powershell or BASH. If you do not wish to use the rest of the WGSE release, just delete everything except the cygwin64 folder and adjust your path for wherever you move the folder. For those using the new Msys2 release, it is the msys2/usr/bin and msys2/ucrt64/bin folders; respectively.
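The PATH adjustment described above can also be sketched in Python, for example to locate one of the bundled tools from a script without changing your system settings. This is only an illustration under stated assumptions: the install location and the tool name in the usage comment are hypothetical, and only the `bin` and `usr/local/bin` subfolders come from the text above.

```python
import os
import shutil
from pathlib import Path


def tool_path_entries(cygwin_root):
    """Return the two folders named above, relative to the cygwin64 root."""
    root = Path(cygwin_root)
    return [str(root / "bin"), str(root / "usr" / "local" / "bin")]


def prepend_to_path(entries, current_path=None):
    """Build a PATH string with `entries` first, without duplicating them."""
    if current_path is None:
        current_path = os.environ.get("PATH", "")
    parts = [p for p in current_path.split(os.pathsep) if p]
    kept = [p for p in parts if p not in entries]
    return os.pathsep.join(list(entries) + kept)


def find_tool(name, cygwin_root):
    """Return the full path of `name` on the extended PATH, or None."""
    search_path = prepend_to_path(tool_path_entries(cygwin_root))
    return shutil.which(name, path=search_path)


# Example use (both the folder and the tool name are assumptions):
#   find_tool("samtools", r"C:\WGSExtract\cygwin64")
```

To make the change visible to child processes, pass a copy of `os.environ` with `PATH` replaced by the result of `prepend_to_path` as the `env` argument of `subprocess.run`.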
Since v4, this is a full, BASE environment of Cygwin64 captured as of the stated release date. The bioinformatic tools are compiled against this same version on that release date, so do not update the cygwin64 libraries or else the bioinformatic tools will break. Msys2 is just a minimal set of tools to operate WGSE.
Windows 11 WSLG with Ubuntu Linux Desktop (not server) can be used to install and run WGS Extract. You have to tune WSLG parameters to get effective use of your disk space, CPU cores and memory under WSLG. This is rarely better than the native Windows executables. The Windows kernel cannot support some fundamental features from Unix / Linux that the bioinformatic tools rely on (e.g. memory mapped files). WSL I/O performance for files outside its VHDX file space makes the tool slower in WSL than in native Windows in most cases. Keeping your files local instead will greatly expand your VHDX virtual disk, by 0.5 to 1 TB or more, and this space cannot generally be recovered.