WGS Extract WWW home


WGS Extract is a desktop tool for verifying, analyzing and manipulating your personal 30x WGS test result. It can also be used with any human-genome-based BAM or CRAM file, including WES and Y-only test results.

WGS Extract User Manual: v4 User Manual (Google Doc)

Latest Releases you can install on the supported platforms are:

These are just the installer scripts. You need to download, unpack and run the installer for your OS. See the Installation Section in the user manual for details about installing on your platform. See the v4 Release Notes in the installation directory for more information about the updates in the current release. You should periodically re-run the installer to update to the latest release in the track you select.

NOTE: When re-running the installer to update, you will see the program package error “44.2: syntax error: invalid arithmetic operator” when it checks for the latest program package. Do not worry. The next update of the installer package will fix the error and allow the program package to update. We will not update the program package until the installer fix for this error is released.

This tool is geared toward the needs of genetic genealogy and Ancient DNA (aDNA) studies, but it can be helpful for those looking into health-related uses of WGS tests. The personal, sub-$500, Direct-to-Consumer (DTC), 30x Whole Genome Sequencing (WGS) tests are delivered with basic data files and reports. This tool bridges the gap between the WGS data files delivered and the present-day tools of the genetic genealogy community. Many health analysis sites accept the microarray and VCF files this tool generates from your WGS test result.

Still waiting for your WGS test results? Want to get started today? See the International Genome Sample Resource (1000 Genomes archive) for BAM or CRAM files that you can download and experiment with to learn the tool while you wait for your results.

This tool is designed to be a simple, push-button manipulation of WGS files from any source. It hides the installation and scripting of complex bioinformatic tools and automatically adapts based on the data within your files. For more control over your pipeline, either learn to use the underlying tools directly or seek out a Galaxy server (such as UseGalaxy).

Dante Labs, Nebula Genomics, Sequencing, and ySeq test results are the ones most commonly used with this tool. Full Genomes Corp, GeneDX, Sano Genetics and Veritas (historical) are other test providers whose output is processed here. These are all results from Illumina or MGI next-generation sequencers. Results from Oxford Nanopore and PacBio HiFi CCS third-generation long-read sequencers can also be used, as can FamilyTreeDNA’s Big Y output. (This is not an endorsement of any company or service; we are simply reporting what is commonly used with the tool.)

The tool acronym is WGSE, pronounced “wig-see”. We encourage using that name in conversation.

We use the Facebook group Consumer WGS Testing for discussions on how to make use of your personal, sub-$500, DTC 30x WGS test results. Bugs, use cases and announcements about the Beta release of the tool are posted there. In that Facebook group’s Files section, you will find a number of useful companion documents and tool references; in particular, see Bioinformatics for Newbies. We also maintain a number of corollary documents.

User issues not raised in the aforementioned Facebook group should be raised in the local user issues section of this GitHub site. The issues section is preferred so that code bugs, use limitations and suggested improvements can be tracked within the development project.

There is a separate Facebook group for developers and Alpha testers where bleeding-edge issues are discussed and tested before wider availability. Developers should visit the main GitHub WGS Extract Developers Code Repository as well. Development issues, code bugs and limitations should be raised in the development issues section so they are tracked until resolved in a release. If you want to take a stab at modifying and improving the code, the manual contains many suggested improvements. If the latest release is not checked in to GitHub, simply download and install the latest DEVeloper release and start from there. Look in the Program folder for the Python source files.

There is a separate Reference Genome Library manager that you can run to check and update the library. Or you can simply wait for the program to determine when it needs something; it will then prompt you to install the missing file. The tool now verifies the checksums of all library downloads and of the final, installed versions. This, and the startup checks for the latest versions of files (with possible updates), are the only network access the tool requires. Otherwise the tool is standalone. There is an uninstaller to undo all the changes made by the program. You should never locate your data files within the installation folder.

We bring you v5 some 13 months after v4. v3, and Marko’s historical v1 and v2 releases from the first two years, are documented in the historical release section. v3 entered Alpha on 18 June 2020 and was finally released as Beta on 15 June 2021. v4 entered Alpha on 1 April 2022 and was formally released as Beta on 6 November 2022. v5 entered Developer release on 10 March 2023 and had its first real release in July 2023.

This page is currently located at https://WGSExtract.github.io/ and serves as the WWW home for the tool. As the need develops, we will create our own Facebook group where users can raise issues outside of the local User Issues section already mentioned. The user-facing front end will soon move to WGSE.bio, and the developer and delivery platform to WGSE.io (note the slight difference; they will cross-reference each other).

Supported Platforms

64-bit OS and processor platforms tested as part of the release process are:

We are experimenting with a more generalized Linux installer based on micromamba to support a wider range of Linux desktop release versions that already support the bioinformatic tool packages.

The tool has the potential to be a simple install in a Bioconda environment, as it is mostly just a Python package. But a majority of our users are on Microsoft Windows 10/11 systems, where neither Bioconda nor the bioinformatic tools are supported. So we currently deliver the tool with our own installer and our own Windows ports of the bioinformatic tools. Those ports are the only source of recent bioinformatic tool releases on a Windows system that we are aware of. This may change once we find a Windows package manager that can separately supply the tool ports we currently create. Docker packages are either not usable across all the platforms or too inefficient for these large-file and program needs, but they could play a role in the future.


Windows Release Users

Some users have downloaded the WGS Extract tool solely to gain access to the native Microsoft Windows executables of the bioinformatic tools we include. You can use these bioinformatic tools independently of the WGS Extract program. Look in the cygwin64/usr/local folder for them. Just add cygwin64/bin and cygwin64/usr/local/bin to your PATH to make the programs available from the command line of CMD, PowerShell or the native BASH included there (not the very old BASH supplied with Windows).
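As a rough illustration of the PATH change described above, here is a minimal Python sketch that adds the two Cygwin64 folders for a single child process instead of editing the system PATH permanently. It assumes a hypothetical install location `C:\WGSExtract` (substitute your own), uses `samtools` only as an example tool name (assuming it is among the bundled tools), and the helper names are ours, not part of WGS Extract.

```python
import os
import shutil
import subprocess
import sys
from pathlib import PureWindowsPath

WINDOWS_PATH_SEP = ";"  # Windows separates PATH entries with semicolons


def cygwin_path_entries(install_dir: str) -> list[str]:
    """Return the cygwin64/bin and cygwin64/usr/local/bin folders (as Windows paths)."""
    root = PureWindowsPath(install_dir) / "cygwin64"
    return [str(root / "bin"), str(root / "usr" / "local" / "bin")]


def prepend_to_path(entries: list[str], current_path: str) -> str:
    """Put entries first so the bundled tools win over same-named programs elsewhere.

    Exact-match duplicates of the new entries are dropped from the old PATH,
    and empty segments are skipped.
    """
    existing = [p for p in current_path.split(WINDOWS_PATH_SEP) if p]
    kept = [p for p in existing if p not in entries]
    return WINDOWS_PATH_SEP.join(entries + kept)


def run_bundled_tool(install_dir: str, tool: str, *args: str) -> None:
    """Run one bundled tool with the Cygwin64 folders added to PATH for this call only."""
    new_path = prepend_to_path(cygwin_path_entries(install_dir), os.environ.get("PATH", ""))
    # On Windows, the PATH passed via env= is not used to locate the executable
    # itself, so resolve the full path first with shutil.which.
    exe = shutil.which(tool, path=new_path)
    if exe is None:
        raise FileNotFoundError(f"{tool} not found under {install_dir}")
    # Still pass the new PATH so the tool can load cygwin1.dll from cygwin64/bin.
    subprocess.run([exe, *args], env=dict(os.environ, PATH=new_path), check=True)


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical install location and example tool; adjust to your system.
    run_bundled_tool(r"C:\WGSExtract", "samtools", "--version")
```

Scoping the change to one `subprocess.run` call keeps the rest of your system unaffected; if you want the tools in every CMD or PowerShell window, add the same two folders to your user PATH instead.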

In v4, this is a full Cygwin64 BASE environment captured as of the stated release date, and the bioinformatic tools are compiled against that same version. So do not update the cygwin64 libraries, or the bioinformatic tools may break.

Windows 11 WSLG with an Ubuntu Linux Desktop (not Server) release can be used to install and run WGS Extract. You have to tune the WSLG parameters to make effective use of your disk space, CPU cores and memory under WSLG. This can sometimes work better than the native Windows executables, because the Windows kernel cannot support some fundamental Unix / Linux features that the bioinformatic tools rely on (e.g. memory-mapped files). However, I/O on files stored outside the WSL virtual disk (a VHDX file) is slow, so in most cases the tool runs slower in WSL than natively on Windows.

SHA256:   TBD  *WGSExtract-Alphav35_31Jul2022_installer.zip (may not be the same version as the one available above)
MD5:      TBD  *WGSExtract-Alphav35_31Jul2022_installer.zip (may not be the same version as the one available above)

(We have not updated the hashes since splitting into multiple program packages; we should get to it soon. Only the installers that you download directly need hashes that you can check. The other packages are checked by the installer at download time using a public / private key mechanism that only we can generate offline.) More information is available on using hashes to verify the download.
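Since the published hashes are still TBD, here is only a minimal sketch of how you could check a downloaded installer once they are posted. It uses Python's standard `hashlib` module to compute SHA-256 and MD5 in a single pass and accepts a line in the `HEX *filename` layout shown above (the style `sha256sum` and `md5sum` print in binary mode). The function names are our own; this is not the installer's built-in public / private key check.

```python
import hashlib
import sys
from pathlib import Path


def file_digests(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute SHA-256 and MD5 of a file in one pass, reading 1 MiB at a time
    so large downloads are never held in memory all at once."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return {"sha256": sha256.hexdigest(), "md5": md5.hexdigest()}


def parse_checksum_line(line: str) -> tuple[str, str]:
    """Split a 'HEX *filename' line into (lowercase hex, filename without the '*')."""
    digest, name = line.strip().split(maxsplit=1)
    return digest.lower(), name.lstrip("*")


def verify_download(path: str, checksum_line: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's digest matches the published one for the given algorithm."""
    expected, _name = parse_checksum_line(checksum_line)
    return file_digests(path)[algorithm] == expected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_download.py <downloaded file> "<HEX> *<filename>"
    print("OK" if verify_download(sys.argv[1], sys.argv[2]) else "MISMATCH")
```

Prefer the SHA-256 value when both are published; MD5 is still useful for spotting a corrupted download but is no longer collision-resistant.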