WGS Extract WWW home


WGS Extract Version 3 Beta is a desktop tool for verifying, analyzing and manipulating your Consumer 30x WGS test results. It can also be used with any human genome based BAM or CRAM file, including WES test results, but with a more limited benefit.

Current release is Beta v3 (10 Jul 2021) on many supported platforms:

NOTE: A manual patch for the file WGSExtractv3/program/mainwindow.py can be downloaded from here. Replace the current release file in the WGSExtractv3/program directory with the downloaded one. The patch fixes a basic bug in the (Re)Align command that slipped through after being fixed during the regression testing release process. It does not fix the issue with Samtools stalling while processing most MGI sequencer files (most recent Nebula Genomics results), nor the S Tk/Tcl Python library bug that prevents you from selecting two FASTQ files in the Align button. Those fixes will be posted in the next update.

NOTE: A simple fix to the Install_MacOS.command file will enable use on MacOS 12.x Monterey. Lines 118-119 in that file look like:

11) # BigSur is v11.x now. Minor versions are patches. @ 11.3 as of May 2021
MACPORTSF="MacPorts-2.7.2-11-BigSur.pkg";;

After those two lines, simply paste in the code:

12) # Monterey is v12.x now. Minor versions are patches. @ 12.4 as of May 2022
MACPORTSF="MacPorts-2.7.2-12-Monterey.pkg";;

The new release should be out soon.
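If you would rather not edit the installer by hand, the manual fix described in the note above can be scripted. The following is only a minimal sketch, not part of the WGS Extract release: the function names are hypothetical, and it assumes the Big Sur package name appears exactly once in the file, on the second of the two lines quoted in the note. If the layout of your copy differs, it raises an error rather than guessing.

```python
from pathlib import Path

# The two lines to add, copied from the note above.
MONTEREY_CASE = (
    "12) # Monterey is v12.x now. Minor versions are patches. @ 12.4 as of May 2022",
    'MACPORTSF="MacPorts-2.7.2-12-Monterey.pkg";;',
)
# Assumption: this package name appears only on the second Big Sur line.
BIGSUR_PKG = "MacPorts-2.7.2-11-BigSur.pkg"


def _indent_of(text):
    """Return the leading whitespace of a line."""
    return text[: len(text) - len(text.lstrip())]


def add_monterey_case(lines):
    """Return a new list with the Monterey case inserted after the Big Sur case."""
    lines = list(lines)
    if any("12-Monterey.pkg" in line for line in lines):
        return lines  # already patched, nothing to do
    for index, line in enumerate(lines):
        if BIGSUR_PKG in line and index > 0:
            # Reuse the indentation of the two Big Sur lines for the new ones.
            new_lines = [
                _indent_of(lines[index - 1]) + MONTEREY_CASE[0],
                _indent_of(line) + MONTEREY_CASE[1],
            ]
            return lines[: index + 1] + new_lines + lines[index + 1:]
    raise ValueError("Big Sur case not found; patch the file by hand")


def patch_installer(path):
    """Patch the installer script in place, keeping a .bak copy of the original."""
    path = Path(path)
    original = path.read_text()
    patched = "\n".join(add_monterey_case(original.splitlines())) + "\n"
    Path(str(path) + ".bak").write_text(original)
    path.write_text(patched)
```

Calling patch_installer("Install_MacOS.command") from the directory holding the installer would apply the change. Running it a second time leaves the file unchanged, because the Monterey package name is detected first.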

See the Release Notes and the Installation Section in the Manual (Google Doc) for how to determine your release version and upgrade if necessary, and for important installation notes specific to your platform. We have additional notes for enhanced Windows 10 / 11 installation below.

Still waiting for your WGS test results? Want to get started today? See the International Genome Sample Resource (1K Genome archive) for BAM or CRAM files that you can download and play with to learn the tool while waiting for your results.

This tool is geared toward the needs of genetic genealogy but may be helpful for those looking into health-related uses of WGS tests. The sub-$500, Direct-to-Consumer (DTC), 30x Whole Genome Sequence (WGS) tests are delivered with basic data files and reports. This tool serves to bridge the gap between the WGS data files delivered and the present-day genetic genealogy community tools. Many health analysis sites accept the microarray and VCF files generated by this tool as well.

This tool is designed to be a simple, push-button manipulation of WGS files from any source. It hides the complex installation and scripting of bioinformatic tools and automatically determines the needed parameters based on the data supplied to it. For more control over your pipeline, either learn to use the underlying tools directly or seek a Galaxy server (such as UseGalaxy).

Dante Labs, Nebula Genomics, and ySeq are the providers whose test results are most commonly used with this tool. Full Genomes Corp, GeneDX, Sano Genetics, Sequencing and Veritas (historical) are other test providers whose output is processed here. These are all results from Illumina or MGI next generation sequencers. Results from Oxford Nanopore and PacBio HiFi third generation sequencers can also be used, as can FamilyTreeDNA’s BigY output. (This is not an endorsement of any company or service; simply reporting what is commonly used with the tool.)

We use the Facebook group Consumer WGS Testing for discussions on how to make use of your sub-$500, DTC 30x WGS test results. Bugs, use cases and announcements about this tool are discussed there. In that Facebook group’s Files section, you will find a number of useful companion documents and tool references. In particular, start with Bioinformatics for Newbies.

User issues, if not brought up in the Facebook group, should be raised in the local user issues section of this site. The issues section is preferred so code bugs, use limitations and suggested improvements can be tracked within the development project.

The tool acronym is WGSE, pronounced “wig-see”. We encourage its use in conversation.

Developers should visit the main GitHub WGS Extract Developers Code Repository. Development issues, code bugs and limitations should be raised in the development issues section so they are tracked until resolved in a release. The manual contains many suggested improvements if you want to take a stab at modifying and improving the code. If the latest release is not checked in there, simply download the latest release and start from there. Look in the Program folder for the Python source files.

We bring you v3 after 16 months of v2 and 6 months of v1 before that. The original v1 and v2 releases from Marko, covering the first 2 years, are documented there. Especially key there is the microarray generator. v3 went into Alpha on 18 June 2020 and was finally released as Beta on 15 June 2021.

This page is located at https://WGSExtract.github.io/ and serves as the WWW home for the tool. As the need develops, we will create our own Facebook Group for users to raise issues outside of the local User Issues Section already mentioned.

Supported Platforms

Platforms tested as part of the release process are:

The tool has the potential to be a simple install in a Bioconda environment, as it is mostly just a Python package. But a majority of our users are on Microsoft Windows 10/11 systems, where neither Bioconda nor the bioinformatic tools are supported. So we currently deliver the tool with our own installer and Win10 executables when needed. This may change going forward after we find a Win10 package manager to supply the bioinformatic tool ports we currently provide. We fully test and use the Win10, Ubuntu Linux and Apple MacOS versions on Intel, AMD and Apple M1 (Arm) architectures. This is the only source of the bioinformatic tools on a Win10 system (that we are aware of). Docker packages are either not usable across all the platforms or too inefficient at the current time, but could play a role in the future.


Win10 Release Users

Some have downloaded the WGSE tool solely to gain access to the Win10 native executables of the bioinformatic tools that we make available. These are installed as part of the Win10 Installer package. You can download and use these bioinformatic tools without installing the whole WGS Extract program, if you desire.

This is a more complete environment than the one provided by the WGS Extract installation: a more full-featured CygWin64 install (still minimal by their process) that uses the same DLL versions as the included, compiled bioinformatic tools.

We install a less-than-minimal, but still usable Cygwin64 release during the WGS Extract tool install. The original files used by the installer are:

The first two above are used by the installer directly and in stages; it merges or overlays them on top of each other. We provide the merged form directly as the third link above, for your convenience. The CygWin64 environment installed with the WGS Extract tool is limited to what the WGS Extract tool needs. The installer uses MS OneDrive links to the first two files above.

NOTE: If you upgrade the Cygwin64 release provided above, it may load new DLL versions that are incompatible with the compiled bioinformatic tools. If you install Cygwin64 independently, it may load different DLL versions into memory that cause similar issues. Cygwin64 DLLs are not versioned in memory and different versions cannot coexist.
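One quick way to see whether a second Cygwin64 install might interfere is to look for more than one copy of the Cygwin runtime DLL (cygwin1.dll) on your PATH. The sketch below is an illustration only and is not part of WGS Extract; the function name is hypothetical. It inspects PATH directories and cannot see DLLs that are already loaded in memory, so more than one hit indicates a possible conflict, not a confirmed one.

```python
import os
from pathlib import Path


def find_cygwin_dlls(path_value=None, dll_name="cygwin1.dll"):
    """Return the distinct copies of dll_name found in PATH directories, in PATH order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    seen = set()
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty PATH entries
        candidate = Path(entry) / dll_name
        # Normalise so the same directory listed twice is reported once.
        key = os.path.normcase(os.path.abspath(candidate))
        if key not in seen and candidate.is_file():
            seen.add(key)
            found.append(str(candidate))
    return found


if __name__ == "__main__":
    copies = find_cygwin_dlls()
    for copy in copies:
        print(copy)
    if len(copies) > 1:
        print("More than one cygwin1.dll is on PATH; versions may conflict.")
```

Run from a Windows command prompt, it lists each copy it finds. The first one listed is the one Windows would normally pick up from PATH.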

The CygWin64 tools can be slower on a Win10 platform than the Win10 WSL environment running Ubuntu Linux with Linux versions of the bioinformatic tools installed directly. Once WSLG becomes more complete and supported in Win11, we will likely stop delivering Win10 executables altogether and simply ask Windows users to install and use WSLG for running WGS Extract, at which time we can consider becoming a Bioconda package as well. As of Spring 2021, you can get WSLG with the Win11 pre-release as part of the Windows Insiders program. As of Fall 2021, WSL2 in the standard Win11 release includes WSLG. We do not provide support for the use of WGSE on WSLG at this time.

$ md5sum *.zip
c5ade89fa8aee97f0b2db376bdb8a169 *win10tools-1.12-full.zip
0898dba22b4d7c074a203e49a12f70ad *win10tools-1.12.zip
6a347e44667eb7320868cb7688b870c6 *win10tools-bioinfo1.12.zip
cf67ef5fe86db0837f71f7847b91db08 *win10tools-cygwin64.zip
$ shasum -a 256 *.zip
c9a3ec4e10154895acb5144320ecfd79558ac84d5850bed5dfd18faf8a2f08ed *win10tools-1.12-full.zip
f956dc197ad89ccd0ab1345abc89559f492059d26f20a444d7f1ddec31225557 *win10tools-1.12.zip
a160a7930e06b984b547a071cfa742d6538089e2fcc7ae1b1ac3644bc1ae6bdd *win10tools-bioinfo1.12.zip
0c72aec08091f3f1bd6fb5154d9b0e69b416cf9b0e8dcd145f558dbaf51ac45c *win10tools-cygwin64.zip
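
On systems without md5sum or shasum (a plain Windows install, for example), the same check can be done with Python's standard hashlib module. This is a minimal sketch: the SHA-256 values are copied from the listing above, while the function names are hypothetical and not part of WGS Extract.

```python
import hashlib
from pathlib import Path

# SHA-256 values copied from the shasum listing above.
EXPECTED_SHA256 = {
    "win10tools-1.12-full.zip":
        "c9a3ec4e10154895acb5144320ecfd79558ac84d5850bed5dfd18faf8a2f08ed",
    "win10tools-1.12.zip":
        "f956dc197ad89ccd0ab1345abc89559f492059d26f20a444d7f1ddec31225557",
    "win10tools-bioinfo1.12.zip":
        "a160a7930e06b984b547a071cfa742d6538089e2fcc7ae1b1ac3644bc1ae6bdd",
    "win10tools-cygwin64.zip":
        "0c72aec08091f3f1bd6fb5154d9b0e69b416cf9b0e8dcd145f558dbaf51ac45c",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA-256 matches the value listed for its name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no published checksum for {name}")
    return sha256_of(path) == expected[name]


if __name__ == "__main__":
    # Check whichever of the four archives are in the current directory.
    for name in EXPECTED_SHA256:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
        else:
            print(name, "not found, skipped")
```

Run it from the folder holding the downloaded zip files. A MISMATCH usually means an incomplete or corrupted download, so fetch the file again before installing.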