The files downloaded in the previous blog post are in EIGENSTRAT format. In this post, we’ll look at how to convert them to PACKEDPED format.

PACKEDPED format allows for easier downstream processing using the PLINK toolset. With PLINK, it becomes straightforward to extract sample subsets, filter SNPs, and perform a wide range of analyses.


I use PLINK 1.9. While there is a newer version (2.0), I prefer 1.9 because it includes several features that were deprecated or removed in the newer release.

Choose and download the binary suitable for your operating system.
If you’re on Linux or WSL, you can make the binary globally accessible like this:

# Make PLINK globally accessible
# Run this from the directory where you downloaded the binary
sudo cp plink /usr/local/bin/

# Test if PLINK works
plink

Compiling EIGENSOFT

Before we can use PLINK on the downloaded dataset, we have to convert it to PACKEDPED format. We will use the convertf tool from the EIGENSOFT toolset.

Install Dependencies

First we install the required dependencies:

sudo apt update
sudo apt install -y build-essential gfortran liblapack-dev liblapacke-dev libgsl-dev
sudo apt install -y libopenblas-dev

Download and Compile EIGENSOFT

Then clone the GitHub repository, enter the src directory, and compile:

# Install git first with: sudo apt install git

git clone https://github.com/DReichLab/EIG
cd EIG/src
LDLIBS="-llapacke" make

This builds several binaries. The most relevant ones are convertf and mergeit in the src folder and smartpca in the eigensrc subdirectory. You can confirm that they were built with:

ls

Make tools globally accessible

To use convertf and mergeit from anywhere, copy them to a global location like /usr/local/bin/:

sudo cp convertf mergeit /usr/local/bin/

Accessing smartpca

If the build was successful, you’ll find smartpca in the eigensrc directory:

# Navigate into the eigensrc folder (run this from EIG/src)
cd eigensrc

# Check files in folder
ls

Make it globally accessible:

sudo cp smartpca /usr/local/bin/

I will show how to use smartpca for principal component analysis (PCA) in another post. Since it is compiled alongside the other EIGENSOFT tools, I’m mentioning it here for completeness.
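To double-check that all four tools ended up on your PATH, a small loop like the one below can help. It is only a sketch: check_tool is a helper name I made up for illustration, not something PLINK or EIGENSOFT provides.

```shell
# Hypothetical helper: succeeds if the given command can be found via PATH
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the location of each tool installed in this post
for tool in plink convertf mergeit smartpca; do
    if check_tool "$tool"; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If a tool is reported as missing, re-check the corresponding sudo cp step above.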


Converting from EIGENSTRAT to PACKEDPED

Now we can convert the EIGENSTRAT dataset to PACKEDPED. Navigate into the directory where you store the dataset. Then, with a text editor of your choice, create a file with the following content (adjust the input prefixes to match your dataset):

genotypename:    v62.0_HO_public.geno 
snpname:         v62.0_HO_public.snp 
indivname:       v62.0_HO_public.ind 
outputformat:    PACKEDPED 
genotypeoutname: data.bed 
snpoutname:      data.bim 
indivoutname:    data.fam

Save the file under any name you like. I simply named mine parameter (no file extension). This file tells convertf which files to read, which format to write, and what to name the output. Start the conversion with:

convertf -p parameter
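If you would rather script these two steps than use a text editor, a heredoc works as well. The sketch below writes the same parameter file and only starts convertf when both the tool and the input file are present. The variable names prefix and out are my own choice, and the prefix is the one from my dataset, so adjust both to your setup.

```shell
#!/bin/sh
# Assumed names: input prefix from this post, output prefix "data"
prefix="v62.0_HO_public"
out="data"

# Write the convertf parameter file into the current directory
cat > parameter <<EOF
genotypename:    ${prefix}.geno
snpname:         ${prefix}.snp
indivname:       ${prefix}.ind
outputformat:    PACKEDPED
genotypeoutname: ${out}.bed
snpoutname:      ${out}.bim
indivoutname:    ${out}.fam
EOF

# Run the conversion only if convertf is installed and the input exists
if command -v convertf >/dev/null 2>&1 && [ -f "${prefix}.geno" ]; then
    convertf -p parameter
else
    echo "convertf or ${prefix}.geno not found; wrote the parameter file only"
fi
```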

The process may take a while, as the dataset is large.

Once complete, you’ll have a set of PLINK-compatible binary files: data.bed, data.bim, and data.fam, ready for downstream processing.
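A quick sanity check is to compare line counts: the .snp and .bim files hold one line per SNP, and the .ind and .fam files hold one line per sample. The sketch below assumes convertf kept every SNP and every individual, which may not hold for your parameter settings or dataset, so treat a mismatch as a reason to read the convertf log rather than as proof of an error. same_line_count is a hypothetical helper name, and the file names are the ones from my parameter file.

```shell
# Hypothetical helper: succeeds when two files have the same number of lines
same_line_count() {
    a=$(awk 'END {print NR}' "$1")
    b=$(awk 'END {print NR}' "$2")
    [ "$a" -eq "$b" ]
}

# Only compare when the conversion output and the input are both present
if [ -f data.bim ] && [ -f data.fam ] && [ -f v62.0_HO_public.snp ] && [ -f v62.0_HO_public.ind ]; then
    if same_line_count v62.0_HO_public.snp data.bim; then
        echo "SNP count matches"
    else
        echo "SNP count differs"
    fi
    if same_line_count v62.0_HO_public.ind data.fam; then
        echo "sample count matches"
    else
        echo "sample count differs"
    fi
fi
```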