Posts

PLINK PCA Tutorial: Running PCA in PLINK (Commands + Output)

In this post, I’ll demonstrate how to perform a PCA on a PLINK dataset. Before we begin, we need to prepare the subset of samples we want to analyze by extracting their entries from the .fam file. First, though, we need to identify the samples of interest, for example those from a specific population such as Sardinians. The easiest way is to open the corresponding .ind file and look at the population label, which is the third column in each row. Open the file in a text editor and search for the population name, in this case Sardinian. ...
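The lookup described above can also be scripted instead of done by hand in a text editor. The following is a minimal sketch, assuming a whitespace-separated .ind file with the sample ID in the first column and the population label in the third. The function name `samples_for_population` and the demo rows are made up for illustration, and the match is a plain substring match (like a text search), so a query of Sardinian also returns labels that merely contain that word.

```python
import os
import tempfile


def samples_for_population(ind_path, population):
    """Return sample IDs whose population label (3rd column) contains `population`."""
    sample_ids = []
    with open(ind_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                # Skip blank or malformed rows rather than failing.
                continue
            sample_id, population_label = fields[0], fields[2]
            if population in population_label:
                sample_ids.append(sample_id)
    return sample_ids


if __name__ == "__main__":
    # Hypothetical demo rows; a real .ind file comes from the downloaded dataset.
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = os.path.join(tmp_dir, "demo.ind")
        with open(demo_path, "w", encoding="utf-8") as handle:
            handle.write("SampleA M Sardinian\n")
            handle.write("SampleB F French\n")
            handle.write("SampleC U Sardinian\n")
        for sample_id in samples_for_population(demo_path, "Sardinian"):
            print(sample_id)
```

Because the match is a substring match, it is worth reading through the returned labels before using the list, in case the dataset carries several variants of the same population name.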

Converting EIGENSTRAT/PACKEDANCESTRYMAP to PACKEDPED

The files downloaded in the previous blog post are distributed as an EIGENSTRAT-style .geno/.snp/.ind dataset. This naming can be confusing: the .snp and .ind files are the usual EIGENSTRAT metadata files, but the .geno file may be either plain-text EIGENSTRAT or binary PACKEDANCESTRYMAP. Converting the dataset to PACKEDPED makes downstream processing with the PLINK toolset easier. With PLINK, it becomes straightforward to extract sample subsets, filter SNPs, and perform a wide range of analyses. ...
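One quick way to tell the two .geno variants apart is to look at the first bytes of the file. The sketch below rests on an assumption: that a plain-text EIGENSTRAT .geno file contains only the genotype characters 0, 1, 2 and 9 plus line breaks, so anything else in the first few kilobytes suggests a binary file. It is a heuristic, not a format validator, and the function name `looks_like_text_eigenstrat` is hypothetical.

```python
def looks_like_text_eigenstrat(geno_path, probe_bytes=4096):
    """Heuristically guess whether a .geno file is plain-text EIGENSTRAT.

    Assumption: text EIGENSTRAT uses only the characters 0, 1, 2, 9 and
    line breaks. Any other byte in the probed chunk is treated as a sign
    of a binary (packed) file.
    """
    with open(geno_path, "rb") as handle:
        chunk = handle.read(probe_bytes)
    if not chunk:
        raise ValueError(f"{geno_path} is empty")
    allowed = set(b"0129\r\n")  # iterating over bytes yields integers
    return all(byte in allowed for byte in chunk)
```

A call such as `looks_like_text_eigenstrat("dataset.geno")` (with a placeholder file name) returns `True` for a file that passes the check and `False` otherwise. Only the start of the file is read, so the check stays fast even for very large datasets.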

How to Download the AADR Dataset (Linux & WSL)

Note: This post uses an older AADR release and parts of it may now be outdated. For the latest AADR v66 download, including TGENO conversion and ADMIXTOOLS2 compatibility notes, see Downloading and Converting AADR v66. A Linux environment is unavoidable when it comes to processing and preparing bioinformatics data. You can use your favorite distribution. For Windows users, the Windows Subsystem for Linux (WSL) provides a good alternative to dual booting or setting up a full virtual machine. ...