SmartPCA Tutorial: How to Run PCA on Genetic Data (EIGENSOFT)

This post continues the previous one, where I demonstrated how to perform PCA with PLINK. While PLINK's PCA is great for quick, exploratory analysis, smartpca (part of the EIGENSOFT toolset) is more commonly used in published genetic studies. Smartpca needs to be compiled on Linux or macOS. I covered how to install and prepare the toolchain on Linux in this earlier post: From EIGENSTRAT to PACKEDPED. As before, I'll work with a small subset of samples. The focus here is on the technical process, not on interpreting the results. One key difference in this post is that I'll perform linkage disequilibrium (LD) pruning. LD pruning removes redundant, highly correlated SNPs, which helps PCA capture population structure. ...

July 30, 2025
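The LD pruning step mentioned in the smartpca post works roughly like this: slide a window along the SNPs, compute pairwise r² between genotype dosages, and drop one SNP from every pair that is too strongly correlated. The sketch below is a simplified, hypothetical illustration of that idea in plain Python. It is not PLINK's exact `--indep-pairwise` algorithm, which, for example, decides which SNP of a pair to remove differently. The function names, the "drop the later SNP" rule and the toy genotypes are all assumptions made for illustration.

```python
"""Simplified sketch of windowed, greedy LD pruning on 0/1/2 dosages.

Assumption: illustrative only; not PLINK's exact --indep-pairwise logic.
"""
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two SNPs' genotype dosages."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: r^2 undefined, treat as unlinked
    return sxy * sxy / (sxx * syy)


def ld_prune(snps: List[List[int]], window: int = 50, step: int = 5,
             threshold: float = 0.2) -> List[int]:
    """Return indices of SNPs kept after pruning.

    snps[i] holds SNP i's dosages across samples, SNPs in genomic order.
    Within each window, if two still-kept SNPs have r^2 > threshold,
    the later one is dropped (a simplifying assumption).
    """
    kept = [True] * len(snps)
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        for i in range(start, end):
            if not kept[i]:
                continue
            for j in range(i + 1, end):
                if kept[j] and r_squared(snps[i], snps[j]) > threshold:
                    kept[j] = False
        if end == len(snps):
            break  # the last window reached the final SNP
        start += step
    return [i for i, keep in enumerate(kept) if keep]


if __name__ == "__main__":
    # Toy data: SNP 1 duplicates SNP 0; SNPs 2 and 3 are uncorrelated with the rest.
    genotypes = [
        [0, 1, 2, 1, 0, 2],
        [0, 1, 2, 1, 0, 2],
        [1, 2, 1, 0, 1, 1],
        [2, 1, 1, 1, 0, 1],
    ]
    print(ld_prune(genotypes, window=4, step=1, threshold=0.5))  # → [0, 2, 3]
```

In practice you would let PLINK do this on the full dataset. The sketch only shows why pruning thins out SNPs that carry the same information.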

PLINK PCA Tutorial: Running PCA in PLINK (Commands + Output)

In this post, I'll demonstrate how to run PCA on a PLINK dataset. Before we begin, we need to prepare a subset of the samples we want to analyze. To do this, we'll extract sample information from the .fam file. But first, we need to identify the samples of interest, for example, those from a specific population such as Sardinians. The easiest way is to open the corresponding .ind file and look at the population column, which is the third column in each row. Open the file in a text editor and search for the population name, in this case Sardinian. ...

July 29, 2025
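The sample-selection step in the PLINK post can also be scripted instead of done by hand in a text editor. Below is a minimal sketch that makes several assumptions. The .ind file is taken to be EIGENSTRAT-style, with sample ID, sex and population as whitespace-separated columns. Its sample IDs are assumed to match the IID column (column 2) of the PLINK .fam file. The helper names and the toy data are hypothetical. The script collects the IDs whose third column is "Sardinian" and returns the matching (FID, IID) pairs from the .fam file, in .fam order.

```python
"""Sketch: select one population's samples from .ind and .fam lines.

Assumptions: EIGENSTRAT-style .ind (ID, sex, population) whose IDs match
the .fam IID column. Function names and toy data are hypothetical.
"""
from typing import Iterable, List, Set, Tuple


def samples_in_population(ind_lines: Iterable[str], population: str) -> Set[str]:
    """Return sample IDs whose third (population) column equals `population`."""
    ids = set()
    for line in ind_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[2] == population:
            ids.add(fields[0])
    return ids


def keep_rows(fam_lines: Iterable[str], sample_ids: Set[str]) -> List[Tuple[str, str]]:
    """Return (FID, IID) pairs for .fam rows whose IID is in sample_ids."""
    rows = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in sample_ids:
            rows.append((fields[0], fields[1]))
    return rows


if __name__ == "__main__":
    # In-memory toy lines standing in for real .ind / .fam files.
    ind = ["S1 M Sardinian", "S2 F French", "S3 F Sardinian"]
    fam = ["F1 S1 0 0 1 -9", "F2 S2 0 0 2 -9", "F3 S3 0 0 2 -9"]
    for fid, iid in keep_rows(fam, samples_in_population(ind, "Sardinian")):
        print(fid, iid)
```

With real data, you could pass open file handles instead of the lists and write each pair as a `FID IID` line to a text file. PLINK's `--keep` option accepts a file in that two-column format.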