For the second step, we perform clustering on the principal components to estimate GIA cluster membership for each individual in ATLAS. We use the K-nearest neighbors algorithm, with the superpopulation names of the 1000 Genomes samples defining the cluster labels. The superpopulations form 5 clusters: European, African, Admixed American, East Asian, and South Asian genetic ancestry.
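As a minimal sketch of this kind of label assignment, the example below classifies query samples by majority vote among their nearest labeled reference samples in principal-component space. It is an illustration under stated assumptions, not the pipeline used here: the function name `knn_assign`, the choice of `k`, the use of two principal components, the Euclidean distance, and all coordinates are hypothetical, since the text does not specify them.

```python
from collections import Counter
from math import dist


def knn_assign(reference_pcs, reference_labels, query_pcs, k=5):
    """Give each query sample the majority label of its k nearest reference samples."""
    assignments = []
    for q in query_pcs:
        # Rank reference samples by Euclidean distance to the query in PC space.
        nearest = sorted(
            range(len(reference_pcs)),
            key=lambda i: dist(q, reference_pcs[i]),
        )[:k]
        # Majority vote over the labels of the k closest reference samples.
        votes = Counter(reference_labels[i] for i in nearest)
        assignments.append(votes.most_common(1)[0][0])
    return assignments


# Toy reference panel (made-up coordinates): two PCs per sample,
# labeled with superpopulation codes in the style of 1000 Genomes.
reference_pcs = [
    (0.0, 0.0), (0.1, 0.1), (0.0, 0.2),   # labeled "EUR"
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),   # labeled "EAS"
]
reference_labels = ["EUR", "EUR", "EUR", "EAS", "EAS", "EAS"]

# Hypothetical unlabeled samples projected onto the same PCs.
query_pcs = [(0.05, 0.05), (5.0, 5.1)]

assigned = knn_assign(reference_pcs, reference_labels, query_pcs, k=3)
print(assigned)  # ['EUR', 'EAS']
```

The same procedure applies at a finer scale by restricting the reference panel to one superpopulation and swapping the superpopulation labels for population labels.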
For the second step, we perform clustering on the principal components to estimate subcontinental GIA cluster membership for each individual in the East Asian American GIA group in ATLAS. We use the K-nearest neighbors algorithm, with the population names of the East Asian ancestry samples in 1000 Genomes defining the cluster labels. The populations form 5 clusters: Han Chinese, Southern Han Chinese, Dai Chinese, Japanese, and Kinh Vietnamese genetic ancestry.
For identity-by-descent calling, an interim version of the ATLAS data consisting of 24,318 individuals was used. First, the ATLAS data were merged with the 1000 Genomes Project [