Population Genetics Notes: How to run Structure, Structure Harvester, CLUMPP, and Distruct

Wednesday, July 24, 2013

How to run Structure:

1. Format your data so that it looks like the example files. (I even used an example file to create my own file, deleting the alleles one by one, because Structure just would not load my file even though I had formatted it correctly. Note: I use Mac OS X v10.7.5.)

2. Open Structure and create a directory where your Structure files will be stored (for example: My Doc/Structure).

3. Open your data file

4. Create a parameter set by clicking on the menu: Parameter Set, New...

5. Fill in the length of the burn-in period (100,000 is usually more than enough; for more information, please consult the Structure manual by Pritchard et al.) and the number of MCMC reps that you want (people use very different values for this, from 100,000 to 2,000,000 or even more).

6. I usually follow the instruction to use the admixture model for the first run and leave the other settings at their defaults.

7. Choose the correlated allele frequencies option.

8. Select "Compute probability of the data (for estimating K)".

9. Click OK and name your parameter set.

10. From the menu, choose Project, then Start a Job. Click on the parameter set that you want to run and specify the values of K to test (from K=1 up to the maximum number of clusters you think your data could have).

11. Number of iterations: 5-10 or more

12. Click OK and let Structure run.

13. Once the run is complete, you can continue with Structure Harvester.
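
If Structure refuses to load your file (as in step 1), one thing worth checking is whether every row has the number of entries you expect. The Python sketch below is only a rough pre-check and is not part of Structure: it groups line numbers by how many whitespace-separated entries each line has. The function name and the sample rows are made up for illustration, and it assumes a whitespace-delimited file. A header row of locus names will normally have a different count from the data rows, so look for unexpected extra groups rather than expecting a single count.

```python
from collections import defaultdict


def entries_per_line(lines):
    """Group 1-based line numbers by the number of whitespace-separated entries."""
    counts = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        counts[len(fields)].append(number)
    return dict(counts)


# Made-up sample rows: a header of locus names, then label, population, alleles.
sample = [
    "loc1 loc2",
    "ind1 1 10 12",
    "ind2 1 10",      # one entry short
    "ind3 2 11 12",
]
for n_entries, line_numbers in sorted(entries_per_line(sample).items()):
    print(n_entries, "entries on line(s)", line_numbers)
```

To check a real file, pass an open file handle instead of the sample list, for example `entries_per_line(open("mydata.txt"))`, where `mydata.txt` is a placeholder name.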

How to run Structure Harvester:

1. Go to the Results folder from your Structure results (for example: My Doc/Structure/ParameterName/Results)

2. Zip the Results folder (Results.zip).

3. Upload the zip file to Structure Harvester: http://taylor0.biology.ucla.edu/structureHarvester/

4. Harvest!

5. Download the harvester output files.
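
If you prefer to script step 2 instead of zipping by hand, a minimal Python sketch could look like the one below. It uses only the standard library; the function name is my own, and the path in the usage comment is just the example folder from this post.

```python
import shutil
from pathlib import Path


def zip_results(results_dir):
    """Write <results_dir>.zip next to the folder and return the archive path."""
    results_dir = Path(results_dir).resolve()
    archive = shutil.make_archive(
        base_name=str(results_dir),        # produces <results_dir>.zip
        format="zip",
        root_dir=str(results_dir.parent),
        base_dir=results_dir.name,         # keep the folder itself as the top-level entry
    )
    return Path(archive)


# Example call (path taken from the example above, adjust to your own project):
# zip_results("My Doc/Structure/ParameterName/Results")
```

The resulting Results.zip is the file to upload in step 3.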

How to run CLUMPP:

1. Identify your K using the delta K method of Evanno et al. (2005), as reported in the Structure Harvester output.

2. Take the indfile for that K from the Harvester output (for K=2, take K2.indfile; for K=5, take K5.indfile) and move it to a new folder called, for example, FolderA.

3. Edit the paramfile from the example files that come with the CLUMPP package: set Datatype to 0 and revise everything else accordingly (for example, change the indfile name to K2.indfile, and so on). Move or copy this file to FolderA.

4. Also copy or move CLUMPP into FolderA, so that the folder contains CLUMPP, K2.indfile, and the edited paramfile.

5. Open Terminal and change the directory to where FolderA is located.

6. Then type: ./clumpp paramfile

7. CLUMPP will produce several files in the folder. Take the output file (for example, arabid.outfile) and rename it to arabid.indivq.

8. Repeat steps 2-7 for K2.popfile: change the paramfile to Datatype 1, put the three files in FolderB, then take the output file (for example, arabid.outfile) and rename it to arabid.popq.
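
Step 1 relies on delta K, which Structure Harvester calculates for you. To show what the number means, here is a small Python sketch of the Evanno et al. (2005) idea: delta K is the absolute second difference of Ln P(D) across successive K, divided by the standard deviation of Ln P(D) among the replicate runs at that K. This is my own illustration, not Structure Harvester's code. It assumes the second difference is taken on the mean Ln P(D) per K and that the sample standard deviation is used, so the values may differ slightly from the Harvester output. The Ln P(D) numbers are invented.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for each K that has both K-1 and K+1 available.

    lnp_by_k maps K to the list of Ln P(D) values from the replicate runs
    at that K (at least two replicates per K are needed).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the smallest and largest K tested
        sd = stdev(lnp_by_k[k])  # sample standard deviation across replicates
        if sd == 0:
            continue  # cannot divide by zero when all replicates are identical
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_difference / sd
    return result


# Invented Ln P(D) values, two replicates per K, for illustration only.
example = {
    1: [-1000, -1002],
    2: [-900, -902],
    3: [-890, -894],
    4: [-885, -889],
}
scores = delta_k(example)
best_k = max(scores, key=scores.get)
print(scores, "-> highest delta K at K =", best_k)
```

Note that delta K cannot be computed for the smallest or largest K you tested, which is one reason to test a range wider than the K you expect.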

How to run Distruct:

1. Take your indivq and popq files (for example, arabid.indivq and arabid.popq) and move them to a new folder, Folder C.

2. Edit the drawparams file from the Distruct package accordingly. Create your arabid.names and arabid.perm files.

3. Put at least those five files (arabid.indivq, arabid.popq, arabid.names, arabid.perm, and drawparams) in Folder C, together with distruct itself.

4. Run distruct. (For me, distruct does not work on my Mac, so I have to use a PC to run it. I just click on the Windows executable file and distruct produces the output.ps file.)
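
Before running distruct, it can help to confirm that the files from step 3 really are in the folder. The Python sketch below is a simple check I wrote for illustration; it is not part of distruct. The file names are the examples used in this post, so replace them with your own, and make sure they match the names given in your drawparams file.

```python
from pathlib import Path

# File names from the example in this post; replace them with your own.
REQUIRED = ["arabid.indivq", "arabid.popq", "arabid.names", "arabid.perm", "drawparams"]


def missing_files(folder, required=REQUIRED):
    """Return the names in `required` that are not present as files in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


# Example: check the current directory and report anything that is missing.
print("Missing:", missing_files("."))
```

An empty list means all five files were found in that folder.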

Yay!

22 comments:

Unknown, March 13, 2014 at 12:55 PM
This was helpful, I found the solution to the mac problem here:
https://groups.google.com/forum/#!topic/structure-software/Im5DUNA-yN4

Hope that helps people!

Dave

Unknown, June 8, 2014 at 4:47 PM
Hi There,

Thank you for the instructions. Really helpful. But I have a problem. I followed the instructions and I am at "How to run CLUMPP", step 8, but I do not see a popfile.
Do you also click "use sampling location as prior" during the Structure run to get this popfile? Or how can I obtain it?
I'd be glad if you could help me with this issue.

Thank you.

erin, June 18, 2014 at 9:30 AM
The popfile comes from Structure Harvester

Unknown, July 16, 2014 at 6:46 PM
Hi,

I tried running Structure 2.3.4 on a Mac and the program did not save any files under the Results folder. I tried quitting and reopening again (following a post on Google Groups), but no matter how many times I tried this, it never worked. Then I tried running the same data on a PC, and this time Structure saved files into the Results folder. I wonder if other Mac users encounter this problem and how it can be solved.

Thanks

Cecilia

Unknown, July 21, 2014 at 9:50 PM
Thanks Angelica, it worked!

Spencer, September 8, 2014 at 6:10 AM
I have a dataset that has no pop info beyond individual names, so I get no popfile from STRUCTURE HARVESTER. Is there a way to still do this without a popfile?

Unknown, June 1, 2015 at 11:19 PM
This is a very useful webpage, but I have a problem with step 8 for CLUMPP. I compared the outfile from running the popfile with the example data, and it looks like CLUMPP didn't do its job. Can anyone help? I couldn't contact the person who wrote the software.

Jennifer Mae, September 11, 2015 at 10:29 AM
Helpful. However, I needed to edit the .indfile from Structure Harvester before it would work in CLUMPP (windows). There need to be parentheses around the third column (and note that there are blank lines between runs in the S.H. output .indfile - so make sure there aren't any parentheses in those lines if you're using a column editor).

Unknown, May 23, 2016 at 10:53 PM
This is so helpful. One question: I would like to know where and how to get the indivq file for running distruct.

Clementine

Jessica, September 12, 2016 at 8:50 AM
I am confused as to how to get the popfile from Structure harvester. I'm using the locprior model in Structure. What exactly needs to be active in Structure in order to get harvester to produce the popfile?

Jessica

Lucía, September 29, 2016 at 7:01 PM
This was really helpful, I had no problem following your steps so thank you!

Unknown, October 27, 2016 at 12:01 AM
The STRUCTURE and CLUMPP section was incredibly helpful. But could you please give more information on running DISTRUCT? The windows version of the software opens, runs and disappears, I am not sure what happens to it

Unknown, November 11, 2016 at 11:18 AM
Has anyone used TESS followed by Clumpp and Distruct?

Unknown, June 12, 2017 at 2:59 AM
I have a problem running Structure. I have 103 individuals, 138 loci, and 5 populations. After filling in the details I get this message: "bad format source expect 139 data entries, line 1 138 data entries". Where did I go wrong?

Unknown, September 26, 2017 at 2:51 AM
Hi. I have a problem in doing distruct. The pane shows

Error could not open population Q-matrix file K8.popq

May I know why this thing happens? And how to solve this problem?

Fatanah

Unknown, May 23, 2019 at 12:00 AM
Hi! Structure-Harvester has been processing results for hours now; with no output other than alternating between the words "Thanks!" and "Connecting" on its tab. My data has 100000 burns, & 100000 iterations; and they are from an HPCC (High Performance Computer Cluster) generated through StrAuto. Could that be too much for the program and what's the solution? Thanks.

Dr. Anita Rawat, August 19, 2020 at 10:52 PM
Hi,
This web page is really useful. I am facing a problem running CLUMPP. The message is "unable to open paramfile. please check the name of paramfile". Can anyone help me with this message?