Wednesday, July 24, 2013

How to run Structure, Structure Harvester, CLUMPP, and Distruct

How to run Structure:

1. Format your data so that it looks like the example files. (I even used the example file to create my own file, deleting the alleles one by one, because Structure would not load my file even though I had formatted it correctly. Note: I use Mac OS X v10.7.5.)

2. Open Structure and create a directory where your Structure files will be stored (for example: My Doc/Structure).

3. Open your data file

4. Create a parameter set by clicking the menu: Parameter Set, New...

5. Fill in the length of the burn-in period (100,000 is usually more than enough; for more information, consult the Structure manual by Pritchard et al.) and the number of MCMC reps you want (people use varying numbers of reps, from 100,000 to 2,000,000 or even more).

6. I usually follow the instructions to use the admixture model for the first run and leave the other settings at their defaults.

7. Select "Allele frequencies correlated".

8. Select "Compute probability of the data (for estimating K)".

9. Click OK and name your parameter set.

10. On the menu: Project, Start a Job, click the parameter set that you want to run and specify the range of K you want to test (K=1 to K=the maximum number of clusters you initially think your data might have).

11. Number of iterations: 5-10 or more

12. Click OK and let Structure run.

13. When the run is complete, you can continue with Structure Harvester.
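If Structure refuses to load a file, one thing worth checking first is whether every data row has the same number of entries. The sketch below is only a rough pre-check, not a validator for the Structure format: the function name find_ragged_lines and the sample rows are made up for illustration, and the real layout depends on your project settings (for example, a marker-name row at the top has fewer entries, which is why a number of header rows can be skipped).

```python
from collections import Counter

def find_ragged_lines(lines, skip_header_rows=0):
    """Return (expected_count, [(line_number, count), ...]) listing the rows
    whose number of whitespace-separated entries differs from the most
    common count. Blank lines and skipped header rows are ignored."""
    counts = []
    for number, line in enumerate(lines, start=1):
        if number <= skip_header_rows or not line.strip():
            continue
        counts.append((number, len(line.split())))
    if not counts:
        return 0, []
    expected = Counter(c for _, c in counts).most_common(1)[0][0]
    return expected, [(n, c) for n, c in counts if c != expected]

# Made-up sample rows (hypothetical layout: label, population, three loci).
sample = [
    "ind1 1 101 102 103",
    "ind1 1 101 104 103",
    "ind2 1 101 102",      # one entry short
]
expected, bad = find_ragged_lines(sample)
print(expected, bad)
```

To check a real file, the lines can be read with open("mydata.txt").read().splitlines(), where mydata.txt stands for your own file name.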


How to run Structure Harvester:

1. Go to the Results folder from your Structure results (for example: My Doc/Structure/ParameterName/Results)

2. Zip the Results folder (Results.zip).

3. Upload the zip file to Structure Harvester: http://taylor0.biology.ucla.edu/structureHarvester/

4. Harvest!

5. Download the harvester output files.
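The zipping in step 2 can also be done from a script instead of the Finder or Explorer. This is a minimal sketch using Python's standard library; the function name zip_results is my own, and the path in the usage comment is just the example path from above, so adjust it to your own setup.

```python
import shutil
from pathlib import Path

def zip_results(results_dir):
    """Create <results_dir>.zip next to the folder and return the archive path.
    The folder itself (e.g. Results/) is kept as the top level inside the zip."""
    results_dir = Path(results_dir).resolve()
    if not results_dir.is_dir():
        raise FileNotFoundError(f"No such folder: {results_dir}")
    archive = shutil.make_archive(
        base_name=str(results_dir),        # archive path without the .zip suffix
        format="zip",
        root_dir=str(results_dir.parent),  # archive paths are relative to this
        base_dir=results_dir.name,         # only this folder is added
    )
    return Path(archive)

# Usage (example path from the steps above; replace with your own):
# zip_results("My Doc/Structure/ParameterName/Results")
```

The resulting Results.zip is the file to upload in step 3.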


How to run CLUMPP:

1. Using the delta K method of Evanno et al. (2005), identify your K.

2. Based on that K (for example K=2), take the indfile for that K (K2.indfile; if your K is 5, take K5.indfile) and move it to a new folder called, for example, FolderA.

3. Edit the paramfile from the example files downloaded with the CLUMPP package: set Datatype to 0 and revise everything else accordingly (for example, change the indfile name to K2.indfile, and so on). Move or copy this file to FolderA.

4. Also copy or move CLUMPP into FolderA, so that the folder contains CLUMPP, K2.indfile, and the edited paramfile.

5. Open Terminal and change the directory to where FolderA is located.

6. Then type: ./clumpp paramfile

7. CLUMPP will produce several files in the folder. Take the output file (for example arabid.outfile) and rename it to arabid.indivq.

8. Repeat steps 2-7 for K2.popfile, changing the paramfile to Datatype 1 and putting the 3 files in FolderB; take the output file (for example arabid.outfile) and rename it to arabid.popq.
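For step 1, Structure Harvester reports delta K for you, but it can help to see roughly what is being calculated. The sketch below assumes the commonly used form: the absolute second difference of the mean Ln P(D) across K, divided by the standard deviation of Ln P(D) over the replicate runs at that K. The function name delta_k and all the numbers are made up for illustration; they are not real results.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Map each K that has both neighbours (K-1 and K+1) to
    |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    lnp_by_k maps K to the list of Ln P(D) values from replicate runs."""
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    sds = {k: stdev(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(means):
        # Skip K values at the ends of the range or with zero spread.
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result

# Made-up Ln P(D) values, three replicate runs per K (illustration only).
lnp = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4480.0, -4490.0, -4470.0],
    4: [-4475.0, -4495.0, -4455.0],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Note that delta K is undefined for the smallest and largest K tested, which is one reason to test a range somewhat wider than the K you expect.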


How to run Distruct:

1. Take your indivq and popq files (for example arabid.indivq and arabid.popq) and move them to a folder, Folder C.

2. Edit the drawparams file from the Distruct package accordingly. Create your arabid.names and arabid.perm files

3. Put (at least) these 5 files in Folder C together with distruct: arabid.indivq, arabid.popq, arabid.names, arabid.perm, and drawparams.

4. Run distruct. (For me, distruct does not work on my Mac, so I have to use a PC to run it. I just click on the Windows executable file and distruct produces the output.ps file.)
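Before running distruct, it may save some head-scratching to confirm that all the input files are really in the folder under the names you put in drawparams. A small sketch, assuming the example file names used above; missing_inputs is a hypothetical helper name, not part of Distruct.

```python
from pathlib import Path

def missing_inputs(folder, required):
    """Return the names from `required` that are not files inside `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]

# File names from the example above; change them to match your drawparams.
REQUIRED = [
    "arabid.indivq",
    "arabid.popq",
    "arabid.names",
    "arabid.perm",
    "drawparams",
]

# Usage (the folder name is an example):
# print(missing_inputs("Folder C", REQUIRED))
```

An empty list means every listed file was found; any name that comes back is a file distruct would not be able to open from that folder.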

Yay!



22 comments:

  1. This was helpful, I found the solution to the mac problem here:
    https://groups.google.com/forum/#!topic/structure-software/Im5DUNA-yN4

    Hope that helps people!

    Dave

  2. Hi There,

    Thank you for the instructions. Really helpful. But I have a problem. I followed the instructions and I am at "How to run CLUMPP", step 8, but I do not see a popfile.
    Do you also click "use sampling location as prior" during the Structure run to get this popfile? Or how can I obtain it?
    I'd be glad if you could help me with this issue.

    Thank you.

  3. The popfile comes from Structure Harvester

  4. Hi,

    I tried running Structure 2.3.4 on a Mac and the program did not save any files in the Results folder. I tried quitting and reopening it (following a post on Google Groups), but no matter how many times I tried, it never worked. Then I ran the same data on a PC, and this time Structure saved files into the Results folder. I wonder whether other Mac users encounter this problem and how it can be solved.

    Thanks

    Cecilia

    Replies
    1. Try upgrading Java to the latest version; that solved my problem.

  5. I have a dataset that has no population info beyond individual names, so I get no popfile from Structure Harvester. Is there a way to still do this without a popfile?

    Replies
    1. What if you assign them all to a single population (such as pop1) during the Structure analysis? Then it should be fine. You need to keep the a priori population information option active to get the popfile. I hope what I am saying is clear.

  6. This is a very useful webpage, but I have a problem with step 8 for CLUMPP. I compared the outfile from running the popfile with the example data, and it looks like CLUMPP didn't do its job. Can anyone help? I couldn't contact the author of the software.

  7. Helpful. However, I needed to edit the .indfile from Structure Harvester before it would work in CLUMPP (windows). There need to be parentheses around the third column (and note that there are blank lines between runs in the S.H. output .indfile - so make sure there aren't any parentheses in those lines if you're using a column editor).

  8. This is so helpful. One question: I would like to know where and how to get the indivq file for running Distruct.

    Clementine

  9. I am confused as to how to get the popfile from Structure harvester. I'm using the locprior model in Structure. What exactly needs to be active in Structure in order to get harvester to produce the popfile?

    Jessica

  10. This was really helpful, I had no problem following your steps so thank you!

  11. The STRUCTURE and CLUMPP section was incredibly helpful. But could you please give more information on running DISTRUCT? The Windows version of the software opens, runs, and disappears; I am not sure what happens to it.

  13. Has anyone used TESS followed by Clumpp and Distruct?

  14. I have a problem running Structure. I have 103 individuals, 138 loci, and 5 populations. After filling these in I get this message: "bad format source expect 139 data entries, line 1 138 data entries". Where did I go wrong?

  16. Hi. I have a problem in doing distruct. The pane shows

    Error could not open population Q-matrix file K8.popq

    May I know why this thing happens? And how to solve this problem?

    Fatanah

  17. Hi! Structure-Harvester has been processing results for hours now; with no output other than alternating between the words "Thanks!" and "Connecting" on its tab. My data has 100000 burns, & 100000 iterations; and they are from an HPCC (High Performance Computer Cluster) generated through StrAuto. Could that be too much for the program and what's the solution? Thanks.

  18. Hi,
    This web page is really useful. I am having a problem running CLUMPP. The message is "unable to open paramfile. please check the name of paramfile". Can anyone help me with this message?
