Thursday, February 10, 2011

How to convert a comma-separated (CSV) file to ARFF

My situation: I have a comma-separated data file that I want to use with the Weka program. One file holds the data and another file holds the attribute names. Java source code for this job is available at https://list.scms.waikato.ac.nz/pipermail/wekalist/2008-October/014671.html, but sometimes I still want to do it by hand, since the computer I was using didn't have the Java SDK installed. So, here are my steps.

First, let's create the ARFF header using Notepad++.
Convert the attribute names to Weka format. For example, the attribute names in the text file are

--------------attributeNames.txt------------------
0,1,2,3,4,5,6,7,8,9,10,11,12
a1: -1,1
a2: -1,1
----------------------------------------------------

but the ARFF format should look something like

--------------data.arff----------------------------
@relation mydataset

@attribute a1 {-1,1}
@attribute a2 {-1,1}

@attribute class {0,1,2,3,4,5,6,7,8,9,10,11,12}

@data
1,-1,2
-1,1,10
----------------------------------------------------

1. Add the line @relation followed by the dataset name.
2. Replace "a" with "@attribute a".
3. Replace ": -1,1" with " {-1,1}".
4. The first line holds the class labels, so move it below the last attribute line, then put "@attribute class {" in front of it and "}" after it.
5. Add the line @data.
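
If you do happen to have Python around, the five steps above can be sketched as a small script. This is only a rough sketch, not part of my original Notepad++ workflow: the function name build_arff_header is made up for illustration, and it assumes the attribute file looks exactly like the sample above (class labels on the first line, then one "name: values" line per attribute).

```python
def build_arff_header(attribute_text, relation="mydataset"):
    """Build an ARFF header from text laid out like attributeNames.txt.

    Assumed layout (taken from the sample above): the first non-empty line
    lists the class labels, and every later line is "name: v1,v2,...".
    """
    lines = [ln.strip() for ln in attribute_text.splitlines() if ln.strip()]
    class_values = lines[0]

    header = ["@relation " + relation, ""]            # step 1
    for ln in lines[1:]:
        name, values = ln.split(":", 1)               # steps 2 and 3
        header.append("@attribute %s {%s}" % (name.strip(), values.strip()))
    header.append("")
    header.append("@attribute class {%s}" % class_values)  # step 4
    header.append("")
    header.append("@data")                            # step 5
    return "\n".join(header) + "\n"


sample = "0,1,2,3,4,5,6,7,8,9,10,11,12\na1: -1,1\na2: -1,1\n"
print(build_arff_header(sample))
```

The printed text should match the header part of the data.arff sample above, without the data rows.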

After that, we have to add this ARFF header to the data file. There is one quite tricky step here (I describe it in terms of MS Windows).
1. Open data.txt in "notepad".
2. Copy the header that we just finished and paste it at the top of the data file.
3. Save the file with the .arff extension.
Test it with Weka. If it works, go ahead and use it.
If not, Weka might display an error dialog about an incorrect header at some token.
4. Reopen your data.arff with "notepad".
5. Copy the newline character between @relation mydataset and @attribute (you can't see it, but you can select it by moving the keyboard cursor with Shift held down) and paste it right after @data.
It should work now. ^ ^
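
The newline trick above can also be done in a script. This is a rough sketch under one assumption I have not confirmed: that the Weka error comes from the header and the data using different line endings. The function names combine_arff and write_arff are made up for illustration.

```python
def combine_arff(header_text, data_text, newline="\r\n"):
    """Join header and data so every line ends with the same newline.

    Assumption: mixed line endings are what trips up Weka. splitlines()
    accepts \n, \r\n and \r, so both inputs are normalized here.
    """
    header_lines = header_text.splitlines()  # keep blank lines in the header
    data_lines = [ln for ln in data_text.splitlines() if ln.strip()]
    return newline.join(header_lines + data_lines) + newline


def write_arff(path, header_text, data_text):
    # newline="" stops Python from translating the line endings again.
    with open(path, "w", newline="") as out:
        out.write(combine_arff(header_text, data_text))
```

For example, write_arff("data.arff", header, open("data.txt").read()) would write the combined file, where header is the header text built earlier.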