Sunday, December 22, 2013

2.2 Preparing the dataset for usage with R


This part uses a more complex dataset as an example: the files Projects.xls and Projects.csv. It is recommended to convert the .xls-file to a .csv-file; this part of the tutorial also shows how that can be done. For these files too, it is important to check whether they are suitable for analysis. You can check this by considering the following criteria, which are explained in this part of the tutorial:

  • Accuracy;
  • Completeness;
  • Timeliness;
  • Consistency.
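
As a rough illustration of the completeness criterion, the sketch below counts the empty cells per column of a .csv-file. It is written in Python with only the standard library, which is a choice made for this example and not part of the tutorial's own workflow. The sample data and its column names (project, budget, start_date) are made up and stand in for the real Projects.csv.

```python
import csv
import io

# Hypothetical sample standing in for Projects.csv; the column names are made up.
SAMPLE = """project,budget,start_date
A,1000,2013-01-15
B,,2013-03-01
C,2500,
"""

def count_missing(text, delimiter=","):
    """Return a dict mapping each column name to its number of empty cells."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # A cell counts as missing when it is absent or only whitespace.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing

print(count_missing(SAMPLE))
```

A column with a count above zero is incomplete and deserves a closer look before the analysis starts.
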
First the dataset has to be opened.


Opening an existing .csv-file in OpenOffice.org Calc

With OpenOffice.org Calc you can open .csv-files. Because such a file consists only of text, numbers and symbols, you have to specify how the data should appear in a spreadsheet.

Opening a .csv-file with OpenOffice.org Calc:
  1. Right-click the .csv-file that you want to open. Click on Open with.. and select OpenOffice.org Calc;
  2. The window Text import appears, see Figure 15. In this window the section with the separator options is important. In this section you indicate which characters in the .csv-file have to be treated as separators. If the .csv-file was saved with commas as separators, select only comma and tab as separators. The preview at the bottom of the window, shown in Figure 15, lets you see what your spreadsheet is going to look like. In Figure 15 the settings are selected correctly. If your settings match these, click on OK in the window.
  3. The .csv-file is presented by OpenOffice.org Calc in the form of a spreadsheet.
Figure 15: Preparing the .csv-file in OpenOffice.org Calc for analysis with R
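
To see why the separator setting matters, the following minimal sketch parses the same text twice, once with the correct separator and once with the wrong one. It uses Python's standard csv module purely as an illustration outside of Calc; the sample line and the helper name parse are assumptions made for this example.

```python
import csv
import io

# Hypothetical comma-separated sample; the real Projects.csv may look different.
SAMPLE = "project,budget,hours\nA,1000,12\n"

def parse(text, delimiter):
    """Split CSV text into rows of cells using the given separator."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Correct separator: every value ends up in its own cell.
print(parse(SAMPLE, ","))

# Wrong separator: each line stays together as one single cell.
print(parse(SAMPLE, ";"))
```

With the wrong separator every row collapses into one cell, which is the same effect you would see in the preview of the Text import window.
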

Checking your datasets in OpenOffice.org Calc


Once a dataset is opened in OpenOffice.org Calc, you can format cells by selecting them, clicking the right mouse button and choosing Format Cells. This opens the window Format cells, see Figure 16.

Figure 16: Formatting cells in OpenOffice.org Calc

Your dataset may contain decimal numbers. Make sure you select the right settings for the decimal mark and the separators. The decimal mark and the separator are not supposed to be assigned to the same symbol: if both are a comma, a value such as 3,5 is split over two cells.

It is recommended to format the cells as number when the data is going to be analyzed with R.
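
The sketch below illustrates what "formatting as number" amounts to for values written with a decimal comma. It is a Python example with a hypothetical helper to_number, not something Calc or R provides, and it assumes the cells contain no thousands separators.

```python
def to_number(cell, decimal_mark=","):
    """Convert a text cell such as '12,5' to a float; return None if it is not numeric."""
    text = cell.strip()
    if decimal_mark != ".":
        # Replace the decimal mark of the file with the dot that float() expects.
        text = text.replace(decimal_mark, ".")
    try:
        return float(text)
    except ValueError:
        # Not a number (for example free text or an empty cell).
        return None

print([to_number(c) for c in ["12,5", "3", "n/a"]])
```

Cells for which the helper returns None are the ones that would not be treated as numbers, so they are worth checking by hand in the spreadsheet.
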

To the next step: Importing and making a matrix of the dataset
