Using GenStat with Microsoft Excel

GenStat contains a spreadsheet facility that can be used to view and manipulate data. The behaviour of a GenStat spreadsheet is not exactly the same as within other spreadsheet packages such as Excel. However, GenStat spreadsheets contains many features similar to Excel, plus some additional enhancements for ease of use.

Similarities between GenStat and Excel

GenStat provides a spreadsheet that lets you view and enter data cell by cell, with the ability to scroll around the cells, examining and changing the data. The same shortcut keys for cutting (Ctrl+C) and pasting (Ctrl+V) data from one range of cells work in both. Many of the other shortcut keys are in common between the two packages (Ctrl+A = Select all cells, Ctrl+O = Open File, Ctrl+P = Print, Ctrl+S = Save, Ctrl+Z = Undo change etc.). An example Excel workbook and the corresponding GenStat spreadsheet are displayed below:

The GenStat spreadsheet provides similar functionality to Excel, but also includes some additional enhancements for ease of use. All the normal cursor and selection keys should work as in Excel. The File and Edit menus have similar functionality in both applications. In GenStat all of the functionality for manipulating the data in a spreadsheet can be found in the Spread menu.

Within GenStat the rules governing selection of columns or cells can be customized by the user and this may result in differences from the standard Excel behaviour. If the cell or column selection does not operate in the way you expect then you can customize this using the Spreadsheet Column Options Tab. There are three options that essentially control the behaviour for selecting columns or cells: Select Columns with a single click, Keep Selected Columns, and Clear Selection on moving current cell.

Differences between Excel and GenStat spreadsheets

In Excel, the whole focus is on the workbook, and all results are displayed within the workbook. In GenStat the data is one aspect of the analysis, and so there are a range of other windows for output of results and graphs, and for specification of the analysis or editing of command scripts. Also data can be obtained from other sources that need not be displayed in spreadsheet format, but can be directly read into the statistical calculation engine.

Where is the GenStat Spreadsheet?

When GenStat start up, unlike Excel, you do not have an empty spreadsheet open. However, you can always create an empty spreadsheet on start-up by selecting the New Spreadsheet option on the Options - Workspace dialog. To create an empty spreadsheet you can either select the New item on the File menu or alternatively select Create from the New item on the Spread menu. Selecting wither of these menu items opens the File New - Spreadsheet dialog where you can specify the number of columns and rows to appear in the spreadsheet. If you want to import data stored within another package, you can select the Open item from the File menu, or transfer the data via the clipboard. GenStat has an additional facilities to help control how data are pasted to GenStat from the clipboard. To access these options when pasting from the clipboard select From Clipboard from the New item on the Spread menu (this opens the New Spreadsheet from Clipboard dialog).

Data Organisation

GenStat's spreadsheet is column based, in that all data on a particular measurement is normally stored in a single column. Each column has a particular type, either numerical (a variate), textual (a text) or categorical (a factor). You cannot mix text or numbers in a column. If you enter text in a numerical column you will get an error message, and if you enter a number on a text column, it will be treated as a piece of text. A factor has a limited set of values, known as levels, and any value entered into cell in a factor column must conform to one of the existing levels, or you will get a warning message. However, you can easily add a new item into the factor as a new level.

One of the main difficulties for Excel users is understanding the special FACTOR data type in GenStat, used to store group membership. If you have categorical columns in Excel (numerical or text), you can easily convert them to factors within GenStat by selecting Convert to in the Factor item on the Spread menu. See understanding factors within a spreadsheet for more details on factors.

There is a restriction that all structures in a spreadsheet must have the same number of rows. These are the same requirements for a table in a database. Thus while you can enter any data anywhere in Excel, GenStat requires that the data matches the type of the column that is used, and if you want to add new rows or columns, you are resizing the data structures, and need to specify type information for any new columns.

An Excel workbook is an unstructured set of cells containing numbers, text and formulae in any order, whereas GenStat spreadsheet displays a set of specific data structures with a particular size and attributes. The unstructured nature of Excel is both its strength and weakness as the organisation is only in the mind of the user, and not inherent in the spreadsheet.

Column Names in Excel and GenStat

Another result of GenStat's data structure orientation is that the column name is not just another standard cell that can be edited like all others as in Excel. To edit the structure name, click on the start of the cell name (the cursor should change to a pencil as you go to do this). The structure names also have the limitation that duplicate names cannot be used within the one spreadsheet, unlike Excel which allows this. If you attempt to enter a name already used, this will cause a warning to appear.

Size of the Spreadsheet

In general the GenStat spreadsheet's size is set to fit the required data for a given analysis. GenStat does not create the maximum sized sheet just in case you want to enter a large amount of data, as this can be inefficient. You need to specify the size of the spreadsheet you will need when you open it, and you can change this as below. When you save a file GenStat will remove empty rows and columns if the option Remove Empty Rows and Columns on Save is selected in the Spreadsheet File Options Tab dialog (opened by selecting the Spreadsheet Options item on the Tools menu).

The default size of an empty GenStat spreadsheet opened on startup is 10 columns by 100 rows. You can change the default size of a GenStat spreadsheet using the Options - Workspace dialog (opened by selecting Options on the Tools menu). If a new spreadsheet is opened using the File menu then you can specify the size of the spreadsheet within the File New - Spreadsheet dialog.

Each Excel spreadsheet is the same size, that is 256 columns by 65536 rows. Within GenStat a single GenStat spreadsheet can hold up to 16000 columns and an unlimited number of rows (provided you have enough memory on your computer). You can specify the maximum number of columns allowed within the File tab on the Spreadsheet Options.

What if my Excel Data is not Column based?

Sometime data in Excel may have been entered across rows or in several columns. GenStat provides several menus under the Manipulate item in the Spread menu to help you reorganise your data into the correct format for GenStat. Some of the facilities include the Stack Spreadsheet which can be used to stack multiple columns into a single column. The Transpose menu can be used to transpose rows and columns. In addition to these there is a Paste Special menu in the Edit menu that can be used reorganise data before pasting from the clipboard into a spreadsheets cells.

Multiple Pages

An Excel workbook can have up to 256 pages. A GenStat spreadsheet at present can only have a single spreadsheet. However, you can have multiple GenStat spreadsheets open in one session.

Spreadsheet Types

Excel has three workbook page types: data, chart and Visual Basic script. GenStat spreadsheets do not contain graphics. However, they can contain an analysis script of GenStat commands that can be entered and run using the Spreadsheet Analysis menu. GenStat has six types of data spreadsheets, corresponding to the range of data structures found in GenStat. Apart from the standard variate/text/factor spreadsheets, there are three matrix types (rectangular, symmetric and diagonal), a table type that holds cross-tabulations (like the Excel pivot tables) and a scalar spreadsheet that holds single valued constants.

Calculations

Within GenStat calculations are performed on whole columns rather than cell by cell. As with Excel, you enter a formula to perform the calculation. To perform a calculation on a GenStat column use the Calculations menu.

GenStat's calculations are not dynamic like Excel's but are performed only on the users request. So, if you change any of the values within cells and want to recalculate dependent cell values, you need to use the recalculate menu to redo the calculations.

GenStat maintains two copies of the data, one visible in a spreadsheet, and the other hidden in the statistical calculation engine (the server). The server data names and types can be displayed at any time using the Data Display menu. By default the two copies of the data are synchronized whenever the user moves focus off the spreadsheet. You can change to manually update GenStat from a spreadsheet by turning off the Auto update GenStat option within the spreadsheet option. To manually synchronize the server data with the spreadsheet select Changed data to GenStat from the the Update item on the Spread menu, or alternatively press the F10 key. This feature allows you to have multiple sheets with data structures having the same name, whilst retaining only the copy of the latest updated spreadsheet within the server. Also this allows reorganisation of the data in the spreadsheet without disturbing on-going calculations in the server (which can run concurrently with interface operations).

Dates

In Excel, dates are stored in a cell as the number of days since the 1st January 1900. A date format is applied to this number, often automatically if it has been initially typed in a date format (e.g dd/mm/yyyy).

GenStat stores dates as the number of days from either 1st January 1900 or 1st March 1600. A date format can also be applied to a complete column (not cell by cell as in Excel). Date formats are not applied automatically in GenStat, but must manually be applied using the Date Formats dialog (opened from the Spread>Column>Edit Attributes menu, selecting a format type of Date and then clicking the Date Format button.

If Dates are copied from Excel to GenStat via the clipboard, they are transferred just as a piece of text, resulting in a text column. This can be converted to a numerical value displayed as date using the Spread>Column>Convert menu which opens the Date Formats dialog. In this dialog, select the new column type Variate and select the Read Text as Date option, and use the Date Format button to specify the particular date format used by Excel.

If a column name in Excel ends with the text :D, this marks the column as a date column, and it will be automatically interpreted as a date whether it comes via the clipboard or is read in from the Excel file. Similarly :T marks the column as a time column, and it will be interpreted as a time.

Saving Results

Generally the output of statistical calculations is displayed within the Output window or within Graphics windows. However, you can save results of statistical analyses to a spreadsheet using the Save buttons on the Stats menus. The Display as Spreadsheet option on the save dialogs will create spreadsheets containing the results.

Enhancements over Excel

Some of the enhancements are that columns and rows can be moved by clicking on the column or name row and dragging it into a new position. Also if the column name is dragged off the title row, GenStat will offer to delete the column for you.

GenStat Cut/Copy and Paste facility is more general than Excel's, and if the copied data does not match the Paste selection, the data on the clipboard will be recycled to fill out the selection.

GenStat has a number of ways of reorganising data in the Spread>Manipulate and Spread>Calculate menus that are not available in Excel.

Moving data between Excel and GenStat spreadsheets

The Excel Spreadsheet Wizard allows data to be read in from an Excel file. Note that if the file is currently open in Excel it should be saved to make sure that the file on the disk is up to date. Data can also be read from the clipboard using the Spread>New>from Clipboard menu, or may be pasted into an existing GenStat spreadsheet. See the page on Transfer data between Windows and Spreadsheets for more information on this.

Tables

Excel has pivot tables (under Data>Pivot Table), where a range of data is selected, and columns are selected to provide cross-tabulation groups and other columns selected for data summaries (averages, counts etc.). The resulting table is embedded into a sheet in the workbook.

GenStat has menus for creating tables. The Frequency Tables menu produces tables of counts, and the Summary of Groups menu produces cross-tabulations of the usual data summaries for selected variate columns.

The results from GenStat are shown in a special Table spreadsheet. The layout of the table may be changed using the Spread>Manipulate>Reorder Table menu or by dragging the columns into new positions. The Spread>Restrict/Filter menus may also be used to subset the rows displayed in the table. The Copy Special menus may be used to copy the table to Rich Text Format (RTF Table) or to Excel (Tabbed Text) via the clipboard. See Frequency Tables or Summaries of Groups (Tabulation) for more information.

Filters

Excel has menu items to filter the rows displayed in a spreadsheet. These are found under the Restrict/Filter item of the Spread menu. GenStat provides many facilities for filtering the rows of a spreadsheet. In GenStat terminology, a filter is also known as a restriction as it also restricts the rows used in any analysis, whereas a filter in Excel does not effect the cell calculations. Also in GenStat the filtered rows can be displayed in another colour using the Spread>Restrict/Filter>Display Restricted Rows menu item. See the Spreadsheet Restrict/Filter Menu page for more details.

Graphics

Graphs in Excel are created with the Graphics Wizard, and then either embedded in a data page, or added in separate a chart page in the workbook. These graphs will dynamically update if the data is changed.

GenStat has a Graphics Wizard, which takes you through the steps of creating a graph. However the resulting graph is not dynamic, and it is not stored in the spreadsheet. Graphs within GenStat are displayed in a separate graphics window. The graph can be saved to a GMF (GenStat graphics file) and the display attributes (fonts, colours, symbols, line styles, etc.) can be edited by clicking on the graph or using the various edit options under the Tools menu in the GenStat Graphics Viewer.

See Also