Personal technology and data analysis*

Barry Kissane

The Australian Institute of Education
Murdoch University, WA

It is a commonplace observation that we live in an information age. Indeed, phrases like the 'information super highway' and 'information technology' are frequently heard in everyday conversation. Although all parts of the school curriculum have a proper role in educating future citizens to deal appropriately with information, mathematics curricula have a special responsibility in this regard, convincingly argued by Jane Watson in her keynote address to the 1997 MANSW Conference.

Similarly, in a chapter entitled 'Uncertainty' in the wonderful book, On the shoulders of giants, (National Research Council, 1990) the statistician David Moore argued a case for statistics in its own right, as a distinctively different form of thinking:

Statistics has some claim to being a fundamental method of human inquiry, a general way of thinking that is more important than any of the specific facts or techniques that make up the discipline. If the purpose of education is to develop broad intellectual skills, statistics merits an essential place in teaching and learning. Education should introduce students to literary and historical methods; to the political and social analysis of human societies; to the probing of nature by experimental science; and to the power of abstraction and deduction in mathematics. Reasoning from uncertain empirical data is a similarly powerful and pervasive intellectual method.

Why teach about data and chance? Statistics and probability are useful in practice. Data analysis in particular helps the learning of basic mathematics. But, most important, it is because statistics is an independent and fundamental intellectual method that it deserves attention in the school curriculum. (pp 134-5)

Many aspects of the collection, organisation, summary, description and interpretation of data have long been addressed in the mathematics curriculum of most Australian students. For example, elements of probability and statistics have been a part of the mathematics syllabuses for all secondary students in Western Australia since the late 1960s, and part of primary syllabuses since the mid-1970s. (Inclusion of relevant material has been somewhat later in some other states, spectacularly so in New South Wales, in fact.) Hence, few were surprised that both A national statement on mathematics for Australian schools (Australian Education Council 1990) and the National Mathematics Profiles included an entire strand devoted to Chance and Data; indeed, we would have been astonished had there not been such a strand.

One aspect of Chance and Data, the focus of this paper, concerns data analysis, and particularly the relationship between data analysis and technology. The National Statement, under the subheading of 'data handling', noted the significance of technology to this part of the curriculum:

Technological changes have influenced the techniques of data collection, retrieval, manipulation, analysis and communication, and have increased our capacity to pursue investigations with large quantities of real and simulated data. The school curriculum should reflect these changes. (1990 p164)

Later, dealing specifically with the lower secondary curriculum, the National Statement provided an elaboration of this view:

Some of the questions which students will investigate will result in moderate to large data sets. It is often inappropriate and inefficient to deal with these manually, and computers or scientific calculators should then be used. It is appropriate to have students calculate summary statistics and draw graphs manually for some small data sets, in order to learn the techniques and so deal with situations where very little by way of data manipulation is required. For all other situations the emphasis should be on the use of calculators and computers so as to minimise the drudgery involved and enable students to concentrate on interpreting the data. Scientific calculators are an obvious tool for the calculation of summary statistics, but a computer will sometimes be more appropriate because of the graphics, editing and more extensive calculating capabilities. (1990 pp173-4)

Indeed, it is not merely a question of efficiency of data analysis that provokes the use of suitable technology, as David Moore also noted:

While the impact of fast, easily accessible computing has had an impact on mathematics as a whole, it has revolutionised the practice of statistics. An obvious effect of the revolution is that more complex analyses on larger data sets are now easy. But the computing revolution has also brought about changes in the nature of statistical practice. In the past, statisticians conducted straightforward but computationally tedious analyses based on a specific mathematical model in order to draw conclusions from data. Instruction in statistics showed a corresponding emphasis on learning to carry out lengthy calculations. Now the paradigm statistical analysis is a dialogue between model and data. ... All [methods] are computationally intensive, and the most widely adopted make heavy use of graphic display. ... Statisticians ... have welcomed calculators and computers as a liberating force. Calculating sums of squares by hand does not increase understanding; it merely numbs the mind. In these circumstances, it is natural for a statistician to urge the use of calculators and computers in instruction at all levels. (1990, pp. 99-100)

Appropriate technology

It is important to realise that, at the time of writing of the National Statement, mainly 1989, the personal technology of the graphics calculator was relatively rare in Australia, although it was rapidly becoming visible in North America. The available technology in schools for data analysis was restricted to the scientific calculator or the computer, each of which had associated limitations as far as school students were concerned.

In the case of the computer, although very many statistics packages were available, and, indeed, many integrated computer packages comprised databases and spreadsheets, each of which is quite useful for aspects of data analysis, a fundamental limitation concerned the availability of the hardware. There are many situations in schools where there is too little computer hardware to go around, and it is not available to students conveniently when and where they actually need it (which is not always a conveniently predictable moment). When computers are housed in computer laboratories, or are available only after queuing in a classroom, or when the teacher makes a special arrangement for a particular lesson, the practical difficulties of student access can readily become overwhelming. Even when the hardware is relatively accessible, problems of navigating one's way around the software (and the computer) as well as problems of data storage and retrieval also militate against the likelihood of students undertaking data analysis with suitable technological help.

As far as the scientific calculator is concerned, problems of access are very much reduced, since personal ownership in lower secondary school is almost universal in Australia these days. Unfortunately, however, what is easily accessible is extremely limited in capacity. Although essentially all scientific calculators contain some statistical capabilities, they are generally restricted to numerical computation of parametric statistics, such as means, variances and linear regression coefficients.

The more recent invention of the graphics calculator overcomes most of the severe limitations of the scientific calculator for data analysis, as discussed in some detail in Hackett & Kissane (1993). Many of the limitations are a consequence of the fact that a scientific calculator does not actually store numerical data (since it doesn't have enough memory to do so) and thus is restricted to storing summaries of the data. A graphics calculator, such as the Casio fx-7400G, allows students to store numerical data in lists, and thus to have access to the data to check (and correct) data entry, to transform data for various purposes, to retain data from day to day for further work later, to sort the data and, perhaps most importantly, to analyse the same data in a number of different ways. All of these are very significant advances on scientific calculators, even if the data analysis capabilities were still restricted to numerical summaries.

As its name suggests, a graphics calculator also allows a student to undertake graphical representations of data, in addition to, or even instead of, numerical representations. In the case of the Casio fx-7400G, for example, students can draw box and whisker plots or histograms of univariate data, scatter plots and various kinds of regression models for bivariate data, and line graphs with associated regression capabilities accessible for time series data. Details of these capabilities and some of their consequences are explored in Kissane (1997a), and some examples are given below.

Considering both numerical and graphical capabilities together, graphics calculators provide an extremely powerful suite of data analysis capabilities for students, yet at a price that is already well within the reach of the great majority of secondary school students. Indeed, although technology becomes much less personal when it is available only through the teacher, rather than individual students having their own, an investment in a class set of graphics calculators costs around the same amount as a single computer these days, even before any computer software is purchased; so the economic problem of delivering appropriate data analysis technology to all students is already overwhelmingly easier to solve with graphics calculators than with computers.

It is unfortunate that the first impressions of many mathematics teachers, and others, regarding graphics calculators are frequently restricted to graphing functions. Indeed, I have elsewhere suggested (Kissane 1997b) that this impression reflects one of the prevailing myths about graphics calculators. In contrast, some teachers at a recent conference in WA suggested to me that for many of their students who owned graphics calculators, the most significant and powerful use was in the area of data analysis. It is of interest that the new Year 9/10 syllabus documents for NSW hardly refer to the use of graphics calculators at all in relation to the Chance and Data strand, even though they are mentioned in several places in the context of dealing with functions and their graphs.

Two examples

Clearly, space restrictions mean that extensive examples of the capabilities of graphics calculators for data analysis cannot be provided here. The two examples below demonstrate some of the capabilities of the Casio fx-7400G. Interested readers are referred to Kissane (1997a) for a more extensive set of examples.

How tall are we?

One advantage of technological help for data analysis is that real data can be collected, and entered directly into the graphics calculator. In this case, we collected the reported heights (in cm) of some Conference workshop participants, noting at the same time the gender of each person. (In a school, the data might instead be obtained by measurement, rather than by relying on possibly less reliable self-reports.) The raw data are given below as they were collected. Although a very small data set, it will nonetheless suffice to illustrate some of the possibilities.

Females

169, 165, 153, 163, 153, 164, 170, 155, 165, 158, 166, 160

Males

180, 178, 177, 175, 165, 173, 160, 185, 172, 196, 178, 177, 168, 168, 170

Data are entered list-wise into the calculator, as shown below, with the female heights in List 1 and the men's heights in List 2. The calculator will accommodate data lists of up to 255 elements, so space is not a problem here. Data entered are easily checked by scrolling; any entry errors are easily corrected by retyping. Figure 1 shows screen dumps from the Casio fx-7400G, with the original data on the left. A natural first form of analysis of such data is to sort them from highest to lowest, for which there is an inbuilt calculator command. The results of using this command are shown in the right screen in Figure 1.

Figure 1: Female and male heights (cm), unsorted and sorted

It is clear from the sorted data that the men are generally taller than the women. A variety of univariate numerical summaries of the data are available with inbuilt calculator commands. Figure 2 shows some of these for the female data in List 1.

Figure 2: Some quantitative summaries of female height data

Both parametric (e.g., mean and standard deviation) and non-parametric (median, quartiles) statistics are provided by the calculator. As well as allowing comparisons between the two groups (e.g. mean and median), some of these, such as n and minX, provide convenient checks on data entry.
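For readers without the calculator to hand, the same kinds of summaries can be reproduced with a minimal Python sketch. It assumes only the standard library `statistics` module; the function name `summarise` and the dictionary labels are illustrative choices, not calculator commands. Both population and sample standard deviations are computed, since either may be the one wanted.

```python
import statistics

# Reported heights (cm), exactly as listed in the raw data above.
female = [169, 165, 153, 163, 153, 164, 170, 155, 165, 158, 166, 160]
male = [180, 178, 177, 175, 165, 173, 160, 185, 172, 196, 178, 177,
        168, 168, 170]


def summarise(data):
    """Return some univariate summaries for one list (hypothetical helper)."""
    return {
        "n": len(data),                             # check on data entry
        "mean": statistics.mean(data),
        "sd (population)": statistics.pstdev(data),
        "sd (sample)": statistics.stdev(data),
        "median": statistics.median(data),
        "minX": min(data),                          # check on data entry
        "maxX": max(data),
    }


summaries = {"female": summarise(female), "male": summarise(male)}
for group, summary in summaries.items():
    print(group)
    for name, value in summary.items():
        print(f"  {name:>15}: {value:.2f}")
```

As on the calculator, n, minX and maxX give a quick check that every value was entered and that none was mistyped.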

The most satisfying way to summarise data of these kinds uses visual methods, however. A convenient choice is a box and whisker plot, or box plot. Figure 3 shows a plot for each of the two data sets, using scales chosen automatically by the calculator.

Figure 3: Box plots for female (left) and male heights

Each box plot can be traced to give the five-number summary statistics (extremes, median and quartiles), as shown in Figure 3. The box plots suggest that each distribution is somewhat skewed, and the male group seems to include an outlier, an especially tall gentleman. However, the automatic choice of scales has made the comparison of the two groups more difficult than necessary. One alternative is to manually choose a suitable scale for the box plots, making sure that the same scale is chosen for each. Another alternative is to draw the two box plots together, again allowing the calculator to choose a suitable horizontal scale, as shown in Figure 4.

Figure 4: Box plots for female and male heights.

It is clear from Figure 4 that the group of men is considerably taller than the group of women. Indeed, the upper quartile for the women is less than the lower quartile for the men. This is not surprising for adults, and rather more interesting results may be expected for a similar analysis of lower secondary school children.
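The comparison of quartiles can be checked numerically. The Python sketch below rests on two assumptions that are not taken from the paper: quartiles are computed as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd), and an outlier is taken to be any value more than 1.5 interquartile ranges beyond a quartile. Other conventions exist and may give slightly different quartiles. The function names are illustrative only.

```python
import statistics

# Reported heights (cm), exactly as listed in the raw data above.
female = [169, 165, 153, 163, 153, 164, 170, 155, 165, 158, 166, 160]
male = [180, 178, 177, 175, 165, 173, 160, 185, 172, 196, 178, 177,
        168, 168, 170]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data, excluding the middle value when n is odd.
    """
    s = sorted(data)
    half = len(s) // 2
    return (s[0],
            statistics.median(s[:half]),
            statistics.median(s),
            statistics.median(s[len(s) - half:]),
            s[-1])


def outliers(data):
    """Values beyond 1.5 interquartile ranges from a quartile (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    reach = 1.5 * (q3 - q1)
    return [x for x in data if x < q1 - reach or x > q3 + reach]


female_summary = five_number_summary(female)
male_summary = five_number_summary(male)
print("female:", female_summary, "outliers:", outliers(female))
print("male:  ", male_summary, "outliers:", outliers(male))
print("female Q3 below male Q1:", female_summary[3] < male_summary[1])
```

Under these assumptions the upper quartile for the women (165.5 cm) does fall below the lower quartile for the men (168 cm), and the 196 cm height is the only value flagged as an outlier.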

Figure 5: Histogram of male heights

Other analyses of the same data are possible. For example, Figure 5 shows a histogram of the male heights, grouped into 5 cm intervals. The histogram has been traced to show that five males have heights in the interval [175,180). Again, the outlier is clearly shown.
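The grouping behind such a histogram can be sketched in a few lines of Python. The sketch assumes that the 5 cm intervals start at multiples of 5, which is consistent with the interval [175, 180) traced above; `bin_counts` is an illustrative name, not a calculator command.

```python
from collections import Counter

# Reported male heights (cm), exactly as listed in the raw data above.
male = [180, 178, 177, 175, 165, 173, 160, 185, 172, 196, 178, 177,
        168, 168, 170]

WIDTH = 5  # interval width in cm


def bin_counts(data, width):
    """Count values in half-open intervals [start, start + width).

    Interval starts are assumed to be multiples of `width`; empty intervals
    between the lowest and highest are kept so that gaps remain visible.
    """
    counts = Counter((x // width) * width for x in data)
    starts = range(min(counts), max(counts) + width, width)
    return {start: counts[start] for start in starts}


histogram = bin_counts(male, WIDTH)
for start, count in histogram.items():
    print(f"[{start}, {start + WIDTH}): {'#' * count}")
```

The empty interval [190, 195) in the printed output is what separates the tallest height from the rest of the group.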

Understanding the weather

As well as collecting their own data, students can analyse data collected by others to answer questions of interest. A trip from Perth to Canberra inevitably provokes discussions about the weather, and so the second example refers to some relevant data, conveniently collated by Lovitt & Lowe (1993) in the superb MCTP Chance and Data kit. Long-term average maximum and minimum temperatures (°C) in Canberra and Perth are available at about 10-day intervals. For this example, only temperatures from August to December are involved, on the 5th, 15th and 25th of each month. The first date (August 5th) is coded as 1 and the last (December 25th) is coded as 15 for these time series data. The screens in Figure 6 show the long-term average maximum and minimum August temperatures in Canberra (List 2 and List 3) and Perth (List 4 and List 5):

Figure 6: Average max and min temperatures in Canberra and Perth

Making sense of a mass of numerical data of this kind (two temperatures at each of fifteen dates at each of two cities) is certainly helped by some form of graphical representation. One possibility is a line graph, showing the progression of maximum temperatures over the period at each city, as shown in Figure 7. (Coming from Perth, this is perhaps the most natural first analysis, since it seems extraordinary that the maximum temperature somewhere could be as low as 12°C!) For ease of distinguishing, Canberra data have been graphed with small boxes at each data point. Data can be traced to check for entry errors. The vertical axis shows average temperatures between 0°C and 30°C for each graph.

Figure 7: Average maximum temperatures in Canberra and Perth

The warming as summer approaches seems more severe in Canberra's case. In fact, it is more convenient to see both graphs at once, as shown in Figure 8. Perhaps a little surprisingly, there appears to be some convergence of the two line graphs. Canberra seems to become surprisingly warm as summer approaches, at least through Perth eyes.

Figure 8: Comparison of Canberra and Perth maximum temperatures

What happens in the case of the minimum temperatures? Figure 9 suggests again a measure of convergence, with Canberra's very low average minimum temperatures in August becoming closer to those of Perth by December.

Figure 9: Comparison of Canberra and Perth minimum temperatures

These sorts of comparisons suggest that a useful form of data analysis may involve studying the difference between the average minimum and maximum temperatures at each place, which requires data transformations. Rather than calculate each of the differences by hand, the calculator can easily be used to do the transformations, as shown in Figure 10, in which the Canberra maximum temperatures are replaced by the (Perth maximum − Canberra maximum) temperatures at each date. The command involved (which does not quite fit on the screen dump) is the intuitive
List 4 − List 2.
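The list transformation amounts to an element-by-element subtraction, which can be sketched in Python as follows. The numbers are hypothetical placeholders chosen only to show the mechanics; they are not the MCTP temperature data, and the names `list_2`, `list_4` and `list_difference` are illustrative.

```python
# Hypothetical placeholder values for three dates only, NOT the MCTP data.
list_2 = [12.0, 13.0, 14.5]  # stands in for the Canberra maximums
list_4 = [18.0, 18.5, 19.0]  # stands in for the Perth maximums


def list_difference(a, b):
    """Return the element-by-element difference a - b of two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    return [x - y for x, y in zip(a, b)]


# Overwrite the stand-in for List 2 with the differences.
list_2 = list_difference(list_4, list_2)
print(list_2)

# Nothing is lost: subtracting the differences from the Perth stand-in
# recovers the values that List 2 held originally.
restored = list_difference(list_4, list_2)
print(restored)
```

Because the subtraction is applied to whole lists, a mistake in one entry is corrected by editing that entry and repeating the single command.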

Figure 10: Transforming data to find differences between city maximums

With space for only six data lists in the calculator, it is necessary to replace the Canberra maximums with the differences. The sixth data list, List 6, can be used to store the differences between the two city average minimums, leaving the Perth data intact, as shown in Figure 11. Of course, maximum Canberra temperatures can be restored later if desired by a further transformation.

Figure 11: Transforming data to find differences between city minimums

A comparison of the differences is shown in Figure 12, with the average minimum differences represented by the small squares and the average maximum differences represented by the line (below the squares). The vertical axis has been rescaled to show differences between 0° and 10°, in order to see the trends more easily.

Figure 12: Trend of average differences (min = ■)

As the year unfolds and summer approaches, the differences between the two cities seem to steadily reduce. By the end of December, Canberra maximum temperatures are more like Perth's and the minimum temperatures are also approaching Perth's.

Of course, the calculator allows for quantitative analyses of data like these as well as graphical displays. For example, Figure 13 shows a line of best fit to the maximum difference data, suggesting a drop of about 0.3 degrees every ten days (a ≈ −0.35) in the differences between the two cities' maximum temperatures.
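A line of best fit of this kind can be sketched in Python with the usual least-squares formulas for a model y = ax + b. The date codes 1 to 15 follow the coding described earlier, but the differences used here are hypothetical placeholders lying on an exact straight line; they are not the MCTP data, and `least_squares_line` is an illustrative name.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope: change in y per unit of x
    b = mean_y - a * mean_x  # intercept
    return a, b


# Date codes 1 (August 5th) to 15 (December 25th).
codes = list(range(1, 16))

# Hypothetical placeholder differences, NOT the MCTP data: an exact line
# with slope -0.25, so the fit should recover that slope.
differences = [8.0 - 0.25 * code for code in codes]

a, b = least_squares_line(codes, differences)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Since consecutive date codes are about ten days apart, the slope a is read as the change in the difference per ten days.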

Figure 13: A linear model for the differences between maximum temperatures

Conclusion

Much more powerful than a scientific calculator and much more accessible than a computer, the graphics calculator is an idea whose time has come. The personal technology of the graphics calculator has enormous potential to provide all students with access to important ideas and techniques of data analysis. The powerful features built into the Casio fx-7400G can be used readily and flexibly by students to undertake exploratory data analysis of both their own realistic data and those of others. It is hard to imagine a modern mathematics curriculum that ignores the exciting possibilities it offers.

References

Australian Education Council 1990, A national statement on mathematics for Australian schools, Carlton, Curriculum Corporation.

Hackett, P. & Kissane, B. 1993, The graphics calculator: A hand-held statistics package, in Herrington, T. (ed.) New horizons, new challenges, (pp 131-143) Perth, The Australian Association of Mathematics Teachers.

Kissane, B. 1997a, Mathematics with a graphics calculator: Casio fx-7400G, Perth, Mathematical Association of Western Australia.

Kissane, B. 1997b, Exploring the myths. The Electronic Classroom, 1(1), 1.

Lovitt, C. & Lowe, I. 1993, Chance and data investigations (Volume 2), (Mathematics Curriculum and Teaching Project), Carlton, Curriculum Corporation. (Data Disk)

Moore, D. 1990, Uncertainty, in Steen, L.A. (ed.), On the shoulders of giants: New approaches to numeracy, (pp 95-137) Washington, DC, National Research Council.

*Reproduced by kind permission of The Mathematical Association of New South Wales.

Please cite as:

Kissane, B. 1998. Personal technology and data analysis, Reflections, 23, 4, 40-44.
[http://wwwstaff.murdoch.edu.au/~kissane/data_analysis/data_analysis.htm]