Annual Temperature Plots
xy-Plots Tab

I often wonder how strongly two variables are correlated. On an oscilloscope, this can be visualized with an xy-plot. For that reason, the GHCN Temperature Plotter tool provides an xy tab to produce those types of plots.

Generically, this is similar to the Histogram plots except that one histogram is plotted versus another and the number of counts per bin is represented by a color.

So far, these plots make lots of pretty pictures, but I have not found them to provide anything useful.

On the other hand, the fact that I haven't found anything weird indicates that there are no significant errors in the data beyond those already identified using the other tools.



Overview

This feature plots the value of one property versus another for the selected sites. Because of the nature of the data, the bin sizes are, by default, adjusted to provide 100 vertical and 200 horizontal bins. (A checkbox is provided to allow the user to specify the bin sizes.)

Currently, only the following properties are supported - these are the same properties supported on the Histogram tab.

Each of these can be associated with either the x- or y-axis. For each site, the code determines which bin (xy-location) its combination of properties should be placed in and increments the count. Once all the selected sites are processed, these counts are used with the provided options to determine the colors displayed. When placing Latitude on the y-axis and Longitude on the x-axis, you will see an image with the selected sites arranged similarly to the main map. (This is a reasonable way to check that at least part of the code is correct.)
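The counting step described above can be sketched as follows. This is only a minimal illustration, not the application's actual code: the function name, the dictionary keys, and the 2-degree bin size are all assumptions made for the example.

```python
from collections import Counter

def bin_sites(sites, x_key, y_key, x_min, y_min, x_bin, y_bin):
    """Count how many sites fall into each (column, row) bin.

    All names here are hypothetical; they are not taken from the application.
    """
    counts = Counter()
    for site in sites:
        # Floor division maps a property value to its bin index.
        col = int((site[x_key] - x_min) // x_bin)
        row = int((site[y_key] - y_min) // y_bin)
        counts[(col, row)] += 1
    return counts

# Three made-up sites; the first two are close enough to share a bin.
sites = [
    {"lat": 40.2, "lon": -105.3},
    {"lat": 40.4, "lon": -105.1},
    {"lat": -33.9, "lon": 151.2},
]
counts = bin_sites(sites, "lon", "lat",
                   x_min=-180.0, y_min=-90.0, x_bin=2.0, y_bin=2.0)
```

With Longitude on the x-axis and Latitude on the y-axis, the occupied bins form a coarse version of the site map, which is the sanity check mentioned above.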

The linear regression formula used to compute trends only requires 3 data points (3 years of data). However, data records that short are pretty meaningless - trends of more than 4°C/decade are fairly common near the beginning or end of a data record, or if the range width is too short. As a result, the application default is to require at least 10 years of data, but this can be overridden via the Trend Lines tab. The number of sites that don't meet the minimum requirement will be shown in the excluded field.
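To illustrate why very short records are misleading, here is a small sketch of a least-squares trend with a minimum-years check. The function name and the `min_years` parameter are assumptions for the example, and the three temperatures are invented; the application's own implementation may differ.

```python
def trend_per_decade(years, temps, min_years=10):
    """Least-squares slope in degrees C per decade, or None if the record is too short.

    Hypothetical helper: the floor of 3 points follows the text above.
    """
    n = len(years)
    if n < max(min_years, 3):
        return None  # too few points: the site would be counted as excluded
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    return 10.0 * sxy / sxx  # slope per year scaled to per decade

# A made-up 3-year record that warms by 0.9 degrees C in two years.
short_years = [2001, 2002, 2003]
short_temps = [10.0, 10.3, 10.9]

default_result = trend_per_decade(short_years, short_temps)
forced_result = trend_per_decade(short_years, short_temps, min_years=3)
```

With the default minimum the short record is rejected; forcing the minimum down to 3 years gives a trend of about 4.5 °C/decade from what is probably just year-to-year noise.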

Checkboxes and number fields are provided to control the minimum and maximum values of the x- and y-axes.


Color map

Since this is an x-versus-y graph, each pixel (each grid point) represents the number of sites with that particular combination. A color map is used to display this information - the number of counts in each bin is checked, from largest to smallest, and assigned one of the following colors.

Of course, the application makes both the colors and ranges user configurable. Depending on your need, I think that Light Grey might be better than Black for the Greater than zero bins. However, I have left the default as Black to make sure the markers are visible on most displays.

If you only want to see the bins with counts between two values, set the limits accordingly and color the other bins White (the background color).
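The largest-to-smallest check can be sketched as a simple threshold lookup. The limits of 20 and 10 below are assumptions chosen for illustration (the actual defaults are not reproduced here), and the function name is hypothetical.

```python
# Hypothetical limits: (minimum count, color). Only the color names come from the text.
DEFAULT_LIMITS = [(20, "red"), (10, "orange"), (1, "black")]

def color_for_count(count, limits=DEFAULT_LIMITS, background="white"):
    """Return the color of the first limit, checked largest first, that the count reaches."""
    for minimum, color in sorted(limits, reverse=True):
        if count >= minimum:
            return color
    return background  # empty bins keep the background color

# Showing only bins with 10 to 30 sites: color everything else white.
band_limits = [(31, "white"), (10, "orange"), (1, "white")]
band_colors = [color_for_count(c, band_limits) for c in (5, 12, 40)]
```

The second set of limits shows the trick from the previous paragraph: bins outside the range of interest are still classified, but they are drawn in the background color and so disappear.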


Bin Sizes

By default, the data is scaled to fit a 100x200 grid and the computed bin sizes (rounded to 0.001) are displayed in the provided number fields.
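As a rough sketch of that default, the bin size on each axis is the axis range divided by the number of bins, rounded to three decimals. The function and parameter names are assumptions, and the latitude/longitude ranges are used only as an example.

```python
def compute_bin_sizes(x_min, x_max, y_min, y_max, x_bins=200, y_bins=100):
    """Bin sizes for a grid of 200 horizontal by 100 vertical bins, rounded to 0.001.

    Hypothetical helper; the application may handle edge cases differently.
    """
    x_size = round((x_max - x_min) / x_bins, 3)
    y_size = round((y_max - y_min) / y_bins, 3)
    return x_size, y_size

# Longitude on the x-axis, Latitude on the y-axis.
bin_sizes = compute_bin_sizes(-180.0, 180.0, -90.0, 90.0)
```

Note that rounding to 0.001 means the grid may cover slightly more or less than the exact data range.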

When the Compute bins for 100x200 grid box is unchecked, the user has the option to control the bin sizes. The current configuration is visually indicated by the background color of the number fields - white is editable, grey is read-only.

When only one pixel is used per bin, the image is large enough to display the bins without any overlaps. However, single pixels are pretty hard to see. To improve visibility, the default markers are significantly larger (6x6), but they frequently produce a significant amount of marker overlap. You can use the Marker size control to adjust the overlap. Another approach is to increase the bin sizes, which also removes the overlaps but produces a grainier image.

With these custom bin sizes, you can see white space between the bins indicating that there is no marker overlap in the x-direction. To remove the marker overlap in the y-direction requires a Trend bin size of about 0.05 °C/bin. Because the bins are now larger, in the denser parts of the graph there were more sites per bin and the default color mapping produced too many red markers. As a result, the red and orange bin limits were adjusted to display a reasonable number of markers for each color. In the second graph, at least one of the red markers represents over 60 sites with that specific combination.


Date Ranges

Of the selectable properties on the xy tab

The range control is placed under the graph - as with all number fields in the application, you can type a value or use the mouse wheel (hold the shift key to modify digits in the tens place). As the dates change, the graph is automatically updated. Normally, the range is included with the x-axis label if either the Trend or Number of Years option is selected for either axis. The associated checkbox controls whether the range is included in the x-axis label or not.

When a new range is set, the application determines the number of data points and the slope (via least squares linear regression) in that range. If there are not enough points to compute a trend, then that site will be excluded from the plot and the number of selected, but excluded, sites is reported in the provided field. By default, 10 years of data must exist within the selected range to compute a trend, but you can control that via the Trend Lines tab - only 3 are absolutely necessary, but trends with too few points can be misleading. You really want a large sample, unless you are specifically looking for problem data.
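A minimal sketch of that per-range processing is shown below, assuming each site's record is a dictionary mapping year to temperature. The data layout, function names, and sample values are all assumptions for illustration only.

```python
def slope_per_year(points):
    """Least-squares slope of temperature against year (degrees C per year)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

def trends_in_range(records, first_year, last_year, min_years=10):
    """Return ({site_id: slope}, excluded_count) for the selected date range."""
    trends, excluded = {}, 0
    for site_id, record in records.items():
        points = [(yr, t) for yr, t in record.items()
                  if first_year <= yr <= last_year]
        if len(points) < min_years:
            excluded += 1  # selected, but not enough data in the range
            continue
        trends[site_id] = slope_per_year(points)
    return trends, excluded

# Made-up records: site A has 20 years of data, site B only 5.
records = {
    "A": {yr: 10.0 + 0.02 * (yr - 1990) for yr in range(1990, 2010)},
    "B": {yr: 12.0 for yr in range(2005, 2010)},
}
trends, excluded = trends_in_range(records, 1995, 2009)
```

Here site A keeps 15 years inside the range and gets a trend, while site B is counted in the excluded total.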

Since the Baseline Average Temperatures are computed via the Basic Filters tab, the date range controlling those is there.

Remember, the sites used in these plots are selected using a number of filters.


Raw vs Adjusted

The application provides both adjusted and unadjusted (raw) GHCN data (each with and without ocean data). There are many differences between these. (Out of 5,539 stations, 761 are identical in both datasets.)

As explained above, using any of the available datasets, you can plot one of the provided properties versus another. In addition, assuming that a raw/adjusted pair of datasets is loaded (the options are greyed out if they are not), you can compare the following properties

by plotting either

The resulting plots were pretty much what I expected - except for the amount of deleted data. I have always been aware that the adjusted data had fewer data points (years per station) than the raw data, but I never expected so many sites to have more than 10 years of data removed via the adjustment process.

The Change in Trend plot has a second image visible when the mouse hovers over the image. This shows the same data except that the 644 absolutely identical sites are removed. Use the Raw vs Adj Filter tab to deselect sites that are identical in both datasets.
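One simple way to think about "absolutely identical" sites is a record-by-record comparison, sketched below. This assumes each dataset is a dictionary of site id to a year-to-temperature dictionary; that layout and the function name are hypothetical and may not match how the Raw vs Adj Filter tab actually works.

```python
def identical_site_ids(raw, adjusted):
    """Ids of sites whose adjusted record equals the raw record exactly."""
    return {site_id for site_id, record in adjusted.items()
            if raw.get(site_id) == record}

# Made-up data: S1 is unchanged by adjustment, S2 has one altered year.
raw = {
    "S1": {1990: 10.1, 1991: 10.3},
    "S2": {1990: 5.0, 1991: 5.2},
}
adjusted = {
    "S1": {1990: 10.1, 1991: 10.3},
    "S2": {1990: 5.1, 1991: 5.2},
}
identical = identical_site_ids(raw, adjusted)
remaining = sorted(set(adjusted) - identical)
```

Dropping the identical ids before plotting corresponds to the second image described above, where only the sites that actually changed remain.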


Details

There are 2 options to help compare the raw and adjusted data.

The x-axis selection is located just below the Plot (Adjusted - Raw) vs selection radio button and is disabled unless the option is selected. Only the raw versions of the available properties are available.

These 2 radio buttons only work with the Trend, Baseline Temperature, and Num of Yrs options. When either radio button is selected, the other 2 data options (Lat/Long) are disabled and greyed out since their values are identical in both datasets. In addition, all the x-axis radio buttons are disabled since both datasets MUST use the same property.

Since every adjusted site has an associated unadjusted (raw) site, these features use only the sites selected on the adjusted dataset. This means that it does not matter if sites on the raw dataset are selected or not. It also does not matter which dataset is displayed on the map.
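The (Adjusted - Raw) comparison can be sketched as a lookup keyed by site id, driven only by the sites selected in the adjusted dataset. The dictionaries, the function name, and the trend values are assumptions for illustration.

```python
def adjusted_minus_raw(adjusted, raw, selected_ids):
    """Return ([(raw_value, adjusted - raw), ...], excluded_count).

    x is the raw value and y is the difference, so a site that was not
    changed by the adjustment lands on y = 0.
    """
    points, excluded = [], 0
    for site_id in selected_ids:
        adj_value = adjusted.get(site_id)
        raw_value = raw.get(site_id)
        if adj_value is None or raw_value is None:
            excluded += 1  # property missing in one dataset
            continue
        points.append((raw_value, adj_value - raw_value))
    return points, excluded

# Made-up trends in degrees C per decade; S3 exists only in the raw data.
adjusted_trends = {"S1": 0.25, "S2": 0.10}
raw_trends = {"S1": 0.15, "S2": 0.10, "S3": 0.40}
points, excluded = adjusted_minus_raw(adjusted_trends, raw_trends, ["S1", "S2"])
```

Because the loop walks only the adjusted selection, the selection state of the raw dataset never enters the calculation, which matches the behavior described above.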

When the displayed data is changed from one where both raw and adjusted are available to one where only one is loaded, then

The baseline temperature is an average based on the dates set on the Basic Filters tab. When this is changed on one dataset, be sure to also change it on the other - otherwise, the 2 datasets will not be in sync. When Baseline Temp is selected, there should never be excluded sites - if there are, then check to make sure the baselines are in sync. (I have seen this problem - user error.)


Ocean data

The provided ocean data is identical in both the raw and adjusted datasets (verified using the identical in both option provided on the Raw and Adj Filters tab). As a result, the raw vs adjusted features provided via this tab have no value.

Note: The reported ocean temperatures are only anomaly values (relative to an unspecified baseline). I verified that the baseline is not the same as the application default by observing that the values simply cluster around zero. For more information, see my Histograms page discussing oceans.


Other properties

In general, there are many ways to process data - some provide useful results, others are of no obvious value.

These are some of the other parameters that might be useful in an xy-plot - min, max, mean, median, σ², R², etc. - each associated with some time period.

Obviously, with a large selection, radio buttons are a problem. (They require too much space.) Normally a pull-down listbox is more appropriate with a large number of options. (I strongly dislike write-in data fields because then the user must know in advance what the available options are.)

At any rate, these xy-plots are already of questionable value and I don't want to make it harder to understand and/or use by simply adding a lot of parameters that no one wants.




Author: Robert Clemenzi
URL: http://mc-computing.com/Science_Facts/Annual_Temperature_Plots/xy_tab.html