ETL 1110-1-175
30 Jun 97
variogram. For example, the lag associated with the maximum of $\hat{\gamma}$ of the residuals can be a good first approximation for the range of the theoretical variogram.

4-8. Outlier Detection

a. Outliers in a data set can have a substantial adverse effect on $\hat{\gamma}$. However, divergent data values can be screened for evaluation using a Hawkins statistic (Hawkins 1980), which is described in the context of kriging by Krige and Magri (1982). A neighborhood containing 4 to 10 data points, approximately normally distributed, around each suspected outlier must be defined. Despite potential outliers in the data set, a best guess initial theoretical variogram also is needed.

b. The Hawkins statistic is obtained by comparing a suspect datum to the mean value of the 4 to 10 surrounding data, the smaller number being sufficient if the variability is lower. The spacing between these surrounding points is accounted for by the properties of the chosen variogram. A value for the statistic of 3.84 or higher would indicate an [...] interval. A larger number of surrounding points has the direct effect of increasing the magnitude of the statistic. Anomalous points are removed from the data set and the procedures described for obtaining the sample variogram are repeated for the smaller data set. There were no outlier problems in the Saratoga data.

c. There is debate among geostatisticians regarding the merit of automated outlier-detection methods. A procedure such as that described here is presented as an investigative tool with the understanding that the investigator will also use atten[...] statistic to ultimately decide if a data value is discarded as a true outlier or retained as a valid observation. In some situations, highly problematic data values are removed for computation of the sample variogram points but are reinstated for kriging.

4-9. Cross-Validation for Model Verification

a. General.

(1) Parameters of the theoretical variogram obtained from the initial fitting and refinement of the sample variogram are calibrated using a kriging cross-validation technique. In this procedure, the fitted theoretical variogram is used in a kriging analysis in which data values are individually suppressed and estimates made at the location using subsets of the remaining points. As described in section 4-3, these subsets are the data points in a moving neighborhood surrounding the point under [...] each data location requires a matrix inversion, which could be very time-consuming if all remaining data locations were used to construct the matrices rather than just those within a neighborhood of a limited search radius.

(2) After kriged values at all data locations have been estimated in the above manner, the data are used with their kriged values and kriging standard deviation to obtain cross-validation statistics. [...] these statistics, which are described in the next section. If the criteria cannot be reasonably met by adjusting the parameters in the given theoretical variogram function, then calibration should be reinitialized with a different theoretical variogram function. In some data sets with nonstationary spatial means, the drift polynomial may have to be changed as well as the variogram to achieve a satisfactory calibration.

b. Calibration statistics.

(1) The kriging cross-validation error $e_i$ corresponding to measurement $z(x_i)$ is defined as

$$e_i = z(x_i) - \hat{z}(x_i) \tag{4-3}$$

where $\hat{z}(x_i)$ is the kriged estimate of $z(x_i)$ based on the remaining $n-1$ measurements in the data set.
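
The cross-validation procedure of paragraph 4-9 can be illustrated with a minimal sketch. It is not the implementation used in this ETL. Several things are assumptions made for illustration only: an exponential variogram with arbitrary nugget, sill, and range; ordinary kriging with no drift polynomial; a moving neighborhood defined by the nearest few points rather than by a search radius; and all function names, which are hypothetical. Each datum is suppressed in turn, a kriged estimate is made at its location from neighboring points, and the error of equation 4-3 is returned together with the kriging variance.

```python
import math

def variogram(h, nugget=0.0, sill=1.0, rng=10.0):
    """Exponential variogram (assumed model; parameter values are hypothetical)."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def dist(p, q):
    """Euclidean distance between two (x, y) locations."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def krige(target, pts, vals, gamma):
    """Ordinary kriging estimate and kriging variance at target from pts/vals."""
    m = len(pts)
    # Kriging matrix: variogram values between data points, bordered by the
    # unbiasedness constraint (weights sum to one).
    a = [[gamma(dist(pts[i], pts[j])) for j in range(m)] + [1.0] for i in range(m)]
    a.append([1.0] * m + [0.0])
    b = [gamma(dist(p, target)) for p in pts] + [1.0]
    sol = solve(a, b)
    lam, mu = sol[:m], sol[m]  # kriging weights and Lagrange multiplier
    est = sum(l * v for l, v in zip(lam, vals))
    var = sum(l * g for l, g in zip(lam, b[:m])) + mu
    return est, var

def cross_validate(pts, vals, gamma, n_neighbors=4):
    """Return (e_i, kriging variance) for every datum, suppressing each in turn."""
    results = []
    for i, target in enumerate(pts):
        others = [j for j in range(len(pts)) if j != i]  # suppress datum i
        others.sort(key=lambda j: dist(pts[j], target))
        near = others[:n_neighbors]  # moving neighborhood (nearest points)
        est, var = krige(target, [pts[j] for j in near],
                         [vals[j] for j in near], gamma)
        results.append((vals[i] - est, var))  # e_i = z(x_i) - z_hat(x_i)
    return results
```

Calling `cross_validate(pts, vals, variogram)` with a list of `(x, y)` locations and a matching list of measurements yields one error and one kriging variance per datum. Restricting each system to a few neighbors keeps the matrices small, which is the reason paragraph 4-9a(1) gives for using a limited neighborhood.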