ETL 1110-1-175
30 Jun 97
matrices takes less time when a smaller number of data values are used to make estimates, and these efficiencies can be significant when dealing with large data sets. Little accuracy is lost because the nearest neighbors are the most influential in the kriging weighting scheme.

c. Figure 4-3 shows a parabolic shape to $\hat{\gamma}$ for the Saratoga data, both for the sample variogram points plotted at lags up to about 32 km (the first four points) and for those at lags beyond about 56 km. The parabolic shape of the sample variogram points was not surprising, because examination of the data indicates a north-south trend. A polynomial trend, linear in $u$ and $v$, was fitted to all the data using ordinary least-squares estimation. Residuals obtained by subtracting this regional trend surface from the data were used to reestimate $\hat{\gamma}$ in Equation 4-2, and the sample variogram for the residuals is shown in Figure 4-4.

4-4. Variogram Refinement

a. In the previous section, an initial $\hat{\gamma}$ was specified by points computed from Equation 4-2. In general, the larger $N(h_k)$ is for any bin or lag interval $k$, the more reliable the points defining $\hat{\gamma}(h_k)$ will be. Also, the larger $K$ is, the greater the number of sample variogram points shaping $\hat{\gamma}$. However, $N(h_k)$ and $K$ are competing elements of $\hat{\gamma}$: for a fixed data set, dividing the lags into more intervals leaves fewer pairs, on average, in each interval. Journel and Huijbregts (1978) suggest that each lag interval $k$ should have $N(h_k)$ equal to at least 30 pairs. The American Society for Testing and Materials (Standard D5922-96) suggests 20 pairs for each lag interval. For small data sets, the number of intervals may have to be small to guarantee either recommended number of pairs in all intervals.

b. It is difficult to determine the minimum number of data values $n$ needed to satisfy the $N(h_k)$ requirements for all lag intervals of a sample variogram. Simple combinatorial analysis can establish the sample size needed to achieve a given total number of distinct pairs of items taken from the sample ($n$ values yield $n(n-1)/2$ distinct pairs), but it does not address the spatial considerations needed for proper lagging. For example, for data collected on a uniform grid and grouped into equal-sized bins, fixing $n$ to just satisfy the minimum $N(h_k)$ for the smaller lags will yield insufficient data pairs to meet the minimum $N(h_k)$ for the larger lags. Fixing $n$ to assure the minimum $N(h_k)$ for the larger lags will generally produce $N(h_k)$ much greater than the minimum for the smaller lags. Therefore, the question of how much data is required to adequately compute a variogram should also address the relative locations of the data-collection sites.

c. The first 10 of the 12 bins for $\hat{\gamma}$ for the Saratoga data contained more than 30 data pairs. Therefore, the bin width can be decreased to get more points defining the early (small-lag) part of $\hat{\gamma}$. These bin-width adjustments can be made to refine $\hat{\gamma}$ whether it is computed from the data or from the residuals. A plot of $\hat{\gamma}$ for the residuals for the Saratoga data with the bin width narrowed to about 6.5 km is shown in Figure 4-5.

d. Spatial data are usually not collected on a uniform grid but occur in a pattern that reflects problem areas, accessibility, and general spatial coverage. In the Saratoga data set, nonuniform data spacing causes the number of data pairs in each bin, although still greater than 30, to vary widely among the bins. This variability yields different reliabilities for the points defining $\hat{\gamma}$. To balance $N(h_k)$ among the bins, variable bin sizes can be used so that each bin contains approximately the same (large) number of data pairs. A bin with fewer pairs can be coalesced with an adjacent bin to form a wider bin with more pairs. Conversely, a bin with an excessive number of pairs can be subdivided into adjacent, narrower bins. The coalescing and subdividing procedure is largely trial and error and continues until the distribution of pairs among the bins is satisfactory to the investigator.

e. The values of $\hat{\gamma}$ at the smaller lags are the most critical for defining the appropriate $\gamma$. Therefore, the trade-off between the number of bins and the number of data pairs within each bin can be varied for different regions of the sample