Curve fitting--part 6.
S. Arlinghaus
1. VARIANTS ON EARLIER CURVE FITTING
A. The exponential curve
Previous curve fitting efforts used least squares analysis to fit a
straight line, an exponential, or a logarithmic function to a
distribution of data.
In the case of the exponential, the previous work assumed that the curve
eventually settled down to the x-axis, the line y=0. The x-axis need not
be the horizontal asymptote of this curve, however.
Consider the following example, which fits two exponentials: one with
horizontal asymptote y=0 and one with a different horizontal asymptote.
The general form of the curve is:
y=Ce^(ax) + b
where a < 0, C is a constant, and the line y=b is the horizontal
asymptote, which is the lower bound of the exponential.
The added term b shows how far the curve is lifted above or below the
x-axis.
Exponential curve fit to projected crude birth data, 2005 to 2025
Source: WRD data for Bangladesh.
(The LN columns are logarithms of the WRD projection.)

Year       WRD proj.  exp. proj.  exp. proj.  LN(y-0)   LN(y-4)
                      (y=0)       (y=4)
2005       5.97184    5.85017     5.9407      1.78706    0.67897
2010       5.53296    5.5026      5.48587     1.71072    0.4272
2015       5.07485    5.17568     5.13763     1.6243     0.07218
2020       4.84262    4.86818     4.87101     1.57746   -0.1712
2025       4.69974    4.57895     4.66688     1.54751   -0.357
slope                                         -0.0122   -0.0534
intercept                                     26.3277   107.75

Fit with horizontal asymptote y=0:
ln y = -0.01225x+26.32772
y=exp(-0.01225x+26.32772)

Fit with horizontal asymptote y=4, using y-4=Ce^(ax):
ln(y-4) = -0.05341x+107.7501
y-4 = exp(-0.05341x+107.7501)
y=exp(-0.05341x+107.7501)+4
The value of y=4 as a different lower bound was suggested by the WRD data.
Graph:
It appears that the WRD projection, while exponential in general shape,
may have been made using lines of different slopes joined at 2015 (3
above).
The exponential that has y=4 as a horizontal asymptote appears to be
closer to the criteria used to make the forecasts than does the
exponential with y=0 as a horizontal asymptote.
B. The logistic curve (variant).
One variant of the logistic curve, in which the S-shape appears
flatter, is the Gompertz curve; it is used to model growth of various
kinds, from financial to population.
The reason the curve is flatter becomes evident when the logistic
equation is written as a differential equation,
dP/dt = P(a-b*P),
and the Gompertz is written in a parallel manner, as
dP/dt = P(a-b*ln P).
The logarithmic factor tends to flatten out the curve and make the
S-shape less curved than a logistic fit would be.
General form for the Gompertz curve:
y=q*e^(-c*e^(-bx))
where q is the upper bound, selected by the user before making any
analysis on the basis of carrying capacity or other considerations, and
b and c are constants to be determined depending on the value selected
for q and the beginning and ending times chosen.
There are numerous equivalent forms.
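
As a check on how this closed form relates to the differential equation dP/dt = P(a-b*ln P) given earlier, the short derivation below differentiates the curve. It assumes the general form is read as y = q*e^(-c*e^(-bx)); the identification a = b*ln q follows from that assumption and is not stated in the original notes.

```latex
\begin{aligned}
y &= q\,e^{-c\,e^{-bx}} \\
\ln y &= \ln q - c\,e^{-bx} \\
\frac{1}{y}\frac{dy}{dx} &= b\,c\,e^{-bx} = b\,(\ln q - \ln y) \\
\frac{dy}{dx} &= y\,(b\ln q - b\ln y)
\end{aligned}
```

The last line has the shape P(a-b*ln P) with a = b*ln q, and the growth rate falls to zero as y approaches q, which is consistent with q acting as the upper bound.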
Example. Gompertz curve fit to WRD data from 1955-2025, Bangladesh total
population.
Both curves use the upper bound q=300:
Logistic: y=q/(1+ae^(bt)), b<0
Gompertz: y=q*e^(-c*e^(-bt))
Here t counts 5 yr intervals from 1955, so t=0 is 1955, t=7 is 1990,
and t=14 is 2025.

t     Pop. mil.   q=300      q=300
      WRD         logistic   Gompertz
0     45.486      45.486     45.486
1     51.419      54.4071    58.7728
2     58.312      64.6335    73.3423
3     66.671      76.1858    88.8109
4     76.582      89.022     104.782
5     88.219      103.025    120.879
6     101.147     117.999    136.767
7     115.593     133.673    152.17
8     132.219     149.716    166.87
9     150.589     165.765    180.711
10    170.138     181.457    193.594
11    188.196     196.461    205.464
12    204.631     210.503    216.306
13    220.119     223.382    226.135
14    234.987     234.981    234.987

Logistic: y=300/(1+ae^(bt))
Find a:
In 1955, t=0, y=45.486.
Thus, 45.486 = 300/(1+a)
Solving, a=5.595435
y=300/(1+5.595435e^(bt))
Find b, using the information from t=14 in 2025:
In 2025, t=14, y=234.987.
Thus, 234.987=300/(1+5.595435e^(14b))
Solving, b=-0.21477.
Logistic equation:
y=300/(1+5.595435e^(-0.21477t))

Gompertz: y=300*e^(-c*e^(-bt))
Find c:
In 1955, when t=0, y=45.486
Thus, 45.486 = 300*e^(-c*e^(-b*0)) = 300*e^(-c)
Solving, c=ln(300/45.486)=1.8863779
y=300*e^(-1.8863779e^(-bt))
Find b, using the information from t=14 in 2025:
In 2025, t=14, y=234.987
Thus, 234.987=300*e^(-1.8863779e^(-14b))
Taking logarithms, 1.8863779e^(-14b) = ln(300/234.987) = 0.2442523
Solving, -b=1/14 * ln(0.2442523/1.8863779) = -0.1460152,
so b=0.1460152
Gompertz equation:
y=300*e^(-1.8863779e^(-0.1460152t))
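
The two-endpoint procedure above can be written as a pair of small functions. This is a minimal sketch, assuming the Gompertz curve is read as y = q*e^(-c*e^(-bt)) and that both curves are forced through the 1955 and 2025 values; the function names are illustrative and not part of the original notes.

```python
import math


def logistic_constants(q, y0, t_end, y_end):
    """Constants (a, b) of y = q/(1 + a*e^(b*t)) through (0, y0) and (t_end, y_end)."""
    a = q / y0 - 1.0
    b = math.log((q / y_end - 1.0) / a) / t_end
    return a, b


def gompertz_constants(q, y0, t_end, y_end):
    """Constants (c, b) of y = q*e^(-c*e^(-b*t)) through (0, y0) and (t_end, y_end)."""
    c = math.log(q / y0)
    b = -math.log(math.log(q / y_end) / c) / t_end
    return c, b


def logistic(t, q, a, b):
    return q / (1.0 + a * math.exp(b * t))


def gompertz(t, q, c, b):
    return q * math.exp(-c * math.exp(-b * t))


q = 300.0                      # upper bound chosen in the example
y_1955, y_2025 = 45.486, 234.987

a, b_log = logistic_constants(q, y_1955, 14, y_2025)
c, b_gom = gompertz_constants(q, y_1955, 14, y_2025)
print(f"logistic: a={a:.6f}, b={b_log:.5f}")
print(f"Gompertz: c={c:.7f}, b={b_gom:.7f}")

# Tabulate both curves at each 5-year step, t = 0 (1955) to t = 14 (2025).
for t in range(15):
    print(t, round(logistic(t, q, a, b_log), 3), round(gompertz(t, q, c, b_gom), 3))
```

With unrounded constants both curves pass through 234.987 at t=14. The logistic entry of 234.981 in the table appears to come from using the rounded value b=-0.21477.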
Graph:
The two fits are quite far (relatively speaking) from the actual data in
1990.
One might consider, therefore, using 1990 as the endpoint and
extrapolating beyond that.
The procedure is the same as above--just use 1990 instead of 2025 as the
endpoint.
Logistic
The value for a is the same as above.
y=300/(1+5.595435e^(bt))
Find b, using the information from t=7 in 1990:
In 1990, t=7, y=115.593.
Thus, 115.593=300/(1+5.595435e^(7b))
Solving, b=-0.17927
Logistic equation:
y=300/(1+5.595435e^(-0.17927t))

Gompertz
The value for c is the same as above.
y=300*e^(-1.8863779e^(-bt))
Find b, using the information from t=7 in 1990:
In 1990, t=7, y=115.593
Thus, 115.593=300*e^(-1.8863779e^(-7b))
Taking logarithms, 1.8863779e^(-7b) = ln(300/115.593) = 0.95371
Solving, -b=1/7 * ln(0.95371/1.8863779) = -0.09744,
so b=0.09744
Gompertz equation:
y=300*e^(-1.8863779e^(-0.09744t))
Fits using 1990 as the endpoint, 1955-2025, 5 yr interval:

t     Pop. mil.   q=300      q=300
      WRD         logistic   Gompertz
0     45.486      45.486     45.486
1     51.419      52.8438    54.1925
2     58.312      61.1059    63.5241
3     66.671      70.2925    73.3722
4     76.582      80.3954    83.6207
5     88.219      91.3728    94.1508
6     101.147     103.145    104.846
7     115.593     115.594    115.596
8     132.219     128.563    126.298
9     150.589     141.869    136.861
10    170.138     155.304    147.205
11    188.196     168.654    157.264
12    204.631     181.711    166.983
13    220.119     194.284    176.318
14    234.987     206.209    185.236
Clearly the WRD forecast, if made using this sort of curve, required an
upper bound higher than q=300.
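
The effect of the upper bound on the 2025 extrapolation can be explored directly. The sketch below fits each curve through the 1955 and 1990 values and evaluates it at t=14. It assumes the Gompertz form y = q*e^(-c*e^(-bt)); the trial bounds q=400 and q=500 are hypothetical values chosen only for illustration and do not appear in the WRD data or the original notes.

```python
import math

# WRD population values in millions; t counts 5-year steps from 1955.
Y_1955, Y_1990, Y_2025 = 45.486, 115.593, 234.987
T_1990, T_2025 = 7, 14


def logistic_2025(q):
    """Logistic through the 1955 and 1990 values, evaluated at 2025."""
    a = q / Y_1955 - 1.0
    b = math.log((q / Y_1990 - 1.0) / a) / T_1990
    return q / (1.0 + a * math.exp(b * T_2025))


def gompertz_2025(q):
    """Gompertz through the 1955 and 1990 values, evaluated at 2025."""
    c = math.log(q / Y_1955)
    b = -math.log(math.log(q / Y_1990) / c) / T_1990
    return q * math.exp(-c * math.exp(-b * T_2025))


# q=300 is the bound used in the text; 400 and 500 are hypothetical trials.
for q in (300.0, 400.0, 500.0):
    print(f"q={q:g}: logistic {logistic_2025(q):.3f}, "
          f"Gompertz {gompertz_2025(q):.3f}, WRD {Y_2025}")
```

For q=300 this reproduces, up to rounding of b, the t=14 row of the table above; raising q moves both extrapolations upward toward the WRD figure.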
SUMMARY OF CRITERIA FOR CURVE FITTING
Linear
y=mx+b, m the slope, b the second coordinate of the y-intercept.
Useful for linear increase.
Exponential
y=e^(mx+b)
Useful to suggest decline toward the horizontal asymptote.
Useful to suggest unbounded increase--"worst case" picture.
Logarithmic
y=ln(mx+b)
Dampened increase in growth
Unbounded
Cubic spline
Exact fit using pieces of cubic curve between given finite set of
evenly-spaced data points.
Bounded fit--not good for forecasting
Interpolating curve.
Logistic
Assumption of exponential growth that tapers off toward some upper
bound.
Produces S-shaped curve (in its full extent) based on two endpoints
to find values of constants.
y=q/(1+ae^(bx)), b<0
Gompertz
Like a logistic curve, but produces a flatter S-shape than does a
logistic for the same values.
y=q*e^(-c*e^(-bx))
Other analytical tools
Feigenbaum's graphical analysis
Useful for examining geometric dynamics--point of irreversibility of
dynamic process may suggest
point at which to intervene. The geometric dynamics are based on the
idea of geometric feedback.
Likely useful when real-world feedback can be aligned with geometric
feedback.
Lattices
Useful, potentially, in examining hierarchical structures in which
domination is involved.
Very few applications of this sort of material exist in the
literature.
Fractals
Useful when ideas of self-similarity and scale change are involved.
Difficulty comes in
identifying geometric self-similarity.
Graph theory
Useful when the real-world situation can be partitioned into a set of
nodes (point-locations) and edges (channels linking these locations).
This alignment can be quite far-fetched, including the linking of
ideas, and is not limited to the more obvious cases such as transport
networks. Allows for abstract manipulation based on linkage pattern.
War of the Roses example.