Sunday 17 October 2010

Empir Econ (2010) 38:23–45
DOI 10.1007/s00181-009-0254-1
ORIGINAL PAPER
Analysis of county employment and income growth
in Appalachia: a spatial simultaneous-equations
approach
Gebremeskel H. Gebremariam ·
Tesfa G. Gebremedhin · Peter V. Schaeffer
Received: 15 April 2006 / Accepted: 15 September 2008 / Published online: 31 January 2009
© Springer-Verlag 2009
Abstract County median household income and employment growth rates tend
to be characterized by spatial interaction. A spatial simultaneous-equations growth
equilibrium model was estimated using GS2SLS and GS3SLS. The results indicate
strong feedback simultaneity between employment and median household income
growth rates. They also show spatial autoregressive lag simultaneity and spatial
cross-regressive lag simultaneity with respect to employment and median household
income growth rates, as well as spatial correlation in the error terms. Estimates of
structural parameters show strong agglomerative effects and significant conditional
convergence with respect to employment growth and median household income growth
in Appalachia in the 1990s.
Keywords Employment · Income · Spatial analysis · Appalachia
JEL Classification C3 · R1 · R5
1 Introduction
State policy makers and local leaders have long placed a high priority on local
economic development (Isserman 1993; Pulver 1989; Ekstrom and Leistritz 1988).
The changing structure of traditional industries and the impact of those changes on
local communities have challenged the efficacy of established policies and strategies.
G. H. Gebremariam
Department of Economics, Virginia Polytechnic Institute, State University, Blacksburg, USA
T. G. Gebremedhin · P. V. Schaeffer (B)
Division of Resource Management, West Virginia University, Morgantown, USA
e-mail: peter.schaeffer@mail.wvu.edu
A better understanding of factors that influence local employment, earning capacity,
and quality of life issues has therefore become important for state, regional, and local
agencies in charge of rural development policies. One of the policy challenges at the
local and county level is spatial interdependencies. Outside of the far west, counties
are small and in most instances cannot be thought of as even rough approximations
of labor markets, nor, often, of market areas for many consumer goods. This
is expressed by the mean travel time to work in a rural state such as West Virginia,
which in 2005 ranked fourteenth in the nation, with 24.9 min just below the national
average of 25.1 min, and ahead of more urbanized states such as Ohio and Michigan
(US Census Bureau 2005). We single out West Virginia because of its rural nature and
because it is the only state completely contained within Appalachia. The twelve states
with counties in Appalachia have commuting times (one way) of 22.4 min or higher.
Dealing with spatial interdependencies is therefore one of the major objectives of this
research.
Many of the forces responsible for past economic and social changes continue to
have an impact. One of these changes was the emergence of computer-based technology
in production, administration and information, which has reduced the role of
economies of scale in many sectors. Studies by Loveman and Sengenberger (1991)
and Acs and Audretsch (1993), for example, have shown a shift in industry structure
toward decentralization and an increased role for small firms. This was mainly due
to changes in production technology, consumer demand, labor supply, and the pursuit
of flexibility and efficiency. These factors led to the restructuring and downsizing of
large enterprises and the entry of new firms. Brock and Evans (1989) provide extensive
documentation of the changing role of small businesses in the US economy, which is
likely the result of responses to structural adjustments.
Parallel with technical changes leading to new industrial structures, new patterns
of consumer expenditures and demand resulting from rising living standards contributed
to the emergence of fragmented consumer markets, which also favored small
consumer-oriented firms over high volume, production-oriented firms. Thus, new
business opportunities in small and medium size enterprises resulted as large firms
downsized in response to a changing environment. The emerging view among policy
makers is that small business is a key element and driving force in generating employment
and realizing economic development. This paradigm shift has brought about a
revival in small business promotion and entrepreneurial initiatives at local, national
and international levels.
Most new businesses start small and small businesses create the majority of new
jobs (Acs and Audretsch 2001; Audretsch et al. 2000; Carree and Thurik 1998, 1999;
Fritsch and Falck 2003; Reynolds 1994; Wennekers and Thurik 1999). A growing
literature has explored the determinants of the regional variations in new business formation
(Acs and Armington 2003; Audretsch and Fritsch 1994; Callejon and Segarra
2001; Davidson et al. 1994; Fotopoulos and Spencer 1999; Fritsch 1992; Garofoli
1994; Guesnier 1994; Hart and Gudgin 1994; Johnson and Parker 1996; Kangasharju
2000; Keeble and Walker 1994; Reynolds 1994).
The geographic impact of the change from large production-oriented plants to
smaller consumer-oriented firms and plants is uncertain. While smaller units would tend
to make rural production sites relatively more competitive, the consumer-orientation,
which tends to favor locations close to markets, is more likely to have the opposite
effect. Hence, it is not possible to predict the impact of the changes discussed above
on the geographic distribution of economic activity a priori.
The literature on economic growth at the regional level has focused attention on
the so-called convergence hypothesis of neoclassical growth theory which predicts
that poorer regions tend to catch up with the richer regions in per capita income as
time passes, through the process of factor mobility. Because of the spatial structure of
our model, we can test for convergence. Previous studies by Barro and Sala-i-Martin
(1992, 2004) for US states, Japanese prefectures and between European countries,
and by Persson (1997) and Aronsson et al. (2001) across Swedish counties, found
evidence of income convergence. Similar studies by Arbia et al. (2005) of 92 Italian
provinces (1970–2000), Ertur et al. (2006) of 138 European regions (1980–1995), and
Rappaport (1999) of US counties (1970–1990), also found income convergence. However,
a study by Glaeser et al. (1995) did not discover significant evidence of income
convergence between US cities. Of particular interest are two papers by Higgins et al.
(2006) and Young et al. (2008) that looked at per capita income net of government
transfers in US counties in all fifty states from 1970 to 1998. They found a speed of
convergence of between 6 and 8%, considerably faster than the approximately 2% typically
reported. Higgins et al. (2006) also found a much faster speed of convergence
in counties located in southern than in northeastern states.
The relationship between economic growth and its determinants has been studied
extensively. One issue is whether population is driving employment changes or
employment is driving population changes (do ‘jobs follow people’ or ‘people follow
jobs’?). Empirical studies on identification of the direction of causality have resulted
in empirical models of regional development that often reflect the interdependence
between household residential choices and firm location choices (Steinnes and Fisher
1974). To account for this causation and interdependency, Carlino and Mills (1987)
constructed a simultaneous system model with two partial location equations as its
components. They used data for counties in the contiguous United States. The empirical
result from their study of greatest interest to us is the finding that in the 1970s family
income had a strong impact on the growth of population density as well as employment
density. Recently, Deller et al. (2001) expanded the original Carlino-Mills model and
presented a three-dimensional model (jobs-people-income) that explicitly traces job
quality and the role of income in the regional growth process. They also used county
data, but restricted themselves to non-metropolitan counties; the time period studied
was 1985–1995. Their empirical results indicate that initial conditions co-determine
the eventual outcome and that counties with higher initial population levels tended to
have higher employment growth. However, counties that had higher levels of population,
employment, and per capita income in 1985 tended to have lower rates of overall
growth.
There have also been efforts to model the interactions between employment growth
and human migration (Clark and Murphy 1996; MacDonald 1992), per capita personal
income and public expenditures (Duffy-Deno and Eberts 1991), and net migration,
employment growth, and average per capita income (Greenwood and Hunt
1984; Greenwood et al. 1986; Lewis et al. 2002) in simultaneous-equations models.
Among these contributions, Clark and Murphy’s (1996) findings have been particularly
influential. Their empirical analysis covered the period 1981–1989 and was conducted
at the county level. They expanded the Carlino-Mills model by including amenity measures
beyond climate (temperature), neighborhood poverty, and fiscal variables. Their
results are consistent with those of Carlino and Mills (1987) and, specifically, they
find simultaneity between employment density and population density.
The focus of this empirical analysis is Appalachia, a region that is for many a symbol
of poverty and underdevelopment in the midst of prosperity (Pollard 2003). It is a
region of about 23 million people. Forty-two percent of the population is rural, compared
to 20% for the nation as a whole. Many parts of the region can also be considered
remote because of topography and a comparatively poor transportation infrastructure.
Appalachia also constitutes a separate policy region, with programs administered by
the Appalachian Regional Commission. The unit of analysis is the county, so that we
can trace local economic development in terms of employment and income growth
data, respectively. The time period considered is 1990–2000. This was a decade of
economic growth and expansion in most of the United States. It is of interest to study
if and/or how the boom of the 1990s impacted Appalachian counties.
Like the studies mentioned above, this article examines the determinants of regional
variations in employment and household income growth rates using county data.
Its novel contribution lies in a methodological innovation. Specifically, the model
introduces both spatial lag and spatial error dependence into a simultaneous equation
model and obtains estimation results using Generalized Spatial 3 Stage Least Squares
(GS3SLS). This has not been previously done and yields more efficient and consistent
estimates. The estimation strategy is discussed in the estimation issue section.
2 Method of analysis
Interdependence between employment and income exists because both households and
firms are mobile and locate to maximize utility and profits, respectively. Households
migrate if they can capture better income opportunities than those available at their
current location and firms move to be near growing markets. The location decisions
of firms are also expected to be influenced by factors such as local business climate,
labor costs, tax rates, local public services and the supply of inputs. In addition,
government-provided incentives may influence where firms locate. Such regional
factors that affect households’ and firms’ decision making are also likely to exhibit
spatial autocorrelation (Anselin 1988, 2003). These assumptions are expressed as
three hypotheses to be tested: (1) Employment growth and median household income
growth are interdependent and jointly determined by regional variables; (2) Employment
growth and median household income growth in a county are conditional upon
initial conditions of that county; and (3) Employment growth and median household
income growth in a county are conditional upon business and median household
income growth in neighboring counties. Emphasis is put on determining the linkages
between employment growth and household median income, as well as on examining
the elasticity of these variables with respect to each of the regional variables.
To test the three hypotheses, a spatial simultaneous equations model of business
growth and household median income is used. Following Carlino and Mills (1987) and
building on Boarnet (1994), a model that incorporates own-county and neighboring
counties effects is specified as follows in matrix notation:
$$EMP^*_{it} = f_1\left(MHY^*_t,\; WMHY^*_t,\; WEMP^*_t,\; X^{em}_{t-1}\right) \tag{1a}$$
$$MHY^*_{it} = f_2\left(EMP^*_t,\; WEMP^*_t,\; WMHY^*_t,\; X^{mh}_{t-1}\right) \tag{1b}$$
$EMP^*_t$ and $MHY^*_t$ are the equilibrium levels of private non-farm employment and
median household income, respectively, and $t$ denotes time. $W$ is a row-standardized
spatial weights matrix with typical element $w_{ij}$. Each element $w_{ij}$ represents a measure
of proximity between location $i$ and location $j$. We define the adjacency criterion
such that $w_{ij}$ equals $1/n_i$, where $n_i$ is the number of nonzero elements in the $i$th row of $W$.
The row element is nonzero if locations $i$ and $j$ are adjacent and 0 otherwise. $WEMP^*_t$
and $WMHY^*_t$ represent the equilibrium values of neighboring counties' effects for
private non-farm employment and median household income, respectively. They are
obtained by multiplying $EMP^*_t$ and $MHY^*_t$, respectively, by $W$. The matrices of
additional exogenous variables in the respective equations of the system of spatial
simultaneous equations are given by $X^{em}_{t-1}$ and $X^{mh}_{t-1}$, respectively. The descriptions
of these variables are given in the data section below. Note that equilibrium levels
of private non-farm employment and median household income are assumed to be
functions of the equilibrium values of the respective right-hand endogenous variables,
their spatial lags and the vectors of the additional exogenous variables.
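As a rough illustration of the weights matrix described above, the following sketch builds a row-standardized W from a binary adjacency matrix and uses it to form a spatial lag. The four-county adjacency pattern, the employment figures, and the helper name `row_standardize` are all hypothetical and are not taken from the paper's data.

```python
import numpy as np

def row_standardize(adjacency: np.ndarray) -> np.ndarray:
    """Turn a binary adjacency matrix into a row-standardized weights matrix W."""
    adjacency = np.asarray(adjacency, dtype=float)
    n_i = adjacency.sum(axis=1, keepdims=True)  # number of neighbors of each unit
    # Units without neighbors keep an all-zero row instead of dividing by zero.
    return np.divide(adjacency, n_i, out=np.zeros_like(adjacency), where=n_i > 0)

# Hypothetical four-county example: counties 0-1, 1-2, 1-3 and 2-3 share a border.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
])
W = row_standardize(A)

emp = np.array([1200.0, 5400.0, 800.0, 2100.0])  # made-up employment levels
w_emp = W @ emp  # spatial lag: average employment in adjacent counties
```

Because each nonzero row of W sums to one, the spatial lag of a variable is simply the mean of that variable over the adjacent counties.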
The system of equations in (1a, b) captures the simultaneous nature of the interactions
between employment growth and median household income at equilibrium. The
nature of interaction among the endogenous variables depends on the initial conditions
in a county.
Based on the result of a generalized PE-test, a multiplicative log-linear form of the
model was used. The model specification is discussed in greater detail in the section
“Estimation Issues.” The chosen functional form implies constant elasticity for the
equilibrium conditions given in (1a,b). A log-linear (i.e., log-log) representation of
the equilibrium conditions can thus be expressed as:
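$$EMP^*_t = \left(MHY^*_t\right)^{a_1} \times \left(WEMP^*_t\right)^{b_1} \times \left(WMHY^*_t\right)^{c_1} \times \prod_{k=1}^{K_1} \left(X^{em}_{kt-1}\right)^{x_{1k}} \tag{2a}$$
$$MHY^*_t = \left(EMP^*_t\right)^{a_2} \times \left(WMHY^*_t\right)^{b_2} \times \left(WEMP^*_t\right)^{c_2} \times \prod_{k=1}^{K_2} \left(X^{mh}_{kt-1}\right)^{x_{2k}} \tag{2b}$$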
where $a_i$, $b_i$ and $c_i$, $i = 1, 2$, are the exponents on the endogenous variables and their
spatial lags, $x_{ik}$, for $i = 1, 2$, are vectors of exponents on the exogenous variables,
$\prod$ is the product operator, and $K_i$ for $i = 1, 2$ is the number of exogenous
variables in the private non-farm employment and median household income
equations, respectively. The log-linear specification has the advantage of yielding a
log-linear reduced form for estimation, where the estimated coefficients represent elasticities.
Duffy-Deno (1998) and Mackinnon et al. (1983) also show that, compared to a
linear specification, a log-linear specification is more appropriate for models involving
population and employment densities.
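Taking natural logarithms of (2a, b) makes the constant-elasticity interpretation explicit. The restatement below only illustrates that step; it writes $c_2$ and $X^{mh}$ in the second equation, on the assumption that the equilibrium conditions are meant to match the coefficients that appear in (4b):

$$\ln EMP^*_t = a_1 \ln MHY^*_t + b_1 \ln WEMP^*_t + c_1 \ln WMHY^*_t + \sum_{k=1}^{K_1} x_{1k} \ln X^{em}_{kt-1}$$
$$\ln MHY^*_t = a_2 \ln EMP^*_t + b_2 \ln WMHY^*_t + c_2 \ln WEMP^*_t + \sum_{k=1}^{K_2} x_{2k} \ln X^{mh}_{kt-1}$$

Each exponent is then the elasticity of the equilibrium level with respect to the corresponding variable; for example, $a_1 = \partial \ln EMP^*_t / \partial \ln MHY^*_t$.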
Previous empirical studies suggest that employment and median household income
likely adjust to their equilibrium levels with a substantial lag (Aronsson et al. 2001;
Barkley et al. 1998; Boarnet 1994; Carlino and Mills 1987; Deller et al. 2001; Duffy
1994; Duffy-Deno 1998; Edmiston 2004; Hamalainen and Bockerman 2004; Henry
et al. 1999, 1997; Mills and Price 1984). Therefore, based on these studies, a distributed
lag adjustment is introduced and the corresponding partial-adjustment process
for Eqs. (1a,b) takes the form:
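$$\frac{EMP_t}{EMP_{t-1}} = \left(\frac{EMP^*_t}{EMP_{t-1}}\right)^{\eta_{em}} \;\rightarrow\; \ln(EMP_t) - \ln(EMP_{t-1}) = \eta_{em} \ln(EMP^*_t) - \eta_{em} \ln(EMP_{t-1}) \tag{3a}$$
$$\frac{MHY_t}{MHY_{t-1}} = \left(\frac{MHY^*_t}{MHY_{t-1}}\right)^{\eta_{mh}} \;\rightarrow\; \ln(MHY_t) - \ln(MHY_{t-1}) = \eta_{mh} \ln(MHY^*_t) - \eta_{mh} \ln(MHY_{t-1}) \tag{3b}$$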
The subscript t − 1 refers to the variable lagged one period, one decade in this study,
and ηem and ηmh are parameters representing the speed of adjustment of employment
and median household income to their respective equilibrium levels. They are interpreted
as the proportions of the respective equilibrium rate of growth that were realized
in each period. If both ηem and ηmh are less than one, then the system is stable and
guaranteed to converge.
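The partial-adjustment mechanism in (3a, b) can be illustrated with a small simulation. This is a minimal sketch under a simplifying assumption that the paper does not make, namely that the equilibrium level stays fixed over time; the function name `adjust_path` and all numbers are hypothetical.

```python
import math

def adjust_path(initial: float, equilibrium: float, eta: float, periods: int) -> list[float]:
    """Iterate ln(y_t) - ln(y_{t-1}) = eta * (ln(y*) - ln(y_{t-1})) for a fixed y*."""
    log_y = math.log(initial)
    log_target = math.log(equilibrium)
    path = [initial]
    for _ in range(periods):
        log_y += eta * (log_target - log_y)  # close a share eta of the remaining log gap
        path.append(math.exp(log_y))
    return path

# Made-up numbers: employment starts at 1,000 with a fixed equilibrium of 2,000.
path = adjust_path(initial=1000.0, equilibrium=2000.0, eta=0.4, periods=5)

# Remaining log gap to equilibrium after each period.
gaps = [math.log(2000.0) - math.log(y) for y in path]
```

With a speed of adjustment of 0.4, the remaining log gap shrinks by the factor 0.6 every period, so the path approaches the equilibrium level without overshooting.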
The existence of spatial autocorrelation in the errors is tested by means of a Global
Moran’s I test statistic, as suggested by Anselin and Kelejian (1997) for models with
endogenous regressors. A more general version of Moran’s I test statistic and its
asymptotic distribution is given by Kelejian and Prucha (2001). The results of the test
(Table 2) indicate the existence of spatial autocorrelation in the errors of all equations
in (3a, b). Therefore, we need a model that accounts for this spatial effect. We achieve
this by substituting Eqs. (2a, b) into Eqs. (3a, b). Eliminating the unknown equilibrium
values and simplifying the model yields the following system:
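$$\begin{aligned}
EMPR_t = {} & \alpha_1 + \frac{\eta_{em} a_1}{\eta_{mh}} MHYR_t + \frac{\eta_{em} b_1}{\eta_{em}} WEMPR_t + \frac{\eta_{em} c_1}{\eta_{mh}} WMHYR_t \\
& + \eta_{em} a_1 \ln(MHY_{t-1}) + \eta_{em} b_1 \ln(WEMP_{t-1}) + \eta_{em} c_1 \ln(WMHY_{t-1}) \\
& + \sum_{k=1}^{K_1} \eta_{em} x_{1k} \ln X^{em}_{kt-1} - \eta_{em} \ln(EMP_{t-1}) + u^{em}_t
\end{aligned} \tag{4a}$$
$$\begin{aligned}
MHYR_t = {} & \alpha_2 + \frac{\eta_{mh} a_2}{\eta_{em}} EMPR_t + \frac{\eta_{mh} b_2}{\eta_{mh}} WMHYR_t + \frac{\eta_{mh} c_2}{\eta_{em}} WEMPR_t \\
& + \eta_{mh} a_2 \ln(EMP_{t-1}) + \eta_{mh} c_2 \ln(WEMP_{t-1}) + \eta_{mh} b_2 \ln(WMHY_{t-1}) \\
& + \sum_{k=1}^{K_2} \eta_{mh} x_{2k} \ln X^{mh}_{kt-1} - \eta_{mh} \ln(MHY_{t-1}) + u^{mh}_t
\end{aligned} \tag{4b}$$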
$EMPR_t$ and $MHYR_t$ are the log differences between the end and beginning period
values of private non-farm employment and median household income, respectively,
and denote the growth rates of the respective variables. $\alpha_r$ and $\rho_r$, for $r = 1, 2$, are
unobserved parameters. $u^{em}_t$ and $u^{mh}_t$ are $n \times 1$ vectors of disturbances. Note that the
disturbance vector in the $r$th equation is generated as:
$$u_{t,r} = \rho_r W u_{t,r} + \varepsilon_{t,r}, \quad r = 1, 2$$
This specification relates the disturbance vector in the $r$th equation to its own spatial
lag. The vectors of innovations ($\varepsilon_{t,r}$, $r = 1, 2$, or $\varepsilon^{em}_t$ and $\varepsilon^{mh}_t$) are distributed identically
and independently with zero mean and variance-covariance $\sigma^2_r$, for $r = 1, 2$.
Hence, they are not spatially correlated. The specification of the model, however, allows
for innovations that correspond to the same cross sectional unit to be correlated across
equations. As a result, the vectors of disturbances are spatially correlated across units
and across equations.
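To make the disturbance process concrete, the sketch below simulates a single equation's spatially autoregressive disturbances and computes a plain Moran's I on them. It is only an illustration under stated assumptions: the units sit on a hypothetical ring with two neighbors each, the value 0.8 for the autoregressive parameter is made up, cross-equation correlation is ignored, and the statistic is the textbook Moran's I rather than the version for models with endogenous regressors that the paper relies on.

```python
import numpy as np

def ring_weights(n: int) -> np.ndarray:
    """Row-standardized weights for n units on a circle, each with two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def sar_disturbances(W: np.ndarray, rho: float, eps: np.ndarray) -> np.ndarray:
    """Solve u = rho * W u + eps, i.e. u = (I - rho W)^{-1} eps."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, eps)

def morans_i(x: np.ndarray, W: np.ndarray) -> float:
    """Plain Moran's I for a row-standardized W (point estimate only, no inference)."""
    z = x - x.mean()
    return float(z @ W @ z / (z @ z))

rng = np.random.default_rng(0)
n = 400
W = ring_weights(n)
eps = rng.standard_normal(n)               # i.i.d. innovations, not spatially correlated
u = sar_disturbances(W, rho=0.8, eps=eps)  # spatially correlated disturbances

i_eps = morans_i(eps, W)  # innovations: expected to be near zero
i_u = morans_i(u, W)      # disturbances: expected to be clearly positive
```

The innovations themselves show little spatial pattern, while the disturbances built from them inherit positive spatial autocorrelation through the term involving W.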
Equations (4a, b) constitute a system of simultaneous equations with feedback
simultaneity, spatial autoregressive lag simultaneity, spatial cross-regressive lag simultaneity,
and spatial autoregressive disturbances. The endogenous variables of the model
are EMPRt and MHYRt . If each equation is investigated separately, we notice that
each of these variables is expressed in terms of the right hand endogenous variables
and their spatial lags, the logs of the lagged endogenous variables and their spatial
lags, and the logs of other exogenous variables. By structure, the spatial lags of the
lagged endogenous variables are, however, included in the spatial lags of the respective
endogenous variables. Hence, in order to avoid multicollinearity, the model is
estimated by excluding all the spatial lags of the lagged endogenous variables.
3 Data types and sources
The data for the 417 Appalachian counties used for the empirical analysis were collected
and compiled from County Business Patterns, Bureau of Economic Analysis,
Bureau of Labor Statistics, Current Population Survey Reports, County and City Data
Book, US Census of Population and Housing, US Small Business Administration,
and Department of Employment Security. Data for county employment and county
median household income were collected for 1990 and 2000.
3.1 Dependent variables
The dependent variables used in the empirical analysis include the growth rate of
employment and the growth rate of median household income.
3.1.1 Growth rate of employment (EMPR)
The growth rate of employment is measured by the log-difference between the 2000
and the 1990 levels of private non-farm employment, exclusive of self-employment.
Empirical research indicates that in the study period most new jobs were generated
by new small businesses (Acs and Audretsch 2001; Audretsch et al. 2000; Carree and
Thurik 1998, 1999; Wennekers and Thurik 1999; Fritsch and Falck 2003). Research
by the US Small Business Administration also shows that job creation capacity in the
US is inversely related to the size of the business. Between 1991 and 1995, for example,
enterprises employing fewer than 500 people created new jobs as follows (size
of enterprise in parenthesis): 3.843 million (1–4), 3.446 million (5–19), 2.546 million
(20–99), and 1.011 million (100–499). During the same period, enterprises employing
500 or more people lost 3.182 million net jobs (US Small Business Administration
(SBA) 1999).
3.1.2 Growth rate of median household income (MHYR)
The log-difference between the 2000 and 1990 levels of median household income in a
given county is used to measure the growth rate of median household income. Median
household income is used as an average overall measure of county-level income.
Median household income is preferable to using the mean household income because
unlike the mean, the median is not influenced by the presence of a few extreme values.
The spatial lags of the Growth Rate of Employment (WEMPR) and Growth Rate
of Median Household Income (WMHYR) are included on the right hand side of each
equation of (4a, b). These spatially lagged endogenous variables are created by multiplying
each of the dependent variables by a row standardized queen-based contiguity
spatial weights matrix W.
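The construction of the dependent variables and their spatial lags can be sketched as follows. Real counties are irregular polygons, so the regular grid used here is only a stand-in for queen contiguity (neighbors share an edge or a corner); the helper `queen_weights` and the employment figures are hypothetical.

```python
import numpy as np

def queen_weights(rows: int, cols: int) -> np.ndarray:
    """Row-standardized queen contiguity for the cells of a rows x cols grid.

    Two cells are neighbors if they share an edge or a corner.
    Assumes the grid has at least two cells, so every cell has a neighbor.
    """
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                        W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

# Made-up employment levels for a 2 x 3 grid of hypothetical counties.
emp_1990 = np.array([100.0, 200.0, 400.0, 150.0, 300.0, 250.0])
emp_2000 = np.array([110.0, 260.0, 380.0, 150.0, 360.0, 300.0])

empr = np.log(emp_2000) - np.log(emp_1990)  # EMPR: log-difference growth rate
W = queen_weights(2, 3)
wempr = W @ empr                            # WEMPR: mean growth rate of neighbors
```

The same two steps applied to median household income would give MHYR and WMHYR.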
3.2 Independent variables
The independent variables include demographic, human capital, labor market, housing,
industry structure, and amenity and policy variables. In line with the literature,
unless otherwise indicated, the initial values of the independent variable are used in
the analysis. This type of formulation also reduces the problem of endogeneity. All
the independent variables are in log form except those that can take negative or zero
values. The descriptions of each of the independent variables of the models are given
below.
Equation (4a) includes a vector of control variables ($X^{em}_{kt-1}$) for $k = 1, \ldots, K_1$,
which includes human capital, agglomeration effects, unemployment, and other
regional socio-economic variables that are assumed to influence county employment
growth (business growth) rate. Human capital is measured as the percentage of adults
(over 25 years old) with college degrees and above (POPCD), and the percentage of
adults (over 25 years old) with high school diploma (POPHD). It is expected that educational
attainment is positively associated with employment growth. To control for
agglomeration effects from both the supply and demand sides, county population size
(POPs) and the percentage of the population between 25 and 44 years of age (POP25-44)
are included, and agglomeration effects are expected to have a positive impact
on employment growth. The county unemployment rate (UNEMP) is included as a
measure of local economic distress. Although a high county unemployment rate is
normally associated with a poor economic environment, it may provide an incentive
for individuals to form new businesses that can employ not only the owners, but also
others. Thus, we do not know a priori whether the impact of UNEMP on employment
growth is positive or negative. Establishment density (ESBd), which is the total number
of private sector establishments in the county, divided by the county’s population,
is included to capture the degree of competition among firms and crowding of businesses
relative to the population. The coefficient of ESBd is expected to be negative.
Vector $X^{em}_{kt-1}$ also includes OWHU (owner occupied housing) to capture the effects
of the availability of resources to finance businesses and create jobs on employment
growth in the county. The percentage of owner-occupied dwellings is expected to be
positively associated with employment growth in the county. Also included in $X^{em}_{kt-1}$
are property tax per capita (PCPTAX), percentage of private employment in manufacturing
(MANU), percentage of private employment in wholesale and retail trade
(WHRT), natural amenities index (NAIX), highway density (HWD), gross in-migration
(INM), gross out-migration (OTM), median household income (MHY), and direct local
government expenditures per capita (GEX). Since the percentage of the populations
between 5 and 17 years of age (POP5-17) and above 65 years of age (POP > 65)
do not constitute the prime working age of the population, they are not included in
Eq. (4a). Direct federal expenditures and grants per capita (DFEG) in Appalachia have
been mainly income support in the form of Food Stamps, Social Security Disability
Insurance (SSDI), Temporary Assistance for Needy Families (TANF), and Supplemental
Security Income (SSI) and hence not directly related to employment creation
(Black and Sanders 2004). Homeownership (OWHU) and the social capital index
(SCIX) are highly correlated. In order to avoid the problem of multicollinearity, SCIX
is not included in Eq. (4a). SCIX is a county-level index that incorporates associational
density of associations such as civic groups, religious organizations, sport clubs, labor
unions, political and business organizations, percentage of voters who vote for presidential
elections, county-level response rate to the Census Bureau’s decennial census,
and the number of tax-exempt non-profit organizations (Rupasingha et al. 2006).
We also use the natural amenity index created by McGranahan (1999) from standardized
mean values of climate measures (January temperature, January days of sun,
July temperature, and July humidity), topographic variation and water area as proportion
of county area (see http://www.ers.usda.gov/Data/NaturalAmenities/natamenf.xls).
Note that since both SCIX and NAIX are indices of many exogenous variables,
they will constitute important parts of the instrument matrix that will be used to identify
the endogenous variables of the system.
Equation (4b) contains a vector of exogenous variables ($X^{mh}_{kt-1}$, $k = 1, \ldots, K_2$),
which includes, among others, POPs, POPd, FHHF, POPHD, UNEMP, MANU,
WHRT, and Social Capital Index (SCIX).
The initial levels of employment ($EMP_{t-1}$) and median household income
($MHY_{t-1}$) are also included in the respective equations of (4a, b). These variables
are treated as predetermined variables because their values are given at the
beginning of each period and hence are not affected by the endogenous variables.
Table 1 provides the full list of the endogenous, and of the spatial lag and control
variables, their descriptions and the sources of the data.
4 Estimation issues
Equations (4a, b) constitute a model with feedback simultaneity, spatial autoregressive
lag simultaneity, and spatial cross-regressive lag simultaneity with spatially autoregressive
disturbances. This creates complications, of which the choice of the functional
form of each equation, whether or not each equation is identified, and the choice of
the estimator and instruments are the most important ones.
Concerning the functional form, a generalized PE test was performed (Kmenta
1986, pp. 521–522; Mackinnon et al. 1983) to determine whether a linear or log-linear
specification is most appropriate. The test indicates that the log-linear specification is
preferred to the linear form for all equations. Thus, the model is specified in log-linear
form with two modifications involving the measurement of the explanatory variables.
First, the natural log formulation is dropped for explanatory variables that can assume
negative or zero values. Second, lagged 1990 values are used for all explanatory variables
to avoid simultaneity bias.
Concerning identification, first, for each equation, the number of basic endogenous
variables that appear on the right hand side is smaller than the number of control variables
that appear in the model but not in that equation. Second, in those cases where
there are more instruments than needed to identify an equation, a test statistic¹ was
computed (Hausman 1983) to investigate whether the additional instruments are valid
in the sense that they are uncorrelated with the error term. That is, $E(Q'u_r) = 0$, where
$Q$ is an instrument matrix as defined below. Fulfillment of this condition ensures that
the instrument $Q$ allows us to identify the regression parameters $[\alpha', \beta', \lambda', \gamma']$ of
Eqs. (4a, b), where $\alpha$ is a vector of slope coefficients and $\beta$, $\lambda$, $\gamma$ are vectors of
coefficients of the right-hand dependent variables, the spatial lag variables, and the
predetermined variables, respectively.
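The over-identification statistic described in footnote 1 can be sketched on simulated data. The example below is a hedged illustration, not the paper's procedure: ordinary 2SLS stands in for GS2SLS, the spatial structure is ignored, and the data-generating process, variable names, and the helper `overid_statistic` are all hypothetical.

```python
import numpy as np

def ols_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def overid_statistic(y, X_endog, X_exog, Z_excluded):
    """Return n * R^2 from regressing 2SLS residuals on all instruments, and its df."""
    n = y.shape[0]
    Q = np.column_stack([X_exog, Z_excluded])    # all instruments (included and excluded)
    X = np.column_stack([X_endog, X_exog])       # structural regressors
    X_hat = Q @ ols_fit(Q, X)                    # first stage: project X on Q
    beta = ols_fit(X_hat, y)                     # second stage
    resid = y - X @ beta                         # residuals use the actual regressors
    fitted = Q @ ols_fit(Q, resid)               # regress residuals on all instruments
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    df = Z_excluded.shape[1] - X_endog.shape[1]  # number of over-identifying restrictions
    return n * r2, df

rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal((n, 2))                  # two excluded instruments
common = rng.standard_normal(n)                  # shared shock that makes x endogenous
x = z @ np.array([1.0, -0.5]) + common + rng.standard_normal(n)
y = 1.0 + 2.0 * x + common + rng.standard_normal(n)
const = np.ones((n, 1))                          # the constant is an included instrument

stat, df = overid_statistic(y, x.reshape(-1, 1), const, z)
# Compare stat with a chi-squared critical value with df degrees of freedom
# (3.84 at the 5% level when df = 1).
```

A value of the statistic above the critical value would cast doubt on the validity of at least one of the additional instruments.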
As to the choice of estimator, the Method of Moments is preferred over the
Maximum Likelihood approach because the latterwould involve significant additional
computational complexity.2 The conventional three-stage least squares estimation to
1 This test statistic is nR2
u , where n is the sample size and R2
u is the usual R-squared of the regression of
residuals from the second-stage equation on all included and excluded instruments. In other words, estimate
Eqs. (4a, b) by GS2SLS or any efficient limited-information estimator and obtain the resulting residuals,
ˆ ur . Then, regress these on all instruments and calculate nR2
u . The statistic has a limiting chi-squared distribution
with degree of freedom equal to the number of over-identifying restrictions, under the assumed
specification of the model.
2 In theMaximum Likelihood approach, the probability of the joint distribution of all observations is maximized
with respect to a number of parameters. This involves the calculation of the Jacobian that appears in
the log-likelihood function, which is computationally challenging. The complexity becomes overwhelming
if the sample size is large, which applies in our case, and if the spatial weights matrices are not symmetric,
which also applies in our case, even if the sample size is moderate (Kelejian and Prucha 1999, 1998).
We also do not expect the error terms in our model to be normally distributed, which is required for the
Maximum Likelihood procedure.
123
Analysis of county employment and income growth in Appalachia 33
Table 1 Descriptive statistics

| Variable code | Variable description | Mean | SD | Minimum | Maximum |
|---|---|---|---|---|---|
| Constant | | 1.00 | 0.00 | 1.00 | 1.00 |
| EMPR | Employment Growth Rate 1990–2000 | 0.17 | 0.25 | −0.69 | 1.79 |
| MHYR | Median Household Income Growth Rate 1990–2000 | 0.48 | 0.31 | −0.49 | 1.40 |
| WEMPR | Spatial Lag of EMPR | 0.18 | 0.14 | −0.18 | 0.81 |
| WMHYR | Spatial Lag of MHYR | 0.47 | 0.19 | −0.11 | 1.02 |
| POPs | Population, 1990 | 10.30 | 0.94 | 7.88 | 14.11 |
| POPd | Population Density, 1990 | 4.28 | 0.90 | 1.85 | 7.75 |
| POP5-17 | Percent of Population between 5–17 Years, 1990 | 2.92 | 0.12 | 2.17 | 3.22 |
| POP25-44 | Percent of Population between 25–44 Years Old, 1990 | 3.38 | 0.08 | 2.79 | 3.74 |
| POP > 65 | Percent of Population above 65 Years Old, 1990 | 2.60 | 0.20 | 1.55 | 3.20 |
| FHHF | Percent of Female Householder, Family Householder, 1990 | 2.32 | 0.20 | 1.81 | 3.19 |
| POPHD | Persons 25 Years and over, % High School only, 1990 | 4.10 | 0.17 | 3.57 | 4.47 |
| POPCD | Persons 25 Years and over, % Bachelor’s Degree or above, 1990 | 2.27 | 0.41 | 1.31 | 3.73 |
| OWHU | Owner-Occupied Housing Unit in Percent, 1990 | 4.33 | 0.08 | 3.87 | 4.47 |
| MHV | Median Value of Owner Occupied Housing 1990 | 10.74 | 0.26 | 9.67 | 11.68 |
| UNEMP | Unemployment Rate 1990 | 2.15 | 0.35 | 1.22 | 3.25 |
| AGFF | % Employed in Agriculture, Forestry and Fisheries 1990 | 3.62 | 2.66 | 0.00 | 17.10 |
| MANU | % Employed in Manufacturing 1990 | 3.14 | 0.57 | 0.79 | 3.98 |
| WHRT | % Employed in Wholesale and Retail Trade 1990 | 2.92 | 0.19 | 2.16 | 3.32 |
| FIRE | % Employed Finance, Insurance and Real Estate 1990 | 1.23 | 0.33 | 0.00 | 2.23 |
| HLTH | % Employed Health Service 1990 | 1.95 | 0.34 | 0.74 | 3.44 |
| NAIX | Natural Amenities Index 1990 | 0.14 | 1.16 | −3.72 | 3.55 |
| ESBd | Establishment Density 1990 | 2.93 | 0.34 | 1.87 | 4.09 |
| EFIR | Earnings in Finance Insurance and Real Estate 1990 | 21075.08 | 96011.09 | 0.00 | 1638807.0 |
| CSBD | Commercial and Saving Banks Deposits 1990 | 12.21 | 1.07 | 8.83 | 16.95 |
| DFEG | Direct Federal Expenditure and Grants per Capita 1990 | 7.99 | 0.38 | 6.98 | 10.18 |
| FGCE | Federal Government Civilian Employment per 10,000 Pop. 1990 | 60.48 | 101.03 | 0.00 | 1295.00 |
| PCTAX | Per Capita Local Tax 1990 | 5.91 | 0.53 | 4.51 | 7.42 |
| PCPTAX | Property Tax Per Capita 1990 | 5.52 | 0.62 | 3.91 | 7.36 |
| SCIX | Social Capital Index 1987 | −0.60 | 0.94 | −2.53 | 5.64 |
| HWD | Highway Density 1990 | 0.69 | 0.40 | −0.34 | 2.63 |
| ESBs | Establishment Size 1990 | 2.53 | 0.30 | 1.49 | 3.60 |
| AWSR | Average Annual Wage and Salary Rate 1990 | 9.75 | 0.19 | 9.31 | 10.35 |
| EMP | Employment 1990 | 8.83 | 1.25 | 5.42 | 13.38 |
| INM | In-Migration 1990 | 7.09 | 1.00 | 4.54 | 10.52 |
| OTM | Out-Migration 1990 | 7.04 | 0.97 | 4.50 | 10.55 |
| MHY | Median Household Income 1989 | 9.94 | 0.23 | 9.06 | 10.68 |
| GEX | Direct General Expenditures per Capita 1992 | 7.23 | 0.28 | 6.49 | 8.11 |

All variables are expressed in logs except AGFF, EFIR, FGCE, SCIX, and NAIX

handle the feedback simultaneity is inappropriate, because of the spatial autoregressive
lag and spatial cross-regressive lag simultaneity terms. The Spatial Generalized
Methods of Moments approach used by Rey and Boarnet (2004) in a Monte Carlo
analysis of alternative approaches to modeling spatial simultaneity is also inappropriate,
because the model includes spatially autoregressive disturbances. Therefore,
we use the Generalized Spatial Two-Stage Least Squares (GS2SLS) as suggested by
Kelejian and Prucha (1998, 1999), and the Generalized Spatial Three-Stage Least
Squares (GS3SLS) approach as outlined by Kelejian and Prucha (2004).
The GS2SLS and GS3SLS procedures are carried out in three- and four-step routines,
respectively. The first three steps are common to both routines. In the first step, the
parameter vector $[\alpha', \beta', \lambda', \gamma']$ is estimated by two-stage least squares (2SLS), using
an instrument matrix $Q$ that consists of a subset of the linearly independent columns of
$[X, WX, W^2X]$, where $X$ is the matrix that includes the control variables in the model and
$W$ is a weights matrix. The disturbances for each equation in the model are computed
using the estimates of $[\alpha', \beta', \lambda', \gamma']$ from the first step. In the second step, the estimated
disturbances are used to estimate the autoregressive parameter $\rho$ for each
equation, using the generalized moments procedure of Kelejian and Prucha (2004). In the
third step, a Cochrane–Orcutt-type transformation is performed, using the estimates of
$\rho$ from the second step to account for the spatial autocorrelation in the disturbances.
The GS2SLS estimates of $[\beta', \lambda', \gamma']$ are then obtained by estimating the transformed
model using a subset of the linearly independent columns of $[X, WX, W^2X]$ as the
instrument matrix.
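The Cochrane–Orcutt-type transformation in the third step amounts to replacing each variable $v$ by $v - \hat{\rho} W v$. A minimal sketch of that filter is given below. The four-county ring, the weights matrix, and the data vector are made up for illustration, and the function names are hypothetical. Only the value of $\hat{\rho}$ is taken from Table 3 (EMPR equation).

```python
def spatial_lag(W, v):
    """Return the spatial lag Wv for a weights matrix W (list of rows) and vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def spatial_cochrane_orcutt(W, v, rho):
    """Return v - rho * W v, the spatially filtered version of v."""
    Wv = spatial_lag(W, v)
    return [v_i - rho * wv_i for v_i, wv_i in zip(v, Wv)]


# Hypothetical example: four counties on a ring, each with two neighbors,
# and a row-standardized weights matrix (every row sums to one).
W = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]
y = [1.0, 2.0, 3.0, 4.0]  # made-up observations on a dependent variable
rho_hat = -0.0428         # rho reported for the EMPR equation in Table 3

y_star = spatial_cochrane_orcutt(W, y, rho_hat)
print(y_star)
```

In the full procedure the same filter would be applied to every variable in the equation, including the instruments, before the final 2SLS step; the sketch shows only the transformation itself.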
Although GS2SLS takes the potential spatial correlation into account, it does
not utilize the information available across equations because it does not account for
the potential cross-equation correlation in the innovation vectors
$(\varepsilon^{em}_{it}, \varepsilon^{mh}_{it})$. The correlation coefficient between the GS2SLS residuals
($\varepsilon^{em}_{it}$ and $\varepsilon^{mh}_{it}$) is given in
Table 2. The full system information is utilized by stacking the Cochrane–Orcutt-type
transformed equations (from the third step) in order to estimate them jointly. Thus,
in the fourth step, the GS3SLS estimates of $[\beta', \lambda', \gamma']$
are obtained by estimating this stacked model. The GS3SLS estimator is more efficient
than the GS2SLS estimator. Further, consistent estimates of the covariance matrix
are used to obtain the Feasible Generalized Three-Stage Least Squares (FGS3SLS)
estimators of $[\alpha', \beta', \lambda', \gamma']$.
Table 2 Correlation matrix of the residuals from generalized spatial two-stage least squares (GS2SLS)
estimation of the model

| | Equation 1 | Equation 2 |
|---|---|---|
| Equation 1 | 1.0000 | |
| Equation 2 | −0.3974 | 1.0000 |

5 Discussion and analysis of results
The GS2SLS and GS3SLS parameter estimates of the system represented by
Eqs. (4a, b) are reported in Table 3. These values are consistent with theoretical
expectations and with the results of many other cross-sectional empirical studies (Boarnet
1994; Deller et al. 2001; Henry et al. 1997). The coefficients of the endogenous
variables (EMPR and MHYR) are positive and statistically significant, indicating
strong interdependence between employment and median household income growth
rates. This interdependence is consistent with economic theory and empirical results.
Increases in the demand for goods and services that result from increases in family
median or per capita income are associated with increases in employment (Armington
and Acs 2002), which create opportunities for even more people to work and earn
income. However, the estimated effect of employment growth on median household income
growth (0.6156 in the GS3SLS results of Table 3) is larger than that of median household income
growth on employment growth (0.3735).
In the business employment (EMPR) equation, fifteen of the coefficient estimates
are significantly different from zero at the 10% level or better. The results suggest a
positive and significant parameter estimate for the spatial autoregressive lag variable
(WEMPR). This indicates that employment growth tends to spill over to neighboring
counties. The results also show a negative coefficient for (WEMPR) in the (MHYR)
equation, indicating that employment growth rates in neighboring counties tend to
unfavorably affect median household income growth rates (MHYR) in a given county.
These estimates are important for policy because they indicate that employment growth
in neighboring counties has positive and negative spillover effects on a given county’s
EMPR and MHYR, respectively. Furthermore, the significant spatial lag effects indicate
that EMPR depends not only on characteristics within the county, but also on
those of its neighbors. Hence, empirical work on employment and household income
growth rates should test for spatial effects. Our model specification
incorporates a spatially autoregressive error process in addition to the spatial lags of the
dependent variables. The negative estimate for $\rho_1$ (see Table 3) indicates that random
shocks to EMPR not only affect the county where the shocks originated and its
neighbors, but also create negative shock waves across Appalachia.
To control for agglomeration effects, the model includes population statistics, such
as the initial county population size (POPs) and the percentage of the population between
25 and 44 years old (POP25-44). The results show that both POPs and POP25-44
have positive and significant effects on EMPR, even after accounting for potential spatial
spillover effects. This result is consistent with the literature (Acs and Armington
2004), which indicates that a growing population increases the demand for consumer
goods and services as well as the pool of potential entrepreneurs, both of which encourage
business formation. This result is important from a policy perspective. It indicates
Table 3 Generalized spatial 2SLS (GS2SLS) and full information generalized spatial 3SLS (GS3SLS)
estimation results
Variables GS2SLS GS3SLS
EMPR Equation MHYR Equation EMPR Equation MHYR Equation
Coefficient t-statistic Coefficient t-statistic Coefficient t-statistic Coefficient t-statistic
Constant −7.5180∗∗∗ −4.07 7.7602∗∗∗ 3.95 −8.53228∗∗∗ −5.01698 8.6547∗∗∗ 4.714
EMPR 0.2825 1.66 0.6156∗∗∗ 4.0457
MHYR 0.1685 1.59 0.3735∗∗∗ 3.8956
WEMPR 0.2492∗ 1.94 −0.1423 −0.98 0.2792∗∗ 2.2949 −0.2694∗ −1.7
WMHYR 0.1657 1.44 −0.0559 −0.43 0.1147 1.1999 −0.1063 −0.8495
POPs 0.8367∗∗∗ 4.32 0.0877 0.78 0.7724∗∗∗ 4.3572 −0.0299 −0.2807
POPd −0.0101 −0.3 −0.0123 −0.4054
POP5-17 −0.1566 −0.9 −0.1072 −0.6642
POP25-44 0.2806 1.48 0.3093∗ 1.807
POP > 65 0.1046 0.98 0.1576 1.6024
FHHF −0.0031 −0.03 −0.0034 −0.3856
POPHD −0.1589 −1.03 −0.2439 −1.15 −0.1487 −1.0167 −0.1556 −0.7667
POPCD 0.0561 1 −0.0989 −1.35 0.0789 1.4827 −0.1147 −1.6361
OWHU −0.4079∗ −1.77 −0.368∗ −1.76
MHV −0.0309 −0.32 0.0955 0.76 −0.0483 −0.5198 0.0763 0.6308
UNEMP −0.0825∗∗ −2.05 0.0442 0.79 −0.079∗∗ −2.0599 0.0706 1.3197
AGFF −0.0055 −1.11 0.0025 0.38 −0.006 −1.2612 0.0032 0.5017
MANU 0.0856∗∗ 2.65 −0.0008 −0.02 0.0772∗∗ 2.5484 −0.0324 −0.8124
WHRT 0.3734∗∗∗ 4.5 −0.0727 −0.65 0.3719∗∗∗ 4.7178 −0.1916∗ −1.8012
FIRE 0.0177 0.39 −0.0471 −0.86 0.0282 0.6542 −0.0616 −1.168
HLTH −0.0079 −0.2 0.0297 0.56 −0.0157 −0.4067 0.0277 0.5475
NAIX 0.0072 0.72 −0.0063 −0.47 0.0062 0.645 −0.0064 −0.4944
ESBd 0.7049∗∗∗ 3.82 0.0242 0.27 0.6574∗∗∗ 3.9138 −0.0495 −0.5689
EFIR −1.05216D-08 −0.09 −1.16242D-08 −0.1113
CSBD 0.0406 1.14 0.0304 0.9565
DFEG 0.0002 0.01 −0.0071 −0.1973
FGCE 0.0001 0.6 4.78E-05 0.5158
PCTAX −0.0706 −1.25 −0.062 −1.2314
PCPTAX 0.0108 0.26 0.01095 0.2924
SCIX 0.0439∗ 1.7 0.046∗ 1.974
HWD −0.002 −0.04 −0.0062 −0.1303
ESBs 0.5536∗∗ 2.87 0.5345∗∗∗ 3.0658
AWSR 0.0912 0.94 0.0822 0.9521
EMP −0.8647∗∗∗ −4.7 −0.0223 −0.28 −0.8151∗∗∗ −4.8863 0.0941 1.2818
INM 0.1122 1.38 −0.1245 −1.25 0.1424∗ 1.8427 −0.1792∗ −1.8725
OTM −0.1382 −1.65 0.0693 0.65 −0.1401∗ −1.7571 0.1248 1.215
MHY 0.2334 1.32 −0.7671∗∗∗ −4.35 0.3636∗∗ 2.2161 −0.7976∗∗∗ −4.7331
GEX 0.0608 1.33 0.0684 1.24 0.04105 0.9472 0.0477 0.8971
Rho (ρ) −0.0428 0.1913 −0.0428 0.1913
nR² ∼ χ²(30, 36)a 46.4608 0.02807b 39.1464 0.3305b 46.4608 0.02807b 39.1464 0.3305b
Moran I −0.2058 −5.0284c 0.1336 3.0753c −0.2058 −5.0284c 0.1336 3.0753c
Eta (η) 0.8647 0.7671 0.8151 0.7976
Half-Life (years) 8.47 8.65 8.47 8.65
PE test log log log log
n 417 417 417 417
*, **, and *** denote statistical significance at the 10, 5, and 1% levels, respectively
a 30 and 36 are the degrees of freedom, equal to the number of over-identifying restrictions in the EMPR and MHYR
equations, respectively
b p-values
c Z-values for Moran I
that counties with high population concentration are benefiting from the resulting
agglomerative and spillover effects that lead to localization of economic activities, in
line with Krugman’s (1991a, b) argument on regional spillover effects.
The county unemployment rate (UNEMP) is included among the exogenous variables
to measure local economic distress. The results suggest that a high unemployment
rate is associated with low business growth. This indicates that the poor economic
environment in Appalachia did not provide incentives for individuals to form new
businesses that employ not only the owner, but others. Unemployed individuals may
not have the capital to start a business. Furthermore, a high level of unemployment is
indicative of a relatively low aggregate demand, which also discourages new firm formation.
This result is consistent with the findings of Acs and Armington (2004). They
found that unemployment is negatively associated with new firm formation during
economic growth periods and positively during economic recession periods.
The coefficient of the variable representing the percentage of homes that are owned
by their occupants (OWHU) is negative and statistically significant at the 10%
level. This result indicates that high home ownership is negatively associated with
business formation in Appalachia. This is contrary to the expectation that high home
ownership signals the availability of household assets and is therefore an indicator of
the capacity of potential entrepreneurs to finance new businesses, either by using the
house as collateral for a loan or as an indication of the availability of other personal financial
resources. In Appalachia during the study period, however, home
ownership was positively correlated with the level of economic distress (Pollard 2003):
home ownership was higher in distressed counties (76%) and lowest in attainment
counties (69%). Home ownership was also higher in central Appalachia (76%) than
in the more developed northern or southern sub-regions, and Appalachian non-metro
areas had higher ownership rates (76%) than metro areas (72%). Thus, the result
indicates that home ownership is not a good indicator of the availability of resources
to start a new business, at least in Appalachia.
The coefficients for MANU and WHRT are positive and significant at the 5 and 1%
levels, respectively. These results indicate that counties with a higher initial percentage
of their labor force employed in manufacturing and the wholesale and retail trade
showed higher growth rates in business than other counties.
The percentage of people employed in manufacturing (MANU) and the percentage
of people employed in wholesale and retail trade (WHRT) are included in the
EMPR equation to control for the influence of sectoral employment concentration on
the overall employment growth rate. The coefficient on MANU is positive and statistically
significant at the 5% level, indicating a direct relationship between growth
in overall employment and the share of manufacturing employment at the beginning of the
period. The coefficient on WHRT is also positive and significant at the 1% level,
indicating the positive role played by the service sector in expanding employment in
Appalachia during the study period. Thus, these results suggest that
Appalachian counties that had a higher proportion of their labor force employed in
manufacturing and wholesale and retail trade at the beginning of the period experienced
higher growth rates in overall employment. This seems realistic since Appalachia has
experienced a shift from resource-based economic activities to manufacturing and,
particularly, to services. The coefficient on WHRT is larger and more significant
than the coefficient on MANU in the EMPR equation, indicating that the contribution
of WHRT to overall employment growth was higher and more sustained than that of
MANU.
Establishment density (ESBd), defined as the total number of private sector establishments
in the county divided by the county’s population, is included in the model
to capture the degree of competition among firms and the concentration of businesses
relative to the population density. The average size of establishment (ESBs), defined
as total private sector employment divided by the number of private establishments
in the county, is also included to capture the effects of barriers to entry of new small
firms on employment growth. The coefficient for ESBd is positive and statistically
significant at the 1% level, indicating that the Appalachian region is far below the
threshold where competition among firms for consumer demands crowds businesses.
According to the results, a high ESBd is associated with growth in employment (business
growth), indicating that firms tend to locate near each other, possibly due to
localization and agglomeration economies of scale. The coefficient for ESBs is also
positive and significant, indicating the existence of low barriers to new firm formation
and employment generation in Appalachia during the study period.
The results indicate that county employment growth depends on gross
in-migration, gross out-migration, and median household income. The coefficient for
INM, for example, is positive and that for OTM is negative, both statistically significant
at the 10% level in the GS3SLS estimates (Table 3). These are consistent with theoretical
expectations and empirical findings (Borts and Stein 1964). In-migration tends
to shift both the labor supply and labor demand curves rightward, and out-migration
tends to shift the curves leftward. Thus, in-migration leads to increases in
employment, whereas out-migration leads to decreases in employment. A growing
population increases the demand for consumer goods and services and is positively
related to business formation (Acs and Armington 2004).
Consistent with theoretical expectations and empirical findings, the coefficient for
MHY is positive and statistically significant at the 5% level. Increases in the demand
for goods and services that result from increases in family median or per capita income
are associated with increases in employment (Armington and Acs 2002).
An interesting observation from the empirical results pertains to the role of local
government in employment growth. The model predicts that local governments,
through their spending and taxation functions, play critical roles in creating
enabling economic environments for businesses to prosper. The empirical results, however,
indicate that local governments have not played significant roles in employment
growth in Appalachia. Given the economic hardship and high level of underdevelopment
in Appalachia, these results are indications that local governments may need to
reassess or step up their efforts to create incentives for employment growth in this
region.
The elasticity of EMPR with respect to the initial employment level (EMP) is negative
and statistically significant, indicating convergence in the sense that counties with
low levels of employment at the beginning of the period (1990) tend to show a higher
rate of business growth than counties with high initial levels of employment, conditional
on the other explanatory variables. This result is consistent with prior studies on
rural renaissance (Deller et al. 2001; Lundberg 2003). The speed of adjustment, $\eta_{em}$,
is calculated at 0.8151, which indicates that just over 81% of the adjustment toward the
equilibrium employment growth rate was realized during the period 1990–2000.
That is 8.151% annually, giving a half-life of 8.47 years.
The parameter estimates for the MHYR equation also show a positive estimate
for $\rho_2$. This indicates that random shocks to MHYR not
only affect the county where the shocks originate and its neighbors, but also create positive
spillover effects across Appalachia. The elasticity of MHYR with respect to the initial
median household income (MHY) is negative and statistically significant, indicating
convergence in the sense that counties with low median household incomes at the
beginning of the period (1990) tend to show higher rates of growth of median household
incomes than counties with high initial median household incomes, everything
else being equal. The speed of adjustment, $\eta_{mh}$, is calculated at 0.7976, which indicates
that about 80% of the adjustment toward the equilibrium median household income
growth rate was realized during the period 1990–2000. That is 7.976% annually, giving
a half-life of 8.65 years. This result is comparable to the speed of convergence
estimates obtained by Higgins et al. (2006) and Young et al. (2008).
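The half-life figures can be checked with a short calculation. The sketch below assumes that the half-life is computed as ln 2 divided by the annual adjustment rate (the ten-year speed of adjustment divided by ten). The text does not state the formula, so this is an inference from the reported numbers; the function names are hypothetical. Under that assumption, the reported values of 8.47 and 8.65 years match the version that rounds ln 2 to 0.69, while the exact logarithm gives slightly longer half-lives.

```python
import math


def annual_rate(eta, years=10):
    """Average annual adjustment rate implied by a speed of adjustment over `years`."""
    return eta / years


def half_life(rate, ln2=math.log(2)):
    """Years needed to close half of the gap at a constant annual rate."""
    return ln2 / rate


# Speeds of adjustment reported in the text (GS3SLS estimates)
for label, eta in [("EMPR", 0.8151), ("MHYR", 0.7976)]:
    r = annual_rate(eta)
    exact = half_life(r)              # uses ln 2 = 0.6931...
    rounded = half_life(r, ln2=0.69)  # uses the rounded value 0.69
    print(label, round(exact, 2), round(rounded, 2))
```

The gap between the two versions is about 0.04 years, so the choice of rounding does not affect the interpretation.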
The effect of out-migration on the growth rate of median household income is negative
and statistically significant. If migrants’ endowments of human capital in the
form of education, accumulated skills, or entrepreneurial talents are higher compared
to the sending population, then the loss of their skills, inventiveness and innovativeness
would contribute to a decline in local productivity. Migrants may also own physical
and financial capital that they may take with them leading to a loss in investment in the
sending county. Moreover, out-migrants may contribute to a decline in the growth of
markets and in scale and agglomeration economies in the sending county. Such demand
effects are the sources of loss in the growth of per capita personal incomes.
The coefficient for the index of social capital (SCIX) is positive and significant,
suggesting that high levels of social capital increase the wellbeing of a county. The
coefficients for the proportion of school age population (POP5-17), the proportion of
the population above 65 years old (POP > 65), and the proportion of female headed
households (FHHF) are negative, positive, and negative, respectively, as expected.
Counties with high proportions of POP5-17 and FHHF tend to have low levels of
median household incomes, whereas counties with a high proportion of POP > 65
tend to have high levels of MHY. These results are consistent with empirical results
of previous studies.
6 Conclusions
The main objective of this study was to test the hypotheses that (1) employment growth
and median household income growth are interdependent and jointly determined by
regional variables; (2) employment and median household income growth in a county
are conditional upon initial conditions of the county; and (3) employment and median
household income growth in a county are conditional upon employment and median
household income growth in neighboring counties. To test these hypotheses, a spatial
simultaneous equations model was developed. GS2SLS and GS3SLS estimates of
the parameters were obtained by estimating the model using data covering the 417
Appalachian counties for the 1990–2000 period. The empirical results of the study
support the three hypotheses. In particular, the employment growth rate in one county
is positively affected by the employment growth rate and the median household income
growth rate in neighboring counties, and the median household income growth rate in
one county is negatively affected by the employment growth rate and the median household
income growth rate in neighboring counties.
A policy implication of the finding is that counties may be more successful in
creating environments (business climate) to make themselves attractive to firms if
several neighboring counties pool their resources. The results also indicate the presence
of spatial correlation in the error terms, which implies that a random shock into
the system spreads across the region. The results further indicate convergence across
counties in Appalachia with respect to employment growth and median household
income growth rates, conditional upon the initial conditions of the explanatory variables
in the model. This finding indicates that the gap in economic
status among Appalachian counties is narrowing and could mean that the efforts of
the Appalachian Regional Commission are showing results.
The empirical results indicate the presence of significant agglomerative effects:
counties with higher population concentrations showed significant business growth.
Combined with the findings of spillover effects, this might justify focusing
investments in areas capable of generating agglomeration effects.
The study also produces useful information concerning the creation of new or the
expansion of existing businesses in Appalachia. Establishment density, which captures
the degree of competition among firms and crowding of businesses relative
to the population, indicates that Appalachia is below the threshold where competition
among firms for consumer demands crowds businesses. In addition, the results
indicate low barriers to new firm formation and employment generation during the
study period.
While incorporating spatial interdependencies adds to the model’s computational
complexity, the returns are not only improved estimates but also
information about spatial relationships that would not otherwise be available. For the
study period, this research suggests that a growth pole approach that spatially concentrates
scarce policy investments could benefit the region. Such insights require
a spatially explicit model; otherwise, they are based on guesswork and intuition. Of
course, given the short time period of our analysis, additional research is needed to
determine if this result is stable over time or changes with the business cycle.
In general, this study confirms the importance of spatial effects in regional development.
The empirical results indicate the presence of spatial correlation in the error
terms and of spatial autoregressive lag. Failure to account for spatial interaction effects
results in less efficient and potentially inconsistent estimates, as well as a loss of insight.
Acknowledgments This research was partially funded by the West Virginia Agricultural and Forestry
Experiment Station. We acknowledge helpful comments by Dale Colyer and two referees. We thank
Anil Rupasingha, Stephan Goetz and David Freshwater for allowing the use of their Social Capital Index
data set for US counties. The usual caveat applies.
Appendix A: Derivation of the reduced form of the model
Let the system given in (4a, b) be written as:
$$Y = YB + X\Gamma + WY\Lambda + U, \qquad U = WUC + E, \tag{I}$$
with
$$Y = (y_1, \ldots, y_G), \quad X = (x_1, \ldots, x_K), \quad U = (u_1, \ldots, u_G),$$
$$WU = (Wu_1, \ldots, Wu_G), \quad C = \operatorname{diag}_{j=1}^{G}(\rho_j), \quad E = (\varepsilon_1, \ldots, \varepsilon_G),$$
where $y_j$ is the n by 1 vector of cross sectional observations on the dependent variable
in the jth equation, $x_l$ is an n by 1 vector of cross sectional observations on the lth
exogenous variable, $u_j$ is an n by 1 vector of error terms in the jth equation, and $B$
and $\Gamma$ are correspondingly defined parameter matrices of dimension G by G and K
by G, respectively. $B$ is a diagonal matrix. $\Lambda$ is a G by G matrix of parameters
of the spatial lag variables. It is not diagonal, and hence each equation includes a spatial
cross-regressive lag variable in addition to its own spatial lag. Hence the model has
the same structure as that in Kelejian and Prucha (2004).
Note that $\rho_j$ denotes the spatial autoregressive parameter in the jth equation and,
since $C$ is taken to be diagonal, the specification relates the disturbance vector in
the jth equation only to its own spatial lag. Since it is assumed that $E(\varepsilon) = 0$ and
$E(\varepsilon\varepsilon') = \Sigma \otimes I_n$, the disturbances, however, will be spatially correlated across units
and across equations.
The system in Eq. (I) can be expressed in a form where its solution for the endogenous
variables is clearly revealed. But, first consider the following vector transformations:
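$$\operatorname{vec}(Y) = \operatorname{vec}(YB) + \operatorname{vec}(X\Gamma) + \operatorname{vec}(WY\Lambda) + \operatorname{vec}(U)$$
$$\operatorname{vec}(Y) = \operatorname{vec}(YB) + \operatorname{vec}(X\Gamma) + \operatorname{vec}(WY\Lambda) + \operatorname{vec}(WUC + E)$$
$$= (B \otimes I)\operatorname{vec}(Y) + (\Gamma' \otimes I)\operatorname{vec}(X) + (\Lambda' \otimes W)\operatorname{vec}(Y) + (C \otimes W)\operatorname{vec}(U) + \operatorname{vec}(E)$$
Letting $y = \operatorname{vec}(Y)$, $x = \operatorname{vec}(X)$, $u = \operatorname{vec}(U)$, and $\varepsilon = \operatorname{vec}(E)$, it follows from
Eq. (I) that:
$$y = (B \otimes I)\,y + (\Gamma' \otimes I)\,x + (\Lambda' \otimes W)\,y + (C \otimes W)\,u + \varepsilon$$
or
$$y = (B \otimes I)\,y + (\Gamma' \otimes I)\,x + (\Lambda' \otimes W)\,y + u, \qquad u = (C \otimes W)\,u + \varepsilon \tag{II}$$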
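The vectorization step rests on the identity $\operatorname{vec}(AXB) = (B' \otimes A)\operatorname{vec}(X)$, which for the spatial lag term gives $\operatorname{vec}(WY\Lambda) = (\Lambda' \otimes W)\operatorname{vec}(Y)$. For a diagonal matrix such as $C$ the transpose can be dropped. The following sketch checks the identity numerically on a made-up example with three units and two equations; all matrices and helper names are hypothetical and serve only as an illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def vec(A):
    """Stack the columns of A into one long vector."""
    return [x for col in zip(*A) for x in col]


def kron(A, B):
    """Kronecker product of A and B, returned as a list of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


# Made-up inputs: n = 3 units, G = 2 equations
W = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # row-standardized weights
Y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                 # n by G endogenous variables
Lam = [[0.2, -0.1], [0.3, 0.4]]                          # G by G spatial lag parameters

lhs = vec(matmul(matmul(W, Y), Lam))           # vec(W Y Lambda)
rhs = matvec(kron(transpose(Lam), W), vec(Y))  # (Lambda' kron W) vec(Y)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)
```

The printed gap should be zero up to floating-point rounding. The off-diagonal entries of the made-up `Lam` play the role of the spatial cross-regressive lag parameters.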
Let $B^* = [(B \otimes I) + (\Lambda' \otimes W)]$, $\Gamma^* = (\Gamma' \otimes I)$ and
$C^* = C \otimes W = \operatorname{diag}_{j=1}^{G}(\rho_j W)$; then Eq. (II) can be written in more compact form as:
$$y = B^* y + \Gamma^* x + u, \qquad u = C^* u + \varepsilon \tag{III}$$
Assuming that $I_{nG} - B^*$ and $I_{nG} - C^*$ are nonsingular matrices with $|\rho_j| < 1$, $j = 1, \ldots, G$,
the system in Eq. (III) can be expressed in its reduced form as:
$$y = (I_{nG} - B^*)^{-1}(\Gamma^* x + u), \qquad u = (I_{nG} - C^*)^{-1}\varepsilon \tag{IV}$$
Based on the results of our estimation, we found that $I_{nG} - B^*$ and $I_{nG} - C^*$
have full column rank and that $|\rho_j| < 1$, $j = 1, 2$. From this we can conclude that
the reduced form of the system [Eq. (IV)] is properly defined and that a
spatial multiplier operates in the system.
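The spatial multiplier in the error process can be illustrated by solving $u = \rho W u + \varepsilon$ for a single equation. The sketch below does this by fixed-point iteration, which sums the series $\varepsilon + \rho W\varepsilon + \rho^2 W^2\varepsilon + \cdots$ and converges to $(I - \rho W)^{-1}\varepsilon$ when $|\rho| < 1$ and $W$ is row-standardized. The four-county ring and the unit shock are made up, the function names are hypothetical, and only the value of $\rho$ is taken from Table 3 (MHYR equation).

```python
def spatial_lag(W, v):
    """Return the spatial lag Wv for a weights matrix W (list of rows) and vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def propagate_shock(W, eps, rho, tol=1e-12, max_iter=1000):
    """Solve u = rho * W u + eps by fixed-point iteration."""
    u = list(eps)
    for _ in range(max_iter):
        Wu = spatial_lag(W, u)
        u_next = [e + rho * wu for e, wu in zip(eps, Wu)]
        if max(abs(a - b) for a, b in zip(u_next, u)) < tol:
            return u_next
        u = u_next
    raise RuntimeError("iteration did not converge")


# Hypothetical ring of four counties with row-standardized weights
W = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]
eps = [1.0, 0.0, 0.0, 0.0]  # a unit shock to the first county only
u = propagate_shock(W, eps, rho=0.1913)  # rho reported for the MHYR equation
print([round(x, 4) for x in u])
```

With a positive $\rho$ the shock reaches every county, including the one that is not a direct neighbor of the first, and its effect on the originating county exceeds the initial shock because of feedback from the neighbors. A negative $\rho$, as estimated for the EMPR equation, would instead produce effects that alternate in sign with the order of neighbors.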
References
Acs ZJ, Armington C (2003) Endogenous growth and entrepreneurial activity in cities.
http://ideas.repec.org/p/cen/wpaper/03-02.html. Accessed 8 December 2008
Acs ZJ, Armington C (2004) The impact of geographic differences in human capital on service firm formation
rates. J Urban Econ 56:244–278
Acs ZJ, Audretsch DB (1993) Introduction. In: Acs ZJ, Audretsch DB (eds) Small firms and entrepreneurship:
an east-west perspective. Cambridge University Press, Cambridge
Acs ZJ, Audretsch DB (2001) The Emergence of the Entrepreneurial Society. Present. for the accept. of the
Int. Award for Entrepr. and Small Bus. Res., Stockh, 3 May
Anselin L (2003) Spatial externalities, spatial multipliers and spatial econometrics. Int Reg Sci Rev
26(2):153–166
Anselin L (1988) Spatial econometrics: methods, and models. Kluwer, Dordrecht
Anselin L, Kelejian HH (1997) Testing for spatial error autocorrelation in the presence of endogenous
regressors. Int Reg Sci Rev 20(1&2):153–182
Arbia G, Basile R, Piras G (2005) Using panel data in modelling regional growth and convergence. Reg.
Econ Appl Lab Work Pap No. 55. Univ. Ill, Urbana-Champaign
Armington C, Acs ZJ (2002) The determinants of regional variation in new firm formation. Reg Stud
36(1):33–45
Aronsson T, Lundberg J, Wikstrom M (2001) Regional income growth and net migration in Sweden
1970–1995. Reg Stud 35(9):823–830
Audretsch DB, Fritsch M (1994) The geography of firm births in Germany. Reg Stud 28(4):359–365
Audretsch DB, Carree MA, van Stel AJ, Thurik AR (2000) Impeded Industrial Restructuring: The Growth
Penalty. Res Pap Cent for Adv Small Bus Econ Erasmus Univ., Rotterdam
Barkley DL, Henry MS, Bao S (1998) The role of local school quality and rural employment and population
growth. Rev Reg Stud 28(1):81–102
Barro RJ, Sala-i-Martin X (1992) Convergence. J Polit Econ 100:223–251
Barro RJ, Sala-i-Martin X (2004) Economic growth, 2nd edn. MIT Press, Cambridge
Black DA, Sanders SG (2004) Labor market performance, poverty, and income inequality in Appalachia.
http://www.arc.gov/images/reports/labormkt/labormkt.pdf. Accessed 8 December 2008
Boarnet MG (1994) An empirical model of intra-metropolitan population and employment growth. Pap
Reg Sci 73(2):135–153
Borts GH, Stein JL (1964) Economic growth in a free market. Columbia University Press, New York
Brock WA, Evans DS (1989) Small business economics. Small Bus Econ 1(1):7–20
Callejon M, Segarra A (2001) Geographical determinants of the creation of manufacturing firms: the regions
of Spain. http://www.ub.es/graap/pdfcallejon/RS01.pdf. Accessed 8 December 2008
Carlino OG, Mills ES (1987) The determinants of county growth. J Reg Sci 27(1):39–54
Carree MA, Thurik AR (1998) Small firms and economic growth in Europe. Atl Econ J 26(2):137–146
Carree MA, Thurik AR (1999) Industrial structure and economic growth. In: Audretsch DB, Thurik AR
(eds) Innovation, industry evolution and employment. Cambridge University Press, Cambridge
Clark D, Murphy CA (1996) Countywide employment and population growth: an analysis of the 1980s.
J Reg Sci 36(2):235–256
Davidson P, Lindmark L, Olofsson C (1994) New firm formation and regional development in Sweden.
Reg Stud 28(4):395–410
Deller SC, Tsai TH, Marcouiller DW, English DBK (2001) The role of amenities and quality of life in rural
economic growth. Am J Agric Econ 83(2):352–365
Duffy NE (1994) The determinants of state manufacturing growth rates: a two-digit-level analysis. J Reg
Sci 34(2):137–162
Duffy-Deno KT (1998) The effect of federal wilderness on county growth in the inter-mountain western
United States. J Reg Sci 38(1):109–136
Duffy-Deno KT, Eberts RW (1991) Public infrastructure and regional economic development: a simultaneous
equations approach. J Urban Econ 30(3):329–343
Edmiston KD (2004) The net effect of large plant locations and expansions on county employment. J Reg
Sci 44(2):289–319
Ekstrom B, Leistritz FL (1988) Rural community decline and revitalization: an annotated bibliography.
Garland Publ, New York
Ertur C, Le Gallo J, Baumont C (2006) The European regional convergence process, 1980–1995: do spatial
regimes and spatial dependence matter? Int Reg Sci Rev 29(1):3–34
Fotopoulos G, Spencer N (1999) Spatial variations in new manufacturing plant openings: some empirical
evidence from Greece. Reg Stud 33(3):219–229
Fritsch M (1992) Regional differences in new firm formation: evidence from West Germany. Reg Stud
26(3):233–244
Fritsch M, Falck O (2003) New firm formation by industry over space and time: a multilevel analysis.
Discuss Pap German Inst for Econ Res Berlin
Garofoli G (1994) New firm formation and regional development: the Italian case. Reg Stud 28(4):
381–393
Glaeser EL, Scheinkman JA, Shleifer A (1995) Economic growth in a cross-section of cities. J Monet Econ
36(1):117–143
Greenwood MJ, Hunt GL (1984) Migration and interregional employment redistribution in the United
States. Am Econ Rev 74(5):957–969
Greenwood MJ, Hunt GL, McDowel JM (1986) Migration and employment change: empirical evidence on
spatial and temporal dimensions of the linkage. J Reg Sci 26(2):223–234
Guesnier B (1994) Regional variation in new firm formation in France. Reg Stud 28(4):347–358
Hamalainen K, Bockerman P (2004) Regional labor market dynamics, housing, and migration. J Reg Sci
44(3):543–568
Hausman J (1983) Specification and estimation of simultaneous equations models. In: Griliches Z,
Intriligator M (eds) Handbook of econometrics. North Holland, Amsterdam
Hart M, Gudgin G (1994) Spatial variations in new firm formation in the Republic of Ireland, 1980–1990.
Reg Stud 28(4):367–380
Henry MS, Barkley DL, Bao S (1997) The hinterland’s stake in metropolitan growth: evidence from selected
southern regions. J Reg Sci 37(3):479–501
Henry MS, Schmitt B, Kristensen K, Barkley DL, Bao S (1999) Extending Carlino-Mills models to examine
urban size and growth impacts on proximate rural areas. Growth Change 30(4):526–548
Higgins MJ, Levy D, Young AT (2006) Growth and convergence across the US: evidence from county-level
data. Rev Econ Stat 88(4):671–681
Isserman AM (1993) State economic development policy and practice in the United States: a survey article.
Int Reg Sci Rev 16(1–2):49–100
Johnson P, Parker S (1996) Spatial variations in the determinants and effects of firm births and deaths. Reg
Stud 30(7):676–688
Kangasharju A (2000) Regional variations in firm formation: panel and cross-section data evidence from
Finland. Reg Sci 79(4):355–373
Keeble D, Walker S (1994) New firms, small firms and dead firms: spatial pattern and determinants in the
United Kingdom. Reg Stud 28(4):411–427
Kelejian HH, Prucha IR (1998) A generalized two-stage least squares procedure for estimating a spatial
autoregressive model with spatial autoregressive disturbances. J Real Estate Finance Econ 17(1):99–
121
Kelejian HH, Prucha IR (1999) A generalized moments estimator for the autoregressive parameter in a
spatial model. Int Econ Rev 40(2):509–533
Kelejian HH, Prucha IR (2001) On the asymptotic distribution of the Moran I test statistic with applications.
J Econ 104(2):219–257
Kelejian HH, Prucha IR (2004) Estimation of simultaneous systems of spatially interrelated cross sectional
equations. J Econ 118(1):27–50
Kmenta J (1986) Elements of econometrics. Macmillan, New York
Krugman P (1991a) Increasing returns and economic geography. J Polit Econ 99(3):483–499
Krugman P (1991b) Geography and trade. MIT Press, Cambridge
Lewis DJ, Hunt GL, Plantinga AJ (2002) Does public land policy affect local wage growth. Growth Change
34(1):64–86
Loveman G, Sengenberger W (1991) The re-emergence of small-scale production: an international comparison.
Small Bus Econ 3(1):1–37
Lundberg J (2003) On the determinants of average income growth and net migration at the municipal level
in Sweden. Rev Reg Stud 32(2):229–253
MacDonald JF (1992) Assessing the development status of metropolitan areas. In: Mills ES, MacDonald JF
(eds) Sources of metropolitan growth. Cent. for Urban Policy Res, New Brunswick
Mackinnon JG, White H, Davidson R (1983) Tests for model specification in the presence of alternative
hypotheses: Some further results. J Econ 21(1):53–70
McGranahan DA (1999) Natural amenities drive rural population change. http://www.ers.usda.gov/
publications/aer781/aer781i.pdf. Accessed 9 December 2008
Mills ES, Price R (1984) Metropolitan suburbanization and central city problems. J Urban Econ 15(1):1–17
Persson J (1997) Convergence across the Swedish counties, 1911–1993. Eur Econ Rev 41(9):1834–1852
Pollard KM (2003) Appalachia at the millennium: an overview of the results from census 2000. Popul. Ref.
Bur., Washington
Pulver GC (1989) Developing a community perspective on rural economic development policy. J Community
Dev Soc 20(2):1–4
Rappaport J (1999) Local growth empirics. Cent for Int Dev Work Pap No. 23. Harvard University Press,
Cambridge
Rey SJ, Boarnet MG (2004) A taxonomy of spatial econometric models for simultaneous equations systems.
In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and
applications. Springer, Berlin
Reynolds PD (1994) Autonomous firm dynamics and economic growth in the United States, 1986–1990.
Reg Stud 28(4):429–442
Rupasingha A, Goetz SJ, Freshwater D (2006) The production of social capital in US counties. J Socio-Econ
35(1):83–101
Steinnes DN, Fisher WD (1974) An econometric model of intra-urban location. J Reg Sci 14(1):65–80
US Census Bureau (2005) Mean travel time to work for workers 16 years and over who did not work
at home (Minutes): 2005. (2005 American Community Survey). http://factfinder.census.gov/servlet/
DatasetMainPageServlet?_ds_name=ACS_2005_EST_G00_&_lang=en&_ts=199031476495. Accessed
4 June 2007
US Small Business Administration (SBA) (1999) The state of small business: a report of the President. US
Gov Print Press, Washington
Wennekers S, Thurik AR (1999) Linking entrepreneurship and economic growth. Small Bus Econ
13(1):27–55
Young AT, Higgins MJ, Levy D (2008) Sigma convergence versus beta convergence: evidence from US
county-level data. J Money Credit Bank 40(5):1083–1093

Panel estimation of state-dependent adjustment

Empir Econ
DOI 10.1007/s00181-010-0419-y
Panel estimation of state-dependent adjustment
when the target is unobserved
Ulf von Kalckreuth
Received: 30 July 2009 / Accepted: 22 July 2010
© Springer-Verlag 2010
Abstract Understanding adjustment processes has become central in economics.
Empirical analysis is fraught with the problem that the target is usually unobserved.
This article develops and simulates GMM methods for estimating dynamic adjustment
models in a panel data context with partially or entirely unobserved targets and
endogenous, time-varying persistence. In this setup, the standard first difference GMM
procedure fails. Four estimation strategies are proposed. Two of them are based on
quasi-differencing. The third is characterised by a state-dependent filter, while the last
is an adaptation of the GMM level estimator.
Keywords Dynamic panel data methods · Economic adjustment ·
GMM · Quasi-differencing · Non-linear estimation
JEL Classification C23 · C15 · D21
1 Introduction
New Keynesian economics, with its emphasis on real and financial frictions, has introduced
a focus on microeconomic adjustment dynamics into the empirical literature.
Adjustment dynamics are essential for understanding aggregate behaviour and its sensitivity
towards shocks. Important examples range from price adjustment and its significance
for the New Keynesian Phillips curve (Woodford 2003), over plant level
adjustment and aggregate investment dynamics (Caballero et al. 1995; Caballero
and Engel 1999; Bayer 2006), to aggregate employment dynamics, building from
microeconomic evidence (Caballero et al. 1997). In these studies, as in von Kalckreuth
(2006), the adjustment dynamics itself becomes the principal object of analysis,
instead of being treated as an important, but burdensome obstacle to understanding
equilibrium phenomena.
This article was presented at the 2009 Panel Data Conference at the University of Bonn. It draws on Chap.
3 of the author’s habilitation thesis at the University of Mannheim.
The views expressed in this article do not necessarily reflect those of the Deutsche Bundesbank or its staff.
All the errors and omissions are those of the author.
U. von Kalckreuth (B)
Deutsche Bundesbank Research Centre, Wilhelm Epstein-Str. 14,
60431 Frankfurt am Main, Germany
e-mail: ulf.von-kalckreuth@bundesbank.de
In a rather general form, economic adjustment can be framed by a ‘gap equation’,
as formalised by Caballero et al. (1995):
$$\Delta y_{i,t} = \Lambda\left(g_{i,t}, x_{i,t}\right) \cdot g_{i,t}, \quad \text{where} \quad g_{i,t} = y_{i,t-1} - y^{*}_{i,t}.$$
Here, subscripts refer to individual $i$ at time $t$, and $g_{i,t}$ is the gap between the state
$y_{i,t-1}$ inherited from the last period and the target $y^{*}_{i,t}$ that would be realised if adjustment
costs were zero for one period of time. The speed of adjustment $\Lambda$, which is written
as a function of the gap itself and additional state variables $x_{i,t}$, determines the fraction
of the gap that is removed within one period of time. The adjustment function $\Lambda$
will reflect convex or non-convex adjustment costs, irreversibility and indivisibilities,
financing constraints or other restrictions, and the uncertainty of expectations formation.
With quadratic adjustment costs or Calvo-type probabilistic adjustment, $\Lambda$ will
be a constant.1
Estimating the function $\Lambda$ is inherently difficult. In general, both $y^{*}_{i,t}$ and $g_{i,t}$ will
not be observable. However, some measure of the gap is needed for any estimation,
and if $\Lambda$ explicitly depends on $g_{i,t}$, this measure will move to the centre stage. In order
to address this issue, one may try to do the utmost to observe the target as exactly
as possible. The controversy between Cooper and Willis (2004) and Caballero and
Engel (2004) on interpreting the results of gap equation estimates bears testimony to
the problems that may result from imperfect measures of the gap. However, there is
an alternative. In linear dynamic panel estimation, the problem of unobserved targets
can successfully be addressed by positing an error component structure for the measurement
error and eliminating the individual fixed effect by a suitable transformation,
such as first differencing. See Bond et al. (2003) and Bond and Lombardi (2007) for
an error correction model of capital stock adjustment.
In the unrestricted, non-linear case, this approach is not feasible, as a host of
incidental parameters will preclude identification. However, there may be direct qualitative
information on the level of $\Lambda$, e.g. from survey data, ratings or market information
services. If one is willing to treat the adjustment process as piecewise linear,
distinguishing regimes of adjustment, then, as will be shown, this information can be
harnessed to eliminate the incidental parameters from the problem completely.
1 Calvo-type adjustment refers to adjustment costs that are infinite with probability 1 − λ and zero with
probability λ. In other words: a randomly drawn share λ of market participants receives the chance to
adjust costlessly. As a modelling device, this assumption is ubiquitous in the monetary Dynamic General
Equilibrium literature. Sometimes this state-independent adjustment is playfully referred to as the working
of the ‘Calvo fairy’.
Linear dynamic panel estimation was pioneered by Anderson and Hsiao (1982),
and it was developed and perfected by Holtz-Eakin et al. (1988); Arellano and Bond
(1991); Arellano and Bover (1995) and Blundell and Bond (1998). This article shows
how classic dynamic panel estimation methodology can be adapted for the analysis of
economic adjustment if the target is unobserved and the nonlinearity takes the form of
discrete regimes. This is not straightforward, as the unknown and time-varying adjustment
coefficient interacts with the equally unknown individual specific measurement
error. However, the reward is substantial: a well-known array of estimation procedures
and tests can be brought to bear on the investigation of economic adjustment.
The estimation methods presented here are geared to short panels that do not allow
a full direct identification of individual targets. The study was motivated by the problem
of characterising the speed of capital stock adjustment as depending on financing
constraints, in an environment where categorical information on the financing situation
is available; see von Kalckreuth (2008a) and von Kalckreuth (2008b).2 The
procedures allow addressing a number of important research questions, including the
state-dependence of pricing behaviour (is there a Calvo fairy?), the adjustment of
the financial structure of companies or banks after shocks, the asymmetry of factor
adjustment (downward rigidities, firing costs), or the implications of irreversibility.
Section 2 of this article characterises the stochastic process to be estimated.
A continuous scalar and a discrete regime vector are evolving jointly, and the
adjustment speed of the continuous-type variable depends on the regime. It is
shown that the standard procedure for estimating linear dynamic panel models
is not applicable. Section 3 assumes predetermined regimes and proposes two
estimators on the basis of quasi-differencing—one of them with the virtue of
great simplicity, the other being more efficient. Both are nonlinear, which may
lead to a small sample bias if in one of the regimes the adjustment speed is
almost zero. A Generalised Method of Moments (GMM) estimator using state-dependent
filtering is suggested, which is immune to this problem. Section 4 works
out sets of moment conditions that can be applied when the regimes are contemporaneously
correlated. Using a level estimator on an amplified equation, the assumption
of predetermined regimes can be dropped at the price of stricter prerequisites regarding
the fixed effect. Under the same conditions, a version involving first differences
is feasible, too. Section 5 compares the moment conditions and discusses their use.
Section 6 tests the proposed routines in a Monte Carlo study. Section 7 concludes.
Appendix A discusses error correction models with state-dependent dynamics, and
Appendix B contains the proofs.
2 A regime-specific adjustment process
Consider a variable $y_{i,t}$ that reverts to some target level $y^{*}_{i,t}$ which is characteristic
of individual $i$. The speed of adjustment is state-dependent, following the
equation
$$\Delta y_{i,t} = -\left(1 - \alpha_{i,t-1}\right)\left(y_{i,t-1} - y^{*}_{i,t}\right) + \varepsilon_{i,t}, \tag{1}$$
with
$$\alpha_{i,t} = \alpha' r_{i,t}.$$
The $L$-dimensional column vector $\alpha$ holds the state-dependent adjustment coefficients
relevant for each state. The adjustment coefficient $\alpha_{i,t} = \alpha' r_{i,t}$ varies over time and
individuals, depending on the state $r_{i,t}$, an $L$-dimensional column vector of regime
indicator variables, with one element taking a value of 1, and all others being zero. The
adjustment speed at date $t$ is given by $(1 - \alpha_{i,t-1})$. If the process is stable, it would
eventually settle in the target in the absence of shocks. The target level $y^{*}_{i,t}$ is unobservable.
The panel dimension can help identify the adjustment process nonetheless, as it
allows an error component approach for modelling the unobserved target. The target
is assumed to follow an equation that contains an individual-specific
latent term:
$$y^{*}_{i,t} = x_{i,t}'\beta + \mu_i.$$
The idiosyncratic component $\mu_i$ in the adjustment equation may reflect a measurement
error or unobserved explanatory variables. The vector $x_{i,t}$ may encompass random
explanatory variables, deterministic time trends and also time dummies. In its absence,
the target level is entirely unobservable, but static. A generalised, error correction version
of the adjustment equation is discussed in Appendix A.
2 The study successfully applies the estimator QD2, as described in Sect. 3 of this article.
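Solving Eq. (1) for $y_{i,t}$ yields:
$$y_{i,t} = \alpha_{i,t-1}\, y_{i,t-1} + \left(1 - \alpha_{i,t-1}\right) x_{i,t}'\beta + \underbrace{\left(1 - \alpha_{i,t-1}\right)\mu_i + \varepsilon_{i,t}}_{\text{latent}}. \tag{2}$$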
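As an illustration only, the process in Eq. (2) can be simulated in a few lines of Python. The sketch below assumes NumPy and draws the regimes independently of the shocks, a simplification that the article does not impose. The function name `simulate_panel` and all parameter values are hypothetical, and this is not the simulation design of Sect. 6.

```python
import numpy as np

def simulate_panel(alpha, beta, n=200, t_max=8, sigma=0.1, seed=0):
    """Simulate Eq. (2): y_t = a_{t-1} y_{t-1} + (1 - a_{t-1}) (x_t' beta + mu) + eps_t."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    L, K = alpha.size, beta.size
    # Regime index in 0..L-1 for dates 0..t_max (exogenous draws: an assumption).
    regime = rng.integers(0, L, size=(n, t_max + 1))
    x = rng.normal(size=(n, t_max + 1, K))
    mu = rng.normal(size=n)                      # individual effect in the target
    eps = sigma * rng.normal(size=(n, t_max + 1))
    y = np.empty((n, t_max + 1))
    y[:, 0] = rng.normal(size=n)                 # starting values y_{i,0}
    for t in range(1, t_max + 1):
        a = alpha[regime[:, t - 1]]              # alpha_{i,t-1} = alpha' r_{i,t-1}
        target = x[:, t, :] @ beta + mu          # unobserved target y*_{i,t}
        y[:, t] = a * y[:, t - 1] + (1.0 - a) * target + eps[:, t]
    return y, x, regime, mu, eps

y, x, regime, mu, eps = simulate_panel(alpha=[0.2, 0.8], beta=[1.0, -0.5], n=50, t_max=5)
```

Coding the regime as an integer index rather than as an indicator vector keeps the lookup `alpha[regime]` equivalent to the inner product of the coefficient vector with the regime indicator.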
For later purposes, it is useful to state the backward solution to this stochastic difference
equation. For $t \ge 1$ and a given starting value $y_{i,0}$ it is
$$y_{i,t} = \left(y_{i,0} - x_{i,1}'\beta - \mu_i\right) \prod_{\tau=0}^{t-1} \alpha_{i,\tau} + x_{i,t}'\beta + \mu_i + A_{i,t}, \tag{3}$$
with
$$A_{i,t} = \sum_{l=1}^{t-1} \left(\varepsilon_{i,l} - \Delta x_{i,l+1}'\beta\right) \prod_{\tau=l}^{t-1} \alpha_{i,\tau} + \varepsilon_{i,t}. \tag{4}$$
The solution has three components. The first term captures the influence of the initial
deviation. The second term is the target level at time $t$, $x_{i,t}'\beta + \mu_i$. The third term, $A_{i,t}$,
represents the effect of shocks and target changes, past and present. In the long run,
when the influence of the initial conditions has died out, $A_{i,t}$ is equal to the deviation
from the target.
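The backward solution can be checked numerically against the forward recursion. The following Python sketch, assuming NumPy and arbitrary illustrative parameter values, builds one individual's path from Eq. (2) and compares it with the closed form of Eqs. (3) and (4); the names `alpha_path` and `backward_solution` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 6, 2
alpha_path = rng.uniform(0.1, 0.9, size=T)   # alpha_{i,0}, ..., alpha_{i,T-1}
x = rng.normal(size=(T + 1, K))              # x_{i,0..T}; x[0] is never used
beta = np.array([0.5, -1.0])
mu, y0 = 0.3, 2.0
eps = rng.normal(scale=0.1, size=T + 1)      # eps[0] is never used

# Forward recursion, Eq. (2).
y = np.empty(T + 1)
y[0] = y0
for t in range(1, T + 1):
    a = alpha_path[t - 1]
    y[t] = a * y[t - 1] + (1 - a) * (x[t] @ beta + mu) + eps[t]

def backward_solution(t):
    """Closed form of Eqs. (3)-(4) for a date t >= 1."""
    initial = (y0 - x[1] @ beta - mu) * np.prod(alpha_path[:t])
    a_term = eps[t]
    for l in range(1, t):
        dx = x[l + 1] - x[l]                 # change in the explanatory variables
        a_term += (eps[l] - dx @ beta) * np.prod(alpha_path[l:t])
    return initial + x[t] @ beta + mu + a_term

max_gap = max(abs(backward_solution(t) - y[t]) for t in range(1, T + 1))
```

Under these assumptions `max_gap` should differ from zero only by floating-point rounding.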
In Eq. (2), both the individual effect and xi,t interact with a time-varying and endogenous
variable. This precludes the classical strategy for estimating linear dynamic
panel equations with fixed effects, namely to transform the equation by taking first
differences and use moment conditions involving higher lags of the dependent and
explanatory variables to accommodate for the fact that the transformed residual will
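be correlated with the lagged endogenous variable. First differencing Eq. (2) yields
$$\Delta y_{i,t} = \alpha' \Delta\left(r_{i,t-1}\, y_{i,t-1}\right) + (1 - \alpha)' \Delta\left(r_{i,t-1}\, x_{i,t}'\right)\beta - \alpha' \Delta r_{i,t-1}\, \mu_i + \Delta\varepsilon_{i,t}. \tag{5}$$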
Unlike the case of linear adjustment, the expression containing the unobserved μi is
not differenced out, and we have to deal with a time-varying error component that
is correlated with the explanatory variables. The following sections are devoted to
finding moment restrictions that make estimation feasible. The last set of restrictions
that will be discussed actually involves an amplified version of Eq. (5).
3 Predetermined regimes
In most applications, it will not be possible to treat ri,t as fully exogenous. If, for
example, εi,t is the error term in a capital accumulation equation and ri,t is a regime
indicating the degree of financing constraints, then the two variables should be correlated.
This section examines the case when the regime indicator, ri,t−1, can at least
be considered as predetermined with respect to the contemporaneous error term, εi,t .
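Let us start by assuming the error term to be a martingale difference sequence:
$$E\left(\varepsilon_{i,t} \mid \Omega_{i,t-1}\right) = 0, \quad \text{with} \quad \Omega_{i,t-1} = \left\{r_{i,t-1}, r_{i,t-2}, \ldots, x_{i,t-1}, x_{i,t-2}, \ldots, \varepsilon_{i,t-1}, \varepsilon_{i,t-2}, \ldots, \mu_i, y_{i,0}\right\}. \tag{6}$$
Accommodation of the more general assumption
$$E\left(\varepsilon_{i,t} \mid \tilde{\Omega}_{i,t-k}\right) = 0, \quad k \ge 1, \quad \text{with} \quad \tilde{\Omega}_{i,t-k} = \left\{r_{i,t-1}, r_{i,t-2}, \ldots, x_{i,t-k}, x_{i,t-k-1}, \ldots, \varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, \mu_i, y_{i,0}\right\}, \tag{7}$$
is straightforward. Note that $\tilde{\Omega}_{i,t-k}$ in assumption (7) is not simply a lagged version
of $\Omega_{i,t-1}$, as the generalisation maintains the assumption of a predetermined $r_{i,t-1}$.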
The case of contemporaneously correlated regime indicators will be treated in Sect. 4.
3.1 Two moment conditions based on quasi-differencing
This subsection discusses two nonlinear transformations of the adjustment equation
that serve to eliminate the unobserved heterogeneity. Holtz-Eakin et al. (1988)
proposed quasi-differencing as a strategy in a case where fixed effects are subject
to time-varying shocks that are common across individuals.3 It is now explored whether
this method can be generalised to the more complicated case at hand, where adjustment
coefficients are endogenous and vary over time and individuals.
3 See also Chamberlain (1983), pp. 1263–1264.
Applied to the problem at hand, the quasi-differencing procedure as proposed by
Holtz-Eakin et al. (1988) would involve lagging Eq. (2), multiplying both sides by
$(1 - \alpha_{i,t-1})/(1 - \alpha_{i,t-2})$ and subtracting the result from Eq. (2). After reordering
coefficients, this gives
$$\Delta y_{i,t} - \frac{1 - \alpha_{i,t-1}}{1 - \alpha_{i,t-2}}\, \alpha_{i,t-2}\, \Delta y_{i,t-1} - \left(1 - \alpha_{i,t-1}\right) \Delta x_{i,t}'\beta = \varepsilon_{i,t} - \frac{1 - \alpha_{i,t-1}}{1 - \alpha_{i,t-2}}\, \varepsilon_{i,t-1}. \tag{8}$$
The unobserved heterogeneity has duly been eliminated, but the error structure is difficult
to deal with, because αi,t−1 will in general be correlated with εi,t−1 and αi,t−2.
The underlying idea nonetheless leads to useful moment conditions, actually in two
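different ways. First, dividing Eq. (8) by $(1 - \alpha_{i,t-1})$ gives
$$\frac{1}{1 - \alpha_{i,t-1}}\, \Delta y_{i,t} - \frac{\alpha_{i,t-2}}{1 - \alpha_{i,t-2}}\, \Delta y_{i,t-1} - \Delta x_{i,t}'\beta = \psi_{i,t}, \quad \text{with} \quad \psi_{i,t} = \frac{\varepsilon_{i,t}}{1 - \alpha_{i,t-1}} - \frac{\varepsilon_{i,t-1}}{1 - \alpha_{i,t-2}}. \tag{9}$$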
This transformation, which shall be referred to as ‘QD1’, corresponds to solving
Eq. (1) for the deviation from the target, $y_{i,t-1} - x_{i,t}'\beta - \mu_i$, and then solving the
lagged version of (1) for the past deviation from the target, $y_{i,t-2} - x_{i,t-1}'\beta - \mu_i$, and
finally differencing $\mu_i$ out. On the basis of Eq. (9), moment conditions for parameter
estimation can be formulated.
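Second, we may multiply Eq. (9) by $(1 - \alpha_{i,t-2})$ to obtain
$$\frac{1 - \alpha_{i,t-2}}{1 - \alpha_{i,t-1}}\, \Delta y_{i,t} - \alpha_{i,t-2}\, \Delta y_{i,t-1} - \left(1 - \alpha_{i,t-2}\right) \Delta x_{i,t}'\beta = \xi_{i,t}, \tag{10}$$
with
$$\xi_{i,t} = \frac{1 - \alpha_{i,t-2}}{1 - \alpha_{i,t-1}}\, \varepsilon_{i,t} - \varepsilon_{i,t-1}. \tag{11}$$
This transformation shall be labelled ‘QD2’. It corresponds to multiplying Eq. (1) by
$(1 - \alpha_{i,t-2})/(1 - \alpha_{i,t-1})$ and subtracting the lag of the original adjustment equation.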
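A quick numerical check may help to see that both transformations remove the individual effect. The Python sketch below, assuming NumPy, two regimes drawn at random and hypothetical parameter values, simulates the adjustment process and compares the left-hand sides of the QD1 and QD2 equations, evaluated at the true coefficients, with the transformed error terms. It illustrates the algebra only and is not an estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, K = 100, 5, 1
alpha = np.array([0.3, 0.7])                 # hypothetical regime coefficients
beta = np.array([1.5])
regime = rng.integers(0, 2, size=(n, T + 1))
x = rng.normal(size=(n, T + 1, K))
mu = rng.normal(size=n)                      # unobserved individual effect
eps = rng.normal(scale=0.1, size=(n, T + 1))

# Simulate the adjustment process in levels.
y = np.empty((n, T + 1))
y[:, 0] = rng.normal(size=n)
for t in range(1, T + 1):
    a = alpha[regime[:, t - 1]]
    y[:, t] = a * y[:, t - 1] + (1 - a) * (x[:, t] @ beta + mu) + eps[:, t]

t = 4
a1 = alpha[regime[:, t - 1]]                 # alpha_{i,t-1}
a2 = alpha[regime[:, t - 2]]                 # alpha_{i,t-2}
dy_t = y[:, t] - y[:, t - 1]
dy_lag = y[:, t - 1] - y[:, t - 2]
dx_beta = (x[:, t] - x[:, t - 1]) @ beta

# QD1: observable left-hand side versus transformed error psi.
qd1_lhs = dy_t / (1 - a1) - a2 / (1 - a2) * dy_lag - dx_beta
psi = eps[:, t] / (1 - a1) - eps[:, t - 1] / (1 - a2)

# QD2: observable left-hand side versus transformed error xi.
qd2_lhs = (1 - a2) / (1 - a1) * dy_t - a2 * dy_lag - (1 - a2) * dx_beta
xi = (1 - a2) / (1 - a1) * eps[:, t] - eps[:, t - 1]
```

Neither left-hand side uses `mu`, yet each reproduces the corresponding transformed error up to rounding.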
Proposition 1 Under assumption (6), assuming the absence of serial correlation in
the error term, the levels $y_{i,t-p}$, $p \ge 2$, are instruments in Eqs. (9) and (10):
$$E\left(y_{i,t-p}\, \psi_{i,t}\right) = 0, \tag{12}$$
$$E\left(y_{i,t-p}\, \xi_{i,t}\right) = 0. \tag{13}$$
Proof See Appendix B.
Likewise, it can be shown that $x_{i,t-p}$ and the regime indicators $r_{i,t-p}$, $p \ge 2$,
are instruments in Eqs. (9) and (10). If assumption (6) of no serial correlation is
replaced by (7), then the set of instruments is pushed backwards in time accordingly:
the lags $y_{i,t-k-p}$ and $x_{i,t-k-p}$, $p \ge 1$, are instruments in Eqs. (9) and (10). Note
that the regime indicator $r_{i,t-1}$ is still assumed to be predetermined with respect to
$\varepsilon_{i,t}$; thus, all lags $r_{i,t-p}$, $p \ge 2$, are instruments irrespective of $k$.
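To discuss estimation on the basis of the two sets of moment conditions, it is useful,
however, to restate the transformations (9) and (10). Equation (9) has the convenient
feature that $\Delta x_{i,t}'\beta$ enters additively. Collecting terms, one obtains
$$\begin{aligned}
\psi_{i,t} &= \Delta y_{i,t-1} + \left(\frac{1}{1 - \alpha_{i,t-1}}\, \Delta y_{i,t} - \frac{1}{1 - \alpha_{i,t-2}}\, \Delta y_{i,t-1}\right) - \Delta x_{i,t}'\beta \\
&= \Delta y_{i,t-1} + \Delta\left(\gamma' r_{i,t-1}\, \Delta y_{i,t}\right) - \Delta x_{i,t}'\beta \\
&= \Delta y_{i,t-1} + \gamma' \Delta\left(r_{i,t-1}\, \Delta y_{i,t}\right) - \Delta x_{i,t}'\beta
\end{aligned} \tag{14}$$
with
$$\gamma' = \left(\frac{1}{1 - \alpha_1}, \; \ldots, \; \frac{1}{1 - \alpha_L}\right). \tag{15}$$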
Equation (14) is linear in the coefficient vectors γ and β, and can be estimated by
linear GMM using the moment conditions (12) of Proposition 1. The structural coefficients
α are related to the elements of γ by the nonlinear one-to-one transformation
(15). Inverting this transformation, therefore, gives a nonlinear GMM estimator of α.
Standard deviations and co-variances can be assessed using the delta method.
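The inversion and the delta method step can be sketched as follows. Since each element of γ equals 1/(1 − α), the inverse is α = 1 − 1/γ element by element, with derivative 1/γ². The Python snippet assumes NumPy; the function name `alpha_from_gamma` and the numerical inputs are purely hypothetical and do not come from any estimation in the article.

```python
import numpy as np

def alpha_from_gamma(gamma_hat, gamma_cov):
    """Invert gamma_l = 1 / (1 - alpha_l) and propagate the covariance (delta method)."""
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    gamma_cov = np.asarray(gamma_cov, dtype=float)
    alpha_hat = 1.0 - 1.0 / gamma_hat
    # Jacobian of alpha with respect to gamma is diagonal: d alpha_l / d gamma_l = 1 / gamma_l^2.
    jac = np.diag(1.0 / gamma_hat**2)
    alpha_cov = jac @ gamma_cov @ jac.T
    return alpha_hat, alpha_cov

# Hypothetical first-stage estimates for two regimes.
alpha_hat, alpha_cov = alpha_from_gamma([2.0, 5.0], [[0.04, 0.0], [0.0, 0.25]])
```

With these illustrative inputs the implied adjustment coefficients are 0.5 and 0.8, with variances 0.0025 and 0.0004.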
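Making use of QD2 for GMM estimation is trickier. Let $d\left(r_{i,t-2}, r_{i,t-1}\right)$ be an
$L^2 \times 1$ indicator vector, where each element is a dummy variable indicating one of
the possible combinations of $r_{i,t-2}$ and $r_{i,t-1}$. Let $\lambda$ be the vector of coefficients
$(1 - \alpha_{i,t-2})/(1 - \alpha_{i,t-1})$ corresponding to the elements of $d(\cdot)$:
$$\lambda' = \left(1, \; \frac{1 - \alpha_1}{1 - \alpha_2}, \; \frac{1 - \alpha_1}{1 - \alpha_3}, \; \cdots, \; \frac{1 - \alpha_L}{1 - \alpha_{L-2}}, \; \frac{1 - \alpha_L}{1 - \alpha_{L-1}}, \; 1\right).$$
Let furthermore $\delta$ be a vector of products of the adjustment coefficients and $\beta$:
$$\delta = (1 - \alpha) \otimes \beta = \begin{pmatrix} (1 - \alpha_1)\, \beta \\ (1 - \alpha_2)\, \beta \\ \vdots \\ (1 - \alpha_L)\, \beta \end{pmatrix}.$$
Finally, let
$$h(\alpha, \beta) = \begin{pmatrix} \lambda \\ -\alpha \\ -\delta \end{pmatrix} \tag{16}$$
be an $L(L + 1 + K) \times 1$ vector of reduced form coefficients, of which $L(L + K)$
are unknown. This results in
$$\begin{aligned}
\xi_{i,t} &= \lambda' d\left(r_{i,t-2}, r_{i,t-1}\right) \Delta y_{i,t} - \alpha' r_{i,t-2}\, \Delta y_{i,t-1} - \delta' \left(r_{i,t-2} \otimes \Delta x_{i,t}\right) \\
&= \begin{pmatrix} d\left(r_{i,t-2}, r_{i,t-1}\right)' \Delta y_{i,t} & r_{i,t-2}'\, \Delta y_{i,t-1} & \left(r_{i,t-2} \otimes \Delta x_{i,t}\right)' \end{pmatrix} h(\alpha, \beta).
\end{aligned} \tag{17}$$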
In this case, there is no convenient one-to-one transformation from the elements
of h (α, β) to the underlying structural parameters. The nonlinearity of the problem
therefore has to be treated explicitly. Consider the simplest case, with two states and
no explanatory variables xi,t . Then λ and α have two elements each and one can write
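$$\pi' = h(\alpha)' = \left(1, \; \frac{1 - \alpha_1}{1 - \alpha_2}, \; \frac{1 - \alpha_2}{1 - \alpha_1}, \; 1, \; -\alpha_1, \; -\alpha_2\right).$$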
Though nonlinear in the parameters, this equation is linear in the transformed variables.
This makes it easy to apply the Gauss–Newton method for solving the optimisation
problem inherent in GMM estimation. The Gauss–Newton method iterates
on a linearised moment function, sequentially improving the estimation. Calculating
pseudo-observations for each step, the estimation problem can be solved using routines
for the estimation of linear econometric models.4 As initial values for the iteration,
one can use the results from QD1 estimation exposed earlier in this section.
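The stacking of the reduced form coefficients and of the transformed variables can be made concrete with a small Python sketch. It assumes NumPy and a row-major ordering of the regime pairs in d(·), with the regime at t − 2 first and the regime at t − 1 second, which reproduces the two-state vector π given above. Regimes are coded as integers 0, …, L − 1, and the function names `h_vector` and `qd2_regressors` are hypothetical.

```python
import numpy as np

def h_vector(alpha, beta):
    """Stack lambda, -alpha and -delta into the reduced form coefficient vector."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    one_minus = 1.0 - alpha
    # lam[j * L + k] = (1 - alpha_j) / (1 - alpha_k): regime j at t-2, regime k at t-1.
    lam = (one_minus[:, None] / one_minus[None, :]).ravel()
    delta = np.kron(one_minus, beta)
    return np.concatenate([lam, -alpha, -delta])

def qd2_regressors(r_lag2, r_lag1, dy_t, dy_lag, dx_t, L):
    """Transformed variables for one observation, ordered to match h_vector."""
    d = np.zeros(L * L)
    d[L * r_lag2 + r_lag1] = 1.0             # dummy for the regime pair
    r = np.zeros(L)
    r[r_lag2] = 1.0                          # indicator of the regime at t-2
    return np.concatenate([d * dy_t, r * dy_lag, np.kron(r, dx_t)])

h = h_vector([0.2, 0.6], [1.0, -0.5])        # L = 2, K = 2, so 10 elements
z = qd2_regressors(0, 1, 0.3, -0.1, np.array([0.2, 0.1]), L=2)
xi_value = z @ h                             # linear in the transformed variables
```

A Gauss–Newton step would then linearise `h_vector` around the current parameter values, which is why only the Jacobian of this mapping is needed in addition to routines for linear models.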
The transformations QD1 and QD2 are nonlinear, and the stochastic properties of
the transformed residuals depend on the adjustment parameters. Consider the transformed
residuals ψi,t = εi,t /(1 − αi,t−1) − εi,t−1/(1 − αi,t−2) on the one hand and
ξi,t = (1 − αi,t−2)/(1 − αi,t−1)εi,t − εi,t−1 on the other. The variance of ψi,t will
become large if one or both alpha-coefficients are in the neighbourhood of 1, creating
problems in small samples. An adjustment coefficient approaching 1 will affect
the transformed error term of QD2, ξi,t , to a lesser degree. First, only one of the two
components of the difference is affected. Second, the effect is mitigated by the numerator,
1 − αi,t−2. Indeed, if the alpha coefficients in different regimes are of similar
size, the random factor will stay in the neighbourhood of 1. Therefore, when the alpha
coefficients are high (i.e. adjustment speed is low), considerable efficiency gains can
be expected from using QD2. This will be investigated in a simulation study in Sect. 6.
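The orders of magnitude involved can be illustrated with a back-of-the-envelope calculation. The Python sketch below treats the two adjustment coefficients as fixed numbers and assumes homoskedastic, serially uncorrelated errors, which is a simplification made here for illustration and not a result of the article. Under these assumptions it returns the variances of the two transformed residuals relative to the error variance; the function name is hypothetical.

```python
def variance_multipliers(alpha_lag1, alpha_lag2):
    """Variance of psi (QD1) and xi (QD2) relative to sigma^2, regimes held fixed."""
    qd1 = 1.0 / (1.0 - alpha_lag1) ** 2 + 1.0 / (1.0 - alpha_lag2) ** 2
    qd2 = ((1.0 - alpha_lag2) / (1.0 - alpha_lag1)) ** 2 + 1.0
    return qd1, qd2

fast = variance_multipliers(0.2, 0.3)     # quick adjustment in both periods
slow = variance_multipliers(0.95, 0.90)   # slow adjustment in both periods
```

For the slow-adjustment pair the multipliers are about 500 for QD1 and 5 for QD2, whereas for the fast-adjustment pair they are of similar size.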
3.2 Generalised Differencing
As explained above, the nonlinear transformations QD1 and QD2 may lead
to poor results if in one or more of the regimes the adjustment speed is very low.
The transformations cannot be used at all if one of the regimes is characterised by an
adjustment speed of exactly zero. This is a case of considerable theoretical interest,
as the presence of fixed adjustment costs or irreversibility leads to bands around the
target where no adjustment takes place: the solution to the stochastic control problem
triggers adjustment when some threshold level is surpassed. Threshold behaviour
should be expected for decisions on single projects, not for firms or sectors, where
many such projects are aggregated. However, for small units it is certainly useful to
explicitly consider regimes of no adjustment, as Caballero et al. (1995) have done in
the context of plant level investment.
4 The Gauss–Newton method was originally developed for nonlinear least squares problems. See
Davidson and MacKinnon (1993) on the use of Gauss–Newton in nonlinear least squares and instrumental
variables estimation, Hayashi (2000) on GMM estimation, and Judge et al. (1985) on numerical methods
in maximisation. An unpublished appendix on the use of Gauss–Newton in the current context is available
from the author upon request.
Therefore, it is worth asking whether there is a transformation that eliminates the
fixed effect in the target equation without affecting the size of the idiosyncratic errors.
It turns out that there is such a transformation, provided that the regime indicator has
limited memory with respect to εi,t . Consider again the first-differenced adjustment
Eq. (5) above:
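$$\Delta y_{i,t} = \alpha' \Delta\left(r_{i,t-1}\, y_{i,t-1}\right) + (1 - \alpha)' \Delta\left(r_{i,t-1}\, x_{i,t}'\right)\beta - \alpha' \Delta r_{i,t-1}\, \mu_i + \Delta\varepsilon_{i,t}.$$
Whenever $r_{i,t-1} = r_{i,t-2}$, this simplifies to
$$\Delta y_{i,t} = \alpha' r_{i,t-1}\, \Delta y_{i,t-1} + (1 - \alpha)' r_{i,t-1}\, \Delta x_{i,t}'\beta + \Delta\varepsilon_{i,t}.$$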
This expression looks very much like the first difference in the linear case, although
there is more than one adjustment coefficient to estimate. It is only taking first differences
of observations that belong to different regimes which leads to a latent term
$-\alpha' \Delta r_{i,t-1}\, \mu_i$ that will be correlated with the lagged dependent variable under a variety
of circumstances.
As it is this term that precludes the use of the standard technique, the following strategy
comes to mind: Differences are only formed for observations with $r_{i,t-2} = r_{i,t-1}$.
The first element of $\alpha$, $\alpha_1$, is estimated on the basis of cases where two consecutive observations
belong to the first regime, and using differences of observations that both
belong to the second regime leads to inference on the second adjustment coefficient,
etc. In this straight fashion, however, the idea will not work. If $r_{i,t-1}$ and $\varepsilon_{i,t-1}$ are
correlated and groups of observations are formed according to regimes, then the transformed
residual $\Delta\varepsilon_{i,t}$ will have a (conditional) expectation different from zero in those
groups. This will lead to biased estimators.
Under certain additional assumptions, however, a straightforward modification will
yield useful moment conditions:
1. Let $q$ be the maximum $\tau$ for which there is a correlation between $r_{i,t}$ and $\varepsilon_{i,t-\tau}$,
e.g. as a consequence of a moving average structure of the state variable driving
the regime indicator. Then the observation is to be transformed by subtracting past
observations of the same regime with a lag of at least $\ell = q + 2$.
2. If an observation is not matched by a $(q + 2)$-lag in the same regime, then it may
be transformed using a higher lag $\ell > q + 2$.
The first part of the rule proposes a dynamic filter, which varies according to regimes.
The second avoids the loss of many observations in cases where regimes in t and t +q
do not match.
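The two rules can be sketched as a simple matching routine. In the Python snippet below, the lag is called `lag` (ℓ in the notation used here), the regimes of one individual are coded as integers indexed by date, and for each date t the routine looks for the smallest admissible lag such that the regime at t − 1 coincides with the regime at t − ℓ − 1. The function name `match_lags` and the example sequence are hypothetical.

```python
def match_lags(regimes, q):
    """Map each date t to the smallest lag >= q + 2 with the same regime, or None."""
    matches = {}
    for t in range(1, len(regimes)):
        matches[t] = None
        # The earlier observation needs regime index t - lag - 1 >= 0, so lag <= t - 1.
        for lag in range(q + 2, t):
            if regimes[t - 1] == regimes[t - lag - 1]:
                matches[t] = lag
                break
    return matches

# regimes[s] is the regime at date s for one individual (illustrative values).
matches = match_lags([0, 1, 0, 0, 1, 0, 1], q=0)
# matches == {1: None, 2: None, 3: 2, 4: 3, 5: 3, 6: 2}
```

Dates 4 and 5 show the second rule at work: no match exists at the minimum lag of 2, so the next lag with the same regime is used instead of dropping the observation.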
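The $\ell$th difference is
$$\begin{aligned}
y_{i,t} - y_{i,t-\ell} ={}& \alpha' \left(r_{i,t-1}\, y_{i,t-1} - r_{i,t-\ell-1}\, y_{i,t-\ell-1}\right) \\
&+ (1 - \alpha)' \left(r_{i,t-1}\, x_{i,t}' - r_{i,t-\ell-1}\, x_{i,t-\ell}'\right) \beta \\
&- \alpha' \left(r_{i,t-1} - r_{i,t-\ell-1}\right) \mu_i + \varepsilon_{i,t} - \varepsilon_{i,t-\ell},
\end{aligned}$$
which simplifies to
$$y_{i,t} - y_{i,t-\ell} = \alpha' r_{i,t-1} \left(y_{i,t-1} - y_{i,t-\ell-1}\right) + (1 - \alpha)' r_{i,t-1} \left(x_{i,t} - x_{i,t-\ell}\right)' \beta + \varepsilon_{i,t} - \varepsilon_{i,t-\ell}, \tag{18}$$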
if the two observations are characterised by the same regime, such that $r_{i,t-1} = r_{i,t-\ell-1}$.
When does the expectation of the residual term, $\varepsilon_{i,t} - \varepsilon_{i,t-\ell}$, conditional
on $r_{i,t-1}$ and the equality $r_{i,t-1} = r_{i,t-\ell-1}$, become zero? It is sufficient that $\varepsilon_{i,t}$ and
$\varepsilon_{i,t-\ell}$ are both uncorrelated with the two conditioning variables $r_{i,t-1}$ and $r_{i,t-\ell-1}$.
According to assumption (6), $\varepsilon_{i,t}$ is uncorrelated with $r_{i,t-1}$ and $r_{i,t-\ell-1}$. Then the
same is true with respect to $\varepsilon_{i,t-\ell}$ and $r_{i,t-\ell-1}$. Therefore, by choosing $\ell$, it only
remains to make sure that $\varepsilon_{i,t-\ell}$ and $r_{i,t-1}$ are uncorrelated. With $\ell = 1$, this will
not be the case if $\varepsilon_{i,t}$ and $r_{i,t}$ are contemporaneously correlated. However, if $r_{i,t}$ is
uncorrelated with all lags of $\varepsilon_{i,t}$, then $\ell = 2$ will ensure that
$$E\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell} \mid r_{i,t-1}, \; r_{i,t-1} = r_{i,t-\ell-1}\right) = 0. \tag{19}$$
More generally, if there is correlation between $r_{i,t}$ and $\varepsilon_{i,t-\tau}$ up to lag $\tau = q$, the
difference that guarantees the above equation to hold will have to be at least of order
$\ell = q + 2$. However, one is not restricted to using only differences of the order that
is ‘just right’, i.e. $q + 2$. Any other difference of order $\ell \ge q + 2$ will fulfil Eq. (19)
just as well. It is straightforward to construct a difference using the most proximate
observation of the same regime with lag $\ell \ge q + 2$. With respect to admissibility of
instruments, the rules of the classic first-difference approach apply: the instruments
need to be uncorrelated with the earlier of the two observations that make up the difference.
In the following, this procedure is called the Generalised Difference estimator.
For the moment conditions to hold, it is necessary to strengthen assumption (6). In
addition to the variables in the conditioning set $\Omega_{i,t-1}$, $\varepsilon_{i,t}$ must also be uncorrelated
with the future regimes $r_{i,t+q+1}, r_{i,t+q+2}, \ldots$.
Proposition 2 Let the conditional expectation of $\varepsilon_{i,t}$ satisfy
$$E\left(\varepsilon_{i,t} \mid \Omega_{i,t-1}, r_{i,t+q+1}, r_{i,t+q+2}, \ldots\right) = 0, \tag{20}$$
with $\Omega_{i,t-1}$ defined as in assumption (6). Then the lagged levels $y_{i,t-\ell-p}$, $p \ge 1$, are
instruments in Eq. (18), the adjustment equation transformed by taking the $\ell$th difference,
with $\ell \ge q + 2$, conditional on the regimes being the same in each pair of
observations:
$$E\left[\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell}\right) y_{i,t-\ell-p} \mid r_{i,t-1}, \; r_{i,t-1} = r_{i,t-\ell-1}\right] = 0, \quad \text{with} \quad \ell \ge q + 2.$$
Proof See Appendix B.
Likewise, it can be shown that $x_{i,t-\ell-p}$ and the regime indicators $r_{i,t-\ell-p}$ are
instruments in Eq. (18), given $r_{i,t-1} = r_{i,t-\ell-1}$. As in Proposition 1 above, if $\Omega_{i,t-1}$
in (20) is replaced by $\tilde{\Omega}_{i,t-k}$, as defined in assumption (7), with $k$ being the minimum
$\tau$ such that $\varepsilon_{i,t}$ does not vary with $\varepsilon_{i,t-\tau}$ and $x_{i,t-\tau}$, the set of instruments is pushed
backwards in time: the lags $y_{i,t-\ell-k-p}$ and $x_{i,t-\ell-k-p}$, $p \ge 1$, are instruments. As
the regime indicator $r_{i,t-1}$ is still assumed to be predetermined, all lags $r_{i,t-p}$, $p \ge 2$,
are instruments irrespective of $k$.
It is an identifying assumption for the process that drives the regime indicator to
have finite memory with respect to innovations εi,t. This is a limitation. If ri,t is correlated
with all past values of εi,t , then the conditional expectation of the transformed
error term resulting from a difference of two observations from the same regime will
not disappear. The resulting bias can be expected to wane if the minimum lag length is
chosen to be large. However, doing so would result in the loss of many observations,
exacerbating another weakness of the estimation strategy. In principle, assuming a
finite memory of the regime indicator with respect to εi,t is rather similar in kind to
the assumption of a finite memory of εi,t with respect to earlier shocks, which is needed
to use lagged endogenous variables as instruments in the standard approach. Whether
the condition (20) can be expected to hold or not will depend on the estimation problem
at hand. In the context of estimating the microeconomic adjustment of the capital
stock under financing constraints, it may be realistic to assume that, after the shock to
capital demand, the financing structure of a firm will be restored in finite time.5
3.3 Testing finite memory and deciding on the length of memory
In order to use Generalised Differencing, it is necessary to test the condition (20) and
decide on the length of the memory of the process driving the regime with respect
to εi,t . There are two simple solutions. The first is to use the test of overidentifying
restrictions associated with Sargan (1958) and Hansen (1982) to check the validity of
the moment conditions. The drawback is that this test is generally used as an omnibus
test of the specification, including the choice of the instruments. It is preferable to
have a more specific test concerning the appropriate lag length.
Such a specific test can be based on the fact that the expected value of the residual
will not disappear if the lag length chosen is too short. In that case, the choice
of observations according to regime will select positive or negative outcomes of εi,t ,
because of the correlation between the regime variable and the error component εi,t .
If regime dummies are added to the adjustment equation, then their coefficients will
be estimated as positive or negative quantities according to the direction of selectivity,
although they should be zero according to the basic specification. Furthermore,
it is known how these estimates for regime constants are distributed under the null
of a correct specification. Using a GMM estimator, they are asymptotically normal,
with mean zero, and their standard deviation is consistently estimated by the standard
error of the coefficient. Therefore, the t-value on these coefficients is a valid test statistic.
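The regime constant test described above amounts to an ordinary two-sided t-test against the normal distribution. A minimal sketch follows, assuming that a coefficient estimate and its standard error are already available from some GMM routine; the function name `regime_constant_test` and the numbers in the usage lines are purely illustrative and not taken from the estimation code used in the paper.

```python
import math

def regime_constant_test(coef, std_err, level=0.05):
    """Two-sided test of H0: regime dummy coefficient = 0.

    Uses the asymptotic normality of the GMM estimate. Returns the
    t-value, the p-value and the rejection decision at the given level.
    """
    t_value = coef / std_err
    # Two-sided normal p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_value) / math.sqrt(2.0))
    return t_value, p_value, p_value < level

# Hypothetical inputs: a dummy coefficient far from zero relative to its
# standard error, and one that is negligible.
print(regime_constant_test(-0.08, 0.011))
print(regime_constant_test(-0.0001, 0.012))
```

A rejection for dummies of opposite sign would point to a lag length that is too short rather than to a trending target.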
5 For a theoretical model that makes this prediction, see von Kalckreuth (2004, 2008b, Chap. 1).
It may be argued that this test ignores the possibility that the regime-specific constants
truly belong in the equation. Consider a trend in the term in the brackets of
Eq. (1) that makes the target level of yi,t change over time:
$$\Delta y_{i,t} = -\left(1 - \alpha_{i,t-1}\right)\left(y_{i,t-1} - \kappa t - \mu_i\right) + \varepsilon_{i,t}.$$
Solving for yi,t yields
$$y_{i,t} = \alpha_{i,t-1}\, y_{i,t-1} + \left(1 - \alpha_{i,t-1}\right)\kappa t + \left(1 - \alpha_{i,t-1}\right)\mu_i + \varepsilon_{i,t}.$$
After transforming the equation by subtracting an observation belonging to the same
regime, lagged ℓ periods, one obtains
$$y_{i,t} - y_{i,t-\ell} = \alpha_{i,t-1}\left(y_{i,t-1} - y_{i,t-\ell-1}\right) + \left(1 - \alpha_{i,t-1}\right)\ell\kappa + \varepsilon_{i,t} - \varepsilon_{i,t-\ell}.$$
Regime-specific constants may thus be the result of a trending target variable. Actually,
this is a case of misspecification: the time trend should have figured in xi,t. The
regime constants should be proportional to each other, with a factor of proportionality
given by the adjustment speeds.6 More generally, they should not be of different sign,
as will be the case if the coefficient on the regime dummy collects the residuals
selected for their high or low value.
4 Moment restrictions for contemporaneously correlated regimes
All moment restrictions discussed in the previous section require the regime indicator
to be predetermined with respect to the current shock term. This may hold in many
applications, specifically if there are long planning and gestation lags as in the case of
fixed investment. In other circumstances, the error term in the adjustment equation and
the threshold variable governing the adjustment regime may be contemporaneously
correlated. Let us investigate an approach that can be brought to bear in this case.
For greater clarity, the adjustment equation shall be rewritten with a modified dating,
to highlight the possibility of a contemporaneous correlation between the speed of
adjustment and εi,t :
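$$\Delta y_{i,t} = -\left(1 - \alpha_{i,t}\right)\left(y_{i,t-1} - x_{i,t}'\beta - \mu_i\right) + \varepsilon_{i,t}, \tag{21}$$
or
$$y_{i,t} = \alpha_{i,t}\, y_{i,t-1} + \left(1 - \alpha_{i,t}\right)\left(x_{i,t}'\beta + \mu_i\right) + \varepsilon_{i,t}. \tag{22}$$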
It will now be shown that the requirement of predetermined regimes can be dropped at
the cost of additional assumptions regarding the fixed effect. Under these assumptions,
6 Let z1 and z2 be two regime dummy coefficients, with α1 and α2 the corresponding adjustment coefficients.
If the regime dummies result from a trending target as above, then the nonlinear restriction between
coefficients is z1/z2 = (1 − α1)/(1 − α2). It is rather straightforward to test this restriction after estimation.
it is possible to leave the fixed effect in an equation augmented by regime dummies and
use first differences as instruments. Under the same conditions, first differences will
also serve as instruments for a modified version of the first differenced Eq. (5).
Level estimation was introduced by Arellano and Bover (1995) and Blundell and
Bond (1998) as a response to a specific problem arising in the standard autoregressive
model with fixed effects. If the coefficient of the lagged dependent variable is in the
neighbourhood of one, then the level behaves like a random walk, and it will be a
weak instrument in the differenced equation. These authors use the following moment
condition in the estimation of the standard autoregressive model:
$$E\left(\Delta y_{i,t-p}\left(\mu_i + \varepsilon_{i,t}\right)\right) = 0,$$
with p ≥ 1. If εi,t is serially uncorrelated, then it is sufficient that yi,t is mean
stationary and displays a constant correlation with μi for the moment equation to
hold. This implies a requirement on the initial conditions: the deviation of the starting
value from the stationary level needs to be uncorrelated with the stationary level itself.
The latent term of Eq. (22) is given by (1 − αi,t)μi + εi,t. In the attempt to use
first differences as instruments for levels, let us first take a look at
$$E\left(\Delta y_{i,t-p}\left(\left(1 - \alpha_{i,t}\right)\mu_i + \varepsilon_{i,t}\right)\right).$$
This expectation will be zero if, first, E(Δyi,t−p) = 0, and second, Δyi,t−p is uncorrelated
with both (1 − αi,t)μi and εi,t. The first condition requires the process to
be mean stationary, as in the derivation of Blundell/Bond and Arellano/Bover. The
second condition is hard to fulfil. To see the reason, one may adjust the backward
solution in (3) and (4) to the modified dating:
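$$y_{i,t} = \left(y_{i,0} - x_{i,1}'\beta - \mu_i\right)\prod_{\tau=1}^{t}\alpha_{i,\tau} + x_{i,t}'\beta + \mu_i + A_{i,t},$$
where
$$A_{i,t} = \sum_{l=2}^{t}\left(\varepsilon_{i,l-1} - \Delta x_{i,l}'\beta\right)\prod_{\tau=l}^{t}\alpha_{i,\tau} + \varepsilon_{i,t}.$$
Plugging this back into (21) yields the expression:
$$\Delta y_{i,t} = -\left(1 - \alpha_{i,t}\right)\left[\left(y_{i,0} - x_{i,1}'\beta - \mu_i\right)\prod_{\tau=1}^{t-1}\alpha_{i,\tau} + A_{i,t-1} - \Delta x_{i,t}'\beta\right] + \varepsilon_{i,t}. \tag{23}$$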
The difference Δyi,t−p is a function of all εi,τ, Δxi,τ and αi,τ, τ ≤ t − p, as well as of
the initial condition, the deviation yi,0 − x′i,1β − μi. One of the requirements for the
covariance of Δyi,t−p and (1 − αi,t)μi to disappear is therefore a limited memory of
αi,t = α′ri,t with respect to its own past. Fixed effects in ri,t are thus excluded. This
would be hard to defend in many applications, given the presence of a fixed effect in
the law of motion governing yi,t.
In order to weaken the requirements, one may decompose the individual target
level, μi, into its expectation over all individuals, μ^e, and the individual deviation
from this expectation, μ̃i. Let, therefore,
$$\mu_i = \mu^e + \tilde{\mu}_i, \quad \text{with } \mu^e = E_i\left(\mu_i\right).$$
By definition, E(μ̃i) = 0. Rewriting the adjustment equation in (22) gives
$$y_{i,t} = \alpha' r_{i,t}\, y_{i,t-1} + (1 - \alpha)' r_{i,t}\, x_{i,t}'\beta + \mu^e (1 - \alpha)' r_{i,t} + \underbrace{\tilde{\mu}_i (1 - \alpha)' r_{i,t} + \varepsilon_{i,t}}_{\text{latent term}}. \tag{24}$$
Written this way, the equation contains a regime-specific shift term μ^e(1 − α)′ri,t.
In estimation, this term can be taken into account by introducing the regime vector
ri,t as a regressor into the equation.
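Proposition 3 Consider the conditions
$$E\left(\varepsilon_{i,t} \mid \varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, \Delta x_{i,t-k}, \Delta x_{i,t-k-1}, \ldots, r_{i,t-k}, r_{i,t-k-1}, \ldots, y_{i,0} - x_{i,1}'\beta - \mu_i\right) = 0, \tag{25}$$
$$E\left(\tilde{\mu}_i \mid \{\varepsilon_{i,t}\}, \{r_{i,t}\}, \{\Delta x_{i,t}\},\; y_{i,0} - x_{i,1}'\beta - \mu_i\right) = 0, \tag{26}$$
with k ≥ 1, where a term in curly brackets denotes an entire time series. Jointly, these
conditions are sufficient for the following moment restrictions to hold in Eq. (24):
$$E\left(\Delta y_{i,t-p}\left(\varepsilon_{i,t} + \left(1 - \alpha_{i,t}\right)\tilde{\mu}_i\right)\right) = 0 \quad \text{with } p \ge k. \tag{27}$$
Proof See Appendix B.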
It follows immediately from the condition (25) that appropriately lagged values
xi,t−p and ri,t−p can also be used as instruments. Some comments are in order.
It is natural that one has to impose conditions on μ̃i, now that μi is not differenced out
of the error term. The invariance of expected μ̃i with respect to the time path {εi,t}
is rather unproblematic. It agrees well with the basic structure of the error component
model. The irrelevance of the regime process is less innocuous. It is well conceivable
that a real-world data generating process for ri,t may contain a fixed effect that is
correlated with μ̃i. Similar reservations apply with respect to the required irrelevance
of {Δxi,t}. Finally, the necessity of having an expected value of μi that is independent
of the initial deviation was also found by Blundell and Bond (1998) when investigating
the use of moment equations for levels in a linear context. The condition is not
innocuous either: it excludes an initial condition such as yi,0 = 0. It can be replaced
by the requirement that the process has been running for a ‘very long’ time, as the first
term inside the bracket of Eq. (23) will disappear asymptotically.7
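As a corollary to Proposition 3, it follows that lags of Δyi,t can also be used as
instruments in a differenced version of the augmented adjustment Eq. (24):
$$\Delta y_{i,t} = \alpha'\Delta\left(r_{i,t}\, y_{i,t-1}\right) + (1 - \alpha)'\Delta\left(r_{i,t}\, x_{i,t}'\beta\right) - \mu^e \alpha'\Delta r_{i,t} + \underbrace{\Delta\varepsilon_{i,t} - \tilde{\mu}_i\, \alpha'\Delta r_{i,t}}_{\text{latent term}}.$$
Under conditions (25) and (26), the following restriction will hold8:
$$E\left(\Delta y_{i,t-p-1}\left(\Delta\varepsilon_{i,t} - \Delta\alpha_{i,t}\,\tilde{\mu}_i\right)\right) = 0 \quad \text{with } p \ge k. \tag{28}$$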
Note that the moment restrictions for differences in (28) do not use all the information
contained in the moment restriction for levels: the former are implied by the latter but not
vice versa. Furthermore, because the residuals in (28) are first differenced, one observation
is lost, and the instruments have to be moved back one period in time. However,
the moment condition is not necessarily useless: estimators based on condition (28)
may be more robust against violations of assumption (26) regarding the fixed effect,
especially when regime changes are relatively infrequent, as μ̃i is differenced out of
(28) whenever ri,t = ri,t−1.
5 A synopsis
At this point, it is interesting to compare the conditions for Propositions 1, 2 and 3.
All of them require the expected value of εi,t to be invariant with respect to past
values εi,t−k, εi,t−k−1, . . ., the levels or first differences of xi,t−k , xi,t−k−1, . . . as well
as to μi and/or the initial deviation. Propositions 1 and 2 also need εi,t to be uncorrelated
with ri,t−1, the regime indicator figuring in the current date adjustment equation,
whereas for Proposition 3, invariance of εi,t with respect to lag k and earlier of the
regime indicator is sufficient. As an additional identifying assumption for the Generalised
Differencing approach, the memory of ri,t needs to be finite with respect to lags
of εi,t . This excludes, for example, an autoregressive process for the state variable
underlying the adjustment indicator, with the innovation contemporaneously correlated
to εi,t . The level estimator, for its part, needs the expected value of the individual
effect μi to be unrelated to the process governing the idiosyncratic error, changes
in the forcing term xi,t , the regimes and the initial deviation. Both these restrictions
may impose considerable limitations. However, estimators based on Propositions 2
and 3 are able to fulfil special tasks. The Generalised Difference estimator will be
unbiased even if some of the alpha coefficients are large—in fact, it still works if
7 Such a process may also be observed by means of a ‘short’ panel—what matters is not the length of
the panel, but whether or not the process has been running long enough to bring the effect of the initial
condition in Eq. (23) into the neighbourhood of zero.
8 This follows directly from E(Δyi,t−p−1(εi,t + (1 − αi,t)μ̃i)) = 0 and
E(Δyi,t−p−1(εi,t−1 + (1 − αi,t−1)μ̃i)) = 0.
one of them is exactly equal to 1 or even greater. Like the standard first-difference
estimator in the linear case, the Generalised Difference estimator can be supposed to
deliver imprecise results if all the adjustment coefficients are in the neighbourhood
of 1, as then the level instruments are weak. In this case, the level estimator will perform
better. Perhaps even more importantly, this latter estimator is also capable of
dealing with regime indicators that are contemporaneously correlated with the error
term.
6 Implementing and simulating the estimators
This section compares the four sets of moment conditions set out in Propositions
1, 2 and 3, using them separately for estimation on simulated panel data sets.
6.1 Setting up the simulation
For the regime indicator, a threshold process is specified. The kth element of ri,t is
given by
$$r^{(k)}_{i,t} = \mathrm{Ind}\left(\bar{s}_{k-1} \le s_{i,t} \le \bar{s}_k\right).$$
The numbers s̄0, . . . , s̄L are thresholds, with the first and the last element being equal
to −∞ and ∞, respectively. As an example for a threshold process with infinite memory
with respect to the error term, an AR(1) is used as a process for the latent state
si,t:
$$s_{i,t} = a\, s_{i,t-1} + \upsilon_{i,t},$$
where the current shock υi,t is contemporaneously correlated with the error term εi,t.
Alternatively, as an example of a process with finite memory, it is assumed that the
threshold process is driven by an MA(q):
$$s_{i,t} = b + \sum_{j=0}^{q} c_j\, \eta_{i,t-j}, \quad \text{with } c_0 = 1.$$
The elements of the moving average conform to
$$E\left(\eta_{i,t}\right) = 0, \quad E\left(\eta_{i,t}\eta_{i,t-p}\right) = 0\ \ \forall p > 0, \quad E\left(\eta_{i,t}\varepsilon_{i,t}\right) \neq 0, \quad E\left(\eta_{i,t}\varepsilon_{i,t-p}\right) = 0\ \ \forall p > 0.$$
Concretely, the two interrelated processes {ri,t} and {yi,t} are simulated as follows:
Regime-dependent error correction process: εi,t is standard normal, μi is distributed
N(1, 1), εi,t and μi are independent.
Regime indicator process: Regarding the number of regimes, let L = 2. If the
threshold process is driven by an AR(1), then let E(υ²i,t) = 1, E(υi,t εi,t) = 0.8, υi,t
being calculated as a weighted sum of εi,t and an independent Gaussian process. The
AR-parameter a is 0.8. Likewise, for the MA(q), the stochastic structure is chosen as
E(η²i,t) = 1, E(ηi,t εi,t) = 0.8, with ηi,t being calculated as a weighted sum of εi,t
and an independent Gaussian process. The threshold level is set equal to zero, resulting
in an equal number of observations in each regime on average. Let us experiment
with an MA(0) (uncorrelated regime states) and an MA(1) with c1 = 0.8. Note that the
assumed contemporaneous correlation between the shocks in the regime equation and
the error term is very high.
Panel structure: The panel is unbalanced, with individuals carrying either 8, 9 or
10 observations, 1,000 individuals of each type, that is, 3,000 individuals in total. For
each individual, the process is simulated for 50 periods, and only the last 8, 9 or 10
observations are used for estimation.
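The data generating process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ox code used for the paper: it assumes that no forcing variables xi,t enter the target, that b = 0 in the MA case, that the predetermined regime ri,t−1 governs adjustment, and that each process is started at an arbitrary initial value before the burn-in. The function name `simulate_panel` and all argument names are hypothetical.

```python
import numpy as np

def simulate_panel(n_ind=3000, periods=50, alphas=(0.3, 0.8),
                   regime_process="ar1", a=0.8, c1=0.8, rho=0.8, seed=0):
    """Simulate y and a two-regime threshold indicator for each individual."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_ind, periods))      # error term, N(0, 1)
    mu = rng.normal(1.0, 1.0, size=n_ind)            # fixed effect, N(1, 1)
    # State shock: unit variance, covariance rho with eps (weighted sum)
    noise = rng.standard_normal((n_ind, periods))
    shock = rho * eps + np.sqrt(1.0 - rho**2) * noise

    s = shock.copy()                                 # "ma0": uncorrelated states
    if regime_process == "ar1":
        for t in range(1, periods):
            s[:, t] = a * s[:, t - 1] + shock[:, t]
    elif regime_process == "ma1":
        s[:, 1:] = shock[:, 1:] + c1 * shock[:, :-1]
    elif regime_process != "ma0":
        raise ValueError("regime_process must be 'ar1', 'ma1' or 'ma0'")

    regime = (s > 0).astype(int)                     # threshold at zero
    alpha = np.asarray(alphas)[regime]               # regime-specific coefficient

    y = np.empty((n_ind, periods))
    y[:, 0] = mu + eps[:, 0]                         # arbitrary starting value
    for t in range(1, periods):
        a_prev = alpha[:, t - 1]                     # regime r_{i,t-1} governs t
        y[:, t] = a_prev * y[:, t - 1] + (1.0 - a_prev) * mu + eps[:, t]

    # Unbalanced panel: individual i keeps only its last n_obs[i] periods
    n_obs = np.array([8, 9, 10])[np.arange(n_ind) % 3]
    return {"y": y, "regime": regime, "eps": eps, "shock": shock, "n_obs": n_obs}
```

For estimation, one would then use `y[i, -n_obs[i]:]` for each individual `i`, so that the first 40 or more periods serve only as burn-in.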
All the estimators are implemented by first calculating the transformed observations
and the instruments and then adapting and using the routines supplied with the
DPD module for Ox proposed by Doornik et al. (2002) to perform GMM estimates
and tests.9 Details on the estimation routines are given below and in the notes to the
tables.
6.1.1 Quasi-difference estimations QD1 and QD2
Let us assume an AR(1) as a process driving the threshold variable that constitutes the
regime. The estimation equations are transformed in the way described in Sect. 3. The
first quasi-differencing approach, QD1, is implemented by estimating the transformed
equation using a standard linear GMM estimator and then calculating the structural
parameters by inverting Eq. (15). The more complicated QD2 estimation is performed
by treating the moment as a nonlinear function of the structural parameters, using the
iterative Gauss–Newton method.
Estimates on the basis of the QD1 transformation are used as initial values. As
instruments, levels lagged twice are used. It turns out that the instruments are more
informative (the estimates being more precise) if they are separated out in regimes,
which means: For purposes of instrumentation, the lags of yi,t−2 are interacted with
regime dummies, ri,t−2.
6.1.2 Generalised Difference estimation
The transformation described in Proposition 2 consists in taking the ℓth difference,
with ℓ chosen such that regimes ri,t−1 and ri,t−ℓ−1 match, subject to some minimum
order of difference. Available instruments are levels lagged ℓ + 1, ℓ + 2, . . .. As the
appropriate ℓ depends on the regime process, so does the set of instruments. By taking
the earlier of the two observations as a point of reference yi,t and assigning to it
the nearest lead yi,t+ℓ of the same regime with ℓ ≥ 2 + q, the definition of suitable
instruments is straightforward. One can uniformly use lags yi,t−1, yi,t−2 and earlier as
instruments. As in Quasi-Difference estimation, let us interact the lagged levels yi,t−1
9 Ox is an object-oriented matrix programming language. For a complete description of Ox, see Doornik
(2001).
with regime indicators ri,t−1. In order to test the validity of the transformation, regime
dummies are included as additional RHS variables. They also enter the instrument
set.
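The matching step of the Generalised Difference transformation can be sketched as follows. This is an illustrative helper under stated assumptions, not the routine used for the paper: the input is taken to be, for one individual, the regime that governs each observation (that is, ri,t−1 aligned with date t), and the function name `nearest_same_regime_lead` is hypothetical.

```python
import numpy as np

def nearest_same_regime_lead(gov_regime, min_lead):
    """For each reference date t, find the smallest lead l >= min_lead such
    that the observation at t + l is governed by the same regime as the one
    at t. Returns -1 where no such lead exists within the sample.
    """
    gov_regime = np.asarray(gov_regime)
    n = len(gov_regime)
    lead = np.full(n, -1, dtype=int)
    for t in range(n):
        for l in range(min_lead, n - t):
            if gov_regime[t + l] == gov_regime[t]:
                lead[t] = l
                break
    return lead

# Example with two regimes (coded 0 and 1) and a minimum lead of 2
print(nearest_same_regime_lead([0, 1, 0, 0, 1, 1, 0], min_lead=2))
# prints [ 2  3  4  3 -1 -1 -1]
```

Each matched pair (t, t + l) then yields one transformed observation y[t + l] − y[t], while reference dates without a match are dropped, which is the source of the loss of observations noted above.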
6.1.3 Level estimation
As described in Sect. 4, the level estimator is implemented by specifying an auxiliary
equation that contains a set of regime dummies as additional RHS variables. Instruments
are first differences of lagged endogenous variables, interacted with regime
indicators, ri,t−1Δyi,t−1 and ri,t−2Δyi,t−2 (four variables!), plus differenced indicators
for regime 1 taken from Δri,t−1, Δri,t−2. Simulations are performed both for the case
where a predetermined regime ri,t−1 enters the adjustment equation, and for
the case of a contemporaneously correlated regime ri,t governing the adjustment.
6.2 Simulation results
Tables 1 and 2 show estimates on the basis of quasi-difference transformations QD1
and QD2 (1,000 runs). The theoretical discussion has shown that the finite sample
properties of the estimators may depend on the size of the regime-specific coefficients,
notably on their difference from 1. Therefore, estimations for a whole range
of parameters are shown. The true value for α1 is set as 0.3, whereas the value for α2
ranges from 0.3 to 0.9. Larger ranges and finer steps are plotted in Figs. 1 and 2.
Table 1 and Fig. 1 display results for the simpler QD1 transformation. For smaller
coefficient values, the estimator performs well and yields correct estimates
with good precision, but it is less reliable if one of the regime-specific coefficients is
large. For α1 = α2 = 0.3, the mean bias is only of the order of −0.004 for both
parameters. It is −0.0133 for ˆα2 when α2 is raised to 0.7, and for α2 = 0.9, the
finite sample bias of ˆα2 becomes a non-negligible −0.0414.10 The estimates ˆα1 also
deteriorate, although less markedly. The table also gives t-values and Sargan statistics.
The bias leads the t-tests to reject the true value too often when one of the coefficients is
too high: in the extreme case of α2 = 0.9, the true value of α2 is rejected 77.9% of the time.
The same is true for the Sargan test of instrument validity: with large regime-specific
coefficients, it rejects too often, namely 81.6% of the time when α2 = 0.9. One can
conclude that slow speeds of adjustment (high persistence) create a problem for QD1
estimation.
Table 2 and Fig. 2 give results for the QD2 transformation. As expected, for large
values of regime-specific adjustment coefficients, the estimator performs better than
its counterpart based on QD1. In the extreme case of α1 = 0.3 and α2 = 0.9, the bias
is still only 0.0152 and −0.0209, respectively. For smaller values of regime-specific
coefficients, there is hardly any bias at all. Sargan statistics and t-values are reliable,
except for very high values of α2.
10 Whether one considers the bias as large will also depend on the way one looks at the parameter. The
state-dependent speed of adjustment is given by 1 − αi,t−1. A bias of −0.0414 when the true value of α2
is 0.9 will, therefore, overestimate the adjustment speed by 41.4%.
Table 1 Quasi-differences, QD1 transformation, 1,000 runs

| Simulation # | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Specification state variable underlying regimes | AR(1) | AR(1) | AR(1) | AR(1) |
| True α1 | 0.3 | 0.3 | 0.3 | 0.3 |
| True α2 | 0.3 | 0.5 | 0.7 | 0.9 |
| **α1** | | | | |
| Mean parameter estimate | 0.2959 | 0.2955 | 0.2939 | 0.2687 |
| Mean bias | −0.0041 | −0.0045 | −0.0061 | −0.0313 |
| Mean estimated std. deviation | 0.0220 | 0.0236 | 0.0276 | 0.0351 |
| Std. dev. parameter estimate | 0.0218 | 0.0247 | 0.0298 | 0.0533 |
| RMSE | 0.0222 | 0.0251 | 0.0304 | 0.0618 |
| Freq. rejections of true value on 5% conf. level | 4.6% | 6.8% | 5.9% | 25.7% |
| **α2** | | | | |
| Mean parameter estimate | 0.2957 | 0.4938 | 0.6868 | 0.8586 |
| Mean bias | −0.0043 | −0.0062 | −0.0133 | −0.0414 |
| Mean estimated std. deviation | 0.0194 | 0.0189 | 0.0177 | 0.0139 |
| Std. dev. parameter estimate | 0.0197 | 0.0190 | 0.0188 | 0.0203 |
| RMSE | 0.0202 | 0.0200 | 0.0230 | 0.0461 |
| Freq. rejections of true value on 5% conf. level | 6.0% | 5.4% | 12.3% | 77.9% |
| Freq. rejection by Sargan–Hansen on 5% conf. level | 8.1% | 9.4% | 16.4% | 81.6% |
| Valid obs. in estimation | 21,000 | 21,000 | 21,000 | 21,000 |

Notes: the table shows GMM estimates of α1 and α2 on the basis of the transformation QD1, see Proposition
1. Columns vary by parameters α1 and α2 used for generating panels according to Eq. (2). Each column
represents 1,000 repetitions of two-stage GMM estimates using an unbalanced panel of 3,000 individuals
with 10, 9 and 8 observations (1,000 individuals each). The number of valid observations is reduced
by the need to transform variables. Instruments are the levels of ri,t−2 yi,t−2 (i.e. two interaction terms)
and a constant. Estimated standard deviations are derived from reduced form estimates using the delta
method. Sargan-Hansen test is the test of overidentifying restrictions associated with Sargan (1958) and
Hansen (1982). Estimation is executed using DPD package version 1.2 on Ox version 3.30 and additional,
user-written routines
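The rows of the table are linked by the usual decomposition of the mean squared error into squared bias and variance. As a small consistency check (a sketch only; the helper name `rmse` is hypothetical and the decomposition is exact only up to the degrees-of-freedom convention used for the standard deviation), the RMSE of α1 in Column (4) can be recomputed from the reported mean bias and standard deviation of the parameter estimate:

```python
import math

def rmse(bias, std_dev):
    """Root mean squared error from mean bias and standard deviation."""
    return math.sqrt(bias**2 + std_dev**2)

# alpha1, Column (4) of Table 1: mean bias -0.0313, std. dev. 0.0533
print(round(rmse(-0.0313, 0.0533), 4))  # prints 0.0618
```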
The theoretical discussion in Sect. 3 has shown that the precision of the QD2 estimator
should depend on the ratio of adjustment speeds. If both of them are high, but
of similar size, then the ratio (1 − αi,t−2)/(1 − αi,t−1) in the definition of the transformed
error term ξi,t cancels out in Eq. (11). The error term in QD1, in contrast,
depends on the absolute distance of the regime-specific coefficients from unity. To
study this issue, the simulations of QD1 and QD2 estimation are performed using a
value of α1 = 0.8 as a platform and varying over α2. The result is shown in Figs. 3 (QD1
estimation) and 4 (QD2 estimation). Here, the QD1 estimates are biased throughout
the range. The bias of ˆα2 switches from positive to negative, whereas the bias of ˆα2 is
negative throughout. In contrast, with QD2, the bias practically disappears when both
parameters are large, and is noticeable only when α2 is small.
Table 3 and Figs. 5 and 6 give results using GMM on observations transformed by
Generalised Differences. In Columns (1) and (2), the estimator is correctly used. The memory
of the regime process is restricted: Column (1) assumes uncorrelated regimes,
and Column (2) assumes a threshold process driven by an MA(1). The minimum leads
used in transformation are 2 and 3, respectively. In both cases, the Generalised
Difference estimator performs well. The estimates are unbiased. The standard deviations
are similar to what can be obtained from the quasi-difference estimates for the
Table 2 Quasi-differences, QD2 transformation, 1,000 runs

| Simulation # | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Specification state variable underlying regimes | AR(1) | AR(1) | AR(1) | AR(1) |
| True α1 | 0.3 | 0.3 | 0.3 | 0.3 |
| True α2 | 0.3 | 0.5 | 0.7 | 0.9 |
| **α1** | | | | |
| Mean parameter estimate | 0.2998 | 0.3006 | 0.3021 | 0.3152 |
| Mean bias | −0.0002 | 0.0006 | 0.0021 | 0.0152 |
| Mean estimated std. deviation | 0.0221 | 0.0229 | 0.0261 | 0.0418 |
| Std. dev. parameter estimate | 0.0217 | 0.0235 | 0.0270 | 0.0463 |
| RMSE | 0.0217 | 0.0235 | 0.0271 | 0.0487 |
| Freq. rejections of true value on 5% conf. level | 4.7% | 5.8% | 5.8% | 9.5% |
| **α2** | | | | |
| Mean parameter estimate | 0.2985 | 0.4982 | 0.6943 | 0.8791 |
| Mean bias | −0.0014 | −0.0018 | −0.0057 | −0.0209 |
| Mean estimated std. deviation | 0.0195 | 0.0194 | 0.0187 | 0.0174 |
| Std. dev. parameter estimate | 0.0195 | 0.0192 | 0.0188 | 0.0170 |
| RMSE | 0.0196 | 0.0193 | 0.0197 | 0.0269 |
| Freq. rejections of true value on 5% conf. level | 5.9% | 4.5% | 5.9% | 23.0% |
| Freq. rejection by Sargan–Hansen on 5% conf. level | 5.2% | 6.0% | 6.0% | 22.9% |
| Valid obs. in estimation | 21,000 | 21,000 | 21,000 | 21,000 |

Notes: the table shows GMM estimates of α1 and α2 on the basis of the transformation QD2, see Proposition
1. Columns vary by parameters α1 and α2 used for generating panels according to Eq. (2). Each
column represents 1,000 repetitions of a two-stage GMM procedure iterating on pseudoregressors, using
an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations (1,000 individuals each). As an
initial value, an estimate on the basis of QD1 was used. The number of valid observations is reduced by
the need to transform variables. Instruments are the levels of ri,t−2 yi,t−2 (i.e. two interaction terms) and
a constant. Estimated standard deviations are calculated as a by-product from the final Gauss–Newton iteration
step. Sargan-Hansen test is the test of overidentifying restrictions associated with Sargan (1958) and
Hansen (1982). Estimation is executed using DPD package version 1.2 on Ox version 3.30 and additional,
user-written routines
smaller of the two coefficients and actually somewhat lower for the higher coefficient.
In the case of an MA(1) regime process, standard deviations are higher, as fewer
observations can be used. Column (1), with a minimum lead of 2, yields an average of
15,058 valid observations per estimation. This number decreases to 11,277 in Column
(2), when a minimum lead of 3 is imposed. On the same set of simulated data, the
estimates based on quasi-differencing can use 21,000 observations each run. Figure 5
shows that the average deviation of the Generalised Difference estimator from the true
parameter value is very small when the conditions for its use are met and does not
depend systematically on the size of the adjustment coefficients. Even regime-specific
coefficients equal to or larger than 1 can be accommodated, as long as the overall
process remains stable. Columns (3) and (4) do ‘the wrong thing’. For Column (3), a
minimum lead of 2 is used on data generated with a regime process generated by an
MA(1), where a lead of ℓ ≥ 3 is warranted. Column (4) assumes an AR(1) process
driving the threshold variable: this process has infinite memory. Unsurprisingly, in
both cases, the estimator turns out to be biased. However, in spite of a strong correlation
between the shock in the regime variable and the error term, the bias is moderate.
In Column (3), only the estimates ˆα2 are biased, to a degree that is similar to the
performance of the QD2 estimator under the same (unfavourable) parameter values.
[Figure: "Quasi-Differences 1: Bias as a function of alpha2"; series "bias alpha1" and "bias alpha2" plotted against alpha2]
Fig. 1 Mean bias for estimates on the basis of QD1, with α1 = 0.3 and α2 varying
[Figure: "Quasi-Differences 2: Bias as a function of alpha2"; series "bias alpha1" and "bias alpha2" plotted against alpha2]
Fig. 2 Mean bias for estimates on the basis of QD2, with α1 = 0.3 and α2 varying
When, as assumed in Column (4), the regime process is driven by a process with
infinite memory, the resulting bias is larger, similar in size to the weak performance
of the QD1 estimator when one of the coefficients is large. Figure 6 shows how in this
latter case the bias depends on the alpha-parameters.
[Figure: "Quasi-Differences 1: Bias as a function of alpha2"; series "bias alpha1" and "bias alpha2" plotted against alpha2]
Fig. 3 Mean bias for estimates on the basis of QD1, with α1 = 0.8 and α2 varying
[Figure: "Quasi-Differences 2: Bias as a function of alpha2"; series "bias alpha1" and "bias alpha2" plotted against alpha2]
Fig. 4 Mean bias for estimates on the basis of QD2, with α1 = 0.8 and α2 varying
The specification tests do not fail to detect the erroneous assumption regarding
the warranted order of differencing. In both cases, the regime constant test rejects
the specification in 100% of the cases. As the estimated coefficients are of opposite
sign, they cannot be caused by trending target values. The regime dummies
Table 3 Generalised Differences estimation, (α1, α2) = (0.3, 0.8), 1,000 runs

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Leads used | appropriate | appropriate | inappropriate | inappropriate |
| Specification state variable underlying regimes | MA(0) | MA(1) | MA(1) | AR(1) |
| Lead | 2 | 3 | 2 | 2 |
| **α1** | | | | |
| Mean estimate (true value 0.3) | 0.2990 | 0.2994 | 0.2950 | 0.2767 |
| Mean est. std. dev. | 0.0215 | 0.0286 | 0.0243 | 0.0223 |
| Mean bias | −0.0010 | −0.0006 | −0.0050 | −0.0232 |
| RMSE | 0.0118 | 0.0276 | 0.0239 | 0.0322 |
| Freq. rejections of true value on 5% conf. level | 6.4% | 3.8% | 5.1% | 18.0% |
| **α2** | | | | |
| Mean estimate (true value 0.8) | 0.7978 | 0.7995 | 0.7736 | 0.7568 |
| Mean est. std. dev. | 0.0261 | 0.0304 | 0.0298 | 0.0276 |
| Mean bias | −0.0021 | −0.0005 | −0.0264 | −0.0432 |
| RMSE | 0.0113 | 0.0296 | 0.0399 | 0.0518 |
| Freq. rejections of true value on 5% conf. level | 3.7% | 4.5% | 14.2% | 34.6% |
| **Specification tests** | | | | |
| G1: Mean estimate | −0.0001 | 0.0000 | −0.0765 | −0.0977 |
| G1: Mean est. std. dev. | 0.0116 | 0.0145 | 0.0114 | 0.0102 |
| G1: Freq. rejections of zero value on 5% conf. level | 5.8% | 5.4% | 100% | 100% |
| G2: Mean estimate | −0.0007 | −0.0010 | 0.0822 | 0.0977 |
| G2: Mean est. std. dev. | 0.0114 | 0.0141 | 0.0114 | 0.0102 |
| G2: Freq. rejections of zero value on 5% conf. level | 4.9% | 4.7% | 100% | 100% |
| Freq. rejection by Sargan–Hansen on 5% conf. level | 5.1% | 4.8% | 91.9% | 23.2% |
| Av. no. of valid observations | 15,058 | 11,277 | 14,107 | 15,117 |

Notes: the table shows GMM estimates of α1 and α2 on the basis of Generalised Differencing, see Proposition
2. Columns vary by the stochastic specification of the regime indicator when generating the panels
according to Eq. (2) and by the lead used for transformation. Columns (1), (2), and (3) specify processes
where the memory of the regime variable is limited over time, and the state variable that underlies the regime
indicator follows an MA process. In column (4), the regime process is supposed to have infinite memory.
In all columns, α1 = 0.3 and α2 = 0.8. Each column represents 1,000 repetitions of two-stage GMM
estimates using an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations (1,000 individuals
each). The number of valid observations is reduced by the need to transform variables. Instruments are the
levels of ri,t−1 yi,t−1 (i.e. two interaction terms) and a constant. G1 and G2 are regime dummy coefficients
introduced as a specification test for the correct lag length, see Subsection 3.3. Sargan-Hansen test is the test
of overidentifying restrictions associated with Sargan (1958) and Hansen (1982). Estimation is executed
using DPD package version 1.2 on Ox version 3.30 and additional, user-written routines
have ‘captured’ the regime-specific non-zero expectations of the differenced residuals
$E\left[\varepsilon_{i,t} - \varepsilon_{i,t-2} \mid r_{i,t-1},\ r_{i,t-1} = r_{i,t-3}\right]$ for the two values that $r_{i,t-1}$ can take.
The Sargan test is sensitive to the misspecification in Column (3), where the wrong
lead is used, rejecting 91.9% of the estimates. Detecting an infinite memory of the
regime variable is harder for the Sargan test: only 23.2% of estimates in Column (4)
are rejected.
[Figure: GD estimation, bias as a function of alpha2. Horizontal axis: alpha2 (0.0 to 1.2). Series: bias alpha1, bias alpha2.]
Fig. 5 Mean bias for Generalised Differences estimates, with α1 = 0.8 and α2 varying. Here regime
process uncorrelated over time, correct lead of 2
[Figure: GD estimation, bias as a function of alpha2. Horizontal axis: alpha2 (0.0 to 0.9). Series: bias alpha1, bias alpha2.]
Fig. 6 Mean bias for Generalised Differences estimates, with α1 = 0.8 and α2 varying. Here regime
process unlimited memory AR(1), misspecified lead of 2
Table 4, together with Figs. 7 and 8, shows simulation results for the level estimator,
both for the case of a predetermined regime and for a contemporaneous regime.
In both cases, a regime process with infinite memory is assumed. In the table and
Table 4 Level estimation, 1,000 runs

| Simulation # | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Regime indicator | Predetermined | Predetermined | Contemporaneous | Contemporaneous |
| State variable underlying regimes | AR(1) | AR(1) | AR(1) | AR(1) |
| True α1 | 0.3 | 0.3 | 0.3 | 0.3 |
| True α2 | 0.8 | 1.1 | 0.8 | 1.1 |
| α1 | | | | |
| Mean parameter estimate | 0.3031 | 0.3004 | 0.3006 | 0.2951 |
| Mean bias | 0.0031 | 0.0004 | 0.0006 | −0.0049 |
| Mean estimated std. deviation | 0.0197 | 0.0074 | 0.0255 | 0.0073 |
| Std. dev. parameter estimate | 0.0187 | 0.0078 | 0.0252 | 0.0079 |
| RMSE | 0.0190 | 0.0078 | 0.0252 | 0.0094 |
| Freq. rejections of true value on 5% conf. level | 4.3% | 4.2% | 4.7% | 10.5% |
| α2 | | | | |
| Mean parameter estimate | 0.7987 | 1.1001 | 0.7891 | 1.1004 |
| Mean bias | −0.0013 | 0.0001 | −0.0109 | 0.0004 |
| Mean estimated std. deviation | 0.0188 | 0.0013 | 0.0283 | 0.0017 |
| Std. dev. parameter estimate | 0.0191 | 0.0013 | 0.0285 | 0.0017 |
| RMSE | 0.0192 | 0.0013 | 0.0305 | 0.0017 |
| Freq. rejections of true value on 5% conf. level | 5.7% | 5.2% | 7.0% | 5.6% |
| Auxiliary regime constants | | | | |
| G1: Mean estimate | 0.6985 | 0.698 | 0.6795 | 0.6922 |
| G1: Theoretically expected | 0.7 | 0.7 | 0.7 | 0.7 |
| G2: Mean estimate | 0.2030 | −0.0984 | 0.2434 | −0.0852 |
| G2: Theoretically expected | 0.2 | −0.1 | 0.2 | −0.1 |
| Freq. rejection by Sargan–Hansen on 5% conf. level | 4.1% | 3.0% | 5.2% | 4.9% |
| Valid obs. in estimation | 24,000 | 24,000 | 24,000 | 24,000 |
Notes: the table shows GMM estimates of α1 and α2 on the basis of level estimation (see Proposition 3).
Columns vary by parameters α2 and by the stochastic specification of the regime indicator used for generating
the panels according to Eq. (2). In all cases, the regime process is supposed to have infinite memory,
following an AR(1) process. Columns (1) and (2) relate to processes where the regime variable is predetermined
in the adjustment equation, and Columns (3) and (4) relate to results for regime variables that are
contemporaneously correlated with the error term. In all columns, α1 = 0.3. While Columns (1) and (3)
specify α2 = 0.8, columns (2) and (4) show results for α2 = 1.1. Each column represents 1,000 repetitions
of two-stage GMM estimates using an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations
(1,000 individuals each). Instruments are first differences of lagged endogenous variables, interacted with
the regime indicators, $r_{i,t-1}\Delta y_{i,t-1}$ and $r_{i,t-2}\Delta y_{i,t-2}$ (i.e. four variables) plus dummies for the first regime
from $r_{i,t-1}$ and $r_{i,t-2}$. G1 and G2 are coefficients of regime dummies introduced into the equation to capture
the regime-specific shift term in Eq. (24). Sargan–Hansen test is the test of overidentifying restrictions
associated with Sargan (1958) and Hansen (1982). Estimation is executed using DPD package version 1.2
on Ox version 3.30 and additional, user-written routines
the figures α2 varies, with a fixed value of α1 = 0.3. In the predetermined case,
there is little bias over the whole range of parameters, with the possible exception
of α2 = 1, where the bias of $\hat{\alpha}_1$ assumes a moderate value of 0.0117 (not shown in
the table). Standard deviations are similar to those obtained with the other
estimators. If α2 assumes a value larger than 1, then the estimates become extremely
precise.
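As a rough plausibility check on Table 4, the reported RMSE should be close to the square root of the squared mean bias plus the squared standard deviation of the parameter estimates. The following Python sketch recomputes this from the rounded table entries; the function and variable names are illustrative and not part of the original estimation code, and small discrepancies are expected because the inputs are rounded to four decimals.

```python
import math

# (mean bias, std. dev. of parameter estimate, reported RMSE), copied from Table 4,
# columns (1) to (4).
TABLE4 = {
    "alpha1": [(0.0031, 0.0187, 0.0190), (0.0004, 0.0078, 0.0078),
               (0.0006, 0.0252, 0.0252), (-0.0049, 0.0079, 0.0094)],
    "alpha2": [(-0.0013, 0.0191, 0.0192), (0.0001, 0.0013, 0.0013),
               (-0.0109, 0.0285, 0.0305), (0.0004, 0.0017, 0.0017)],
}


def implied_rmse(bias, std_dev):
    """RMSE implied by the decomposition RMSE^2 = bias^2 + variance."""
    return math.hypot(bias, std_dev)


def max_discrepancy(table=TABLE4):
    """Largest absolute gap between implied and reported RMSE over all cells."""
    return max(
        abs(implied_rmse(bias, sd) - reported)
        for rows in table.values()
        for bias, sd, reported in rows
    )


if __name__ == "__main__":
    print(f"largest gap: {max_discrepancy():.5f}")
```

All gaps stay within the rounding error of the table entries, which suggests that the RMSE column is dominated by sampling variability rather than by bias.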
Columns (3) and (4), as well as Fig. 8, show that the level estimator indeed successfully
copes with contemporaneous regime variables, a problem that cannot be
solved by any of the other approaches. There is a moderate bias that peaks at 0.012
for $\hat{\alpha}_2$ when α2 = 0.9 (not shown in the table), and the standard deviations are higher
than with a predetermined regime for α2 < 1. Again, for α2 > 1, the level estimates
become very precise. In all the columns, the regime dummy is very near the theoretical
value of $E\left[(1 - \alpha_{i,t})\,\mu^{e}\right]$, a term that is introduced into Eq. (24) by splitting up the firm
fixed effect into its expectation and a deviation uncorrelated with the shocks in the
other processes.
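The "theoretically expected" regime constants in Table 4 can be reproduced with a one-line calculation. The sketch below assumes that the expected fixed effect equals 1; this value is not stated in this section and is inferred from the table, where the expected constant for α1 = 0.3 is 0.7. The function name is hypothetical.

```python
import math


def expected_regime_constant(alpha, mu_e=1.0):
    """Regime-specific shift term (1 - alpha) * mu_e.

    mu_e = 1.0 is an assumption inferred from Table 4, not a documented value.
    """
    return (1.0 - alpha) * mu_e


if __name__ == "__main__":
    for alpha in (0.3, 0.8, 1.1):
        print(alpha, round(expected_regime_constant(alpha), 4))
```

Under that assumption the calculation gives 0.7 for α1 = 0.3, 0.2 for α2 = 0.8 and −0.1 for α2 = 1.1, matching the expected values listed for G1 and G2.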
7 Conclusion and outlook
Four different ways of estimating an adjustment equation with time-varying persistence
have been presented, all within a GMM framework, albeit with a different set
of moment conditions.
Two estimation techniques rely on transforming the original equation using quasi-differences.
Both quasi-differences estimators are very precise when all the coefficients
are small. When both coefficients are large and of similar size (high persistence
throughout the regimes), the results of QD1 estimation have been shown to be unusable
in simulation, whereas the QD2 approach continues to deliver correct results. In
von Kalckreuth (2008a), the QD2 estimator is successfully employed for estimating
differential adjustment speeds for the capital stock. The most difficult parameterisation
is observed when coefficients are widely different, while one of them is large.
While affected by small sample problems, the QD2 estimator performs clearly better
in this situation. In direct comparison, the major virtue of the QD1 estimator lies in
its surprising simplicity.
The third method involves transformation using Generalised Differences, with a
lead that is long enough to overcome the memory of the εi,t -shocks in the process
driving the regime indicator. This method is applicable only when the memory of the
regime process is limited. We have seen above how to test this requirement. Although
a limited memory may be a good approximation in a number of circumstances, such
as investment under financing constraints, the requirement will not always be fulfilled.
If the conditions are met, then this method leads to a linear estimator which remains
unbiased also if some of the coefficients are in the neighbourhood of 1 or larger. The
fourth method leaves the equation untransformed, and past differences are used as
instruments. Regime dummies are employed to capture and neutralise the time-varying
non-zero expected value of the residual process. The memory of the regime process
is irrelevant for this technique. However, one needs to assume the individual-specific
deterministic equilibrium as being unrelated to the process governing the idiosyncratic
error, changes in the forcing term xi,t , the initial deviation and the regimes. The level
estimator is very precise with regard to larger coefficients. This is not really surprising:
the use of level equations was originally proposed to overcome the problem
of weak instruments in cases where the autoregressive parameter approaches unity.
More important is another virtue of the fourth method: the level estimator is the sole
procedure that can be used when the regime indicator is contemporaneous to the error
term in the adjustment equation.
[Figure: Level estimation, bias as a function of alpha2. Horizontal axis: alpha2 (0.0 to 1.2). Series: bias alpha1, bias alpha2.]
Fig. 7 Mean bias for level estimation with predetermined regimes, with α1 = 0.3 and α2 varying
[Figure: Level estimation, bias as a function of alpha2. Horizontal axis: alpha2 (0.0 to 1.2). Series: bias alpha1, bias alpha2.]
Fig. 8 Mean bias for level estimation with contemporaneous regimes, with α1 = 0.3 and α2 varying
In dealing with a practical estimation problem, one first needs to decide
whether the assumption of predetermined regimes is warranted in the given situation.
If this is the case, then the quasi-differencing methods impose the least
stringent conditions. They can be used and interpreted like first-difference estimations
in the standard model. Owing to the fact that the nonlinear transformation will affect
the latent terms in QD1 more strongly than in QD2, the latter is to be preferred, although
the former may be used as a starting point for specification search and as a convenient
way to generate initial values for QD2 estimation. If adjustment speeds are low and
dissimilar, then even QD2 can lead to non-trivial small sample biases. In this case, the
Generalised Differences method may be preferable, subject to a test on the requirement
of finite memory, in spite of losing many observations.
If adjustment regimes are likely to be contemporaneously correlated, then estimation
using the level approach is possible if the fixed effect is ‘benevolent’, as explained
above. In that case, the level approach will also be a useful device if the speed of adjustment
is slow. Under the same conditions, one may also use the lagged first differences
as instruments in a differenced equation. This, however, is less efficient, as observations
are lost and not all the information in the moment restriction is used.
In deciding upon the use of the moment conditions, one may ask whether they
can be meaningfully combined in estimation. In general, this will not be the case.
If regimes are contemporaneously correlated with the idiosyncratic error term, then
the moment conditions from Propositions 1 and 2 should not be used at all. If the
regime can be considered as predetermined, then one will want to avoid imposing the
additional conditions needed for the level estimator. They are more restrictive than
in the standard autoregressive case. With predetermined regimes, the moment conditions
from Propositions 1 and 2 are to be regarded as alternatives. For higher speeds
of adjustment, they are very similar, and a possible gain from augmenting QD2 by
Generalised Differences will not be worth the risk of erroneously imposing additional
constraints, whereas, for very low and dissimilar speeds, even QD2 breaks down and
Generalised Differencing should be used on its own.
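The practical guidance in this section can be condensed into a small decision rule. The Python sketch below is only one reading of the recommendations above; the function name, argument names and return labels are hypothetical, and the case of contemporaneous regimes without a 'benevolent' fixed effect is returned as "none" because none of the estimators discussed here is stated to cover it.

```python
def suggest_estimator(contemporaneous, low_dissimilar_speeds=False,
                      finite_memory=False, benevolent_fixed_effect=False):
    """Suggest an estimator following the guidance in the concluding section.

    contemporaneous: regime indicator is correlated with the current error term.
    low_dissimilar_speeds: adjustment speeds are very low and differ across regimes.
    finite_memory: the test for limited memory of the regime process is passed.
    benevolent_fixed_effect: the extra level-estimator assumptions are acceptable.
    """
    if contemporaneous:
        # Moment conditions from Propositions 1 and 2 should not be used here.
        return "level" if benevolent_fixed_effect else "none"
    if low_dissimilar_speeds and finite_memory:
        # Even QD2 may show small-sample bias; use Generalised Differences alone.
        return "GD"
    # Default for predetermined regimes; QD1 can supply starting values.
    return "QD2"


if __name__ == "__main__":
    print(suggest_estimator(contemporaneous=False))
    print(suggest_estimator(False, low_dissimilar_speeds=True, finite_memory=True))
    print(suggest_estimator(True, benevolent_fixed_effect=True))
```

The rule deliberately never combines moment conditions from different propositions, in line with the argument that such combinations are generally not advisable.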
Appendix A: A state-dependent ECM
Formally, the state-dependent adjustment equation considered in this article involves a
lagged dependent variable and a forcing term xi,t . However, in addition, higher-order
adjustment processes can be accommodated, by redefining states appropriately.
Consider a linear autoregressive process with distributed lags in a forcing term $x_{i,t}$
and an individual-specific constant $\mu_i$:
$$A(L)\, y_{i,t} = B(L)\, x_{i,t} + \mu_i + \varepsilon_{i,t},$$
where $A(L)$ and $B(L)$ are lag polynomials. As is well known, the process can always
be written in the error correction format. If, for example, $A(L)$ and $B(L)$ are of order
2, then this leads to
$$\begin{aligned}
\Delta y_{i,t} = {} & -\phi \left( y_{i,t-1} - \beta' x_{i,t-1} - \mu_i^{*} \right) \\
& + \gamma^{0\prime} \Delta x_{i,t} + \gamma^{1\prime} \Delta x_{i,t-1} + \omega\, \Delta y_{i,t-1} + \varepsilon_{i,t}.
\end{aligned}$$
In the first line, the term in brackets is the deviation from the static equilibrium, where
$\beta$ may be interpreted as a cumulative long-run effect of a shock in $x_{i,t}$. The transformed
constant $\mu_i^{*}$ is equal to $[A(L)]^{-1} \mu_i$. The term $\phi$ is the speed of adjustment. If
the process is stable, then $|\phi| < 1$. The second line depicts the transitional dynamics,
which is not directly related to the deviation from equilibrium. With $A(L)$ or $B(L)$
of order higher than 2, the transitional dynamics in the error correction format would
involve higher-order lags of the differences $\Delta x_{i,t}$ and $\Delta y_{i,t}$.
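The equivalence between the distributed-lag form and the error correction format can be checked numerically. The sketch below assumes a scalar forcing term and second-order polynomials, written as y_t = a1 y_{t−1} + a2 y_{t−2} + b0 x_t + b1 x_{t−1} + b2 x_{t−2} + μ + ε_t, and the error correction form Δy_t = −φ(y_{t−1} − β x_{t−1} − μ*) + γ0 Δx_t + γ1 Δx_{t−1} + ω Δy_{t−1} + ε_t. The coefficient names a1, a2, b0, b1, b2 and the numerical values are illustrative assumptions, not taken from the article.

```python
import numpy as np


def ecm_parameters(a1, a2, b0, b1, b2, mu):
    """Map second-order distributed-lag coefficients to error correction parameters."""
    phi = 1.0 - a1 - a2                      # speed of adjustment, equals A(1)
    return {
        "phi": phi,
        "beta": (b0 + b1 + b2) / phi,        # cumulative long-run effect
        "mu_star": mu / phi,                 # transformed constant
        "gamma0": b0,
        "gamma1": -b2,
        "omega": -a2,
    }


def max_ecm_gap(seed=0, n_periods=50):
    """Simulate the distributed-lag form and compare it with the ECM form."""
    a1, a2, b0, b1, b2, mu = 0.5, 0.2, 0.4, 0.3, 0.1, 1.0   # illustrative values
    p = ecm_parameters(a1, a2, b0, b1, b2, mu)
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_periods)
    eps = rng.normal(size=n_periods)
    y = np.zeros(n_periods)
    for t in range(2, n_periods):
        y[t] = (a1 * y[t - 1] + a2 * y[t - 2]
                + b0 * x[t] + b1 * x[t - 1] + b2 * x[t - 2] + mu + eps[t])
    t = np.arange(2, n_periods)
    dy = y[t] - y[t - 1]
    ecm = (-p["phi"] * (y[t - 1] - p["beta"] * x[t - 1] - p["mu_star"])
           + p["gamma0"] * (x[t] - x[t - 1])
           + p["gamma1"] * (x[t - 1] - x[t - 2])
           + p["omega"] * (y[t - 1] - y[t - 2])
           + eps[t])
    return float(np.max(np.abs(dy - ecm)))


if __name__ == "__main__":
    print(max_ecm_gap())
```

The two representations agree up to floating-point error, and the transformed constant equals μ divided by A(1) = 1 − a1 − a2, consistent with the definition given above.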
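A generalisation of the adjustment process considered hitherto makes $\phi$, $\gamma^{0}$, $\gamma^{1}$,
and $\omega$ state-dependent, while leaving the transformed constant $\mu_i^{*}$ and the long-run
effect $\beta$ time invariant. The latter imposes a constraint on the time-varying coefficients:
$$\begin{aligned}
\Delta y_{i,t} = {} & -\phi_{i,t-1} \left( y_{i,t-1} - \beta' x_{i,t-1} - \mu_i^{*} \right) + \gamma^{0\prime}_{i,t-1} \Delta x_{i,t} \\
& + \gamma^{1\prime}_{i,t-1} \Delta x_{i,t-1} + \omega_{i,t-1}\, \Delta y_{i,t-1} + \varepsilon_{i,t}.
\end{aligned}$$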
Now let again ri,t be an indicator variable characterising the speed of adjustment. As
the adjustment process is parameterised over two lags, it is straightforward to model
the time-varying parameters as a function involving the state variables in two periods,
$t-1$ and $t-2$. Finally, let $d_{i,t-1}$ be an indicator vector of dummies for all the possible
values $(r_{i,t-1}, r_{i,t-2})$ can take. Then we can write
$$\phi_{i,t-1} = \varphi' d_{i,t-1}, \quad \omega_{i,t-1} = \omega' d_{i,t-1}, \quad \gamma^{0}_{i,t-1} = \Gamma^{0} d_{i,t-1}, \quad \gamma^{1}_{i,t-1} = \Gamma^{1} d_{i,t-1},$$
with $\varphi$, $\omega$, $\Gamma^{0}$ and $\Gamma^{1}$ vectors and matrices of state-dependent adjustment coefficients
remaining to be estimated. Written this way, the problem is fully equivalent to the one
that has been treated in this article, with $d_{i,t-1}$ taking the place of $r_{i,t-1}$ with respect
to the adjustment speed, $\phi_{i,t-1}$, and using appropriate interaction terms for all the
other state-dependent coefficients. With the help of quasi-differencing or Generalised
Differencing, one can eliminate the fixed effect from the adjustment equation. With
contemporaneous adjustment coefficients, one may use the level estimator. It has to
be noted though that—compared to a first-order adjustment process—the Generalised
Difference estimator will be difficult to use, as there are $L^2$ states to be considered
here, and only pairs of observations belonging to the same regime with a given minimum
time distance can be used. The other two estimation principles are not affected
by this profusion of states, except for the fact that the number of coefficients is higher.
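The construction of the indicator vector of dummies over pairs of regime values can be sketched as follows. The Python example assumes that regimes are coded as integers 0, ..., L − 1 and that the combined state is indexed as r[t−1] · L + r[t−2]; this indexing, like the function names, is an illustrative choice and not taken from the article.

```python
import numpy as np


def state_dummies(r, n_regimes):
    """One-hot rows over all n_regimes**2 combinations of (r[t-1], r[t-2]).

    Row k refers to the pair (r[k + 1], r[k]); regimes are assumed to be coded
    as integers 0, ..., n_regimes - 1.
    """
    r = np.asarray(r)
    state = r[1:] * n_regimes + r[:-1]       # combined state index (assumed coding)
    d = np.zeros((len(state), n_regimes ** 2), dtype=int)
    d[np.arange(len(state)), state] = 1
    return d


def state_dependent_coefficient(d, coefficients):
    """Pick the coefficient of the active state in each period, i.e. d @ coefficients."""
    return d @ np.asarray(coefficients, dtype=float)


if __name__ == "__main__":
    d = state_dummies([0, 1, 1, 0], n_regimes=2)
    print(d)
    print(state_dependent_coefficient(d, [0.1, 0.2, 0.3, 0.4]))
```

With two regimes the vector has four entries, and with L regimes it has L squared entries, which illustrates why the number of coefficients grows quickly for higher-order adjustment processes.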
Appendix B: Proofs
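Proof of Proposition 1 If $E\left[\varepsilon_{i,t} \mid \Omega_{i,t-1}\right] = 0$, then any function $f(\Omega_{i,t-1})$ will be
orthogonal to $\varepsilon_{i,t}$, because
$$E\left[f(\Omega_{i,t-1})\,\varepsilon_{i,t}\right] = E_{\Omega_{i,t-1}} E\left[f(\Omega_{i,t-1})\,\varepsilon_{i,t} \mid \Omega_{i,t-1}\right] = E_{\Omega_{i,t-1}}\left[f(\Omega_{i,t-1})\, E\left(\varepsilon_{i,t} \mid \Omega_{i,t-1}\right)\right] = 0. \tag{29}$$
Consider first $E\left[y_{i,t-p}\,\psi_{i,t}\right]$, with $p \geq 2$. Equations (3) and (4) show that $y_{i,t-p}$ is a
function of $r_{i,t-p-1}, r_{i,t-p-2}, \ldots, x_{i,t-p}, x_{i,t-p-1}, \ldots, \varepsilon_{i,t-p}, \varepsilon_{i,t-p-1}, \ldots, \mu_i, y_{i,0}$.
The expressions $1/(1 - \alpha_{i,t-1})$ and $1/(1 - \alpha_{i,t-2})$ are functions of $r_{i,t-1}$ and $r_{i,t-2}$.
Applying (29) to the products $\left[y_{i,t-p}/(1 - \alpha_{i,t-1})\right]\varepsilon_{i,t}$ and $\left[y_{i,t-p}/(1 - \alpha_{i,t-2})\right]\varepsilon_{i,t-1}$
yields $E\left[y_{i,t-p}\,\psi_{i,t}\right] = 0$. The same argument holds for $E\left[y_{i,t-p}\,\xi_{i,t}\right]$, with $p \geq 2$.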
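Proof of Proposition 2 The proposition follows from the law of iterated expectations:
$$\begin{aligned}
& E\left[y_{i,t-\ell-p}\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell}\right) \mid r_{i,t-1}, r_{i,t-\ell-1}\right] \\
& \quad = E_{y_{i,t-\ell-p}} E\left[y_{i,t-\ell-p}\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell}\right) \mid r_{i,t-1}, r_{i,t-\ell-1}, y_{i,t-\ell-p}\right] \\
& \quad = E_{y_{i,t-\ell-p}} \left[y_{i,t-\ell-p} \cdot E\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell} \mid r_{i,t-1}, r_{i,t-\ell-1}, y_{i,t-\ell-p}\right)\right] = 0,
\end{aligned}$$
because the conditional expectation within the brackets is zero for $\ell \geq 2 + q$. The
backward solution (3) decomposes $y_{i,t}$ into the initial deviation, $y_{i,0} - x_{i,1}'\beta - \mu_i$,
and the history of $x_{i,t}$, $r_{i,t}$ and $\varepsilon_{i,t}$. The assumption (20) ensures that conditioning
on $r_{i,t-1}$, $r_{i,t-\ell-1}$ and $y_{i,t-\ell-p}$, $p \geq 1$, will preserve a zero expectation of $\varepsilon_{i,t}$ and
$\varepsilon_{i,t-\ell-1}$.
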
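Proof of Proposition 3 The restriction (27) holds for $p = k$ if, first,
$$E\left[\Delta y_{i,t-k}\,\varepsilon_{i,t}\right] = E\left[\Delta y_{i,t-k} \cdot E\left(\varepsilon_{i,t} \mid \Delta y_{i,t-k}\right)\right] = 0, \tag{30}$$
and second,
$$E\left[\Delta y_{i,t-k}\left(1 - \alpha_{i,t}\right)\mu_i^{*}\right] = 0. \tag{31}$$
Given the backward solution (23), condition (25) is sufficient for the expectation
in the bracket of (30) to be identically zero, as $\Delta y_{i,t-k}$ is a function of
$\varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, x_{i,t-k}, x_{i,t-k-1}, \ldots, r_{i,t-k}, r_{i,t-k-1}, \ldots, y_{i,0} - x_{i,1}'\beta - \mu_i$.
Similarly, one has
$$E\left[\Delta y_{i,t-k}\left(1 - \alpha_{i,t}\right)\mu_i^{*}\right] = E\left[\Delta y_{i,t-k}\left(1 - \alpha_{i,t}\right) \cdot E\left(\mu_i^{*} \mid \Delta y_{i,t-k}\left(1 - \alpha_{i,t}\right)\right)\right].$$
If the expectation of $\mu_i^{*}$ is zero conditional on all random variables that constitute
$\Delta y_{i,t-k}$ according to its reduced form in (23), then the expectation in (31) vanishes.
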
Acknowledgments The author thanks Jörg Breitung for important discussions, encouragement and
patience. Olympia Bover made a vital comment that gave the article a new turn. Vassilis Hajivassiliou
and Sarah Rupprecht discussed earlier conference versions. Two anonymous referees made extremely helpful,
detailed and constructive comments. This article has been presented in part or fully at the 2009 Panel
Data Conference in Bonn, the 2008 Econometric Society European Meeting in Milan, the 2007 Deutsche
Bundesbank and Banque de France Spring Conference on Microdata Analysis and Macroeconomic
Implications in Eltville, the 2007 Annual Meeting of the Verein für Socialpolitik in Munich and the 2007
CESifo Conference on Survey Data in Economics—Methodology and Applications, in Munich.
References
Anderson TW, Hsiao C (1982) Formulation and estimation of dynamic models using panel data. J Economet
18:47–82
Arellano M, Bond S (1991) Some tests of specification for panel data: Monte Carlo evidence and an application
to employment equations. Rev Econ Stud 58:277–297
Arellano M, Bover O (1995) Another look at the instrumental variable estimation of error component
models. J Economet 68:29–51
Bayer C (2006) Investment dynamics with fixed adjustment costs and capital market imperfections. J Monetary
Econ 53:1909–1947
Blundell R, Bond S (1998) Initial conditions and moment restrictions in dynamic panel data models. J
Economet 87:115–143
Bond S, Lombardi D (2007) To buy or not to buy? Uncertainty, irreversibility and heterogeneous investment
dynamics in Italian company data. IMF Staff Papers 53:375–400
Bond S, Elston JA, Mairesse J, Mulkay B (2003) Financial factors and investment in Belgium, France,
Germany, and the United Kingdom: a comparison using company panel data. Rev Econ Stat 85:153–
165
Caballero RJ, Engel EMRA (1999) Explaining investment dynamics in U.S. manufacturing: a generalised
(S,s) approach. Econometrica 67:783–826
Caballero RJ, Engel EMRA (2004) A comment on the economics of labor adjustment: mind the gap: reply.
Am Econ Rev 94:1238–1244
Caballero RJ, Engel EMRA, Haltiwanger JC (1995) Plant level adjustment and aggregate investment
dynamics. Brookings Papers Econ Act 1995(2):1–39
Caballero RJ, Engel EMRA, Haltiwanger JC (1997) Aggregate employment dynamics: building from
microeconomic evidence. Am Econ Rev 87:115–137
Chamberlain G (1983) Panel data, Chap 22. In: Griliches Z, Intriligator M (eds) The handbook of econometrics,
vol II. Amsterdam, North Holland pp 1247–1318
Cooper R, Willis JL (2004) A comment on the economics of labor adjustment: mind the gap. Am Econ Rev
94:1223–1237
Davidson R, MacKinnon JG (1993) Estimation and inference in econometrics. Oxford University Press,
New York
Doornik JA (2001) Ox 3.0. An object-oriented matrix programming language, 4th edn. Timberlake Consultants,
London
Doornik JA, Arellano M, Bond S (2002) Panel data estimation using DPD for Ox documentation accompanying
the DPD for Ox module code, dated 23 Dec 2002
Hansen L (1982) Large sample properties of generalized method of moments estimators. Econometrica
50:1029–1054
Hayashi F (2000) Econometrics. Princeton University Press, Princeton
Holtz-Eakin D, Newey WK, Rosen HS (1988) Estimating vector autoregressions with panel data.
Econometrica 56:1371–1395
Judge GG, Griffiths WE, Hill RC, Lütkepohl H, Lee TC (1985) The theory and practice of econometrics.
2nd edn. Wiley, New York
Sargan JD (1958) The estimation of economic relationships using instrumental variables. Econometrica
26:393–415
von Kalckreuth U (2004) Financial constraints for investors and the speed of adaptation: are innovators
special? Deutsche Bundesbank Discussion Paper Series 1, No. 20/04
von Kalckreuth U (2006) Financial constraints and capacity adjustment: evidence from a large panel of
survey data. Economica 73:691–724
von Kalckreuth U (2008a) Financing constraints, micro adjustment of capital demand and aggregate implications.
Deutsche Bundesbank Discussion Paper Series 1, No 11/08
von Kalckreuth U (2008b) Financing constraints and the adjustment dynamics of enterprises. Habilitation
thesis, University of Mannheim, May 2008
Woodford M (2003) Interest and prices. Foundations of a theory of monetary policy. Princeton University
Press, Princeton