Empir Econ (2010) 38:23–45

DOI 10.1007/s00181-009-0254-1

ORIGINAL PAPER

Analysis of county employment and income growth in Appalachia: a spatial simultaneous-equations approach

Gebremeskel H. Gebremariam · Tesfa G. Gebremedhin · Peter V. Schaeffer

Received: 15 April 2006 / Accepted: 15 September 2008 / Published online: 31 January 2009

© Springer-Verlag 2009

Abstract County median household income and employment growth rates tend

to be characterized by spatial interaction. A spatial simultaneous-equations growth

equilibrium model was estimated using GS2SLS and GS3SLS. The results indicate

strong feedback simultaneity between employment and median household income

growth rates. They also show spatial autoregressive lag simultaneity and spatial

cross-regressive lag simultaneity with respect to employment and median household

income growth rates, as well as spatial correlation in the error terms. Estimates of

structural parameters show strong agglomerative effects and significant conditional

convergence with respect to employment growth and median household income growth

in Appalachia in the 1990s.

Keywords Employment · Income · Spatial analysis · Appalachia

JEL Classification C3 · R1 · R5

1 Introduction

State policy makers and local leaders have long placed a high priority on local

economic development (Isserman 1993; Pulver 1989; Ekstrom and Leistritz 1988).

The changing structure of traditional industries and the impact of those changes on

local communities have challenged the efficacy of established policies and strategies.

G. H. Gebremariam

Department of Economics, Virginia Polytechnic Institute, State University, Blacksburg, USA

T. G. Gebremedhin · P. V. Schaeffer (corresponding author)

Division of Resource Management, West Virginia University, Morgantown, USA

e-mail: peter.schaeffer@mail.wvu.edu

123

24 G. H. Gebremariam et al.

A better understanding of factors that influence local employment, earning capacity,

and quality of life issues has therefore become important for state, regional, and local

agencies in charge of rural development policies. One of the policy challenges at the

local and county level is spatial interdependencies. Outside of the far west, counties

are small and in most instances cannot be thought of as even rough approximations

of labor markets and often not of market areas for many consumer goods either. This

is expressed by the mean travel time to work in a rural state such as West Virginia,

which in 2005 ranked fourteenth in the nation; at 24.9 min, it was just below the national

average of 25.1 min and ahead of more urbanized states such as Ohio and Michigan

(US Census Bureau 2005). We single out West Virginia because of its rural nature and

because it is the only state completely contained within Appalachia. The twelve states

with counties in Appalachia have commuting times (one way) of 22.4 min or higher.

Dealing with spatial interdependencies is therefore one of the major objectives of this

research.

Many of the forces responsible for past economic and social changes continue to

have an impact. One of these changes was the emergence of computer-based technology

in production, administration and information, which has reduced the role of

economies of scale in many sectors. Studies by Loveman and Sengenberger (1991)

and Acs and Audretsch (1993), for example, have shown a shift in industry structure

toward decentralization and an increased role for small firms. This was mainly due

to changes in production technology, consumer demand, labor supply, and the pursuit

of flexibility and efficiency. These factors led to the restructuring and downsizing of

large enterprises and the entry of new firms. Brock and Evans (1989) provide extensive

documentation of the changing role of small businesses in the US economy, which is

likely the result of responses to structural adjustments.

Parallel with technical changes leading to new industrial structures, new patterns

of consumer expenditures and demand resulting from rising living standards contributed

to the emergence of fragmented consumer markets, which also favored small

consumer-oriented firms over high volume, production-oriented firms. Thus, new

business opportunities in small and medium size enterprises resulted as large firms

downsized in response to a changing environment. The emerging view among policy

makers is that small business is a key element and driving force in generating employment

and realizing economic development. This paradigm shift has brought about a

revival in small business promotion and entrepreneurial initiatives at local, national

and international levels.

Most new businesses start small and small businesses create the majority of new

jobs (Acs and Audretsch 2001; Audretsch et al. 2000; Carree and Thurik 1998, 1999;

Fritsch and Falck 2003; Reynolds 1994; Wennekers and Thurik 1999). A growing

literature has explored the determinants of the regional variations in new business formation

(Acs and Armington 2003; Audretsch and Fritsch 1994; Callejon and Segarra

2001; Davidson et al. 1994; Fotopoulos and Spencer 1999; Fritsch 1992; Garofoli

1994; Guesnier 1994; Hart and Gudgin 1994; Johnson and Parker 1996; Kangasharju

2000; Keeble and Walker 1994; Reynolds 1994).

The geographic impact of the change from large production-oriented plants to

smaller consumer-oriented firms and plants is uncertain. While smaller units would tend

to make rural production sites relatively more competitive, the consumer-orientation,

which tends to favor locations close to markets, is more likely to have the opposite

effect. Hence, it is not possible to predict the impact of the changes discussed above

on the geographic distribution of economic activity a priori.

The literature on economic growth at the regional level has focused attention on

the so-called convergence hypothesis of neoclassical growth theory which predicts

that poorer regions tend to catch up with the richer regions in per capita income as

time passes, through the process of factor mobility. Because of the spatial structure of

our model, we can test for convergence. Previous studies by Barro and Sala-i-Martin

(1992, 2004) for US states, Japanese prefectures and between European countries,

and by Persson (1997) and Aronsson et al. (2001) across Swedish counties, found

evidence of income convergence. Similar studies by Arbia et al. (2005) of 92 Italian

provinces (1970–2000), Ertur et al. (2006) of 138 European regions (1980–1995), and

Rappaport (1999) of US counties (1970–1990), also found income convergence. However,

a study by Glaeser et al. (1995) did not discover significant evidence of income

convergence between US cities. Of particular interest are two papers by Higgins et al.

(2006) and Young et al. (2008) that looked at per capita income net of government

transfers in US counties in all fifty states from 1970 to 1998. They found a speed of

convergence of between 6 and 8%, considerably faster than the approximately 2% typically

reported. Higgins et al. (2006) also found a much faster speed of convergence

in counties located in southern than in northeastern states.

The relationship between economic growth and its determinants has been studied

extensively. One issue is whether population is driving employment changes or

employment is driving population changes (do ‘jobs follow people’ or ‘people follow

jobs’?). Empirical studies on identification of the direction of causality have resulted

in empirical models of regional development that often reflect the interdependence

between household residential choices and firm location choices (Steinnes and Fisher

1974). To account for this causation and interdependency, Carlino and Mills (1987)

constructed a simultaneous system model with two partial location equations as its

components. They used data for counties in the contiguous United States. The empirical

result from their study of greatest interest to us is the finding that in the 1970s family

income had a strong impact on the growth of population density as well as employment

density. Recently, Deller et al. (2001) expanded the original Carlino-Mills model and

presented a three-dimensional model (jobs-people-income) that explicitly traces job

quality and the role of income in the regional growth process. They also used county

data, but restricted themselves to non-metropolitan counties; the time period studied

was 1985–1995. Their empirical results indicate that initial conditions co-determine

the eventual outcome and that counties with higher initial population levels tended to

have higher employment growth. However, counties that had higher levels of population,

employment, and per capita income in 1985 tended to have lower rates of overall

growth.

There have also been efforts to model the interactions between employment growth

and human migration (Clark and Murphy 1996; MacDonald 1992), per capita personal

income and public expenditures (Duffy-Deno and Eberts 1991), and net migration,

employment growth, and average per capita income (Greenwood and Hunt

1984; Greenwood et al. 1986; Lewis et al. 2002) in simultaneous-equations models.

Among these contributions, Clark and Murphy’s (1996) findings have been particularly

influential. Their empirical analysis covered the period 1981–1989 and was conducted

at the county level. They expanded the Carlino-Mills model by including amenity measures

beyond climate (temperature), neighborhood poverty, and fiscal variables. Their

results are consistent with those of Carlino and Mills (1987) and, specifically, they

find simultaneity between employment density and population density.

The focus of this empirical analysis is Appalachia, a region that is for many a symbol

of poverty and underdevelopment in the midst of prosperity (Pollard 2003). It is a

region of about 23 million people. Forty-two percent of the population is rural, compared

to 20% for the nation as a whole. Many parts of the region can also be considered

remote because of topography and a comparatively poor transportation infrastructure.

Appalachia also constitutes a separate policy region, with programs administered by

the Appalachian Regional Commission. The unit of analysis is the county, so that we

can trace local economic development in terms of employment and income growth

data, respectively. The time period considered is 1990–2000. This was a decade of

economic growth and expansion in most of the United States. It is of interest to study

if and/or how the boom of the 1990s impacted Appalachian counties.

Like the studies mentioned above, this article examines the determinants of regional

variations in employment and household income growth rates using county data.

Its novel contribution lies in a methodological innovation. Specifically, the model

introduces both spatial lag and spatial error dependence into a simultaneous equation

model and obtains estimation results using Generalized Spatial 3 Stage Least Squares

(GS3SLS). This has not been previously done and yields more efficient and consistent

estimates. The estimation strategy is discussed in the section on estimation issues.

2 Method of analysis

Interdependence between employment and income exists because both households and

firms are mobile and locate to maximize utility and profits, respectively. Households

migrate if they can capture better income opportunities than those available at their

current location and firms move to be near growing markets. The location decisions

of firms are also expected to be influenced by factors such as local business climate,

labor costs, tax rates, local public services and the supply of inputs. In addition,

government-provided incentives may influence where firms locate. Such regional

factors that affect households’ and firms’ decision making are also likely to exhibit

spatial autocorrelation (Anselin 1988, 2003). These assumptions are expressed as

three hypotheses to be tested: (1) Employment growth and median household income

growth are interdependent and jointly determined by regional variables; (2) Employment

growth and median household income growth in a county are conditional upon

initial conditions of that county; and (3) Employment growth and median household

income growth in a county are conditional upon business and median household

income growth in neighboring counties. Emphasis is put on determining the linkages

between employment growth and household median income, as well as on examining

the elasticity of these variables with respect to each of the regional variables.

To test the three hypotheses, a spatial simultaneous equations model of business

growth and household median income is used. Following Carlino and Mills (1987) and

building on Boarnet (1994), a model that incorporates own-county and neighboring

counties effects is specified as follows in matrix notation:

$$
EMP^*_{it} = f_1\left(MHY^*_t,\; WMHY^*_t,\; WEMP^*_t,\; X^{em}_{t-1}\right) \tag{1a}
$$

$$
MHY^*_{it} = f_2\left(EMP^*_t,\; WEMP^*_t,\; WMHY^*_t,\; X^{mh}_{t-1}\right) \tag{1b}
$$

$EMP^*_t$ and $MHY^*_t$ are the equilibrium levels of private non-farm employment and
median household income, respectively, and $t$ denotes time. $W$ is a row-standardized
spatial weights matrix with typical element $w_{ij}$. Each element $w_{ij}$ represents a measure
of proximity between location $i$ and location $j$. We define the adjacency criterion
such that $w_{ij}$ equals $1/n_i$, where $n_i$ is the number of nonzero elements in the $i$th row of $W$.
The row element is nonzero if locations $i$ and $j$ are adjacent and 0 otherwise. $WEMP^*_t$
and $WMHY^*_t$ represent the equilibrium values of neighboring counties' effects for
private non-farm employment and median household income, respectively. They are
obtained by multiplying $EMP^*_t$ and $MHY^*_t$, respectively, by $W$. The matrices of
additional exogenous variables in the respective equations of the system of spatial
simultaneous equations are given by $X^{em}_{t-1}$ and $X^{mh}_{t-1}$, respectively. The descriptions

of these variables are given in the data section below. Note that equilibrium levels

of private non-farm employment and median household income are assumed to be

functions of the equilibrium values of the respective right-hand endogenous variables,

their spatial lags and the vectors of the additional exogenous variables.
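The construction of $W$ and of a spatial lag can be illustrated with a minimal sketch. The four-county adjacency pattern and the employment figures are made up for illustration, and the helper name `row_standardize` is hypothetical; the sketch only assumes the definition given above ($w_{ij} = 1/n_i$ for adjacent counties, 0 otherwise).

```python
import numpy as np

def row_standardize(adjacency):
    """Return W with w_ij = 1/n_i if i and j are adjacent, 0 otherwise."""
    A = np.asarray(adjacency, dtype=float)
    n_i = A.sum(axis=1, keepdims=True)  # number of neighbors of each county
    # Counties without neighbors keep an all-zero row (no division by zero).
    return np.divide(A, n_i, out=np.zeros_like(A), where=n_i > 0)

# Hypothetical adjacency for four counties arranged in a line: 0-1-2-3.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
W = row_standardize(adjacency)

emp = np.array([100.0, 200.0, 300.0, 400.0])  # made-up employment levels
w_emp = W @ emp  # spatial lag: average employment of each county's neighbors
```

Because every row of $W$ sums to one, each element of the spatial lag is simply the average of the variable over the neighboring counties.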

The system of equations in (1a, b) captures the simultaneous nature of the interactions

between employment growth and median household income at equilibrium. The

nature of interaction among the endogenous variables depends on the initial conditions

in a county.

Based on the result of a generalized PE-test, a multiplicative log-linear form of the

model was used. The model specification is discussed in greater detail in the section

“Estimation Issues.” The chosen functional form implies constant elasticity for the

equilibrium conditions given in (1a,b). A log-linear (i.e., log-log) representation of

the equilibrium conditions can thus be expressed as:

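$$
EMP^*_t = \left(MHY^*_t\right)^{a_1} \times \left(WEMP^*_t\right)^{b_1} \times \left(WMHY^*_t\right)^{c_1} \times \prod_{k=1}^{K_1} \left(X^{em}_{kt-1}\right)^{x_{1k}} \tag{2a}
$$

$$
MHY^*_t = \left(EMP^*_t\right)^{a_2} \times \left(WMHY^*_t\right)^{b_2} \times \left(WEMP^*_t\right)^{c_2} \times \prod_{k=1}^{K_2} \left(X^{mh}_{kt-1}\right)^{x_{2k}} \tag{2b}
$$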

where $a_i$, $b_i$ and $c_i$, $i = 1, 2$, are the exponents on the endogenous variables and their
spatial lags, $x_{ik}$, for $i = 1, 2$, are vectors of exponents on the exogenous variables,
$\prod$ is the product operator, and $K_i$ for $i = 1, 2$ is the number of exogenous

variables in the private non-farm employment and median household income

equations, respectively. The log-linear specification has the advantage of yielding a

log-linear reduced form for estimation, where the estimated coefficients represent elasticities.

Duffy-Deno (1998) and Mackinnon et al. (1983) also show that, compared to a

linear specification, a log-linear specification is more appropriate for models involving

population and employment densities.
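The elasticity interpretation can be made explicit by taking logarithms of Eq. (2a). The following worked step uses only the symbols defined above and holds the other right-hand-side terms fixed when differentiating:

$$
\ln EMP^*_t = a_1 \ln MHY^*_t + b_1 \ln WEMP^*_t + c_1 \ln WMHY^*_t + \sum_{k=1}^{K_1} x_{1k} \ln X^{em}_{kt-1},
\qquad\text{so that}\qquad
\frac{\partial \ln EMP^*_t}{\partial \ln MHY^*_t} = a_1 .
$$

As a purely illustrative reading, $a_1 = 0.5$ would mean that a 1% higher equilibrium median household income is associated with a 0.5% higher equilibrium employment level; the value 0.5 is not an estimate from this study.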

Previous empirical studies suggest that employment and median household income

likely adjust to their equilibrium levels with a substantial lag (Aronsson et al. 2001;

Barkley et al. 1998; Boarnet 1994; Carlino and Mills 1987; Deller et al. 2001; Duffy

1994; Duffy-Deno 1998; Edmiston 2004; Hamalainen and Bockerman 2004; Henry

et al. 1999, 1997; Mills and Price 1984). Therefore, based on these studies, a distributed

lag adjustment is introduced and the corresponding partial-adjustment process

for Eqs. (1a,b) takes the form:

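$$
\frac{EMP_t}{EMP_{t-1}} = \left(\frac{EMP^*_t}{EMP_{t-1}}\right)^{\eta_{em}}
\;\rightarrow\;
\ln(EMP_t) - \ln(EMP_{t-1}) = \eta_{em} \ln(EMP^*_t) - \eta_{em} \ln(EMP_{t-1}) \tag{3a}
$$

$$
\frac{MHY_t}{MHY_{t-1}} = \left(\frac{MHY^*_t}{MHY_{t-1}}\right)^{\eta_{mh}}
\;\rightarrow\;
\ln(MHY_t) - \ln(MHY_{t-1}) = \eta_{mh} \ln(MHY^*_t) - \eta_{mh} \ln(MHY_{t-1}) \tag{3b}
$$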

The subscript t − 1 refers to the variable lagged one period, one decade in this study,

and ηem and ηmh are parameters representing the speed of adjustment of employment

and median household income to their respective equilibrium levels. They are interpreted

as the proportions of the respective equilibrium rate of growth that were realized

in each period. If both ηem and ηmh are less than one, then the system is stable and

guaranteed to converge.
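The adjustment mechanism in Eq. (3a) can be illustrated numerically. The sketch below assumes a fixed equilibrium level and a speed of adjustment between zero and one; the function name and all numbers are hypothetical and serve only to show how the gap to equilibrium shrinks each period.

```python
import math

def partial_adjustment_path(emp0, emp_star, eta, periods):
    """Iterate ln(EMP_t) = ln(EMP_{t-1}) + eta * (ln(EMP*) - ln(EMP_{t-1}))."""
    log_emp = math.log(emp0)
    log_star = math.log(emp_star)
    path = [emp0]
    for _ in range(periods):
        # A share eta of the remaining log gap is closed in each period.
        log_emp += eta * (log_star - log_emp)
        path.append(math.exp(log_emp))
    return path

# Made-up example: employment of 1,000 adjusting toward an equilibrium of 2,000.
path = partial_adjustment_path(emp0=1000.0, emp_star=2000.0, eta=0.5, periods=5)
```

With these assumptions the log gap after $t$ periods equals $(1 - \eta)^t$ times the initial gap, so the path approaches the equilibrium level without overshooting.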

The existence of spatial autocorrelation in the errors is tested by means of a Global

Moran’s I test statistic, as suggested by Anselin and Kelejian (1997) for models with

endogenous regressors. A more general version of Moran’s I test statistic and its

asymptotic distribution is given by Kelejian and Prucha (2001). The results of the test

(Table 2) indicate the existence of spatial autocorrelation in the errors of all equations

in (3a, b). Therefore, we need a model that accounts for this spatial effect. We achieve

this by substituting Eqs. (2a, b) into Eqs. (3a, b). Eliminating the unknown equilibrium

values and simplifying the model yields the following system:

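$$
\begin{aligned}
EMPR_t ={}& \alpha_1 + \frac{\eta_{em} a_1}{\eta_{mh}} MHYR_t + \frac{\eta_{em} b_1}{\eta_{em}} WEMPR_t + \frac{\eta_{em} c_1}{\eta_{mh}} WMHYR_t \\
&+ \eta_{em} a_1 \ln(MHY_{t-1}) + \eta_{em} b_1 \ln(WEMP_{t-1}) + \eta_{em} c_1 \ln(WMHY_{t-1}) \\
&+ \sum_{k=1}^{K_1} \eta_{em} x_{1k} \ln X^{em}_{kt-1} - \eta_{em} \ln(EMP_{t-1}) + u^{em}_t
\end{aligned} \tag{4a}
$$

$$
\begin{aligned}
MHYR_t ={}& \alpha_2 + \frac{\eta_{mh} a_2}{\eta_{em}} EMPR_t + \frac{\eta_{mh} b_2}{\eta_{mh}} WMHYR_t + \frac{\eta_{mh} c_2}{\eta_{em}} WEMPR_t \\
&+ \eta_{mh} a_2 \ln(EMP_{t-1}) + \eta_{mh} c_2 \ln(WEMP_{t-1}) + \eta_{mh} b_2 \ln(WMHY_{t-1}) \\
&+ \sum_{k=1}^{K_2} \eta_{mh} x_{2k} \ln X^{mh}_{kt-1} - \eta_{mh} \ln(MHY_{t-1}) + u^{mh}_t
\end{aligned} \tag{4b}
$$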
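The substitution that eliminates the unobserved equilibrium values can be traced for a single term. The worked step below follows only the median household income term of the employment equation. The remaining terms follow the same pattern, on the assumption (implied by the coefficient ratios in (4a)) that the spatially lagged variables obey the same adjustment relations.

$$
\begin{aligned}
&\text{Invert (3b):} && \ln MHY^*_t = \frac{1}{\eta_{mh}} MHYR_t + \ln MHY_{t-1}, \qquad MHYR_t \equiv \ln MHY_t - \ln MHY_{t-1}, \\
&\text{Substitute into the corresponding term of (3a):} && \eta_{em} a_1 \ln MHY^*_t = \frac{\eta_{em} a_1}{\eta_{mh}} MHYR_t + \eta_{em} a_1 \ln MHY_{t-1}.
\end{aligned}
$$

The two terms on the right are the coefficient on $MHYR_t$ and the coefficient on $\ln(MHY_{t-1})$ in Eq. (4a).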

EMPRt and MHYRt are the log differences between the end and beginning period

values of private non-farm employment and median household income, respectively,

and denote the growth rates of the respective variables. $\alpha_r$ and $\rho_r$, for $r = 1, 2$, are
unobserved parameters. $u^{em}_t$ and $u^{mh}_t$ are $n \times 1$ vectors of disturbances. Note that the
disturbance vector in the $r$th equation is generated as:

$$
u_{t,r} = \rho_r W u_{t,r} + \varepsilon_{t,r}, \quad r = 1, 2
$$

This specification relates the disturbance vector in the $r$th equation to its own spatial
lag. The vectors of innovations ($\varepsilon_{t,r}$, $r = 1, 2$, or $\varepsilon^{em}_t$ and $\varepsilon^{mh}_t$) are identically
and independently distributed with zero mean and variance $\sigma^2_r$, for $r = 1, 2$.
Hence, they are not spatially correlated. The specification of the model, however, allows

for innovations that correspond to the same cross sectional unit to be correlated across

equations. As a result, the vectors of disturbances are spatially correlated across units

and across equations.
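The spatial autoregressive disturbance process can be sketched by solving $(I - \rho W)u = \varepsilon$ for $u$. The regular lattice, the value $\rho = 0.8$, and the helper names below are assumptions made for illustration. The `morans_i` function is the plain descriptive Moran's I, not the test statistic for models with endogenous regressors used in this study.

```python
import numpy as np

def rook_lattice_weights(rows, cols):
    """Row-standardized contiguity matrix for a rows x cols grid (shared edges)."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:                 # neighbor below
                j = (r + 1) * cols + c
                A[i, j] = A[j, i] = 1.0
            if c + 1 < cols:                 # neighbor to the right
                j = r * cols + (c + 1)
                A[i, j] = A[j, i] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def sar_disturbances(W, rho, eps):
    """Solve u = rho * W u + eps, i.e. (I - rho * W) u = eps."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, eps)

def morans_i(x, W):
    """Descriptive Moran's I of x under weights W."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
W = rook_lattice_weights(20, 20)
eps = rng.standard_normal(W.shape[0])        # i.i.d. innovations
u = sar_disturbances(W, rho=0.8, eps=eps)    # spatially correlated disturbances
```

Comparing `morans_i(u, W)` with `morans_i(eps, W)` shows how the autoregressive structure induces spatial correlation in $u$ even though the innovations themselves are spatially uncorrelated.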

Equations (4a, b) constitute a system of simultaneous equations with feedback

simultaneity, spatial autoregressive lag simultaneity, spatial cross-regressive lag simultaneity,

and spatial autoregressive disturbances. The endogenous variables of the model

are EMPRt and MHYRt . If each equation is investigated separately, we notice that

each of these variables is expressed in terms of the right hand endogenous variables

and their spatial lags, the logs of the lagged endogenous variables and their spatial

lags, and the logs of other exogenous variables. By structure, the spatial lags of the

lagged endogenous variables are, however, included in the spatial lags of the respective

endogenous variables. Hence, in order to avoid multicollinearity, the model is

estimated by excluding all the spatial lags of the lagged endogenous variables.

3 Data types and sources

The data for the 417 Appalachian counties used for the empirical analysis were collected

and compiled from County Business Patterns, Bureau of Economic Analysis,

Bureau of Labor Statistics, Current Population Survey Reports, County and City Data

Book, US Census of Population and Housing, US Small Business Administration,

and Department of Employment Security. Data for county employment and county

median household income were collected for 1990 and 2000.

3.1 Dependent variables

The dependent variables used in the empirical analysis include the growth rate of

employment and the growth rate of median household income.

3.1.1 Growth rate of employment (EMPR)

The growth rate of employment is measured by the log-difference between the 2000

and the 1990 levels of private non-farm employment, exclusive of self-employment.

Empirical research indicates that in the study period most new jobs were generated

by new small businesses (Acs and Audretsch 2001; Audretsch et al. 2000; Carree and

Thurik 1998, 1999; Wennekers and Thurik 1999; Fritsch and Falck 2003). Research

by the US Small Business Administration also shows that job creation capacity in the

US is inversely related to the size of the business. Between 1991 and 1995, for example,

enterprises employing fewer than 500 people created new jobs as follows (size

of enterprise in parenthesis): 3.843 million (1–4), 3.446 million (5–19), 2.546 million

(20–99), and 1.011 million (100–499). During the same period, enterprises employing

500 or more people lost 3.182 million net jobs (US Small Business Administration

(SBA) 1999).

3.1.2 Growth rate of median household income (MHYR)

The log-difference between the 2000 and 1990 levels of median household income in a

given county is used to measure the growth rate of median household income. Median

household income is used as an average overall measure of county-level income.

Median household income is preferable to using the mean household income because

unlike the mean, the median is not influenced by the presence of a few extreme values.

The spatial lags of the Growth Rate of Employment (WEMPR) and Growth Rate

of Median Household Income (WMHYR) are included on the right hand side of each

equation of (4a, b). These spatially lagged endogenous variables are created by multiplying

each of the dependent variables by a row standardized queen-based contiguity

spatial weights matrix W.
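A minimal sketch of how the growth rates and their spatial lags are formed is given here. The three counties, their employment levels, and the weights matrix are made up for illustration; only the definitions (log-difference growth rate, spatial lag as $W$ times the growth rate) come from the text.

```python
import numpy as np

# Hypothetical employment levels for three mutually adjacent counties.
emp_1990 = np.array([1000.0, 2000.0, 4000.0])
emp_2000 = np.array([1100.0, 2000.0, 5000.0])

# Row-standardized contiguity matrix: each county's two neighbors get weight 1/2.
W = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
])

empr = np.log(emp_2000) - np.log(emp_1990)  # EMPR: log-difference growth rate
wempr = W @ empr                            # WEMPR: mean growth rate of neighbors
```

The same two steps applied to median household income would give MHYR and WMHYR.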

3.2 Independent variables

The independent variables include demographic, human capital, labor market, housing,

industry structure, and amenity and policy variables. In line with the literature,

unless otherwise indicated, the initial values of the independent variables are used in

the analysis. This type of formulation also reduces the problem of endogeneity. All

the independent variables are in log form except those that can take negative or zero

values. The descriptions of each of the independent variables of the models are given

below.

Equation (4a) includes a vector of control variables ($X^{em}_{kt-1}$, $k = 1, \ldots, K_1$),

which includes human capital, agglomeration effects, unemployment, and other

regional socio-economic variables that are assumed to influence county employment

growth (business growth) rate. Human capital is measured as the percentage of adults

(over 25 years old) with college degrees and above (POPCD), and the percentage of

adults (over 25 years old) with high school diploma (POPHD). It is expected that educational

attainment is positively associated with employment growth. To control for

agglomeration effects from both the supply and demand sides, county population size

(POPs) and the percentage of the population between 25 and 44 of age (POP25-44)

are included, and agglomeration effects are expected to have a positive impact

on employment growth. The county unemployment rate (UNEMP) is included as a

measure of local economic distress. Although a high county unemployment rate is

normally associated with a poor economic environment, it may provide an incentive

for individuals to form new businesses that can employ not only the owners, but also

others. Thus, we do not know a priori whether the impact of UNEMP on employment

growth is positive or negative. Establishment density (ESBd), which is the total number

of private sector establishments in the county, divided by the county’s population,

is included to capture the degree of competition among firms and crowding of businesses

relative to the population. The coefficient of ESBd is expected to be negative.

Vector $X^{em}_{kt-1}$ also includes OWHU (owner-occupied housing) to capture the effects

of the availability of resources to finance businesses and create jobs on employment

growth in the county. The percentage of owner-occupied dwellings is expected to be

positively associated with employment growth in the county. Also included in $X^{em}_{kt-1}$

are property tax per capita (PCPTAX), percentage of private employment in manufacturing

(MANU), percentage of private employment in wholesale and retail trade

(WHRT), natural amenities index (NAIX), highway density (HWD), gross in-migration

(INM), gross out-migration (OTM), median household income (MHY), and direct local

government expenditures per capita (GEX). Since the percentage of the populations

between 5 and 17 years of age (POP5-17) and above 65 years of age (POP > 65)

do not constitute the prime working age of the population, they are not included in

Eq. (4a). Direct federal expenditures and grants per capita (DFEG) in Appalachia have

been mainly income support in the form of Food Stamps, Social Security Disability

Insurance (SSDI), Temporary Assistance for Needy Families (TANF), and Supplemental

Security Income (SSI) and hence not directly related to employment creation

(Black and Sanders 2004). Homeownership (OWHU) and the social capital index

(SCIX) are highly correlated. In order to avoid the problem of multicollinearity, SCIX

is not included in Eq. (4a). SCIX is a county-level index that incorporates the

density of associations such as civic groups, religious organizations, sports clubs, labor

unions, political and business organizations, the percentage of voters who vote in presidential

elections, county-level response rate to the Census Bureau’s decennial census,

and the number of tax-exempt non-profit organizations (Rupasingha et al. 2006).

We also use the natural amenity index created by McGranahan (1999) from standardized

mean values of climate measures (January temperature, January days of sun,

July temperature, and July humidity), topographic variation and water area as proportion

of county area (see http://www.ers.usda.gov/Data/NaturalAmenities/natamenf.xls).

Note that since both SCIX and NAIX are indices of many exogenous variables,

they will constitute important parts of the instrument matrix that will be used to identify

the endogenous variables of the system.

Equation (4b) contains a vector of exogenous variables ($X^{mh}_{kt-1}$, $k = 1, \ldots, K_2$),

which includes, among others, POPs, POPd, FHHF, POPHD, UNEMP, MANU,

WHRT, and Social Capital Index (SCIX).

The initial levels of employment ($EMP_{t-1}$) and median household income

($MHY_{t-1}$) are also included in the respective equations of (4a, b). These variables

are treated as predetermined variables because their values are given at the

beginning of each period and hence are not affected by the endogenous variables.

Table 1 provides the full list of the endogenous, and of the spatial lag and control

variables, their descriptions and the sources of the data.

4 Estimation issues

Equations (4a, b) constitute a model with feedback simultaneity, spatial autoregressive

lag simultaneity, and spatial cross-regressive lag simultaneity with spatially autoregressive

disturbances. This creates complications, of which the choice of the functional

form of each equation, whether or not each equation is identified, and the choice of

the estimator and instruments are the most important ones.

Concerning the functional form, a generalized PE test was performed (Kmenta

1986, pp. 521–522; Mackinnon et al. 1983) to determine whether a linear or log-linear

specification is most appropriate. The test indicates that the log-linear specification is

preferred to the linear form for all equations. Thus, the model is specified in log-linear

form with two modifications involving the measurement of the explanatory variables.

First, the natural log formulation is dropped for explanatory variables that can assume

negative or zero values. Second, lagged 1990 values are used for all explanatory variables

to avoid simultaneity bias.
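The first modification can be sketched as a simple transformation rule. The helper name and the sample values are hypothetical, and the rule below decides from the observed data whether a variable is strictly positive, whereas the study's criterion is whether a variable can in principle take negative or zero values.

```python
import numpy as np

def log_if_positive(columns):
    """Log-transform a variable only if all of its values are strictly positive."""
    out = {}
    for name, values in columns.items():
        v = np.asarray(values, dtype=float)
        out[name] = np.log(v) if np.all(v > 0) else v
    return out

# Made-up sample values for two of the explanatory variables.
data = {
    "POPs": [2650.0, 30000.0, 1350000.0],  # population: positive, so logged
    "NAIX": [-3.72, 0.14, 3.55],           # amenity index: can be negative, kept in levels
}
transformed = log_if_positive(data)
```

Coefficients on the logged variables then read as elasticities, while coefficients on variables kept in levels read as semi-elasticities.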

Concerning identification, first, for each equation, the number of basic endogenous

variables that appear on the right hand side is smaller than the number of control variables

that appear in the model but not in that equation. Second, in those cases where

there are more instruments than needed to identify an equation, a test statistic (see footnote 1) was
computed (Hausman 1983) to investigate whether the additional instruments are valid
in the sense that they are uncorrelated with the error term, that is, $E(Q'u_r) = 0$, where
$Q$ is an instrument matrix as defined below. Fulfillment of this condition ensures that
the instrument matrix $Q$ allows us to identify the regression parameters $[\alpha', \beta', \lambda', \gamma']$ of
Eqs. (4a, b), where $\alpha$ is a vector of slope coefficients and $\beta$, $\lambda$, $\gamma$ are vectors of
coefficients of the right-hand dependent variables, the spatial lag variables, and the
predetermined variables, respectively.
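The mechanics of the over-identification statistic ($n$ times the R-squared from regressing the residuals on all instruments) can be sketched as follows. The function name and the simulated data are hypothetical, the instrument matrix is assumed to contain a constant column, and the residuals are constructed artificially rather than taken from an actual GS2SLS fit.

```python
import numpy as np

def overid_statistic(residuals, instruments):
    """n * R^2 from regressing residuals on the full instrument matrix Q."""
    u = np.asarray(residuals, dtype=float)
    Q = np.asarray(instruments, dtype=float)
    coef, *_ = np.linalg.lstsq(Q, u, rcond=None)
    fitted = Q @ coef
    ss_res = np.sum((u - fitted) ** 2)
    ss_tot = np.sum((u - u.mean()) ** 2)
    return len(u) * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(1)
n = 200
Q = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])  # constant + 3 instruments

# Case 1: residuals made exactly orthogonal to Q, so the instruments explain nothing.
raw = rng.standard_normal(n)
proj, *_ = np.linalg.lstsq(Q, raw, rcond=None)
u_valid = raw - Q @ proj
stat_valid = overid_statistic(u_valid, Q)

# Case 2: residuals lying in the column space of Q, so the instruments explain everything.
u_invalid = Q @ np.array([0.0, 1.0, -1.0, 0.5])
stat_invalid = overid_statistic(u_invalid, Q)
```

The two constructed cases bracket the statistic between 0 and $n$. In an application the value would be compared with a chi-squared critical value whose degrees of freedom equal the number of over-identifying restrictions.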

As to the choice of estimator, the Method of Moments is preferred over the

Maximum Likelihood approach because the latter would involve significant additional

computational complexity (see footnote 2). The conventional three-stage least squares estimation to

1 This test statistic is $nR_u^2$, where $n$ is the sample size and $R_u^2$ is the usual R-squared of the regression of
residuals from the second-stage equation on all included and excluded instruments. In other words, estimate
Eqs. (4a, b) by GS2SLS or any efficient limited-information estimator and obtain the resulting residuals,
$\hat{u}_r$. Then, regress these on all instruments and calculate $nR_u^2$. The statistic has a limiting chi-squared distribution
with degrees of freedom equal to the number of over-identifying restrictions, under the assumed
specification of the model.

2 In the Maximum Likelihood approach, the probability of the joint distribution of all observations is maximized

with respect to a number of parameters. This involves the calculation of the Jacobian that appears in

the log-likelihood function, which is computationally challenging. The complexity becomes overwhelming

if the sample size is large, which applies in our case, and if the spatial weights matrices are not symmetric,

which also applies in our case, even if the sample size is moderate (Kelejian and Prucha 1999, 1998).

We also do not expect the error terms in our model to be normally distributed, which is required for the

Maximum Likelihood procedure.

Table 1 Descriptive statistics

| Variable code | Variable description | Mean | SD | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- |
| Constant | | 1.00 | 0.00 | 1.00 | 1.00 |
| EMPR | Employment Growth Rate 1990–2000 | 0.17 | 0.25 | −0.69 | 1.79 |
| MHYR | Median Household Income Growth Rate 1990–2000 | 0.48 | 0.31 | −0.49 | 1.40 |
| WEMPR | Spatial Lag of EMPR | 0.18 | 0.14 | −0.18 | 0.81 |
| WMHYR | Spatial Lag of MHYR | 0.47 | 0.19 | −0.11 | 1.02 |
| POPs | Population, 1990 | 10.30 | 0.94 | 7.88 | 14.11 |
| POPd | Population Density, 1990 | 4.28 | 0.90 | 1.85 | 7.75 |
| POP5-17 | Percent of Population between 5–17 Years, 1990 | 2.92 | 0.12 | 2.17 | 3.22 |
| POP25-44 | Percent of Population between 25–44 Years Old, 1990 | 3.38 | 0.08 | 2.79 | 3.74 |
| POP > 65 | Percent of Population above 65 Years Old, 1990 | 2.60 | 0.20 | 1.55 | 3.20 |
| FHHF | Percent of Female Householder, Family Householder, 1990 | 2.32 | 0.20 | 1.81 | 3.19 |
| POPHD | Persons 25 Years and over, % High School only, 1990 | 4.10 | 0.17 | 3.57 | 4.47 |
| POPCD | Persons 25 Years and over, % Bachelor’s Degree or above, 1990 | 2.27 | 0.41 | 1.31 | 3.73 |
| OWHU | Owner-Occupied Housing Unit in Percent, 1990 | 4.33 | 0.08 | 3.87 | 4.47 |
| MHV | Median Value of Owner Occupied Housing 1990 | 10.74 | 0.26 | 9.67 | 11.68 |
| UNEMP | Unemployment Rate 1990 | 2.15 | 0.35 | 1.22 | 3.25 |
| AGFF | % Employed in Agriculture, Forestry and Fisheries 1990 | 3.62 | 2.66 | 0.00 | 17.10 |
| MANU | % Employed in Manufacturing 1990 | 3.14 | 0.57 | 0.79 | 3.98 |
| WHRT | % Employed in Wholesale and Retail Trade 1990 | 2.92 | 0.19 | 2.16 | 3.32 |
| FIRE | % Employed Finance, Insurance and Real Estate 1990 | 1.23 | 0.33 | 0.00 | 2.23 |
| HLTH | % Employed Health Service 1990 | 1.95 | 0.34 | 0.74 | 3.44 |
| NAIX | Natural Amenities Index 1990 | 0.14 | 1.16 | −3.72 | 3.55 |
| ESBd | Establishment Density 1990 | 2.93 | 0.34 | 1.87 | 4.09 |
| EFIR | Earnings in Finance Insurance and Real Estate 1990 | 21075.08 | 96011.09 | 0.00 | 1638807.0 |
| CSBD | Commercial and Saving Banks Deposits 1990 | 12.21 | 1.07 | 8.83 | 16.95 |
| DFEG | Direct Federal Expenditure and Grants per Capita 1990 | 7.99 | 0.38 | 6.98 | 10.18 |
| FGCE | Federal Government Civilian Employment per 10,000 Pop. 1990 | 60.48 | 101.03 | 0.00 | 1295.00 |
| PCTAX | Per Capita Local Tax 1990 | 5.91 | 0.53 | 4.51 | 7.42 |
| PCPTAX | Property Tax Per Capita 1990 | 5.52 | 0.62 | 3.91 | 7.36 |
| SCIX | Social Capital Index 1987 | −0.60 | 0.94 | −2.53 | 5.64 |
| HWD | Highway Density 1990 | 0.69 | 0.40 | −0.34 | 2.63 |
| ESBs | Establishment Size 1990 | 2.53 | 0.30 | 1.49 | 3.60 |
| AWSR | Average Annual Wage and Salary Rate 1990 | 9.75 | 0.19 | 9.31 | 10.35 |
| EMP | Employment 1990 | 8.83 | 1.25 | 5.42 | 13.38 |
| INM | In-Migration 1990 | 7.09 | 1.00 | 4.54 | 10.52 |
| OTM | Out-Migration 1990 | 7.04 | 0.97 | 4.50 | 10.55 |
| MHY | Median Household Income 1989 | 9.94 | 0.23 | 9.06 | 10.68 |
| GEX | Direct General Expenditures per Capita 1992 | 7.23 | 0.28 | 6.49 | 8.11 |

All variables are expressed in logs except AGFF, EFIR, FGCE, SCIX, and NAIX

handle the feedback simultaneity is inappropriate because of the spatial autoregressive
lag and spatial cross-regressive lag simultaneity terms. The Spatial Generalized
Method of Moments approach used by Rey and Boarnet (2004) in a Monte Carlo

analysis of alternative approaches to modeling spatial simultaneity is also inappropriate,

because the model includes spatially autoregressive disturbances. Therefore,

we use the Generalized Spatial Two-Stage Least Squares (GS2SLS) as suggested by

Kelejian and Prucha (1998, 1999), and the Generalized Spatial Three-Stage Least

Squares (GS3SLS) approach as outlined by Kelejian and Prucha (2004).

The GS2SLS and GS3SLS procedures are carried out in three- and four-step routines, respectively. The first three steps are common to both routines. In the first step, the parameter vector $(\alpha, \beta, \lambda, \gamma)$ is estimated by two-stage least squares (2SLS), using an instrument matrix $Q$ that consists of a subset of the linearly independent columns of $[X, WX, W^2X]$, where $X$ is the matrix that includes the control variables in the model and $W$ is a weights matrix. The disturbances for each equation in the model are computed using the estimates of $(\alpha, \beta, \lambda, \gamma)$ from the first step. In the second step, the estimates of the disturbances are used to estimate the autoregressive parameter $\rho$ for each equation, using the generalized moments procedure of Kelejian and Prucha (2004). In the third step, a Cochrane–Orcutt-type transformation is performed, using the estimates of $\rho$ from the second step to account for the spatial autocorrelation in the disturbances. The GS2SLS estimates of $[\beta, \lambda, \gamma]$ are then obtained by estimating the transformed model using a subset of the linearly independent columns of $[X, WX, W^2X]$ as the instrument matrix.
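The first and third steps of this routine can be illustrated with a short Python sketch for a single equation. It is a simplified illustration under stated assumptions, not the authors' implementation: all function and argument names are hypothetical, the matrix `Z` is assumed to hold the right-hand-side variables of one equation (the other endogenous variable, the spatial lags, and the controls), and the second step is not implemented, so the value of `rho` is simply passed in.

```python
import numpy as np

def instrument_matrix(X, W):
    """Build [X, WX, W^2 X] and keep a linearly independent subset of its columns."""
    H = np.hstack([X, W @ X, W @ W @ X])
    keep = []
    for j in range(H.shape[1]):
        # keep column j only if it raises the rank of the columns kept so far
        if np.linalg.matrix_rank(H[:, keep + [j]]) > len(keep):
            keep.append(j)
    return H[:, keep]

def tsls(y, Z, Q):
    """Two-stage least squares: project Z on Q, then regress y on the projection."""
    Z_hat = Q @ np.linalg.lstsq(Q, Z, rcond=None)[0]
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]

def cochrane_orcutt(v, W, rho):
    """Spatial Cochrane-Orcutt-type transformation: v - rho * W v."""
    return v - rho * (W @ v)

def gs2sls(y, Z, X, W, rho):
    """Steps 1 and 3 for one equation; rho stands in for the step 2 estimate."""
    Q = instrument_matrix(X, W)
    delta_init = tsls(y, Z, Q)      # step 1: initial 2SLS estimates
    u_init = y - Z @ delta_init     # disturbances that step 2 would use
    y_star = cochrane_orcutt(y, W, rho)
    Z_star = cochrane_orcutt(Z, W, rho)
    delta = tsls(y_star, Z_star, Q)  # step 3: 2SLS on the transformed model
    return delta_init, u_init, delta
```

In an application, `rho` would come from the generalized moments step applied to `u_init`, and the routine would be run once for the EMPR equation and once for the MHYR equation.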

Although the GS2SLS takes the potential spatial correlation into account, it does not utilize the information available across equations because it does not account for the potential cross-equation correlation in the innovation vectors $(\varepsilon^{em}_{it}, \varepsilon^{mh}_{it})$. The correlation coefficient between the GS2SLS residuals ($\varepsilon^{em}_{it}$ and $\varepsilon^{mh}_{it}$) is given in Table 2. The full system information is utilized by stacking the Cochrane–Orcutt-type transformed equations (from the third step) in order to estimate them jointly. Thus, in the fourth step, the GS3SLS estimates of $[\beta, \lambda, \gamma]$ are obtained by estimating this stacked model. The GS3SLS estimator is more efficient than the GS2SLS estimator. Further, consistent estimates of the covariance matrix are used to obtain the Feasible Generalized Three-Stage Least Squares (FGS3SLS) estimators of $(\alpha, \beta, \lambda, \gamma)$.
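The stacking in the fourth step can be sketched with the textbook three-stage least squares formula, applied to the transformed equations with a common instrument matrix. The sketch below is an illustration under assumptions rather than the authors' code: the function name `three_sls` and its arguments are hypothetical, the inputs are assumed to be already Cochrane–Orcutt transformed, and `Sigma` is assumed to be a consistent estimate of the cross-equation covariance matrix (for example, the cross-products of the GS2SLS residuals divided by n).

```python
import numpy as np

def three_sls(y_list, Z_list, Q, Sigma):
    """Joint IV estimation of G stacked equations.

    Solves [Zhat'(Sigma^-1 kron I)Zhat] d = Zhat'(Sigma^-1 kron I) y,
    where Zhat is block diagonal with blocks P_Q Z_j.
    """
    G = len(y_list)
    # first-stage projections of each equation's regressors on the instruments
    Z_hat = [Q @ np.linalg.lstsq(Q, Z, rcond=None)[0] for Z in Z_list]
    S_inv = np.linalg.inv(Sigma)
    k = [Z.shape[1] for Z in Z_list]
    offs = np.concatenate([[0], np.cumsum(k)])
    A = np.zeros((sum(k), sum(k)))
    b = np.zeros(sum(k))
    for i in range(G):
        for j in range(G):
            # block (i, j) of the weighted cross-product matrix
            A[offs[i]:offs[i + 1], offs[j]:offs[j + 1]] = S_inv[i, j] * (Z_hat[i].T @ Z_hat[j])
            b[offs[i]:offs[i + 1]] += S_inv[i, j] * (Z_hat[i].T @ y_list[j])
    return np.linalg.solve(A, b)  # stacked coefficient vector, equation by equation
```

When `Sigma` is diagonal the cross-equation blocks vanish and the estimates coincide with equation-by-equation 2SLS, which is why a nonzero residual correlation such as the one in Table 2 is what makes the joint step worthwhile.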


Table 2 Correlation matrix of the residuals from generalized spatial two-stage least squares (GS2SLS) estimation of the model

| | Equation 1 | Equation 2 |
| --- | --- | --- |
| Equation 1 | 1.0000 | |
| Equation 2 | −0.3974 | 1.0000 |

5 Discussion and analysis of results

The GS2SLS and GS3SLS parameter estimates of the system represented by

Eqs. (4a, b) are reported in Table 3. These values are consistent with theoretical

expectations and with the results of many other cross-sectional empirical studies (Boarnet

1994; Deller et al. 2001; Henry et al. 1997). The coefficients of the endogenous

variables (EMPR and MHYR) are positive and statistically significant, indicating

strong interdependence between employment and median household income growth

rates. This interdependence is consistent with economic theory and empirical results.

Increases in the demand for goods and services that result from increases in family

median or per capita income are associated with increases in employment (Armington

and Acs 2002), which create opportunities for even more people to work and earn

income. However, the effect of median household income growth on employment

growth is stronger than that of employment growth on median household income

growth.

In the business employment (EMPR) equation, fifteen of the coefficient estimates

are significantly different from zero at the 10% level or better. The results suggest a

positive and significant parameter estimate for the spatial autoregressive lag variable

(WEMPR). This indicates that employment growth tends to spill over to neighboring

counties. The results also show a negative coefficient for (WEMPR) in the (MHYR)

equation, indicating that employment growth rates in neighboring counties tend to

unfavorably affect median household income growth rates (MHYR) in a given county.

These estimates are important for policy because they indicate that employment growth

in neighboring counties has positive and negative spillover effects on a given county’s

EMPR and MHYR, respectively. Furthermore, the significant spatial lag effects indicate

that EMPR not only depends on characteristics within the county, but also on

those of its neighbors. Hence, empirical analyses of employment growth rates and household income growth rates should test for spatial effects. Our model specification incorporates a spatially autoregressive error process in addition to the spatial lags of the dependent variables. The negative estimate for $\rho_1$ (see Table 3) indicates that random shocks to EMPR affect not only the county where the shocks originate and its neighbors, but also create negative shock waves across Appalachia.

To control for agglomeration effects, the model includes population statistics, such as the initial county population size (POPs) and the percentage of the population between 25 and 44 years old (POP25-44). The results show that both POPs and POP25-44 have positive and significant effects on EMPR, even after accounting for potential spatial spillover effects. This result is consistent with the literature (Acs and Armington 2004), which indicates that a growing population increases both the demand for consumer goods and services and the pool of potential entrepreneurs, which in turn encourages business formation. This result is important from a policy perspective. It indicates


Table 3 Generalized spatial 2SLS (GS2SLS) and full information generalized spatial 3SLS (GS3SLS)

estimation results

Variables GS2SLS GS3SLS

EMPR Equation MHYR Equation EMPR Equation MHYR Equation

Coefficient t-statistic Coefficient t-statistic Coefficient t-statistic Coefficient t-statistic

Constant −7.5180∗∗∗ −4.07 7.7602∗∗∗ 3.95 −8.53228∗∗∗ −5.01698 8.6547∗∗∗ 4.714

EMPR 0.2825 1.66 0.6156∗∗∗ 4.0457

MHYR 0.1685 1.59 0.3735∗∗∗ 3.8956

WEMPR 0.2492∗ 1.94 −0.1423 −0.98 0.2792∗∗ 2.2949 −0.2694∗ −1.7

WMHYR 0.1657 1.44 −0.0559 −0.43 0.1147 1.1999 −0.1063 −0.8495

POPs 0.8367∗∗∗ 4.32 0.0877 0.78 0.7724∗∗∗ 4.3572 −0.0299 −0.2807

POPd −0.0101 −0.3 −0.0123 −0.4054

POP5-17 −0.1566 −0.9 −0.1072 −0.6642

POP25-44 0.2806 1.48 0.3093∗ 1.807

POP > 65 0.1046 0.98 0.1576 1.6024

FHHF −0.0031 −0.03 −0.0034 −0.3856

POPHD −0.1589 −1.03 −0.2439 −1.15 −0.1487 −1.0167 −0.1556 −0.7667

POPCD 0.0561 1 −0.0989 −1.35 0.0789 1.4827 −0.1147 −1.6361

OWHU −0.4079∗ −1.77 −0.368∗ −1.76

MHV −0.0309 −0.32 0.0955 0.76 −0.0483 −0.5198 0.0763 0.6308

UNEMP −0.0825∗∗ −2.05 0.0442 0.79 −0.079∗∗ −2.0599 0.0706 1.3197

AGFF −0.0055 −1.11 0.0025 0.38 −0.006 −1.2612 0.0032 0.5017

MANU 0.0856∗∗ 2.65 −0.0008 −0.02 0.0772∗∗ 2.5484 −0.0324 −0.8124

WHRT 0.3734∗∗∗ 4.5 −0.0727 −0.65 0.3719∗∗∗ 4.7178 −0.1916∗ −1.8012

FIRE 0.0177 0.39 −0.0471 −0.86 0.0282 0.6542 −0.0616 −1.168

HLTH −0.0079 −0.2 0.0297 0.56 −0.0157 −0.4067 0.0277 0.5475

NAIX 0.0072 0.72 −0.0063 −0.47 0.0062 0.645 −0.0064 −0.4944

ESBd 0.7049∗∗∗ 3.82 0.0242 0.27 0.6574∗∗∗ 3.9138 −0.0495 −0.5689

EFIR −1.05216D-08 −0.09 −1.16242D-08 −0.1113

CSBD 0.0406 1.14 0.0304 0.9565

DFEG 0.0002 0.01 −0.0071 −0.1973

FGCE 0.0001 0.6 4.78E-05 0.5158

PCTAX −0.0706 −1.25 −0.062 −1.2314

PCPTAX 0.0108 0.26 0.01095 0.2924

SCIX 0.0439∗ 1.7 0.046∗ 1.974

HWD −0.002 −0.04 −0.0062 −0.1303

ESBs 0.5536∗∗ 2.87 0.5345∗∗∗ 3.0658

AWSR 0.0912 0.94 0.0822 0.9521

EMP −0.8647∗∗∗ −4.7 −0.0223 −0.28 −0.8151∗∗∗ −4.8863 0.0941 1.2818

INM 0.1122 1.38 −0.1245 −1.25 0.1424∗ 1.8427 −0.1792∗ −1.8725

OTM −0.1382 −1.65 0.0693 0.65 −0.1401∗ −1.7571 0.1248 1.215

MHY 0.2334 1.32 −0.7671∗∗∗ −4.35 0.3636∗∗ 2.2161 −0.7976∗∗∗ −4.7331

GEX 0.0608 1.33 0.0684 1.24 0.04105 0.9472 0.0477 0.8971

Rho (ρ) −0.0428 0.1913 −0.0428 0.1913


$nR^2 \sim \chi^2_{(30,36)}$ a 46.4608 0.02807b 39.1464 0.3305b 46.4608 0.02807b 39.1464 0.3305b

Moran I −0.2058 −5.0284c 0.1336 3.0753c −0.2058 −5.0284c 0.1336 3.0753c

Eta (η) 0.8647 0.7671 0.8151 0.7976

Half-Life (years) 8.47 8.65 8.47 8.65

PE test log log log log

n 417 417 417 417

*, **, and *** denote statistical significance at the 10, 5, and 1% levels, respectively

a 30 and 36 are the degrees of freedom, which equal the number of over-identifying restrictions in the EMPR and MHYR equations, respectively

b p-values

c Z-values for Moran I

that counties with high population concentration are benefiting from the resulting
agglomerative and spillover effects that lead to localization of economic activities, in
line with Krugman's (1991a, b) argument on regional spillover effects.

The county unemployment rate (UNEMP) is included among the exogenous variables

to measure local economic distress. The results suggest that a high unemployment

rate is associated with low business growth. This indicates that the poor economic

environment in Appalachia did not provide incentives for individuals to form new

businesses that employ not only the owner, but others. Unemployed individuals may

not have the capital to start a business. Furthermore, a high level of unemployment is

indicative of a relatively low aggregate demand, which also discourages new firm formation.

This result is consistent with the findings of Acs and Armington (2004). They

found that unemployment is negatively associated with new firm formation during

economic growth periods and positively during economic recession periods.

The coefficient of the variable representing the percentage of homes that are owned

by their own occupants (OWHU) is negative and statistically significant at the 10%

level. This result indicates that high home ownership is negatively associated with

business formation in Appalachia. This is contrary to the expectation that high home

ownership signals the availability of household assets and is therefore an indicator of

the capacity to finance new businesses by potential entrepreneurs, either by using the house as collateral for a loan or as an indication of the availability of other personal financial resources. In Appalachia during the study period, however, home ownership was positively correlated with the level of economic distress (Pollard 2003): home ownership was highest in distressed counties (76%) and lowest in attainment counties (69%). Home ownership was also higher in central Appalachia (76%) than in the more developed northern or southern sub-regions, and Appalachian non-metro areas had higher ownership rates (76%) than metro areas (72%). Thus, the result indicates that home ownership is not a good indicator of the availability of resources to start a new business, at least in Appalachia.


The coefficients for MANU and WHRT are positive and significant at the 5 and 1%

levels, respectively. These results indicate that counties with a higher initial percentage

of their labor force employed in manufacturing and the wholesale and retail trade

showed higher growth rates in business than other counties.

The percentage of people employed in manufacturing (MANU) and the percentage

of people employed in wholesale and retail trade (WHRT) are included in the

EMPR equation to control for the influence of sectoral employment concentration on

the overall employment growth rate. The coefficient on MANU is positive and statistically significant at the 5% level, indicating a direct relationship between overall employment growth and the share of manufacturing employment at the beginning of the period. The coefficient on WHRT is also positive and significant at the 1% level, indicating the positive role played by the service sector in expanding employment in Appalachia during the study period. Thus, these results suggest that Appalachian counties that had a higher proportion of their labor force employed in manufacturing and wholesale and retail trade at the beginning of the period experienced higher growth rates in overall employment. This seems realistic since Appalachia has

experienced a shift from resource-based economic activities to manufacturing and,

particularly, to services. The coefficient on WHRT is higher and even more significant

than the coefficient on MANU in the EMPR equation, indicating that the contribution

of WHRT to overall employment growth was higher and more sustained than that of

MANU.

Establishment density (ESBd), defined as the total number of private sector establishments

in the county divided by the county’s population, is included in the model

to capture the degree of competition among firms and the concentration of businesses

relative to the population density. The average size of establishment (ESBs), defined

as total private sector employment divided by the number of private establishments

in the county, is also included to capture the effects of barriers to entry of new small

firms on employment growth. The coefficient for ESBd is positive and statistically

significant at the 1% level, indicating that the Appalachian region is far below the

threshold where competition among firms for consumer demands crowds businesses.

According to the results, a high ESBd is associated with growth in employment (business

growth), indicating that firms tend to locate near each other, possibly due to

localization and agglomeration economies of scale. The coefficient for ESBs is also

positive and significant indicating the existence of low barriers to new firm formation

and employment generation in Appalachia during the study period.

The results indicate that county employment growth depends on gross in-migration, gross out-migration, and median household income. The coefficient for INM, for example, is positive and significant at the 10% level. The coefficient for OTM is negative and statistically significant at the 10% level (GS3SLS estimates, Table 3). These are consistent with theoretical expectations and empirical findings (Borts and Stein 1964). In-migration tends to shift both the labor supply and labor demand curves rightward, and out-migration tends to shift the curves leftward. Thus, in-migration leads to increases in
employment, whereas out-migration leads to decreases in employment. A growing

population increases the demand for consumer goods and services and is positively

related to business formation (Acs and Armington 2004).


Consistent with theoretical expectations and empirical findings, the coefficient for

MHY is positive and statistically significant at the 5% level. Increases in the demand

for goods and services that result from increases in family median or per capita income

are associated with increases in employment (Armington and Acs 2002).

An interesting observation from the empirical results pertains to the role of local

government in employment growth. The model predicts that local governments,

through their spending and taxation functions, play critical roles in creating and

enabling economic environments for businesses to prosper. The empirical results, however,

indicate that local governments have not played significant roles in employment

growth in Appalachia. Given the economic hardship and high level of underdevelopment

in Appalachia, these results are indications that local governments may need to

reassess or step up their efforts to create incentives for employment growth in this

region.

The elasticity of EMPR with respect to the initial employment level (EMP) is negative

and statistically significant, indicating convergence in the sense that counties with

low levels of employment at the beginning of the period (1990) tend to show a higher

rate of business growth than counties with high initial levels of employment, conditional

on the other explanatory variables. This result is consistent with prior studies on

rural renaissance (Deller et al. 2001; Lundberg 2003). The speed of adjustment, $\eta_{em}$, is calculated at 0.8151, which indicates that just over 81% of the adjustment toward the equilibrium employment growth rate was realized during the period 1990–2000. That is 8.151% annually, giving a half-life of 8.47 years.

The parameter estimates for the MHYR equation show a positive estimate for $\rho_2$. This indicates that random shocks into the system with respect to MHYR not only affect the county where the shocks originate and its neighbors, but also create positive spillover effects across Appalachia. The elasticity of MHYR with respect to the initial median household income (MHY) is negative and statistically significant, indicating convergence in the sense that counties with low median household incomes at the beginning of the period (1990) tend to show higher rates of growth of median household incomes than counties with high initial median household incomes, everything else being equal. The speed of adjustment, $\eta_{mh}$, is calculated at 0.7976, which indicates that about 80% of the adjustment toward the equilibrium median household income growth rate was realized during the period 1990–2000. That is 7.976% annually, giving a half-life of 8.65 years. This result is comparable to the speed of convergence estimates obtained by Higgins et al. (2006) and Young et al. (2008).
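The annual rates and half-lives quoted for the two equations can be checked with a short calculation. The sketch below is only an arithmetic illustration with hypothetical function names. It assumes that the annual rate is the ten-year speed of adjustment divided by ten, as the text states, and that the half-life is a logarithm-of-two constant divided by the annual rate. The reported half-lives are reproduced when 0.69 is used as an approximation to ln 2; this is an inference from the reported figures, not a formula given in the paper.

```python
import math

def annual_rate(eta, years=10):
    """Average annual adjustment implied by a ten-year speed of adjustment."""
    return eta / years

def half_life(rate, log2=math.log(2)):
    """Years needed to close half of the gap at a constant annual rate."""
    return log2 / rate

for name, eta in [("employment", 0.8151), ("median household income", 0.7976)]:
    r = annual_rate(eta)
    # with 0.69 in place of ln 2: 8.47 and 8.65 years, matching the text
    approx = round(half_life(r, log2=0.69), 2)
    # with the exact ln 2: 8.50 and 8.69 years
    exact = round(half_life(r), 2)
    print(name, r, approx, exact)
```

The difference between the two versions is small, about 0.04 years, and does not affect the interpretation of the convergence results.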

The effect of out-migration on the growth rate of median household income is negative

and statistically significant. If migrants’ endowments of human capital in the form of education, accumulated skills, or entrepreneurial talents are higher than those of the sending population, then the loss of their skills, inventiveness, and innovativeness would contribute to a decline in local productivity. Migrants may also own physical and financial capital that they take with them, leading to a loss of investment in the sending county. Moreover, out-migrants may contribute to a decline in the growth of markets and in scale and agglomeration economies in the sending county. Such demand effects are sources of loss in the growth of per capita personal incomes.

The coefficient for the index of social capital (SCIX) is positive and significant,

suggesting that high levels of social capital increase the wellbeing of a county. The


coefficients for the proportion of school age population (POP5-17), the proportion of

the population above 65 years old (POP > 65), and the proportion of female headed

households (FHHF) are negative, positive, and negative, respectively, as expected.

Counties with high proportions of POP5-17 and FHHF tend to have low levels of

median household incomes, whereas counties with a high proportion of POP > 65

tend to have high levels of MHY. These results are consistent with empirical results

of previous studies.

6 Conclusions

The main objective of this study was to test the hypotheses that (1) employment growth

and median household income growth are interdependent and jointly determined by

regional variables; (2) employment and median household income growth in a county

are conditional upon initial conditions of the county; and (3) employment and median

household income growth in a county are conditional upon employment and median

household income growth in neighboring counties. To test these hypotheses, a spatial

simultaneous equations model was developed. GS2SLS and GS3SLS estimates of
the parameters were obtained by estimating the model using data covering the 417
Appalachian counties for the 1990–2000 period. The empirical results of the study
support the three hypotheses. In particular, the employment growth rate in one county
is positively affected by the employment growth rate and the median household income

growth rate in neighboring counties, and the median household income growth rate in

one county is negatively affected by employment growth rate and median household

income growth rate in neighboring counties.

A policy implication of the finding is that counties may be more successful in

creating environments (business climate) to make themselves attractive to firms if

several neighboring counties pool their resources. The results also indicate the presence

of spatial correlation in the error terms, which implies that a random shock into

the system spreads across the region. The results further indicate convergence across

counties in Appalachia with respect to employment growth and median household

income growth rates, conditional upon the initial conditions of the explanatory variables

in the model. This information indicates that the divergence in the economic

status among Appalachian counties is narrowing and could mean that the efforts of

the Appalachian Regional Commission are showing results.

The empirical results indicate the presence of significant agglomerative effects:

counties with higher population concentrations showed significant business growth.

Combined with the findings of spillover effects, this might justify concentrating
investments in areas capable of generating agglomeration effects.

The study also produces useful information concerning the creation of new or the

expansion of existing businesses in Appalachia. Establishment density, which captures

the degree of competition among firms and crowding of businesses relative

to the population, indicates that Appalachia is below the threshold where competition

among firms for consumer demands crowds businesses. In addition, the results

indicate low barriers to new firm formation and employment generation during the

study period.


While incorporating spatial interdependencies adds to the model’s computational

complexities, the returns are not only improved estimates, but the analysis also yields

information about spatial relationships that would not otherwise be available. For the

study period, this research suggests that a growth pole approach that spatially concentrates

scarce policy investments could benefit the region. Such insights require
a spatially explicit model; otherwise, they rest on guesswork and intuition. Of

course, given the short time period of our analysis, additional research is needed to

determine if this result is stable over time or changes with the business cycle.

In general, this study confirms the importance of spatial effects in regional development.

The empirical results indicate the presence of spatial correlation in the error

terms and of spatial autoregressive lags. Failure to account for spatial interaction effects
results in inefficient and potentially inconsistent estimates, as well as a loss of insight.

Acknowledgments This research was partially funded by the West Virginia Agricultural and Forestry

Experiment Station. We acknowledge helpful comments by Dale Colyer and two referees. We thank

Anil Rupasingha, Stephan Goetz and David Freshwater for allowing the use of their Social Capital Index

data set for US counties. The usual caveat applies.

Appendix A: Derivation of the reduced form of the model

Let the system given in (4a, b) be written as:

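$$Y = YB + X\Gamma + WY\Lambda + U, \qquad U = WUC + E, \tag{I}$$

with

$$Y = (y_1, \ldots, y_G), \quad X = (x_1, \ldots, x_K), \quad U = (u_1, \ldots, u_G),$$

$$WU = (Wu_1, \ldots, Wu_G), \quad C = \operatorname{diag}_{j=1}^{G}(\rho_j), \quad E = (\varepsilon_1, \ldots, \varepsilon_G),$$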

where $y_j$ is the $n \times 1$ vector of cross-sectional observations on the dependent variable in the $j$th equation, $x_l$ is an $n \times 1$ vector of cross-sectional observations on the $l$th exogenous variable, $u_j$ is an $n \times 1$ vector of error terms in the $j$th equation, and $B$ and $\Gamma$ are correspondingly defined parameter matrices of dimension $G \times G$ and $K \times G$, respectively. The diagonal elements of $B$ are zero, since no dependent variable appears on the right-hand side of its own equation. $\Lambda$ is a $G \times G$ matrix of parameters of the spatial lag variables. It is not diagonal, and hence each equation includes a spatial cross-regressive lag variable in addition to its own spatial lag. Hence the model has the same structure as that in Kelejian and Prucha (2004).

Note that $\rho_j$ denotes the spatial autoregressive parameter in the $j$th equation and, since $C$ is taken to be diagonal, the specification relates the disturbance vector in the $j$th equation only to its own spatial lag. Because it is assumed that $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \Sigma \otimes I_n$, however, the disturbances will be spatially correlated across units and across equations.

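The system in Eq. (I) can be expressed in a form where its solution for the endogenous variables is clearly revealed. But, first consider the following vector transformations:

$$\begin{aligned}
\operatorname{vec}(Y) &= \operatorname{vec}(YB) + \operatorname{vec}(X\Gamma) + \operatorname{vec}(WY\Lambda) + \operatorname{vec}(U) \\
&= \operatorname{vec}(YB) + \operatorname{vec}(X\Gamma) + \operatorname{vec}(WY\Lambda) + \operatorname{vec}(WUC + E) \\
&= (B' \otimes I_n)\operatorname{vec}(Y) + (\Gamma' \otimes I_n)\operatorname{vec}(X) + (\Lambda' \otimes W)\operatorname{vec}(Y) + (C \otimes W)\operatorname{vec}(U) + \operatorname{vec}(E)
\end{aligned}$$

Letting $y = \operatorname{vec}(Y)$, $x = \operatorname{vec}(X)$, $u = \operatorname{vec}(U)$, and $\varepsilon = \operatorname{vec}(E)$, it follows from Eq. (I) that:

$$y = (B' \otimes I_n)\,y + (\Lambda' \otimes W)\,y + (\Gamma' \otimes I_n)\,x + (C \otimes W)\,u + \varepsilon$$

or

$$y = (B' \otimes I_n)\,y + (\Lambda' \otimes W)\,y + (\Gamma' \otimes I_n)\,x + u, \qquad u = (C \otimes W)\,u + \varepsilon \tag{II}$$

Let $B^* = [(B' \otimes I_n) + (\Lambda' \otimes W)]$, $\Gamma^* = (\Gamma' \otimes I_n)$, and $C^* = C \otimes W = \operatorname{diag}_{j=1}^{G}(\rho_j W)$; then Eq. (II) can be written in more compact form as:

$$y = B^* y + \Gamma^* x + u, \qquad u = C^* u + \varepsilon \tag{III}$$

Assuming that $I_{nG} - B^*$ and $I_{nG} - C^*$ are nonsingular matrices with $|\rho_j| < 1$, $j = 1, \ldots, G$, the system in Eq. (III) can be expressed in its reduced form as:

$$y = (I_{nG} - B^*)^{-1}(\Gamma^* x + u), \qquad u = (I_{nG} - C^*)^{-1}\varepsilon \tag{IV}$$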

Based on the results of our estimation, we found that $I_{nG} - B^*$ and $I_{nG} - C^*$ have full column rank and $|\rho_j| < 1$, $j = 1, 2$. From this we can conclude that the reduced form of the system [Eq. (IV)] is properly defined and that a spatial multiplier is at work in the system.
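The construction of the reduced form can be illustrated numerically. The following Python sketch builds the Kronecker-product matrices for a small system and solves for the endogenous variables; it is an illustration under assumptions, not part of the original estimation. The function name `reduced_form` and all argument names are hypothetical, vec is taken to stack columns, and the transposes follow from the identity vec(AXB) = (B' ⊗ A) vec(X).

```python
import numpy as np

def reduced_form(B, Lam, Gam, rho, W, X, E):
    """Solve Y = YB + X Gam + W Y Lam + U, U = W U diag(rho) + E for Y and U."""
    n, G = E.shape
    I_n = np.eye(n)
    I_nG = np.eye(n * G)
    B_star = np.kron(B.T, I_n) + np.kron(Lam.T, W)  # feedback plus spatial lags
    G_star = np.kron(Gam.T, I_n)
    C_star = np.kron(np.diag(rho), W)               # block diagonal, blocks rho_j * W
    x = X.reshape(-1, order="F")                    # vec(X), columns stacked
    eps = E.reshape(-1, order="F")                  # vec(E)
    u = np.linalg.solve(I_nG - C_star, eps)         # u = (I - C*)^{-1} eps
    y = np.linalg.solve(I_nG - B_star, G_star @ x + u)
    # undo the vec operation to return n-by-G matrices
    return y.reshape(n, G, order="F"), u.reshape(n, G, order="F")
```

Solving the linear systems directly avoids forming the inverses explicitly; the inverse of I − B* is the spatial multiplier through which a change in one county's exogenous variables or disturbances reaches the other counties.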

References

Acs ZJ, Armington C (2003) Endogenous growth and entrepreneurial activity in cities. http://ideas.repec.org/p/cen/wpaper/03-02.html. Accessed 8 December 2008

Acs ZJ, Armington C (2004) The impact of geographic differences in human capital on service firm formation

rates. J Urban Econ 56:244–278

Acs ZJ, Audretsch DB (1993) Introduction. In: Acs ZJ, Audretsch DB (eds) Small firms and entrepreneurship:

an east-west perspective. Cambridge University Press, Cambridge

Acs ZJ, Audretsch DB (2001) The Emergence of the Entrepreneurial Society. Present. for the accept. of the

Int. Award for Entrepr. and Small Bus. Res., Stockh, 3 May

Anselin L (2003) Spatial externalities, spatial multipliers and spatial econometrics. Int Reg Sci Rev

26(2):153–166

Anselin L (1988) Spatial econometrics: methods, and models. Kluwer, Dordrecht

Anselin L, Kelejian HH (1997) Testing for spatial error autocorrelation in the presence of endogenous

regressors. Int Reg Sci Rev 20(1&2):153–182


Arbia G, Basile R, Piras G (2005) Using panel data in modelling regional growth and convergence. Reg.

Econ Appl Lab Work Pap No. 55. Univ. Ill, Urbana-Champaign

Armington C, Acs ZJ (2002) The determinants of regional variation in new firm formation. Reg Stud

36(1):33–45

Aronsson T, Lundberg J, Wikstrom M (2001) Regional income growth and net migration in Sweden 1970–1995. Reg Stud 35(9):823–830

Audretsch DB, Fritsch M (1994) The geography of firm births in Germany. Reg Stud 28(4):359–365

Audretsch DB, Carree MA, van Stel AJ, Thurik AR (2000) Impeded Industrial Restructuring: The Growth

Penalty. Res Pap Cent for Adv Small Bus Econ Erasmus Univ., Rotterdam

Barkley DL, Henry MS, Bao S (1998) The role of local school quality and rural employment and population

growth. Rev Reg Stud 28(1):81–102

Barro RJ, Sala-i-Martin X (1992) Convergence. J Polit Econ 100:223–251

Barro RJ, Sala-i-Martin X (2004) Economic growth, 2nd edn. MIT Press, Cambridge

Black DA, Sanders SG (2004) Labor market performance, poverty, and income inequality in Appalachia.

http://www.arc.gov/images/reports/labormkt/labormkt.pdf. Accessed 8 December 2008

Boarnet MG (1994) An empirical model of intra-metropolitan population and employment growth. Pap

Reg Sci 73(2):135–153

Borts GH, Stein JL (1964) Economic growth in a free market. Columbia University Press, New York

Brock WA, Evans DS (1989) Small business economics. Small Bus Econ 1(1):7–20

Callejon M, Segarra A (2001) Geographical determinants of the creation of manufacturing firms: the regions

of Spain. http://www.ub.es/graap/pdfcallejon/RS01.pdf. Accessed 8 December 2008

Carlino OG, Mills ES (1987) The determinants of county growth. J Reg Sci 27(1):39–54

Carree MA, Thurik AR (1998) Small firms and economic growth in Europe. Atl Econ J 26(2):137–146

Carree MA, Thurik AR (1999) Industrial structure and economic growth. In: Audretsch DB, Thurik AR

(eds) Innovation, industry evolution and employment. Cambridge University Press, Cambridge

Clark D, Murphy CA (1996) Countywide employment and population growth: an analysis of the 1980s.

J Reg Sci 36(2):235–256

Davidson P, Lindmark L, Olofsson C (1994) New firm formation and regional development in Sweden.

Reg Stud 28(4):395–410

Deller SC, Tsai TH, Marcouiller DW, English DBK (2001) The role of amenities and quality of life in rural

economic growth. Am J Agric Econ 83(2):352–365

Duffy NE (1994) The determinants of state manufacturing growth rates: a two-digit-level analysis. J Reg

Sci 34(2):137–162

Duffy-Deno KT (1998) The effect of federal wilderness on county growth in the inter-mountain western

United States. J Reg Sci 38(1):109–136

Duffy-Deno KT, Eberts RW (1991) Public infrastructure and regional economic development: a simultaneous

equations approach. J Urban Econ 30(3):329–343

Edmiston KD (2004) The net effect of large plant locations and expansions on county employment. J Reg

Sci 44(2):289–319

Ekstrom B, Leistritz FL (1988) Rural community decline and revitalization: an annotated bibliography.

Garland Publ, New York

Ertur C, Le Gallo J, Baumont C (2006) The European regional convergence process, 1980–1995: do spatial

regimes and spatial dependence matter? Int Reg Sci Rev 29(1):3–34

Fotopoulos G, Spencer N (1999) Spatial variations in new manufacturing plant openings: some empirical evidence from Greece. Reg Stud 33(3):219–229

Fritsch M (1992) Regional differences in new firm formation: evidence from West Germany. Reg Stud

26(3):233–244

Fritsch M, Falck O (2003) New firm formation by industry over space and time: a multilevel analysis.

Discuss Pap German Inst for Econ Res Berlin

Garofoli G (1994) New firm formation and regional development: the Italian case. Reg Stud 28(4):

381–393

Glaeser EL, Scheinkman JA, Shleifer A (1995) Economic growth in a cross-section of cities. J Monet Econ

36(1):117–143

Greenwood MJ, Hunt GL (1984) Migration and interregional employment redistribution in the United

States. Am Econ Rev 74(5):957–969

Greenwood MJ, Hunt GL, McDowel JM (1986) Migration and employment change: empirical evidence on spatial and temporal dimensions of the linkage. J Reg Sci 26(2):223–234


Guesnier B (1994) Regional variation in new firm formation in France. Reg Stud 28(4):347–358

Hamalainen K, Bockerman P (2004) Regional labor market dynamics, housing, and migration. J Reg Sci

44(3):543–568

Hausman J (1983) Specification and estimation of simultaneous equations models. In: Griliches Z,

Intriligator M (eds) Handbook of econometrics. North Holland, Amsterdam

Hart M, Gudgin G (1994) Spatial variations in new firm formation in the Republic of Ireland, 1980–1990.

Reg Stud 28(4):367–380

Henry MS, Barkley DL, Bao S (1997) The hinterland's stake in metropolitan growth: evidence from selected southern regions. J Reg Sci 37(3):479–501

Henry MS, Schmitt B, Kristensen K, Barkley DL, Bao S (1999) Extending Carlino-Mills models to examine urban size and growth impacts on proximate rural areas. Growth Change 30(4):526–548

Higgins MJ, Levy D, Young AT (2006) Growth and convergence across the US: evidence from county-level

data. Rev Econ Stat 88(4):671–681

Isserman AM (1993) State economic development policy and practice in the United States: a survey article. Int Reg Sci Rev 16(1–2):49–100

Johnson P, Parker S (1996) Spatial variations in the determinants and effects of firm births and deaths. Reg

Stud 30(7):676–688

Kangasharju A (2000) Regional variations in firm formation: panel and cross-section data evidence from

Finland. Reg Sci 79(4):355–373

Keeble D, Walker S (1994) New firms, small firms and dead firms: spatial pattern and determinants in the United Kingdom. Reg Stud 28(4):411–427

Kelejian HH, Prucha IR (1998) A generalized two-stage least squares procedure for estimating a spatial

autoregressive model with spatial autoregressive disturbances. J Real Estate Finance Econ 17(1):99–

121

Kelejian HH, Prucha IR (1999) A generalized moments estimator for the autoregressive parameter in a

spatial model. Int Econ Rev 40(2):509–533

Kelejian HH, Prucha IR (2001) On the asymptotic distribution of the Moran I test statistic with applications.

J Econ 104(2):219–257

Kelejian HH, Prucha IR (2004) Estimation of simultaneous systems of spatially interrelated cross sectional

equations. J Econ 118(1):27–50

Kmenta J (1986) Elements of econometrics. Macmillan, New York

Krugman P (1991a) Increasing returns and economic geography. J Polit Econ 99(3):483–499

Krugman P (1991b) Geography and trade. MIT Press, Cambridge

Lewis DJ, Hunt GL, Plantinga AJ (2002) Does public land policy affect local wage growth. Growth Change

34(1):64–86

Loveman G, Sengenberger W (1991) The re-emergence of small-scale production: an international comparison.

Small Bus Econ 3(1):1–37

Lundberg J (2003) On the determinants of average income growth and net migration at the municipal level

in Sweden. Rev Reg Stud 32(2):229–253

MacDonald JF (1992) Assessing the development status of metropolitan areas. In: Mills ES, MacDonald JF (eds) Sources of metropolitan growth. Cent. for Urban Policy Res, New Brunswick

Mackinnon JG, White H, Davidson R (1983) Tests for model specification in the presence of alternative

hypotheses: Some further results. J Econ 21(1):53–70

McGranahan DA (1999) Natural amenities drive rural population change. http://www.ers.usda.gov/

publications/aer781/aer781i.pdf. Accessed 9 December 2008

Mills ES, Price R (1984) Metropolitan suburbanization and central city problems. J Urban Econ 15(1):1–17

Persson J (1997) Convergence across the Swedish counties, 1911–1993. Eur Econ Rev 41(9):1834–1852

Pollard KM (2003) Appalachia at the millennium: an overview of the results from census 2000. Popul. Ref. Bur., Washington

Pulver GC (1989) Developing a community perspective on rural economic development policy. J Community

Dev Soc 20(2):1–4

Rappaport J (1999) Local growth empirics. Cent for Int Dev Work Pap No. 23. Harvard University Press,

Cambridge

Rey SJ, Boarnet MG (2004) A taxonomy of spatial econometric models for simultaneous equations systems. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin


Reynolds PD (1994) Autonomous firm dynamics and economic growth in the United States, 1986–1990.

Reg Stud 28(4):429–442

Rupasingha A, Goetz SJ, Freshwater D (2006) The production of social capital in US counties. J Socio-Econ 35(1):83–101

Steinnes DN, Fisher WD (1974) An econometric model of intra-urban location. J Reg Sci 14(1):65–80

US Census Bureau (2005) Mean travel time to work for workers 16 years and over who did not work

at home (Minutes): 2005. (2005 American Community Survey). http://factfinder.census.gov/servlet/

DatasetMainPageServlet?_ds_name=ACS_2005_EST_G00_&_lang=en&_ts=199031476495. Accessed

4 June 2007

US Small Business Administration (SBA) (1999) The state of small business: a report of the President. US

Gov Print Press, Washington

Wennekers S, Thurik AR (1999) Linking entrepreneurship and economic growth. Small Bus Econ

13(1):27–55

Young AT, Higgins MJ, Levy D (2008) Sigma convergence versus beta convergence: evidence from US

county-level data. J Money Credit Bank 40(5):1083–1093


# MAKROEKONOMI

## Sunday, 17 October 2010

### Panel estimation of state-dependent adjustment

Empir Econ

DOI 10.1007/s00181-010-0419-y

Panel estimation of state-dependent adjustment

when the target is unobserved

Ulf von Kalckreuth

Received: 30 July 2009 / Accepted: 22 July 2010

© Springer-Verlag 2010

Abstract Understanding adjustment processes has become central in economics.

Empirical analysis is fraught with the problem that the target is usually unobserved.

This article develops and simulates GMM methods for estimating dynamic adjustment

models in a panel data context with partially or entirely unobserved targets and

endogenous, time-varying persistence. In this setup, the standard first difference GMM

procedure fails. Four estimation strategies are proposed. Two of them are based on

quasi-differencing. The third is characterised by a state-dependent filter, while the last

is an adaptation of the GMM level estimator.

Keywords Dynamic panel data methods · Economic adjustment ·

GMM · Quasi-differencing · Non-linear estimation

JEL Classification C23 · C15 · D21

1 Introduction

New Keynesian economics, with its emphasis on real and financial frictions, has introduced a focus on microeconomic adjustment dynamics into the empirical literature. Adjustment dynamics are essential for understanding aggregate behaviour and its sensitivity towards shocks. Important examples range from price adjustment and its significance for the New Keynesian Phillips curve (Woodford 2003), through plant level adjustment and aggregate investment dynamics (Caballero et al. 1995; Caballero and Engel 1999; Bayer 2006), to aggregate employment dynamics, building from microeconomic evidence (Caballero et al. 1997). In these studies, as in von Kalckreuth (2006), the adjustment dynamics themselves become the principal object of analysis, instead of being treated as an important, but burdensome obstacle to understanding equilibrium phenomena.

This article was presented at the 2009 Panel Data Conference at the University of Bonn. It draws on Chap. 3 of the author's habilitation thesis at the University of Mannheim.

The views expressed in this article do not necessarily reflect those of the Deutsche Bundesbank or its staff. All the errors and omissions are those of the author.

U. von Kalckreuth (B)

Deutsche Bundesbank Research Centre, Wilhelm Epstein-Str. 14, 60431 Frankfurt am Main, Germany

e-mail: ulf.von-kalckreuth@bundesbank.de

In a rather general form, economic adjustment can be framed by a 'gap equation', as formalised by Caballero et al. (1995):

$$\Delta y_{i,t} = -\Lambda\left(g_{i,t}, x_{i,t}\right)\cdot g_{i,t}, \quad\text{where}\quad g_{i,t} = y_{i,t-1} - y^{*}_{i,t}.$$

Here, subscripts refer to individual $i$ at time $t$, and $g_{i,t}$ is the gap between the state $y_{i,t-1}$ inherited from the last period and the target $y^{*}_{i,t}$ that would be realised if adjustment costs were zero for one period of time. The speed of adjustment $\Lambda$, which is written as a function of the gap itself and additional state variables $x_{i,t}$, determines the fraction of the gap that is removed within one period of time. The adjustment function will reflect convex or non-convex adjustment costs, irreversibility and indivisibilities, financing constraints or other restrictions, and the uncertainty of expectations formation. With quadratic adjustment costs or Calvo-type probabilistic adjustment, $\Lambda$ will be a constant.1

Estimating the adjustment function $\Lambda$ is inherently difficult. In general, neither $y^{*}_{i,t}$ nor $g_{i,t}$ will be observable. However, some measure of the gap is needed for any estimation, and if $\Lambda$ explicitly depends on $g_{i,t}$, this measure will move to centre stage. To address this issue, one may try to observe the target as exactly as possible. The controversy between Cooper and Willis (2004) and Caballero and Engel (2004) on interpreting the results of gap equation estimates bears testimony to the problems that may result from imperfect measures of the gap. However, there is an alternative. In linear dynamic panel estimation, the problem of unobserved targets can successfully be addressed by positing an error component structure for the measurement error and eliminating the individual fixed effect by a suitable transformation, such as first differencing. See Bond et al. (2003) and Bond and Lombardi (2007) for an error correction model of capital stock adjustment.

In the unrestricted, non-linear case, this approach is not feasible, as a host of incidental parameters will preclude identification. However, there may be direct qualitative information on the level of the adjustment function $\Lambda$, e.g. from survey data, ratings or market information services. If one is willing to treat the adjustment process as piecewise linear, distinguishing regimes of adjustment, then, as will be shown, this information can be harnessed to eliminate the incidental parameters from the problem completely.

1 Calvo-type adjustment refers to adjustment costs that are infinite with probability 1 − λ and zero with

probability λ. In other words: a randomly drawn share λ of market participants receives the chance to

adjust costlessly. As a modelling device, this assumption is ubiquitous in the monetary Dynamic General

Equilibrium literature. Sometimes this state-independent adjustment is playfully referred to as the working

of the ‘Calvo fairy’.


Linear dynamic panel estimation was pioneered by Anderson and Hsiao (1982),

and it was developed and perfected by Holtz-Eakin et al. (1988), Arellano and Bond

(1991), Arellano and Bover (1995) and Blundell and Bond (1998). This article shows

how classic dynamic panel estimation methodology can be adapted for the analysis of

economic adjustment if the target is unobserved and the nonlinearity takes the form of

discrete regimes. This is not straightforward, as the unknown and time-varying adjustment

coefficient interacts with the equally unknown individual specific measurement

error. However, the reward is substantial: a well-known array of estimation procedures

and tests can be brought to bear on the investigation of economic adjustment.

The estimation methods presented here are geared to short panels that do not allow

a full direct identification of individual targets. The study was motivated by the problem

of characterising the speed of capital stock adjustment as depending on financing

constraints, in an environment where categorical information on the financing situation

is available; see von Kalckreuth (2008a) and von Kalckreuth (2008b).2 The

procedures allow addressing a number of important research questions, including the

state-dependence of pricing behaviour (is there a Calvo fairy?), the adjustment of

the financial structure of companies or banks after shocks, the asymmetry of factor

adjustment (downward rigidities, firing costs), or the implications of irreversibility.

Section 2 of this article characterises the stochastic process to be estimated.

A continuous scalar and a discrete regime vector are evolving jointly, and the

adjustment speed of the continuous-type variable depends on the regime. It is

shown that the standard procedure for estimating linear dynamic panel models

is not applicable. Section 3 assumes predetermined regimes and proposes two

estimators on the basis of quasi-differencing—one of them with the virtue of

great simplicity, the other being more efficient. Both are nonlinear, which may

lead to a small sample bias if in one of the regimes the adjustment speed is

almost zero. A Generalised Method of Moments (GMM) estimator using state-dependent

filtering is suggested, which is immune to this problem. Section 4 works

out sets of moment conditions that can be applied when the regimes are contemporaneously

correlated. Using a level estimator on an amplified equation, the assumption

of predetermined regimes can be dropped at the price of stricter prerequisites regarding

the fixed effect. Under the same conditions, a version involving first differences

is feasible, too. Section 5 compares the moment conditions and discusses their use.

Section 6 tests the proposed routines in a Monte Carlo study. Section 7 concludes.

Appendix A discusses error correction models with state-dependent dynamics, and

Appendix B contains the proofs.

2 The study successfully applies the estimator QD2, as set out in Sect. 3 of this article.

2 A regime-specific adjustment process

Consider a situation where a variable $y_{i,t}$ reverts to some target level $y^{*}_{i,t}$ that is characteristic of individual $i$. The speed of adjustment is state-dependent, following the equation

$$\Delta y_{i,t} = -\left(1-\alpha_{i,t-1}\right)\left(y_{i,t-1} - y^{*}_{i,t}\right) + \varepsilon_{i,t}, \tag{1}$$

with

$$\alpha_{i,t} = \alpha' r_{i,t}.$$

The $L$-dimensional column vector $\alpha$ holds the adjustment coefficients relevant for each state. The adjustment coefficient $\alpha_{i,t} = \alpha' r_{i,t}$ varies over time and individuals, depending on the state $r_{i,t}$, an $L$-dimensional column vector of regime indicator variables, with one element taking a value of 1 and all others being zero. The adjustment speed at date $t$ is given by $1-\alpha_{i,t-1}$. If the process is stable, it would eventually settle at the target in the absence of shocks. The target level $y^{*}_{i,t}$ is unobservable. The panel dimension can help identify the adjustment process nonetheless, as it allows an error component approach for modelling the unobserved target. The target is assumed to follow an equation that contains an individual-specific latent term:

$$y^{*}_{i,t} = x'_{i,t}\beta + \mu_i.$$

The idiosyncratic component $\mu_i$ in the adjustment equation may reflect a measurement error or unobserved explanatory variables. The vector $x_{i,t}$ may encompass random explanatory variables, deterministic time trends and also time dummies. In its absence, the target level is entirely unobservable, but static. A generalised, error correction version of the adjustment equation is discussed in Appendix A.

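Solving Eq. (1) for $y_{i,t}$ yields:

$$y_{i,t} = \alpha_{i,t-1}\,y_{i,t-1} + \left(1-\alpha_{i,t-1}\right)x'_{i,t}\beta + \underbrace{\left(1-\alpha_{i,t-1}\right)\mu_i + \varepsilon_{i,t}}_{\text{latent}}. \tag{2}$$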

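For later purposes, it is useful to state the backward solution to this stochastic difference equation. For $t \ge 1$ and a given starting value $y_{i,0}$ it is

$$y_{i,t} = \left(y_{i,0} - x'_{i,1}\beta - \mu_i\right)\prod_{\tau=0}^{t-1}\alpha_{i,\tau} + x'_{i,t}\beta + \mu_i + A_{i,t}, \tag{3}$$

with

$$A_{i,t} = \sum_{l=1}^{t-1}\left(\varepsilon_{i,l} - \Delta x'_{i,l+1}\beta\right)\prod_{\tau=l}^{t-1}\alpha_{i,\tau} + \varepsilon_{i,t}. \tag{4}$$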

The solution has three components. The first term captures the influence of the initial deviation. The second term is the target level at time $t$, $x'_{i,t}\beta + \mu_i$. The third term, $A_{i,t}$, represents the effect of shocks and target changes, past and present. In the long run, when the influence of the initial conditions has died out, $A_{i,t}$ is equal to the deviation from the target.
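The backward solution can be checked numerically against the recursion in Eq. (2). The following is a minimal sketch, assuming a scalar regressor, two regimes and made-up parameter values; the function names `simulate` and `backward_solution` are illustrative and do not come from the article.

```python
import random
from math import prod


def simulate(y0, alpha, x, beta, mu, eps):
    """Iterate Eq. (2) for t = 1, ..., T and return [y_0, ..., y_T].

    alpha[t] is the adjustment coefficient dated t (t = 0, ..., T-1);
    x[t] and eps[t] are dated t (index 0 of eps is never used).
    """
    y = [y0]
    for t in range(1, len(x)):
        a = alpha[t - 1]
        y.append(a * y[t - 1] + (1 - a) * (x[t] * beta + mu) + eps[t])
    return y


def backward_solution(t, y0, alpha, x, beta, mu, eps):
    """Evaluate Eqs. (3) and (4) directly for a given t >= 1."""
    # A_{i,t}: past shocks and target changes, each damped by the
    # product of adjustment coefficients from date l to date t-1.
    a_term = sum(
        (eps[l] - (x[l + 1] - x[l]) * beta) * prod(alpha[l:t])
        for l in range(1, t)
    ) + eps[t]
    initial = (y0 - x[1] * beta - mu) * prod(alpha[0:t])
    return initial + x[t] * beta + mu + a_term


# Hypothetical values, chosen only for illustration.
rng = random.Random(0)
T = 8
alpha_by_regime = [0.3, 0.8]
regimes = [rng.randrange(2) for _ in range(T)]
alpha = [alpha_by_regime[r] for r in regimes]
x = [rng.gauss(0, 1) for _ in range(T + 1)]
eps = [rng.gauss(0, 0.5) for _ in range(T + 1)]

y = simulate(1.0, alpha, x, beta=2.0, mu=0.7, eps=eps)
max_gap = max(
    abs(y[t] - backward_solution(t, 1.0, alpha, x, 2.0, 0.7, eps))
    for t in range(1, T + 1)
)
print(max_gap)  # numerically zero if Eqs. (3)-(4) solve the recursion
```

The regimes are drawn independently of the shocks here, which is simpler than the endogenous regimes the article has in mind; the identity being checked is purely algebraic and does not depend on that choice.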


In Eq. (2), both the individual effect and $x_{i,t}$ interact with a time-varying and endogenous variable. This precludes the classical strategy for estimating linear dynamic panel equations with fixed effects, namely to transform the equation by taking first differences and use moment conditions involving higher lags of the dependent and explanatory variables to account for the fact that the transformed residual will be correlated with the lagged endogenous variable. First differencing Eq. (2) yields

$$\Delta y_{i,t} = \alpha'\Delta\left(r_{i,t-1}\,y_{i,t-1}\right) + (1-\alpha)'\Delta\left(r_{i,t-1}\,x'_{i,t}\right)\beta - \alpha'\Delta r_{i,t-1}\,\mu_i + \Delta\varepsilon_{i,t}. \tag{5}$$

Unlike the case of linear adjustment, the expression containing the unobserved $\mu_i$ is not differenced out, and we have to deal with a time-varying error component that is correlated with the explanatory variables. The following sections are devoted to finding moment restrictions that make estimation feasible. The last set of restrictions that will be discussed actually involves an amplified version of Eq. (5).

3 Predetermined regimes

In most applications, it will not be possible to treat $r_{i,t}$ as fully exogenous. If, for example, $\varepsilon_{i,t}$ is the error term in a capital accumulation equation and $r_{i,t}$ is a regime indicating the degree of financing constraints, then the two variables should be correlated. This section examines the case when the regime indicator, $r_{i,t-1}$, can at least be considered as predetermined with respect to the contemporaneous error term, $\varepsilon_{i,t}$. Let us start by assuming the error term to be a martingale difference sequence:

$$E\left(\varepsilon_{i,t}\mid\Omega_{i,t-1}\right) = 0, \quad\text{with}\quad \Omega_{i,t-1} = \left\{r_{i,t-1}, r_{i,t-2}, \ldots, x_{i,t-1}, x_{i,t-2}, \ldots, \varepsilon_{i,t-1}, \varepsilon_{i,t-2}, \ldots, \mu_i, y_{0i}\right\}. \tag{6}$$

Accommodation of the more general assumption

$$E\left(\varepsilon_{i,t}\mid\Omega^{*}_{i,t-k}\right) = 0,\ k \ge 1, \quad\text{with}\quad \Omega^{*}_{i,t-k} = \left\{r_{i,t-1}, r_{i,t-2}, \ldots, x_{i,t-k}, x_{i,t-k-1}, \ldots, \varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, \mu_i, y_{0i}\right\}, \tag{7}$$

is straightforward. Note that $\Omega^{*}_{i,t-k}$ in assumption (7) is not simply a lagged version of $\Omega_{i,t-1}$, as the generalisation maintains the assumption of a predetermined $r_{i,t-1}$. The case of contemporaneously correlated regime indicators will be treated in Sect. 4.

3.1 Two moment conditions based on quasi-differencing

This subsection discusses two nonlinear transformations of the adjustment equation that serve to eliminate the unobserved heterogeneity. Holtz-Eakin et al. (1988) proposed quasi-differencing as a strategy in a case where fixed effects are subject to time-varying shocks that are common across individuals.3 It is now explored whether this method can be generalised to the more complicated case at hand, where adjustment coefficients are endogenous and vary over time and individuals.

3 See also Chamberlain (1983), pp. 1263–1264.

Applied to the problem at hand, the quasi-differencing procedure as proposed by Holtz-Eakin et al. (1988) would involve lagging Eq. (2), multiplying both sides by $\left(1-\alpha_{i,t-1}\right)/\left(1-\alpha_{i,t-2}\right)$ and subtracting the result from Eq. (2). After reordering coefficients, this gives

$$\Delta y_{i,t} - \frac{1-\alpha_{i,t-1}}{1-\alpha_{i,t-2}}\,\alpha_{i,t-2}\,\Delta y_{i,t-1} - \left(1-\alpha_{i,t-1}\right)\Delta x'_{i,t}\beta = \varepsilon_{i,t} - \frac{1-\alpha_{i,t-1}}{1-\alpha_{i,t-2}}\,\varepsilon_{i,t-1}. \tag{8}$$

The unobserved heterogeneity has duly been eliminated, but the error structure is difficult to deal with, because $\alpha_{i,t-1}$ will in general be correlated with $\varepsilon_{i,t-1}$ and $\alpha_{i,t-2}$. The underlying idea nonetheless leads to useful moment conditions, actually in two different ways. First, dividing Eq. (8) by $1-\alpha_{i,t-1}$ gives

$$\frac{1}{1-\alpha_{i,t-1}}\,\Delta y_{i,t} - \frac{\alpha_{i,t-2}}{1-\alpha_{i,t-2}}\,\Delta y_{i,t-1} - \Delta x'_{i,t}\beta = \psi_{i,t}, \quad\text{with}\quad \psi_{i,t} = \frac{\varepsilon_{i,t}}{1-\alpha_{i,t-1}} - \frac{\varepsilon_{i,t-1}}{1-\alpha_{i,t-2}}. \tag{9}$$

This transformation, which shall be referred to as 'QD1', corresponds to solving Eq. (1) for the deviation from the target, $y_{i,t-1} - x'_{i,t}\beta - \mu_i$, then solving the lagged version of (1) for the past deviation from the target, $y_{i,t-2} - x'_{i,t-1}\beta - \mu_i$, and finally differencing $\mu_i$ out. On the basis of Eq. (9), moment conditions for parameter estimation can be formulated.

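Second, we may multiply Eq. (9) by $1-\alpha_{i,t-2}$, to obtain

$$\frac{1-\alpha_{i,t-2}}{1-\alpha_{i,t-1}}\,\Delta y_{i,t} - \alpha_{i,t-2}\,\Delta y_{i,t-1} - \left(1-\alpha_{i,t-2}\right)\Delta x'_{i,t}\beta = \xi_{i,t}, \tag{10}$$

$$\text{with}\quad \xi_{i,t} = \frac{1-\alpha_{i,t-2}}{1-\alpha_{i,t-1}}\,\varepsilon_{i,t} - \varepsilon_{i,t-1}. \tag{11}$$

This transformation shall be labelled 'QD2'. It corresponds to multiplying Eq. (1) by $\left(1-\alpha_{i,t-2}\right)/\left(1-\alpha_{i,t-1}\right)$ and subtracting the lag of the original adjustment equation.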
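That both transformations remove the individual effect can be verified on simulated data. The sketch below is an illustration under simplifying assumptions (scalar regressor, two regimes drawn at random, hypothetical parameter values and function names): it evaluates the left-hand sides of Eqs. (9) and (10) from observables at the true parameters and compares them with the error-only expressions for $\psi_{i,t}$ and $\xi_{i,t}$.

```python
import random


def simulate(y0, alpha, x, beta, mu, eps):
    """Iterate Eq. (2) for t = 1, ..., T and return [y_0, ..., y_T]."""
    y = [y0]
    for t in range(1, len(x)):
        a = alpha[t - 1]
        y.append(a * y[t - 1] + (1 - a) * (x[t] * beta + mu) + eps[t])
    return y


def qd_from_data(y, x, alpha, beta, t):
    """Left-hand sides of Eq. (9) (QD1) and Eq. (10) (QD2) at date t >= 2."""
    dy_t = y[t] - y[t - 1]
    dy_lag = y[t - 1] - y[t - 2]
    dx_t = x[t] - x[t - 1]
    a1, a2 = alpha[t - 1], alpha[t - 2]
    psi = dy_t / (1 - a1) - a2 / (1 - a2) * dy_lag - dx_t * beta
    xi = (1 - a2) / (1 - a1) * dy_t - a2 * dy_lag - (1 - a2) * dx_t * beta
    return psi, xi


def qd_from_errors(eps, alpha, t):
    """psi and xi written in terms of the shocks only, Eqs. (9) and (11)."""
    a1, a2 = alpha[t - 1], alpha[t - 2]
    psi = eps[t] / (1 - a1) - eps[t - 1] / (1 - a2)
    xi = (1 - a2) / (1 - a1) * eps[t] - eps[t - 1]
    return psi, xi


# Hypothetical values; mu is deliberately large so that any
# leftover individual effect would be clearly visible.
rng = random.Random(1)
T = 10
alpha = [[0.3, 0.8][rng.randrange(2)] for _ in range(T)]
x = [rng.gauss(0, 1) for _ in range(T + 1)]
eps = [rng.gauss(0, 0.5) for _ in range(T + 1)]
y = simulate(0.0, alpha, x, beta=2.0, mu=5.0, eps=eps)

max_gap_psi = 0.0
max_gap_xi = 0.0
for t in range(2, T + 1):
    psi_d, xi_d = qd_from_data(y, x, alpha, 2.0, t)
    psi_e, xi_e = qd_from_errors(eps, alpha, t)
    max_gap_psi = max(max_gap_psi, abs(psi_d - psi_e))
    max_gap_xi = max(max_gap_xi, abs(xi_d - xi_e))
print(max_gap_psi, max_gap_xi)  # both numerically zero
```

The check only confirms the algebra of the two transformations; it says nothing about the validity of instruments, which depends on the assumptions in (6) and (7).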

Proposition 1 Under assumption (6), i.e. assuming the absence of serial correlation in the error term, the levels $y_{i,t-p}$, $p \ge 2$, are instruments in Eqs. (9) and (10):

$$E\left(y_{i,t-p}\,\psi_{i,t}\right) = 0, \tag{12}$$

$$E\left(y_{i,t-p}\,\xi_{i,t}\right) = 0. \tag{13}$$

Proof See Appendix B.

Likewise, it can be shown that $x_{i,t-p}$ and the regime indicators $r_{i,t-p}$, $p \ge 2$, are instruments in Eqs. (9) and (10). If assumption (6) of no serial correlation is replaced by (7), then the set of instruments is pushed backwards in time accordingly: the lags $y_{i,t-k-p}$ and $x_{i,t-k-p}$, $p \ge 1$, are instruments in Eqs. (9) and (10). Note that the regime indicator $r_{i,t-1}$ is still assumed to be predetermined with respect to $\varepsilon_{i,t}$; thus, all lags $r_{i,t-p}$, $p \ge 2$, are instruments irrespective of $k$.

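To discuss estimation on the basis of the two sets of moment conditions, it is useful, however, to restate the transformations (9) and (10). Equation (9) has the convenient feature that $\Delta x'_{i,t}\beta$ enters additively. Collecting terms, one obtains

$$\begin{aligned}
\psi_{i,t} &= \Delta y_{i,t-1} + \left(\frac{1}{1-\alpha_{i,t-1}}\,\Delta y_{i,t} - \frac{1}{1-\alpha_{i,t-2}}\,\Delta y_{i,t-1}\right) - \Delta x'_{i,t}\beta \\
&= \Delta y_{i,t-1} + \gamma'\Delta\left(r_{i,t-1}\,\Delta y_{i,t}\right) - \Delta x'_{i,t}\beta,
\end{aligned} \tag{14}$$

$$\text{with}\quad \gamma' = \left(\frac{1}{1-\alpha_1}\ \cdots\ \frac{1}{1-\alpha_L}\right). \tag{15}$$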

Equation (14) is linear in the coefficient vectors γ and β, and can be estimated by

linear GMM using the moment conditions (12) of Proposition 1. The structural coefficients

α are related to the elements of γ by the nonlinear one-to-one transformation

(15). Inverting this transformation, therefore, gives a nonlinear GMM estimator of α.

Standard deviations and co-variances can be assessed using the delta method.
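The inversion of transformation (15) and the associated delta-method step can be sketched as follows. This assumes that an estimate of $\gamma$ and its covariance matrix are already available from linear GMM; the numbers and the function name `alpha_from_gamma` are hypothetical. Since $\alpha_l = 1 - 1/\gamma_l$, the Jacobian is diagonal with entries $1/\gamma_l^2$.

```python
def alpha_from_gamma(gamma, cov_gamma):
    """Map gamma_l = 1 / (1 - alpha_l) back to alpha_l = 1 - 1 / gamma_l.

    The covariance of alpha follows from the delta method, J V J',
    where J is diagonal with d(alpha_l)/d(gamma_l) = 1 / gamma_l**2.
    """
    n = len(gamma)
    alpha = [1.0 - 1.0 / g for g in gamma]
    jac = [1.0 / g ** 2 for g in gamma]
    cov_alpha = [
        [jac[l] * cov_gamma[l][m] * jac[m] for m in range(n)]
        for l in range(n)
    ]
    return alpha, cov_alpha


# Hypothetical estimates for two regimes, for illustration only.
gamma_hat = [2.0, 5.0]
cov_gamma_hat = [[0.04, 0.01],
                 [0.01, 0.25]]

alpha_hat, cov_alpha_hat = alpha_from_gamma(gamma_hat, cov_gamma_hat)
std_errors = [cov_alpha_hat[l][l] ** 0.5 for l in range(len(alpha_hat))]
print(alpha_hat, std_errors)
```

With these inputs the implied adjustment coefficients are 0.5 and 0.8. The Jacobian entries shrink quickly as $\gamma_l$ grows, so a given sampling error in $\gamma_l$ translates into a small error in $\alpha_l$ when adjustment is slow.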

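Making use of QD2 for GMM estimation is trickier. Let $d\left(r_{i,t-2}, r_{i,t-1}\right)$ be an $L^2 \times 1$ indicator vector, where each element is a dummy variable indicating one of the possible combinations of $r_{i,t-2}$ and $r_{i,t-1}$. Let $\lambda$ be the vector of coefficients $\left(1-\alpha_{i,t-2}\right)/\left(1-\alpha_{i,t-1}\right)$ corresponding to the elements of $d(\cdot)$:

$$\lambda' = \left(1,\ \frac{1-\alpha_1}{1-\alpha_2},\ \frac{1-\alpha_1}{1-\alpha_3},\ \cdots,\ \frac{1-\alpha_L}{1-\alpha_{L-2}},\ \frac{1-\alpha_L}{1-\alpha_{L-1}},\ 1\right).$$

Let furthermore $\delta$ be a vector of products of the adjustment coefficients and $\beta$:

$$\delta = (1-\alpha)\otimes\beta = \begin{pmatrix} (1-\alpha_1)\,\beta \\ (1-\alpha_2)\,\beta \\ \vdots \\ (1-\alpha_L)\,\beta \end{pmatrix}.$$

Finally, let

$$h(\alpha,\beta) = \begin{pmatrix} \lambda \\ -\alpha \\ -\delta \end{pmatrix} \tag{16}$$

be an $L(L+1+K) \times 1$ vector of reduced form coefficients, of which $L(L+K)$ are unknown. This results in

$$\begin{aligned}
\xi_{i,t} &= \lambda' d\left(r_{i,t-2}, r_{i,t-1}\right)\Delta y_{i,t} - \alpha' r_{i,t-2}\,\Delta y_{i,t-1} - \delta'\left(r_{i,t-2}\otimes\Delta x_{i,t}\right) \\
&= \begin{pmatrix} d\left(r_{i,t-2}, r_{i,t-1}\right)'\Delta y_{i,t} & r'_{i,t-2}\,\Delta y_{i,t-1} & \left(r_{i,t-2}\otimes\Delta x_{i,t}\right)' \end{pmatrix} h(\alpha,\beta).
\end{aligned} \tag{17}$$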

In this case, there is no convenient one-to-one transformation from the elements of $h(\alpha,\beta)$ to the underlying structural parameters. The nonlinearity of the problem therefore has to be treated explicitly. Consider the simplest case, with two states and no explanatory variables $x_{i,t}$. Then $\lambda$ and $\alpha$ have two elements each and one can write

$$\pi' = h(\alpha)' = \left(1,\ \frac{1-\alpha_1}{1-\alpha_2},\ \frac{1-\alpha_2}{1-\alpha_1},\ 1,\ -\alpha_1,\ -\alpha_2\right).$$

Though nonlinear in the parameters, this equation is linear in the transformed variables.

This makes it easy to apply the Gauss–Newton method for solving the optimisation

problem inherent in GMM estimation. The Gauss–Newton method iterates

on a linearised moment function, sequentially improving the estimation. Calculating

pseudo-observations for each step, the estimation problem can be solved using routines

for the estimation of linear econometric models.4 As initial values for the iteration,

one can use the results from QD1 estimation exposed earlier in this section.
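The sense in which the QD2 residual is linear in the transformed variables can be made concrete with a small check. The sketch below builds the reduced form vector $h(\alpha,\beta)$ and the matching regressor vector for one observation, then compares their inner product with the QD2 residual computed directly from Eq. (10). The ordering of the $L^2$ regime combinations, the function names and all numbers are assumptions made for illustration.

```python
def h_vector(alpha, beta):
    """Stack (lambda, -alpha, -delta) as in Eq. (16)."""
    n = len(alpha)
    # lambda is ordered by (regime at t-2, regime at t-1), row by row.
    lam = [(1 - alpha[j]) / (1 - alpha[k]) for j in range(n) for k in range(n)]
    delta = [(1 - a) * b for a in alpha for b in beta]  # (1 - alpha) kron beta
    return lam + [-a for a in alpha] + [-d for d in delta]


def z_vector(j, k, dy_t, dy_lag, dx_t, n):
    """Transformed regressors; j is the regime at t-2, k the regime at t-1."""
    d = [0.0] * (n * n)
    d[j * n + k] = 1.0
    r = [1.0 if m == j else 0.0 for m in range(n)]
    return ([v * dy_t for v in d]
            + [v * dy_lag for v in r]
            + [v * dx for v in r for dx in dx_t])  # r_{t-2} kron dx_t


def xi_direct(alpha, beta, j, k, dy_t, dy_lag, dx_t):
    """Left-hand side of Eq. (10) for regime j at t-2 and regime k at t-1."""
    ratio = (1 - alpha[j]) / (1 - alpha[k])
    dxb = sum(dx * b for dx, b in zip(dx_t, beta))
    return ratio * dy_t - alpha[j] * dy_lag - (1 - alpha[j]) * dxb


# Hypothetical values: L = 3 regimes, K = 2 regressors.
alpha = [0.2, 0.6, 0.9]
beta = [1.5, -0.5]
h = h_vector(alpha, beta)
z = z_vector(j=2, k=0, dy_t=0.4, dy_lag=-1.1, dx_t=[0.3, 0.7], n=len(alpha))

xi_linear = sum(zi * hi for zi, hi in zip(z, h))
xi_check = xi_direct(alpha, beta, 2, 0, 0.4, -1.1, [0.3, 0.7])
print(len(h), xi_linear, xi_check)
```

Because the residual is an inner product of observables with $h(\alpha,\beta)$, each Gauss–Newton step only needs the Jacobian of $h$ with respect to $(\alpha,\beta)$, which is what allows the use of linear estimation routines on pseudo-observations.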

The transformations QD1 and QD2 are nonlinear, and the stochastic properties of the transformed residuals depend on the adjustment parameters. Consider the transformed residuals $\psi_{i,t} = \varepsilon_{i,t}/(1-\alpha_{i,t-1}) - \varepsilon_{i,t-1}/(1-\alpha_{i,t-2})$ on the one hand and $\xi_{i,t} = \left[(1-\alpha_{i,t-2})/(1-\alpha_{i,t-1})\right]\varepsilon_{i,t} - \varepsilon_{i,t-1}$ on the other. The variance of $\psi_{i,t}$ will become large if one or both alpha coefficients are in the neighbourhood of 1, creating problems in small samples. An adjustment coefficient approaching 1 will affect the transformed error term of QD2, $\xi_{i,t}$, to a lesser degree. First, only one of the two components of the difference is affected. Second, the effect is mitigated by the numerator, $1-\alpha_{i,t-2}$. Indeed, if the alpha coefficients in different regimes are of similar size, the random factor will stay in the neighbourhood of 1. Therefore, when the alpha coefficients are high (i.e. adjustment speed is low), considerable efficiency gains can be expected from using QD2. This will be investigated in a simulation study in Sect. 6.
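A rough impression of the orders of magnitude can be obtained by treating the two adjustment coefficients as fixed and the shocks as serially uncorrelated with common variance $\sigma^2$. This is a simplification, since the article lets regimes be correlated with past shocks, and the function name below is hypothetical. Under these assumptions, $\mathrm{Var}(\psi_{i,t}) = \sigma^2\left[(1-\alpha_{i,t-1})^{-2} + (1-\alpha_{i,t-2})^{-2}\right]$ and $\mathrm{Var}(\xi_{i,t}) = \sigma^2\left[\left((1-\alpha_{i,t-2})/(1-\alpha_{i,t-1})\right)^2 + 1\right]$.

```python
def residual_variances(a_lag1, a_lag2, sigma2=1.0):
    """Return (Var(psi), Var(xi)) for fixed alpha_{t-1} and alpha_{t-2}.

    Assumes serially uncorrelated shocks with variance sigma2 that are
    independent of the regimes (a simplifying assumption of this sketch).
    """
    var_psi = sigma2 * (1 / (1 - a_lag1) ** 2 + 1 / (1 - a_lag2) ** 2)
    var_xi = sigma2 * (((1 - a_lag2) / (1 - a_lag1)) ** 2 + 1)
    return var_psi, var_xi


# Hypothetical coefficient pairs (alpha_{t-1}, alpha_{t-2}).
for a1, a2 in [(0.5, 0.5), (0.9, 0.9), (0.95, 0.9), (0.99, 0.9)]:
    var_psi, var_xi = residual_variances(a1, a2)
    print(f"alpha_t-1={a1}, alpha_t-2={a2}: "
          f"Var(psi)={var_psi:.1f}, Var(xi)={var_xi:.1f}")
```

The two transformed equations are scaled differently, so these variances describe the size of the transformed residuals rather than a complete efficiency comparison; that comparison is the subject of the simulation study.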

3.2 Generalised Differencing

As explained above, the nonlinear transformations QD1 and QD2 may lead to poor results if in one or more of the regimes the adjustment speed is very low. The transformations cannot be used at all if one of the regimes is characterised by an adjustment speed of exactly zero. This is a case of considerable theoretical interest, as the presence of fixed adjustment costs or irreversibility leads to bands around the target where no adjustment takes place: the solution to the stochastic control problem triggers adjustment when some threshold level is surpassed. Threshold behaviour should be expected for decisions on single projects, not for firms or sectors, where many such projects are aggregated. However, for small units it is certainly useful to explicitly consider regimes of no adjustment, as Caballero et al. (1995) have done in the context of plant level investment.

4 The Gauss–Newton method was originally developed for nonlinear least squares problems. See Davidson and MacKinnon (1993) on the use of Gauss–Newton in nonlinear least squares and instrumental variables estimation, Hayashi (2000) on GMM estimation, and Judge et al. (1985) on numerical methods in maximisation. An unpublished appendix on the use of Gauss–Newton in the current context is available from the author upon request.

Therefore, it is worth asking whether there is a transformation that eliminates the fixed effect in the target equation without affecting the size of the idiosyncratic errors. It turns out that there is such a transformation, provided that the regime indicator has limited memory with respect to $\varepsilon_{i,t}$. Consider again the first-differenced adjustment Eq. (5) above:

$$\Delta y_{i,t} = \alpha'\Delta\left(r_{i,t-1}\,y_{i,t-1}\right) + (1-\alpha)'\Delta\left(r_{i,t-1}\,x'_{i,t}\right)\beta - \alpha'\Delta r_{i,t-1}\,\mu_i + \Delta\varepsilon_{i,t}.$$

Whenever $r_{i,t-1} = r_{i,t-2}$, this simplifies to

$$\Delta y_{i,t} = \alpha' r_{i,t-1}\,\Delta y_{i,t-1} + (1-\alpha)' r_{i,t-1}\,\Delta x'_{i,t}\beta + \Delta\varepsilon_{i,t}.$$

This expression looks very much like the first difference in the linear case, although there is more than one adjustment coefficient to estimate. Only first differences of observations that belong to different regimes lead to a latent term $-\alpha'\Delta r_{i,t-1}\,\mu_i$ that will be correlated with the lagged dependent variable under a variety of circumstances.

As it is this term that precludes the use of the standard technique, the following strategy comes to mind: differences are only formed for observations with $r_{i,t-2} = r_{i,t-1}$. The first element of $\alpha$, $\alpha_1$, is estimated on the basis of cases where two consecutive observations belong to the first regime, using differences of observations that both belong to the second regime leads to inference on the second adjustment coefficient, and so on. In this direct form, however, the idea will not work. If $r_{i,t-1}$ and $\varepsilon_{i,t-1}$ are correlated and groups of observations are formed according to regimes, then the transformed residual $\Delta\varepsilon_{i,t}$ will have a (conditional) expectation different from zero in those groups. This will lead to biased estimators.

Under certain additional assumptions, however, a straightforward modification will yield useful moment conditions:

1. Let $q$ be the maximum $\tau$ for which there is a correlation between $r_{i,t}$ and $\varepsilon_{i,t-\tau}$, e.g. as a consequence of a moving average structure of the state variable driving the regime indicator. Then the observation is to be transformed by subtracting past observations of the same regime with a lag of at least $\ell = q + 2$.

2. If an observation is not matched by a $(q+2)$-lag in the same regime, then it may be transformed using a higher lag $\ell > q + 2$.

The first part of the rule proposes a dynamic filter, which varies according to regimes. The second avoids the loss of many observations in cases where regimes in $t$ and $t+q$ do not match.
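The two-part rule can be sketched as a small matching routine for a single individual. The code below is an illustrative reading of the rule, not the author's implementation: for each observation $t$ it looks for the most proximate earlier observation at lag $\ell \ge q + 2$ whose relevant regime $r_{i,t-\ell-1}$ equals $r_{i,t-1}$. The function name and the regime sequence are hypothetical.

```python
def generalised_lags(regimes, q=0):
    """Choose the differencing lag for each observation of one individual.

    regimes[s] holds the regime label r_s for s = 0, ..., T-1.
    Observation t (t = 1, ..., T) is governed by r_{t-1}. It is paired
    with the closest earlier observation t - lag, lag >= q + 2, that is
    governed by the same regime, i.e. r_{t-lag-1} == r_{t-1}.
    Returns {t: lag}, with None where no admissible partner exists.
    """
    lags = {}
    for t in range(1, len(regimes) + 1):
        lags[t] = None
        # lag runs up to t - 1 so that the index t - lag - 1 stays >= 0.
        for lag in range(q + 2, t):
            if regimes[t - lag - 1] == regimes[t - 1]:
                lags[t] = lag
                break
    return lags


# Hypothetical regime history r_0, ..., r_6 with two regimes.
example_regimes = [0, 1, 0, 0, 1, 1, 0]
chosen = generalised_lags(example_regimes, q=0)
print(chosen)
```

With $q = 0$, observations 1 and 2 have no admissible partner, observation 3 is matched at the minimum lag of 2, and the later observations fall back on lags of 3 or 4 because the regime two periods earlier differs. A larger $q$ raises the minimum lag and leaves more early observations unmatched.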


The th difference is

yi,t − yi,t− = α

ri,t−1 yi,t−1 − ri,t− −1 yi,t− −1

+(1 − α)

ri,t−1x

i,t

− ri,t− −1x

i,t− β

−α

ri,t−1 − ri,t− −1 μi + εi,t − εi,t− ,

which simplifies to

yi,t − yi,t− = α

ri,t−1 yi,t−1 − yi,t− −1 + (1 − α)

ri,t−1 x

i,t

− x

i,t− β

+εi,t − εi,t− , (18)

if the two observations are characterised by the same regime, such that ri,t−1 =

ri,t− −1. When does the expectation of the residual term, εi,t − εi,t− , conditional

on ri,t−1 and the equality ri,t−1 = ri,t− −1, become zero? It is sufficient that εi,t and

εi,t− are both uncorrelated with the two conditioning variables ri,t−1 and ri,t− −1.

According to assumption (6), εi,t is uncorrelated with ri,t−1 and ri,t− −1. Then the

same is true with respect to εi,t− and ri,t− −1. Therefore, by choosing , it only

remains to make sure that εi,t− and ri,t−1 are uncorrelated. With = 1, this will

not be the case if εi,t and ri,t are contemporaneously correlated. However, if ri,t is

uncorrelated with all lags of εi,t , then = 2 will ensure that

E εi,t − εi,t− ri,t−1, ri,t−1 = ri,t− −1 = 0. (19)

More generally, if there is correlation between ri,t and εi,t−τ up to lag τ = q, the difference that guarantees the above equation to hold will have to be at least of order ℓ = q + 2. However, one is not restricted to using only differences of the order that is ‘just right’, i.e. q + 2. Any other difference of order ℓ ≥ q + 2 will fulfil Eq. (19) just as well. It is straightforward to construct a difference using the most proximate observation of the same regime with lag ℓ ≥ q + 2. With respect to admissibility of instruments, the rules of the classic first-difference approach apply: the instruments need to be uncorrelated with the earlier of the two observations that make up the difference.

In the following, this procedure is called the Generalised Difference estimator.
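The pairing rule can be illustrated with a minimal Python sketch. It is not the author's Ox implementation: the function name `generalised_difference_pairs` and the representation of one individual's regime history as a list of integers are assumptions made purely for illustration. For each period t, the sketch searches for the smallest lag ℓ ≥ q + 2 such that the regime entering the equation at t, ri,t−1, equals ri,t−ℓ−1.

```python
from typing import List, Tuple


def generalised_difference_pairs(regimes: List[int], q: int) -> List[Tuple[int, int]]:
    """Return (t, ell) pairs for one individual (hypothetical helper).

    regimes[t] holds the regime label r_{i,t}. The adjustment equation at
    date t is governed by r_{i,t-1}, so observation t is paired with the
    most proximate observation t - ell whose governing regime
    r_{i,t-ell-1} is the same, subject to ell >= q + 2.
    """
    min_lag = q + 2
    pairs = []
    for t in range(1, len(regimes)):
        # ell may be at most t - 1, so that index t - ell - 1 is >= 0
        for ell in range(min_lag, t):
            if regimes[t - 1] == regimes[t - ell - 1]:
                pairs.append((t, ell))
                break  # keep only the nearest admissible match
    return pairs


if __name__ == "__main__":
    history = [0, 1, 0, 0, 1, 0, 1]
    print(generalised_difference_pairs(history, q=0))
    print(generalised_difference_pairs(history, q=1))
```

Observations for which no admissible match exists are simply dropped, which is the source of the loss of observations discussed in the text.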

For the moment conditions to hold, it is necessary to strengthen assumption (6). In addition to the variables in the conditioning set Ωi,t−1, εi,t must also be uncorrelated with the future regimes ri,t+q+1, ri,t+q+2, ….

Proposition 2 Let the conditional expectation of εi,t satisfy

$$
E\left(\varepsilon_{i,t} \mid \Omega_{i,t-1},\ r_{i,t+q+1},\ r_{i,t+q+2}, \ldots\right) = 0, \tag{20}
$$

with Ωi,t−1 defined as in assumption (6). Then the lagged levels yi,t−ℓ−p, p ≥ 1, are instruments in Eq. (18), the adjustment equation transformed by taking the ℓth difference, with ℓ ≥ q + 2, conditional on the regimes being the same in each pair of observations:

$$
E\left[\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell}\right) y_{i,t-\ell-p} \mid r_{i,t-1},\ r_{i,t-1} = r_{i,t-\ell-1}\right] = 0, \quad \text{with } \ell \ge q + 2.
$$


Panel estimation of state-dependent adjustment

Proof See Appendix B.

Likewise, it can be shown that xi,t−ℓ−p and the regime indicators ri,t−ℓ−p are instruments in Eq. (18), given ri,t−1 = ri,t−ℓ−1. As in Proposition 1 above, if Ωi,t−1 in (20) is replaced by Ω∗i,t−k, as defined in assumption (7), with k being the minimum τ such that εi,t does not vary with εi,t−τ and xi,t−τ, the set of instruments is pushed backwards in time: the lags yi,t−ℓ−k−p and xi,t−ℓ−k−p, p ≥ 1, are instruments. As the regime indicator ri,t−1 is still assumed to be predetermined, all lags ri,t−p, p ≥ 2, are instruments irrespective of k.

It is an identifying assumption for the process that drives the regime indicator to

have finite memory with respect to innovations εi,t. This is a limitation. If ri,t is correlated

with all past values of εi,t , then the conditional expectation of the transformed

error term resulting from a difference of two observations from the same regime will

not disappear. The resulting bias can be expected to wane if the minimum lag length is

chosen to be large. However, doing so would result in the loss of many observations,

exacerbating another weakness of the estimation strategy. In principle, assuming a

finite memory of the regime indicator with respect to εi,t is rather similar in kind to

the assumption of a finite memory of εi,t with respect to earlier shocks, which is needed

to use lagged endogenous variables as instruments in the standard approach. Whether

the condition (20) can be expected to hold or not will depend on the estimation problem

at hand. In the context of estimating the microeconomic adjustment of the capital

stock under financing constraints, it may be realistic to assume that, after the shock to

capital demand, the financing structure of a firm will be restored in finite time.5

3.3 Testing finite memory and deciding on the length of memory

In order to use Generalised Differencing, it is necessary to test the condition (20) and

decide on the length of the memory of the process driving the regime with respect

to εi,t . There are two simple solutions. The first is to use the test of overidentifying

restrictions associated with Sargan (1958) and Hansen (1982) to check the validity of

the moment conditions. The drawback is that this test is generally used as an omnibus

test of the specification, including the choice of the instruments. It is preferable to

have a more specific test concerning the appropriate lag length.

Such a specific test can be based on the fact that the expected value of the residual

will not disappear if the lag length chosen is too short. In that case, the choice

of observations according to regime will select positive or negative outcomes of εi,t ,

because of the correlation between the regime variable and the error component εi,t .

If regime dummies are added to the adjustment equation, then their coefficients will

be estimated as positive or negative quantities according to the direction of selectivity,

although they should be zero according to the basic specification. Furthermore,

it is known how these estimates for regime constants are distributed under the null

of a correct specification. Using a GMM estimator, they are asymptotically normal,

with mean zero, and their standard deviation is given by the standard deviation of the

coefficient. Therefore, the t-value on these coefficients is a valid test statistic.
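As a rough illustration of the regime-constant test, the following Python sketch computes the t-value and a two-sided p-value from the asymptotic normal distribution. The function name `regime_constant_test` is hypothetical, and the numbers in the usage example are the mean estimates and mean estimated standard deviations reported for G1 in Table 3, used here only as plausible inputs and not as the result of any single estimation.

```python
import math


def regime_constant_test(coef: float, std_err: float, level: float = 0.05):
    """t-test of H0: regime dummy coefficient = 0 (hypothetical helper).

    Under the null of a correct specification, the GMM estimate is
    asymptotically normal with mean zero, so the two-sided p-value is
    2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    """
    t_value = coef / std_err
    p_value = math.erfc(abs(t_value) / math.sqrt(2.0))
    return t_value, p_value, p_value < level


if __name__ == "__main__":
    # Appropriate lead (Table 3, column 1, G1): the dummy is close to zero
    print(regime_constant_test(-0.0001, 0.0116))
    # Inappropriate lead (Table 3, column 3, G1): the dummy picks up selectivity
    print(regime_constant_test(-0.0765, 0.0114))
```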

5 For a theoretical model that makes this prediction, see von Kalckreuth (2004, 2008b, Chap. 1).


It may be argued that this test ignores the possibility that the regime-specific constants truly belong in the equation. Consider a trend in the term in the brackets of Eq. (1) that makes the target level of yi,t change over time:

$$
\Delta y_{i,t} = -\left(1 - \alpha_{i,t-1}\right)\left(y_{i,t-1} - \kappa t - \mu_i\right) + \varepsilon_{i,t}.
$$

Solving for yi,t yields

$$
y_{i,t} = \alpha_{i,t-1}\, y_{i,t-1} + \left(1 - \alpha_{i,t-1}\right)\kappa t + \left(1 - \alpha_{i,t-1}\right)\mu_i + \varepsilon_{i,t}.
$$

After transforming the equation by subtracting an observation belonging to the same regime, lagged ℓ periods, one obtains

$$
y_{i,t} - y_{i,t-\ell} = \alpha_{i,t-1}\left(y_{i,t-1} - y_{i,t-\ell-1}\right) + \left(1 - \alpha_{i,t-1}\right)\kappa\,\ell + \varepsilon_{i,t} - \varepsilon_{i,t-\ell}.
$$

Regime-specific constants may thus be the result of a trending target variable. Actually, this is a case of misspecification: the time trend should have figured in xi,t. The regime constants should be proportional to each other, with a factor of proportionality given by the adjustment speeds.6 More generally, they should not be of different sign, as will be the case if the coefficient on the regime dummy collects the residuals selected for their high or low value.
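One way to check the proportionality of the regime constants is a delta-method Wald test of the restriction z1(1 − α2) − z2(1 − α1) = 0, which is equivalent to z1/z2 = (1 − α1)/(1 − α2) whenever z2 and 1 − α2 are nonzero. The Python sketch below is an assumption about how such a test could be set up, not a procedure described in the paper: the function name `proportionality_wald`, the parameter ordering (α1, α2, z1, z2) and the covariance matrices in the usage example are all hypothetical.

```python
import math


def proportionality_wald(alpha1, alpha2, z1, z2, cov):
    """Delta-method Wald test of z1*(1 - alpha2) - z2*(1 - alpha1) = 0.

    cov is the 4x4 covariance matrix of the estimates, ordered as
    (alpha1, alpha2, z1, z2). Returns the Wald statistic and its
    chi-square(1) p-value, using P(chi2_1 > s) = erfc(sqrt(s / 2)).
    """
    g = z1 * (1.0 - alpha2) - z2 * (1.0 - alpha1)
    # Gradient of g with respect to (alpha1, alpha2, z1, z2)
    grad = [z2, -z1, 1.0 - alpha2, -(1.0 - alpha1)]
    var_g = sum(grad[i] * cov[i][j] * grad[j] for i in range(4) for j in range(4))
    stat = g * g / var_g
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Purely illustrative diagonal covariance matrix (an assumption)
    cov = [[1e-4 if i == j else 0.0 for j in range(4)] for i in range(4)]
    # Constants proportional to adjustment speeds: restriction holds
    print(proportionality_wald(0.3, 0.8, 0.07, 0.02, cov))
    # Constants of opposite sign: restriction is rejected
    print(proportionality_wald(0.3, 0.8, -0.0765, 0.0822, cov))
```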

4 Moment restrictions for contemporaneously correlated regimes

All moment restrictions discussed in the previous section require the regime indicator

to be predetermined with respect to the current shock term. This may hold in many

applications, specifically if there are long planning and gestation lags as in the case of

fixed investment. In other circumstances, the error term in the adjustment equation and

the threshold variable governing the adjustment regime may be contemporaneously

correlated. Let us investigate an approach that can be brought to bear in this case.

For greater clarity, the adjustment equation shall be rewritten with a modified dating,

to highlight the possibility of a contemporaneous correlation between the speed of

adjustment and εi,t :

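$$
\Delta y_{i,t} = -\left(1 - \alpha_{i,t}\right)\left(y_{i,t-1} - x_{i,t}'\beta - \mu_i\right) + \varepsilon_{i,t}, \tag{21}
$$

or

$$
y_{i,t} = \alpha_{i,t}\, y_{i,t-1} + \left(1 - \alpha_{i,t}\right)\left(x_{i,t}'\beta + \mu_i\right) + \varepsilon_{i,t}. \tag{22}
$$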

It will now be shown that the requirement of predetermined regimes can be dropped at

the cost of additional assumptions regarding the fixed effect. Under these assumptions,

6 Let z1 and z2 be two regime dummy coefficients, with α1 and α2 the corresponding adjustment coefficients.

If the regime dummies result from a trending target as above, then the nonlinear restriction between

coefficients is z1/z2 = (1 − α1)/(1 − α2). It is rather straightforward to test this restriction after estimation.


it is possible to leave the fixed effect in an equation augmented by regime dummies and

use first differences as instruments. Under the same conditions, first differences will

also serve as instruments for a modified version of the first differenced Eq. (5).

Level estimation was introduced by Arellano and Bover (1995) and Blundell and

Bond (1998) as a response to a specific problem arising in the standard autoregressive

model with fixed effects. If the coefficient of the lagged dependent variable is in the

neighbourhood of one, then the level behaves like a random walk, and it will be a

weak instrument in the differenced equation. These authors use the following moment

condition in the estimation of the standard autoregressive model:

$$
E\left[\Delta y_{i,t-p}\left(\mu_i + \varepsilon_{i,t}\right)\right] = 0,
$$

with p ≥ 1. If εi,t is serially uncorrelated, then it is sufficient that yi,t is mean

stationary and displays a constant correlation with μi for the moment equation to

hold. This implies a requirement on the initial conditions: the deviation of the starting

value from the stationary level needs to be uncorrelated with the stationary level itself.

The latent term of Eq. (22) is given by (1 − αi,t)μi + εi,t. In the attempt to use first differences as instruments for levels, let us first take a look at

$$
E\left[\Delta y_{i,t-p}\left(\left(1 - \alpha_{i,t}\right)\mu_i + \varepsilon_{i,t}\right)\right].
$$

This expectation will be zero if, first, E(Δyi,t−p) = 0, and second, Δyi,t−p is uncorrelated with both (1 − αi,t)μi and εi,t. The first condition requires the process to

be mean stationary, as in the derivation of Blundell/Bond and Arellano/Bover. The

second condition is hard to fulfil. To see the reason, one may adjust the backward

solution in (3) and (4) to the modified dating:

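$$
y_{i,t} = \left(y_{i,0} - x_{i,1}'\beta - \mu_i\right)\prod_{\tau=1}^{t}\alpha_{i,\tau} + x_{i,t}'\beta + \mu_i + A_{i,t},
$$

where

$$
A_{i,t} = \sum_{l=2}^{t}\left(\varepsilon_{i,l-1} - \Delta x_{i,l}'\beta\right)\prod_{\tau=l}^{t}\alpha_{i,\tau} + \varepsilon_{i,t}.
$$

Plugging this back into (21) yields the expression:

$$
\Delta y_{i,t} = -\left(1 - \alpha_{i,t}\right)\left[\left(y_{i,0} - x_{i,1}'\beta - \mu_i\right)\prod_{\tau=1}^{t-1}\alpha_{i,\tau} + A_{i,t-1} - \Delta x_{i,t}'\beta\right] + \varepsilon_{i,t}. \tag{23}
$$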

The difference Δyi,t−p is a function of all εi,τ, xi,τ and αi,τ, τ ≤ t − p, as well as of the initial condition, the deviation yi,0 − x′i,1β − μi. One of the requirements for the covariance of Δyi,t−p and (1 − αi,t)μi to disappear is therefore a limited memory of αi,t = α′ri,t with respect to its own past. Fixed effects in ri,t are thus excluded. This would be hard to defend in many applications, given the presence of a fixed effect in the law of motion governing yi,t.

In order to weaken the requirements, one may decompose the individual target level, μi, into its expectation over all individuals, μe, and the individual deviation from this expectation, μ∗i. Let, therefore,

$$
\mu_i = \mu^{e} + \mu_i^{*}, \quad \text{with } \mu^{e} = E_i\left(\mu_i\right).
$$

By definition, E(μ∗i) = 0. Rewriting the adjustment equation in (22) gives

$$
y_{i,t} = \alpha' r_{i,t}\, y_{i,t-1} + (1-\alpha)' r_{i,t}\, x_{i,t}'\beta + \mu^{e}(1-\alpha)' r_{i,t} + \underbrace{\mu_i^{*}(1-\alpha)' r_{i,t} + \varepsilon_{i,t}}_{\text{latent term}}. \tag{24}
$$

Written this way, the equation contains a regime-specific shift term μe(1 − α)′ri,t. In estimation, this term can be taken into account by introducing the regime vector ri,t as a regressor into the equation.

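Proposition 3 Consider the conditions

$$
E\left(\varepsilon_{i,t} \mid \varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, \Delta x_{i,t-k}, \Delta x_{i,t-k-1}, \ldots, r_{i,t-k}, r_{i,t-k-1}, \ldots, y_{i,0} - x_{i,1}'\beta - \mu_i\right) = 0, \tag{25}
$$

$$
E\left(\mu_i^{*} \mid \{\varepsilon_{i,t}\}, \{r_{i,t}\}, \{\Delta x_{i,t}\},\ y_{i,0} - x_{i,1}'\beta - \mu_i\right) = 0, \tag{26}
$$

with k ≥ 1, where a term in curly brackets denotes an entire time series. Jointly, these conditions are sufficient for the following moment restrictions to hold in Eq. (24):

$$
E\left[\Delta y_{i,t-p}\left(\varepsilon_{i,t} + \left(1 - \alpha_{i,t}\right)\mu_i^{*}\right)\right] = 0 \quad \text{with } p \ge k. \tag{27}
$$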

Proof See Appendix B.

It follows immediately from the condition (25) that appropriately lagged values Δxi,t−p and ri,t−p can also be used as instruments. Some comments are in order. It is natural that one has to impose conditions on μ∗i, now that μi is not differenced out of the error term. The invariance of expected μ∗i with respect to the time path {εi,t} is rather unproblematic. It agrees well with the basic structure of the error component model. The irrelevance of the regime process is less innocuous. It is well conceivable that a real-world data generating process for ri,t may contain a fixed effect that is correlated with μ∗i. Similar reservations apply with respect to the required irrelevance of {Δxi,t}. Finally, the necessity of having an expected value of μi that is independent of the initial deviation was also found by Blundell and Bond (1998) when investigating the use of moment equations for levels in a linear context. The condition is not innocuous either: it excludes an initial condition such as yi,0 = 0. It can be replaced by the requirement that the process has been running for a ‘very long’ time, as the first term inside the bracket of Eq. (23) will disappear asymptotically.7

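As a corollary to Proposition 3, it follows that lags of Δyi,t can also be used as instruments in a differenced version of the augmented adjustment Eq. (24):

$$
\Delta y_{i,t} = \alpha'\Delta\left(r_{i,t}\, y_{i,t-1}\right) + (1-\alpha)'\Delta\left(r_{i,t}\, x_{i,t}'\right)\beta - \mu^{e}\alpha'\Delta r_{i,t} + \underbrace{\Delta\varepsilon_{i,t} - \mu_i^{*}\alpha'\Delta r_{i,t}}_{\text{latent term}}.
$$

Under conditions (25) and (26), the following restriction will hold8:

$$
E\left[\Delta y_{i,t-p-1}\left(\Delta\varepsilon_{i,t} - \Delta\alpha_{i,t}\,\mu_i^{*}\right)\right] = 0 \quad \text{with } p \ge k. \tag{28}
$$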

Note that the moment restrictions for differences in (28) do not use all the information contained in the moment restriction for levels: the former are implied by the latter but not vice versa. Furthermore, because the residuals in (28) are first differenced, one observation is lost, and the instruments have to be moved back one period in time. However, the moment condition is not necessarily useless: estimators based on condition (28) may be more robust against violations of assumption (26) regarding the fixed effect, especially when regime changes are relatively infrequent, as μ∗i is differenced out of (28) whenever ri,t = ri,t−1.

5 A synopsis

At this point, it is interesting to compare the conditions for Propositions 1, 2 and 3.

All of them require the expected value of εi,t to be invariant with respect to past

values εi,t−k, εi,t−k−1, . . ., the levels or first differences of xi,t−k , xi,t−k−1, . . . as well

as to μi and/or the initial deviation. Propositions 1 and 2 also need εi,t to be uncorrelated

with ri,t−1, the regime indicator figuring in the current date adjustment equation,

whereas for Proposition 3, invariance of εi,t with respect to lag k and earlier of the

regime indicator is sufficient. As an additional identifying assumption for the Generalised

Differencing approach, the memory of ri,t needs to be finite with respect to lags

of εi,t . This excludes, for example, an autoregressive process for the state variable

underlying the adjustment indicator, with the innovation contemporaneously correlated

to εi,t . The level estimator, for its part, needs the expected value of the individual

effect μi to be unrelated to the process governing the idiosyncratic error, changes

in the forcing term xi,t , the regimes and the initial deviation. Both these restrictions

may impose considerable limitations. However, estimators based on Propositions 2

and 3 are able to fulfil special tasks. The Generalised Difference estimator will be

unbiased even if some of the alpha coefficients are large—in fact, it still works if

7 Such a process may also be observed by means of a ‘short’ panel—what matters is not the length of

the panel, but whether or not the process has been running long enough to bring the effect of the initial

condition in Eq. (23) into the neighbourhood of zero.

8 This follows directly from E[Δyi,t−p−1(εi,t + (1 − αi,t)μ∗i)] = 0 and E[Δyi,t−p−1(εi,t−1 + (1 − αi,t−1)μ∗i)] = 0.


one of them is exactly equal to 1 or even greater. Like the standard first-difference

estimator in the linear case, the Generalised Difference estimator can be supposed to

deliver imprecise results if all the adjustment coefficients are in the neighbourhood

of 1, as then the level instruments are weak. In this case, the level estimator will perform

better. Perhaps even more importantly, this latter estimator is also capable of

dealing with regime indicators that are contemporaneously correlated with the error

term.

6 Implementing and simulating the estimators

This section compares the four sets of moment conditions set out in Propositions 1, 2 and 3, using them separately for estimation on simulated panel data sets.

6.1 Setting up the simulation

For the regime indicator, a threshold process is specified. The kth element of ri,t is given by

$$
r^{(k)}_{i,t} = \mathrm{Ind}\left(\bar{s}_{k-1} \le s_{i,t} \le \bar{s}_{k}\right).
$$

The numbers s̄0, …, s̄L are thresholds, with the first and the last element being equal to −∞ and ∞, respectively. As an example for a threshold process with infinite memory with respect to the error term, an AR(1) is used as a process for the latent state si,t:

$$
s_{i,t} = a\, s_{i,t-1} + \upsilon_{i,t},
$$

where the current shock υi,t is contemporaneously correlated with the error term εi,t. Alternatively, as an example of a process with finite memory, it is assumed that the threshold process is driven by an MA(q):

$$
s_{i,t} = b + \sum_{j=0}^{q} c_j\, \eta_{i,t-j}, \quad \text{with } c_0 = 1.
$$

The elements of the moving average conform to

$$
E\left(\eta_{i,t}\right) = 0, \quad E\left(\eta_{i,t}\eta_{i,t-p}\right) = 0\ \ \forall p > 0, \quad E\left(\eta_{i,t}\varepsilon_{i,t}\right) \ne 0, \quad E\left(\eta_{i,t}\varepsilon_{i,t-p}\right) = 0\ \ \forall p > 0.
$$

Concretely, the two interrelated processes {ri,t} and {yi,t} are simulated as follows:

Regime-dependent error correction process: εi,t is standard normal, μi is distributed N(1, 1), and εi,t and μi are independent.

Regime indicator process: Regarding the number of regimes, let L = 2. If the threshold process is driven by an AR(1), then let E(υ²i,t) = 1 and E(υi,t εi,t) = 0.8, with υi,t being calculated as a weighted sum of εi,t and an independent Gaussian process. The AR-parameter a is 0.8. Likewise, for the MA(q), the stochastic structure is chosen as E(η²i,t) = 1 and E(ηi,t εi,t) = 0.8, with ηi,t being calculated as a weighted sum of εi,t and an independent Gaussian process. The threshold level is set equal to zero, resulting in an equal number of observations in each regime on average. Let us experiment with an MA(0) (uncorrelated regime states) and an MA(1) with c1 = 0.8. Note that the assumed contemporaneous correlation between the shocks in the regime equation and the error term is very high.

Panel structure: The panel is unbalanced, with individuals carrying either 8, 9 or

10 observations, 1,000 individuals of each type, that is, 3,000 individuals in total. For

each individual, the process is simulated for 50 periods, and only the last 8, 9 or 10

observations are used for estimation.
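The data generating process described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the author's Ox code, and it rests on several assumptions that the text does not settle: there is no forcing term xi,t, the constant b is zero, pre-sample MA shocks are set to zero, the process starts at yi,0 = μi + εi,0, regime index 1 corresponds to si,t > 0, and the predetermined regime ri,t−1 governs the adjustment at date t. The function name `simulate_panel` and its parameters are hypothetical.

```python
import math
import random


def simulate_panel(n_units, n_periods, alphas=(0.3, 0.8),
                   ma_coefs=(1.0,), rho=0.8, seed=0):
    """Simulate regime-dependent partial adjustment with an MA(q) threshold.

    ma_coefs = (c_0, ..., c_q) with c_0 = 1; rho is the contemporaneous
    correlation between the regime shock eta and the error term eps.
    """
    rng = random.Random(seed)
    weight = math.sqrt(1.0 - rho ** 2)  # keeps Var(eta) = 1
    q = len(ma_coefs) - 1
    panel = []
    for _ in range(n_units):
        mu = rng.gauss(1.0, 1.0)  # individual target level, N(1, 1)
        eps = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        # eta is a weighted sum of eps and an independent Gaussian shock
        eta = [rho * e + weight * rng.gauss(0.0, 1.0) for e in eps]
        # Latent state s_t = sum_j c_j * eta_{t-j}; threshold at zero
        state = [sum(ma_coefs[j] * eta[t - j] for j in range(q + 1) if t - j >= 0)
                 for t in range(n_periods)]
        regime = [1 if s > 0 else 0 for s in state]
        y = [mu + eps[0]]  # assumed initial condition
        for t in range(1, n_periods):
            a = alphas[regime[t - 1]]  # predetermined regime r_{i,t-1}
            y.append(a * y[t - 1] + (1.0 - a) * mu + eps[t])
        panel.append({"mu": mu, "eps": eps, "eta": eta, "regime": regime, "y": y})
    return panel


if __name__ == "__main__":
    # MA(1) threshold process with c1 = 0.8; keep only the last 10 periods
    data = simulate_panel(n_units=1000, n_periods=50, ma_coefs=(1.0, 0.8))
    sample = [unit["y"][-10:] for unit in data]
    print(len(sample), len(sample[0]))
```

Discarding the early periods, as in the usage example, mimics the burn-in described in the text, where 50 periods are simulated and only the last 8, 9 or 10 are retained.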

All the estimators are implemented by first calculating the transformed observations

and the instruments and then adapting and using the routines supplied with the

DPD module for Ox proposed by Doornik et al. (2002) to perform GMM estimates

and tests.9 Details on the estimation routines are given below and in the notes to the

tables.

6.1.1 Quasi-difference estimations QD1 and QD2

Let us assume an AR(1) as a process driving the threshold variable that constitutes the

regime. The estimation equations are transformed in the way described in Sect. 3. The

first quasi-differencing approach, QD1, is implemented by estimating the transformed

equation using a standard linear GMM estimator and then calculating the structural

parameters by inverting Eq. (15). The more complicated QD2 estimation is performed

by treating the moment as a nonlinear function of the structural parameters, using the

iterative Gauss–Newton method.

Estimates on the basis of the QD1 transformation are used as initial values. As

instruments, levels lagged twice are used. It turns out that the instruments are more

informative (the estimates being more precise) if they are separated out in regimes,

which means: For purposes of instrumentation, the lags of yi,t−2 are interacted with

regime dummies, ri,t−2.
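Separating the instruments out by regime amounts to multiplying the lagged level by each regime dummy in turn, which yields one instrument column per regime. The Python sketch below illustrates this under the assumption that regimes are coded as integers 0, …, L − 1; the function name `interact_with_regimes` is hypothetical and the sketch is not taken from the DPD routines.

```python
from typing import List, Sequence


def interact_with_regimes(y_lag: Sequence[float],
                          regime_lag: Sequence[int],
                          n_regimes: int = 2) -> List[List[float]]:
    """Build instrument rows r_{i,t-2} * y_{i,t-2} (hypothetical helper).

    Each row has n_regimes entries: the lagged level appears in the
    column of the regime that was active at the same date, and all
    other columns are zero.
    """
    rows = []
    for y, r in zip(y_lag, regime_lag):
        row = [0.0] * n_regimes
        row[r] = y
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(interact_with_regimes([1.5, -2.0, 0.7], [0, 1, 1]))
```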

6.1.2 Generalised Difference estimation

The transformation described in Proposition 2 consists in taking the ℓth difference, with ℓ chosen such that regimes ri,t−1 and ri,t−ℓ−1 match, subject to some minimum order of difference. Available instruments are levels lagged ℓ + 1, ℓ + 2, …. As the appropriate ℓ depends on the regime process, so does the set of instruments. By taking the earlier of the two observations as a point of reference yi,t and assigning to it the nearest lead yi,t+ℓ of the same regime with ℓ ≥ 2 + q, the definition of suitable instruments is straightforward. One can uniformly use lags yi,t−1, yi,t−2 and earlier as instruments. As in Quasi-Difference estimation, let us interact the lagged levels yi,t−1

9 Ox is an object-oriented matrix programming language. For a complete description of Ox, see Doornik

(2001).


with regime indicators ri,t−1. In order to test the validity of the transformation, regime

dummies are included as additional RHS variables. They also enter the instrument

set.

6.1.3 Level estimation

As described in Sect. 4, the level estimator is implemented by specifying an auxiliary equation that contains a set of regime dummies as additional RHS variables. Instruments are first differences of lagged endogenous variables, interacted with regime indicators, ri,t−1Δyi,t−1 and ri,t−2Δyi,t−2 (four variables!), plus differenced indicators for regime 1 taken from Δri,t−1, Δri,t−2. Simulations are performed both for the case where a predetermined regime ri,t−1 enters the adjustment equation, and for the case of a contemporaneously correlated regime ri,t governing the adjustment.

6.2 Simulation results

Tables 1 and 2 show estimates on the basis of quasi-difference transformations QD1

and QD2 (1,000 runs). The theoretical discussion has shown that the finite sample

properties of the estimators may depend on the size of the regime-specific coefficients,

notably on their difference from 1. Therefore, estimations for a whole range

of parameters are shown. The true value for α1 is set as 0.3, whereas the value for α2

ranges from 0.3 to 0.9. Larger ranges and finer steps are plotted in Figs. 1 and 2.

Table 1 and Fig. 1 display results for the simpler QD1 transformation. Although

for smaller coefficient values, the estimator performs well and yields correct estimates

with a good precision, it is less reliable if one of the regime-specific coefficients is

large. For α1 = α2 = 0.3, the mean bias is only of the order of −0.004 for both parameters. It will be −0.0133 for α̂2 when α2 is raised to 0.7, and for α2 = 0.9, the finite sample bias of α̂2 becomes a non-negligible −0.0414.10 The estimates α̂1 also deteriorate, although less markedly. The table also gives t-values and Sargan statistics. The bias leads the t-tests to reject the true value too often when one of the coefficients is too high: in the extreme case of α2 = 0.9, the true value is rejected 77.9% of the time. The same is true for the Sargan test of instrument validity: with large regime-specific coefficients, it rejects the instruments 81.6% of the time when α2 = 0.9. One can

conclude that slow speeds of adjustment (high persistence) create a problem for QD1

estimation.

Table 2 and Fig. 2 give results for the QD2 transformation. As is expected, for large

values of regime specific adjustment coefficients the estimator performs better than

its counterpart based on QD1. In the extreme cases of α1 = 0.3 and α2 = 0.9, the bias

is still only 0.0152 and −0.0209, respectively. For smaller values of regime-specific

coefficients, there is hardly any bias at all. Sargan statistics and t-values are reliable,

except for very high values of α2.

10 Whether one considers the bias as large will also depend on the way one looks at the parameter. The state-dependent speed of adjustment is given by 1 − αi,t−1. A bias of −0.0414 when the true value of α2 is 0.9 will, therefore, overestimate the adjustment speed of 0.1 by 41.4%.


Table 1 Quasi-differences, QD1 transformation, 1,000 runs

| Simulation # | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Specification of state variable underlying regimes | AR(1) | AR(1) | AR(1) | AR(1) |
| True α1 | 0.3 | 0.3 | 0.3 | 0.3 |
| True α2 | 0.3 | 0.5 | 0.7 | 0.9 |
| **α1** | | | | |
| Mean parameter estimate | 0.2930 | 0.2955 | 0.2939 | 0.2687 |
| Mean bias | −0.0041 | −0.0045 | −0.0061 | −0.0313 |
| Mean estimated std. deviation | 0.0220 | 0.0236 | 0.0276 | 0.0351 |
| Std. dev. parameter estimate | 0.0218 | 0.0247 | 0.0298 | 0.0533 |
| RMSE | 0.0222 | 0.0251 | 0.0304 | 0.0618 |
| Freq. rejections of true value on 5% conf. level | 4.6% | 6.8% | 5.9% | 25.7% |
| **α2** | | | | |
| Mean parameter estimate | 0.2957 | 0.4938 | 0.6868 | 0.8586 |
| Mean bias | −0.0043 | −0.0062 | −0.0133 | −0.0414 |
| Mean estimated std. deviation | 0.0194 | 0.0189 | 0.0177 | 0.0139 |
| Std. dev. parameter estimate | 0.0197 | 0.0190 | 0.0188 | 0.0203 |
| RMSE | 0.0202 | 0.0200 | 0.0230 | 0.0262 |
| Freq. rejections of true value on 5% conf. level | 6.0% | 5.4% | 12.3% | 77.9% |
| Freq. rejection by Sargan–Hansen on 5% conf. level | 8.1% | 9.4% | 16.4% | 81.6% |
| Valid obs. in estimation | 21,000 | 21,000 | 21,000 | 21,000 |

Notes: the table shows GMM estimates of α1 and α2 on the basis of the transformation QD1, see Proposition 1. Columns vary by parameters α1 and α2 used for generating panels according to Eq. (2). Each column represents 1,000 repetitions of two-stage GMM estimates using an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations (1,000 individuals each). The number of valid observations is reduced by the need to transform variables. Instruments are the levels of ri,t−2 yi,t−2 (i.e. two interaction terms) and a constant. Estimated standard deviations are derived from reduced form estimates using the delta method. The Sargan–Hansen test is the test of overidentifying restrictions associated with Sargan (1958) and Hansen (1982). Estimation is executed using DPD package version 1.2 on Ox version 3.30 and additional, user-written routines

The theoretical discussion in Sect. 3 has shown that the precision of the QD2 estimator should depend on the ratio of adjustment speeds. If both of them are high, but of similar size, then the ratio (1 − αi,t−2)/(1 − αi,t−1) in the definition of the transformed error term ξi,t cancels out in Eq. (11). The error term in QD1, in contrast, depends on the absolute distance of the regime-specific coefficients from unity. To study this issue, the simulations of QD1 and QD2 estimation are performed using a value of α1 = 0.8 as a platform and varying over α2. The result is shown in Figs. 3 (QD1 estimation) and 4 (QD2 estimation). Here, the QD1 estimates are biased throughout the range. The bias of α̂2 switches from positive to negative, whereas the bias of α̂2 is negative throughout. In contrast, with QD2, the bias practically disappears when both parameters are large, to be noticeable only when α1 is small.

Table 3 and Figs. 5 and 6 give results using GMM on observations transformed by Generalised Differences. In Columns (1) and (2), the estimator is correctly used. The memory of the regime process is restricted: Column (1) assumes uncorrelated regimes, and Column (2) assumes a threshold process driven by an MA(1). The minimum leads used in transformation are 2 and 3, respectively. In both cases, the Generalised Difference estimator performs well. The estimates are unbiased. The standard deviations are similar to what can be obtained from the quasi-difference estimates for the


Table 2 Quasi-differences, QD2 transformation, 1,000 runs

| Simulation # | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Specification of state variable underlying regimes | AR(1) | AR(1) | AR(1) | AR(1) |
| True α1 | 0.3 | 0.3 | 0.3 | 0.3 |
| True α2 | 0.3 | 0.5 | 0.7 | 0.9 |
| **α1** | | | | |
| Mean parameter estimate | 0.2998 | 0.3006 | 0.3021 | 0.3152 |
| Mean bias | −0.0002 | 0.0006 | 0.0021 | 0.0152 |
| Mean estimated std. deviation | 0.0221 | 0.0229 | 0.0261 | 0.0418 |
| Std. dev. parameter estimate | 0.0217 | 0.0235 | 0.0270 | 0.0463 |
| RMSE | 0.0217 | 0.0235 | 0.0271 | 0.0487 |
| Freq. rejections of true value on 5% conf. level | 4.7% | 5.8% | 5.8% | 9.5% |
| **α2** | | | | |
| Mean parameter estimate | 0.2985 | 0.4982 | 0.6943 | 0.8791 |
| Mean bias | −0.0014 | −0.0018 | −0.0057 | −0.0209 |
| Mean estimated std. deviation | 0.0195 | 0.0194 | 0.0187 | 0.0174 |
| Std. dev. parameter estimate | 0.0195 | 0.0192 | 0.0188 | 0.0170 |
| RMSE | 0.0196 | 0.0193 | 0.0197 | 0.0269 |
| Freq. rejections of true value on 5% conf. level | 5.9% | 4.5% | 5.9% | 23.0% |
| Freq. rejection by Sargan–Hansen on 5% conf. level | 5.2% | 6.0% | 6.0% | 22.9% |
| Valid obs. in estimation | 21,000 | 21,000 | 21,000 | 21,000 |

Notes: the table shows GMM estimates of α1 and α2 on the basis of the transformation QD2, see Proposition 1. Columns vary by parameters α1 and α2 used for generating panels according to Eq. (2). Each column represents 1,000 repetitions of a two-stage GMM procedure iterating on pseudoregressors, using an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations (1,000 individuals each). As an initial value, an estimate on the basis of QD1 was used. The number of valid observations is reduced by the need to transform variables. Instruments are the levels of ri,t−2 yi,t−2 (i.e. two interaction terms) and a constant. Estimated standard deviations are calculated as a by-product from the final Gauss–Newton iteration step. The Sargan–Hansen test is the test of overidentifying restrictions associated with Sargan (1958) and Hansen (1982). Estimation is executed using DPD package version 1.2 on Ox version 3.30 and additional, user-written routines

smaller of the two coefficients and actually somewhat lower for the higher coefficient. In the case of an MA(1) regime process, standard deviations are higher, as fewer observations can be used. Column (1), with a minimum lead of 2, yields an average of 15,058 valid observations per estimation. This number decreases to 11,277 in Column (2), when a minimum lead of 3 is imposed. On the same set of simulated data, the estimates based on quasi-differencing can use 21,000 observations in each run. Figure 5 shows that the average deviation of the Generalised Difference estimator from the true parameter value is very small when the conditions for its use are met and does not depend systematically on the size of the adjustment coefficients. Even regime-specific coefficients equal to or larger than 1 can be accommodated, as long as the overall process remains stable. Columns (3) and (4) do ‘the wrong thing’. For Column (3), a minimum lead of 2 is used on data with a regime process generated by an MA(1), where a lead of ℓ ≥ 3 is warranted. Column (4) assumes an AR(1) process driving the threshold variable: this process has infinite memory. Unsurprisingly, in both cases, the estimator turns out to be biased. However, in spite of a strong correlation between the shock in the regime variable and the error term, the bias is moderate. In Column (3), only the estimates α̂2 are biased, to a degree that is similar to the performance of the QD2 estimator under the same (unfavourable) parameter values.


[Figure: "Quasi-Differences 1: Bias as a function of alpha2". Plotted series: bias of alpha1 and bias of alpha2, each against alpha2.]

Fig. 1 Mean bias for estimates on the basis of QD1, with α1 = 0.3 and α2 varying

[Figure: "Quasi-Differences 2: Bias as a function of alpha2". Plotted series: bias of alpha1 and bias of alpha2, each against alpha2.]

Fig. 2 Mean bias for estimates on the basis of QD2, with α1 = 0.3 and α2 varying

When, as assumed in Column (4), the regime process is driven by a process with

infinite memory, the resulting bias is larger, similar in size to the weak performance

of the QD1 estimator when one of the coefficients is large. Figure 6 shows how in this

latter case the bias depends on the alpha-parameters.


[Figure: "Quasi-Differences 1: Bias as a function of alpha2". Plotted series: bias of alpha1 and bias of alpha2, each against alpha2.]

Fig. 3 Mean bias for estimates on the basis of QD1, with α1 = 0.8 and α2 varying

[Figure: "Quasi-Differences 2: Bias as a function of alpha2". Plotted series: bias of alpha1 and bias of alpha2, each against alpha2.]

Fig. 4 Mean bias for estimates on the basis of QD2, with α1 = 0.8 and α2 varying

The specification tests do not fail to detect the erroneous assumption regarding

the warranted order of differencing. In both cases, the regime constant test rejects

the specification in 100% of the cases. As the estimated coefficients are of opposite

sign, they cannot be caused by trending target values. The regime dummies

123

Panel estimation of state-dependent adjustment

Table 3 Generalised Differences estimation, (α1, α2) = (0.3, 0.8), 1,000 runs

... using appropriate leads ... using inappropriate leads

Specification state variable (1) (2) (3) (4)

underlying regimes MA(0) MA(1) MA(1) AR(1)

lead = 2 lead = 3 lead = 2 lead = 2

α1

Mean estimate (true value 0.3) 0.2990 0.2994 0.2950 0.2767

Mean est. std. dev. 0.0215 0.0286 0.0243 0.0223

Mean bias −0.0010 −0.0006 −0.0050 −0.0232

RMSE 0.0118 0.0276 0.0239 0.0322

Freq. rejections of true value 6.4% 3.8% 5.1% 18.0%

on 5% conf. level

α2

Mean estimate (true value 0.8) 0.7978 0.7995 0.7736 0.7568

Mean est. std. dev. 0.0261 0.0304 0.0298 0.0276

Mean bias −0.0021 −0.0005 −0.0264 −0.0432

RMSE 0.0113 0.0296 0.0399 0.0518

Freq. rejections of true value 3.7% 4.5% 14.2% 34.6%

on 5% conf. level

Specification tests

G1

Mean estimate −0.0001 0.0000 −0.0765 −0.0977

Mean est. std. dev. 0.0116 0.0145 0.0114 0.0102

Freq. rejections of zero value 5.8% 5.4% 100% 100%

on 5% conf. level

G2

Mean estimate −0.0007 −0.0010 0.0822 0.0977

Mean est. std. dev. 0.0114 0.0141 0.0114 0.0102

Freq. rejection of zero value 4.9% 4.7% 100% 100%

on 5% conf. level

Freq. rejection by Sargan–Hansen 5.1% 4.8% 91.9% 23.2%

on 5% conf. level

Av. no. of valid observations 15,058 11,277 14,107 15,117

Notes: the table shows GMM estimates of α1 and α2 on the basis of Generalised Differencing, see Proposition

2. Columns vary by the stochastic specification of the regime indicator when generating the panels

according to Eq. (2) and by the lead used for transformation. Columns (1), (2), and (3) specify processes

where the memory of the regime variable is limited over time, and the state variable that underlies the regime

indicator follows an MA process. In column (4), the regime process is supposed to have infinite memory.

In all columns, α1 = 0.3 and α2 = 0.8. Each column represents 1,000 repetitions of two-stage GMM

estimates using an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations (1,000 individuals

each). The number of valid observations is reduced by the need to transform variables. Instruments are the

levels of ri,t−1 yi,t−1 (i.e. two interaction terms) and a constant. G1 and G2 are regime dummy coefficients

introduced as a specification test for the correct lag length, see Subsection 3.3. Sargan-Hansen test is the test

of overidentifying restrictions associated with Sargan (1958) and Hansen (1982). Estimation is executed

using DPD package version 1.2 on Ox version 3.30 and additional, user-written routines

have ‘captured’ the regime-specific non-zero expectations of the differenced residuals

E εi,t − εi,t−2 ri,t−1,ri,t−1 = ri,t−3 for the two values that ri,t−1 can take.

The Sargan test is sensitive for the misspecification in Column (3) where the wrong

lead is used, rejecting 91.9% of the estimates. Detecting an infinite memory of the

regime variable is harder for the Sargan test: only 23.2% of estimates in Column (4)

are rejected.
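The summary rows reported for α1 and α2 in Table 3 (mean estimate, mean estimated standard deviation, mean bias, RMSE and rejection frequency of the true value) can be sketched as follows. This is only an illustration of how such Monte Carlo statistics are commonly computed, not the author's Ox code: the function name `mc_summary`, the toy inputs and the use of the normal critical value 1.96 for the 5% test are assumptions.

```python
import math

def mc_summary(estimates, std_errors, true_value, crit=1.96):
    """Summarise Monte Carlo replications of one coefficient.

    estimates  : point estimates, one per replication
    std_errors : estimated standard deviations, one per replication
    crit       : two-sided critical value (1.96 is an assumed normal approximation)
    """
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    mean_std = sum(std_errors) / n
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    # Share of replications in which a t-test rejects the true value.
    rejections = sum(abs(e - true_value) / s > crit
                     for e, s in zip(estimates, std_errors)) / n
    return {
        "mean_estimate": mean_estimate,
        "mean_est_std_dev": mean_std,
        "mean_bias": mean_estimate - true_value,
        "rmse": rmse,
        "freq_rejections": rejections,
    }

# Toy input with four replications (hypothetical numbers, not from the paper).
summary = mc_summary([0.28, 0.30, 0.32, 0.30], [0.01] * 4, true_value=0.3)
```

With a correctly specified estimator, the rejection frequency should stay close to the nominal 5%, which is the pattern in Columns (1) and (2) of the table.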


[Figure: "GD estimation: bias as a function of alpha2"; bias of α1 and bias of α2 plotted against α2 (horizontal axis 0.0 to 1.2).]

Fig. 5 Mean bias for Generalised Differences estimates, with α1 = 0.8 and α2 varying. Here regime
process uncorrelated over time, correct lead of 2

[Figure: "GD estimation: bias as a function of alpha2"; bias of α1 and bias of α2 plotted against α2 (horizontal axis 0.0 to 0.9).]

Fig. 6 Mean bias for Generalised Differences estimates, with α1 = 0.8 and α2 varying. Here regime
process unlimited memory AR(1), misspecified lead of 2

Table 4, together with Figs. 7 and 8, shows simulation results for the level estimator,
both for the case of a predetermined regime and a contemporaneous regime.
In both cases, a regime process with infinite memory is assumed. In the table and
the figures α2 varies, with a fixed value of α1 = 0.3. In the predetermined case,
there is little bias over the whole range of parameters, with the possible exception
of α2 = 1, where the bias of $\hat{\alpha}_1$ assumes a moderate value of 0.0117 (not shown in
the table). Standard deviations are similar to those that were obtained with the other
estimators. If α2 assumes a value larger than 1, then the estimates become extremely
exact.

Table 4 Level estimation, 1,000 runs

Simulation #                                          (1)              (2)              (3)              (4)
Regime indicator                                      Predetermined    Predetermined    Contemporaneous  Contemporaneous
State variable underlying regimes                     AR(1)            AR(1)            AR(1)            AR(1)
True α1                                               0.3              0.3              0.3              0.3
True α2                                               0.8              1.1              0.8              1.1

α1
  Mean parameter estimate                             0.3031           0.3004           0.3006           0.2951
  Mean bias                                           0.0031           0.0004           0.0006           −0.0049
  Mean estimated std. deviation                       0.0197           0.0074           0.0255           0.0073
  Std. dev. parameter estimate                        0.0187           0.0078           0.0252           0.0079
  RMSE                                                0.0190           0.0078           0.0252           0.0094
  Freq. rejections of true value on 5% conf. level    4.3%             4.2%             4.7%             10.5%

α2
  Mean parameter estimate                             0.7987           1.1001           0.7891           1.1004
  Mean bias                                           −0.0013          0.0001           −0.0109          0.0004
  Mean estimated std. deviation                       0.0188           0.0013           0.0283           0.0017
  Std. dev. parameter estimate                        0.0191           0.0013           0.0285           0.0017
  RMSE                                                0.0192           0.0013           0.0305           0.0017
  Freq. rejections of true value on 5% conf. level    5.7%             5.2%             7.0%             5.6%

Auxiliary regime constants

G1
  Mean estimate                                       0.6985           0.698            0.6795           0.6922
  Theoretically expected                              0.7              0.7              0.7              0.7

G2
  Mean estimate                                       0.2030           −0.0984          0.2434           −0.0852
  Theoretically expected                              0.2              −0.1             0.2              −0.1

Freq. rejection by Sargan–Hansen on 5% conf. level    4.1%             3.0%             5.2%             4.9%
Valid obs. in estimation                              24,000           24,000           24,000           24,000

Notes: the table shows GMM estimates of α1 and α2 on the basis of level estimation (see Proposition 3).
Columns vary by parameters α2 and by the stochastic specification of the regime indicator used for generating
the panels according to Eq. (2). In all cases, the regime process is supposed to have infinite memory,
following an AR(1) process. Columns (1) and (2) relate to processes where the regime variable is predetermined
in the adjustment equation, and Columns (3) and (4) relate to results for regime variables that are
contemporaneously correlated with the error term. In all columns, α1 = 0.3. While Columns (1) and (3)
specify α2 = 0.8, Columns (2) and (4) show results for α2 = 1.1. Each column represents 1,000 repetitions
of two-stage GMM estimates using an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations
(1,000 individuals each). Instruments are first differences of lagged endogenous variables, interacted with
the regime indicators, $r_{i,t-1}\Delta y_{i,t-1}$ and $r_{i,t-2}\Delta y_{i,t-2}$ (i.e. four variables), plus dummies for the first regime
from ri,t−1 and ri,t−2. G1 and G2 are coefficients of regime dummies introduced into the equation to capture
the regime-specific shift term in Eq. (24). The Sargan–Hansen test is the test of overidentifying restrictions
associated with Sargan (1958) and Hansen (1982). Estimation is executed using DPD package version 1.2
on Ox version 3.30 and additional, user-written routines.

Columns (3) and (4), as well as Fig. 8, show that the level estimator indeed successfully
copes with contemporaneous regime variables, a problem that cannot be
solved by any of the other approaches. There is a moderate bias that peaks at 0.012
for $\hat{\alpha}_2$ when α2 = 0.9 (not shown in the table), and the standard deviations are higher
than with a predetermined regime for α2 < 1. Again, for α2 > 1, the level estimates
become very exact. In all the columns, the regime dummy is very near the theoretical
value of $E\,(1 - \alpha_{i,t})\,\mu^{e}$, a term that is introduced into Eq. (24) by splitting up the firm
fixed effect into its expectation and a deviation uncorrelated with the shocks in the
other processes.
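As a small arithmetic check, the 'theoretically expected' values of the regime constants in Table 4 can be reproduced from the regime-specific term (1 − α) μe. The sketch below assumes an expected fixed effect of μe = 1; that value is not stated in this part of the text and is inferred here only because it matches the table. The function name is hypothetical.

```python
def expected_regime_constant(alpha_r, mu_e):
    """Regime-specific shift term (1 - alpha_r) * mu_e for a regime with coefficient alpha_r."""
    return (1 - alpha_r) * mu_e

mu_e = 1.0  # assumed expectation of the fixed effect (inferred, not given in the text)

# Regime coefficients used in Table 4: alpha1 = 0.3, alpha2 = 0.8 or 1.1.
expected = {a: round(expected_regime_constant(a, mu_e), 4) for a in (0.3, 0.8, 1.1)}
```

Under this assumption the constants are 0.7 for α1 = 0.3, 0.2 for α2 = 0.8 and −0.1 for α2 = 1.1, in line with the 'Theoretically expected' rows.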

7 Conclusion and outlook

Four different ways of estimating an adjustment equation with time-varying persistence

have been presented, all within a GMM framework, albeit with a different set

of moment conditions.

Two estimation techniques rely on transforming the original equation using quasi-differences.

Both quasi-differences estimators are very precise when all the coefficients

are small. When both coefficients are large and of similar size (high persistence

throughout the regimes), the results of QD1 estimation have been shown to be unusable

in simulation, whereas the QD2 approach continues to deliver correct results. In

von Kalckreuth (2008a), the QD2 estimator is successfully employed for estimating

differential adjustment speeds for the capital stock. The most difficult parameterisation

is observed when coefficients are widely different, while one of them is large.

While affected by small sample problems, the QD2 estimator performs clearly better

in this situation. In direct comparison, the major virtue of the QD1 estimator lies in

its surprising simplicity.

The third method involves transformation using Generalised Differences, with a

lead that is long enough to overcome the memory of the εi,t -shocks in the process

driving the regime indicator. This method is applicable only when the memory of the

regime process is limited. We have seen above how to test this requirement. Although

a limited memory may be a good approximation in a number of circumstances, such

as investment under financing constraints, the requirement will not always be fulfilled.

If the conditions are met, then this method leads to a linear estimator which remains

unbiased also if some of the coefficients are in the neighbourhood of 1 or larger. The

fourth method leaves the equation untransformed, and past differences are used as

instruments. Regime dummies are employed to capture and neutralise the time-varying

non-zero expected value of the residual process. The memory of the regime process

is irrelevant for this technique. However, one needs to assume the individual-specific

deterministic equilibrium as being unrelated to the process governing the idiosyncratic

error, changes in the forcing term xi,t , the initial deviation and the regimes. The level

estimator is very precise with regard to larger coefficients. This is not really surprising:

the use of level equations has originally been proposed to overcome the problem

of weak instruments in cases where the autoregressive parameter approaches unity.

More important is another virtue of the fourth method: the level estimator is the sole

procedure that can be used when the regime indicator is contemporaneous to the error

term in the adjustment equation.


[Figure: "Level estimation: bias as a function of alpha2"; bias of α1 and bias of α2 plotted against α2 (horizontal axis 0.0 to 1.2).]

Fig. 7 Mean bias for level estimation with predetermined regimes, with α1 = 0.3 and α2 varying

[Figure: "Level estimation: bias as a function of alpha2"; bias of α1 and bias of α2 plotted against α2 (horizontal axis 0.0 to 1.2).]

Fig. 8 Mean bias for level estimation with contemporaneous regimes, with α1 = 0.3 and α2 varying

In dealing with a practical estimation problem, one first of all needs to decide

whether the assumption of predetermined regimes is warranted in the given situation.

If this is the case, then it is the quasi-differencing methods that impose the least

stringent conditions. They can be used and interpreted like first-difference estimations

in the standard model. Owing to the fact that the nonlinear transformation will affect

the latent terms in QD1 more strongly than in QD2, the latter is to be preferred, although

the former may be used as a starting point for specification search and as a convenient

way to generate initial values for QD2 estimation. If adjustment speeds are low and

dissimilar, then even QD2 can lead to non-trivial small sample biases. In this case, the

Generalised Differences method may be preferable, subject to a test on the requirement

of finite memory, in spite of losing many observations.

If adjustment regimes are likely to be contemporaneously correlated, then estimation

using the level approach is possible if the fixed effect is ‘benevolent’, as explained

above. In that case, the level approach will also be a useful device if the speed of adjustment

is slow. Under the same conditions, one may also use the lagged first differences

as instruments in a differenced equation. This, however, is less efficient, as observations

are lost and not all the information in the moment restriction is used.

In deciding upon the use of the moment conditions, one may ask whether they

can be meaningfully combined in estimation. In general, this will not be the case.

If regimes are contemporaneously correlated with the idiosyncratic error term, then

the moment conditions from Propositions 1 and 2 should not be used at all. If the

regime can be considered as predetermined, then one will want to avoid imposing the

additional conditions needed for the level estimator. They are more restrictive than

in the standard autoregressive case. With predetermined regimes, the moment conditions

from Propositions 1 and 2 are to be regarded as alternatives. For higher speeds

of adjustment, they are very similar, and a possible gain from augmenting QD2 by

Generalised Differences will not be worth the risk of erroneously imposing additional

constraints, whereas, for very low and dissimilar speeds, even QD2 breaks down and

Generalised Differencing should be used on its own.

Appendix A: A state-dependent ECM

Formally, the state-dependent adjustment equation considered in this article involves a

lagged dependent variable and a forcing term xi,t . However, in addition, higher-order

adjustment processes can be accommodated, by redefining states appropriately.

Consider a linear autoregressive process with distributed lags in a forcing term xi,t
and an individual specific constant μi:

$$A(L)\, y_{i,t} = B(L)\, x_{i,t} + \mu_i + \varepsilon_{i,t},$$

where A(L) and B(L) are lag polynomials. As is well known, the process can always
be written in the error correction format. If, for example, A(L) and B(L) are of order
2, then this leads to

$$\begin{aligned}
\Delta y_{i,t} = {} & -\phi\left(y_{i,t-1} - \beta' x_{i,t-1} - \mu^{*}_{i}\right) \\
& + \gamma^{0\prime}\Delta x_{i,t} + \gamma^{1\prime}\Delta x_{i,t-1} + \omega\,\Delta y_{i,t-1} + \varepsilon_{i,t}.
\end{aligned}$$

In the first line, the term in brackets is the deviation from the static equilibrium, where
β may be interpreted as a cumulative long-run effect of a shock in xi,t. The transformed
constant $\mu^{*}_{i}$ is equal to $[A(L)]^{-1}\mu_i$. The term φ is the speed of adjustment. If
the process is stable, then |φ| < 1. The second line depicts the transitional dynamics,
which is not directly related to the deviation from equilibrium. With A(L) or B(L)
of order higher than 2, the transitional dynamics in the error correction format would
involve higher-order lags of differences $\Delta x_{i,t}$ and $\Delta y_{i,t}$.
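The equivalence between the autoregressive distributed lag form and the error correction format can be checked numerically. The sketch below assumes a scalar forcing term and writes the lag polynomials as A(L) = 1 − a1 L − a2 L² and B(L) = b0 + b1 L + b2 L²; these coefficient names are illustrative and do not appear in the text. Under that assumption the mapping is φ = 1 − a1 − a2, β = (b0 + b1 + b2)/φ, μ* = μ/φ, γ0 = b0, γ1 = −b2 and ω = −a2.

```python
import math
import random

def ardl_next(y1, y2, x0, x1, x2, a1, a2, b0, b1, b2, mu, eps):
    """y_t from the ARDL(2,2) form; y1 = y_{t-1}, y2 = y_{t-2}, x0 = x_t, x1 = x_{t-1}, x2 = x_{t-2}."""
    return a1 * y1 + a2 * y2 + b0 * x0 + b1 * x1 + b2 * x2 + mu + eps

def ecm_next(y1, y2, x0, x1, x2, a1, a2, b0, b1, b2, mu, eps):
    """y_t from the error correction format with the implied parameters."""
    phi = 1 - a1 - a2                 # speed of adjustment
    beta = (b0 + b1 + b2) / phi       # long-run effect of x
    mu_star = mu / phi                # transformed constant
    gamma0, gamma1, omega = b0, -b2, -a2
    dy = (-phi * (y1 - beta * x1 - mu_star)      # equilibrium correction
          + gamma0 * (x0 - x1) + gamma1 * (x1 - x2)
          + omega * (y1 - y2) + eps)             # transitional dynamics
    return y1 + dy

rng = random.Random(0)
params = dict(a1=0.5, a2=0.2, b0=0.4, b1=0.1, b2=-0.3, mu=1.5)
draws = [[rng.gauss(0, 1) for _ in range(5)] + [rng.gauss(0, 0.1)] for _ in range(100)]
max_gap = max(
    abs(ardl_next(y1, y2, x0, x1, x2, eps=e, **params)
        - ecm_next(y1, y2, x0, x1, x2, eps=e, **params))
    for y1, y2, x0, x1, x2, e in draws
)
```

The two forms agree up to floating-point error for every draw, which is what the statement that the process "can always be written in the error correction format" amounts to in this second-order case.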

A generalisation of the adjustment process considered hitherto makes φ, γ0, γ1,
and ω state-dependent, while leaving the transformed constant $\mu^{*}_{i}$ and the long-run
effect β time invariant. The latter imposes a constraint on the time-varying coefficients:

$$\begin{aligned}
\Delta y_{i,t} = {} & -\phi_{i,t-1}\left(y_{i,t-1} - \beta' x_{i,t-1} - \mu^{*}_{i}\right) + \gamma^{0\prime}_{i,t-1}\Delta x_{i,t} \\
& + \gamma^{1\prime}_{i,t-1}\Delta x_{i,t-1} + \omega_{i,t-1}\,\Delta y_{i,t-1} + \varepsilon_{i,t}.
\end{aligned}$$

Now let again ri,t be an indicator variable characterising the speed of adjustment. As

the adjustment process is parameterised over two lags, it is straightforward to model

the time-varying parameters as a function involving the state variables in two periods,

t −1 and t −2. Finally, let di,t−1 be an indicator vector of dummies for all the possible

values ri,t−1, ri,t−2 can take. Then we can write

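$$\phi_{i,t-1} = \varphi' d_{i,t-1}, \quad \omega_{i,t-1} = \omega' d_{i,t-1}, \quad \gamma^{0}_{i,t-1} = \Gamma_0\, d_{i,t-1}, \quad \gamma^{1}_{i,t-1} = \Gamma_1\, d_{i,t-1},$$

with ϕ, ω, Γ0 and Γ1 vectors and matrices of state-dependent adjustment coefficients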

remaining to be estimated. Written this way, the problem is fully equivalent to the one

that has been treated in this article, with di,t−1 taking the place of ri,t−1 with respect

to the adjustment speed, φi,t−1, and using appropriate interaction terms for all the

other state-dependent coefficients. With the help of quasi-differencing or Generalised

Differencing, one can eliminate the fixed effect from the adjustment equation. With

contemporaneous adjustment coefficients, one may use the level estimator. It has to

be noted though that—compared to a first-order adjustment process—the Generalised

Difference estimator will be difficult to use, as there are $L^2$ states to be considered

here, and only pairs of observations belonging to the same regime with a given minimum

time distance can be used. The other two estimation principles are not affected

by this profusion of states, except for the fact that the number of coefficients is higher.
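The construction of the indicator vector di,t−1 over all combinations of ri,t−1 and ri,t−2 can be sketched as below. The helper names, the ordering of the states and the coefficient values are assumptions made for illustration; regimes are represented by integer indices rather than indicator vectors.

```python
from itertools import product

def pair_dummies(r_lag1, r_lag2, n_regimes):
    """One-hot vector over all (r_{t-1}, r_{t-2}) combinations, in lexicographic order."""
    states = list(product(range(n_regimes), repeat=2))  # n_regimes**2 states
    return [1 if state == (r_lag1, r_lag2) else 0 for state in states]

def state_coefficient(coefficients, dummies):
    """Pick the state-dependent coefficient as the inner product with the dummy vector."""
    return sum(c * d for c, d in zip(coefficients, dummies))

# Two regimes give four states: (0,0), (0,1), (1,0), (1,1).
d = pair_dummies(1, 0, n_regimes=2)                 # regime 1 at t-1, regime 0 at t-2
phi = state_coefficient([0.7, 0.6, 0.3, 0.2], d)    # hypothetical adjustment speeds
```

With L regimes the vector has L² entries, which is the profusion of states referred to above.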

Appendix B: Proofs

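Proof of Proposition 1 If $E(\varepsilon_{i,t} \mid \Omega_{i,t-1}) = 0$, then any function $f(\Omega_{i,t-1})$ will be
orthogonal to $\varepsilon_{i,t}$, because

$$E\left[f(\Omega_{i,t-1})\,\varepsilon_{i,t}\right] = E_{\Omega_{i,t-1}}\left\{E\left[f(\Omega_{i,t-1})\,\varepsilon_{i,t} \mid \Omega_{i,t-1}\right]\right\} = E_{\Omega_{i,t-1}}\left\{f(\Omega_{i,t-1})\,E\left[\varepsilon_{i,t} \mid \Omega_{i,t-1}\right]\right\} = 0. \qquad (29)$$

Consider first $E[y_{i,t-p}\,\psi_{i,t}]$, with p ≥ 2. Equations (3) and (4) show that $y_{i,t-p}$ is a
function of $\{r_{i,t-p-1}, r_{i,t-p-2}, \ldots, x_{i,t-p}, x_{i,t-p-1}, \ldots, \varepsilon_{i,t-p}, \varepsilon_{i,t-p-1}, \ldots, \mu_i, y_{i,0}\}$.
The expressions $1/(1 - \alpha_{i,t-1})$ and $1/(1 - \alpha_{i,t-2})$ are functions of $r_{i,t-1}$ and $r_{i,t-2}$.
Applying (29) to the products $\left[y_{i,t-p}/(1 - \alpha_{i,t-1})\right]\varepsilon_{i,t}$ and $\left[y_{i,t-p}/(1 - \alpha_{i,t-2})\right]\varepsilon_{i,t-1}$
yields $E[y_{i,t-p}\,\psi_{i,t}] = 0$. The same argument holds for $E[y_{i,t-p}\,\xi_{i,t}]$, with p ≥ 2.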
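The role of the factors 1/(1 − αi,t−1) and 1/(1 − αi,t−2) can be illustrated numerically. The definition of ψi,t lies outside this excerpt; the sketch below assumes, based on the products used in the proof, that it arises from dividing Eq. (2) by (1 − αi,t−1) and then taking first differences, so that ψi,t = εi,t/(1 − αi,t−1) − εi,t−1/(1 − αi,t−2). Function names and parameter values are hypothetical.

```python
import random

def simulate(alpha, regimes, xb, mu, eps, y0):
    """Iterate Eq. (2) for one individual (lists are indexed by date t)."""
    y = [y0]
    for t in range(1, len(xb)):
        a = alpha[regimes[t - 1]]  # alpha_{i,t-1}
        y.append(a * y[t - 1] + (1 - a) * (xb[t] + mu) + eps[t])
    return y

def scaled_equation(y, xb, alpha, regimes, t):
    """Eq. (2) at date t divided by (1 - alpha_{i,t-1}): mu_i + eps_t / (1 - alpha_{i,t-1})."""
    a = alpha[regimes[t - 1]]
    return (y[t] - a * y[t - 1] - (1 - a) * xb[t]) / (1 - a)

def quasi_difference(y, xb, alpha, regimes, t):
    """First difference of the scaled equation; the fixed effect mu_i drops out."""
    return scaled_equation(y, xb, alpha, regimes, t) - scaled_equation(y, xb, alpha, regimes, t - 1)

rng = random.Random(2)
T = 10
alpha = [0.3, 0.8]
regimes = [rng.randrange(2) for _ in range(T)]
xb = [rng.gauss(0, 1) for _ in range(T)]
eps = [rng.gauss(0, 0.1) for _ in range(T)]

# Same shocks and regimes, two different fixed effects.
qd_by_mu = []
for mu in (0.0, 3.0):
    y = simulate(alpha, regimes, xb, mu, eps, y0=1.0)
    qd_by_mu.append([quasi_difference(y, xb, alpha, regimes, t) for t in range(2, T)])

expected = [eps[t] / (1 - alpha[regimes[t - 1]]) - eps[t - 1] / (1 - alpha[regimes[t - 2]])
            for t in range(2, T)]
```

The transformed residual is the same for both values of μi and depends only on the shocks and the two most recent regimes, which is why instruments dated t − 2 and earlier remain valid when regimes are predetermined.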


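Proof of Proposition 2 The proposition follows from the law of iterated expectations:

$$\begin{aligned}
& E\left[y_{i,t-\ell-p}\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell}\right) \mid r_{i,t-1}, r_{i,t-\ell-1}\right] \\
& \quad = E_{y_{i,t-\ell-p}}\left\{E\left[y_{i,t-\ell-p}\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell}\right) \mid r_{i,t-1}, r_{i,t-\ell-1}, y_{i,t-\ell-p}\right]\right\} \\
& \quad = E_{y_{i,t-\ell-p}}\left\{y_{i,t-\ell-p} \cdot E\left[\varepsilon_{i,t} - \varepsilon_{i,t-\ell} \mid r_{i,t-1}, r_{i,t-\ell-1}, y_{i,t-\ell-p}\right]\right\} = 0,
\end{aligned}$$

because the conditional expectation within the brackets is zero for $\ell \geq 2 + q$. The
backward solution (3) decomposes $y_{i,t}$ into the initial deviation, $y_{i,0} - x_{i,1}'\beta - \mu_i$,
and the history of $x_{i,t}$, $r_{i,t}$ and $\varepsilon_{i,t}$. The assumption (20) ensures that conditioning
on $r_{i,t-1}$, $r_{i,t-\ell-1}$ and $y_{i,t-\ell-p}$, p ≥ 1, will preserve a zero expectation of $\varepsilon_{i,t}$ and
$\varepsilon_{i,t-\ell-1}$.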
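The differenced residual in this proof can be made concrete with a small simulation. Proposition 2 itself is stated outside this excerpt; the sketch assumes that Generalised Differencing subtracts Eq. (2) at date t − ℓ from Eq. (2) at date t for pairs of observations governed by the same regime, so that the term (1 − α) μi cancels. Function names, the lead of 2 and all parameter values are illustrative.

```python
import random

def simulate(alpha, regimes, xb, mu, eps, y0):
    """Iterate Eq. (2) for one individual (lists are indexed by date t)."""
    y = [y0]
    for t in range(1, len(xb)):
        a = alpha[regimes[t - 1]]  # alpha_{i,t-1}
        y.append(a * y[t - 1] + (1 - a) * (xb[t] + mu) + eps[t])
    return y

def generalised_difference(y, xb, alpha, regimes, t, lead):
    """Transformed residual for the pair (t, t - lead).

    Returns None when the regimes governing the two dates differ,
    because the fixed effect then does not cancel.
    """
    if regimes[t - 1] != regimes[t - lead - 1]:
        return None
    a = alpha[regimes[t - 1]]
    return ((y[t] - y[t - lead])
            - a * (y[t - 1] - y[t - lead - 1])
            - (1 - a) * (xb[t] - xb[t - lead]))

rng = random.Random(3)
T, lead = 30, 2
alpha = [0.3, 0.8]
regimes = [(t // 3) % 2 for t in range(T)]   # deterministic pattern, for illustration only
xb = [rng.gauss(0, 1) for _ in range(T)]
eps = [rng.gauss(0, 0.1) for _ in range(T)]
y = simulate(alpha, regimes, xb, mu=3.0, eps=eps, y0=1.0)

pairs = [(t, generalised_difference(y, xb, alpha, regimes, t, lead)) for t in range(lead + 1, T)]
valid = [(t, g) for t, g in pairs if g is not None]
```

For every usable pair the transformed residual equals εi,t − εi,t−ℓ regardless of μi. Pairs with differing regimes are discarded, which is consistent with the loss of observations reported for this estimator.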

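Proof of Proposition 3 The restriction (27) holds for p = k if, first,

$$E\left[\Delta y_{i,t-k}\,\varepsilon_{i,t}\right] = E\left\{\Delta y_{i,t-k} \cdot E\left[\varepsilon_{i,t} \mid \Delta y_{i,t-k}\right]\right\} = 0, \qquad (30)$$

and second,

$$E\left[\Delta y_{i,t-k}\left(1 - \alpha_{i,t}\right)\mu^{*}_{i}\right] = 0. \qquad (31)$$

Given the backward solution (23), condition (25) is sufficient for the expectation
in the bracket of (30) to be identically zero, as $\Delta y_{i,t-k}$ is a function of
$\{\varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, x_{i,t-k}, x_{i,t-k-1}, \ldots, r_{i,t-k}, r_{i,t-k-1}, \ldots, y_{i,0} - x_{i,1}'\beta - \mu\}$.
Similarly, one has

$$E\left[\Delta y_{i,t-k}\left(1 - \alpha_{i,t}\right)\mu^{*}_{i}\right] = E\left\{\Delta y_{i,t-k}\left(1 - \alpha_{i,t}\right) \cdot E\left[\mu^{*}_{i} \mid \Delta y_{i,t-k}\left(1 - \alpha_{i,t}\right)\right]\right\}.$$

If the expectation of $\mu^{*}_{i}$ is zero conditional on all random variables that constitute
$\Delta y_{i,t-k}$ according to its reduced form in (23), then the expectation in (31) vanishes.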

Acknowledgments The author thanks Jörg Breitung for important discussions, encouragement and

patience. Olympia Bover made a vital comment that gave the article a new turn. Vassilis Hajivassiliou

and Sarah Rupprecht discussed earlier conference versions. Two anonymous referees made extremely helpful,

detailed and constructive comments. This article has been presented in part or fully at the 2009 Panel

Data Conference in Bonn, the 2008 Econometric Society European Meeting in Milan, the 2007 Deutsche

Bundesbank and Banque de France Spring Conference on Microdata Analysis and Macroeconomic

Implications in Eltville, the 2007 Annual Meeting of the Verein für Socialpolitik in Munich and the 2007

CES-Ifo Conference on Survey Data in Economics—Methodology and Applications, in Munich.

References

Anderson TW, Hsiao C (1982) Formulation and estimation of dynamic models using panel data. J Economet

18:47–82

Arellano M, Bond S (1991) Some tests of specification for panel data: Monte Carlo evidence and an application

to employment equations. Rev Econ Stud 58:277–297

Arellano M, Bover O (1995) Another look at the instrumental variable estimation of error component

models. J Economet 68:29–51


Bayer C (2006) Investment dynamics with fixed adjustment costs and capital market imperfections. J Monetary

Econ 53:1909–1947

Blundell R, Bond S (1998) Initial conditions and moment restrictions in dynamic panel data models. J

Economet 87:115–143

Bond S, Lombardi D (2007) To buy or not to buy? Uncertainty, irreversibility and heterogeneous investment

dynamics in Italian company data. IMF Staff Papers 53:375–400

Bond S, Elston JA, Mairesse J, Mulkay B (2003) Financial factors and investment in Belgium, France,

Germany, and the United Kingdom: a comparison using company panel data. Rev Econ Stat 85:153–165

Caballero RJ, Engel EMRA (1999) Explaining investment dynamics in U.S. manufacturing: a generalised

(S,s) approach. Econometrica 67:783–826

Caballero RJ, Engel EMRA (2004) A comment on the economics of labor adjustment: mind the gap: reply.

Am Econ Rev 94:1238–1244

Caballero RJ, Engel EMRA, Haltiwanger JC (1995) Plant level adjustment and aggregate investment

dynamics. Brookings Papers Econ Act 1995(2):1–39

Caballero RJ, Engel EMRA, Haltiwanger JC (1997) Aggregate employment dynamics: building from

microeconomic evidence. Am Econ Rev 87:115–137

Chamberlain G (1983) Panel data, Chap 22. In: Griliches Z, Intriligator M (eds) The handbook of econometrics,

vol II. Amsterdam, North Holland pp 1247–1318

Cooper R, Willis JL (2004) A comment on the economics of labor adjustment: mind the gap. Am Econ Rev

94:1223–1237

Davidson R, MacKinnon JG (1993) Estimation and inference in econometrics. Oxford University Press,

New York

Doornik JA (2001) Ox 3.0. An object-oriented matrix programming language, 4th edn. Timberlake Consultants,

London

Doornik JA, Arellano M, Bond S (2002) Panel data estimation using DPD for Ox. Documentation accompanying

the DPD for Ox module code, dated 23 Dec 2002

Hansen L (1982) Large sample properties of generalized method of moments estimators. Econometrica

50:1029–1054

Hayashi F (2000) Econometrics. Princeton University Press, Princeton

Holtz-Eakin D, Newey WK, Rosen HS (1988) Estimating vector autoregressions with panel data.

Econometrica 56:1371–1395

Judge GG, Griffiths WE, Hill RC, Lütkepohl H, Lee TC (1985) The theory and practice of econometrics.

2nd edn. Wiley, New York

Sargan JD (1958) The estimation of economic relationships using instrumental variables. Econometrica

26:393–415

von Kalckreuth U (2004) Financial constraints for investors and the speed of adaptation: are innovators

special? Deutsche Bundesbank Discussion Paper Series 1, No. 20/04

von Kalckreuth U (2006) Financial constraints and capacity adjustment: evidence from a large panel of

survey data. Economica 73:691–724

von Kalckreuth U (2008a) Financing constraints, micro adjustment of capital demand and aggregate implications.

Deutsche Bundesbank Discussion Paper Series 1, No 11/08

von Kalckreuth U (2008b) Financing constraints and the adjustment dynamics of enterprises. Habilitation

thesis, University of Mannheim, May 2008

Woodford M (2003) Interest and prices. Foundations of a theory of monetary policy. Princeton University

Press, Princeton


DOI 10.1007/s00181-010-0419-y

Panel estimation of state-dependent adjustment

when the target is unobserved

Ulf von Kalckreuth

Received: 30 July 2009 / Accepted: 22 July 2010

© Springer-Verlag 2010

Abstract Understanding adjustment processes has become central in economics.

Empirical analysis is fraught with the problem that the target is usually unobserved.

This article develops and simulates GMM methods for estimating dynamic adjustment

models in a panel data context with partially or entirely unobserved targets and

endogenous, time-varying persistence. In this setup, the standard first difference GMM

procedure fails. Four estimation strategies are proposed. Two of them are based on

quasi-differencing. The third is characterised by a state-dependent filter, while the last

is an adaptation of the GMM level estimator.

Keywords Dynamic panel data methods · Economic adjustment ·

GMM · Quasi-differencing · Non-linear estimation

JEL Classification C23 · C15 · D21

1 Introduction

New Keynesian economics, with its emphasis on real and financial frictions, has introduced
a focus on microeconomic adjustment dynamics into the empirical literature.
Adjustment dynamics are essential for understanding aggregate behaviour and its sensitivity
towards shocks. Important examples range from price adjustment and its significance
for the New Keynesian Phillips curve (Woodford 2003), through plant level
adjustment and aggregate investment dynamics (Caballero et al. 1995; Caballero
and Engel 1999; Bayer 2006), to aggregate employment dynamics, building from
microeconomic evidence (Caballero et al. 1997). In these studies, as in von Kalckreuth
(2006), the adjustment dynamics itself becomes the principal object of analysis,
instead of being treated as an important, but burdensome obstacle to understanding
equilibrium phenomena.

This article was presented at the 2009 Panel Data Conference at the University of Bonn. It draws on Chap.
3 of the author’s habilitation thesis at the University of Mannheim.

The views expressed in this article do not necessarily reflect those of the Deutsche Bundesbank or its staff.
All the errors and omissions are those of the author.

U. von Kalckreuth (B)
Deutsche Bundesbank Research Centre, Wilhelm Epstein-Str. 14,
60431 Frankfurt am Main, Germany
e-mail: ulf.von-kalckreuth@bundesbank.de

In a rather general form, economic adjustment can be framed by a ‘gap equation’,

as formalised by Caballero et al. (1995):

$$\Delta y_{i,t} = \Lambda\left(g_{i,t}, x_{i,t}\right) \cdot g_{i,t}, \quad \text{where} \quad g_{i,t} = y_{i,t-1} - y^{*}_{i,t}.$$

Here, subscripts refer to individual i at time t, and gi,t is the gap between the state
yi,t−1 inherited from the last period and the target $y^{*}_{i,t}$ that would be realised if adjustment
costs were zero for one period of time. The speed of adjustment $\Lambda$, which is written
as a function of the gap itself and additional state variables xi,t, determines the fraction
of the gap that is removed within one period of time. The adjustment function
will reflect convex or non-convex adjustment costs, irreversibility and indivisibilities,
financing constraints or other restrictions, and the uncertainty of expectations formation.
With quadratic adjustment costs or Calvo-type probabilistic adjustment, $\Lambda$ will
be a constant.¹

Estimating the function $\Lambda$ is inherently difficult. In general, both $y^{*}_{i,t}$ and gi,t will
not be observable. However, some measure of the gap is needed for any estimation,
and if $\Lambda$ explicitly depends on gi,t, this measure will move to the centre stage. In order
to address this issue, one may try to do the utmost to observe the target as exactly
as possible. The controversy between Cooper and Willis (2004) and Caballero and
Engel (2004) on interpreting the results of gap equation estimates bears testimony to
the problems that may result from imperfect measures of the gap. However, there is
an alternative. In linear dynamic panel estimation, the problem of unobserved targets
can successfully be addressed by positing an error component structure for the measurement
error and eliminating the individual fixed effect by a suitable transformation,
such as first differencing. See Bond et al. (2003) and Bond and Lombardi (2007) for
an error correction model of capital stock adjustment.

In the unrestricted, non-linear case, this approach is not feasible, as a host of

incidental parameters will preclude identification. However, there may be direct qualitative

information on the level of $\Lambda$, e.g. from survey data, ratings or market information

services. If one is willing to treat the adjustment process as piecewise linear,

distinguishing regimes of adjustment, then, as will be shown, this information can be

harnessed to eliminate the incidental parameters from the problem completely.

1 Calvo-type adjustment refers to adjustment costs that are infinite with probability 1 − λ and zero with

probability λ. In other words: a randomly drawn share λ of market participants receives the chance to

adjust costlessly. As a modelling device, this assumption is ubiquitous in the monetary Dynamic General

Equilibrium literature. Sometimes this state-independent adjustment is playfully referred to as the working

of the ‘Calvo fairy’.


Linear dynamic panel estimation was pioneered by Anderson and Hsiao (1982),

and it was developed and perfected by Holtz-Eakin et al. (1988); Arellano and Bond

(1991); Arellano and Bover (1995) and Blundell and Bond (1998). This article shows

how classic dynamic panel estimation methodology can be adapted for the analysis of

economic adjustment if the target is unobserved and the nonlinearity takes the form of

discrete regimes. This is not straightforward, as the unknown and time-varying adjustment

coefficient interacts with the equally unknown individual specific measurement

error. However, the reward is substantial: a well-known array of estimation procedures

and tests can be brought to bear on the investigation of economic adjustment.

The estimation methods presented here are geared to short panels that do not allow

a full direct identification of individual targets. The study was motivated by the problem

of characterising the speed of capital stock adjustment as depending on financing

constraints, in an environment where categorical information on the financing situation

is available; see von Kalckreuth (2008a) and von Kalckreuth (2008b).² The

procedures allow addressing a number of important research questions, including the

state-dependence of pricing behaviour (is there a Calvo fairy?), the adjustment of

the financial structure of companies or banks after shocks, the asymmetry of factor

adjustment (downward rigidities, firing costs), or the implications of irreversibility.

Section 2 of this article characterises the stochastic process to be estimated.

A continuous scalar and a discrete regime vector are evolving jointly, and the

adjustment speed of the continuous-type variable depends on the regime. It is

shown that the standard procedure for estimating linear dynamic panel models

is not applicable. Section 3 assumes predetermined regimes and proposes two

estimators on the basis of quasi-differencing—one of them with the virtue of

great simplicity, the other being more efficient. Both are nonlinear, which may

lead to a small sample bias if in one of the regimes the adjustment speed is

almost zero. A Generalised Method of Moments (GMM) estimator using state-dependent

filtering is suggested, which is immune to this problem. Section 4 works

out sets of moment conditions that can be applied when the regimes are contemporaneously

correlated. Using a level estimator on an amplified equation, the assumption

of predetermined regimes can be dropped at the price of stricter prerequisites regarding

the fixed effect. Under the same conditions, a version involving first differences

is feasible, too. Section 5 compares the moment conditions and discusses their use.

Section 6 tests the proposed routines in a Monte Carlo study. Section 7 concludes.

Appendix A discusses error correction models with state-dependent dynamics, and

Appendix B contains the proofs.

2 A regime-specific adjustment process

A situation is examined in which a variable yi,t reverts to some target level $y^{*}_{i,t}$ that is characteristic
of individual i. The speed of adjustment is state-dependent, following the
equation

2 The study successfully applies the estimator QD2, as exposed in Sect. 3 of this article.


\[
\Delta y_{i,t} = -\left(1 - \alpha_{i,t-1}\right)\left(y_{i,t-1} - y^{*}_{i,t}\right) + \varepsilon_{i,t}, \tag{1}
\]

with

\[
\alpha_{i,t} = \alpha' r_{i,t}.
\]

The L-dimensional column vector α holds the state-dependent adjustment coefficients

relevant for each state. The adjustment coefficient αi,t = α′ri,t varies over time and

individuals, depending on the state ri,t, an L-dimensional column vector of regime

indicator variables, with one element taking a value of 1, and all others being zero. The

adjustment speed at date t is given by 1 − αi,t−1. If the process is stable, it would

eventually settle in the target in the absence of shocks. The target level y∗i,t is unobservable.

The panel dimension can help identify the adjustment process nonetheless, as it

allows an error component approach for modelling the unobserved target. The target is

assumed to follow an equation that contains an individual-specific latent term:

\[
y^{*}_{i,t} = x'_{i,t}\beta + \mu_i .
\]

The idiosyncratic component μi in the adjustment equation may reflect a measurement

error or unobserved explanatory variables. The vector xi,t may encompass random

explanatory variables, deterministic time trends and also time dummies. In its absence,

the target level is entirely unobservable, but static. A generalised error correction version

of the adjustment equation is discussed in Appendix A.

Solving Eq. (1) for yi,t yields:

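\[
y_{i,t} = \alpha_{i,t-1}\, y_{i,t-1} + \left(1 - \alpha_{i,t-1}\right) x'_{i,t}\beta + \underbrace{\left(1 - \alpha_{i,t-1}\right)\mu_i + \varepsilon_{i,t}}_{\text{latent}} . \tag{2}
\]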

For later purposes, it is useful to state the backward solution to this stochastic difference

equation. For t ≥ 1 and a given starting value yi,0 it is

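\[
y_{i,t} = \left(y_{i,0} - x'_{i,1}\beta - \mu_i\right)\prod_{\tau=0}^{t-1}\alpha_{i,\tau} + x'_{i,t}\beta + \mu_i + A_{i,t}, \tag{3}
\]

with

\[
A_{i,t} = \sum_{l=1}^{t-1}\left(\varepsilon_{i,l} - \Delta x'_{i,l+1}\beta\right)\prod_{\tau=l}^{t-1}\alpha_{i,\tau} + \varepsilon_{i,t} . \tag{4}
\]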

The solution has three components. The first term captures the influence of the initial

deviation. The second term is the target level at time t, x′i,tβ + μi. The third term, Ai,t,

represents the effect of shocks and target changes, past and present. In the long run,

when the influence of the initial conditions has died out, Ai,t is equal to the deviation

from the target.
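The backward solution can be checked numerically. The following is a minimal sketch, not the author's code: it iterates Eq. (2) forward for one individual and compares the result with Eqs. (3) and (4). The function names, the 0-based regime labels and the array layout (index 0 holds the starting date) are assumptions made for illustration.

```python
import numpy as np

def simulate_path(alpha, r, x_beta, mu, eps, y0):
    """Iterate Eq. (2) forward for one individual.

    alpha  : array of regime-specific adjustment coefficients
    r      : integer regime labels r[t] for t = 0..T (0-based labels, an assumption)
    x_beta : values of x_t' beta for t = 0..T (entry 0 is not used)
    eps    : shocks for t = 0..T (entry 0 is not used)
    """
    a = alpha[r]                      # a[t] = alpha_{i,t}
    T = len(r) - 1
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(1, T + 1):
        y[t] = a[t - 1] * y[t - 1] + (1 - a[t - 1]) * (x_beta[t] + mu) + eps[t]
    return y

def backward_solution(alpha, r, x_beta, mu, eps, y0, t):
    """Evaluate Eqs. (3) and (4) at a date t >= 1."""
    a = alpha[r]
    # First term: initial deviation times prod_{tau=0}^{t-1} alpha_tau
    initial = (y0 - x_beta[1] - mu) * np.prod(a[0:t])
    # Third term A_t: past shocks and target changes, discounted, plus eps_t
    A = eps[t]
    for l in range(1, t):
        d_xb = x_beta[l + 1] - x_beta[l]
        A += (eps[l] - d_xb) * np.prod(a[l:t])   # prod_{tau=l}^{t-1}
    return initial + x_beta[t] + mu + A
```

For any simulated path, the two functions should agree at every date up to floating-point error, which confirms the decomposition into initial deviation, current target and accumulated shocks.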


In Eq. (2), both the individual effect and xi,t interact with a time-varying and endogenous

variable. This precludes the classical strategy for estimating linear dynamic

panel equations with fixed effects, namely to transform the equation by taking first

differences and use moment conditions involving higher lags of the dependent and

explanatory variables to accommodate for the fact that the transformed residual will

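be correlated with the lagged endogenous variable. First differencing Eq. (2) yields

\[
\Delta y_{i,t} = \alpha'\Delta\left(r_{i,t-1}\, y_{i,t-1}\right) + (1-\alpha)'\Delta\left(r_{i,t-1}\, x'_{i,t}\right)\beta - \alpha'\Delta r_{i,t-1}\,\mu_i + \Delta\varepsilon_{i,t} . \tag{5}
\]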

Unlike the case of linear adjustment, the expression containing the unobserved μi is

not differenced out, and we have to deal with a time-varying error component that

is correlated with the explanatory variables. The following sections are devoted to

finding moment restrictions that make estimation feasible. The last set of restrictions

that will be discussed actually involves an amplified version of Eq. (5).

3 Predetermined regimes

In most applications, it will not be possible to treat ri,t as fully exogenous. If, for

example, εi,t is the error term in a capital accumulation equation and ri,t is a regime

indicating the degree of financing constraints, then the two variables should be correlated.

This section examines the case when the regime indicator, ri,t−1, can at least

be considered as predetermined with respect to the contemporaneous error term, εi,t .

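Let us start by assuming the error term to be a martingale difference sequence:

\[
E\left(\varepsilon_{i,t} \mid \Omega_{i,t-1}\right) = 0, \quad\text{with}\quad
\Omega_{i,t-1} = \left\{ r_{i,t-1}, r_{i,t-2}, \ldots, x_{i,t-1}, x_{i,t-2}, \ldots, \varepsilon_{i,t-1}, \varepsilon_{i,t-2}, \ldots, \mu_i, y_{0i} \right\}. \tag{6}
\]

Accommodation of the more general assumption

\[
E\left(\varepsilon_{i,t} \mid \Omega^{*}_{i,t-k}\right) = 0, \; k \ge 1, \quad\text{with}\quad
\Omega^{*}_{i,t-k} = \left\{ r_{i,t-1}, r_{i,t-2}, \ldots, x_{i,t-k}, x_{i,t-k-1}, \ldots, \varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, \mu_i, y_{0i} \right\}, \tag{7}
\]

is straightforward. Note that Ω∗i,t−k in assumption (7) is not simply a lagged version

of Ωi,t−1, as the generalisation maintains the assumption of a predetermined ri,t−1.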

The case of contemporaneously correlated regime indicators will be treated in Sect. 4.

3.1 Two moment conditions based on quasi-differencing

This subsection discusses two nonlinear transformations of the adjustment equation

that serve to eliminate the unobserved heterogeneity. Holtz-Eakin et al. (1988)

proposed quasi-differencing as a strategy in a case where fixed effects are subject

to time-varying shocks that are common across individuals.3 It is now explored whether

3 See also Chamberlain (1983), pp. 1263–1264.


this method can be generalised to the more complicated case at hand, where adjustment

coefficients are endogenous and vary over time and individuals.

Applied to the problem at hand, the quasi-differencing procedure as proposed by

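Holtz-Eakin et al. (1988) would involve lagging Eq. (2), multiplying both sides by

(1 − αi,t−1)/(1 − αi,t−2) and subtracting the result from Eq. (2). After reordering

coefficients, this gives

\[
\Delta y_{i,t} - \frac{1-\alpha_{i,t-1}}{1-\alpha_{i,t-2}}\,\alpha_{i,t-2}\,\Delta y_{i,t-1} - \left(1-\alpha_{i,t-1}\right)\Delta x'_{i,t}\beta = \varepsilon_{i,t} - \frac{1-\alpha_{i,t-1}}{1-\alpha_{i,t-2}}\,\varepsilon_{i,t-1} . \tag{8}
\]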

The unobserved heterogeneity has duly been eliminated, but the error structure is difficult

to deal with, because αi,t−1 will in general be correlated with εi,t−1 and αi,t−2.

The underlying idea nonetheless leads to useful moment conditions, actually in two

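different ways. First, dividing Eq. (8) by 1 − αi,t−1 gives

\[
\frac{1}{1-\alpha_{i,t-1}}\,\Delta y_{i,t} - \frac{\alpha_{i,t-2}}{1-\alpha_{i,t-2}}\,\Delta y_{i,t-1} - \Delta x'_{i,t}\beta = \psi_{i,t},
\quad\text{with}\quad
\psi_{i,t} = \frac{\varepsilon_{i,t}}{1-\alpha_{i,t-1}} - \frac{\varepsilon_{i,t-1}}{1-\alpha_{i,t-2}} . \tag{9}
\]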

This transformation—which shall be referred to as ‘QD1’—corresponds to solving

Eq. (1) for the deviation from the target, yi,t−1 − x′i,tβ − μi, and then solving the

lagged version of (1) for the past deviation from the target, yi,t−2 − x′i,t−1β − μi, and

finally differencing μi out. On the basis of Eq. (9), moment conditions for parameter

estimation can be formulated.

Second, we may multiply Eq. (9) by 1 − αi,t−2, to obtain

\[
\frac{1-\alpha_{i,t-2}}{1-\alpha_{i,t-1}}\,\Delta y_{i,t} - \alpha_{i,t-2}\,\Delta y_{i,t-1} - \left(1-\alpha_{i,t-2}\right)\Delta x'_{i,t}\beta = \xi_{i,t}, \tag{10}
\]

\[
\text{with}\quad \xi_{i,t} = \frac{1-\alpha_{i,t-2}}{1-\alpha_{i,t-1}}\,\varepsilon_{i,t} - \varepsilon_{i,t-1} . \tag{11}
\]

This transformation shall be labelled ‘QD2’. It corresponds to multiplying Eq. (1) by

(1 − αi,t−2)/(1 − αi,t−1) and subtracting the lag of the original adjustment equation.

Proposition 1 Under assumption (6) assuming the absence of serial correlation in

the error term, the levels yi,t−p, p ≥ 2, are instruments in Eqs. (9) and (10):

\[
E\left(y_{i,t-p}\,\psi_{i,t}\right) = 0, \tag{12}
\]

\[
E\left(y_{i,t-p}\,\xi_{i,t}\right) = 0 . \tag{13}
\]

Proof See Appendix B.

Likewise, it can be shown that xi,t−p and the regime indicators ri,t−p, p ≥ 2,

are instruments in the Eqs. (9) and (10). If assumption (6) of no serial correlation is


replaced by (7), then the set of instruments is pushed backwards in time accordingly:

The lags yi,t−k−p and xi,t−k−p, p ≥ 1 are instruments in the Eqs. (9) and (10). Note

that the regime indicator ri,t−1 is still assumed to be predetermined with respect to

εi,t ; thus, all lags ri,t−p, p ≥ 2 are instruments irrespective of k.
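The two transformations can be illustrated with a short sketch that applies Eqs. (9) and (10) at known coefficient values. It is a hedged illustration under stated assumptions, not the author's implementation: the function name `qd_residuals` and the array layout (entry t of each array refers to date t, starting at 0) are hypothetical.

```python
import numpy as np

def qd_residuals(y, x_beta, a):
    """Transformed residuals psi (QD1, Eq. 9) and xi (QD2, Eq. 10) for t = 2..T.

    y      : levels y_t for t = 0..T
    x_beta : x_t' beta for t = 0..T
    a      : a[t] = alpha_{i,t}, the coefficient of the regime prevailing at date t
    """
    dy = np.diff(y)           # dy[t-1]  = y_t - y_{t-1}
    dxb = np.diff(x_beta)     # dxb[t-1] = (x_t - x_{t-1})' beta
    t = np.arange(2, len(y))
    a1, a2 = a[t - 1], a[t - 2]
    # QD1: both differences are scaled by the inverse adjustment speed
    psi = dy[t - 1] / (1 - a1) - a2 / (1 - a2) * dy[t - 2] - dxb[t - 1]
    # QD2: QD1 multiplied by (1 - alpha_{t-2})
    xi = (1 - a2) * psi
    return psi, xi
```

When evaluated at the true coefficients on data generated by Eq. (2), the fixed effect drops out and the outputs reduce to the error expressions given in Eqs. (9) and (11). The division by 1 − αi,t−1 also makes visible why QD1 becomes unstable when a coefficient is close to 1.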

To discuss estimation on the basis of the two sets of moment conditions, it is useful,

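however, to restate the transformations (9) and (10). Equation (9) has the convenient

feature that Δx′i,tβ enters additively. Collecting terms, one obtains

\[
\begin{aligned}
\psi_{i,t} &= \Delta y_{i,t-1} + \frac{1}{1-\alpha_{i,t-1}}\,\Delta y_{i,t} - \frac{1}{1-\alpha_{i,t-2}}\,\Delta y_{i,t-1} - \Delta x'_{i,t}\beta \\
&= \Delta y_{i,t-1} + \gamma'\Delta\left(r_{i,t-1}\,\Delta y_{i,t}\right) - \Delta x'_{i,t}\beta
\end{aligned} \tag{14}
\]

\[
\text{with}\quad \gamma' = \left( \frac{1}{1-\alpha_1} \;\; \ldots \;\; \frac{1}{1-\alpha_L} \right) . \tag{15}
\]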

Equation (14) is linear in the coefficient vectors γ and β, and can be estimated by

linear GMM using the moment conditions (12) of Proposition 1. The structural coefficients

α are related to the elements of γ by the nonlinear one-to-one transformation

(15). Inverting this transformation, therefore, gives a nonlinear GMM estimator of α.

Standard deviations and co-variances can be assessed using the delta method.
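The inversion of transformation (15) and the delta method step can be sketched as follows. The function name and the example values are hypothetical; the only inputs taken from the text are the mapping αk = 1 − 1/γk implied by (15) and the use of the delta method.

```python
import numpy as np

def alpha_from_gamma(gamma_hat, cov_gamma):
    """Map estimates of gamma into alpha and propagate the covariance matrix.

    Inverting gamma_k = 1 / (1 - alpha_k) gives alpha_k = 1 - 1 / gamma_k.
    The Jacobian is diagonal with entries d alpha_k / d gamma_k = 1 / gamma_k**2.
    """
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    alpha_hat = 1.0 - 1.0 / gamma_hat
    jac = np.diag(1.0 / gamma_hat**2)
    cov_alpha = jac @ np.asarray(cov_gamma, dtype=float) @ jac.T
    return alpha_hat, cov_alpha
```

Because the Jacobian entries are 1/γk², a large γk (an αk close to 1) compresses the uncertainty in αk, while the estimate of γk itself is the quantity that becomes imprecise in that region.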

Making use of QD2 for GMM estimation is trickier. Let d(ri,t−2, ri,t−1) be an

L² × 1 indicator vector, where each element is a dummy variable indicating one of

the possible combinations of ri,t−2 and ri,t−1. Let λ be the vector of coefficients

(1 − αi,t−2)/(1 − αi,t−1) corresponding to the elements of d(·):

\[
\lambda' = \left( 1 \;\; \frac{1-\alpha_1}{1-\alpha_2} \;\; \frac{1-\alpha_1}{1-\alpha_3} \;\; \cdots \;\; \frac{1-\alpha_L}{1-\alpha_{L-2}} \;\; \frac{1-\alpha_L}{1-\alpha_{L-1}} \;\; 1 \right) .
\]

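Let furthermore δ be a vector of products of the adjustment coefficients and β:

\[
\delta = (1-\alpha)\otimes\beta =
\begin{pmatrix}
(1-\alpha_1)\,\beta \\
(1-\alpha_2)\,\beta \\
\vdots \\
(1-\alpha_L)\,\beta
\end{pmatrix} .
\]

Finally, let

\[
h(\alpha,\beta) =
\begin{pmatrix}
\lambda \\
-\alpha \\
-\delta
\end{pmatrix} \tag{16}
\]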


be an L (L + 1 + K) × 1 vector of reduced form coefficients, of which L (L + K)

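are unknown. This results in

\[
\begin{aligned}
\xi_{i,t} &= \lambda' d\left(r_{i,t-2}, r_{i,t-1}\right)\Delta y_{i,t} - \alpha' r_{i,t-2}\,\Delta y_{i,t-1} - \delta'\left(r_{i,t-2}\otimes\Delta x_{i,t}\right) \\
&= \left( d\left(r_{i,t-2}, r_{i,t-1}\right)'\Delta y_{i,t} \quad r'_{i,t-2}\,\Delta y_{i,t-1} \quad \left(r_{i,t-2}\otimes\Delta x_{i,t}\right)' \right) h(\alpha,\beta) .
\end{aligned} \tag{17}
\]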

In this case, there is no convenient one-to-one transformation from the elements

of h (α, β) to the underlying structural parameters. The nonlinearity of the problem

therefore has to be treated explicitly. Consider the simplest case, with two states and

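no explanatory variables xi,t. Then λ and α have two elements each and one can write

\[
\pi' = h(\alpha)' = \left( 1 \;\; \frac{1-\alpha_1}{1-\alpha_2} \;\; \frac{1-\alpha_2}{1-\alpha_1} \;\; 1 \;\; -\alpha_1 \;\; -\alpha_2 \right) .
\]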

Though nonlinear in the parameters, this equation is linear in the transformed variables.

This makes it easy to apply the Gauss–Newton method for solving the optimisation

problem inherent in GMM estimation. The Gauss–Newton method iterates

on a linearised moment function, sequentially improving the estimation. Calculating

pseudo-observations for each step, the estimation problem can be solved using routines

for the estimation of linear econometric models.4 As initial values for the iteration,

one can use the results from QD1 estimation exposed earlier in this section.
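To make the reduced-form parametrisation of Eqs. (16) and (17) concrete, the following sketch builds h(α, β) and the matching row of transformed variables for one observation. It is an illustration under assumptions: the function names, the 0-based regime labels and the row-major ordering of the regime pairs (regime at t−2 first, regime at t−1 second) are choices made here, picked so that the ordering of λ agrees with the vector displayed in the text.

```python
import numpy as np

def h(alpha, beta):
    """Stack (lambda, -alpha, -delta) as in Eq. (16)."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    speed = 1.0 - alpha
    # lambda[j*L + k] = (1 - alpha_j) / (1 - alpha_k): regime j at t-2, regime k at t-1
    lam = (speed[:, None] / speed[None, :]).ravel()
    # delta = (1 - alpha) kron beta, written as a flattened outer product
    delta = (speed[:, None] * beta[None, :]).ravel()
    return np.concatenate([lam, -alpha, -delta])

def qd2_row(j, k, dy_t, dy_lag, dx_t, L):
    """Row of transformed variables in Eq. (17) for regimes j (t-2) and k (t-1)."""
    d = np.zeros(L * L)
    d[j * L + k] = 1.0                      # dummy for the regime pair (j, k)
    r_lag = np.zeros(L)
    r_lag[j] = 1.0                          # regime indicator r_{t-2}
    dx_t = np.asarray(dx_t, dtype=float)
    return np.concatenate([d * dy_t, r_lag * dy_lag, np.kron(r_lag, dx_t)])
```

The inner product of `qd2_row(...)` with `h(alpha, beta)` reproduces the left-hand side of Eq. (10), so the moment function is linear in the transformed variables while remaining nonlinear in α and β. This is the structure that a Gauss–Newton iteration on the moment function exploits.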

The transformations QD1 and QD2 are nonlinear, and the stochastic properties of

the transformed residuals depend on the adjustment parameters. Consider the transformed

residuals ψi,t = εi,t /(1 − αi,t−1) − εi,t−1/(1 − αi,t−2) on the one hand and

ξi,t = (1 − αi,t−2)/(1 − αi,t−1)εi,t − εi,t−1 on the other. The variance of ψi,t will

become large if one or both alpha-coefficients are in the neighbourhood of 1, creating

problems in small samples. An adjustment coefficient approaching 1 will affect

the transformed error term of QD2, ξi,t , to a lesser degree. First, only one of the two

components of the difference is affected. Second, the effect is mitigated by the denominator,

1 − αi,t−2. Indeed, if the alpha coefficients in different regimes are of similar

size, the random factor will stay in the neighbourhood of 1. Therefore, when the alpha

coefficients are high (i.e. adjustment speed is low), considerable efficiency gains can

be expected from using QD2. This will be investigated in a simulation study in Sect. 6.

3.2 Generalised Differencing

As has been exposed above, the nonlinear transformations QD1 and QD2 may lead

to poor results if in one or more of the regimes the adjustment speed is very low.

The transformations cannot be used at all if one of the regimes is characterised by an

adjustment speed of exactly zero. This is a case of considerable theoretical interest,

4 The Gauss–Newton method has originally been developed for nonlinear least squares problems. See

Davidson and MacKinnon (1993) on the use of Gauss–Newton in nonlinear least squares and instrumental

variables estimation, Hayashi (2000), on GMM estimation, and Judge et al. (1985) on numerical methods

in maximisation. An unpublished appendix on the use of Gauss–Newton in the current context is available

from the author upon request.


as the presence of fixed adjustment costs or irreversibility leads to bands around the

target where no adjustment takes place—the solution to the stochastic control problem

triggers adjustment when some threshold level is surpassed. Threshold behaviour

should be expected for decisions on single projects, not for firms or sectors, where

many such projects are aggregated. However, for small units it is certainly useful to

explicitly consider regimes of no adjustment, as Caballero et al. (1995) have done in

the context of plant level investment.

Therefore, it is worth asking whether there is a transformation that eliminates the

fixed effect in the target equation without affecting the size of the idiosyncratic errors.

It turns out that there is such a transformation, provided that the regime indicator has

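limited memory with respect to εi,t. Consider again the first-differenced adjustment

Eq. (5) above:

\[
\Delta y_{i,t} = \alpha'\Delta\left(r_{i,t-1}\, y_{i,t-1}\right) + (1-\alpha)'\Delta\left(r_{i,t-1}\, x'_{i,t}\right)\beta - \alpha'\Delta r_{i,t-1}\,\mu_i + \Delta\varepsilon_{i,t} .
\]

Whenever ri,t−1 = ri,t−2, this simplifies to

\[
\Delta y_{i,t} = \alpha' r_{i,t-1}\,\Delta y_{i,t-1} + (1-\alpha)' r_{i,t-1}\,\Delta x'_{i,t}\beta + \Delta\varepsilon_{i,t} .
\]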

This expression looks very much like the first difference in the linear case, although

there is more than one adjustment coefficient to estimate. It is only taking first differences

of observations that belong to different regimes which leads to a latent term

−α′Δri,t−1μi that will be correlated with the lagged dependent variable under a variety

of circumstances.

As it is this term that precludes the use of the standard technique, the following strategy

comes to mind: Differences are only formed for observations with ri,t−2 = ri,t−1.

The first element of α, α1, is estimated on the basis of cases where two consecutive observations

belong to the first regime, and using differences of observations that both

belong to the second regime leads to inference on the second adjustment coefficient,

etc. In this straight fashion, however, the idea will not work. If ri,t−1 and εi,t−1 are

correlated and groups of observations are formed according to regimes, then the transformed

residual Δεi,t will have a (conditional) expectation different from zero in those

groups. This will lead to biased estimators.

Under certain additional assumptions, however, a straightforward modification will

yield useful moment conditions:

1. Let q be the maximum τ for which there is a correlation between ri,t and εi,t−τ ,

e.g. as a consequence of a moving average structure of the state variable driving

the regime indicator. Then the observation is to be transformed subtracting past

observations of the same regime with a lag of at least ℓ = q + 2.

2. If an observation is not matched by a (q + 2)-lag in the same regime, then it may

be transformed using a higher lag ℓ > q + 2.

The first part of the rule proposes a dynamic filter, which varies according to regimes.

The second avoids the loss of many observations in cases where regimes in t and t +q

do not match.
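The two-part rule can be sketched as a small search over admissible lags. This is a hedged illustration rather than the author's routine: the function name `matching_lags`, the 0-based regime labels and the convention of returning −1 when no partner observation exists are all assumptions. The regime relevant for the observation at date t is the one dated t−1, as in Eq. (2).

```python
import numpy as np

def matching_lags(r, q):
    """For each date t, find the smallest lag ell >= q + 2 such that the
    observations at t and t - ell are governed by the same regime,
    i.e. r[t-1] == r[t-ell-1]. Returns -1 where no such lag exists.

    r : integer regime labels, r[t] for t = 0..T-1
    q : maximum lag at which the regime is correlated with the error term
    """
    T = len(r)
    ell = np.full(T, -1)
    for t in range(T):
        # cand must satisfy t - cand - 1 >= 0, hence cand <= t - 1
        for cand in range(q + 2, t):
            if r[t - 1] == r[t - cand - 1]:
                ell[t] = cand          # most proximate admissible partner
                break
    return ell
```

With `q = 0` the search starts at a lag of two periods. Observations near the start of an individual's series have no admissible partner and are dropped, which is the loss of observations the second part of the rule is meant to limit.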


The ℓth difference is

\[
\begin{aligned}
y_{i,t} - y_{i,t-\ell} ={}& \alpha'\left(r_{i,t-1}\, y_{i,t-1} - r_{i,t-\ell-1}\, y_{i,t-\ell-1}\right)
+ (1-\alpha)'\left(r_{i,t-1}\, x'_{i,t} - r_{i,t-\ell-1}\, x'_{i,t-\ell}\right)\beta \\
& - \alpha'\left(r_{i,t-1} - r_{i,t-\ell-1}\right)\mu_i + \varepsilon_{i,t} - \varepsilon_{i,t-\ell},
\end{aligned}
\]

which simplifies to

\[
y_{i,t} - y_{i,t-\ell} = \alpha' r_{i,t-1}\left(y_{i,t-1} - y_{i,t-\ell-1}\right) + (1-\alpha)' r_{i,t-1}\left(x'_{i,t} - x'_{i,t-\ell}\right)\beta + \varepsilon_{i,t} - \varepsilon_{i,t-\ell}, \tag{18}
\]

if the two observations are characterised by the same regime, such that ri,t−1 = ri,t−ℓ−1. When does the expectation of the residual term, εi,t − εi,t−ℓ, conditional on ri,t−1 and the equality ri,t−1 = ri,t−ℓ−1, become zero? It is sufficient that εi,t and εi,t−ℓ are both uncorrelated with the two conditioning variables ri,t−1 and ri,t−ℓ−1. According to assumption (6), εi,t is uncorrelated with ri,t−1 and ri,t−ℓ−1. Then the same is true with respect to εi,t−ℓ and ri,t−ℓ−1. Therefore, by choosing ℓ, it only remains to make sure that εi,t−ℓ and ri,t−1 are uncorrelated. With ℓ = 1, this will not be the case if εi,t and ri,t are contemporaneously correlated. However, if ri,t is uncorrelated with all lags of εi,t, then ℓ = 2 will ensure that

\[
E\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell} \mid r_{i,t-1},\; r_{i,t-1} = r_{i,t-\ell-1}\right) = 0 . \tag{19}
\]

More generally, if there is correlation between ri,t and εi,t−τ up to lag τ = q, the difference that guarantees the above equation to hold will have to be at least of order ℓ = q + 2. However, one is not restricted to using only differences of the order that is ‘just right’, i.e. q + 2. Any other difference of order ℓ ≥ q + 2 will fulfil Eq. (19) just as well. It is straightforward to construct a difference using the most proximate observation of the same regime with lag ℓ ≥ q + 2. With respect to admissibility of instruments, the rules of the classic first-difference approach apply: the instruments need to be uncorrelated with the earlier of the two observations that make up the difference. In the following, this procedure is called the Generalised Difference estimator.

For the moment conditions to hold, it is necessary to strengthen assumption (6). In

addition to the variables in the conditioning set Ωi,t−1, εi,t must also be uncorrelated with the future regimes ri,t+q+1, ri,t+q+2, ….

Proposition 2 Let the conditional expectation of εi,t satisfy

\[
E\left(\varepsilon_{i,t} \mid \Omega_{i,t-1},\, r_{i,t+q+1},\, r_{i,t+q+2}, \ldots\right) = 0, \tag{20}
\]

with Ωi,t−1 defined as in assumption (6). Then the lagged levels yi,t−ℓ−p, p ≥ 1, are instruments in Eq. (18), the adjustment equation transformed by taking the ℓth difference, with ℓ ≥ q + 2, conditional on the regimes being the same in each pair of observations:

\[
E\left[\left(\varepsilon_{i,t} - \varepsilon_{i,t-\ell}\right) y_{i,t-\ell-p} \mid r_{i,t-1},\; r_{i,t-1} = r_{i,t-\ell-1}\right] = 0, \quad\text{with } \ell \ge q + 2 .
\]


Proof See Appendix B.

Likewise, it can be shown that xi,t−ℓ−p and the regime indicators ri,t−ℓ−p are instruments in Eq. (18), given ri,t−1 = ri,t−ℓ−1. As in Proposition 1 above, if Ωi,t−1 in (20) is replaced by Ω∗i,t−k, as defined in assumption (7), with k being the minimum τ such that εi,t does not vary with εi,t−τ and xi,t−τ, the set of instruments is pushed backwards in time: the lags yi,t−ℓ−k−p and xi,t−ℓ−k−p, p ≥ 1, are instruments. As the regime indicator ri,t−1 is still assumed to be predetermined, all lags ri,t−p, p ≥ 2, are instruments irrespective of k.

It is an identifying assumption for the process that drives the regime indicator to

have finite memory with respect to innovations εi,t . This is a limitation. If ri,t are correlated

with all past values of εi,t , then the conditional expectation of the transformed

error term resulting from a difference of two observations from the same regime will

not disappear. The resulting bias can be expected to wane if the minimum lag length is

chosen to be large. However, doing so would result in the loss of many observations,

exacerbating another weakness of the estimation strategy. In principle, assuming a

finite memory of the regime indicator with respect to εi,t is rather similar in kind to

the assumption of a finite memory of εi,t with respect to earlier shocks, which is needed

to use lagged endogenous variables as instruments in the standard approach. Whether

the condition (20) can be expected to hold or not will depend on the estimation problem

at hand. In the context of estimating the microeconomic adjustment of the capital

stock under financing constraints, it may be realistic to assume that, after the shock to

capital demand, the financing structure of a firm will be restored in finite time.5

3.3 Testing finite memory and deciding on the length of memory

In order to use Generalised Differencing, it is necessary to test the condition (20) and

decide on the length of the memory of the process driving the regime with respect

to εi,t . There are two simple solutions. The first is to use the test of overidentifying

restrictions associated with Sargan (1958) and Hansen (1982) to check the validity of

the moment conditions. The drawback is that this test is generally used as an omnibus

test of the specification, including the choice of the instruments. It is preferable to

have a more specific test concerning the appropriate lag length.

Such a specific test can be based on the fact that the expected value of the residual

will not disappear if the lag length chosen is too short. In that case, the choice

of observations according to regime will select positive or negative outcomes of εi,t ,

because of the correlation between the regime variable and the error component εi,t .

If regime dummies are added to the adjustment equation, then their coefficients will

be estimated as positive or negative quantities according to the direction of selectivity,

although they should be zero according to the basic specification. Furthermore,

it is known how these estimates for regime constants are distributed under the null

of a correct specification. Using a GMM estimator, they are asymptotically normal,

with mean zero, and their standard deviation is given by the standard deviation of the

coefficient. Therefore, the t-value on these coefficients is a valid test statistic.
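The test described above amounts to an ordinary t-test on each regime constant. A minimal sketch, assuming that a GMM routine has already returned the dummy coefficient and its standard error (the function name and the inputs are hypothetical), computes the t-value and a two-sided p-value from the asymptotic normal distribution:

```python
import math

def regime_dummy_test(coef, se):
    """t-value and two-sided asymptotic p-value for H0: regime constant = 0."""
    t_value = coef / se
    # For a standard normal Z, P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_value) / math.sqrt(2.0))
    return t_value, p_value
```

A small p-value for a regime constant indicates selectivity in the transformed residuals, which suggests that the minimum lag length has been chosen too short.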

5 For a theoretical model that makes this prediction, see von Kalckreuth (2004, 2008b, Chap. 1).


It may be argued that this test ignores the possibility that the regime-specific constants truly belong in the equation. Consider a trend in the term in the brackets of Eq. (1) that makes the target level of yi,t change over time:

\[
\Delta y_{i,t} = -\left(1-\alpha_{i,t-1}\right)\left(y_{i,t-1} - \kappa t - \mu_i\right) + \varepsilon_{i,t} .
\]

Solving for yi,t yields

\[
y_{i,t} = \alpha_{i,t-1}\, y_{i,t-1} + \left(1-\alpha_{i,t-1}\right)\kappa t + \left(1-\alpha_{i,t-1}\right)\mu_i + \varepsilon_{i,t} .
\]

After transforming the equation by subtracting an observation belonging to the same regime, lagged ℓ periods, one obtains

\[
y_{i,t} - y_{i,t-\ell} = \alpha_{i,t-1}\left(y_{i,t-1} - y_{i,t-\ell-1}\right) + \left(1-\alpha_{i,t-1}\right)\kappa\ell + \varepsilon_{i,t} - \varepsilon_{i,t-\ell} .
\]

Regime-specific constants may thus be the result of a trending target variable. Actually,

this is a case of misspecification: the time trend should have figured in xi,t. The

regime constants should be proportional to each other, with a factor of proportionality

given by the adjustment speeds.6 More generally, they should not be of different sign,

as will be the case if the coefficient on the regime dummy collects the residuals

selected for their high or low value.

4 Moment restrictions for contemporaneously correlated regimes

All moment restrictions discussed in the previous section require the regime indicator

to be predetermined with respect to the current shock term. This may hold in many

applications, specifically if there are long planning and gestation lags as in the case of

fixed investment. In other circumstances, the error term in the adjustment equation and

the threshold variable governing the adjustment regime may be contemporaneously

correlated. Let us investigate an approach that can be brought to bear in this case.

For greater clarity, the adjustment equation shall be rewritten with a modified dating,

to highlight the possibility of a contemporaneous correlation between the speed of

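adjustment and εi,t:

\[
\Delta y_{i,t} = -\left(1-\alpha_{i,t}\right)\left(y_{i,t-1} - x'_{i,t}\beta - \mu_i\right) + \varepsilon_{i,t}, \tag{21}
\]

or

\[
y_{i,t} = \alpha_{i,t}\, y_{i,t-1} + \left(1-\alpha_{i,t}\right)\left(x'_{i,t}\beta + \mu_i\right) + \varepsilon_{i,t} . \tag{22}
\]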

It will now be shown that the requirement of predetermined regimes can be dropped at

the cost of additional assumptions regarding the fixed effect. Under these assumptions,

6 Let z1 and z2 be two regime dummy coefficients, with α1 and α2 the corresponding adjustment coefficients.

If the regime dummies result from a trending target as above, then the nonlinear restriction between

coefficients is z1/z2 = (1 − α1)/(1 − α2). It is rather straightforward to test this restriction after estimation.


it is possible to leave the fixed effect in an equation amplified by regime dummies and

use first differences as instruments. Under the same conditions, first differences will

also serve as instruments for a modified version of the first differenced Eq. (5).

Level estimation was introduced by Arellano and Bover (1995) and Blundell and

Bond (1998) as a response to a specific problem arising in the standard autoregressive

model with fixed effects. If the coefficient of the lagged dependent variable is in the

neighbourhood of one, then the level behaves like a random walk, and it will be a

weak instrument in the differenced equation. These authors use the following moment

condition for the estimation of the standard autoregressive model:

\[
E\left[\Delta y_{i,t-p}\left(\mu_i + \varepsilon_{i,t}\right)\right] = 0,
\]

with p ≥ 1. If εi,t is serially uncorrelated, then it is sufficient that yi,t is mean

stationary and displays a constant correlation with μi for the moment equation to

hold. This implies a requirement on the initial conditions: the deviation of the starting

value from the stationary level needs to be uncorrelated with the stationary level itself.

The latent term of Eq. (22) is given by (1 − αi,t)μi + εi,t. In the attempt to use

first differences as instruments for levels, let us first take a look at

\[
E\left[\Delta y_{i,t-p}\left(\left(1-\alpha_{i,t}\right)\mu_i + \varepsilon_{i,t}\right)\right] .
\]

This expectation will be zero if, first, E(Δyi,t−p) = 0, and second, Δyi,t−p is uncorrelated

with both (1 − αi,t)μi and εi,t. The first condition requires the process to

be mean stationary, as in the derivation of Blundell/Bond and Arellano/Bover. The

second condition is hard to fulfil. To see the reason, one may adjust the backward

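solution in (3) and (4) to the modified dating:

\[
y_{i,t} = \left(y_{i,0} - x'_{i,1}\beta - \mu_i\right)\prod_{\tau=1}^{t}\alpha_{i,\tau} + x'_{i,t}\beta + \mu_i + A_{i,t},
\]

where

\[
A_{i,t} = \sum_{l=2}^{t}\left(\varepsilon_{i,l-1} - \Delta x'_{i,l}\beta\right)\prod_{\tau=l}^{t}\alpha_{i,\tau} + \varepsilon_{i,t} .
\]

Plugging this back into (21) yields the expression:

\[
\Delta y_{i,t} = -\left(1-\alpha_{i,t}\right)\left[\left(y_{i,0} - x'_{i,1}\beta - \mu_i\right)\prod_{\tau=1}^{t-1}\alpha_{i,\tau} + A_{i,t-1} - \Delta x'_{i,t}\beta\right] + \varepsilon_{i,t} . \tag{23}
\]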

The difference Δyi,t−p is a function of all εi,τ, xi,τ and αi,τ, τ ≤ t − p, as well as of

the initial condition, the deviation yi,0 − x′i,1β − μi. One of the requirements for the

covariance of Δyi,t−p and (1 − αi,t)μi to disappear is therefore a limited memory of


αi,t = α′ri,t with respect to its own past. Fixed effects in ri,t are thus excluded. This

would be hard to defend in many applications, given the presence of a fixed effect in

the law of motion governing yi,t .

In order to weaken the requirements, one may decompose the individual target

level, μi, into its expectation over all individuals, μe, and the individual deviation from this expectation, μ∗i. Let, therefore,

\[
\mu_i = \mu^{e} + \mu^{*}_i, \quad\text{with}\quad \mu^{e} = E_i\left(\mu_i\right) .
\]

By definition, E(μ∗i) = 0. Rewriting the adjustment equation in (22) gives

\[
y_{i,t} = \alpha' r_{i,t}\, y_{i,t-1} + (1-\alpha)' r_{i,t}\, x'_{i,t}\beta + \mu^{e}(1-\alpha)' r_{i,t} + \underbrace{\mu^{*}_i\,(1-\alpha)' r_{i,t} + \varepsilon_{i,t}}_{\text{latent term}} . \tag{24}
\]

Written this way, the equation contains a regime-specific shift term μe(1 − α)′ri,t. In estimation, this term can be taken into account by introducing the regime vector ri,t as a regressor into the equation.

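Proposition 3 Consider the conditions

\[
E\left(\varepsilon_{i,t} \mid \varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, \Delta x_{i,t-k}, \Delta x_{i,t-k-1}, \ldots, r_{i,t-k}, r_{i,t-k-1}, \ldots, y_{i,0} - x'_{i,1}\beta - \mu_i\right) = 0, \tag{25}
\]

\[
E\left(\mu^{*}_i \mid \{\varepsilon_{i,t}\}, \{r_{i,t}\}, \{\Delta x_{i,t}\},\, y_{i,0} - x'_{i,1}\beta - \mu_i\right) = 0, \tag{26}
\]

with k ≥ 1, where a term in curly brackets denotes an entire time series. Jointly, these conditions are sufficient for the following moment restrictions to hold in Eq. (24):

\[
E\left[\Delta y_{i,t-p}\left(\varepsilon_{i,t} + \left(1-\alpha_{i,t}\right)\mu^{*}_i\right)\right] = 0 \quad\text{with } p \ge k . \tag{27}
\]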

Proof See Appendix B.

It follows immediately from the condition (25) that appropriately lagged values

Δxi,t−p and ri,t−p can also be used as instruments. Some comments are in order.

It is natural that one has to impose conditions on μ∗i, now that μi is not differenced out

of the error term. The invariance of expected μ∗i with respect to the time path {εi,t}

is rather unproblematic. It agrees well with the basic structure of the error component

model. The irrelevance of the regime process is less innocuous. It is well conceivable

that a real-world data generating process for ri,t may contain a fixed effect that is

correlated with μ∗i. Similar reservations apply with respect to the required irrelevance

of {Δxi,t}. Finally, the necessity of having an expected value of μi that is independent

of the initial deviation was also found by Blundell and Bond (1998) when investigating

the use of moment equations for levels in a linear context. The condition is not

innocuous either: it excludes an initial condition such as yi,0 = 0. It can be replaced


by the requirement that the process has been running for a ‘very long’ time, as the first

term inside the bracket of Eq. (23) will disappear asymptotically.7

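As a corollary to Proposition 3, it follows that lags of Δyi,t can also be used as instruments in a differenced version of the augmented adjustment Eq. (24):

\[
\Delta y_{i,t} = \alpha'\Delta\left(r_{i,t}\, y_{i,t-1}\right) + (1-\alpha)'\Delta\left(r_{i,t}\, x'_{i,t}\right)\beta - \mu^{e}\alpha'\Delta r_{i,t} + \underbrace{\Delta\varepsilon_{i,t} - \mu^{*}_i\,\alpha'\Delta r_{i,t}}_{\text{latent term}} .
\]

Under conditions (25) and (26), the following restriction will hold8:

\[
E\left[\Delta y_{i,t-p-1}\left(\Delta\varepsilon_{i,t} - \Delta\alpha_{i,t}\,\mu^{*}_i\right)\right] = 0 \quad\text{with } p \ge k . \tag{28}
\]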

Note that the moment restrictions for differences in (28) do not use all the information

contained in the moment restriction for levels: the first are implied by the latter but not

vice versa. Furthermore, because the residuals in (28) are first differenced, one observation

is lost, and the instruments have to be removed one period in time. However,

the moment condition is not necessarily useless: estimators based on condition (28)

may be more robust against violations of assumption (26) regarding the fixed effect,

especially when regime changes are relatively infrequent, as μ∗i is differenced out of

(28) whenever ri,t = ri,t−1.

5 A synopsis

At this point, it is interesting to compare the conditions for Propositions 1, 2 and 3.

All of them require the expected value of εi,t to be invariant with respect to past

values εi,t−k, εi,t−k−1, . . ., the levels or first differences of xi,t−k , xi,t−k−1, . . . as well

as to μi and/or the initial deviation. Propositions 1 and 2 also need εi,t to be uncorrelated

with ri,t−1, the regime indicator figuring in the current date adjustment equation,

whereas for Proposition 3, invariance of εi,t with respect to lag k and earlier of the

regime indicator is sufficient. As an additional identifying assumption for the Generalised

Differencing approach, the memory of ri,t needs to be finite with respect to lags

of εi,t . This excludes, for example, an autoregressive process for the state variable

underlying the adjustment indicator, with the innovation contemporaneously correlated

to εi,t . The level estimator, for its part, needs the expected value of the individual

effect μi to be unrelated to the process governing the idiosyncratic error, changes

in the forcing term xi,t , the regimes and the initial deviation. Both these restrictions

may impose considerable limitations. However, estimators based on Propositions 2

and 3 are able to fulfil special tasks. The Generalised Difference estimator will be

unbiased even if some of the alpha coefficients are large—in fact, it still works if

7 Such a process may also be observed by means of a ‘short’ panel—what matters is not the length of

the panel, but whether or not the process has been running long enough to bring the effect of the initial

condition in Eq. (23) into the neighbourhood of zero.

8 This follows directly from E[Δyi,t−p−1(εi,t−1 + (1 − αi,t−1)μ∗i)] = 0 and E[Δyi,t−p−1(εi,t + (1 − αi,t)μ∗i)] = 0.


one of them is exactly equal to 1 or even greater. Like the standard first-difference

estimator in the linear case, the Generalised Difference estimator can be supposed to

deliver imprecise results if all the adjustment coefficients are in the neighbourhood

of 1, as then the level instruments are weak. In this case, the level estimator will perform

better. Perhaps even more importantly, this latter estimator is also capable of

dealing with regime indicators that are contemporaneously correlated with the error

term.

6 Implementing and simulating the estimators

This section compares the four sets of moment conditions set out in Propositions 1, 2 and 3, using them separately for estimation on simulated panel data sets.

6.1 Setting up the simulation

For the regime indicator, a threshold process is specified. The kth element of ri,t is given by

$$r^{(k)}_{i,t} = \mathrm{Ind}\left(\bar{s}_{k-1} \le s_{i,t} \le \bar{s}_{k}\right).$$

The numbers $\bar{s}_0, \ldots, \bar{s}_L$ are thresholds, with the first and the last element being equal to $-\infty$ and $\infty$, respectively. As an example for a threshold process with infinite memory with respect to the error term, an AR(1) is used as a process for the latent state si,t:

$$s_{i,t} = a\,s_{i,t-1} + \upsilon_{i,t},$$

where the current shock υi,t is contemporaneously correlated with the error term εi,t.

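Alternatively, as an example of a process with finite memory, it is assumed that the threshold process is driven by an MA(q):

$$s_{i,t} = b + \sum_{j=0}^{q} c_j\,\eta_{i,t-j}, \quad \text{with } c_0 = 1.$$

The elements of the moving average conform to

$$E\left(\eta_{i,t}\right) = 0,\quad E\left(\eta_{i,t}\eta_{i,t-p}\right) = 0\ \ \forall p > 0,\quad E\left(\eta_{i,t}\varepsilon_{i,t}\right) \neq 0,\quad E\left(\eta_{i,t}\varepsilon_{i,t-p}\right) = 0\ \ \forall p > 0.$$

Concretely, the two interrelated processes $\{r_{i,t}, y_{i,t}\}$ are simulated as follows: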

Regime-dependent error correction process: εi,t is standard normal, μi is distributed

N (1, 1) , εi,t and μi are independent.

Regime indicator process: Regarding the number of regimes, let L = 2. If the threshold process is driven by an AR(1), then let $E(\upsilon_{i,t}^{2}) = 1$, $E(\upsilon_{i,t}\varepsilon_{i,t}) = 0.8$, υi,t being calculated as a weighted sum of εi,t and an independent Gaussian process. The


Panel estimation of state-dependent adjustment

AR-parameter a is 0.8. Likewise, for the MA(q), the stochastic structure is chosen as $E(\eta_{i,t}^{2}) = 1$, $E(\eta_{i,t}\varepsilon_{i,t}) = 0.8$, with ηi,t being calculated as a weighted sum of εi,t and an independent Gaussian process. The threshold level is set equal to zero, resulting in an equal number of observations in each regime on average. Let us experiment with an MA(0) (uncorrelated regime states) and an MA(1) with c1 = 0.8. Note that the assumed contemporaneous correlation between the shocks in the regime equation and the error term is very high.
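The regime indicator process described above can be sketched in a few lines. This is a minimal illustration in Python, not the Ox code used for the paper; the function names, the seed handling and the choice b = 0 for the MA constant are assumptions made here. A shock with unit variance and correlation 0.8 with εi,t is obtained as the weighted sum 0.8 εi,t + 0.6 z, with z an independent standard normal draw, since 0.8² + 0.6² = 1.

```python
import math
import random


def correlated_shock(eps, rho, rng):
    """Unit-variance shock with correlation rho to the standard normal draw eps."""
    z = rng.gauss(0.0, 1.0)  # independent Gaussian component
    return rho * eps + math.sqrt(1.0 - rho ** 2) * z


def simulate_regimes(eps, rho=0.8, process="AR1", a=0.8, c1=0.8,
                     threshold=0.0, seed=0):
    """Latent state and two-regime indicator for one individual (assumed layout).

    process="AR1": s_t = a * s_{t-1} + shock_t        (infinite memory)
    process="MA1": s_t = shock_t + c1 * shock_{t-1}   (finite memory, b = 0 assumed)
    process="MA0": s_t = shock_t                      (uncorrelated regime states)
    """
    rng = random.Random(seed)
    s_prev, shock_prev = 0.0, 0.0
    regimes = []
    for e in eps:
        shock = correlated_shock(e, rho, rng)
        if process == "AR1":
            s = a * s_prev + shock
        elif process == "MA1":
            s = shock + c1 * shock_prev
        elif process == "MA0":
            s = shock
        else:
            raise ValueError("unknown process: " + process)
        # With L = 2 the thresholds are -inf, 0 and +inf
        regimes.append(1 if s <= threshold else 2)
        s_prev, shock_prev = s, shock
    return regimes


# Example: 50 simulated periods for one individual
_rng = random.Random(42)
eps = [_rng.gauss(0.0, 1.0) for _ in range(50)]
regimes_ma1 = simulate_regimes(eps, process="MA1")
regimes_ar1 = simulate_regimes(eps, process="AR1")
```

In an actual simulation the same εi,t draws would also feed the adjustment equation, which is what creates the correlation between regime and error term.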

Panel structure: The panel is unbalanced, with individuals carrying either 8, 9 or

10 observations, 1,000 individuals of each type, that is, 3,000 individuals in total. For

each individual, the process is simulated for 50 periods, and only the last 8, 9 or 10

observations are used for estimation.
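The observation counts reported in Tables 1, 2 and 4 can be reproduced by simple arithmetic on this panel structure. The sketch below is illustrative only: the function name is hypothetical, and the reading that the quasi-difference estimators lose two observations per individual and the level estimator one is inferred here from the reported totals (21,000 and 24,000), not stated in the text.

```python
# Observations per individual -> number of individuals of that type
panel = {10: 1000, 9: 1000, 8: 1000}


def valid_observations(panel, lost_per_individual):
    """Observations left after dropping a fixed number per individual."""
    return sum(n * (t_obs - lost_per_individual) for t_obs, n in panel.items())


total_obs = valid_observations(panel, 0)   # all observations in the panel
qd_obs = valid_observations(panel, 2)      # consistent with Tables 1 and 2
level_obs = valid_observations(panel, 1)   # consistent with Table 4
```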

All the estimators are implemented by first calculating the transformed observations

and the instruments and then adapting and using the routines supplied with the

DPD module for Ox proposed by Doornik et al. (2002) to perform GMM estimates

and tests.9 Details on the estimation routines are given below and in the notes to the

tables.

6.1.1 Quasi-difference estimations QD1 and QD2

Let us assume an AR(1) as a process driving the threshold variable that constitutes the

regime. The estimation equations are transformed in the way described in Sect. 3. The

first quasi-differencing approach, QD1, is implemented by estimating the transformed

equation using a standard linear GMM estimator and then calculating the structural

parameters by inverting Eq. (15). The more complicated QD2 estimation is performed

by treating the moment as a nonlinear function of the structural parameters, using the

iterative Gauss–Newton method.

Estimates on the basis of the QD1 transformation are used as initial values. As

instruments, levels lagged twice are used. It turns out that the instruments are more

informative (the estimates being more precise) if they are separated out in regimes,

which means: For purposes of instrumentation, the lags of yi,t−2 are interacted with

regime dummies, ri,t−2.
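Separating the instruments out by regime can be sketched as follows. This is a hedged illustration, not the DPD/Ox implementation; the function name and the list-based data layout are assumptions. For each period t, the lagged level yi,t−2 is placed in the column of the regime that was active at t − 2, and the other column is set to zero.

```python
def interacted_instruments(y, r, lag=2, n_regimes=2):
    """Return {t: [y_{t-lag} * 1(r_{t-lag} == k) for k = 1..n_regimes]}."""
    rows = {}
    for t in range(lag, len(y)):
        rows[t] = [y[t - lag] if r[t - lag] == k else 0.0
                   for k in range(1, n_regimes + 1)]
    return rows


# Example with four periods and two regimes
y_example = [1.0, 2.0, 3.0, 4.0]
r_example = [1, 2, 2, 1]
instruments = interacted_instruments(y_example, r_example)
```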

6.1.2 Generalised Difference estimation

The transformation described in Proposition 2 consists in taking the τth difference, with τ chosen such that regimes ri,t−1 and ri,t−τ−1 match, subject to some minimum order of difference. Available instruments are levels lagged τ + 1, τ + 2, . . .. As the appropriate τ depends on the regime process, so does the set of instruments. By taking the earlier of the two observations as a point of reference yi,t and assigning to it the nearest lead yi,t+τ of the same regime with τ ≥ 2 + q, the definition of suitable instruments is straightforward. One can uniformly use lags yi,t−1, yi,t−2 and earlier as instruments. As in Quasi-Difference estimation, let us interact the lagged levels yi,t−1

9 Ox is an object-oriented matrix programming language. For a complete description of Ox, see Doornik

(2001).


with regime indicators ri,t−1. In order to test the validity of the transformation, regime

dummies are included as additional RHS variables. They also enter the instrument

set.
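The pairing rule for Generalised Differences can be sketched as a small search. This is an illustration under stated assumptions, not the author's routine: the order of the difference is called tau here, r[t] denotes the regime observed in period t (so r[t − 1] governs the adjustment in period t), and the function name is hypothetical. For each reference period t the code looks for the nearest lead t + tau, with tau ≥ 2 + q, whose governing regime matches that of the reference period.

```python
def generalised_difference_pairs(r, q=0):
    """Return (t, tau) pairs: y[t + tau] - y[t] is formed when r[t + tau - 1] == r[t - 1]."""
    n_periods = len(r)
    min_lead = 2 + q
    pairs = []
    for t in range(1, n_periods):
        # y[t + tau] must exist, so tau <= n_periods - 1 - t
        for tau in range(min_lead, n_periods - t):
            if r[t + tau - 1] == r[t - 1]:
                pairs.append((t, tau))
                break  # keep only the nearest admissible lead
    return pairs


r_demo = [1, 2, 1, 1, 2, 1]
pairs_ma0 = generalised_difference_pairs(r_demo, q=0)  # minimum lead 2
pairs_ma1 = generalised_difference_pairs(r_demo, q=1)  # minimum lead 3
```

Reference periods without a matching lead are dropped, which is why this approach uses fewer observations than quasi-differencing.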

6.1.3 Level estimation

As described in Sect. 4, the level estimator is implemented by specifying an auxiliary

equation that contains a set of regime dummies as an additional RHS variable. Instruments

are first differences of lagged endogenous variables, interacted with regime indicators, ri,t−1Δyi,t−1 and ri,t−2Δyi,t−2 (four variables!), plus differenced indicators for regime 1 taken from ri,t−1, ri,t−2. Simulations are performed both for the case where a predetermined regime ri,t−1 enters the adjustment equation, and for

the case of a contemporaneously correlated regime ri,t governing the adjustment.

6.2 Simulation results

Tables 1 and 2 show estimates on the basis of quasi-difference transformations QD1

and QD2 (1,000 runs). The theoretical discussion has shown that the finite sample

properties of the estimators may depend on the size of the regime-specific coefficients,

notably on their difference from 1. Therefore, estimations for a whole range

of parameters are shown. The true value for α1 is set as 0.3, whereas the value for α2

ranges from 0.3 to 0.9. Larger ranges and finer steps are plotted in Figs. 1 and 2.

Table 1 and Fig. 1 display results for the simpler QD1 transformation. Although

for smaller coefficient values, the estimator performs well and yields correct estimates

with a good precision, it is less reliable if one of the regime-specific coefficients is

large. For α1 = α2 = 0.3, the mean bias is only of the order of −0.004 for both parameters. It grows to −0.0133 for ˆα2 when α2 is raised to 0.7, and for α2 = 0.9, the finite sample bias of ˆα2 becomes a non-negligible −0.0414.10 The estimates ˆα1 also deteriorate, although less markedly. The table also gives t-values and Sargan statistics. The bias leads the t-tests to reject the true value too often when one of the coefficients is too high: in the extreme case of α2 = 0.9, the true value is rejected 77.9% of the time. The same is true for the Sargan test of instrument validity: with large regime-specific coefficients, it rejects the instruments 81.6% of the time when α2 = 0.9. One can

conclude that slow speeds of adjustment (high persistence) create a problem for QD1

estimation.

Table 2 and Fig. 2 give results for the QD2 transformation. As is expected, for large

values of regime specific adjustment coefficients the estimator performs better than

its counterpart based on QD1. In the extreme cases of α1 = 0.3 and α2 = 0.9, the bias

is still only 0.0152 and −0.0209, respectively. For smaller values of regime-specific

coefficients, there is hardly any bias at all. Sargan statistics and t-values are reliable,

except for very high values of α2.

10 Whether one considers the bias as large will also depend on the way one looks at the parameter. The state-dependent speed of adjustment is given by 1 − αi,t−1. A bias of −0.0414 when the true value of α2 is 0.9 will, therefore, overestimate the adjustment speed by 41.4%.
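As a worked check of this percentage, using the mean estimate and mean bias of ˆα2 reported in Table 1 for α2 = 0.9 and the definition of the adjustment speed as one minus the coefficient:

$$\frac{\left|\text{bias}(\hat{\alpha}_2)\right|}{1-\alpha_2} = \frac{0.0414}{1-0.9} = 0.414,$$

that is, the implied adjustment speed $1 - 0.8586 = 0.1414$ exceeds the true speed of 0.1 by roughly 41%.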


Table 1 Quasi-differences, QD1 transformation, 1,000 runs

Simulation #                                          (1)       (2)       (3)       (4)
Specification state variable underlying regimes       AR(1)     AR(1)     AR(1)     AR(1)
True α1                                               0.3       0.3       0.3       0.3
True α2                                               0.3       0.5       0.7       0.9

α1
Mean parameter estimate                               0.2930    0.2955    0.2939    0.2687
Mean bias                                             −0.0041   −0.0045   −0.0061   −0.0313
Mean estimated std. deviation                         0.0220    0.0236    0.0276    0.0351
Std. dev. parameter estimate                          0.0218    0.0247    0.0298    0.0533
RMSE                                                  0.0222    0.0251    0.0304    0.0618
Freq. rejections of true value on 5% conf. level      4.6%      6.8%      5.9%      25.7%

α2
Mean parameter estimate                               0.2957    0.4938    0.6868    0.8586
Mean bias                                             −0.0043   −0.0062   −0.0133   −0.0414
Mean estimated std. deviation                         0.0194    0.0189    0.0177    0.0139
Std. dev. parameter estimate                          0.0197    0.0190    0.0188    0.0203
RMSE                                                  0.0202    0.0200    0.0230    0.0262
Freq. rejections of true value on 5% conf. level      6.0%      5.4%      12.3%     77.9%

Freq. rejection by Sargan–Hansen on 5% conf. level    8.1%      9.4%      16.4%     81.6%
Valid obs. in estimation                              21,000    21,000    21,000    21,000

Notes: the table shows GMM estimates of α1 and α2 on the basis of the transformation QD1, see Proposition

1. Columns vary by parameters α1 and α2 used for generating panels according to Eq.(2). Each column

represents 1,000 repetitions of two-stage GMM estimates using an unbalanced panel of 3,000 individuals

with 10, 9 and 8 observations (1,000 individuals each). The number of valid observations is reduced

by the need to transform variables. Instruments are the levels of ri,t−2 yi,t−2 (i.e. two interaction terms)

and a constant. Estimated standard deviations are derived from reduced form estimates using the delta

method. Sargan-Hansen test is the test of overidentifying restrictions associated with Sargan (1958) and

Hansen (1982). Estimation is executed using DPD package version 1.2 on Ox version 3.30 and additional,

user-written routines

The theoretical discussion in Sect. 3 has shown that the precision of the QD2 estimator

should depend on the ratio of adjustment speeds. If both of them are high, but

of similar size, then the ratio (1 − αi,t−2)/(1 − αi,t−1) in the definition of the transformed error term ξi,t cancels out in Eq. (11). The error term in QD1, in contrast, depends on the absolute distance of the regime-specific coefficients from unity. To study this issue, the simulations of QD1 and QD2 estimation are performed using a value of α1 = 0.8 as a platform and varying over α2. The result is shown in Figs. 3 (QD1

estimation) and 4 (QD2 estimation). Here, the QD1 estimates are biased throughout

the range. The bias of ˆα1 switches from positive to negative, whereas the bias of ˆα2 is

negative throughout. In contrast, with QD2, the bias practically disappears when both

parameters are large, to be noticeable only when α1 is small.

Table 3 and Figs. 5 and 6 give results using GMM on observations transformed by

Generalised Differences. In Columns (1) and (2), the estimator is correctly used. The memory of the regime process is restricted: Column (1) assumes uncorrelated regimes, and Column (2) assumes a threshold process driven by an MA(1). The minimum leads used in transformation are 2 and 3, respectively. In both cases, the Generalised

Difference estimator performs well. The estimates are unbiased. The standard deviations

are similar to what can be obtained from the quasi-difference estimates for the


Table 2 Quasi-differences, QD2 transformation, 1,000 runs

Simulation #                                          (1)       (2)       (3)       (4)
Specification state variable underlying regimes       AR(1)     AR(1)     AR(1)     AR(1)
True α1                                               0.3       0.3       0.3       0.3
True α2                                               0.3       0.5       0.7       0.9

α1
Mean parameter estimate                               0.2998    0.3006    0.3021    0.3152
Mean bias                                             −0.0002   0.0006    0.0021    0.0152
Mean estimated std. deviation                         0.0221    0.0229    0.0261    0.0418
Std. dev. parameter estimate                          0.0217    0.0235    0.0270    0.0463
RMSE                                                  0.0217    0.0235    0.0271    0.0487
Freq. rejections of true value on 5% conf. level      4.7%      5.8%      5.8%      9.5%

α2
Mean parameter estimate                               0.2985    0.4982    0.6943    0.8791
Mean bias                                             −0.0014   −0.0018   −0.0057   −0.0209
Mean estimated std. deviation                         0.0195    0.0194    0.0187    0.0174
Std. dev. parameter estimate                          0.0195    0.0192    0.0188    0.0170
RMSE                                                  0.0196    0.0193    0.0197    0.0269
Freq. rejections of true value on 5% conf. level      5.9%      4.5%      5.9%      23.0%

Freq. rejection by Sargan–Hansen on 5% conf. level    5.2%      6.0%      6.0%      22.9%
Valid obs. in estimation                              21,000    21,000    21,000    21,000

Notes: the table shows GMM estimates of α1 and α2 on the basis of the transformation QD2, see Proposition

1. Columns vary by parameters α1 and α2 used for generating panels according to Eq. (2). Each

column represents 1,000 repetitions of a two-stage GMM procedure iterating on pseudoregressors, using

an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations (1,000 individuals each). As an

initial value, an estimate on the basis of QD1 was used. The number of valid observations is reduced by

the need to transform variables. Instruments are the levels of ri,t−2 yi,t−2 (i.e. two interaction terms) and

a constant. Estimated standard deviations are calculated as a by-product from the final Gauss–Newton iteration

step. Sargan-Hansen test is the test of overidentifying restrictions associated with Sargan (1958) and

Hansen (1982). Estimation is executed using DPD package version 1.2 on Ox version 3.30 and additional,

user-written routines

smaller of the two coefficients and actually somewhat lower for the higher coefficient.

In the case of an MA(1) regime process, standard deviations are higher, as fewer observations can be used. Column (1), with a minimum lead of 2, yields an average of

15,058 valid observations per estimation. This number decreases to 11,277 in Column

(2), when a minimum lead of 3 is imposed. On the same set of simulated data, the

estimates based on quasi-differencing can use 21,000 observations each run. Figure 5

shows that the average deviation of the Generalised Difference estimator from the true

parameter value is very small when the conditions for its use are met and does not

depend systematically on the size of the adjustment coefficients. Even regime-specific

coefficients equal to or larger than 1 can be accommodated, as long as the overall

process remains stable. Columns (3) and (4) do ‘the wrong thing’. For Column (3), a

minimum lead of 2 is used on data generated with a regime process generated by an

MA(1), where a lead of ≥ 3 is warranted. Column (4) assumes an AR(1) process

driving the threshold variable: this process has infinite memory. Unsurprisingly, in

both cases, the estimator turns out to be biased. However, in spite of a strong correlation

between the shock in the regime variable and the error term, the bias is moderate.

In Column (3), only the estimates ˆα2 are biased, to a degree that is similar to the

performance of the QD2 estimator under the same (unfavourable) parameter values.


Fig. 1 Mean bias for estimates on the basis of QD1, with α1 = 0.3 and α2 varying [plot: "Quasi-Differences 1: Bias as a function of alpha2"; series: bias of alpha1 and bias of alpha2, each against alpha2]

Fig. 2 Mean bias for estimates on the basis of QD2, with α1 = 0.3 and α2 varying [plot: "Quasi-Differences 2: Bias as a function of alpha2"; series: bias of alpha1 and bias of alpha2, each against alpha2]

When, as assumed in Column (4), the regime process is driven by a process with

infinite memory, the resulting bias is larger, similar in size to the weak performance

of the QD1 estimator when one of the coefficients is large. Figure 6 shows how in this

latter case the bias depends on the alpha-parameters.


Fig. 3 Mean bias for estimates on the basis of QD1, with α1 = 0.8 and α2 varying [plot: "Quasi-Differences 1: Bias as a function of alpha2"; series: bias of alpha1 and bias of alpha2, each against alpha2]

Fig. 4 Mean bias for estimates on the basis of QD2, with α1 = 0.8 and α2 varying [plot: "Quasi-Differences 2: Bias as a function of alpha2"; series: bias of alpha1 and bias of alpha2, each against alpha2]

The specification tests do not fail to detect the erroneous assumption regarding the warranted order of differencing. In both cases, the regime constant test rejects the specification in 100% of the cases. As the estimated coefficients are of opposite sign, they cannot be caused by trending target values. The regime dummies


Table 3 Generalised Differences estimation, (α1, α2) = (0.3, 0.8), 1,000 runs

                                                      ... using appropriate   ... using inappropriate
                                                      leads                   leads
                                                      (1)       (2)           (3)       (4)
Specification state variable underlying regimes       MA(0)     MA(1)         MA(1)     AR(1)
Lead used in transformation                           2         3             2         2

α1
Mean estimate (true value 0.3)                        0.2990    0.2994        0.2950    0.2767
Mean est. std. dev.                                   0.0215    0.0286        0.0243    0.0223
Mean bias                                             −0.0010   −0.0006       −0.0050   −0.0232
RMSE                                                  0.0118    0.0276        0.0239    0.0322
Freq. rejections of true value on 5% conf. level      6.4%      3.8%          5.1%      18.0%

α2
Mean estimate (true value 0.8)                        0.7978    0.7995        0.7736    0.7568
Mean est. std. dev.                                   0.0261    0.0304        0.0298    0.0276
Mean bias                                             −0.0021   −0.0005       −0.0264   −0.0432
RMSE                                                  0.0113    0.0296        0.0399    0.0518
Freq. rejections of true value on 5% conf. level      3.7%      4.5%          14.2%     34.6%

Specification tests
G1
Mean estimate                                         −0.0001   0.0000        −0.0765   −0.0977
Mean est. std. dev.                                   0.0116    0.0145        0.0114    0.0102
Freq. rejections of zero value on 5% conf. level      5.8%      5.4%          100%      100%
G2
Mean estimate                                         −0.0007   −0.0010       0.0822    0.0977
Mean est. std. dev.                                   0.0114    0.0141        0.0114    0.0102
Freq. rejection of zero value on 5% conf. level       4.9%      4.7%          100%      100%

Freq. rejection by Sargan–Hansen on 5% conf. level    5.1%      4.8%          91.9%     23.2%
Av. no. of valid observations                         15,058    11,277        14,107    15,117

Notes: the table shows GMM estimates of α1 and α2 on the basis of Generalised Differencing, see Proposition

2. Columns vary by the stochastic specification of the regime indicator when generating the panels

according to Eq. (2) and by the lead used for transformation. Columns (1), (2), and (3) specify processes

where the memory of the regime variable is limited over time, and the state variable that underlies the regime

indicator follows an MA process. In column (4), the regime process is supposed to have infinite memory.

In all columns, α1 = 0.3 and α2 = 0.8. Each column represents 1,000 repetitions of two-stage GMM

estimates using an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations (1,000 individuals

each). The number of valid observations is reduced by the need to transform variables. Instruments are the

levels of ri,t−1 yi,t−1 (i.e. two interaction terms) and a constant. G1 and G2 are regime dummy coefficients

introduced as a specification test for the correct lag length, see Subsection 3.3. Sargan-Hansen test is the test

of overidentifying restrictions associated with Sargan (1958) and Hansen (1982). Estimation is executed

using DPD package version 1.2 on Ox version 3.30 and additional, user-written routines

have ‘captured’ the regime-specific non-zero expectations of the differenced residuals $E\left(\varepsilon_{i,t} - \varepsilon_{i,t-2} \mid r_{i,t-1},\ r_{i,t-1} = r_{i,t-3}\right)$ for the two values that ri,t−1 can take. The Sargan test is sensitive to the misspecification in Column (3) where the wrong

lead is used, rejecting 91.9% of the estimates. Detecting an infinite memory of the

regime variable is harder for the Sargan test: only 23.2% of estimates in Column (4)

are rejected.


Fig. 5 Mean bias for Generalised Differences estimates, with α1 = 0.8 and α2 varying. Here regime process uncorrelated over time, correct lead of 2 [plot: "GD estimation: bias as a function of alpha2"; series: bias of alpha1 and bias of alpha2, each against alpha2]

Fig. 6 Mean bias for Generalised Differences estimates, with α1 = 0.8 and α2 varying. Here regime process unlimited memory AR(1), misspecified lead of 2 [plot: "GD estimation: bias as a function of alpha2"; series: bias of alpha1 and bias of alpha2, each against alpha2]

Table 4, together with Figs. 7 and 8, shows simulation results for the level estimator,

both for the case of a predetermined regime and a contemporaneous regime.

In both cases, a regime process with infinite memory is assumed. In the table and


Table 4 Level estimation, 1,000 runs

Simulation #                                          (1)       (2)       (3)       (4)
Regime indicator                                      Predetermined       Contemporaneous
State variable underlying regimes                     AR(1)     AR(1)     AR(1)     AR(1)
True α1                                               0.3       0.3       0.3       0.3
True α2                                               0.8       1.1       0.8       1.1

α1
Mean parameter estimate                               0.3031    0.3004    0.3006    0.2951
Mean bias                                             0.0031    0.0004    0.0006    −0.0049
Mean estimated std. deviation                         0.0197    0.0074    0.0255    0.0073
Std. dev. parameter estimate                          0.0187    0.0078    0.0252    0.0079
RMSE                                                  0.0190    0.0078    0.0252    0.0094
Freq. rejections of true value on 5% conf. level      4.3%      4.2%      4.7%      10.5%

α2
Mean parameter estimate                               0.7987    1.1001    0.7891    1.1004
Mean bias                                             −0.0013   0.0001    −0.0109   0.0004
Mean estimated std. deviation                         0.0188    0.0013    0.0283    0.0017
Std. dev. parameter estimate                          0.0191    0.0013    0.0285    0.0017
RMSE                                                  0.0192    0.0013    0.0305    0.0017
Freq. rejections of true value on 5% conf. level      5.7%      5.2%      7.0%      5.6%

Auxiliary regime constants
G1
Mean estimate                                         0.6985    0.698     0.6795    0.6922
Theoretically expected                                0.7       0.7       0.7       0.7
G2
Mean estimate                                         0.2030    −0.0984   0.2434    −0.0852
Theoretically expected                                0.2       −0.1      0.2       −0.1

Freq. rejection by Sargan–Hansen on 5% conf. level    4.1%      3.0%      5.2%      4.9%
Valid obs. in estimation                              24,000    24,000    24,000    24,000

Notes: the table shows GMM estimates of α1 and α2 on the basis of level estimation (see Proposition 3).

Columns vary by parameters α2 and by the stochastic specification of the regime indicator used for generating

the panels according to Eq. (2). In all cases, the regime process is supposed to have infinite memory,

following an AR(1) process. Columns (1) and (2) relate to processes where the regime variable is predetermined

in the adjustment equation, and Columns (3) and (4) relate to results for regime variables that are

contemporaneously correlated with the error term. In all columns, α1 = 0.3. While Columns (1) and (3)

specify α2 = 0.8, columns (2) and (4) show results for α2 = 1.1. Each column represents 1,000 repetitions

of two-stage GMM estimates using an unbalanced panel of 3,000 individuals with 10, 9 and 8 observations (1,000 individuals each). Instruments are first differences of lagged endogenous variables, interacted with the regime indicators, ri,t−1Δyi,t−1 and ri,t−2Δyi,t−2 (i.e. four variables) plus dummies for the first regime from ri,t−1 and ri,t−2. G1 and G2 are coefficients of regime dummies introduced into the equation to capture

the regime-specific shift term in Eq. (24). Sargan–Hansen test is the test of overidentifying restrictions

associated with Sargan (1958) and Hansen (1982). Estimation is executed using DPD package version 1.2

on Ox version 3.30 and additional, user-written routines

the figures α2 varies, with a fixed value of α1 = 0.3. In the predetermined case,

there is little bias over the whole range of parameters, with the possible exception

of α2 = 1, where the bias of ˆα1 assumes a moderate value of 0.0117 (not shown in

the table). Standard deviations are similar to those that were obtained with the other

estimators. If α2 assumes a value larger than 1, then the estimates become extremely

exact.

Columns (3) and (4), as well as Fig. 8, show that the level estimator indeed successfully

copes with contemporaneous regime variables, a problem that cannot be


solved by any of the other approaches. There is a moderate bias that peaks at 0.012

for ˆα2 when α2 = 0.9 (not shown in the table), and the standard deviations are higher

than with a predetermined regime for α2 < 1. Again, for α2 > 1, the level estimates

become very exact. In all the columns, the regime dummy is very near the theoretical value of $E\left(1-\alpha_{i,t}\right)\mu^{e}$, a term that is introduced into Eq. (24) by splitting up the firm fixed effect into its expectation and a deviation uncorrelated with the shocks in the other processes.
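The "theoretically expected" regime constants in Table 4 can be checked by hand. The sketch below assumes, in line with the simulation design (μi distributed N(1, 1)), that the expected individual effect is 1, so that the constant for a regime with coefficient α is (1 − α) times that expectation; the function name is hypothetical.

```python
def expected_regime_constant(alpha, mu_mean=1.0):
    """Regime-specific shift term (1 - alpha) * E(mu), under the assumptions above."""
    return (1.0 - alpha) * mu_mean


# Coefficient values used in Table 4: alpha1 = 0.3, alpha2 = 0.8 or 1.1
regime_constants = {a: round(expected_regime_constant(a), 10)
                    for a in (0.3, 0.8, 1.1)}
```

The three values correspond to the entries 0.7 (G1) and 0.2 or −0.1 (G2) listed as theoretically expected in the table.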

7 Conclusion and outlook

Four different ways of estimating an adjustment equation with time-varying persistence

have been presented, all within a GMM framework, albeit with a different set

of moment conditions.

Two estimation techniques rely on transforming the original equation using quasi-differences.

Both quasi-differences estimators are very precise when all the coefficients

are small. When both coefficients are large and of similar size (high persistence

throughout the regimes), the results of QD1 estimation have been shown to be unusable

in simulation, whereas the QD2 approach continues to deliver correct results. In

von Kalckreuth (2008a), the QD2 estimator is successfully employed for estimating

differential adjustment speeds for the capital stock. The most difficult parameterisation

is observed when coefficients are widely different, while one of them is large.

While affected by small sample problems, the QD2 estimator performs clearly better

in this situation. In direct comparison, the major virtue of the QD1 estimator lies in

its surprising simplicity.

The third method involves transformation using Generalised Differences, with a

lead that is long enough to overcome the memory of the εi,t -shocks in the process

driving the regime indicator. This method is applicable only when the memory of the

regime process is limited. We have seen above how to test this requirement. Although

a limited memory may be a good approximation in a number of circumstances, such

as investment under financing constraints, the requirement will not always be fulfilled.

If the conditions are met, then this method leads to a linear estimator which remains

unbiased even if some of the coefficients are in the neighbourhood of 1 or larger. The

fourth method leaves the equation untransformed, and past differences are used as

instruments. Regime dummies are employed to capture and neutralise the time-varying

non-zero expected value of the residual process. The memory of the regime process

is irrelevant for this technique. However, one needs to assume the individual-specific

deterministic equilibrium as being unrelated to the process governing the idiosyncratic

error, changes in the forcing term xi,t , the initial deviation and the regimes. The level

estimator is very precise with regard to larger coefficients. This is not really surprising:

the use of level equations was originally proposed to overcome the problem

of weak instruments in cases where the autoregressive parameter approaches unity.

More important is another virtue of the fourth method: the level estimator is the sole

procedure that can be used when the regime indicator is contemporaneous to the error

term in the adjustment equation.


Fig. 7 Mean bias for level estimation with predetermined regimes, with α1 = 0.3 and α2 varying [plot: "Level estimation: bias as a function of alpha2"; series: bias of alpha1 and bias of alpha2, each against alpha2]

Fig. 8 Mean bias for level estimation with contemporaneous regimes, with α1 = 0.3 and α2 varying [plot: "Level estimation: bias as a function of alpha2"; series: bias of alpha1 and bias of alpha2, each against alpha2]

In dealing with a practical estimation problem, one first of all needs to decide

whether the assumption of predetermined regimes is warranted in the given situation.

If this is the case, then it is the quasi-differencing methods that impose the least

stringent conditions. They can be used and interpreted like first-difference estimations

in the standard model. Owing to the fact that the nonlinear transformation will affect


the latent terms in QD1 more strongly than in QD2, the latter is to be preferred, although

the former may be used as a starting point for specification search and as a convenient

way to generate initial values for QD2 estimation. If adjustment speeds are low and

dissimilar, then even QD2 can lead to non-trivial small sample biases. In this case, the

Generalised Differences method may be preferable, subject to a test on the requirement

of finite memory, in spite of losing many observations.

If adjustment regimes are likely to be contemporaneously correlated, then estimation

using the level approach is possible if the fixed effect is ‘benevolent’, as explained

above. In that case, the level approach will also be a useful device if the speed of adjustment

is slow. Under the same conditions, one may also use the lagged first differences

as instruments in a differenced equation. This, however, is less efficient, as observations

are lost and not all the information in the moment restriction is used.

In deciding upon the use of the moment conditions, one may ask whether they

can be meaningfully combined in estimation. In general, this will not be the case.

If regimes are contemporaneously correlated with the idiosyncratic error term, then

the moment conditions from Propositions 1 and 2 should not be used at all. If the

regime can be considered as predetermined, then one will want to avoid imposing the

additional conditions needed for the level estimator. They are more restrictive than

in the standard autoregressive case. With predetermined regimes, the moment conditions

from Propositions 1 and 2 are to be regarded as alternatives. For higher speeds

of adjustment, they are very similar, and a possible gain from augmenting QD2 by

Generalised Differences will not be worth the risk of erroneously imposing additional

constraints, whereas, for very low and dissimilar speeds, even QD2 breaks down and

Generalised Differencing should be used on its own.
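The guidance in this section can be condensed into a small decision helper. This is only one possible reading of the recommendations above, written as a sketch; the function name, argument names and returned labels are hypothetical, and a real application would rest on the specification tests rather than on boolean flags.

```python
def suggest_estimator(regime_timing, fixed_effect_unrelated=False,
                      speeds_low_and_dissimilar=False, finite_memory=False):
    """Map the stated conditions to one of the estimators discussed in the text."""
    if regime_timing == "contemporaneous":
        # Only the level estimator copes with contemporaneous regimes, and only
        # if the individual effect satisfies the additional restrictions.
        return "level" if fixed_effect_unrelated else None
    if regime_timing == "predetermined":
        if speeds_low_and_dissimilar and finite_memory:
            return "generalised differences"
        # QD1 may still serve to generate initial values for QD2.
        return "QD2"
    raise ValueError("regime_timing must be 'predetermined' or 'contemporaneous'")


choice_default = suggest_estimator("predetermined")
choice_slow = suggest_estimator("predetermined",
                                speeds_low_and_dissimilar=True, finite_memory=True)
choice_contemp = suggest_estimator("contemporaneous", fixed_effect_unrelated=True)
```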

Appendix A: A state-dependent ECM

Formally, the state-dependent adjustment equation considered in this article involves a

lagged dependent variable and a forcing term xi,t . However, in addition, higher-order

adjustment processes can be accommodated, by redefining states appropriately.

Consider a linear autoregressive process with distributed lags in a forcing term xi,t and an individual specific constant μi:

$$A(L)\,y_{i,t} = B(L)\,x_{i,t} + \mu_i + \varepsilon_{i,t},$$

where A(L) and B(L) are lag polynomials. As is well known, the process can always be written in the error correction format. If, for example, A(L) and B(L) are of order 2, then this leads to

$$\begin{aligned}
\Delta y_{i,t} = {} & -\phi\left(y_{i,t-1} - \beta' x_{i,t-1} - \mu_i^{*}\right) \\
& + \gamma^{0\prime}\Delta x_{i,t} + \gamma^{1\prime}\Delta x_{i,t-1} + \omega\,\Delta y_{i,t-1} + \varepsilon_{i,t}.
\end{aligned}$$

In the first line, the term in brackets is the deviation from the static equilibrium, where β may be interpreted as a cumulative long-run effect of a shock in xi,t. The transformed constant $\mu_i^{*}$ is equal to $[A(L)]^{-1}\mu_i$. The term φ is the speed of adjustment. If the process is stable, then |φ| < 1. The second line depicts the transitional dynamics,


which is not directly related to the deviation from equilibrium. With A (L) or B (L)

of order higher than 2, the transitional dynamics in the error correction format would

involve higher-order lags of differences Δxi,t and Δyi,t.
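The equivalence of the two representations can be verified numerically for the second-order case. The sketch below assumes a scalar forcing term and writes A(L) = 1 − a1 L − a2 L² and B(L) = b0 + b1 L + b2 L²; these coefficient names and the mapping function are notation introduced here for illustration. Under that notation, φ = 1 − a1 − a2, β = (b0 + b1 + b2)/φ, μ* = μ/φ, γ0 = b0, γ1 = −b2 and ω = −a2.

```python
import random


def ardl_to_ecm(a1, a2, b0, b1, b2, mu):
    """Map second-order ARDL coefficients to error correction parameters."""
    phi = 1.0 - a1 - a2
    return {"phi": phi, "beta": (b0 + b1 + b2) / phi, "mu_star": mu / phi,
            "gamma0": b0, "gamma1": -b2, "omega": -a2}


def simulate_both(a1, a2, b0, b1, b2, mu, x, eps):
    """Generate y from the ARDL form and from the ECM form with the same shocks."""
    p = ardl_to_ecm(a1, a2, b0, b1, b2, mu)
    y_ardl, y_ecm = [0.0, 0.0], [0.0, 0.0]
    for t in range(2, len(x)):
        y_ardl.append(a1 * y_ardl[t - 1] + a2 * y_ardl[t - 2]
                      + b0 * x[t] + b1 * x[t - 1] + b2 * x[t - 2] + mu + eps[t])
        dy = (-p["phi"] * (y_ecm[t - 1] - p["beta"] * x[t - 1] - p["mu_star"])
              + p["gamma0"] * (x[t] - x[t - 1])
              + p["gamma1"] * (x[t - 1] - x[t - 2])
              + p["omega"] * (y_ecm[t - 1] - y_ecm[t - 2])
              + eps[t])
        y_ecm.append(y_ecm[t - 1] + dy)
    return y_ardl, y_ecm


_rng = random.Random(1)
x_path = [_rng.gauss(0.0, 1.0) for _ in range(30)]
eps_path = [_rng.gauss(0.0, 1.0) for _ in range(30)]
y_ardl, y_ecm = simulate_both(0.5, 0.2, 0.4, 0.3, 0.1, 1.0, x_path, eps_path)
max_gap = max(abs(u - v) for u, v in zip(y_ardl, y_ecm))
```

Both recursions trace out the same path up to floating-point rounding, which is the sense in which the error correction format is a reparameterisation rather than a different model.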

A generalisation of the adjustment process considered hitherto makes φ, γ0, γ1, and ω state-dependent, while leaving the transformed constant $\mu_i^{*}$ and the long-run effect β time invariant. The latter imposes a constraint on the time-varying coefficients:

$$\begin{aligned}
\Delta y_{i,t} = {} & -\phi_{i,t-1}\left(y_{i,t-1} - \beta' x_{i,t-1} - \mu_i^{*}\right) + \gamma^{0\prime}_{i,t-1}\Delta x_{i,t} \\
& + \gamma^{1\prime}_{i,t-1}\Delta x_{i,t-1} + \omega_{i,t-1}\Delta y_{i,t-1} + \varepsilon_{i,t}.
\end{aligned}$$

Now let $r_{i,t}$ again be an indicator variable characterising the speed of adjustment. As the adjustment process is parameterised over two lags, it is straightforward to model the time-varying parameters as a function of the state variables in two periods, $t-1$ and $t-2$. Finally, let $d_{i,t-1}$ be an indicator vector of dummies for all the possible values that the pair $(r_{i,t-1}, r_{i,t-2})$ can take. Then we can write

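\[
\phi_{i,t-1} = \varphi' d_{i,t-1}, \quad \omega_{i,t-1} = \omega' d_{i,t-1}, \quad \gamma_{i,t-1}^{0} = \Gamma_0 d_{i,t-1}, \quad \gamma_{i,t-1}^{1} = \Gamma_1 d_{i,t-1},
\]
with $\varphi$, $\omega$, $\Gamma_0$ and $\Gamma_1$ vectors and matrices of state-dependent adjustment coefficients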

remaining to be estimated. Written this way, the problem is fully equivalent to the one

that has been treated in this article, with di,t−1 taking the place of ri,t−1 with respect

to the adjustment speed, φi,t−1, and using appropriate interaction terms for all the

other state-dependent coefficients. With the help of quasi-differencing or Generalised

Differencing, one can eliminate the fixed effect from the adjustment equation. With

contemporaneous adjustment coefficients, one may use the level estimator. It has to

be noted though that—compared to a first-order adjustment process—the Generalised

Difference estimator will be difficult to use, as there are $L^2$ states to be considered

here, and only pairs of observations belonging to the same regime with a given minimum

time distance can be used. The other two estimation principles are not affected

by this profusion of states, except for the fact that the number of coefficients is higher.
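The construction of the dummy vector for the two-period state can be sketched as follows. This is an illustrative sketch only. It assumes regimes are coded 0, ..., L − 1 and that the pairs of lagged regimes are ordered lexicographically. The function names and coefficient values are hypothetical.

```python
from itertools import product

def state_dummies(r_lag1, r_lag2, n_regimes):
    """One-hot vector over all pairs of lagged regimes; its length is n_regimes**2."""
    pairs = list(product(range(n_regimes), repeat=2))
    return [1 if (r_lag1, r_lag2) == pair else 0 for pair in pairs]

def state_coefficient(coef_by_state, d):
    """Inner product of a coefficient vector with the dummy vector."""
    return sum(c * d_k for c, d_k in zip(coef_by_state, d))

# Two regimes give four composite states: (0,0), (0,1), (1,0), (1,1)
d = state_dummies(r_lag1=1, r_lag2=0, n_regimes=2)
print(d)  # [0, 0, 1, 0]

# Hypothetical adjustment speeds, one per composite state
speeds = [0.1, 0.2, 0.3, 0.4]
phi = state_coefficient(speeds, d)
print(phi)  # 0.3
```

The length of the dummy vector grows with the square of the number of regimes, which is the profusion of states referred to above.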

Appendix B: Proofs

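Proof of Proposition 1 If $E(\varepsilon_{i,t} \mid \Omega_{i,t-1}) = 0$, then any function $f(\Omega_{i,t-1})$ will be orthogonal to $\varepsilon_{i,t}$, because
\[
E\left[ f(\Omega_{i,t-1})\, \varepsilon_{i,t} \right]
= E_{\Omega_{i,t-1}}\left[ E\left( f(\Omega_{i,t-1})\, \varepsilon_{i,t} \mid \Omega_{i,t-1} \right) \right]
= E_{\Omega_{i,t-1}}\left[ f(\Omega_{i,t-1})\, E\left( \varepsilon_{i,t} \mid \Omega_{i,t-1} \right) \right] = 0. \tag{29}
\]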

Consider first $E[y_{i,t-p}\psi_{i,t}]$, with $p \geq 2$. Equations (3) and (4) show that $y_{i,t-p}$ is a function of $r_{i,t-p-1}, r_{i,t-p-2}, \ldots, x_{i,t-p}, x_{i,t-p-1}, \ldots, \varepsilon_{i,t-p}, \varepsilon_{i,t-p-1}, \ldots, \mu_i, y_{i,0}$. The expressions $1/(1-\alpha_{i,t-1})$ and $1/(1-\alpha_{i,t-2})$ are functions of $r_{i,t-1}$ and $r_{i,t-2}$. Applying (29) to the products $\frac{y_{i,t-p}}{1-\alpha_{i,t-1}}\varepsilon_{i,t}$ and $\frac{y_{i,t-p}}{1-\alpha_{i,t-2}}\varepsilon_{i,t-1}$ yields $E[y_{i,t-p}\psi_{i,t}] = 0$. The same argument holds for $E[y_{i,t-p}\xi_{i,t}]$, with $p \geq 2$.
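The orthogonality argument in (29) can be illustrated by simulation. The sketch below is a hedged illustration rather than part of the proof. The conditioning variable z, the function cos(z), and the heteroskedastic error are all assumptions chosen for the example. The error has mean zero conditional on z even though its variance depends on z.

```python
import math
import random

def orthogonality_check(n_draws=100_000, seed=0):
    """Sample means of f(z)*eps and eps*eps when E[eps | z] = 0."""
    rng = random.Random(seed)
    total_f_eps = 0.0
    total_eps_sq = 0.0
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)                      # stand-in for the conditioning information
        eps = (1.0 + abs(z)) * rng.gauss(0.0, 1.0)   # conditional mean zero, variance depends on z
        total_f_eps += math.cos(z) * eps             # f(z) is a function of z only
        total_eps_sq += eps * eps                    # eps is not orthogonal to itself
    return total_f_eps / n_draws, total_eps_sq / n_draws

mean_f_eps, mean_eps_sq = orthogonality_check()
print(mean_f_eps, mean_eps_sq)
```

The first sample mean should be close to zero, as (29) implies, while the second remains clearly positive, since the error is not a function of the conditioning information alone.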

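Proof of Proposition 2 The proposition follows from the law of iterated expectations:
\[
\begin{aligned}
& E\left[ y_{i,t-\ell-p} \left( \varepsilon_{i,t} - \varepsilon_{i,t-\ell} \right) \mid r_{i,t-1}, r_{i,t-\ell-1} \right] \\
&\quad = E_{y_{i,t-\ell-p}}\left[ E\left( y_{i,t-\ell-p} \left( \varepsilon_{i,t} - \varepsilon_{i,t-\ell} \right) \mid r_{i,t-1}, r_{i,t-\ell-1}, y_{i,t-\ell-p} \right) \right] \\
&\quad = E_{y_{i,t-\ell-p}}\left[ y_{i,t-\ell-p} \cdot E\left( \varepsilon_{i,t} - \varepsilon_{i,t-\ell} \mid r_{i,t-1}, r_{i,t-\ell-1}, y_{i,t-\ell-p} \right) \right] = 0,
\end{aligned}
\]
because the conditional expectation within the brackets is zero for $\ell \geq 2 + q$. The backward solution (3) decomposes $y_{i,t}$ into the initial deviation, $y_{i,0} - x_{i,1}'\beta - \mu_i$, and the history of $x_{i,t}$, $r_{i,t}$ and $\varepsilon_{i,t}$. The assumption (20) ensures that conditioning on $r_{i,t-1}$, $r_{i,t-\ell-1}$ and $y_{i,t-\ell-p}$, $p \geq 1$, will preserve a zero expectation of $\varepsilon_{i,t}$ and $\varepsilon_{i,t-\ell-1}$.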

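Proof of Proposition 3 The restriction (27) holds for $p = k$ if, first,
\[
E\left[ y_{i,t-k}\, \varepsilon_{i,t} \right] = E\left[ y_{i,t-k} \cdot E\left( \varepsilon_{i,t} \mid y_{i,t-k} \right) \right] = 0, \tag{30}
\]
and second,
\[
E\left[ y_{i,t-k} \left( 1 - \alpha_{i,t} \right) \mu_i^{*} \right] = 0. \tag{31}
\]
Given the backward solution (23), condition (25) is sufficient for the expectation in the bracket of (30) to be identically zero, as $y_{i,t-k}$ is a function of
\[
\varepsilon_{i,t-k}, \varepsilon_{i,t-k-1}, \ldots, x_{i,t-k}, x_{i,t-k-1}, \ldots, r_{i,t-k}, r_{i,t-k-1}, \ldots, y_{i,0} - x_{i,1}'\beta - \mu_i.
\]
Similarly, one has
\[
E\left[ y_{i,t-k} \left( 1 - \alpha_{i,t} \right) \mu_i^{*} \right]
= E\left[ y_{i,t-k} \left( 1 - \alpha_{i,t} \right) \cdot E\left( \mu_i^{*} \mid y_{i,t-k} \left( 1 - \alpha_{i,t} \right) \right) \right].
\]
If the expectation of $\mu_i^{*}$ is zero conditional on all random variables that constitute $y_{i,t-k}$ according to its reduced form in (23), then the expectation in (31) vanishes.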

Acknowledgments The author thanks Jörg Breitung for important discussions, encouragement and

patience. Olympia Bover made a vital comment that gave the article a new turn. Vassilis Hajivassiliou

and Sarah Rupprecht discussed earlier conference versions. Two anonymous referees made extremely helpful,

detailed and constructive comments. This article has been presented in part or fully at the 2009 Panel

Data Conference in Bonn, the 2008 Econometric Society European Meeting in Milan, the 2007 Deutsche

Bundesbank and Banque de France Spring Conference on Microdata Analysis and Macroeconomic

Implications in Eltville, the 2007 Annual Meeting of the Verein für Socialpolitik in Munich and the 2007

CES-Ifo Conference on Survey Data in Economics—Methodology and Applications, in Munich.

References

Anderson TW, Hsiao C (1982) Formulation and estimation of dynamic models using panel data. J Economet

18:47–82

Arellano M, Bond S (1991) Some tests of specification for panel data: Monte Carlo evidence and an application

to employment equations. Rev Econ Stud 58:277–297

Arellano M, Bover O (1995) Another look at the instrumental variable estimation of error component

models. J Economet 68:29–51

Bayer C (2006) Investment dynamics with fixed adjustment costs and capital market imperfections. J Monetary

Econ 53:1909–1947

Blundell R, Bond S (1998) Initial conditions and moment restrictions in dynamic panel data models. J

Economet 87:115–143

Bond S, Lombardi D (2007) To buy or not to buy? Uncertainty, irreversibility and heterogeneous investment

dynamics in Italian company data. IMF Staff Papers 53:375–400

Bond S, Elston JA, Mairesse J, Mulkay B (2003) Financial factors and investment in Belgium, France,

Germany, and the United Kingdom: a comparison using company panel data. Rev Econ Stat 85:153–

165

Caballero RJ, Engel EMRA (1999) Explaining investment dynamics in U.S. manufacturing: a generalised

(S,s) approach. Econometrica 67:783–826

Caballero RJ, Engel EMRA (2004) A comment on the economics of labor adjustment: mind the gap: reply.

Am Econ Rev 94:1238–1244

Caballero RJ, Engel EMRA, Haltiwanger JC (1995) Plant level adjustment and aggregate investment

dynamics. Brookings Papers Econ Act 1995(2):1–39

Caballero RJ, Engel EMRA, Haltiwanger JC (1997) Aggregate employment dynamics: building from

microeconomic evidence. Am Econ Rev 87:115–137

Chamberlain G (1983) Panel data, Chap 22. In: Griliches Z, Intriligator M (eds) The handbook of econometrics,
vol II. North Holland, Amsterdam, pp 1247–1318

Cooper R, Willis JL (2004) A comment on the economics of labor adjustment: mind the gap. Am Econ Rev

94:1223–1237

Davidson R, MacKinnon JG (1993) Estimation and inference in econometrics. Oxford University Press,

New York

Doornik JA (2001) Ox 3.0. An object-oriented matrix programming language, 4th edn. Timberlake Consultants,

London

Doornik JA, Arellano M, Bond S (2002) Panel data estimation using DPD for Ox documentation accompanying

the DPD for Ox module code, dated 23 Dec 2002

Hansen L (1982) Large sample properties of generalized method of moments estimators. Econometrica

50:1029–1054

Hayashi F (2000) Econometrics. Princeton University Press, Princeton

Holtz-Eakin D, Newey WK, Rosen HS (1988) Estimating vector autoregressions with panel data.

Econometrica 56:1371–1395

Judge GG, Griffiths WE, Hill RC, Lütkepohl H, Lee TC (1985) The theory and practice of econometrics.

2nd edn. Wiley, New York

Sargan JD (1958) The estimation of economic relationships using instrumental variables. Econometrica

26:393–415

von Kalckreuth U (2004) Financial constraints for investors and the speed of adaptation: are innovators

special? Deutsche Bundesbank Discussion Paper Series 1, No. 20/04

von Kalckreuth U (2006) Financial constraints and capacity adjustment: evidence from a large panel of

survey data. Economica 73:691–724

von Kalckreuth U (2008a) Financing constraints, micro adjustment of capital demand and aggregate implications.

Deutsche Bundesbank Discussion Paper Series 1, No 11/08

von Kalckreuth U (2008b) Financing constraints and the adjustment dynamics of enterprises. Habilitation

thesis, University of Mannheim, May 2008

Woodford M (2003) Interest and prices. Foundations of a theory of monetary policy. Princeton University

Press, Princeton
