II.ii. What Do We Mean by Upgrading? Measurement Issues

The word upgrading is used in a variety of ways. This section aims to clarify, conceptually and empirically, how the term has been used and to highlight the strengths and weaknesses of existing empirical measures.


2.2 Measurement Issues

We now turn to the question of how to measure upgrading. There is an important tension here. On one hand, the most common empirical measure, total factor productivity (TFP) in various forms, is conceptually attractive in that it is aimed directly at measuring firm capabilities, and improvements in capabilities in theory bear an unambiguously positive relationship to firm performance. But TFP measures also suffer from a number of well-known potential biases.[1] On the other hand, direct indicators of product quality, product innovation, and technology adoption are increasingly available, and are arguably more credible measures of the dimensions of upgrading they seek to capture. But it is not always obvious what constitutes an “improvement” on these dimensions, and the indicators are typically only available in particular sectors, raising questions about external validity. This subsection considers the strengths and weaknesses of the different measures that have been employed in the literature.

2.2.1 Measures of Productivity

The standard approach to TFP estimation begins by positing the existence of a firm-level production function, most commonly Cobb-Douglas, for instance:

$$y_i = \beta_k k_i + \beta_\ell \ell_i + \beta_m m_i + \omega_i + \varepsilon_i \qquad (3)$$

where $y_i$ is log output, typically sales deflated by a sector-level output price deflator; $k_i$ is log capital; $\ell_i$ is log labor (employment or hours); $m_i$ is log materials, typically expenditures deflated by a sector-level input price deflator; $\omega_i$ is “ex ante” productivity, which the firm knows before choosing the variable inputs $\ell_i$ and $m_i$; and $\varepsilon_i$ is an “ex post” shock, realized after the firm has made its input decisions.[2] The coefficients $\beta_k$, $\beta_\ell$, and $\beta_m$ are then estimated by one of several methods (discussed briefly below), and TFP is estimated as $\widehat{TFP}_i = y_i - \hat\beta_k k_i - \hat\beta_\ell \ell_i - \hat\beta_m m_i$.
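As a minimal sketch of these mechanics, the Python code below simulates hypothetical firms from $y_i = \beta_k k_i + \beta_\ell \ell_i + \beta_m m_i + \omega_i + \varepsilon_i$, estimates the coefficients by OLS (with an intercept), and computes $\widehat{TFP}_i$ as the residual defined above. The parameter values, the normal distributions, and the helper names (`ols`, `simulate_firms`, `estimate_tfp`) are illustrative assumptions. Inputs are drawn independently of $\omega_i$, so OLS is consistent here by construction; the transmission bias discussed below is deliberately switched off.

```python
import random


def ols(X, y):
    """Least-squares coefficients solving the normal equations X'X b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[a] * r[c] for r in X) for c in range(k)]
         + [sum(r[a] * v for r, v in zip(X, y))]
         for a in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, k):
            f = M[i][col] / M[col][col]
            for j in range(col, k + 1):
                M[i][j] -= f * M[col][j]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


def simulate_firms(n, betas=(0.3, 0.5, 0.2), seed=0):
    """Hypothetical (y, k, ell, m, omega) tuples, all in logs."""
    rng = random.Random(seed)
    bk, bl, bm = betas
    firms = []
    for _ in range(n):
        k, ell, m = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        omega = rng.gauss(0, 0.5)  # ex ante productivity, independent of inputs here
        eps = rng.gauss(0, 0.1)    # ex post shock
        firms.append((bk * k + bl * ell + bm * m + omega + eps, k, ell, m, omega))
    return firms


def estimate_tfp(firms):
    """OLS on (1, k, ell, m); TFP-hat = y - bk*k - bl*ell - bm*m, as in the text."""
    X = [[1.0, k, ell, m] for _, k, ell, m, _ in firms]
    y = [f[0] for f in firms]
    _, bk, bl, bm = ols(X, y)
    tfp = [yi - bk * k - bl * ell - bm * m for yi, k, ell, m, _ in firms]
    return (bk, bl, bm), tfp


firms = simulate_firms(5000)
betas, tfp = estimate_tfp(firms)
print([round(b, 2) for b in betas])  # should be close to the assumed (0.3, 0.5, 0.2)
```

Because the estimated intercept is not subtracted, $\widehat{TFP}_i$ here contains $\omega_i$, $\varepsilon_i$, and any constant, matching the definition in the text.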

One under-appreciated issue with this approach is that if the firm is actually a collection of production lines, as in the framework above, then it is not obvious that there exists an “aggregate” production function that fully summarizes the relationship between inputs and outputs at the firm level. Under certain conditions, production-line-level production functions such as the Fijk(·) in equation (1) aggregate into a firm-level function such as equation (3).[3] This finding is analogous to earlier results on the aggregation of firm-level production functions to a macro-level production function, going back to Houthakker (1955). But the assumptions required in the earlier literature have been criticized as special and unlikely to hold in practice (Felipe and Fisher, 2003), and a similar point could be made about the aggregation from the firm-product-technique level to the firm level. A main defense of standard aggregate production functions has been that they seem to work pretty well, in that they provide a reasonable fit between aggregate inputs and aggregate output and the estimated factor elasticities of output are consistent with observed factor shares (Fisher, 1971; Fisher et al., 1977), and a similar defense could be made for firm-level production functions such as equation (3). But given the shaky microfoundations, caution is warranted in interpreting them. The caveat of Mairesse and Griliches (1988) still seems apt: “[T]he simple production function model ... is at best just an approximation to a much more complex and changing reality at the firm, product, and factory floor level” (p. 28).

Much of the recent literature on production-function estimation has been concerned with a different problem, the “transmission bias” recognized by Marschak and Andrews (1944): in the context of equation (3), if a firm observes that it has a high ex ante productivity, then it may choose to use more labor and materials, generating a correlation between $\omega_i$ and both $\ell_i$ and $m_i$ and biasing OLS estimates. The most common way to address this issue is to construct an observable proxy for the ex ante productivity term, using either investment (Olley and Pakes, 1996) or materials (Levinsohn and Petrin, 2003; Ackerberg et al., 2015).[4] These approaches have recently been criticized by Gandhi et al. (forthcoming), who argue that the Olley-Pakes and Levinsohn-Petrin estimators are not non-parametrically identified; they propose using the first-order condition for the choice of materials as an additional source of identification.[5] It is also important to note that the monotonicity assumption required for standard proxy-variable methods is strong; in the Olley-Pakes version, for instance, heterogeneity across firms in the extent to which they are credit constrained or face adjustment costs of capital would violate the required assumption (Griliches and Mairesse, 1998; Ackerberg et al., 2015).
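The transmission bias can be seen in a small simulation, sketched here in Python under assumed parameter values. For simplicity the sketch drops materials (capital and labor only) and lets log labor respond to $\omega_i$ through a hypothetical rule, `ell = 0.8 * omega + noise`. OLS then overstates $\beta_\ell$. An infeasible regression that includes the true $\omega_i$ removes the bias, which is the logic behind proxy-variable methods: they try to build an observable stand-in for $\omega_i$. The sketch does not implement the Olley-Pakes, Levinsohn-Petrin, or Ackerberg et al. estimators themselves.

```python
import random


def ols(X, y):
    """Least-squares coefficients solving the normal equations X'X b = X'y."""
    k = len(X[0])
    M = [[sum(r[a] * r[c] for r in X) for c in range(k)]
         + [sum(r[a] * v for r, v in zip(X, y))]
         for a in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, k):
            f = M[i][col] / M[col][col]
            for j in range(col, k + 1):
                M[i][j] -= f * M[col][j]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


def simulate(n, bk=0.4, bl=0.6, seed=1):
    """Hypothetical firms whose labor choice responds to ex ante productivity."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        omega = rng.gauss(0, 0.5)              # ex ante productivity, observed by the firm
        k = rng.gauss(0, 1)                    # capital, independent of omega here
        ell = 0.8 * omega + rng.gauss(0, 0.5)  # assumed rule: more labor when omega is high
        y = bk * k + bl * ell + omega + rng.gauss(0, 0.1)
        rows.append((y, k, ell, omega))
    return rows


rows = simulate(5000)
y = [r[0] for r in rows]
# Naive OLS: omega sits in the error term and is correlated with ell.
b_naive = ols([[1.0, k, ell] for _, k, ell, _ in rows], y)
# Infeasible benchmark: control for the true omega directly.
b_oracle = ols([[1.0, k, ell, w] for _, k, ell, w in rows], y)
print(f"beta_l: naive {b_naive[2]:.2f}, controlling for omega {b_oracle[2]:.2f} (true 0.60)")
```

With these assumed values the naive labor coefficient is pushed well above 0.6, while the capital coefficient is roughly unaffected because capital is drawn independently of $\omega_i$.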

A separate issue arises because it is rare to observe physical quantities of outputs or inputs. It is common to use sector-level output and input price deflators to deflate firm-level revenues (or value added) and input expenditures. But as De Loecker and Goldberg (2014) point out, this can give rise to potentially severe biases if idiosyncratic factors that affect output or input prices are correlated with a firm’s input choices, as in general one would expect them to be.[6] Datasets with physical quantities at the firm-product level are increasingly available, and in sectors with homogeneous products the quantity information can help to address these biases. In US data, Foster et al. (2008) focus on 11 arguably homogeneous products and estimate a function with physical output on the left-hand side, to yield what they call TFPQ (Q for quantity). They contrast it with a measure of TFP estimated with revenues on the left-hand side, TFPR (R for revenues). Although the US data do not contain physical quantities of inputs, such information is available in a few other countries (Chile, Colombia, Ecuador, Peru, Portugal, and Spain, among others), and one could in principle include physical inputs on the right-hand side to solve the input-price bias problem.
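A short arithmetic sketch makes the deflator problem concrete. When log revenue $p_i + q_i$ is deflated by a sector-level log price index $P$, measured output is $q_i + (p_i - P)$, so revenue-based productivity differs from quantity-based productivity by exactly the firm’s log price relative to the sector index. The sketch assumes a single composite input with a known elasticity, an accurately deflated input bundle, and hypothetical firm values; the names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Firm:
    name: str
    log_q: float       # log physical output
    log_p: float       # log firm-level output price
    log_inputs: float  # log of a composite input bundle (hypothetical single input)


BETA = 1.0                  # assumed input elasticity (constant returns, one input)
LOG_SECTOR_DEFLATOR = 0.0   # sector-level log output price index


def tfpq(f):
    """Quantity-based productivity: physical output net of inputs."""
    return f.log_q - BETA * f.log_inputs


def tfpr(f):
    """Revenue-based productivity: deflated revenue net of inputs."""
    deflated_revenue = f.log_p + f.log_q - LOG_SECTOR_DEFLATOR
    return deflated_revenue - BETA * f.log_inputs


firms = [Firm("A", 2.0, 0.0, 1.5), Firm("B", 2.0, 0.3, 1.5), Firm("C", 1.8, 0.5, 1.5)]
for f in firms:
    print(f.name, round(tfpq(f), 2), round(tfpr(f), 2), round(tfpr(f) - tfpq(f), 2))
```

In this example firm C has lower TFPQ than firm B but the same TFPR, because its higher relative price exactly offsets its lower physical output.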

But it is important to be aware that quantity-based TFP measures are likely to be a misleading indicator of firm capability in the presence of quality differences in either inputs or outputs (Katayama et al., 2009; Grieco and McDevitt, 2016). Intuitively, a firm may take advantage of increased capability to raise quality rather than simply to increase physical output, leading quantity-based TFP to understate the true capability change. Differences in input quality may generate an offsetting bias, if firms are able to create more units of output out of higher-quality inputs. In the Melitz (2003)-type theoretical model of Kugler and Verhoogen (2012), these effects can arise under certain parameter values.[7] This is not just a theoretical curiosity. In an experiment discussed at greater length below, Atkin et al. (2017a, 2019) randomly allocated export contacts to Egyptian rug producers. They find that the producers increased exports, quality (which they measure directly), and profits, as might be expected, but decreased square meters of rug woven per hour and TFPQ. In laboratory conditions, when producing identical rugs, the treated weavers were no slower than the non-treated weavers and they produced higher-quality rugs. In this setting, it seems clear that TFPQ is misleading as a measure of firm performance.[8] Although it may only be in extreme cases that measured TFPQ is negatively affected by increases in firm capability, we would expect quality changes to drive a wedge between TFPQ and capability (the theoretical concept one would like to measure) in a wide variety of circumstances.[9] Quality bias of this sort is likely to be particularly salient in developing countries as firms enter world markets, because of the large differences in incomes between domestic and rich-country consumers.
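The direction of the quality bias can be illustrated with hypothetical numbers loosely patterned on the rug example; they are assumptions for illustration, not figures from Atkin et al. Suppose labor hours are the only input and a producer, after upgrading, weaves fewer square meters per hour but sells each square meter at a higher price. Quantity-based productivity then falls while revenue per hour rises.

```python
import math


def log_change(before, after):
    """Change in logs, approximately the percentage change for small changes."""
    return math.log(after) - math.log(before)


# Hypothetical producer before and after a quality upgrade (labor is the only input).
hours = 100.0
sqm_before, sqm_after = 10.0, 8.0        # square meters woven in `hours`
price_before, price_after = 20.0, 30.0   # price per square meter (higher quality after)

tfpq_change = log_change(sqm_before / hours, sqm_after / hours)
tfpr_change = log_change(price_before * sqm_before / hours,
                         price_after * sqm_after / hours)
print(f"TFPQ: {tfpq_change:+.3f} log points, TFPR: {tfpr_change:+.3f} log points")
```

The gap between the two changes equals the log change in price, which here reflects the quality upgrade rather than any change in technical efficiency per unit of a given quality.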

A natural response to the issue of quality bias is to revert to using revenues on the left-hand side and expenditures on the right-hand side. Using price times quantity, rather than just quantity, should take into account quality differences, since they are presumably reflected in prices. But prices also reflect things other than quality, in particular markups. In imperfectly competitive industries, TFPR is a measure both of technical efficiency (the ability to transform physical inputs into physical outputs) and of the ability to sell at a price above marginal cost (De Loecker and Goldberg, 2014). It may well be the best measure of firm performance available for quality-differentiated industries, but one should not interpret it solely as a measure of technical efficiency. One way to address this issue is to estimate markups directly to separate them from marginal costs (which reflect technical efficiency); we return to this issue below.

When estimating productivity with the new data on physical quantities, one must also decide whether and how to aggregate across products in multi-product firms. Even datasets with product-level information typically do not report which inputs are used to produce which outputs.[10] One approach is to focus on single-product firms and possibly to do a selection correction for the fact that they are not representative (Foster et al., 2008; De Loecker et al., 2016; Balat et al., 2018). Another is to impose theoretical structure on the demand side and to use the model to infer how firms would allocate inputs to outputs if they were behaving optimally (Orr, 2018; Valmari, 2016). The literature has not yet converged on a consensus approach to this issue.
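The first approach, restricting attention to single-product firms, starts from a simple filter on firm-product records. A minimal Python sketch, assuming hypothetical records of the form (firm id, year, product code); any selection correction would be a separate step not shown here.

```python
from collections import defaultdict


def single_product_firm_years(records):
    """Return the (firm, year) pairs in which the firm reports exactly one distinct product."""
    products = defaultdict(set)
    for firm, year, product in records:
        products[(firm, year)].add(product)
    return {key for key, prods in products.items() if len(prods) == 1}


records = [  # hypothetical firm-product-year records
    ("f1", 2010, "rugs"),
    ("f1", 2010, "carpets"),
    ("f2", 2010, "rugs"),
    ("f2", 2011, "rugs"),
    ("f3", 2011, "tiles"),
    ("f3", 2011, "tiles"),  # duplicate line for the same product
]
keep = single_product_firm_years(records)
print(sorted(keep))
```

Counting distinct products (a set) rather than lines keeps duplicate records, such as firm f3’s, from being mistaken for multi-product production.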

In sum, although TFP measures have the attractive property that they aim directly at estimating firm capabilities, existing estimation methods suffer from a number of well-known difficulties and may reflect a number of other factors besides capabilities - notably markups in the case of TFPR and endogenous quality choices in the case of TFPQ. We will see below that results for TFP outcomes are often mixed. This may in part be due to a confounding of effects on firm capabilities with effects on markups or quality choices.

2.2.2 Measures of Quality

Direct measures of quality are not available in standard firm-level datasets and are typically quite difficult to come by. But a few studies have had access to direct information on firm-level quality choices. Several recent papers have used quality ratings (or prizes at tasting competitions) for wines, in France (Crozet et al., 2012), Chile (Macchiavello, 2010), and Argentina (Chen and Juvenal, 2016, 2018, 2019). Studies have taken advantage of direct information on quality of Egyptian rugs (Atkin et al., 2017a), sweetness of watermelons (Bai, 2018), contamination of dairy products (Bai et al., 2017), automobile defects (Bai et al., 2019), the protein content of fishmeal (Hansman et al., forthcoming), and coffee bean characteristics such as size and defect rates (Macchiavello and Miquel-Florensa, 2018, 2019).[11] Verhoogen (2008) proxies for quality using ISO 9000 certification, an international production standard. Accessing more direct measures of quality to examine firm-level quality choices is a promising direction for research.

An alternative approach is to construct measures of quality from information on prices and quantities, which requires theoretical structure. Khandelwal et al. (2013) show how this can be done in trade-transactions data on Chinese textile and clothing firms. In a Melitz (2003)-type model where a representative consumer has CES preferences and values product quality, the product-level demand functions facing a firm can be written as $\ln Y_{ijt} = -\sigma \ln P_{ijt} + \alpha_j + \alpha_t + \varepsilon_{ijt}$, where $Y_{ijt}$ is the quantity of product $j$ sold by firm $i$ in year $t$, $P_{ijt}$ is its price, $\sigma$ is the elasticity of substitution between products, $\alpha_j$ and $\alpha_t$ are product and year fixed effects, respectively, and $\varepsilon_{ijt}$ equals (log) quality times $\sigma - 1$.[12] The authors set $\sigma = 4$, the median elasticity of substitution for clothing and textile products from Broda et al. (2006), and rewrite the expression as $\ln Y_{ijt} + \sigma \ln P_{ijt} = \alpha_j + \alpha_t + \varepsilon_{ijt}$. They run this regression, recover the residual $\hat\varepsilon_{ijt}$, and interpret $\hat\varepsilon_{ijt}/(\sigma - 1)$ as a measure of quality at the firm-product level. The intuition is the same as discussed in Section 2.1 above: conditional on price, higher-quality products have higher market share and hence a higher $\hat\varepsilon_{ijt}$. This method is akin to methods to recover quality at a more aggregate level by Hummels and Klenow (2005), Khandelwal (2010), Hallak and Schott (2011), and Feenstra and Romalis (2014), among others. Variations have been used by Bas and Strauss-Kahn (2015), Fan et al. (2015, 2018), Stiebale and Vencappa (2018), and Bas and Paunov (2019).
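The Khandelwal et al. (2013) procedure can be sketched in a few lines of Python. The sketch simulates hypothetical firm-product-year observations from the demand equation with $\sigma = 4$, removes product and year fixed effects from $\ln Y_{ijt} + \sigma \ln P_{ijt}$ by alternating within-group demeaning (a standard way to compute two-way fixed-effects residuals, swapped in here for a dummy-variable regression), and divides the residual by $\sigma - 1$. The simulated distributions, the fixed-effect values, and all function names are assumptions, and the destination-year fixed effects used in the original study are omitted.

```python
import random
from collections import defaultdict

SIGMA = 4.0  # elasticity of substitution, imposed rather than estimated


def demean_by(values, groups):
    """Subtract each observation's group mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def two_way_residuals(values, g1, g2, tol=1e-10, max_iter=500):
    """Residuals after removing two sets of fixed effects, by alternating projections."""
    r = list(values)
    for _ in range(max_iter):
        new = demean_by(demean_by(r, g1), g2)
        if max(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# Hypothetical firm-product-year observations generated from the demand equation.
rng = random.Random(2)
alpha_j = {j: rng.gauss(0, 1) for j in range(5)}  # product effects
alpha_t = {t: 0.1 * t for t in range(4)}          # year effects
rows = []
for _ in range(2000):
    j, t = rng.randrange(5), rng.randrange(4)
    log_quality = rng.gauss(0, 0.3)
    log_p = rng.gauss(1.0, 0.5)
    log_y = -SIGMA * log_p + alpha_j[j] + alpha_t[t] + (SIGMA - 1) * log_quality
    rows.append((j, t, log_p, log_y, log_quality))

# Left-hand side ln Y + sigma ln P, then residualize on product and year effects.
lhs = [ly + SIGMA * lp for _, _, lp, ly, _ in rows]
resid = two_way_residuals(lhs, [r[0] for r in rows], [r[1] for r in rows])
# Scale by 1 / (sigma - 1) to obtain the quality measure.
quality_hat = [e / (SIGMA - 1) for e in resid]
```

Because $\sigma$ is imposed, no price coefficient is estimated; the recovered measure is log quality net of its product and year averages, which is why it is interpretable only within those cells.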

While the Khandelwal et al. (2013) method has proven useful, it requires several non-innocuous assumptions, both in the specification of demand and in the estimation of σ carried out by Broda et al. (2006). An alternative approach uses reduced-form relationships between prices and other observables to argue indirectly that quality differences appear to be playing an important role, without imposing the functional-form assumptions required to construct explicit measures of quality. Kugler and Verhoogen (2012) take advantage of rich data from the Colombian manufacturing census on output and input prices to document several facts. First, on average within narrow product categories, larger plants charge higher prices for their outputs. Second, larger plants also pay more for their material inputs, a fact that generalizes the well-known finding in labor markets that larger firms tend to pay higher wages (Brown and Medoff, 1989). Third, the output price-plant size and input price-plant size correlations are more positive in sectors with greater scope for quality differentiation, where, following Sutton (1998), the scope for quality differentiation is proxied by R&D and advertising expenditures. These empirical patterns are difficult to reconcile with models that do not accord an important role to quality differences, and they suggest that producing high-quality outputs requires high-quality inputs, a hypothesis that has been corroborated by other studies discussed below.
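The first two facts are within-category relationships, which can be computed by removing category means from both variables before taking the slope. A minimal Python sketch, assuming hypothetical plant records of the form (product category, log plant size, log price); the same function would apply to input prices.

```python
from collections import defaultdict


def within_category_slope(records):
    """OLS slope of log price on log size after removing category means from both."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # category -> [sum of x, sum of y, count]
    for cat, x, y in records:
        s = sums[cat]
        s[0] += x
        s[1] += y
        s[2] += 1
    sxy = sxx = 0.0
    for cat, x, y in records:
        sx, sy, n = sums[cat]
        dx, dy = x - sx / n, y - sy / n
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


plants = [  # hypothetical (category, log size, log price)
    ("A", 1.0, 0.1), ("A", 2.0, 0.2), ("A", 3.0, 0.3),
    ("B", 2.0, 1.0), ("B", 4.0, 1.4),
]
print(round(within_category_slope(plants), 3))
```

Demeaning within categories matters here: category B has much higher price levels than category A, and a pooled regression would mix those level differences into the size-price relationship.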

An important caveat is that one should be cautious about interpreting high prices alone as indicators of quality, even if they are correlated with high input prices. Firms may face positive input cost shocks, and they may pass those on to consumers in the form of high prices. But in the absence of quality differences, we would expect such high-cost firms to have smaller market shares than low-cost firms. This underlines the need to examine sales (or other indicators of firm size) in addition to prices before drawing strong conclusions about quality.

2.2.3 Measures of Technology Use

Direct information on technologies used by manufacturing firms is also often difficult to obtain. Standard firm-level datasets do not contain it, and firms are often reluctant to speak about specific technologies, for fear of revealing proprietary information to competitors. The technology-adoption literature has tended to focus on agriculture, where information on technology use is more readily available (Foster and Rosenzweig, 2010). In developed countries, there have been a number of studies of technology adoption across reasonably large sets of manufacturing firms, for instance the “insider econometrics” studies reviewed by Ichniowski and Shaw (2013) and studies of adoption of energy-efficient technologies reviewed by Allcott and Greenstone (2012). In developing countries, studies employing direct measures of technology use by manufacturing firms have been scarcer, but they include the recent papers on Pakistani soccer-ball producers by Atkin et al. (2017b) and on Ghanaian garment producers by Hardy and McCasland (2016), which we discuss in a later section. The World Bank is currently engaged in a series of surveys of technology use in developing countries, which are likely to stimulate further work in the area. One challenge in this line of research is that machines and other physical technologies are often specific to particular sectors and can only be captured by detailed, tailored surveys. Also, as noted above, it is often unclear to what extent one technology can be considered “better” than another. But measures of technology use, when available, have the great advantage that they are informative even in the absence of strong functional-form assumptions.

As discussed above, we can think of management practices as a form of technology. The measurement of management practices has been advancing rapidly, following the influential work of Bloom and Van Reenen (2007, 2010). The World Management Survey (WMS) was first implemented in the US and Europe but has now been extended to 35 countries, including low-income countries such as Ethiopia and Mozambique (Bloom et al., 2014). Using open-ended questions on monitoring, production targets, and incentives, posed by skilled interviewers, the survey has constructed management scores that have proven to be robustly correlated with a variety of independent measures of firm performance. Information on management practices has also been collected using “closed-ended” (i.e. multiple-choice) questions in the Management and Organizational Practices Survey conducted by the US Census and in similar surveys in Mexico, Pakistan, and other countries (Bloom et al., 2016b, 2019).[13] An important advantage of focusing on management practices as a form of technology use is that similar practices are applicable across a wide range of contexts. It has been possible to construct consistently measured management scores across a range of countries and sectors, and this in part explains the substantial impact of this research agenda on several fields.
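One common way to aggregate such survey responses into a single score, assumed here rather than taken from the text, is to standardize each question’s score across firms and then average the standardized scores within each firm. A minimal Python sketch with hypothetical 1-to-5 responses to three questions; the question labels and plant names are illustrative.

```python
import statistics


def management_scores(responses):
    """responses: {firm: [score_q1, score_q2, ...]} -> {firm: mean z-score across questions}."""
    firms = list(responses)
    n_questions = len(next(iter(responses.values())))
    z = {f: [] for f in firms}
    for q in range(n_questions):
        col = [responses[f][q] for f in firms]
        mu, sd = statistics.mean(col), statistics.pstdev(col)
        for f in firms:
            z[f].append((responses[f][q] - mu) / sd)  # standardize question q across firms
    return {f: statistics.mean(z[f]) for f in firms}


responses = {  # hypothetical 1-5 scores: monitoring, targets, incentives
    "plant_a": [5, 4, 5],
    "plant_b": [1, 2, 1],
    "plant_c": [3, 3, 2],
    "plant_d": [3, 2, 4],
}
scores = management_scores(responses)
for firm, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(firm, round(s, 2))
```

Standardizing question by question keeps any single question with a wide response range from dominating the average; note that the function assumes every question has some variation across firms.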

There is a debate in this literature about whether particular practices can be considered better than others in some absolute or context-independent sense. On one hand, there is a long tradition in management research, often referred to as the “horizontal” (or “design” or “contingency”) view, that sees the best management practices as contingent on many features of a firm’s environment (e.g. Woodward (1958)). On the other hand, the key proponents of this literature argue for a “vertical” view that some practices are better than others across settings (see e.g. Van Reenen (2011) and Bloom et al. (2014)).[14] This is ultimately an empirical question, one that in my view is not yet resolved. As with other technologies, one should not infer from the mere fact that more-successful firms use a particular practice that all firms should adopt it. Firms may lack the know-how to implement the practice effectively, or may face different output market or input market conditions than those who use the practice successfully. It seems likely that some firms are making mistakes by not adopting some higher-scoring practices (e.g. tracking inventories). But for other practices (e.g. performance pay) the situation is less clear-cut. It seems important to consider carefully firms’ capabilities and the settings in which they operate before concluding that a particular practice is better than another.

2.2.4 Measures of Product Innovation

The most common measures of innovation-related activities in developed countries are patents and R&D expenditures. But as discussed above, most innovation-related activities in developing countries are directed towards catching up to the world frontier, not extending it, and such efforts are typically not reflected in patents or R&D (although there have been a few studies, some of which are reviewed below). An arguably more informative approach for developing countries is to focus on the range of products produced by a given firm. As firm-product-level datasets become more widely available, it is becoming possible to observe product innovation directly, as additions to the set of products produced by a firm (see e.g. Goldberg et al. (2010) and Bas and Paunov (2019)). Access to barcode-level product data, linkable to firms, is expanding rapidly in developed countries (e.g. Faber and Fally (2017)) and developing countries (e.g. Atkin et al. (2018)), and incorporating this rich new information would be a promising direction for research.
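Measuring product innovation as additions to a firm’s product set translates directly into a computation on a firm-product-year panel. A minimal Python sketch, assuming hypothetical records of the form (firm, year, product) and annual observations; a product counts as an addition if the firm reports it in year $t$ but not in year $t-1$, which, as a simplifying assumption, also counts re-introductions of previously dropped products.

```python
from collections import defaultdict


def product_additions(records):
    """Return {(firm, year): products reported in year but not in year - 1}."""
    product_sets = defaultdict(set)
    for firm, year, product in records:
        product_sets[(firm, year)].add(product)
    additions = {}
    for (firm, year), prods in product_sets.items():
        prev = product_sets.get((firm, year - 1))
        if prev is not None:  # an addition is only defined if the previous year is observed
            additions[(firm, year)] = prods - prev
    return additions


records = [  # hypothetical firm-product-year records
    ("f1", 2010, "shirts"), ("f1", 2011, "shirts"), ("f1", 2011, "jackets"),
    ("f2", 2010, "rugs"), ("f2", 2011, "rugs"),
]
additions = product_additions(records)
for key, new in sorted(additions.items()):
    print(key, sorted(new))
```

Using `dict.get` rather than indexing avoids creating empty entries in the `defaultdict` for firm-years that are not observed.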

2.2.5 Discussion

There are costs and benefits to each of the measures of upgrading we have considered. TFP measures aim most directly at estimating a firm’s capabilities, Λit, which in theory are unambiguously related to technical efficiency and firm performance. But the difficulties in TFP estimation are many, and, perhaps as a consequence, results with TFP as an outcome have been mixed. The other indicators we have considered are often available only in specific settings and need to be interpreted with caution, since it is not obvious that increases in them are optimal for firms or beneficial for growth, but they typically require fewer auxiliary assumptions. It seems clear that the literature should continue to consider various measures of upgrading, and that we should have the most confidence in patterns that show up consistently across measures. But beyond that, my sense is that the most compelling recent studies are those that have focused on directly observable measures, and that expanding the settings in which such information is available is a promising avenue for research.


[1] Many of the issues raised below are discussed in more detail in previous reviews by Bartelsman and Doms (2000), Katayama et al. (2009), Ackerberg et al. (2007), and De Loecker and Goldberg (2014).

[2] This is a “gross output” production function; an alternative is to estimate a “value-added” production function; for advantages and disadvantages, see Ackerberg et al. (2015) and Gandhi et al. (forthcoming, 2017).

[3] For instance, Jones (2005) considers an environment in which a firm produces a single product and chooses over Leontief techniques, where the Leontief coefficients are drawn from independent Pareto distributions. As the set of techniques over which the firm chooses becomes large, the maximum output for a given set of factor choices can be expressed as a Cobb-Douglas function similar to equation (3). Subsequent research has derived similar results in this spirit, with specific assumptions on functional forms and distributions of technique draws (Growiec, 2008a,b; Boehm and Oberfield, 2018).

[4] Intuitively, in the Olley and Pakes (1996) case, in the context of a value-added production function, if investment is a function of productivity and existing capital stock, $\iota_i = \iota(\omega_i, k_i)$, and $\omega_i$ is a scalar and strictly monotonically related to $\iota_i$, then this function can be inverted, and the productivity term can be expressed as a function of investment and capital: $\omega_i = h(\iota_i, k_i)$. A flexible polynomial in $\iota_i$ and $k_i$ can then serve as a proxy for $\omega_i$ in an equation similar to equation (3). Levinsohn and Petrin (2003) propose a similar approach for materials. Ackerberg et al. (2015) also invert a materials-demand equation, but (in contrast to Levinsohn and Petrin (2003)) one that conditions on labor inputs.

[5] Gandhi et al. (2017) note that their criticism in Gandhi et al. (forthcoming) does not apply in a setting where a linear function of materials is a perfect complement to other inputs in producing output; this setting yields the value-added specification employed by Ackerberg et al. (2015).

[6] For instance, OLS estimates will be biased if a firm faces idiosyncratically high input prices and spends less on inputs as a result (De Loecker and Goldberg (2014) call this “input-price bias”) or faces idiosyncratically high output prices and spends more on inputs as a result (“output-price bias”).

[7] In Kugler and Verhoogen (2012), higher productivity leads to lower input requirements conditional on product quality but also leads firms to produce higher-quality goods, which carry a higher price. Whether physical units of output increase or decrease with firm capability depends on the elasticity of demand faced by the firm, the extent to which capability reduces unit costs conditional on quality, and the scope for quality differentiation in the industry. (In that model, physical output as a function of capability can be readily calculated by dividing revenues by output price; see equations (9d) and (9c), respectively, in Kugler and Verhoogen (2012).)

[8] In another illustration, De Loecker et al. (2016) pursue a more model-based approach to estimating production function parameters, allowing for quality differences on both the input and output sides. They find plausible estimates when they control for quality differences, but nonsensical estimates when they do not (Table V). See the further discussion in Sections and 3.2.1 below.

[9] Another example, of kidney dialysis centers in the US, is provided by Grieco and McDevitt (2016).

[10] The two exceptions I am aware of for a large number of firms are the dataset on the Bangladeshi garment sector used by Cajal Grossi et al. (2019) and the dataset on Chinese steel firms used by Brandt et al. (2018).

[11] Sutton (2000, 2004) conducts detailed quality-benchmarking studies of Indian machine-tool producers and of Chinese and Indian auto-parts producers. In an important early contribution, Goldberg and Verboven (2001) use detailed data on product attributes in the European car market to control for quality differences.

[12] Khandelwal et al. (2013) observe prices and quantities separately by export destination in Chinese customs data, and include a destination-year fixed effect.

[13] Relatedly, McKenzie and Woodruff (2017) review findings from seven countries using a battery of questions designed for smaller developing-country firms.

[14] For example, Bloom et al. (2014, p. 852) write, “The focus of the WMS questions is on practices that are likely to be associated with delivering existing goods or services more efficiently. We think there is some consensus over better or worse practices in this regard.”
