Baseline regression results
As an indicator of the balance between economic development and environmental protection in a region, carbon emission intensity is determined by both the carbon emissions and the GDP of each province. Among the 30 provinces in Fig. 1, Shanxi, Ningxia Hui Autonomous Region and Guizhou rank in the top three for carbon emission intensity, at 14.43, 10.48 and 8.14 respectively, accounting for 10%, 7.4% and 5.7% of the total. The high carbon emission intensity of these three provinces is related, on the one hand, to an industrial structure dominated by heavy industries such as coal and, on the other hand, to a relatively low level of economic development and an energy-use efficiency that still needs to be improved. Figure 2 compares carbon emission intensity in 2013 and 2019. The year 2013 is chosen as the starting point because it was the year in which the Chinese government first organised the National Low Carbon Day, marking the beginning of China’s incorporation of low-carbon development into its national development strategy. The year 2019 is chosen as the end point because the COVID-19 epidemic began to break out in December of that year, after which China strengthened its control measures and entered an exceptional period. As the comparison shows, carbon emission intensity declined significantly in all provinces except Inner Mongolia, Ningxia, Shanxi, Liaoning and Heilongjiang, which are dominated by heavy industries such as coal, sparsely populated and relatively less developed economically. This suggests that, following the launch of the National Low Carbon Day, more and more provinces have begun to pay attention to low-carbon development and are actively taking measures to reduce their carbon emission intensity.
The differences in the number of green inventions across provinces are closely tied to each province’s development. On closer examination, they relate to several factors, such as the level of economic development, the strength of scientific and technological innovation, the strength of policy support, and local awareness of environmental protection. Among the 30 provinces shown in Fig. 3, Jiangsu Province, Guangdong Province and Beijing Municipality top the list, with 6,869, 6,245 and 6,066 green inventions respectively, accounting for 14%, 13% and 12% of the total. These figures not only reflect the strong performance of these three provinces and municipalities in green inventions and highlight their leading position in green science and technology innovation, but also provide strong support for China’s efforts to achieve green development, build an ecological civilisation, reduce the intensity of carbon emissions and coordinate the development of the economy and the environment.
Following the descriptive analysis of the core variables, the constructed baseline model is used for a more in-depth analysis. The analysis begins with a variance inflation factor (VIF) test to check for multicollinearity among the variables. Table 3 presents the outcomes of the VIF test: all variables have VIF values considerably below 10, indicating that multicollinearity is not a serious concern. Table 4 then reports the correlation coefficients among the observed variables. The correlation matrix shows significant correlations between the core explanatory variable and the other explanatory variables, which supports the regression analysis in the empirical investigation.
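The VIF check described above can be sketched as follows. This is a minimal illustration, not the study's code: the three regressors and their values are hypothetical, and the sketch relies on the standard identity that the VIF of regressor j equals the j-th diagonal entry of the inverse correlation matrix of the regressors.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical regressors (made-up numbers, not the paper's data).
green_innovation = [6.1, 6.8, 7.4, 7.9, 8.3, 8.8]
gdp_per_capita = [9.8, 10.1, 10.0, 10.6, 10.9, 11.0]
urbanisation = [0.52, 0.49, 0.58, 0.55, 0.66, 0.61]

vif_values = vif([green_innovation, gdp_per_capita, urbanisation])
# Rule-of-thumb cutoff of 10, as used in the text.
flags = [value >= 10 for value in vif_values]
```

A VIF of 1 means a regressor is uncorrelated with the others; values at or above the conventional cutoff of 10 would be flagged for closer inspection.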
Table 5 presents the estimated influence of green innovation on carbon emission intensity. Column (1) reports the univariate regression with both province-fixed effects and year-fixed effects, while Columns (2) and (3) include each fixed effect separately. Column (4) presents the regression without any fixed effects. Notably, the coefficient of green innovation is significantly negative under every combination of fixed effects. This consistency indicates that the mitigating effect of green innovation on carbon emission intensity is robust to the choice of fixed effects, thereby providing initial support for Hypothesis 1. Control variables are then introduced into these regressions for comparison.
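The difference between the specification with both fixed effects and the one with none can be illustrated with a small sketch. It assumes a balanced panel and a single regressor, and it uses made-up data generated with a slope of -0.08. The function names are hypothetical and this is not the study's estimation code.

```python
def two_way_demean(panel):
    """Subtract unit and period means, add back the grand mean (balanced panel)."""
    n_units, n_periods = len(panel), len(panel[0])
    unit_means = [sum(row) / n_periods for row in panel]
    period_means = [
        sum(panel[i][t] for i in range(n_units)) / n_units
        for t in range(n_periods)
    ]
    grand_mean = sum(unit_means) / n_units
    return [
        [
            panel[i][t] - unit_means[i] - period_means[t] + grand_mean
            for t in range(n_periods)
        ]
        for i in range(n_units)
    ]


def slope(x_values, y_values):
    """OLS slope of y on x with an intercept."""
    mx = sum(x_values) / len(x_values)
    my = sum(y_values) / len(y_values)
    num = sum((x - mx) * (y - my) for x, y in zip(x_values, y_values))
    den = sum((x - mx) ** 2 for x in x_values)
    return num / den


def flatten(panel):
    return [value for row in panel for value in row]


# Hypothetical balanced panel: 3 provinces x 3 years (made-up numbers).
x_panel = [[1.0, 2.0, 4.0], [2.0, 5.0, 6.0], [0.5, 1.5, 5.0]]
province_effects = [2.0, 1.0, 3.0]
year_effects = [0.0, -0.1, -0.3]
true_beta = -0.08
y_panel = [
    [province_effects[i] + year_effects[t] + true_beta * x_panel[i][t]
     for t in range(3)]
    for i in range(3)
]

# Analogue of Column (1): province and year fixed effects via demeaning.
within_beta = slope(flatten(two_way_demean(x_panel)),
                    flatten(two_way_demean(y_panel)))
# Analogue of Column (4): pooled regression without fixed effects.
pooled_beta = slope(flatten(x_panel), flatten(y_panel))
```

In this constructed example the two-way demeaned regression recovers the slope used to generate the data, whereas the pooled slope also absorbs the province and year effects.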
Table 6 reports the results of the baseline regression with control variables. Column (1) displays the regression without control variables, while Column (2) presents the results with control variables. In Column (1), green innovation has a notable inhibitory effect on carbon emission intensity, with a coefficient of -0.080. This indicates that for every 1% increase in the green innovation indicator, carbon emission intensity falls by 0.08%. After control variables are introduced in Column (2), the coefficient of green innovation remains negative, confirming its contribution to carbon reduction. This finding supports Hypothesis 1 and underscores the efficacy of green innovation in reducing carbon emission intensity. Several mechanisms may explain this effect. Firstly, through the research, development and promotion of clean energy technologies, such as solar and wind energy, green innovation can reduce the consumption of fossil fuels, thereby lowering emissions of greenhouse gases such as carbon dioxide. Secondly, green innovation also targets energy saving and emission reduction in the production process, reducing carbon emissions per unit of output by optimising production processes and improving energy efficiency. In addition, green innovation promotes green consumption and a circular economy, encouraging consumers to choose environmentally friendly products and services, promoting the recycling of resources and further reducing the intensity of carbon emissions.
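The percentage reading of the Column (1) coefficient can be checked with a short calculation. It assumes, as that reading implies, that both variables enter the regression in logarithms. The symbols \(CEI\) (carbon emission intensity), \(GI\) (the green innovation indicator) and \(\beta\) (its coefficient) are shorthand introduced here for illustration:

$$ \frac{\Delta CEI}{CEI} \approx \beta \times \frac{\Delta GI}{GI} = -0.080 \times 1\% = -0.08\% $$

The exact change implied by a log-log specification is \(1.01^{-0.080} - 1 \approx -0.0796\%\), so the linear approximation is accurate to the reported precision.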
Robustness testing
Measurement errors
Given the potential influence of measurement errors on regression outcomes, a measurement error analysis is necessary. In this study, the analysis substitutes the indicators of both the independent and the dependent variable. Specifically, the indicator for the independent variable, green innovation, is changed from the logarithm of the number of green invention applications plus one to the logarithm of the number of green inventions granted plus one. Similarly, the indicator for the dependent variable, carbon emission intensity, is replaced with total carbon dioxide emissions. The regression is then re-estimated with these modified variables to further examine the impact of green innovation on carbon emissions, with the results presented in Table 7. The coefficient of green innovation on carbon emissions remains significantly negative, indicating that the baseline result is robust to the change in variable measurement and supporting the assertion that green innovation effectively curtails carbon emissions.
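The "plus one and then logged" transformation applied to both patent measures can be sketched as below. The counts are hypothetical, and the remark about zero counts is a general property of the logarithm rather than a statement about the study's data.

```python
import math


def log_plus_one(count):
    """ln(count + 1): defined for a zero count, unlike ln(count)."""
    return math.log1p(count)


# Hypothetical province-year counts (made-up numbers).
applications = [0, 12, 450]   # green invention applications (baseline measure)
granted = [0, 5, 210]         # green inventions granted (alternative measure)

baseline_measure = [log_plus_one(n) for n in applications]
alternative_measure = [log_plus_one(n) for n in granted]
```

Adding one before taking the logarithm keeps observations with a zero count in the sample and maps them to zero.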
Endogeneity problem
Endogeneity poses a potential challenge to the study’s findings. Even in large samples, endogeneity can prevent parameter estimates from converging to the true parameters, thereby undermining the interpretability of the results. Several factors, such as measurement error, omitted variables, and reverse causality between the independent and dependent variables, can give rise to endogeneity. As an example of reverse causality, rising carbon emissions within a region may prompt government mandates that require more green innovation, so that carbon emissions influence green innovation as well as the reverse. To mitigate this concern, the study employs an instrumental variables approach, using green innovation lagged by one period as the instrumental variable in a two-stage least squares (2SLS) regression. Table 8 presents the outcomes after employing the instrumental variable to address endogeneity.
Table 8 shows that in the first stage of the 2SLS regression, the coefficient of the instrumental variable on the endogenous variable is significantly positive, indicating that the instrument is relevant, that is, correlated with the endogenous variable. The previous-period value of the endogenous variable is typically uncorrelated with the error term of the current period, which supports the exogeneity of the instrumental variable. Furthermore, both the under-identification test and the weak instrument test were conducted and yielded significant results, rejecting under-identification and weak identification and thereby supporting the validity of the instrumental variable. The second-stage regression shows that the coefficient of green innovation remains significantly negative, reinforcing the conclusion that green innovation effectively curbs carbon emission intensity.
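The two stages can be sketched for the simplest case of one endogenous regressor and one instrument. This is an illustration on made-up data generated with a slope of -0.08. It omits the control variables and fixed effects of the actual specification, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cross_sum(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))


def ols(x, y):
    """Intercept and slope of a one-regressor OLS fit."""
    beta = cross_sum(x, y) / cross_sum(x, x)
    return mean(y) - beta * mean(x), beta


def lagged_sample(x_panel, y_panel):
    """Pair each observation with the same unit's previous-period regressor."""
    x_now, x_lag, y_now = [], [], []
    for x_row, y_row in zip(x_panel, y_panel):
        for t in range(1, len(x_row)):  # the first period has no lag
            x_now.append(x_row[t])
            x_lag.append(x_row[t - 1])
            y_now.append(y_row[t])
    return x_now, x_lag, y_now


def two_stage_least_squares(x, z, y):
    """Return the first-stage and second-stage slopes."""
    intercept_1, first_stage = ols(z, x)            # stage 1: x on instrument z
    x_hat = [intercept_1 + first_stage * v for v in z]
    _, second_stage = ols(x_hat, y)                 # stage 2: y on fitted x
    return first_stage, second_stage


# Hypothetical panel: 3 provinces x 4 years (made-up numbers).
x_panel = [[1.0, 1.5, 2.5, 3.0], [2.0, 2.2, 3.1, 4.0], [0.5, 1.0, 1.2, 2.0]]
y_panel = [[1.0 - 0.08 * value for value in row] for row in x_panel]

x, z, y = lagged_sample(x_panel, y_panel)
first_stage, second_stage = two_stage_least_squares(x, z, y)
```

Running the second stage by hand reproduces the 2SLS point estimate but not its standard errors, which need a correction that dedicated IV routines apply.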
Empirical testing of the mediating effect
This study employs a stepwise regression approach to test, in sequence, the significance of the coefficients when big data development serves as the mediating variable. The test comprises three steps and clarifies the mechanism of the effect, as depicted in Figs. 4 and 5 below. First, the dependent variable is regressed on the independent variable to assess the significance of the coefficient \(c\), the effect estimated before the mediating variable is controlled for. Second, the mediating variable is regressed on the independent variable to assess the significance of the coefficient \(a\). Finally, the dependent variable is regressed on both the independent and mediating variables to assess the significance of the coefficients \(c^{\prime}\) and \(b\).
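The three steps can be sketched with closed-form OLS on a small made-up sample. The variable names and data are hypothetical, and the actual regressions also include control variables and fixed effects that are left out here.

```python
def centered_sum(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))


def three_step_mediation(x, m, y):
    """Slopes from the three regressions (each fitted with an intercept)."""
    sxx, smm = centered_sum(x, x), centered_sum(m, m)
    sxm, sxy, smy = centered_sum(x, m), centered_sum(x, y), centered_sum(m, y)
    c = sxy / sxx                              # step 1: y on x
    a = sxm / sxx                              # step 2: m on x
    det = sxx * smm - sxm ** 2                 # step 3: y on x and m
    c_prime = (sxy * smm - smy * sxm) / det
    b = (smy * sxx - sxy * sxm) / det
    return {"c": c, "a": a, "b": b, "c_prime": c_prime}


# Hypothetical sample (made-up numbers, not the paper's data).
green_innovation = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # independent variable x
big_data = [0.3, 0.4, 0.9, 0.8, 1.2, 1.1]            # mediating variable m
carbon_intensity = [2.0, 1.9, 1.6, 1.7, 1.3, 1.4]    # dependent variable y

result = three_step_mediation(green_innovation, big_data, carbon_intensity)
indirect_effect = result["a"] * result["b"]
```

When the three regressions are estimated by OLS on the same sample, the coefficient from the first step equals the coefficient \(c^{\prime}\) plus the product \(a \times b\) exactly, which gives a quick consistency check on the estimates.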
Figure 5 illustrates the indirect effect in the mediation model. The stepwise regression outcomes are presented in Table 9: Column (1) reports the regression of the dependent variable on the explanatory variable (coefficient \(c\)), Column (2) reports the regression of the mediating variable on the explanatory variable (coefficient \(a\)), and Column (3) reports the regression of the dependent variable on both the explanatory and mediating variables (coefficients \(c^{\prime}\) and \(b\)). The table shows that the estimate of \(c\) is -0.106, the estimate of \(a\) is 0.177, the estimate of \(b\) is -0.075, and the estimate of \(c^{\prime}\) is -0.092, all significant at the 1% level. These effects satisfy the equation:
$$ c = c^{\prime} + a \times b $$
Furthermore, the magnitude of the mediating effect is measured by the product \(a \times b\). This describes the mediating mechanism of big data depicted in Fig. 6: green innovation reduces carbon intensity directly, and also reduces it indirectly by fostering the development of big data. This outcome supports the mediating role of big data development posited in Hypothesis 2. Big data platforms can facilitate the sharing and exchange of information related to green innovation and carbon emissions, promote cooperation and synergy among different fields and industries, and jointly advance green development and low-carbon transformation. The supportive role of green innovation in big data development proposed in Hypothesis 3 is also confirmed. Green innovation can promote the integration of big data with other green technologies, thereby facilitating the application of big data technologies in the environmental field.
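Using the estimates reported in Table 9, the size of the mediating effect can be worked out directly. This is a back-of-the-envelope calculation from the rounded coefficients, not a figure reported in the tables:

$$ a \times b = 0.177 \times (-0.075) \approx -0.013 $$

$$ c^{\prime} + a \times b \approx -0.092 - 0.013 = -0.105 $$

This matches the estimate of \(c\) (-0.106) up to rounding. On these figures the indirect path through big data accounts for roughly \(0.0133 / 0.106 \approx 12.5\%\) of the effect measured by \(c\).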