Ed. note: Today’s guest blog comes from Doug Parker and Paul Stey, PhD, co-founders of Ecolumix, a data company that provides verified environmental performance information to the ESG and EHS sectors. Doug will be a panelist in our 1st Annual Practical ESG Conference in October.
As scrutiny of ESG investing intensifies and calls for greater transparency and accountability accelerate, significant questions persist about the quality of the data on which fund managers and analysts rely. The primary reason for this skepticism is that most ESG data rests overwhelmingly on voluntary, unverified corporate disclosures: not exactly the gold standard for information assurance. And as regulatory mandates for ESG disclosures emerge, stakeholders now realize they need reliable data to adequately assess and manage ESG performance and risk. (Ed. note – this was discussed in last week’s SEC Investor Advisory Committee meeting.)
As a result, some have turned to Artificial Intelligence (AI) and Machine Learning (ML) as silver bullets to separate fact from fiction in the ESG sector. Investors and analysts want credible insights now. AI and ML models are incredibly powerful and can be used to solve an enormous range of problems, so the appeal is understandable.
But in the ESG context, one of AI’s main applications is natural language processing (NLP): digesting and interpreting the declarations companies make about their sustainability efforts. Earnings call transcripts, sustainability reports, conference speeches and other external information can be analyzed programmatically with AI to try to judge a company’s commitment to ESG. But this automated ESG analysis has significant limitations. Just as unverified self-disclosures are not the answer, AI and ML, which are primarily built upon analysis of that same information, do not resolve this data dilemma either.
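To see the limitation concretely, consider a deliberately naive sketch of disclosure analysis. Real NLP pipelines are far more sophisticated than this toy keyword scorer, but they still consume the same self-reported text; the term list, function name, and sample claim below are all hypothetical illustrations.

```python
# A deliberately naive, hypothetical "disclosure scorer" -- not any real ESG
# model. It rewards the right vocabulary whether or not the claims are true.
ESG_TERMS = {"sustainable", "renewable", "net-zero", "carbon-neutral"}

def buzzword_score(text):
    """Count how many ESG buzzwords appear in a self-reported statement."""
    words = {w.strip(".,").lower() for w in text.split()}
    return len(words & ESG_TERMS)

claim = "We are carbon-neutral and committed to sustainable, renewable growth."
print(buzzword_score(claim))  # 3 -- the score is the same, verified or not
```

The point of the sketch: the score depends only on what the company chose to say, so no amount of modeling sophistication downstream can verify the underlying claim.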
Applying ML and AI to the ESG Sector
In general, AI and ML models perform effectively on well-articulated questions with extensive sets of labeled training data. A question is “well-articulated” when statistical or ML models with quantitative inputs and outputs can be used to answer it, and we can evaluate those models for accuracy.
In addition to requiring well-articulated questions, AI and ML models are extremely dependent on the data used to train them. The old “garbage in, garbage out” adage definitely applies. When unverified data forms the foundation of a model, the result is unlikely to be meaningful enough to get investors and analysts out of the “unreliable data box” they are seeking to escape.
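The “garbage in, garbage out” point can be made concrete with a minimal, entirely synthetic sketch: the same toy model family is fit once on verified labels and once on labels where most high-risk cases self-report as low-risk. Nothing here is real ESG data; the scores, label names, and noise rate are all hypothetical.

```python
# Garbage in, garbage out: fitting the same toy model on verified labels
# versus biased self-reported labels. All data is synthetic and hypothetical.
import random

def fit_threshold(xs, ys):
    """Pick the score threshold that best reproduces the training labels."""
    best_t, best_acc = xs[0], -1.0
    for t in xs:
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, xs, ys):
    return sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
# Synthetic "risk scores": low-risk firms cluster near 0, high-risk near 3.
xs = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(3, 1) for _ in range(200)]
ys = [0] * 200 + [1] * 200  # verified ground-truth labels

# Unverified self-disclosures: 60% of high-risk firms report themselves clean.
reported = ys[:200] + [0 if random.random() < 0.6 else 1 for _ in range(200)]

t_verified = fit_threshold(xs, ys)
t_reported = fit_threshold(xs, reported)

print(accuracy(t_verified, xs, ys))  # high: trained on verified labels
print(accuracy(t_reported, xs, ys))  # near chance: learned "everyone is clean"
```

Because the mislabeling is biased rather than random (firms understate risk, they rarely overstate it), the model trained on self-reports drifts toward classifying nearly everyone as low-risk, which is precisely the failure mode the unverified-disclosure problem creates.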
So, what would an alternative look like? Well, the reality is that it is exceedingly difficult to provide a reliable and objective answer to questions such as “What is the environmental impact of The XYZ Company?” But let’s outline what it might look like if we had reliable, verified data to answer it.
- First, we want to know about every gram of emissions and releases associated with The XYZ Company. This data should come from a dependable, verified source. We wouldn’t simply want to take the company’s word for it in an annual feel-good letter to shareholders.
- Second, we need context about the extent to which the emissions and releases are toxic to the environment or human health. A kilogram of oil-contaminated topsoil has lower toxicity than a kilogram of vinyl chloride. Having the expertise to discern the risk contextually and comparatively is essential.
- Third, we have to identify whether, and how extensively, The XYZ Company has violated any existing environmental or safety regulations in the corresponding jurisdiction, using official regulatory agency data to source this information.
- Finally, we must understand how many people are potentially at risk due to their proximity to The XYZ Company. Factors to consider in this analysis might include the impact of either a catastrophe or sustained, long-term emissions and the extent of that impact. Would it be limited to a small geographic area—perhaps in a mostly unpopulated, remote part of the world? Or would the effect of a catastrophe be far-reaching and harm hundreds or thousands of the most vulnerable?
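To make the four steps above concrete, here is a minimal sketch of one way a risk-weighted score could combine them. Every field name, weight, and figure below is a hypothetical illustration, not Ecolumix’s actual methodology.

```python
# Hypothetical verified release records for "The XYZ Company":
# (substance, kilograms released, toxicity weight on a 0-1 scale).
emissions = [
    ("oil-contaminated topsoil", 5_000.0, 0.05),
    ("vinyl chloride", 12.0, 0.95),
]

violations = 3               # count drawn from official regulatory agency data
population_nearby = 40_000   # people within a hypothetical exposure radius

def risk_score(emissions, violations, population_nearby):
    """Toxicity-weighted releases, scaled up by violations and exposure."""
    weighted_release = sum(kg * tox for _, kg, tox in emissions)
    # The multipliers below are placeholders a real model would calibrate.
    violation_factor = 1 + 0.25 * violations
    exposure_factor = 1 + population_nearby / 100_000
    return weighted_release * violation_factor * exposure_factor

print(round(risk_score(emissions, violations, population_nearby), 1))  # 640.4
```

The multiplicative form reflects the contextual point above: a large mass of low-toxicity material can contribute less risk than a small mass of highly toxic material, and a firm with no weighted releases scores zero regardless of the population nearby.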
A sustainability report or AI-generated analysis of such data won’t address the questions above. But the good news is that the application of verified data can. The methodology we apply at Ecolumix is to evaluate companies’ performance by using risk-weighted analysis of mandated data – the information that companies are legally required to submit because of existing environmental or worker safety regulations.
Verified Data is Critical for Accurate Analysis
As a foundational principle, reliable ESG data must be subject to third-party verification. This data can be mined from government-mandated reporting that tracks pollution, greenhouse gas emissions, hazardous waste management, toxic releases, and enforcement actions, to name just a few. Incorporating verified data points like these can be the catalyst that pushes meaningful ESG-consciousness in the financial markets. However “strong” government-verified data may be, it is also confusing and overwhelming in its raw form. Complete and accurate ESG risk analysis therefore requires experts who understand what the data means and can put it into proper context.
ESG data companies that understand these government data sets can present this vast wealth of information transparently, in an easily accessible, digestible fashion that allows users to compare apples to apples. This process cannot yet be automated. Human intervention and intelligence are needed to meaningfully extract, navigate, and organize the data to ensure ESG performance is accurately measured and explained. Rolling out massive amounts of mandated, verified data to institutional and individual investors, in an easy-to-understand format, contextualized by experts, is the actual “silver bullet” that can transform the credibility of ESG data analysis, risk management, and the future of ESG investing.