Heads up: this is our detailed methodology. Looking for a shorter overview? Check out How ratings work.
Impact ratings show you how well your portfolio supports what you care about. Here's our methodology for creating them:
1. Define metrics for each cause on Ethos
For each of the 45 causes on Ethos (read more about these here), we first define a set of metrics that gauge a company or fund's performance in that area. Causes on Ethos are loosely based on the United Nations Sustainable Development Goals.
For example, the cause Gender Equality includes metrics such as:
- Gender pay gap at a company
- Policies that support working mothers, such as parental leave and childcare
- Mentorship programs for women
- Integration of gender equality into training programs
- Explicit, public efforts to address gender equality at the company
- Gender diversity on the Board of Directors
2. Gather available data for every metric
We then pull in millions of data points from companies, funds and credible third parties to score companies and funds on each metric. For example, we include ratings of companies on policies for working mothers from Working Mother. Whenever Working Mother has new data available, we update our ratings of companies included in Working Mother's data.
3. Normalize scores
We then normalize raw data to a scale of 0-100 so that we can compare companies and funds across metrics. Because raw data involves many data types with both normal and non-normal distributions, converting raw data to a 0-100 scale requires multiple strategies. For example, some data is binary (yes/no, such as whether a company had a major controversy in an area) and some data is very widely distributed (such as tons of carbon emissions). Whichever strategy we apply, the goal is always the same: preserve the original data's picture of company performance as faithfully as possible, and aggregate data points into the most accurate view of company performance we can.
Ethos tests all normalization strategies for each dataset (metric) to assess which strategy most accurately represents the distribution of company performance for that dataset. We look both at the relationship among companies (e.g., are data clustered around a certain range?) and at external, absolute gauges of company performance (e.g., is there a credible third party indicating that the best-performing company in something like gender pay equality should only be at a "B" or "C" level?). We then use these assessments to choose the best normalization strategy for each dataset.
The normalization strategies we use are:
- Binary: scores are either 100 (yes) or 0 (no). These are typically simple yes/no metrics. Because we translate metric data to a 0-100 rating, normalizing this binary data (which does not have a normal distribution) would skew interpretation of company results. E.g., if most companies do not have a major controversy in data security and get a base score of 100, normalizing the data would move those scores closer to 50, indicating a "C" rating (when the companies performed as well as possible on this metric)
- Standard score: scores represent standard scores, or z-scores, converted to an approximate 0-100 scale. This is used when there is raw data (e.g., metric tonnes of carbon emissions) that is approximately normally distributed. We calculate the mean and standard deviation of all companies with data, then calculate a z-score for each company. Most z-scores will be in the range of +/- 3 standard deviations (~99.7% of data points if it's an approximately normal data set). To translate this to an approximate 0-100 scale we multiply each z-score by 25 (translating 1 standard deviation to a value of 25) and then add 50 to each z-score (moving the mean of all data points to 50). This means most data points will fall in the range of -25 (-3 standard deviations) to 125 (+3 standard deviations). To deal with outliers, we then winsorize, or "cap", all scores at +/- 3 standard deviations; i.e., a minimum score of -25 and a maximum score of 125. Since final ratings of companies include a weighted average of many raw metric scores (typically 20-50), individual metric scores less than 0 or greater than 100 (which are uncommon) almost never "pull" a final company rating below 0 or above 100. If they do we cap the final rating at a minimum of 0 and maximum of 100
- List distribution: scores are a distribution of data from 0-100, typically used with outside rankings that have limited data (such as "Best of" or "Worst of" lists that only include 100 or 200 companies). Ethos assumes these lists intend for included companies to be rated highly ("Best of" lists) or poorly ("Worst of" lists) relative to companies not included on the lists, so we create a distribution from 100-60 ("A" and "B" scores) for companies making a "Best of" list, or from 40-0 ("D" and "F" scores) for companies making a "Worst of" list. Companies not on the list are given a uniform score of 0 for "Best of" lists (since they were uniformly measured as not good enough to make the list) or 100 for "Worst of" lists (since they were uniformly measured as good enough to NOT make the list). Because this uniform score hides any further differences in performance among companies not on the lists, these scores are usually given a small weight in calculating final company ratings
- Proportional scale and Box-Cox proportional scale: original data is converted to a proportional 0-100 scale, i.e., the minimum data point is set to 0, the max set to 100, and all others scaled proportionally. For example, if original data points were 0, 200, 50, and 50, the 0 data point would stay at 0, the 200 data point would be set to 100, and the 50 data points would be scaled to 25 (to match the scaling of min/max to 0 and 100). Ethos generally uses this strategy for data that is not normally distributed but still has a relationship indicating relative performance of companies. For example, if most companies score very highly on an assessment of how well they create jobs, calculating standard scores would put all those companies close to the mean and, when converted to a 0-100 scale, put their score close to the mean of 50. This would be a "C" rating, when in fact the original data may indicate they should all be closer to an "A" rating. Using a proportional 0-100 scale converts the scores of these companies closer to 100. Depending on the distribution of the original data, Ethos also uses a Box-Cox transformation in some cases to help normalize data and highlight more of the difference in company scores. This is often the case when there are many original data points close to zero or another natural limit. The Box-Cox transformation raises all original data points to the same exponent, or Lambda value (all data is transformed in the same way)
- Inherit: scores mirror the raw data, which is already distributed on a 0-100 or equivalent scale (e.g., 0-5). Companies have already been rated from 0-100, so further normalizing would skew the intended distribution of scores. In cases where the data is on an equivalent scale (e.g., 0-5), we simply multiply raw data to convert to the 0-100 scale (e.g., multiply data on a 0-5 scale by 20)
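As a rough illustration, the standard-score and proportional-scale strategies above can be sketched in Python. This is a simplified sketch based on the description above (1 standard deviation = 25 points, mean = 50, winsorizing at +/- 3 standard deviations), not Ethos's production code:

```python
from statistics import mean, pstdev

def standard_score_0_100(values):
    """Standard-score strategy: map raw values to z-scores, then to a
    ~0-100 scale (1 SD = 25 points, mean = 50), winsorized at
    +/- 3 standard deviations (i.e., scores of -25 and 125)."""
    mu, sigma = mean(values), pstdev(values)
    return [max(-25.0, min(125.0, 50 + 25 * (v - mu) / sigma))
            for v in values]

def proportional_0_100(values):
    """Proportional-scale strategy: scale raw values linearly so the
    minimum maps to 0 and the maximum maps to 100."""
    lo, hi = min(values), max(values)
    return [100 * (v - lo) / (hi - lo) for v in values]

# The example from the text: raw data of 0, 200, 50, and 50
proportional_0_100([0, 200, 50, 50])  # -> [0.0, 100.0, 25.0, 25.0]
```

Note that the standard-score function does not clamp individual metric scores to 0-100; as described above, only the final weighted rating is capped at 0 and 100.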
We are always open to suggestions. If you have any, or if you would like more information, please reach out to firstname.lastname@example.org.
4. Combine normalized metric data to create ratings for companies and securities
We then combine normalized metric data to create, for each of the 45 causes on Ethos, a single rating for every company and fund with sufficient available data (usually at least 60% of the metrics for that cause).
To combine the normalized data, we first determine a weight for every metric within each impact area. For example, the Gender Equality rating might give 10% weight to the gender pay gap metric, in which case 10% of a company's Gender Equality rating would be composed of its gender pay gap score. To determine an appropriate weight for each metric, we look at:
- How credible is the data behind the metric? Do we believe it accurately represents performance at the company or security for the metric?
- How reliable is the data? Is it consistently reported?
- How relevant is the data to the intended impact area? Does what it measures closely align with the impact area? For example, we believe a company's gender pay gap is more relevant to gender equality than the company's policies on telecommuting
After assigning weights to each metric, we multiply each metric score by its weight and sum the results to get the rating for a company or fund.
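A minimal sketch of this weighted combination (the metric names and weights here are hypothetical illustrations, not Ethos's actual weights):

```python
def cause_rating(scores, weights):
    """Weighted average of normalized metric scores (0-100).
    `weights` should sum to 1; the final rating is capped at 0 and 100."""
    rating = sum(scores[m] * weights[m] for m in scores)
    return max(0.0, min(100.0, rating))

# Hypothetical Gender Equality metrics and weights (illustrative only)
scores  = {"pay_gap": 72, "parental_leave": 90, "board_diversity": 55}
weights = {"pay_gap": 0.5, "parental_leave": 0.3, "board_diversity": 0.2}
cause_rating(scores, weights)  # -> 74.0
```

Here the gender pay gap metric carries 50% of the weight, so half of the hypothetical rating comes from its score of 72.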
5. Use company and fund ratings to create ratings for your portfolio
When you sign up for Ethos, you select the causes you care about to create a custom Impact Formula (read more here). This Impact Formula combines company and fund ratings from the steps above.
Then, when you link your bank accounts to Ethos, we use company and fund ratings together with your custom Impact Formula to create ratings for your portfolio:
- Determine all company (stock) and fund holdings in your investment accounts
- Determine the value of all your holdings in specific companies (through stocks and fund compositions); e.g., you may hold $10,000 in a fund that has 5% of its portfolio in Company A, meaning you hold $500 of Company A through that fund
- Get the ratings for all your held companies, based on your custom impact formula
- Create weighted average ratings for your holdings, accounts, and overall portfolio; i.e., multiply ratings of held companies by their weight within a holding, account or portfolio. For example, if Company A is rated 60 and Company B is rated 80, and you have an account with $500 stock in Company A and $500 stock in Company B, the account's rating would be 70 (most accounts have hundreds or thousands of held companies through large funds, but this is the idea)
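The weighted-average calculation in the example above can be sketched as follows (a simplified illustration; a real account aggregates many more holdings):

```python
def account_rating(holdings):
    """Dollar-weighted average rating of held companies.
    `holdings` maps company name -> (dollar value held, rating)."""
    total = sum(value for value, _ in holdings.values())
    return sum(value * rating for value, rating in holdings.values()) / total

# The example from the text: $500 of Company A (rated 60)
# and $500 of Company B (rated 80)
account_rating({"Company A": (500, 60), "Company B": (500, 80)})  # -> 70.0
```

The same weighted average applies at every level: within a fund holding, across an account, and across the whole portfolio.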
Please contact us at email@example.com with questions.