Ratings methodology

Heads up: this is our detailed methodology. Looking for a shorter overview? Check out How ratings work.

Impact ratings show how well a brand, company, or investment supports a cause, or combination of causes, that you select. Here's our methodology for creating ratings:

1. Define metrics for each cause on Ethos

For each of the 45 causes on Ethos, we define a set of metrics that gauge a company or fund's performance in that area. Causes on Ethos are loosely based on the United Nations Sustainable Development Goals.

For example, the cause Gender Equality includes metrics such as:

  • Gender pay gap at a company
  • Policies that support working mothers, such as parental leave and childcare
  • Mentorship programs for women
  • Integration of gender equality into training programs
  • Explicit, public efforts to address gender equality at the company
  • Gender diversity on the Board of Directors

Ethos currently uses ~190 metrics across all causes.

2. Gather raw data for every metric

We then aggregate data points for each metric. Data comes from companies (including SEC filings and annual reports), government agencies, and independent third parties. For example, the metric “Leaders in supporting working mothers” assesses company policies for working mothers, using data from Working Mother. When Working Mother has new data available, we update our raw data for the metric. 

3. Calculate z-score, or standard score

Once we have raw data for a metric, we calculate a z-score, or standard score, for each raw data point. Z-score is a measure of how many standard deviations a number is above or below the mean. Raw scores above the mean have positive z-scores, while those below the mean have negative z-scores.

Depending on the metric, we calculate mean and standard deviation for a company’s peer group or for all companies on Ethos. If metric data varies a great deal among industries (e.g., carbon emissions), we use peer group as a more appropriate population (so an airline’s carbon emissions are measured against other airlines rather than all companies, for example). If the metric is not industry-specific (e.g., percent of women on the board of directors), we use all companies as the population. 

If some industries have a larger impact on a metric (e.g., the Transportation industry for carbon emissions), we apply an industry “materiality factor” to the z-score of companies in that industry. For example, the Transportation industry might have a materiality factor of 2 for carbon emissions metrics, meaning z-scores of transportation companies are multiplied by 2. This increases their normalized score relative to other companies if they scored above average, or decreases their normalized score if they performed below average. 

This “materiality factor” is used to give greater “weight” to companies that are in high-impact industries for a particular metric. It rewards companies making outsized positive contributions to improving a metric, and lowers ratings for companies making an outsized negative contribution to a metric.
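The z-score and materiality calculation described above can be sketched in Python. This is a minimal illustration, not Ethos's actual implementation; the function names, the peer-group numbers, and the factor of 2 are all hypothetical.

```python
from statistics import mean, stdev

def z_score(value, population):
    """Standard score: how many standard deviations `value` sits above
    or below the mean of its population (peer group or all companies)."""
    return (value - mean(population)) / stdev(population)

def weighted_z(value, population, materiality=1.0):
    """Apply an industry materiality factor, e.g. 2 for Transportation
    companies on carbon emissions metrics."""
    return materiality * z_score(value, population)

# An airline's emissions measured against a (hypothetical) peer group
# of airlines, rather than against all companies:
airline_emissions = [120.0, 95.0, 130.0, 110.0, 105.0]
z = weighted_z(95.0, airline_emissions, materiality=2.0)
# 95.0 is below the peer mean, so the z-score is negative, and the
# materiality factor of 2 doubles that distance from the mean.
```

Note how the materiality factor amplifies in both directions: an above-average transportation company gets a bigger boost, and a below-average one gets a bigger penalty.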

4. Normalize z-scores to a 0-100 scale

Most z-scores will fall within +/- 3 standard deviations (~99.7% of data points in an approximately normal data set). To translate this to an approximate 0-100 scale, we multiply each z-score by 25 (so one standard deviation equals 25 points) and then add 50 (moving the mean of all data points to 50).

This means most data points will fall in the range of -25 (-3 standard deviations) to 125 (+3 standard deviations). To deal with outliers, we winsorize, or "cap", all scores at +/- 3 standard deviations; i.e., a minimum score of -25 and a maximum of 125. Since a final company rating is a weighted average of many raw metric scores (typically 20-50), individual metric scores below 0 or above 100 (which are uncommon) almost never "pull" a final company rating below 0 or above 100. If they do, we cap the final rating at a minimum of 0 and a maximum of 100.
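The scale-and-cap step above reduces to a one-line mapping plus a clamp. A minimal sketch (the function name is ours, not Ethos's):

```python
def normalize(z):
    """Map a z-score to the ~0-100 scale: 1 standard deviation = 25
    points, mean = 50. Winsorize at +/- 3 standard deviations, i.e.
    clamp the result to [-25, 125]."""
    score = 25 * z + 50
    return max(-25.0, min(125.0, score))

normalize(0)    # a company exactly at the mean scores 50
normalize(1)    # one standard deviation above the mean scores 75
normalize(-4)   # an extreme outlier is capped at -25
```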

For metrics such as “Best of” or “Worst of” lists with a few hundred or fewer companies, we assume the list intends for included companies to rate highly ("Best of" lists) or poorly ("Worst of" lists) relative to companies not on the list. To account for this we create a distribution from 100-60 ("A" and "B" scores) for companies making a "Best of" list, or from 40-0 ("D" and "F" scores) for companies making a "Worst of" list. Companies not on the list receive a uniform score of 0 for "Best of" lists (since they were uniformly measured as not good enough to make the list) or 100 for "Worst of" lists (since they were uniformly measured as good enough to NOT make the list). Because this flattens the distribution of performance among companies not on the lists, these metrics are usually given a small weight in calculating final company ratings.
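One way to implement the list-based bands is to spread ranked companies linearly across the band and give everyone else the uniform score. This is a sketch under that assumption; the source doesn't specify the distribution's shape, and the linear spread here is ours.

```python
def best_of_score(rank, list_size):
    """Spread "Best of" list members from 100 (rank 1) down to 60
    (last rank); companies not on the list (rank=None) score 0."""
    if rank is None:
        return 0.0
    return 100 - 40 * (rank - 1) / (list_size - 1)

def worst_of_score(rank, list_size):
    """Spread "Worst of" list members from 0 (rank 1, the worst) up to
    40 (last rank); companies not on the list score 100."""
    if rank is None:
        return 100.0
    return 40 * (rank - 1) / (list_size - 1)
```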

For metrics where raw data has already been distributed on a 0-100 or equivalent scale (e.g., 0-5), we skip steps 3 and 4. When this is the case, further normalizing would skew the intended distribution of scores from the raw source. In cases where the data is on an equivalent scale (e.g., 0-5), we simply multiply raw data to convert to the 0-100 scale (e.g., multiply data on a 0-5 scale by 20).

The goal of each strategy above is always to maintain the original data's presentation of company performance as best as possible, and to aggregate data points into the most accurate view of company performance possible. 

Ethos tests all normalization strategies for each dataset (metric) to assess which strategy most accurately represents the distribution of company performance for that dataset. We look at both the relationship among companies (e.g., are data clustered around a certain range) and at external, absolute gauges of company performance (e.g., is there a credible third party that says the best-performing company in something like gender pay equality should only be at a "B" or "C" level). We then use these assessments to make a decision on the best normalization strategy for each dataset. 

We are always open to suggestions. If you have any or if you would like more information, please reach out to support@ethos.so.

5. Combine normalized metric data to ratings for each cause on Ethos

We then combine normalized metric data to create a single rating for each company on Ethos with sufficient data (usually at least 60% of metrics for a particular cause) for each of the 45 causes on Ethos.

To combine the normalized data, we first determine a weight for every metric within each impact area. For example, the Gender Equality rating might give 10% weight to the gender pay gap metric, in which case 10% of a company's Gender Equality rating would be composed of its gender pay gap score. To determine an appropriate weight for each metric, we look at: 

  • How credible is the data behind the metric? Do we believe it accurately represents performance at the company or security for the metric?
  • How reliable is the data? Is it consistently reported?
  • How relevant is the data to the intended impact area? Does what it measures closely align with the impact area? For example, we believe a company's gender pay gap is more relevant to gender equality than the company's policies on telecommuting.

After assigning weights to each metric, we multiply each metric score by its weight and sum the results to get the rating for a company.
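The weighted combination is a standard weighted average. A minimal sketch with hypothetical scores and weights (the 10% pay-gap weight echoes the example above; the other numbers are invented for illustration):

```python
def cause_rating(scores, weights):
    """Weighted average of normalized metric scores for one cause.
    Weights must sum to 1 (i.e., 100%)."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(s * w for s, w in zip(scores, weights))

# Hypothetical Gender Equality rating: pay gap score 70 at 10% weight,
# two other metrics at 50% and 40%.
rating = cause_rating([70, 80, 55], [0.10, 0.50, 0.40])
```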

6. Calculate z-scores and normalize company ratings for each cause

We then calculate z-scores for each company rating and normalize to a 0-100 scale. To do this we follow a similar approach as in steps 3 and 4.

Scores are now equal to the ratings you can see on each cause and company profile on Ethos.

7. Calculate fund ratings for each cause

Fund ratings are a weighted average of ratings for each company held by the fund. For example, if a fund includes 1% Company A stock and Company A has a rating of 80 for a particular cause, 1% of the fund's rating for that cause will be 80. The other 99% of the fund's rating will be made up of ratings for the fund's other company holdings. 
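Sketching the fund calculation with the 1% example above (the 99%-remainder rating of 70 is a made-up stand-in for the fund's other holdings):

```python
def fund_rating(holdings):
    """holdings: (portfolio_weight, company_rating) pairs.
    Portfolio weights must sum to 1."""
    return sum(weight * rating for weight, rating in holdings)

# 1% in Company A (rated 80); the other 99% in holdings that
# hypothetically average a rating of 70:
rating = fund_rating([(0.01, 80), (0.99, 70)])
```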

8. Weight company and fund ratings based on your Impact Assessment

When you take an Impact Assessment on Ethos, we use your input about what’s most important to you to weight ratings of companies and funds. For example, if you picked “Reduced greenhouse gas emissions”, “Renewable energy growth”, and “Disaster readiness and effective aid” and rated them all a 4 out of 7 in importance, each of those three causes would contribute 1/3 of a company’s personalized rating. You can pick as many causes as you want and assign them any importance, and Ethos will weight your ratings accordingly.
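Normalizing importance scores into weights can be sketched as below. The cause keys and ratings are hypothetical; equal importances (4, 4, 4) reduce to equal 1/3 weights, matching the example above.

```python
def personalized_rating(cause_ratings, importances):
    """Turn raw importance scores (e.g. 1-7) into weights that sum to 1,
    then take the weighted average of the company's cause ratings."""
    total = sum(importances.values())
    return sum(cause_ratings[c] * imp / total
               for c, imp in importances.items())

ratings = {"ghg": 60, "renewables": 75, "disaster": 90}      # hypothetical
importances = {"ghg": 4, "renewables": 4, "disaster": 4}     # all rated 4 of 7
score = personalized_rating(ratings, importances)            # equal 1/3 weights
```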

9. Linking accounts (optional)

When you link an external account to Ethos (e.g., a bank account), we use company and fund ratings based on your Impact Assessment to create ratings for your linked account. We do this by:

  • Determining all company (stock) and fund holdings in your linked account, including the value of each
  • Getting the ratings for your held stocks and funds, using your personalized ratings from your Impact Assessment
  • Creating weighted average ratings for your accounts; i.e., multiplying ratings of held stocks or funds by their weight within an account. For example, if Company A is rated 60 and Company B is rated 80, and you have an account with $500 of stock in Company A and $500 of stock in Company B, the account's rating would be 70
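The steps above can be sketched as a dollar-weighted average (function name ours; positions from the $500/$500 example):

```python
def account_rating(positions):
    """positions: (dollar_value, rating) pairs from a linked account.
    Each holding is weighted by its share of total account value."""
    total = sum(value for value, _ in positions)
    return sum(value * rating for value, rating in positions) / total

# $500 in Company A (rated 60) and $500 in Company B (rated 80):
rating = account_rating([(500, 60), (500, 80)])  # 70.0
```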

Please contact us at support@ethos.so with questions.

