Exactly. Removing a couple words the AI associates with women doesn’t magically fix the dataset. The AI was trained with biased input samples, so it too is biased.
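A toy sketch of why that is (entirely made-up resumes and tokens, not Amazon's data): if the historical labels penalized women's resumes, then deleting the one flagged word leaves every correlated token still carrying the same signal.

```python
# Hypothetical toy data: each resume is a set of tokens, label 1 = hired
# under a biased historical process that disfavored women's resumes.
resumes = [
    ({"executed", "captured", "python"}, 1),
    ({"executed", "java"}, 1),
    ({"captured", "python"}, 1),
    ({"womens", "smith_college", "python"}, 0),
    ({"womens", "chess", "java"}, 0),
    ({"smith_college", "captured"}, 0),
]

def token_score(token, data):
    """Fraction of resumes containing `token` that were hired."""
    hits = [label for tokens, label in data if token in tokens]
    return sum(hits) / len(hits) if hits else None

print(token_score("womens", resumes))  # 0.0, so the token gets scrubbed

# "Fixing" the model by deleting the flagged token changes nothing for
# a proxy token the same historical data also penalized:
scrubbed = [(tokens - {"womens"}, label) for tokens, label in resumes]
print(token_score("smith_college", scrubbed))  # still 0.0
```

The bias lives in the labels, not in any single word, so scrubbing words is whack-a-mole.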
This doesn’t mean that any overfitting occurred, however; overfitting is an entirely different issue.
The only actual references to what the system did in the Reuters article are:
> It penalized resumes that included the word “women’s,” as in “women’s chess club captain.” And it downgraded graduates of two all-women’s colleges, according to people familiar with the matter.

> The group created 500 computer models focused on specific job functions and locations. They taught each to recognize some 50,000 terms that showed up on past candidates’ resumes. The algorithms learned to assign little significance to skills that were common across IT applicants, such as the ability to write various computer codes, the people said.

> Instead, the technology favored candidates who described themselves using verbs more commonly found on male engineers’ resumes, such as “executed” and “captured,” one person said.
So, assuming that they gave the AI both accepted and rejected applicants’ resumes and asked it to learn the difference, these findings would suggest that, when looking at resumes:
- All of Amazon’s applicants are equally skilled, regardless of whether they are accepted or not
- Amazon’s normal hiring process pays no attention to resumes when choosing applicants, except for:
  - favoring the use of certain words like “executed” or “captured”
  - penalizing mentions of women-specific colleges or competitions
This sounds bizarre; do they really get applicants who are all equally qualified? Or are resumes just a horrible tool for assessing qualifications?
The second point is the more controversial one: either the recruiters specifically favor those two traits, or something not on the resume that they select against correlates with them.
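That second possibility, an off-resume trait correlating with a resume token, can be sketched with made-up numbers. Here recruiters screen only on a score the resume never shows, yet the token ends up predictive anyway:

```python
import random
random.seed(0)

# Hypothetical: recruiters hire purely on an off-resume score (say, an
# interview), but in this toy world that score correlates with whether a
# certain token appears on the resume.
applicants = []
for _ in range(10_000):
    token_present = random.random() < 0.5
    # The unobserved trait recruiters actually select on:
    score = random.gauss(0.6 if token_present else 0.5, 0.1)
    applicants.append((token_present, score > 0.55))

def hire_rate(with_token):
    """Hire rate among applicants whose resume does/doesn't show the token."""
    outcomes = [hired for t, hired in applicants if t == with_token]
    return sum(outcomes) / len(outcomes)

# The token predicts hiring even though no recruiter ever reads it,
# so a model trained on resumes alone will latch onto it.
print(hire_rate(True), hire_rate(False))
```

Either way, a model trained only on resumes and outcomes can’t tell these two explanations apart.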