Facebook is sharing a new, diverse dataset with the broader AI community. In an announcement spotted by VentureBeat, the company says it wants researchers to use the collection, called Casual Conversations, to test their machine learning models for bias. The dataset comprises 45,186 videos of 3,011 people and gets its name from the fact that the participants give unscripted answers to the company’s questions.
Importantly, Casual Conversations features paid actors whom Facebook explicitly asked to disclose their age and gender. The company also hired trained professionals to label the ambient lighting in each video and the subjects’ skin tones according to the Fitzpatrick scale, a system developed by dermatologists to classify human skin tones. Facebook claims the dataset is the first of its kind.
You don’t have to look far to find examples of bias in artificial intelligence. A recent study found that facial recognition and analysis programs such as Face++ rate the faces of black men as angrier than those of their white counterparts, even when both men are smiling. The same flaws have permeated consumer AI software. In 2015, Google adjusted Photos to stop using a label after software engineer Jacky Alciné discovered that the app was misidentifying his black friends as “gorillas.”
Many of these problems can be traced back to the datasets that organizations use to train their software, and that’s where an initiative like this can help. A recent MIT study of popular machine learning datasets found that about 3.4 percent of the data in those collections was inaccurate or mislabeled.
Although Facebook describes Casual Conversations as a “good, bold first step forward,” it admits that the dataset is not perfect. For starters, it contains only people from the United States. The company also did not ask participants about their ethnicity, and when it came to gender, the only options were “male,” “female,” and “other.” However, the company plans to make the dataset more inclusive in the coming year.