Why did Microsoft delete the world’s largest public face recognition database?
Microsoft has quietly removed MS Celeb, which the company described as the world’s largest open face recognition database, the Financial Times reported. The database was published in 2016 and contains 10 million photos of about 100,000 people, collected without their permission.
In Europe, most face recognition projects must meet one basic requirement: photos of people who are not targets may not be saved. In other words, unless the judiciary has given its consent (for a suspect, for example), no video or images may be collected; otherwise the law may be violated.
Professionals speculate that Microsoft deleted the MS Celeb database because of legal risks as well as cultural factors. People in Europe are particularly concerned about personal privacy. Fingerprint identification, although it is also a biometric technology, is accepted because the user is informed. Face recognition is far more sensitive, because it is contactless and passive. In European and American culture, therefore, face recognition carried out without the user’s knowledge creates a strong sense of being “spied on”.
At an American company like Microsoft, management and engineers feel strongly about protecting personal privacy, so the decision comes as no surprise. Cultural factors aside, there are also technical reasons for individuals to become more aware of privacy protection. If the use of face recognition is not restricted, a person’s movements can be tracked completely by installing cameras with face recognition at key locations. Through data analysis and “mining”, commercial companies may then infringe on the interests of consumers. For example, by evaluating visits to key locations (such as expensive venues, pharmacies and hospitals) together with body shape and walking posture, insurance companies can work out the financial situation and health of their customers. That gives them a basis for pricing customers’ premiums “precisely.” Cultivating citizens’ awareness of privacy protection is therefore not mere cultural posturing; it concerns the real interests of individuals.
For academic research on face recognition, MS Celeb is currently the largest open data set, and the best face recognition models are trained on it. Face recognition is widely used in industry, and companies’ private data sets are many times larger than the public ones, so a direct comparison between the two is unfair. For this reason, face recognition papers must state whether their training data set is publicly available.
Now that MS Celeb has been deleted, most researchers still have a backup copy, but strictly speaking the data set no longer exists. Future face recognition papers will have to list MS Celeb as a private data set, and their results cannot be compared with methods trained on public data sets. In the short term this may even be good for face recognition research, because MS Celeb is too large relative to many test sets. Performance on the major benchmarks is already close to saturation, and when the training data covers the test data, the contribution of the algorithm itself matters less. However, if deleting public data sets becomes a trend and CASIA-WebFace and VGGFace follow, face recognition research will be left with no data. Some politically sensitive research groups may stop doing face recognition research altogether. Research could then be carried out only inside companies, which would be very detrimental to the development of the technology. Unfortunately, this appears to be the current trend: the DukeMTMC data set, used for person re-identification (re-ID), has also been deleted.
Post time: Jul-22-2021