Facial Recognition’s Distorted View
by Michael Dean Thompson
There is a tendency within the human brain to settle on the first solution even when another, better solution is available. Automated facial recognition (“AFR”) systems can exacerbate the problem simply because they are designed for a task at which most of humanity is objectively poor. The phenomenon is summed up nicely in the journal Science: “People form stronger, longer lasting beliefs when they receive information from agents that they judge to be confident and knowledgeable.” These two tendencies converge in AFR so that cops investigating a case settle on the answer the AFR provides while ignoring contradictory evidence, leading to false arrests and, potentially, wrongful convictions.
Horror Stories
Alfonso Cornelius Sawyer experienced this firsthand when Maryland Transit Administration (“MTA”) Police charged him with two counts of second-degree assault along with additional charges tied to the theft of a cellphone. After his arrest and arraignment, he was denied bail due to the severity of the alleged crime, which carried a potential 25-year prison sentence.
On March 26, 2022, a man boarding a bus just outside Baltimore was not wearing a facemask. When the driver told him he had to wear a mask on the bus, he leaned over the shield and argued with her, saying, “I hit bitches.” When she pulled out her iPhone to call the police, the rider snatched the phone and fled. She gave chase and was punched several times in the face. Afterwards, he stood on the curb laughing at her as she wiped blood from her nose.
MTA extracted grainy images from surveillance cameras in which the slim Black man’s face was partially covered by a ball cap and hoodie. When an analyst submitted the images, several candidate responses came back. One of them was Sawyer. It is possible that Sawyer’s image caught the analyst’s attention because he had recently been on probation for traffic violations, a fact likely displayed alongside the response. Unfortunately, DataWorks Plus, the company that performed the facial recognition processing and which serves some 2,500 public agencies, would not respond to emails from The New Yorker or answer questions about why Sawyer was selected, which probe photos were used, and what the results had been. Likewise, the analyst at the Harford County State’s Attorney’s Office no longer worked there, and no records of the probe photos and candidate lists were available.
Sawyer’s denial of bail meant it was up to his wife to prove his innocence. The police seemed more interested in finding supporting evidence than in paying attention to the contradictory evidence. For example, when cops visited the home where Sawyer and his wife lived—the home of his wife’s sister—they noted that they were unable to locate the clothes worn by the assailant. In addition, his sister-in-law, a retired schoolteacher, stated that on the morning in question she had seen Sawyer and his wife still asleep in her living room at 9:30 a.m., about an hour after the assault.
It did not help Sawyer’s cause that his probation officer confirmed he was the assailant. The probation officer had only seen Sawyer twice before, and both times Sawyer had worn a face mask. But when Sawyer’s wife later confronted him with photos of her husband and pointed out that the assailant was considerably shorter and younger than Sawyer, who stands six feet six inches tall and is more than 50 years old, the probation officer changed his tune. Sawyer’s wife told The New Yorker that the probation officer said police officers had brought the surveillance video to him and suggested it was her husband. He told her that he had not made a definitive identification and afterward emailed the detective to say, “I reviewed the photos … and have doubts that the person in the BOLO is Mr. Sawyer.”
Somehow, the investigators failed to notice the age and height discrepancies, or that Sawyer had more facial hair, a noticeable gap in his teeth, and an obvious limp from a football injury. The bus driver herself had indicated in a photo lineup that Sawyer was “Not the person who assaulted me.” But by the time they showed the victim that series of photos, the investigators were so invested in the AFR’s answer that they ignored the very person who had experienced the attack, along with everything else that cleared Sawyer.
Robert Williams of Michigan can identify with Sawyer’s plight. His is the first publicly known case of a false arrest resulting from AFR, according to The New Yorker. Williams’ wife and children were there to experience the moment of terror and confusion when he was arrested for allegedly stealing watches in Detroit. In an effort to validate the arrest, the police worked with a security contractor who identified Williams in a photo lineup of six people after reviewing the surveillance tape, even though the contractor had never seen Williams in person and was not present during the theft.
It was an old photo from an expired driver’s license that hit on the probe image, though more recent images of Williams in the database did not. That inconvenient fact was not taken into consideration. Williams’ photo was not even the first hit; it was the ninth. The analyst performed a “morphological assessment,” in which the shape of Williams’ nostrils was found to be the most similar to the thief’s. The analyst then ran two more searches. One returned 243 matches. The other, run against an FBI database, found no matches.
Despite only one of three facial recognition queries providing a close match on Williams, and that literally by a nose, the cops arrested him. They could have checked his location information on his phone but did not. It was only after his arrest that cops held up a photo of the thief next to Williams’ head and realized he was not the thief because he was much larger and had a broader face, among other differences.
Automated Facial Recognition
Proponents of automated facial recognition like to point to a 2019 study by the National Institute of Standards and Technology (“NIST”). That study found that when comparing high-quality images, such as portraits or mugshots, the very best of the tested algorithms did very well, even across people of different races. Even so, those systems were 100 times as likely to misidentify a person of Native American descent as a white male. Black women fared the worst in the study. And while the failure rates may seem small at first glance, multiplied across the number of database images and the number of daily probes, they translate into a very tangible number of misidentifications every day.
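To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The false-match rate, gallery size, and daily probe volume below are invented for illustration; they are not figures from the NIST study.

```python
# Hypothetical illustration (not NIST data): how a tiny false-match rate
# becomes daily misidentifications once it is multiplied across a large
# gallery and a steady stream of probe searches.

def expected_false_matches(false_match_rate: float,
                           gallery_size: int,
                           probes_per_day: int) -> float:
    """Expected number of false candidate matches per day when every probe
    image is compared against every image in the gallery."""
    comparisons_per_day = gallery_size * probes_per_day
    return false_match_rate * comparisons_per_day

# Illustrative numbers only: a 1-in-a-million false-match rate, a gallery of
# 8 million mugshots, and 100 probe searches per day.
print(expected_false_matches(1e-6, 8_000_000, 100))  # -> 800.0 false matches per day
```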
That is for the very best systems, Microsoft and Amazon and the like. The majority of the tested systems did not do so well, and some performed very poorly. Meanwhile, many of the systems in use by cops today did not participate in the NIST study and lack such standardized testing. It is likely, then, that the systems the cops use are not in the same category for successful match rates.
NIST also noted several conditions that increase the number of errors, both false positives (where a match is returned but is wrong) and false negatives (where no match is returned even though one exists in the database). Something as simple and common as a bruise, such as a black eye, or even ordinary aging can confuse the systems. Likewise, images printed on a T-shirt can throw them off. Many facial recognition systems miss features like moles that would be obvious and identifying to a human observer. And, most relevant to many criminal investigations, grainy images, differing camera angles, and partially obscured faces present significant challenges to these systems.
A key problem with many images collected from surveillance video is the camera angle and distance. Anyone who has ever taken a selfie has experienced foreshortening, where closer items in the image seem distorted. For AFR, the camera angle is a twofold problem. First, the database images probably will not have been taken from the same or a similar angle, which means the system must estimate the camera’s viewing angle and adjust the visible points in an effort to eliminate the foreshortening. Researchers have identified some 80 metrics that can be used; the ratios of the widths of the eyes, nose, and mouth are among them. Second, any significant angle will also obscure some of those points, such as one of the ears or part of an eye. Since people’s faces are not perfectly symmetrical, the AFR ends up making comparisons on an incomplete set of points, and that assumes it has calculated the incident angle correctly.
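The sketch below shows roughly what such ratio-based measures might look like. The landmark names, coordinates, and the three ratios are assumptions chosen for illustration; real systems rely on learned models and far more measurement points than shown here.

```python
# A minimal, hypothetical sketch of ratio-based facial metrics. These are
# illustrative assumptions, not any vendor's actual algorithm.
import math

def ratio_features(landmarks: dict) -> dict:
    """Compute a few scale-invariant ratios from 2-D landmark points.
    Ratios are used because they survive uniform changes in image scale."""
    eye_width = math.dist(landmarks["left_eye_outer"], landmarks["right_eye_outer"])
    nose_width = math.dist(landmarks["nose_left"], landmarks["nose_right"])
    mouth_width = math.dist(landmarks["mouth_left"], landmarks["mouth_right"])
    return {
        "nose_to_eye": nose_width / eye_width,
        "mouth_to_eye": mouth_width / eye_width,
        "mouth_to_nose": mouth_width / nose_width,
    }

# Example with made-up pixel coordinates from a hypothetical frontal image.
landmarks = {
    "left_eye_outer": (110, 150), "right_eye_outer": (210, 150),
    "nose_left": (140, 200), "nose_right": (180, 200),
    "mouth_left": (130, 240), "mouth_right": (190, 240),
}
print(ratio_features(landmarks))  # {'nose_to_eye': 0.4, 'mouth_to_eye': 0.6, 'mouth_to_nose': 1.5}
```

An off-angle or partially obscured probe photo leaves some of these landmarks missing or misplaced, which is exactly why such measures degrade on surveillance footage.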
Even when provided two straightforward images, there will still be some variance between them because of slight differences in angle. AI training attempts to teach the system how much variation it should tolerate. The identification threshold, set by the analysts using the tool, determines how far a candidate image can differ from the probe and still be returned as a match. That is why a photo of Williams could surpass the identification threshold on one system despite human eyes being able to quickly pick out obvious size differences.
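A common design, though not necessarily the one used in Williams’ case, is to reduce each face to a fixed-length numerical “embedding” and accept a candidate when the similarity between embeddings clears the analyst-set threshold. The sketch below assumes cosine similarity and a made-up threshold of 0.80.

```python
# A minimal sketch, assuming embedding comparison with cosine similarity.
# The embedding values and the 0.80 threshold are invented illustrations.
import math

def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_candidate(probe: list, gallery_face: list, threshold: float = 0.80) -> bool:
    """Return True when two embeddings are 'close enough' per the threshold."""
    return cosine_similarity(probe, gallery_face) >= threshold

# Two slightly different embeddings of the same (hypothetical) face clear the
# threshold -- and a different person's embedding can land surprisingly near it.
probe = [0.12, 0.88, 0.47, 0.31]
same_person_other_photo = [0.10, 0.85, 0.50, 0.33]
print(is_candidate(probe, same_person_other_photo))  # True under this threshold
```

The trade-off is the one the article describes: loosen the threshold and more innocent faces clear it; tighten it and genuine matches are missed. Where that line is drawn is a human choice, not a property of the math.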
Some companies, such as DataWorks Plus (the company in Sawyer’s case), offer tools meant to reduce those errors. These tools can crop images to isolate faces. Some allow the analyst to modify an image to add features obscured by shadows. Others even permit the analyst to combine two or more images to draw out features or to create a 3D model. The problem is that such alterations to the probe photo compound both algorithmic and personal biases. Nor have those features been put through systematic testing of the kind a peer-reviewed study would provide, without which their accuracy cannot be known.
Too much of automated facial recognition is still highly subjective, despite common perceptions to the contrary. AFR systems can return hundreds of images based on the threshold setting, as happened with Williams. Lower thresholds cast much wider nets. Those matches may be returned along with a dossier of things like an arrest history. As an analyst scans the images, they can score people higher without even realizing it, magnifying the already substantial personal biases they bring to the table.
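As an illustration of how much the threshold shapes what the analyst sees, consider the following sketch of a one-to-many search; the names, scores, and thresholds are invented.

```python
# Illustrative one-to-many search: the same similarity scores produce a short
# candidate list at a strict threshold and a long one at a looser threshold.
def candidate_list(scores: dict, threshold: float) -> list:
    """Return gallery entries whose similarity score clears the threshold,
    highest score first -- the list an analyst would then review by hand."""
    hits = [(name, s) for name, s in scores.items() if s >= threshold]
    return [name for name, _ in sorted(hits, key=lambda x: x[1], reverse=True)]

scores = {"person_A": 0.91, "person_B": 0.84, "person_C": 0.79, "person_D": 0.62}
print(candidate_list(scores, threshold=0.90))  # strict: ['person_A']
print(candidate_list(scores, threshold=0.60))  # loose: all four candidates
```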
The extraordinary complexity of AFR means it boils down to an algorithmic best guess. As Kidd and Birhane point out in Science, “transmitted biases or fabricated information are not easily correctable after the fact either within individuals or at the population level. This aspect of human psychology interacts with how humans treat agentive entities and, in particular, their tendency to be greatly swayed by agents that they perceive as confident and knowledgeable.” In effect, people are more inclined to reach what the authors termed a “threshold of certainty” because the answer has been delivered by AI. That likely remains true even if a human analyst acts as an intermediary between the police investigator and the AFR system. After all, in the investigator’s mind, the analyst will have resolved any uncertainties present in the initial output.
Since the creation of the calculator, and even before, our machines have performed feats that seemed miraculous to previous generations. Today, we expect machines to provide instant, correct answers; we are not accustomed to ones that instead return “fuzzy” answers.
New Orleans Mayor LaToya Cantrell’s trust in such tools made her certain that lifting a ban on AFR would change policing for the better. “Passage of this ordinance by the City Council now paves the way to increase the NOPD’s ability to protect and serve the residents, businesses, and visitors to the City of New Orleans,” she proclaimed. “This is a win for everybody.”
The NOPD began using AFR in October of 2022. That was about the time Randal Reid, a Georgia resident, was arrested for credit card theft. He had never been to New Orleans, or even Louisiana for that matter. It turned out that he had been arrested because of a false AFR identification. No other evidence placed him anywhere near the crime. He spent days in jail and thousands of dollars because Clearview AI, which advertises itself as “100% accurate,” missed a mole and the significant weight difference between him and the actual suspect.
In the year that followed, the NOPD’s self-reporting showed 19 AFR requests, 15 of which were fulfilled. Only six of those came back with matches, and three of those were false positives. Just three positive matches remain, two of which have not yet had their day in court. Thus far, adopting AFR has not noticeably improved policing in the city, but it has wreaked havoc on those falsely identified.
When The New Yorker interviewed Sawyer about his experience with AFR, he said he had never heard of the technology before then. He added that if it had not been for his wife, “I’d-a ate that.” That is, he would have pleaded guilty in hopes of receiving a lighter sentence despite being factually innocent. As Criminal Legal News has covered previously, only roughly 2% of criminal cases ever go to trial; the trial penalty is simply too great to risk. So, while only a half-dozen arrests due to false positives have thus far been uncovered, the number is almost certainly much higher. We will never know of most of them because the cases ended in pleas before discovery of the prosecution’s evidence.
In Reid’s case, the warrant never even mentioned that his arrest stemmed from a positive AFR match. Fortunately, his sharp-eared attorney happened to overhear the words “positive match.” How many defendants, then, have pleaded out without ever knowing that AFR played a role in their arrest?
Sources: NewYorker.com, FoxNews.com, Science.org