Research Session: A Matter of Matching

For our monthly Citizen Codex research session, I put together a presentation on entity resolution, the work of deciding whether two pieces of data refer to the same real-world thing. I'd spent six years on this problem in a previous role, and slides alone weren't going to do it justice. So I built a site: sliders to show how the stakes shift from a duplicate bank charge to a man being declared dead by Social Security, flip cards for the questions every team should be asking, and a decisions log that pulled real product calls I'd made as a PM out of the abstract and onto the page. I wanted to show how multifaceted entity resolution is, and how crucial it is to understand it from every angle.

Takeaways

What I didn't expect was how much the conversation would push my own thinking once I presented the site to the team. I'd planned on making a case for keeping humans in the loop, but the team pushed back, arguing that LLMs may actually be better suited to the kind of interaction-effect reasoning entity resolution demands. Others pressed on how you generate a probability at all in an LLM-based system, when the math that produces confidence in traditional matching doesn't carry over cleanly. The discussion turned to graph linkages and community structure as ways to add signal where individual identifiers fall short, and to VA and military health records, one of the most consequential entity resolution failures I have seen in the country right now. Each of those threads complicated my own picture of entity resolution in ways I hadn't fully thought through, and that is the point of doing this kind of session in the first place.

What I keep coming back to is that entity resolution cannot be treated as a back-end concern. It's a values decision disguised as a technical one, and it raises too many important questions to be left to a technical team alone: who bears the cost of a wrong match, how confidence gets translated into something a user can act on, and whether a person has any path to correct a record about themselves. These are product, design, and business questions, ideally long before they're engineering ones. I brought this topic to Citizen Codex because we sit in a part of the world where these decisions land on real people through the projects we take on: benefits systems, voter rolls, eligibility determinations, and identity verification. I'm grateful that we are building a practice of stopping to discuss these topics before we build, not after something breaks.
