Annotating the last mile
This post is a compiled snippet from a Twitter thread that I've transported over here for posterity. The original thread started here: https://twitter.com/b_cavello/status/1342174660700512259
There are SO many people working behind-the-scenes without whom our modern AI systems would not exist!
This work has been an incredible opportunity to learn and create with brilliant thinkers and change-makers to ensure these workers’ efforts are recognized and rewarded. twitter.com/PartnershipAI/status/1341798229055172615
When I worked at a tech company, I primarily thought of data as a commodity. Phrases like “data is the new oil” reinforce the idea that data is something that people can “discover” or “collect” without recognition of the effort that goes into producing that data.
Even as algorithmic pre-processing of data facilitates some of the initial labeling, translation, and other data enrichment tasks, there remains a need for human review. This is “automation’s last mile.”
www.youtube.com/watch?v=zj2DEQCOTh0
Even putting aside for a moment the CREATION of data (there’s a lot to discuss there!), the curation, cleaning, and validation work done by an often unrecognized workforce of data enrichment workers enables the many impressive results we see in today’s AI systems.
Data enrichment work is a key component of AI development.
Unfortunately, because of this mindset of data as a ready-made commodity, AI researchers, data scientists, and product managers often underestimate the effort and skill involved in doing this important work.
Often, when AI developers look to source data enrichment work, they turn to gig platforms where workers constantly encounter poorly written instructions and battle apparently arbitrary rejections of their work, which leave them unpaid for their efforts.
No one wants it to be this way!
Poorly written instructions, unrealistic time estimates, & inadequate compensation can delay projects or create mismatches between work & workers’ experience.
There are several benefits to AI developers discussing the data enrichment goals with the people doing the work.
Shifting the mindset about where data comes from and our relationship with data enrichment work can both enable greater collaboration between AI developers and enrichment workers and secure the quality working conditions these critical players deserve.
It may feel formidable, but in reality, changing this ecosystem starts with some small but high-ROI interventions: running pilots to ensure instructions are clear & time expectations are realistic, calculating a rough estimate of expected wages, & seeking reputable platforms.
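The wage estimate can be a few lines of arithmetic once a pilot has produced task timings. Here is a minimal sketch, assuming workers are paid a flat amount per task. The function names, the pilot timings, the $0.10 payment, and the $15/hour target are all hypothetical and only there for illustration; they do not come from any platform or from the original thread.

```python
from statistics import median


def estimated_hourly_wage(pay_per_task: float, pilot_seconds: list[float]) -> float:
    """Rough hourly wage implied by a flat per-task payment and pilot timings."""
    if not pilot_seconds:
        raise ValueError("need at least one pilot timing")
    # The median is less sensitive than the mean to one unusually slow or fast run.
    typical_seconds = median(pilot_seconds)
    tasks_per_hour = 3600 / typical_seconds
    return pay_per_task * tasks_per_hour


def pay_per_task_for_target(target_hourly: float, pilot_seconds: list[float]) -> float:
    """Per-task payment needed to reach a target hourly wage at the pilot's pace."""
    if not pilot_seconds:
        raise ValueError("need at least one pilot timing")
    return target_hourly * median(pilot_seconds) / 3600


# Hypothetical pilot: five timed attempts at the same task, in seconds.
pilot_seconds = [40, 45, 60, 50, 90]

wage = estimated_hourly_wage(0.10, pilot_seconds)
needed = pay_per_task_for_target(15.00, pilot_seconds)

print(f"Implied wage at $0.10/task: ${wage:.2f}/hour")
print(f"Pay per task needed for $15.00/hour: ${needed:.2f}")
```

This only counts time spent on the task itself. Time spent reading instructions, finding tasks, or redoing rejected work lowers the real hourly figure, so treat the result as an upper bound rather than a guarantee.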