Akash Pandey (26), a government job aspirant hailing from Basti, Uttar Pradesh, chanced upon a flexible work opportunity online, which could fetch him ₹12,000-13,000 per project for transcribing audio and marking objects in images.
Meanwhile, Ikshita Nagar (26), a young Delhi doctor preparing for the PG entrance test, put in some extra hours identifying wounds in images and classifying them as burns, abrasions or surgical wounds, as well as solving NEET questions.
Thousands of gig workers like Pandey and Nagar are becoming the backbone of training artificial intelligence-based large language models (LLMs) by taking up microtasks such as transcribing audio files, labelling images, translating text, drawing boxes around objects in self-driving clips and picking the best responses generated by a chatbot.
India is fast emerging as a hub for data annotation services, with flexible workers, mid-tier business analysts and even skilled data engineers contributing to building high-quality datasets. Data annotation, or simply data labelling, is the foundational step in preparing the datasets used to train AI models, enhance accuracy, curtail hallucinations and build safety guardrails against inappropriate or harmful content. As per industry estimates, the global market for data annotation will be valued at $8.22 billion by 2028 and is predicted to grow at 26.2% annually. Of this, the market serviced by India could exceed $7 billion by 2030, with a workforce of up to 1 million.
According to HR services company TeamLease, 20,000 full-time workers are engaged in the managed services paradigm as annotators in India. Across international platforms, 50,000 Indian annotators are actively employed as independent contractors. “Annotation-as-a-service is on a meteoric rise especially in India,” said Alok Aggarwal, a celebrated author and chief executive of AI startup ScryAI.
There are more than 400,000 annotators worldwide. The number is expected to double every three years, which would put almost 6 million workers in this field by 2040, he said.
Global AI companies Databricks, Fractal and Tredence, and startups like Cropin and Minus Zero, said they are expanding their teams of in-house experts for faster, cost-effective data annotation while also depending on outsourced services in India.
“For the entire MLOps (machine learning operations) pipeline, human-in-the-loop is crucial for handling biases, ensuring accuracy and reliability,” said Rajesh Ramdas, senior director, field engineering, Databricks India.
The San Francisco-based data analytics and AI company recently released DBRX, a 132-billion-parameter model.
Hardik Dave, founder and chief executive of startup IndikaAI, said that apart from those building foundational models, enterprises that are fine-tuning LLMs on proprietary data in sectors such as healthcare need specialised skills for labelling the data.
“While an average labeller can make ₹25k-30k per month on our platform, a radiologist can make up to ₹1 lakh/month for a few hours of work.”
Source: The Economic Times