Bhashini's Purpose
Bhashini shall act as an orchestrator that unifies and aligns a large, diverse network spanning government, industry, academia, research groups and start-ups, bringing their contributions into an open repository.
Infrastructure
Unified Architecture for Bhashini's Infrastructure
Bhashini shall create a unifying architecture, underpinned by the principles of open data and open-source software, to enable contributions from research initiatives and the wider ecosystem. This shall also catalyze the ecosystem to take an integrated approach to building diverse solutions on top of that architecture. The idea is to build a community of contributors that works with a unified approach to help Bhashini realize its stated objectives.

Bhashini's Roadmap
Foundation Track
Publish ULCA API
Data repository
Model repository
Benchmarking system
Data collection tools
Contribution Track
Training and benchmark datasets
Data contributions from government entities, language chapters, communities etc.
Crowdsourcing initiatives
Open source language models
Innovation Track
Hackathons and challenge rounds for developing applications
Inter-ministerial projects that leverage Bhashini to provide citizen-centric services
Workshops to encourage start-ups to utilize contributed data and models
Grand Challenge Track
Conduct one grand challenge related to Bhashini's goals every year
Participation from academia and industry
Application Programming Interfaces
Data Sets: Language datasets
- Parallel text corpus in two or more languages
- Monolingual text corpus
- Automatic Speech Recognition (ASR) corpus
- Text to Speech (TTS) corpus
- Optical Character Recognition (OCR) corpus
- Natural Language Understanding (NLU) datasets
Models: Language-specific tasks
- Machine Translation (MT)
- Automatic Speech Recognition (ASR)
- Text to Speech (TTS)
- Optical Character Recognition (OCR)
Benchmarks: Open benchmarking
- Large, diverse and task-specific benchmarks
- Metrics approved by the research community
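
The dataset types and model tasks listed above can be pictured as a small catalog. The following is a minimal illustrative sketch only, not the ULCA API: the names `DatasetType`, `ModelTask`, `TRAINING_DATA_FOR` and `datasets_without_listed_task` are hypothetical, and the pairing of each model task with a dataset type is an assumption inferred from the matching names in the two lists.

```python
from enum import Enum


class DatasetType(Enum):
    """Dataset categories named in the Data Sets list (labels are illustrative)."""
    PARALLEL = "parallel text corpus"
    MONOLINGUAL = "monolingual text corpus"
    ASR = "ASR corpus"
    TTS = "TTS corpus"
    OCR = "OCR corpus"
    NLU = "NLU dataset"


class ModelTask(Enum):
    """Model tasks named in the Models list."""
    MT = "machine translation"
    ASR = "automatic speech recognition"
    TTS = "text to speech"
    OCR = "optical character recognition"


# Assumption: each task is paired with the dataset type whose name matches it.
# This mapping is hypothetical and is not taken from any Bhashini specification.
TRAINING_DATA_FOR = {
    ModelTask.MT: DatasetType.PARALLEL,
    ModelTask.ASR: DatasetType.ASR,
    ModelTask.TTS: DatasetType.TTS,
    ModelTask.OCR: DatasetType.OCR,
}


def datasets_without_listed_task():
    """Return dataset types that no listed model task is paired with."""
    used = set(TRAINING_DATA_FOR.values())
    return [d for d in DatasetType if d not in used]


if __name__ == "__main__":
    for task, dataset in TRAINING_DATA_FOR.items():
        print(f"{task.value}: {dataset.value}")
    print("No listed task:", [d.value for d in datasets_without_listed_task()])
```

Under this assumed pairing, the monolingual and NLU datasets have no dedicated entry in the Models list; they would presumably support other uses not spelled out here.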