Transcend the language barrier

Bhashini aims to build a National Public Digital Platform for languages to develop services and products for citizens by leveraging the power of artificial intelligence and other emerging technologies.
Bhashini's Purpose
Bhashini shall act as an orchestrator to unify and align a large diverse network across government, industry, academia, research groups and start-ups to bring all their contributions into an open repository.
Infrastructure
Unified Architecture for Bhashini's Infrastructure
Bhashini shall create a unifying architecture, underpinned by the principles of open data and open-source software, to enable contributions from research initiatives and the wider ecosystem. This shall also encourage the ecosystem to take an integrated approach to building diverse solutions on top of that architecture. The idea is to build a community of contributors that works with a unified approach to help Bhashini realize its stated objectives.

Bhashini's Roadmap

Foundation Track

  • Publish ULCA API
  • Data repository
  • Model repository
  • Benchmarking system
  • Data collection tools

Contribution Track

  • Training and benchmark datasets
  • Data contributions from government entities, language chapters, communities, etc.
  • Crowdsourcing initiatives
  • Open-source language models

Innovation Track

  • Hackathons and challenge rounds for developing applications
  • Inter-ministerial projects that leverage Bhashini to provide citizen-centric services
  • Workshops to encourage startups to utilize contributed data and models

Grand Challenge Track

  • Conduct one grand challenge related to Bhashini's goals every year
  • Participation from academia and industry

Universal Language Contribution API
About ULCA

ULCA is a standard API and an open, scalable data platform for Indian language datasets and models. It supports various types of datasets.

The objective of ULCA is to support the research and development of AI tools in Indian languages.



Application Programming Interfaces
Data Sets: Language datasets
  • Parallel text corpus in two or more languages
  • Monolingual text corpus
  • Automatic Speech Recognition (ASR) corpus
  • Text to Speech (TTS) corpus
  • Optical Character Recognition (OCR) corpus
  • Natural Language Understanding (NLU) datasets
Models: Language-specific tasks
  • Machine Translation (MT)
  • Automatic Speech Recognition (ASR)
  • Text to Speech (TTS)
  • Optical Character Recognition (OCR)
Benchmarks: Open benchmarking
  • Large, diverse and task-specific benchmarks
  • Metric system approved by the research community
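
To make the categories above concrete, here is a minimal sketch of how a contributor-side tool might model them before submitting anything. It is not the ULCA API or its schema: the names `DatasetType`, `ModelTask`, `DatasetEntry` and `validate` are hypothetical and exist only for this illustration. The only rules encoded are the ones stated in the lists above (a parallel corpus spans two or more languages) plus the definitional point that a monolingual corpus covers a single language.

```python
from dataclasses import dataclass
from enum import Enum


class DatasetType(Enum):
    """Dataset categories listed above (hypothetical names, not the ULCA schema)."""
    PARALLEL = "parallel text corpus"
    MONOLINGUAL = "monolingual text corpus"
    ASR = "automatic speech recognition corpus"
    TTS = "text to speech corpus"
    OCR = "optical character recognition corpus"
    NLU = "natural language understanding dataset"


class ModelTask(Enum):
    """Model tasks listed above (hypothetical names, not the ULCA schema)."""
    MT = "machine translation"
    ASR = "automatic speech recognition"
    TTS = "text to speech"
    OCR = "optical character recognition"


@dataclass(frozen=True)
class DatasetEntry:
    """Illustrative description of one contributed dataset."""
    name: str
    dataset_type: DatasetType
    languages: tuple  # language codes chosen by the contributor, e.g. ("hi", "en")


def validate(entry):
    """Return a list of problems; an empty list means the entry looks consistent."""
    problems = []
    distinct = set(entry.languages)
    if not distinct:
        problems.append("at least one language is required")
    if entry.dataset_type is DatasetType.PARALLEL and len(distinct) < 2:
        problems.append("a parallel corpus needs two or more languages")
    if entry.dataset_type is DatasetType.MONOLINGUAL and len(distinct) > 1:
        problems.append("a monolingual corpus covers a single language")
    return problems


if __name__ == "__main__":
    entry = DatasetEntry("demo-hi-en", DatasetType.PARALLEL, ("hi", "en"))
    print(validate(entry))
```

A check like this is only a local sanity pass. The actual submission format and validation rules are defined by ULCA itself and should be taken from its published API contract.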