In a significant move, AI4Bharat launched 'Airavata', a dedicated language model focused on improving AI's proficiency in Hindi.
In Brief
In a noteworthy announcement, AI4Bharat revealed 'Airavata', a sophisticated language model aimed at enhancing Hindi language processing in AI technologies, developed by fine-tuning the OpenHathi base model.

IIT Madras, a prominent institution in India, has showcased its AI research capabilities with the launch of Airavata, an advanced language model specifically tuned for Hindi. The model was fine-tuned on a variety of Hindi datasets to optimize its performance across applications. Hindi ranks as the most spoken language in India, with more than 43% of the population being native speakers.
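For readers curious about what this kind of refinement can look like in practice, the sketch below shows a generic LoRA-based instruction-tuning loop using the Hugging Face transformers, peft, and datasets libraries. The OpenHathi checkpoint ID, the data file, the prompt template, and the hyperparameters are illustrative assumptions, not AI4Bharat's published training recipe.

```python
# Minimal sketch: LoRA fine-tuning of a causal LM on Hindi instruction data.
# Model ID, data file, prompt format, and hyperparameters are placeholders,
# not AI4Bharat's actual training configuration.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "sarvamai/OpenHathi-7B-Hi-v0.1-Base"  # assumed Hugging Face ID

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Attach low-rank adapters so only a small set of weights is trained.
lora = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora)

# Hypothetical JSONL file with {"instruction": ..., "response": ...} pairs in Hindi.
data = load_dataset("json", data_files="hindi_instructions.jsonl", split="train")

def to_features(example):
    # Simple instruction/response prompt template (illustrative only).
    prompt = f"### निर्देश:\n{example['instruction']}\n\n### उत्तर:\n{example['response']}"
    return tokenizer(prompt, truncation=True, max_length=1024)

tokenized = data.map(to_features, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="airavata-sketch",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```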
The AI lab stated, 'Currently, Airavata focuses exclusively on Hindi, but we have plans to expand its capabilities to include all 22 scheduled Indic languages in the near future.'
It's crucial to highlight that the performance of large language models hinges significantly on the quality of instruction-tuning datasets, and diverse datasets of this kind are notably scarce for Hindi.
There has been significant progress in building foundational resources: pre-training corpora such as RedPajama, instruction-tuning datasets such as Alpaca, UltraChat, Dolly, and OpenAssistant, and evaluation benchmarks such as AlpacaEval and MT-Bench. Nonetheless, the bulk of these advances has been heavily skewed towards English. While Indian languages receive some coverage, it mainly stems from the incidental inclusion of Indian-language data in the corpora used to pre-train these models. As AI4Bharat notes, the representation, tokenizer efficiency, and overall performance of such models on Indian languages still lag considerably behind English.
The lab further pointed out, 'Even renowned closed-source models such as ChatGPT and GPT-4 tend to perform worse when handling Indian languages compared to their capabilities in English.'
AI4Bharat Unveils Instruction-Tuning Datasets
The team at AI4Bharat has also made available the instruction-tuning datasets that were critical for the model's development, fostering further exploration into Indic language models.
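As an illustration of how researchers might experiment with the released model, the following minimal sketch loads a checkpoint with Hugging Face transformers and generates a response to a Hindi prompt. The checkpoint ID "ai4bharat/Airavata" and the plain-text prompt format are assumptions; the official model card should be treated as the authoritative reference for usage.

```python
# Minimal sketch: prompting the released model for Hindi text generation.
# The checkpoint ID and prompt format are assumptions, not confirmed usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai4bharat/Airavata"  # assumed Hugging Face checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)

# Hindi prompt: "What is the capital of India?"
prompt = "भारत की राजधानी क्या है?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```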
'Airavata' is built on human-curated datasets that comply with licensing guidelines, ensuring that the creation of instruction-tuned models is sustainable. The team intentionally avoids utilizing data generated from proprietary models, as this would elevate costs and hinder the applicability of the models across various platforms due to licensing constraints.
The belief here is that relying on carefully human-curated datasets is a more viable strategy for building models across a variety of Indic languages. However, like other large language models, Airavata faces inherent challenges, including the potential for generating misleading information or inaccuracies, particularly in specialized domains, as well as the risk of inadvertently producing biased or inappropriate content.
It's important to clarify that the model is intended strictly for research applications and isn't suitable for production environments.
Earlier, the AI4Bharat lab also launched Chitralekha, an open-source video transcreation platform that integrates a comprehensive workforce management system for transforming videos across languages, encompassing transcription, translation, and voice-over services.
This platform was developed in partnership with EkStep, a non-profit foundation that played a pivotal role in the creation of India’s Aadhaar project.
Furthermore, AI4Bharat has opened applications for its AI resident and associate program for the academic year 2024-25. This year-long pre-doctoral initiative emphasizes focused work on natural language processing (NLP), speech, and vision projects.