Madrid-based AI startup Clibrain is making a significant contribution to the creation of generative AI models optimized specifically for Spanish speakers. The company recently unveiled its debut model, Lince Zero, which has been fine-tuned to understand and parse the nuances of the Spanish language. This initial release serves as a preview of Clibrain’s more powerful foundational model, aptly named Lince, which is still in development.
Clibrain’s motivation stems from the recognition that Spanish is one of the most widely spoken languages globally, with many dialects and regional variants. The company believes that this linguistic diversity makes it difficult for existing models to accurately comprehend and generate Spanish text. To address this challenge, Clibrain is designing models that can handle the intricacies and subtleties of the Spanish language better than general-purpose large language models (LLMs).
What differentiates Clibrain from other AI startups is its focus on leveraging a unique corpus of training data. Co-founder and CEO Elena Gonzalez-Blanco, with her background in linguistics research and poetry, has gathered vast amounts of valuable data that have not been utilized for training purposes until now. This exclusive access to a distinctive corpus of data gives Clibrain an edge in the development of its AI models.
While Clibrain builds on existing open source technologies for its models, it stands apart from others in the field due to its team of seasoned AI engineers. With a multidisciplinary staff of nearly 30 experts and a dedicated Research and Development laboratory focused on generative AI, Clibrain is well-equipped to drive innovation in the Spanish language processing domain.
The debut model, Lince Zero, is being released under an open source license, allowing users to explore it and provide feedback. Although the 7-billion-parameter LLM is less powerful than the planned foundational model, Clibrain says the more advanced model is on its way.
Although Lince Zero is not the first conversational AI model tailored for the Spanish language, Clibrain maintains that it has outpaced previous attempts, including the Barcelona Supercomputing Center’s MarIA project. Clibrain also acknowledges the existence of other LLMs optimized for non-English languages, such as Baidu’s Chinese language model, Ernie, and a German-tuned family of LLMs. However, the company argues that its exclusive focus on Spanish allows it to surpass these models in comprehending Spanish linguistic nuances.
According to Clibrain, Lince Zero performs on par with OpenAI’s GPT-3 model, while MarIA’s performance is comparable to that of GPT-2. Clibrain believes that the key to superior performance lies not solely in the size of the model but also in a profound understanding of the linguistic intricacies of the Spanish language. By focusing on linguistic accuracy and detail, Clibrain aims to establish its models as the preferred choice for Spanish-speaking markets.
So far, Clibrain has relied on the founders’ own funding from previous startup ventures to support development efforts. However, as the company progresses with its Lince product roadmap, it may consider seeking external investment to propel further growth and expand its offerings.
Clibrain’s release of Lince Zero marks the first stage of its ambitious roadmap. With its commitment to linguistic expertise and its determination to bridge the gap in Spanish language processing, Clibrain is positioned as a contender in the field of generative AI models tailored specifically to the Spanish-speaking world.
This article was first reported on TechCrunch.