Databricks and Hugging Face have joined forces to introduce a new feature that integrates Apache Spark and Hugging Face datasets for faster Artificial Intelligence (AI) model building. This integration is intended to simplify the process of transforming and loading data for AI models while providing users with cost and speed advantages.
With this new collaboration, users can convert their Spark DataFrames directly into a Hugging Face dataset. A single command produces a fully loaded Hugging Face dataset that can then be used for model training and fine-tuning. According to Databricks, the integration brings the memory-mapping and smart-caching optimizations of Hugging Face datasets while retaining Spark's cost and speed advantages.
According to Jeff Boudier, head of monetization and growth at Hugging Face, the collaboration will help build robust AI workflows and lower the barrier to entry for those developing AI models. Craig Wiley, senior director of product management at Databricks, says the integration will drastically reduce data processing time and costs. The company cites a 16GB dataset whose processing time drops from 22 minutes to 12, a reduction of more than 40%.
Databricks is also introducing a PyTorch distributor for Spark and adding AI functions to its SQL service. In addition, the company is working on OpenAI integration and LangChain support, and plans to add streaming support to further improve dataset loading.
Databricks is an American software company founded in 2013 and based in San Francisco, California. It was founded by the original creators of Apache Spark, the Apache Software Foundation's popular open-source engine for analytics, data processing, and stream processing. Databricks provides a platform and services to ingest, store, analyze, and visualize data across an organization.
Jeff Boudier is head of monetization and growth at Hugging Face. He is responsible for developing Hugging Face's monetization strategy and for hiring and managing its growth team. He joined Hugging Face with more than a decade of experience in product, engineering, and strategy at tech companies. His past roles have included Tech Lead and Product Manager positions at Home Depot, Zumba, and Liberty Mutual.