Google Open-Sources Magika: AI-Powered File-Type Detection System for Accurate Identification


Magika: AI-Powered File Type Identification System Now Open Source

Google has open-sourced Magika, its AI-powered file-type identification system, to help others accurately detect binary and textual file types. The release is expected to improve file-identification accuracy across a wide range of software applications and to benefit the field of cybersecurity in particular.

Accurate file-type detection is critical for deciding how a file should be processed. It is also difficult: file formats vary widely in structure, and textual formats such as programming languages and configuration files are especially hard to tell apart. Existing file-type identification tools rely heavily on hand-crafted rules and heuristics, which are time-consuming to develop and prone to errors. Attackers, moreover, often craft malicious payloads specifically to fool these detectors.
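To make the contrast concrete, here is a minimal sketch of the kind of hand-written signature rules that conventional tools depend on. The byte signatures for PDF, PNG, ZIP and ELF are standard "magic numbers", but the `MAGIC_RULES` table and the `detect_by_magic` function are hypothetical illustrations, not code from any particular tool. Binary formats with fixed headers are easy to match this way; textual formats have no header, so the rules simply fall through.

```python
# Hypothetical rule table: each entry maps a fixed byte prefix ("magic number")
# to a file-type label. Real rule-based tools maintain far larger tables by hand.
MAGIC_RULES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),
    (b"\x7fELF", "elf"),
]


def detect_by_magic(data: bytes) -> str:
    """Return a label if the data starts with a known signature, else 'unknown'."""
    for prefix, label in MAGIC_RULES:
        if data.startswith(prefix):
            return label
    # Source code, JSON, YAML and similar text formats have no magic number,
    # so signature rules alone cannot tell them apart.
    return "unknown"


if __name__ == "__main__":
    print(detect_by_magic(b"%PDF-1.7 ..."))           # pdf
    print(detect_by_magic(b"def main():\n    pass"))  # unknown
```

The same sketch shows why crafted payloads are a problem: `detect_by_magic(b"%PDF-" + b"import os")` returns `"pdf"`, because the rule only inspects the first few bytes.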

To address these challenges, Google developed Magika, an AI-powered file-type detector. Magika uses a custom, highly optimized deep-learning model built with Keras. The model is only about 1 MB in size, which lets Magika identify a file's type within milliseconds, even on a CPU. For inference, Magika uses ONNX as its runtime, which keeps detection efficient.

On a benchmark of 1 million files spanning more than 100 file types, Magika outperforms existing tools by approximately 20%. Its advantage is largest on textual files, including source code and configuration files, which makes the improvement particularly valuable for security applications.

Internally, Google already uses Magika to help keep its users safe: in Gmail, Drive, and Safe Browsing, Magika routes files to the appropriate security and content-policy scanners. On average, it improves file-type identification accuracy by 50% compared with the previous system, which relied on hand-written rules. This higher accuracy lets Google scan 11% more files with its specialized AI scanners for malicious documents and reduces the share of unidentified files to just 3%.
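Routing files to scanners by detected type can be sketched as a lookup from a content-type label to a handler. The labels, the scanner functions, and the `route` function below are hypothetical; the article does not describe how Google's internal routing is implemented.

```python
from typing import Callable, Dict


# Hypothetical scanners; in a real system each would be a separate
# security or content-policy service rather than a local function.
def scan_document(data: bytes) -> str:
    return "document-scanner"


def scan_executable(data: bytes) -> str:
    return "executable-scanner"


def scan_script(data: bytes) -> str:
    return "script-scanner"


def fallback_scan(data: bytes) -> str:
    # Files whose type could not be identified go down a generic path.
    return "generic-scanner"


# Hypothetical mapping from predicted labels to the scanner that handles them.
ROUTES: Dict[str, Callable[[bytes], str]] = {
    "pdf": scan_document,
    "docx": scan_document,
    "elf": scan_executable,
    "exe": scan_executable,
    "javascript": scan_script,
    "python": scan_script,
}


def route(label: str, data: bytes) -> str:
    """Dispatch data to the scanner registered for its predicted label."""
    return ROUTES.get(label, fallback_scan)(data)
```

In a design like this, more accurate labels mean fewer files fall through to `fallback_scan` and more reach a scanner specialized for their type.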


Magika is also set to be integrated into VirusTotal, where it will complement the platform's existing Code Insight feature, which uses Google's generative AI to detect malicious code. In this role, Magika acts as a pre-filter, improving the efficiency and precision of Code Insight's file analysis. The integration should further strengthen the wider cybersecurity ecosystem and make the digital environment safer for users.
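A pre-filter of this kind can be sketched as a cheap gate in front of an expensive analyzer: only files that a fast classifier labels as code, with enough confidence, are forwarded. The `CODE_LABELS` set, the 0.9 threshold, the `Prediction` type, and the `should_send_to_analyzer` function are all assumptions made for illustration; the article does not describe how the VirusTotal integration is built.

```python
from dataclasses import dataclass

# Hypothetical set of labels treated as "code" worth deeper analysis.
CODE_LABELS = {"javascript", "python", "powershell", "vba", "shell"}


@dataclass
class Prediction:
    label: str    # predicted content type
    score: float  # classifier confidence in [0, 1]


def should_send_to_analyzer(pred: Prediction, threshold: float = 0.9) -> bool:
    """Forward only confidently identified code files to the expensive analyzer."""
    return pred.label in CODE_LABELS and pred.score >= threshold


if __name__ == "__main__":
    batch = [
        Prediction("javascript", 0.98),
        Prediction("png", 0.99),
        Prediction("python", 0.55),
    ]
    forwarded = [p.label for p in batch if should_send_to_analyzer(p)]
    print(forwarded)  # ['javascript']
```

The threshold trades analyzer cost against recall: lowering it forwards more borderline files, raising it saves work but may skip some code.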

By open-sourcing Magika, Google aims to help other software applications improve their file-identification accuracy. Magika's code and model are freely available on GitHub under the Apache 2.0 License. It can be installed with pip as both a standalone command-line tool and a Python library, and an experimental npm package is available for those interested in the TensorFlow.js (TFJS) version.

With Magika, Google has taken a significant step toward more accurate, AI-driven file-type detection. The release stands to benefit many organizations and researchers that need to identify file types reliably at scale.


Advait Gupta
Advait is our expert writer and manager for the Artificial Intelligence category. His passion for AI research and its advancements drives him to deliver in-depth articles that explore the frontiers of this rapidly evolving field. Advait's articles delve into the latest breakthroughs, trends, and ethical considerations, keeping readers at the forefront of AI knowledge.
