5/2/2025

MediaTek boosts on-device AI speed by 50% using Phi models from Microsoft

MediaTek needed to boost on-device AI performance, delivering real-time functionality and high accuracy without relying on an internet connection.

To meet those demands, MediaTek selected Phi models from Microsoft—customizable and efficient small language models that power highly secure, low-latency AI on handheld devices.

The deployment led to 50% faster AI processing, 80% improved prompt performance, and 30% greater power efficiency, enhancing real-time device interactions and the user experience.


In the dynamic world of business, digital transformation is no longer a mere buzzword—it’s a game changer. With the advent of applied AI use cases, digital transformation is gaining unprecedented importance. It’s revolutionizing industries by accelerating development and paving the way for innovative products and groundbreaking business models. This isn’t just about keeping pace; it’s about leading the charge and redefining what’s possible. 

For MediaTek, a leading semiconductor company that delivers advanced system-on-chip (SoC) solutions for mobile devices, home entertainment, connectivity, the Internet of Things, and more, innovation is a must. MediaTek aimed to improve the performance of chipsets used in handheld devices by increasing speed and enabling real-time functionality, without relying on an internet connection. This was critical to ensure devices could perform efficiently and meet the high expectations of its customers.

Customer demand for on-device AI capabilities

MediaTek’s customers, including major smartphone manufacturers, were demanding advanced AI capabilities that could operate seamlessly on their devices. The company needed small language models (SLMs) that could run efficiently on their chipsets, providing real-time AI functionalities such as translation and multimodal assistance.

Additionally, MediaTek needed to ensure that these AI capabilities could be delivered with minimal latency and high accuracy, even in the absence of a stable internet connection.

The stakes were high. Failing to meet these demands could affect MediaTek’s competitive edge in the market. The company needed to bridge the gap between device AI capabilities and cloud-powered scalability, enabling a better user experience with real-time AI interactions on handheld devices.

Finding the right technology match

MediaTek selected Microsoft Azure AI Foundry for its comprehensive feature set, particularly the features designed to streamline engineering workflows. With rapid customization capabilities, engineers can efficiently adapt AI models to specific requirements, making the platform ideal for tailoring models to diverse applications and devices.

MediaTek valued the scalability of Azure and its ability to manage large datasets and complex computations, helping ensure effective AI model training. The company saw value in the platform’s seamless interoperability, robust AI capabilities, and its Phi models, which provide engineers with advanced AI functions with minimal setup. Found in the Azure AI Model Catalog, Phi models are Microsoft-created, first-party language models. They are customizable, responsive, and ideal for AI use cases that do not require cloud or internet connectivity. MediaTek was impressed by the rapid evolution of the Phi model family, beginning with the release of Phi-3 in 2024, and has explored each iteration, including the Phi-3.5 series and the Phi-4 series models.

“Our customers use the Phi-3.5 mini model, which allows them to rapidly customize their product.”

Yannic Peng, Product Manager, MediaTek

Boosting AI performance with Phi

By incorporating Phi-3.5 into its Dimensity 9400 chipsets, MediaTek runs AI models natively to better support real-time interactions. The Phi models offer advanced AI capabilities at a lower cost, provide high-quality training data, and include safety measures to yield accurate and reliable outputs. “Our customers use the Phi-3.5 mini model, which allows them to rapidly customize their product,” says Yannic Peng, Product Manager at MediaTek.

“Right now, the speed of the Phi-3.5 mini model is at least 18 tokens per second, which is more than twice the average human reading speed.”

Yannic Peng, Product Manager, MediaTek

By processing data locally, MediaTek can offer more secure solutions, crucial for applications handling sensitive information. This also opens new possibilities for AI interoperability across various devices, showcasing the versatility and transformative potential of Phi models. 

MediaTek uses speculative decoding in Azure to further enhance the performance of AI models, making it a valuable tool for engineers working on advanced AI applications. The technique significantly improves token processing speed and helps ensure the AI models deliver real-time responses. “Right now, the speed of the Phi-3.5 mini model is at least 18 tokens per second, which is more than twice the average human reading speed,” says Peng.
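The article does not describe MediaTek’s implementation, but the core idea of speculative decoding can be sketched in a few lines: a cheap draft model proposes a short run of tokens, and the expensive target model verifies the whole run in one pass, keeping the longest agreeing prefix. The toy below uses deterministic stand-in "models" over integer tokens purely for illustration; the function names and the next-token rules are invented for this sketch.

```python
def target_next(prefix):
    """'Expensive' target model: deterministic next-token rule for the demo."""
    return (prefix[-1] + 1) % 10

def draft_next(prefix):
    """'Cheap' draft model: matches the target except after token 7,
    where it guesses wrong - a stand-in for an imperfect draft."""
    return 0 if prefix[-1] == 7 else (prefix[-1] + 1) % 10

def speculative_decode(prefix, n_tokens, k=4):
    """Generate n_tokens after prefix, verifying k-token draft runs
    against the target with one batched target call per run."""
    out = list(prefix)
    target_calls = 0
    while len(out) - len(prefix) < n_tokens:
        # Draft model proposes k tokens autoregressively (cheap).
        cur = list(out)
        proposal = []
        for _ in range(k):
            t = draft_next(cur)
            proposal.append(t)
            cur.append(t)
        # Target model verifies the whole run at once (one expensive call).
        target_calls += 1
        cur = list(out)
        for t in proposal:
            correct = target_next(cur)
            if t == correct:
                cur.append(t)
            else:
                cur.append(correct)  # replace the first bad guess, then stop
                break
        out = cur
    return out[len(prefix):len(prefix) + n_tokens], target_calls

tokens, calls = speculative_decode([0], 12, k=4)
print(tokens, calls)  # 12 tokens generated with only 3 target-model calls
```

Because the draft usually agrees with the target, most runs are accepted wholesale, so the expensive model is invoked far fewer times than once per token, which is where the throughput gain comes from.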

“Some customers want to add safety measures to prevent large language models from giving wrong or inappropriate answers. For these requirements, Azure has rich resources our customers can choose from. We can quickly work with Microsoft to find a solution to meet our customers’ needs.”

Frederic Wu, Technical Manager, MediaTek

Fine-tuning models and optimizing privacy

The Phi API Toolchain provides engineers with a comprehensive set of tools for model customization, fine-tuning, and deployment. This includes capabilities for adding AI models to different environments and optimizing performance.

Throughout the implementation process, MediaTek used edge AI optimizations to help ensure that sensitive information remained secure and private. Local processing minimizes the need for constant internet connectivity, helping to reduce latency and improve AI application responsiveness. “Some customers want to add safety measures to prevent large language models from giving wrong or inappropriate answers. For these requirements, Azure has rich resources our customers can choose from. We can quickly work with Microsoft to find a solution to meet our customers’ needs,” says Frederic Wu, Technical Manager at MediaTek.

Faster, more efficient processing and future plans

MediaTek’s use of Phi models notably increased the speed and efficiency of AI processing. Using speculative decoding technology helped the AI models process tokens at a rate of 18 tokens per second—more than 50% faster than the previous model, which processed at 12 tokens per second.
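A quick back-of-the-envelope check of the reported figures, purely for illustration:

```python
# Throughput gain from 12 to 18 tokens per second, as reported above.
old_tps, new_tps = 12.0, 18.0

speedup = (new_tps - old_tps) / old_tps   # 0.5, i.e. 50% faster
ms_per_token_old = 1000.0 / old_tps       # ~83.3 ms per token
ms_per_token_new = 1000.0 / new_tps       # ~55.6 ms per token

print(f"{speedup:.0%} faster; "
      f"{ms_per_token_old:.1f} ms -> {ms_per_token_new:.1f} ms per token")
```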

This improvement in speed was vital for delivering real-time AI interactions on handheld devices. MediaTek also experienced: 

  • 80% faster performance in language model prompts

  • 30% greater power efficiency in its chips

This year, the company will expand its AI solutions by adding the Phi-4-multimodal model and enabling speech and vision inputs to further advance the user experience.

Discover more about MediaTek on Facebook, Instagram, LinkedIn, X, and YouTube.
