3/7/2025

Central Provident Fund Board modernises data management with Azure Databricks to better serve over 4 million people

Singapore’s Central Provident Fund Board (CPFB) aimed to better leverage data for decision making, policy formulation, and service delivery.

CPFB implemented a Unified Data Platform (UDP) using Azure Databricks to modernise its workflows and empower users with tools to gain valuable insights from data about its members.

UDP improved data accessibility and enabled more sophisticated analysis to improve the schemes and services that CPFB administers for the benefit of over 4 million members.

Central Provident Fund Board

The Central Provident Fund (CPF) is a key pillar in Singapore’s national social security system. The CPF system serves more than 4 million CPF members by providing a foundation in retirement, through home ownership, healthcare protection, and lifelong retirement income. CPF Board (CPFB) is also the administrator for national schemes such as the Home Protection Fund, Lifelong Income Fund (an annuity product), MediShield Life Fund, CareShield Life, and ElderShield Insurance Fund. With such a diverse customer base, CPFB needs to ensure that its schemes and services are inclusive and effective for as many members as possible. This requires leveraging data for policy formulation, operations, and service delivery.

Identifying limitations in data management

CPFB was the first government agency in Singapore to be computerised in the 1960s. The first mainframe computer was installed to automate the manual ledger accounting system that kept track of CPF members’ accounts. Since then, CPFB has been leveraging data to better serve members. The modernisation of CPFB’s data systems is a continuous evolution, which became more important as the needs of members became more complex, the funds under management grew, and more services were rolled out for members. “While our data journey spanned many decades, it intensified about 10 years ago. We put in place an on-premises data warehouse to consolidate data from various sources. Over time, we found that this had limitations as our colleagues needed to perform more demanding analytics on large datasets,” recalls Vance Ng, Director of the Data Science Accelerator (DSA) Department at CPFB.

The data warehouse required users to download data before using it, and access to that data required multiple approvals. Users relied on the limited capabilities of individual desktops or laptops when performing computations, which made it difficult to scale. “Anytime we needed to do extensive analysis, not only did we have to deal with managing large datasets stored in individual devices, which presented a security concern, but often we had to request additional laptops beyond the devices we had on hand,” shares Benedict Ho, Senior Deputy Director in the DSA Department at CPFB. Efforts to train and deploy machine learning models were also constrained.

Building a Unified Data Platform

CPFB decided to modernise its data infrastructure and build a Unified Data Platform (UDP). “We wanted to make data more accessible in a secure manner and improve our users’ competency around the use of it,” explains Ho. “With this shift, we aimed to leverage data more efficiently, which in turn, would enable us to propose better policies and serve CPF members more effectively.”

CPFB already used Microsoft technologies extensively across its workplace, so continuing with Microsoft was a natural choice. “We chose Azure Databricks because it works well with our existing Azure ecosystem, allowing us to leverage and extend our current investments. This meant our teams could continue using familiar systems while accessing new analytics features,” explains Ho.

Throughout this transformation, CPFB prioritised data governance and security. “Even though we migrated to the cloud, Azure infrastructure provides native security services, like firewalls, that help us maintain the same security posture we had with our on-premises data warehouse,” shares Ho. 

UDP was built on Singapore’s Government Commercial Cloud 2.0 (GCC 2.0) platform. “Initially, we used Microsoft Purview as our governance and data dictionary tool to establish the rules and frameworks for data security downstream. We were one of the first government agencies to onboard with Azure GCC 2.0,” remarks Ho. “We’re now working closely with the Microsoft and Databricks teams to future-proof the design of UDP, such as enabling Unity Catalog and leveraging the latest generative AI functionalities available in Azure Databricks.”

Empowering users through training

CPFB recognised early that change management was critical. Users had to be equipped with the right skills and be familiarised with the new tools so that UDP could be used to its potential. “When we started the project, we made a choice to involve users early, and level up our skills together, even though we were still building UDP,” recalls Jared Koh, Lead Analyst in the DSA Department at CPFB. 

CPFB adopted a ‘train the trainer’ concept, preparing users to pass their knowledge and expertise on to others, who may then become trainers themselves. This format enabled users to take ownership of their data and workflows while building familiarity and confidence with UDP during the transition to the new platform.

Over 300 users across different departments in CPFB have been trained on Azure Databricks. This helped to strengthen the culture of data-driven decision making across the organisation. Efforts to build competency extended beyond the initial implementation phase. “Experienced data analysts are partnering with business users on more sophisticated data science projects,” shares Koh. 

Streamlining data processes with automation

Data that comes into UDP from various source systems in CPFB goes through an extract, transform, and load (ETL) process. A data readiness dashboard in Azure Databricks informs users when data is ready for consumption. Previously, readiness was communicated by email after quality checks had been performed. “Now, users can simply open the data readiness dashboard, see the status, and use the data once it’s ready,” says Koh. This self-service capability eliminated manual communications, enabling teams to focus on higher-value tasks.
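
The readiness check described above can be pictured with a minimal Python sketch. It assumes that each dataset passes through a few quality checks after ETL and is marked ready only when all of them pass. The dataset names, check names, and the `readiness_status` function are hypothetical illustrations, not CPFB's implementation or an Azure Databricks API.

```python
from dataclasses import dataclass


@dataclass
class QualityCheck:
    """Outcome of one post-ETL quality check (hypothetical structure)."""
    name: str
    passed: bool


def readiness_status(checks_by_dataset: dict[str, list[QualityCheck]]) -> dict[str, str]:
    """Mark a dataset READY only if it has at least one check and every check passed."""
    status = {}
    for dataset, checks in checks_by_dataset.items():
        ready = bool(checks) and all(check.passed for check in checks)
        status[dataset] = "READY" if ready else "NOT READY"
    return status


# Hypothetical datasets and check results, for illustration only.
example_checks = {
    "dataset_a": [QualityCheck("row_count", True), QualityCheck("no_null_keys", True)],
    "dataset_b": [QualityCheck("row_count", True), QualityCheck("no_null_keys", False)],
}

for dataset, state in readiness_status(example_checks).items():
    print(f"{dataset}: {state}")
```

A status table like this is the kind of information a dashboard could display, so that users check one place rather than wait for an email.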

UDP also made work more efficient for users. “Previously, many tasks were manually performed and time-consuming,” says Koh. “With Azure Databricks, users can automate the generation of reports. Simple reports can be done in a few minutes, and more complex ones can be scheduled to run overnight.”

Collaboration among various teams and users has improved with the adoption of Azure Databricks notebooks. “We used to send scripts over email with ‘version 1’, ‘version 2’, ‘version 2 final’, and even ‘version 2 final-final’. All of that is gone now,” remarks Koh. “Having one reliable data source and real-time collaboration tools have helped create better teamwork.” Teams can now work on the same notebooks and dashboards, making updates and fixing issues together. For example, data analysts can make changes, show users what the problems are, and provide revised code to fix them.

Developing machine learning models used to involve extracting data, building models on laptops, and then deploying them to servers. This process was tedious, especially when fine-tuning or enhancing models. With Azure Databricks, CPFB can now develop, train, deploy, and fine-tune models on one platform, which saves time and makes the development process more secure.

Unmatched data accessibility

UDP improved data accessibility in a secure manner. With Azure Databricks and Power BI, users no longer need to download data onto their devices. Ho notes, “Users now have access to broader datasets in a secure manner because all analytics workloads are performed within UDP.” 

One example is dashboard development. Previously, users had to extract data from various sources, save it to a flat file, and then create the dashboard. This can now be done in Azure Databricks and Power BI. “Instead of sending files, we share a link, and users can access and manipulate the dashboard themselves,” Ho adds.

A useful feature for CPFB is the Databricks Assistant, which offers coding assistance. “This incredibly powerful generative AI assistant makes it efficient not just for us to help users but also for them to solve problems themselves,” Koh adds.

Expanding possibilities with new tools

Looking ahead, CPFB plans to further enhance UDP by integrating Azure OpenAI capabilities and migrating to Databricks Unity Catalog. “With advanced data governance tools like Unity Catalog, we can achieve more fine-grained, row-level control of access,” Ho shares. 
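
Row-level access control can be illustrated with a small Python sketch in which each user sees only the rows a rule permits. The `department` column, the user records, and the `visible_rows` function are hypothetical and written in plain Python. This is not Unity Catalog's interface, only a picture of the idea Ho describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """A platform user (hypothetical structure for illustration)."""
    name: str
    department: str
    is_admin: bool = False


def visible_rows(user: User, rows: list[dict]) -> list[dict]:
    """Return only the rows this user may see.

    Admins see every row; other users see rows tagged with their own department.
    """
    if user.is_admin:
        return list(rows)
    return [row for row in rows if row["department"] == user.department]


# Hypothetical table contents, for illustration only.
rows = [
    {"record_id": 1, "department": "policy"},
    {"record_id": 2, "department": "operations"},
    {"record_id": 3, "department": "policy"},
]

analyst = User(name="analyst_a", department="policy")
print([row["record_id"] for row in visible_rows(analyst, rows)])  # prints [1, 3]
```

Applying the rule where the data is stored, rather than in each report, means every query from the same user is filtered in the same way.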

Through these advancements, CPFB aims to make data analytics even more efficient and effective. Koh envisions lasting benefits for the millions of people CPFB serves, stating, “The benefits might not be immediately visible, but these improvements will have an impact in the long run.”
