PowerBI ML – How to build Killer ML with PowerBI


PowerBI ML: Unleashing Machine Learning in Microsoft PowerBI in 5 easy steps

AI and ML are key tools enabling modern businesses to unlock value, drive growth, deliver insights and outcompete the market. Their unmatched ability to handle massive sets of data and identify patterns is transforming decision making at every level of organisations. Data and AI strategy is therefore rapidly evolving to explore how AI can best be utilised to enhance business operations. However, pragmatically harnessing AI for business needs has remained challenging, because the solutions offered typically incur significant resource overhead, are hard to understand and may fail to deliver actionable business outcomes. A gap has therefore emerged between BI and AI: a failure to bridge the insights we learn with the intelligence to improve. The most recent release of Microsoft PowerBI ML features aims to eliminate that gap by bringing Artificial Intelligence (AI) and Machine Learning (ML) capabilities into the practical setting of self-service analytics.

PowerBI has established itself as a vital tool in modern data analytics. Its easy-to-use interface, coupled with powerful reporting capabilities, has made it the reporting platform of choice for delivering reliable business insights. The recent inclusion of ML & AI capabilities has significantly strengthened the tool, combining easy interactivity with cutting-edge data analysis.

Overview

PowerBI ML (Machine Learning) is now possible using Dataflows, the simple ETL tool that empowers analysts to prepare data with low-or-no code. Automated Machine Learning (AutoML) is then built off the back of Dataflows, again leveraging the interactive approach of Power BI without compromising on quality of analysis.

5 Easy Steps

  1. In a Workspace hosted by Premium capacity, select ‘+Create’ in the top right corner, and select ‘Dataflows’
  2. Choose the data source you wish to run the model on:
PowerBI ML Choosing Data Source

  3. After loading the data, the familiar Power Query screen will appear. Perform any data transformations as required, and select Save & close:
PowerBI ML Power Query

  4. The dataflow should now appear underneath Dataflows in the workspace. Select the dataflow, then select the brain icon, and select ‘Add a machine learning model’:
PowerBI ML Add Model

  5. Create the model by inputting the relevant information. You will get the option to select the model type and inputs for the model:
PowerBI ML Select Model

After creating the model, you will need to train it. The training process samples your data, and splits it into Training and Testing data:

PowerBI ML Train Model

Once the model has finished training, it will appear under the Machine learning models tab in the Dataflow area of the Workspace, with a timestamp showing when it was last trained. You can then review the Model Validation report (a report which describes how well the model is likely to perform) by selecting ‘View performance report and apply model’.
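
For a feel of what that training step is doing behind the scenes, here is a minimal sketch, assuming a generic tabular dataset and scikit-learn; the file, column names and model choice are illustrative assumptions only, and none of this code is required inside PowerBI. The data is split into training and testing sets, a model is fitted on the training portion, and the held-out portion produces the kind of figures you see in the validation report.

```python
# Minimal sketch of the train/test/validate cycle that AutoML automates for you.
# The dataset, column names and model choice are illustrative assumptions only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

data = pd.read_csv("dataflow_entity.csv")   # stand-in for the Dataflow entity
X = data.drop(columns=["Outcome"])          # model inputs
y = data["Outcome"]                         # the field the model predicts

# AutoML samples the data and holds a portion back for testing; 80/20 is a common split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# The held-out rows drive the figures shown in the model validation report
print(classification_report(y_test, model.predict(X_test)))
```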

Lastly, you can apply the model to the Dataflow by selecting ‘Apply model’ at the top of the validation report. This prompts a refresh of the Dataflow so the results of your model can be previewed. Applying the model creates new entities (columns) in the Dataflow you created. Once the Dataflow refresh has completed, select the Preview option to view your results. Finally, to build reporting on the results of your machine learning model, simply connect Power BI Desktop to the Dataflow using the Dataflows connector.

Outcomes

With machine learning now integrated with PowerBI, users can upgrade from reporting on business performance to predicting it. From a business perspective, the addition of ML means that PowerBI reporting has gained an extra dimension. It can easily be incorporated into existing reporting and is capable of dramatically changing decision making. For the PowerBI ML user, no new skills are required, as ML leans heavily on the existing interface and user experience.

Common use cases where machine learning in PowerBI can be readily implemented include:

  • Improving your existing PowerBI CRM reporting by creating a general classification model to identify high and low value customers.
  • Boosting the value of your financial reporting by developing a forecasting model to help predict sales trends and downturns.
  • Enhancing your asset reporting by building a regression model to predict numeric outcomes such as expected time to asset failure or breakdown.
  • Refining your CRM reporting by constructing a binary prediction model to determine the likelihood of a customer leaving or staying (a conceptual sketch of this scenario follows the list).
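
To make that last (churn) scenario concrete, the sketch below shows, in hypothetical Python outside of PowerBI, what the applied model effectively adds to your data: a score and a predicted outcome column for each customer, much like the new columns PowerBI ML adds to the Dataflow entity. The file and column names are invented for illustration.

```python
# Hypothetical illustration of a binary prediction (churn) model being applied.
# File and column names (CustomerID, Churned, etc.) are invented for this sketch.
import pandas as pd
from sklearn.linear_model import LogisticRegression

history = pd.read_csv("customer_history.csv")   # rows with a known Churned flag
current = pd.read_csv("current_customers.csv")  # rows we want to score

feature_cols = ["TenureMonths", "MonthlySpend", "SupportTickets"]
model = LogisticRegression(max_iter=1000)
model.fit(history[feature_cols], history["Churned"])

# The applied model adds score and outcome columns, analogous to the new
# entities a PowerBI ML model adds to the Dataflow after 'Apply model'.
current["ChurnScore"] = model.predict_proba(current[feature_cols])[:, 1]
current["ChurnOutcome"] = current["ChurnScore"] >= 0.5
print(current[["CustomerID", "ChurnScore", "ChurnOutcome"]].head())
```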

If you want to know how machine learning can be implemented in your organisation, please contact us, and ask us about our AI services.


Data & AI Strategy metrics


Why are Data & AI strategy metrics important? The beauty of “strategies” for some is that a strategy – unlike a tactic – often doesn’t come with any clear success/fail KPIs. It allows a lot of wriggle room for ambiguous assessments of whether it worked or not. However, any self-respecting Data & AI strategy should not allow this. After all, it is designed and executed in the name of improving the use of data and measurable outcomes within an organisation. A good Data & AI strategy should have measures to determine its success.

Data & AI Strategy metrics that matter

Commonly raised metrics are based around uptake and usage (software vendors are particularly fond of these). This seems to rest on the hope that the apparent usage of tools is inherently a good thing for a company and will somehow lead to – I don’t know – increased synergy?

Dilbert Utilising Synergy

Sometimes they are measured around data coverage by the EDW or project completion. However, if I were to put my CEO hat on, I would want to know the answer to the question “how are all these Data & AI users improving my bottom line?”. After all, if the Data & AI tools are being heavily used, but only to manage the footy tipping competition, then I’m not seeing a great deal of ROI.

The metrics that matter are the Corporate metrics.

A good Data & AI Strategy should be implemented with a core goal of supporting the Corporate strategy, which will have some quantifiable metrics to align to. If not, a good Data & AI strategy isn’t going to help you much as your organisation has other problems to solve first!

In a simple case, imagine a key part of the Corporate strategy is to expand into a new region. The Data & AI strategy needs to support that by providing data and tools that support that goal, enabling the team in the new region to expand – and it should be measured against its ability to support the success of the Corporate strategy.

This is why at FTS Data & AI our first step in defining a Data & AI Strategy for an organisation is to understand the Corporate strategy – and its associated metrics – so we can align your Data & AI strategy to it and build a business case that justifies embarking on a Data & AI strategy in the first place. The metrics are the foundation that proves there is deliverable value to the business. This is why the Corporate Strategy sits at the top of our Strategy Framework:

Data & AI Strategy Framework

We have extensive experience designing strategies that support your business. Contact us today to speak with one of our experts.

Data Quality: Enter the 4th Dimension


Data quality is a universal cause of deep pain when establishing a trusted data platform in Data & AI projects. The more systems involved, the harder it gets to clean up – and that is before you even start accounting for how old those systems are, how up to speed the SMEs are, or how poor the front-end validation was; there is a host of potential problems. However, something tells me that the number of projects where the customer says it’s OK if the numbers are wrong is going to remain pretty small.

Scope, Cost, Time – Choose one. But not that one.

Project Management Triangle

Data Quality is a project constraint

Many of you will be familiar with the Project Management Triangle, which dictates that you vary two of Scope, Cost and Time in order to fix the third – with the end result that Quality, sitting in the middle, gets affected. For most Data & AI projects I have found that cost and time tend to be the least negotiable, so scope gets restricted. Yet, somehow, Time and Cost get blown out anyway.

Whilst Data & AI is hardly unique in terms of cost and schedule overruns, there is one key driver which is neglected by traditional methods. Leaning once again on Larissa Moss’s Extreme Scoping approach, the reason she calls out is that in a Data & AI project, Quality – specifically Data Quality – is also fixed. The data must be complete and accurate for it to be usable, and there is no room for negotiation on this. Given that the data effort consumes around 80% of a Data & AI project’s budget, this becomes a significant concern.

How do we manage Data Quality as a constraint?

We have to get the business to accept that the traditional levers can’t be pulled in the way they are used to, and that requires end-user education. The business needs to be made aware that Data Quality is a fixed constraint – one that they are imposing, albeit implicitly. They also have to accept that if Quality is not a variable, then the traditional “pick two to play with” becomes “prepare to vary all of them”. Larissa Moss refers to this as an “Information Age Mental Model”, which prioritises quality of output above all else.

Here is where strong leadership and clear communication come into play. Ultimately, if a business unit demands a certain piece of information, the Data & AI project team will have to be clear that obtaining that data at the mandated quality means being prepared to bear the costs of doing so, including the cost of bringing it up to an enterprise-grade, reusable standard so that it integrates with both past and future components of the solution. This of course does not mean that an infinite budget is opened up for each data item; some data may not be worth the cost of acquisition. What it does mean is that the discussion about the costs can be more honest, and the consumer can be more aware of the drivers for the issues that will arise in trying to obtain their data.

ELT Framework in Microsoft Azure

Azure ELT Framework


The framework shown above is becoming a common pattern for Extract, Load & Transform (ELT) solutions in Microsoft Azure. The key services used in this framework are Azure Data Factory v2 for orchestration, Azure Data Lake Gen2 for storage and Azure Databricks for data transformation. Here are the key benefits each component offers:

  1. Azure Data Factory v2 (ADF) – ADF v2 plays the role of an orchestrator, facilitating data ingestion & movement while letting other services transform the data. This lets a service like Azure Databricks, which is highly proficient at data manipulation, own the transformation process while keeping orchestration independent. It also makes it easier to swap transformation-specific services in and out depending on requirements.
  2. Azure Data Lake Gen2 (ADLS) – ADLS Gen2 provides a highly-scalable and cost-effective storage platform. Built on blob storage, ADLS offers storage suitable for big data analytics while keeping costs low. ADLS also offers granular controls for enforcing security rules.
  3. Azure Databricks – Databricks is quickly becoming the de facto platform for data engineering & data science in Azure. Leveraging Apache Spark’s capabilities through the DataFrame & Dataset APIs and Spark SQL for data interrogation, Spark Streaming for streaming analytics, Spark MLlib for machine learning and GraphX for graph processing, Databricks is truly living up to the promise of a Unified Analytics Platform (an indicative transformation sketch follows this list).
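
To make the division of labour concrete, here is an indicative sketch (not a definitive implementation) of the transformation step as it might look in an Azure Databricks notebook: reading raw files that ADF has landed in ADLS Gen2, reshaping them with the Spark DataFrame API, and writing a curated output back to the lake. The storage account, container names, paths and columns are placeholder assumptions, and ADF would typically trigger this notebook as part of its orchestration.

```python
# Indicative PySpark transformation step in the ELT pattern, as run in a
# Databricks notebook. Storage account, containers, paths and columns are
# placeholder assumptions; `spark` is the SparkSession Databricks provides.
from pyspark.sql import functions as F

raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/sales/*.csv"
curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/sales_daily"

# Load the files ADF has already landed in the data lake (the Extract & Load steps)
sales = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv(raw_path))

# Transform: fix types and aggregate to a daily grain
daily_sales = (sales
               .withColumn("OrderDate", F.to_date("OrderDate"))
               .groupBy("OrderDate", "StoreId")
               .agg(F.sum("Amount").alias("TotalAmount"),
                    F.count("*").alias("OrderCount")))

# Write the curated result back to the lake as the final landing layer
daily_sales.write.mode("overwrite").parquet(curated_path)
```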

The pattern makes use of Azure Data Lake Gen2 as the final landing layer; however, it can be extended with different serving layers, such as Azure SQL Data Warehouse if an MPP platform is needed or Azure Cosmos DB if a high-throughput NoSQL database is needed.

ADF, ADLS & Azure Databricks form the core set of services in this modern ELT framework, and investment in their individual capabilities and their integration with the rest of the Azure ecosystem continues to be made. Some examples of upcoming features include Mapping Data Flows in ADF (currently in private preview), which will let users develop ETL & ELT pipelines using a GUI-based approach, and MLflow in Azure Databricks (currently in public preview), which will provide capabilities for machine-learning experiment tracking, model management & operationalisation. This makes the ELT framework sustainable and future-proof for your data platform.
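
As a flavour of what MLflow’s experiment tracking looks like, here is a small hypothetical sketch of logging a run from a Databricks notebook; the experiment path, parameter names and metric values are invented, and the model training itself is elided.

```python
# Hypothetical MLflow experiment-tracking sketch. The experiment path,
# parameters and metric values below are invented; training is elided.
import mlflow

mlflow.set_experiment("/Shared/churn-experiments")

with mlflow.start_run(run_name="baseline"):
    # Record the settings used for this training run
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("regularisation", 0.1)

    # ... train and evaluate the model here ...

    # Record how the run performed so runs can be compared later
    mlflow.log_metric("auc", 0.87)
    mlflow.log_metric("accuracy", 0.81)
```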

Agile Zero Sprint for Data & AI projects


Agile methodologies have a patchy track record in Data & AI projects. A lot of this is to do with adopting the methodologies themselves – there is a heap of obstacles in the way that are cultural, process and ability based. I was discussing agile adoption with a client who readily admitted that their last attempt had failed completely. The conversation turned to the concept of the Agile Zero Sprint, and he admitted that part of the reason for failure was that they had allowed zero time for their Agile Zero Sprint.

What is an Agile Zero Sprint?

The reality of any technical project is that there are always certain fundamental decisions and planning processes that need to be worked through before any meaningful work can be done. Data Warehouses are particularly vulnerable to this – you need servers, an agreed design approach and a set of ETL standards before any valuable work can be done, or at least before it can be done without incurring so much technical debt that your project gets sunk after the first iteration, cleaning up after itself.

So the Agile Zero Sprint is all the groundwork that needs to be done before you get started. It feels “un”-agile, as you can easily spend a couple of months producing nothing of any apparent direct value to the business/customer. The business will of course wonder where the productivity nirvana is – and, particularly galling, you need your brightest and best on it to make sure a solid foundation is put in place, so it’s not a particularly cheap phase either. You can take a purist view on the content from the Scrum Alliance or a more pragmatic one from Larissa Moss.

How to structure and sell the Zero sprint

The structure part is actually pretty easy. There’s a set of things you need to establish which will form a fairly stable product backlog. Working out how long they will take isn’t that hard either as experienced team members will be able to tell you how long it takes to do pieces like the conceptual architecture. It just needs to be run like a long sprint.

An Agile Zero Sprint prevents clogged pipes

Selling it as part of an Agile project is a bit harder. We try to make this part of the project structure part of the roadmap we lay out in our Data & AI strategy. Because you end up not delivering any business-consumable value, you need to be very clear about what you will deliver, when you will deliver it and what value it adds to the project. It starts smelling a lot like Waterfall at this point, so if the business is sceptical that anything has changed, you have to manage their expectations well. Be clear that once the initial hump is passed, the value will flow – whereas if you skip the Zero Sprint, value will flow earlier, in line with their expectations, but then quickly afterwards the pipes will clog with technical debt (though you may want to use a different terminology!)