Pete Crawford, Head of Data, Analytics & Automation, Sensis
Over the past two years we have witnessed a proliferation of open-source prediction APIs and ‘out-of-the-box’ cognitive technologies packaged via cloud computing platforms, enterprise CRMs or other solution providers. Examples include Google Cloud AI, AWS Comprehend, Azure AI and Salesforce Einstein. Consequently, the adoption and integration of machine intelligence into product development or marketing automation operating models have wider ramifications for data management and data literacy across global industries. Although much has been written on the role of big data in stimulating a shift in the way predictive analytics is viewed, the key drivers for many companies are still centred on existing data platform integration and the explicit inclusion of ‘external’ data sources.
However, the evolution toward smarter information supply chains and a ‘data economy’ (irrespective of concurrent labour force adjustments) is often unclear and the business benefits ill-defined. Several forces are shaping how business strategy adapts to the emergence of cognitive technology platforms.
FORCES SHAPING THE DATA ECONOMY
Regulatory Compliance and Explainable AI
On either side of the Atlantic, government bodies are currently exploring adjustments to consumer data protection policies to strengthen the rules and structures which may demand the disclosure of automated decision making. For instance, Article 22 of the European GDPR directly addresses safeguards around profiling – an issue which has received publicity in the US following analyses suggesting racial bias in software used to predict criminal recidivism. Meanwhile, in the UK a Centre for Data Ethics and Innovation consultation has been established to monitor the applications of data-driven and AI-based technologies. This leads to the prospect of greater regulatory expectations and auditing around translating machine learning models so that the intent behind a system, the data sources feeding it and its input rules are ‘explainable’.
Of equal importance will be trade-offs between personalized predictions and regulation of private data. As data becomes a more tradable commodity, existing data management notions of data cataloguing and data lineage are receiving greater focus. This is one area where machine learning is very much helping classify and organise data assets across multiple platforms in support of data governance, compliance obligations or collaborative data modelling.
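Machine-assisted data cataloguing of the kind described above can be illustrated in miniature. The sketch below tags hypothetical column samples with sensitive data types; simple pattern rules stand in for the trained classifiers a real catalogue product would apply, and every name and value is invented for the example.

```python
import re

# Hypothetical column samples from two source platforms; in a real data
# catalogue these would be profiled automatically during ingestion.
columns = {
    "crm.contact_email": ["jo@example.com", "sam@example.org"],
    "billing.card_last4": ["4242", "1881"],
    "web.page_views": ["13", "7"],
}

# Simple rules standing in for a trained model: each returns True when a
# sampled value looks like a given sensitive data type.
CLASSIFIERS = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "payment_fragment": lambda v: re.fullmatch(r"\d{4}", v) is not None,
}

def classify(samples):
    """Tag a column with every sensitive type matched by all of its samples."""
    return [tag for tag, match in CLASSIFIERS.items() if all(map(match, samples))]

# The resulting tags feed lineage and compliance views of the catalogue.
catalogue = {name: classify(samples) for name, samples in columns.items()}
print(catalogue)
```

A production system would replace the hand-written rules with models trained on labelled column profiles, but the catalogue-building loop is the same shape.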
Democratization of Data Science
The notion of automating a data science pipeline – from obtaining data to cleaning and normalising data through to model selection, evaluation and result interpretation – is a topic of deep contention.
Indeed, there is more to data science than combining programming skills with statistics. What is typically missed is the ability to observe and ideate with respect to business problems. There is a real risk of misinterpretation based on the assumption that ‘the data will tell me the answers’, as promises of autonomous analytics are promoted by platform providers or by new automated algorithm selection and tuning frameworks such as DataRobot and Auto-sklearn. It is also likely that greater stratification and industry specialisation will occur as data science tools become more ubiquitous. An additional risk concerns the definition and remuneration of a typical ‘data scientist’. Recent studies show that data scientist salaries have declined, which reflects the somewhat unrealistic financial outcomes expected from applying data science to business problems. At present data science is overtly orientated towards developing engineering solutions to business problems which are often undefined or poorly defined. The risk of over-promising goes hand-in-hand with the risk of undervaluing the human factors in data science.
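The automation that frameworks such as DataRobot and Auto-sklearn promise can be pictured in miniature: exhaustively score a search space of candidate configurations and keep the best one. The sketch below does this in pure Python with a toy dataset and a deliberately trivial one-parameter ‘model’; everything here is illustrative, and real frameworks search far larger spaces of model families and hyperparameters.

```python
from itertools import product

# Toy labelled data: (feature, label) pairs. In practice this would be a
# cleaned and normalised dataset -- the earlier stages of the pipeline.
data = [(0.2, 0), (0.4, 0), (0.5, 1), (0.7, 1), (0.9, 1), (0.1, 0)]

def accuracy(threshold, invert):
    """Score a trivial one-parameter 'model': predict 1 above the threshold."""
    correct = 0
    for x, y in data:
        pred = int(x > threshold)
        if invert:
            pred = 1 - pred
        correct += (pred == y)
    return correct / len(data)

# Automated model selection in miniature: evaluate every combination in a
# small hyperparameter search space and keep the best-scoring configuration.
search_space = {"threshold": [0.3, 0.45, 0.6], "invert": [False, True]}
best = max(
    (dict(zip(search_space, combo)) for combo in product(*search_space.values())),
    key=lambda params: accuracy(**params),
)
print(best)  # the configuration with the highest accuracy on the toy data
```

What the sketch cannot automate is exactly the point the text makes: deciding whether accuracy on this dataset is the right question to ask in the first place.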
The most immediate impact in this space is ready access to, and implementation of, data-agnostic geospatial and temporal visualisation tools. One example is Uber Engineering’s open-sourced Kepler.gl. This signals that maps are becoming a common layer for rapidly communicating or compiling real-time streams of data to customers, partners and suppliers.
Data Marketplaces and Data Protectionism
Data has an inherent combinatorial value. This means that the ability to provide ‘insights-as-a-service’ APIs based on predictive or prescriptive analytics is contingent on both data quality and compounding data sources. In most cases, businesses only benefit from this ‘compounding factor’ by integrating open or licensed data with their own proprietary sources. In short, cognitive technology and more accurate predictions are powered by broader and deeper data. This exposes a major tension between data acquisition and data protectionism.
On one hand, data marketplaces – platforms for securely and conveniently buying and licensing data or data models – are proliferating. For instance, Microsoft, Adobe and SAP have recently announced a joint open data initiative. From a business perspective this can allow the trading of aggregated merchant data about localised consumer purchases, insurance claims or even anomaly detection of financial transactions. However, this is tempered by a trend toward data localisation, whereby nations (such as China and India) or provinces are regulating cloud data so that it is held at the point of origin.
This clash amongst public, government and technology companies will greatly influence how cognitive technology can take advantage of information supply chains.
Natural Language Processing and Analytics
Natural language processing (NLP) is now well established across various consumer devices, with estimates that over 25 per cent of queries on Android devices are already voice-based. The new frontier for NLP is analytical insight tools. Reporting tools such as Tableau have recently released products that use plain language to query complex combinations of data. The implication is that analytics can become more accessible across the organisation, driving broader adoption. Once again, the need for a data-literate workforce – especially around framing inputs and interpreting outputs – will be paramount.
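The idea behind plain-language querying can be sketched as a toy parser that maps one supported phrasing onto a data aggregation. A commercial product handles vastly richer language and data models; all rows and names below are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical sales rows a reporting tool might hold in memory.
rows = [
    {"region": "North", "revenue": 120},
    {"region": "North", "revenue": 80},
    {"region": "South", "revenue": 200},
]

def answer(question, rows):
    """Parse a single supported phrasing: 'average <measure> by <dimension>'."""
    m = re.fullmatch(r"average (\w+) by (\w+)", question.lower())
    if not m:
        raise ValueError("unsupported question")
    measure, dimension = m.groups()
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    # Aggregate each group; a real tool would support many more aggregations.
    return {key: mean(values) for key, values in groups.items()}

print(answer("average revenue by region", rows))  # mean revenue per region
```

The framing-and-interpretation point in the text shows up even here: the user still has to know that ‘average revenue by region’ is a meaningful question, and what the grouped means do and do not imply.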
It would be naïve to assume that cognitive technologies alone will seamlessly offer automated analytical capabilities and business transformation. As ever, CIOs need to balance rapid experimentation and judicious evaluation with a willingness to understand the cultural and political factors which are shaping information supply chains and the data economy. An approach which acknowledges a need for a data-literate workforce and an awareness of greater regulation will best serve the opportunities that machine learning and predictive analytics offer.