How Useful is Synthetic Data?

When non-organic (man-made) fabric was introduced into fashion, there were a number of harsh warnings about using polyester and man-made synthetic fibres, including their flammability.

In creating non-organic data sets, should we also be creating warnings on their use and flammability? Let’s look at why synthetic data is used in industries such as Financial Services and Automotive, as well as for new product development in Manufacturing.

Synthetic Data Defined

Synthetic data can be defined as data that is artificially developed rather than being generated by actual interactions. It is often created with the help of algorithms and is used for a wide range of activities, including as test data for new products and tools, for model validation, and in AI model training. Synthetic data is a type of data augmentation which involves creating new and representative data.
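As a minimal illustration of the idea, the sketch below fits a toy generative model (a multivariate Gaussian, with entirely invented customer figures) to a “real” dataset and samples new synthetic records from it. Real-world tools use far more sophisticated generators, but the principle is the same:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical "real" dataset: 1,000 customers with age and annual spend
# (figures invented for illustration).
real = np.column_stack([
    rng.normal(40, 12, 1000),      # age
    rng.normal(5000, 1500, 1000),  # annual spend
])

# Fit a simple generative model: a multivariate Gaussian estimated from
# the real data's mean vector and covariance matrix.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample as many synthetic records as needed. No real customer appears
# in the output, but its aggregate statistics are preserved.
synthetic = rng.multivariate_normal(mean, cov, size=5000)

print(synthetic.shape)  # (5000, 2)
```

Once the generative model is fitted, producing additional records is essentially free, which is also why synthetic data can be so much cheaper than collecting real-world data.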

Why is it used?

The main reasons why synthetic data is used instead of real data are cost, privacy, and testing. Let’s look at more specifics on this:

  • Data privacy. When privacy requirements limit data availability or how it can be used. For example, in Financial Services where restrictions around data usage and customer privacy are particularly limiting, companies are starting to use synthetic data to help them identify and eliminate bias in how they treat customers – without contravening data privacy regulations.
  • Data availability. When the data needed for testing a product does not exist or is not available to the testers. This is often the case for new releases.
  • Data for training. When training data is needed for machine learning algorithms. In many instances, such as autonomous vehicles, this data is expensive to generate in real life.
  • Training across third parties using cloud. When moving private data to cloud infrastructures involves security and compliance risks. Moving synthetic versions of sensitive data to the cloud can enable organisations to share data sets with third parties for training across cloud infrastructures.
  • Data cost. Producing synthetic data through a generative model is significantly more cost-effective and efficient than collecting real-world data. With synthetic data, it becomes cheaper and faster to produce new data once the generative model is set up.

Why should it cause concern?

If the real dataset contains biases, data augmented from it will contain those biases too. Identifying an optimal data augmentation strategy is therefore important.

If the synthetic set doesn’t truly represent the original customer data set, it might contain the wrong buying signals regarding what customers are interested in or are inclined to buy.

Synthetic data also requires some form of output/quality control and internal regulation, particularly in highly regulated industries such as Financial Services.

Creating incorrect synthetic data also can get a company in hot water with external regulators. For example, if a company created a product that harmed someone or didn’t work as advertised, it could lead to substantial financial penalties and, possibly, closer scrutiny in the future.

Conclusion

Synthetic data allows us to continue developing new and innovative products and solutions when the data necessary to do so wouldn’t otherwise be present or available due to volume, data sensitivity or user privacy challenges. Generating synthetic data also offers the flexibility to adjust its nature and environment as and when required, improving model performance and creating opportunities to test for outliers and extreme conditions.

How is Your Supplier Using Your Data?

What is happening to the data that you are sharing with your ecosystem of suppliers?

Just before Christmas, a friend recommended reading “Privacy is Power” by Carissa Véliz. But the long list of recommendations that the author provides on what you could and should do is quite disheartening. I feel that I have to shut off a lot of the benefits that I get from using the Internet in order to maintain my privacy.

But then over the past couple of days came a couple of reminders of our exposure – our suppliers will share our data with their suppliers, as well as be prepared to use our resources to their benefit. I am reasonably technical and still find it difficult, so how does a person who just wants to use a digital service cope?

Bunnings’ Data Breach with FlexBooker

First example. Bunnings started using a service called FlexBooker to support their click-and-collect service.

To do this, they share personal information with the company for the service to work correctly. But hackers have stolen data for over three million customers from FlexBooker in a recent data breach.

How many of Bunnings’ customers were aware that their data was being shared with FlexBooker? How many would have cared if they had known?

I have only read the comments from Bunnings included in the Stuff report but I believe the reported reaction lacks the level of concern that this breach warrants. What did Bunnings do to verify FlexBooker’s privacy and security standards before sharing their customers’ data with them? What is going to change now that the vulnerability has been identified?

Neither of these things is clear. Nor is it apparent whether Bunnings have advised their customers that they could have been affected; there is no clear message on the Bunnings New Zealand site detailing the breach.

In “Privacy is Power”, the author makes a strong case for customers to demand protection of their privacy. Organisations that use other companies as part of their services must be as demanding of their suppliers as their own customers would be of them.

Is Crypto Mining part of antivirus?

The second example is a little different. Norton has released crypto mining software as part of their antivirus suite. This crypto mining software uses the spare capacity of your computer to join with a pool of computers that are working to create a new blockchain block. Each time a new block is added, you would earn some cryptocurrency that you could change to a fiat currency, i.e. normal cash.

But I question why a crypto miner is part of an antivirus suite. Norton makes the case that they are a trusted partner, so can deliver a safer mining experience than other options.

Norton have made the use of this software optional, but to me, it does indicate the avarice of companies where they see a potential income opportunity. If they had included the software in their internet security suite, then there may be some logic in adding the capability. But to antivirus?

The Verge did some unscientific measurements on the value to a user of running this software. They found the cost of the electricity used during the operation of Norton’s mining software was about the same as what they earned. So Norton, with their 15% fee, would be the only ones making money.
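The arithmetic behind that conclusion is easy to sketch. The figures below are purely illustrative, not The Verge’s actual measurements:

```python
# Illustrative figures only -- not The Verge's actual measurements.
gross_earnings = 1.00    # USD of crypto earned over some period
pool_fee_rate = 0.15     # Norton's reported 15% fee
electricity_cost = 0.85  # USD of power consumed over the same period

# Norton takes its fee off the top; the user pays the power bill.
fee = gross_earnings * pool_fee_rate
user_net = gross_earnings - fee - electricity_cost

print(f"Norton earns: {fee:.2f}")   # Norton earns: 0.15
print(f"User nets: {user_net:.2f}")
```

When electricity costs roughly match gross earnings, the fee is the only money reliably being made, and it is not being made by the user.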

The challenge remains for most of us. Our software vendors regularly add new functionality to our services because it is what we as customers expect. But I rarely check to see what has changed in a new release, as normally you will only see a “bugs squashed, performance improved” message. We have no guarantee that they have not implemented some new way of using our information or assets without gaining explicit approval from the user for this new use.

To Norton’s credit, they have made crypto mining optional and do not activate the software without their users’ consent. Others are less likely to be as ethical.

Summary

Both of these examples show how vulnerable customers of companies are to the exposure of their private data and assets. All organisations are increasing their use of different external services as SaaS options become more attractive. Commercial terms are the critical points of negotiation, not customer privacy. What assurance do customers get that their privacy is being maintained as they would expect?

One point that is often overlooked is that many cloud service contracts define the legal jurisdiction as being either the cloud provider’s home jurisdiction or one that is more advantageous for them. So, any intended legal action could be taking place in a foreign jurisdiction with different privacy laws.

Customer service organisations (i.e. pretty much all organisations) need to look after their customers’ data much more effectively. Customers need to demand to know how their rights are being protected, and governments have to put in place appropriate consequences for organisations where breaches occur outside that government’s jurisdiction.

Cloudification of India’s Banking Industry

In this Insight, guest author Anupam Verma talks about the technology-led evolution of the Banking industry in India and offers Cloud Service Providers guidance on how to partner with banks and financial institutions. “It is well understood that the banks that were early adopters of cloud have clearly gained market share during COVID-19. Banks are keen to adopt cloud but need a partnership approach balancing innovation with risk management so that it is ‘not one step forward and two steps back’ for them.”

India has been witnessing a digital revolution. Rapidly rising mobile and internet penetration has created an estimated 1 billion mobile users and more than 600 million internet users. It has been reported that 99% of India’s adult population now has a digital identity in the form of Aadhaar, and a large proportion of adult Indians have a bank account.

Indians are adapting to consuming multiple services on their smartphones and are demanding the same from their financial services providers. COVID-19 has accelerated this digital trend beyond imagination and is transforming India from a data-poor to a data-rich nation. Data from various alternative sources, coupled with traditional sources, marks the inflection point on the road to financial inclusion. Strong digital infrastructure and digital footprints will create a world of opportunities for incumbent banks and non-banks as well as new-age fintechs.

The Cloud Imperative for Banks

Banks today have an urgent need to stay relevant in the era of digitally savvy customers and rising fintechs. This journey for banks to survive and thrive will put Data Analytics and Cloud at the front and centre of their digital transformation.

A couple of years ago, banks viewed cloud as an outsourcing infrastructure to improve the cost curve. Today, banks are convinced that cloud provides many more advantages (Figure 1).

Why banks adopt cloud

Banks are also increasingly partnering with fintechs for applications such as KYC, UI/UX and customer service. Fintechs are cloud-native and understand that cloud provides exponential innovation, speed to market, scalability, resilience, a better cost curve and security. They understand their business will not exist or reach scale if not for cloud. These bank-fintech partnerships are also making banks understand the cloud imperative.

Traditionally, banks in India have had concerns around data privacy and data sovereignty. There are also risks around migrating legacy systems, which are made of monolithic applications and do not have a service-oriented architecture. As a result, banks are now working on complete re-architecture of the core legacy systems. Banks are creating web services on top of legacy systems, which can talk to the new technologies. New applications being built are cloud ready. In fact, many applications may not connect to the core legacy systems. They are exploring moving customer interfaces, CRM applications and internal workflows to the cloud. Still early days, but banks are using cloud analytics for marketing campaigns, risk modelling and regulatory reporting.

The remote working world is irreversible, and banks also understand that cloud will form the backbone for internal communication, virtual desktops, and virtual collaboration.

Strategy for Cloud Service Providers (CSPs)

It is estimated that India’s public cloud services market is likely to become one of the largest markets in the Asia Pacific, behind only China, Australia, and Japan. Ecosystm research shows that 70% of banking organisations in India are looking to increase their cloud spending. Whichever way one looks at it, cloud is likely to remain a large and growing market. The Financial Services industry will be one of the prominent segments and should remain a focus for cloud service providers (CSPs).

I believe CSPs targeting India’s Banking industry should bucket their strategy under four key themes:

  1. Partnering to Innovate and co-create solutions. CSPs must work with each business within the bank and re-imagine customer journeys and process workflows. This would mean banking domain experts and engineering teams of CSPs working with relevant teams within the bank. For some customer journeys, the teams have to go back to first principles and start from scratch, i.e. the financial need of the customer and how it is re-imagined and fulfilled in a digital world.
    CSPs should also continue to engage with all ecosystem partners of banks to co-create cloud-native solutions. These partners could range from fintechs to vendors for HR, Finance, business reporting, regulatory reporting, and data providers (which feed into the analytics engine).
    CSPs should partner with banks for experimentation by providing test environments. Some of the themes that are critical for banks right now are CRM, workspace virtualisation and collaboration tools. CSPs could leverage these themes to open the doors. API banking is another area for co-creating solutions. Core systems cannot be ‘lifted & shifted’ to the cloud. That would be the last mile in the digital transformation journey.
  2. Partnering to mitigate ‘fear of the unknown’. As in the case of any key strategic shift, the tone of the executive management is important. A lot of engagement is required with the entire senior management team to build the ‘trust quotient’ of cloud. Understanding the benefits, risks, controls and the concept of ‘shared responsibility’ is important. I am an AWS Certified Cloud Practitioner and I realise how granular the security in the cloud can be (which is the responsibility of the bank and not of the CSP). This knowledge gap can be massive for smaller banks due to the non-availability of talent. If security in the cloud is not managed well, there is an immense risk to the banks.
  3. Partnering for Risk Mitigation. Regulators will expect banks to treat CSPs like any other outsourcing service providers. CSPs should work with banks to create robust cloud governance frameworks for mitigating cloud-related risks such as resiliency, cybersecurity etc. Adequate communication is required to showcase the controls around data privacy (data at rest and transit), data sovereignty, geographic diversity of Availability Zones (to mitigate risks around natural calamities like floods) and Disaster Recovery (DR) site.
  4. Partnering with Regulators. Building regulatory comfort is an equally important factor for the pace and extent of technology adoption in Financial Services. The regulators expect the banks to have a governance framework, detailed policies and operating guidelines covering assessment, contractual considerations, audit, inspection, change management, cybersecurity, exit plan etc. While partnering with regulators on creating the framework is important, it is equally important to demonstrate that banks have the skill sets to run the cloud and manage the risks. Engagement should also be linked to specific use cases which allow banks to effectively compete with fintechs in the digital world (and expand financial access) and use cases for risk mitigation and fraud management. This would meet the regulator’s dual objective of market development as well as market stability.

Financial Services is a large and growing market for CSPs. Fintechs are cloud-native and certain sectors in the industry (like non-banks and insurance companies) have made progress in cloud adoption. It is well understood that the banks that were early adopters of cloud have clearly gained market share during COVID-19. Banks are keen to adopt cloud but need a partnership approach balancing innovation with risk management so that it is ‘not one step forward and two steps back’ for them.

The views and opinions mentioned in the article are personal.
Anupam Verma is part of the Leadership team at ICICI Bank and his responsibilities have included leading the Bank’s strategy in South East Asia to play a significant role in capturing Investment, NRI remittance, and trade flows between SEA and India.

Intelligent ‘postcards’ from the Edge: Machine learning model usage

Organisations have found that it is not always desirable to send data to the cloud due to concerns about latency, connectivity, energy, privacy and security. So why not create learning processes at the Edge? 

What challenges does IoT bring?

Sensors are now generating such a volume of data that it is not practical for all of it to be sent to the cloud for processing. From a data privacy perspective, some sensor data is sensitive, and sending data and images to the cloud will be subject to privacy and security constraints.

Regardless of the speed of communications, there will always be a demand for more data from more sensors – along with more security checks and higher levels of encryption – causing the potential for communication bottlenecks.

As the network hardware itself consumes power, sending a constant stream of data to the cloud can be taxing for sensor devices. The lag caused by the roundtrip to the cloud can be prohibitive in applications that require real-time response inputs.

Machine learning (ML) at the Edge should be prioritised to leverage that constant flow of data and address the requirement for real-time responses based on that data. This should be aided by both new types of ML algorithms and by visual processing units (VPUs) being added to the network.

By leveraging ML on Edge networks in production facilities, for example, companies can look out for potential warning signs and do scheduled maintenance to avoid any nasty surprises. Remember many sensors are linked intrinsically to public safety concerns such as water processing, supply of gas or oil, and public transportation such as metros or trains.

Ecosystm research shows that deploying IoT has its set of challenges (Figure 1) – many of these challenges can be mitigated by processing data at the Edge.

Challenges of IoT Deployment

Predictive analytics is a fundamental value proposition for IoT, where responding faster to issues or taking action before issues occur, is key to a high return on investment. So, using edge computing for machine learning located within or close to the point of data gathering can in some cases be a more practical or socially beneficial approach. 

In IoT, the role of an edge computer is to pre-process data and act before the data is passed on to the main server. This allows a faster, lower-latency response and minimal traffic between the cloud server and the Edge. However, a better understanding of edge computing is required if it is to deliver benefits across a range of outcomes.
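A minimal sketch of that pre-processing role, using invented sensor values: the edge device keeps a short rolling window of readings and forwards only statistically unusual values to the server, keeping routine traffic local.

```python
from statistics import mean, stdev

def edge_filter(readings, window=20, threshold=3.0):
    """Pre-process sensor readings at the edge: keep a rolling window
    of recent history and forward only values that deviate strongly
    from it. Everything else is handled locally and never leaves
    the device."""
    history = []
    forwarded = []
    for value in readings:
        if len(history) >= window:
            mu, sigma = mean(history), stdev(history)
            # Forward only statistically unusual readings to the cloud.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                forwarded.append(value)
        history.append(value)
        history[:] = history[-window:]  # trim to the rolling window
    return forwarded

# Normal readings around 20.0, plus one spike the server should see.
stream = [20.0, 20.1, 19.9, 20.2, 19.8] * 5 + [35.0]
print(edge_filter(stream))  # [35.0]
```

In this toy version, dozens of routine readings produce a single forwarded value, which is exactly the latency and bandwidth saving edge pre-processing is after.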

Perception on Edge Analytics in IoT Users

If we can get machine learning happening in the field, at the Edge, then we reduce the time lag and also create an extra trusted layer in unmanned production or automated utilities situations. This can create more trusted environments in terms of possible threats to public services.

What kind of examples of machine learning in the field can we see?

Healthcare

Health systems can improve hospital patient flow through machine learning (ML) at the Edge. ML offers predictive models to assist decision-makers with complex hospital patient flow information based on near real-time data.

For example, an academic medical centre created an ML pipeline that leveraged all its data – patient administration, EHR, clinical and claims data – to create learnings that could predict length of stay, emergency department (ED) arrivals, ED admissions, aggregate discharges, and total bed census. These predictive models proved effective: the medical centre reduced patient wait times and staff overtime and was able to demonstrate improved patient outcomes. And for a medical centre that uses sensors to monitor patients and gather requests for medicine or assistance, Edge processing means keeping private healthcare data in-house rather than sending it off to cloud servers.

Retail

A retail store could use numerous cameras for self-checkout and inventory management and to monitor foot traffic. Such specific interaction details could slow down a network and can be replaced by an on-site Edge server with lower latency and a lower total cost. This is useful for standalone grocery pop-up sites such as in Sweden and Germany.

In Retail, k-nearest neighbours is often used for abnormal activity analysis – the same learning algorithm can also be applied to the visual pattern recognition used in retailers’ loss prevention tactics.
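A toy sketch of the idea, with invented checkout figures: score each event by its mean distance to its k nearest neighbours among past events, so unusual activity stands out with a high score.

```python
import math

def knn_anomaly_score(point, data, k=3):
    """Score an event by its mean distance to its k nearest
    neighbours: the larger the score, the more abnormal the
    activity looks relative to past events."""
    dists = sorted(math.dist(point, d) for d in data)
    return sum(dists[:k]) / k

# Hypothetical checkout events: (items scanned, seconds at terminal).
normal = [(3, 40), (4, 55), (2, 30), (5, 60), (3, 45), (4, 50)]

typical = knn_anomaly_score((4, 48), normal)
suspicious = knn_anomaly_score((25, 10), normal)  # many items, very fast

print(typical < suspicious)  # True
```

An edge server at the store could run this kind of scoring on local camera and checkout data without shipping any footage to the cloud.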

Summary

Working with the data locally on the Edge creates reduced latency, reduced cloud usage and costs, independence from a network connection, more secure data, and increased data privacy.

Cloud and Edge computing that uses machine learning can together provide the best of both worlds: decentralised local storage, processing and reaction, and then uploading to the cloud, enabling additional insights, data backups (redundancy), and remote access.

EU Getting Increasingly Serious about Data Protection

The Hamburg State Commissioner for Data Protection and Freedom of Information (HmbBfDI) imposed a fine of USD 41.3 million on the Swedish MNC Hennes & Mauritz (H&M) for illegal surveillance of employees at H&M Germany’s service centre in Nuremberg.

The data privacy violations reportedly began in 2014, when the company started collecting employee data including personal information, holiday records, medical records, informal chats and other private details. The information was unlawfully recorded and stored, and was further made accessible to managers. The violations were discovered in October 2019 when, due to a computing error, the data became accessible company-wide for a short period.

Ecosystm Principal Analyst Claus Mortensen says, “This is one of those cases that are so blatant that you cannot really say it is setting a precedent for future cases. All the factors that would constitute a breach of the GDPR are here: it involves several types of data that shouldn’t be collected; poorly managed storage and access control; and to finish it all off, a data leak. So even though the fine is relatively high, H&M should probably be happy that it was not bigger – the GDPR authorises fines of up to 4% of a company’s global annual turnover.”

Mortensen adds, “It should also be said that H&M has handled the aftermath well by accepting full blame and by offering compensation to all affected employees. It is possible that these intentions were considered by the HmbBfDI and prevented an even higher fine.”

The penalty on the Swedish retailer is the highest in Germany linked to the General Data Protection Regulation (GDPR) legislation since it came into effect in 2018 and the second highest throughout the continent. Last year, France’s data protection watchdog fined Google USD 58.7 million for not appropriately disclosing data collection practices to users across its services to personalise advertising.

Talking about the growing significance of fines for data breaches, Ecosystm Principal Advisor Andrew Milroy says, “To be effective, GDPR needs to be enforced consistently across the board and have a significant impact. It is too easy to ‘corner cut’ data protection activities. Some breaches may not have an operational impact. For this reason, the cost of being caught needs to be sufficiently large so that it makes commercial sense to comply.”

According to Milroy, “The sizeable fine meted out to H&M together with the publicity it has generated shows that the regulators are serious about GDPR and enforcing it. Other regulators around the world need to make sure that their jurisdictions don’t become ‘soft touches’ for malicious actors.”

EU Proposing New Data Sharing Rules

We are also seeing the European Union (EU) make moves to regulate digital services and the use of customer data by technology providers, as part of the European Union Digital Strategy. The EU is drafting new rules under the Digital Services Act to force larger technology providers to share their customer data across the industry, creating a level playing field for smaller providers and SMEs. The aim is to make the data available to all, for both commercial use and innovation. The initiative is being driven by the EU’s antitrust arm and aims to reduce the competitive edge tech giants have over their competition; they may also be banned from giving preferential treatment to their own services on their sites or platforms. The law, which is expected to be formalised later this year, is also expected to prohibit technology providers from pre-installing applications or exclusive services on smartphones, laptops or other devices. The measures will help users move between platforms without losing access to their data.

