Abstract
Consumer protection and data privacy law and regulation face important challenges from big data and machine learning techniques, particularly where these are used for making decisions about services provided to consumers.
Complying with requirements to notify the consumer of the purpose of data collection is difficult where, as in machine learning, the purpose may not be known at the time of notification. Consent is difficult to obtain when the complexity of big data and machine learning systems is beyond the consumer’s comprehension. The notion of data minimization (collecting and storing only the data necessary for the purpose for which it was collected, and storing it for the minimum period of time) runs counter to the modus operandi of the industry, which emphasizes maximizing the volume of data collected over time.
The successful functioning of machine learning models and the accuracy of their outputs depends on the quality of the input data. Data protection and privacy laws increasingly impose legal responsibility on firms to ensure the accuracy of the data they hold and process. However, they do not legislate for accuracy of output from big data and machine learning systems. This raises questions about the regulatory responsibilities of those handling big data, concerning both the accuracy of input data in automated decisions and the data reported in formal credit data reporting systems. In some jurisdictions, this has given rise, among other remedies, to certain rights to object to automated decisions.
Inferences from input data generated by machine learning models determine how individuals are viewed and evaluated for automated decisions. Data protection and privacy laws may be insufficient to deal with the outputs of machine learning models that process such data. One of their concerns is to prevent discrimination, typically protecting special categories of groups (e.g., race, ethnicity, religion, gender). In the era of big data, however, non-sensitive data can be used to infer sensitive data.
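The risk that non-sensitive data can reveal sensitive data can be illustrated with a simple "proxy audit." The sketch below, using entirely hypothetical data, checks whether a nominally non-sensitive field (here, a postcode) predicts a sensitive attribute far better than the base rate; if so, the non-sensitive field effectively encodes the sensitive one:

```python
from collections import Counter, defaultdict

def proxy_predictiveness(records, proxy_key, sensitive_key):
    """Accuracy of predicting the sensitive attribute from the proxy's
    per-value majority class, compared with the overall majority base rate."""
    overall = Counter(r[sensitive_key] for r in records)
    base_rate = overall.most_common(1)[0][1] / len(records)

    # Group records by proxy value and predict each group's majority class.
    by_proxy = defaultdict(Counter)
    for r in records:
        by_proxy[r[proxy_key]][r[sensitive_key]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_proxy.values())
    return correct / len(records), base_rate

# Hypothetical records: "postcode" is non-sensitive, "group" is sensitive.
data = (
    [{"postcode": "A", "group": "x"}] * 9
    + [{"postcode": "A", "group": "y"}] * 1
    + [{"postcode": "B", "group": "y"}] * 8
    + [{"postcode": "B", "group": "x"}] * 2
)
accuracy, base_rate = proxy_predictiveness(data, "postcode", "group")
print(accuracy, base_rate)  # 0.85 vs. 0.55: the proxy is highly informative
```

A gap this large between proxy accuracy and the base rate suggests that restricting the use of explicitly sensitive categories, as data protection laws typically do, may not prevent discriminatory inference.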
Machine learning may lead to discriminatory results where the algorithms’ training relies on historical examples that reflect past discrimination, or where the model fails to consider a wide enough set of factors. Addressing bias is challenging, but tests have been developed to assess where it may arise. In some countries, where bias is unintentional, it may nevertheless be unlawful if it has “disparate impact,” which arises where the outcomes of a selection process differ substantially for a protected class of persons.
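One widely used statistical screen for disparate impact is the "four-fifths rule": if the selection rate for a protected class falls below 80% of the rate for the most-favored group, the process may be flagged for disparate impact. The sketch below applies this rule to hypothetical loan-approval counts (the 0.8 threshold follows US EEOC guidance; the figures are illustrative only):

```python
def disparate_impact_ratio(selected_protected, total_protected,
                           selected_other, total_other):
    """Ratio of the protected class's selection rate to the other group's."""
    rate_protected = selected_protected / total_protected
    rate_other = selected_other / total_other
    return rate_protected / rate_other

# Hypothetical counts: 30 of 100 protected-class applicants approved,
# versus 60 of 100 applicants in the comparison group.
ratio = disparate_impact_ratio(30, 100, 60, 100)
print(ratio)        # 0.5
print(ratio < 0.8)  # True: flags potential disparate impact
```

Such a screen is only a first filter: a low ratio does not by itself establish unlawful discrimination, and a passing ratio does not rule it out.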
The challenges arising for the treatment of big data and machine learning under legal and regulatory frameworks for data protection and privacy suggest that the development of robust self-regulatory and ethical regimes in the artificial intelligence and financial services community may be particularly important. Facing legal and regulatory uncertainty, businesses may introduce risk management systems, employ privacy by design, and develop ethical codes.
Further exploration and development are needed in relation to standards and procedures, including acceptable inferential analytics, reliability of inferences, ethical standards for artificial intelligence, provision of post-decision counterfactuals, documentation of written policies, privacy principles for design, explanations of automated decisions, access to human intervention, and other accountability mechanisms.
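The idea of a post-decision counterfactual can be made concrete with a toy example. The sketch below uses an entirely hypothetical credit-scoring rule and reports the smallest change to one input (income) that would flip an adverse decision, which is the kind of explanation such a standard would require:

```python
THRESHOLD = 40_000     # hypothetical approval cut-off
DEBT_WEIGHT = 0.5      # hypothetical penalty per unit of debt

def approve(income, debt):
    """Toy decision rule: approve when the weighted score clears the cut-off."""
    return income - DEBT_WEIGHT * debt >= THRESHOLD

def counterfactual_income(debt):
    """Minimum income at which an applicant with this debt would be approved."""
    return THRESHOLD + DEBT_WEIGHT * debt

income, debt = 35_000, 10_000
print(approve(income, debt))          # False: application declined
print(counterfactual_income(debt))    # 45000.0
# Counterfactual statement: "You would have been approved had your
# income been at least 45,000, all else being equal."
```

Real scoring models are far more complex, and generating faithful counterfactuals for them is an active research problem; the point here is only the form of the explanation a consumer would receive.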