Machine learning models have found their utility in understanding consumer behaviour and enhancing customer experience. Many of these decisions require interpreting causal relationships from data and reasoning about counterfactuals. Relying on classic machine learning, which is based on recognizing correlation patterns, will not get us far towards robust predictions and reliable decision-making. A causal perspective is the necessary next step in elevating machine intelligence: it captures underlying relationships and incorporates feedback for robust inference, which in turn benefits domain adaptation. For example, what would happen if something in the observation process changed?
The science of causal inference has, however, not been formally treated as a subject that could be quantified and studied until recently, through the contributions of Judea Pearl. In causal machine learning, there is a four-step process highlighted by Judea Pearl in his seminal paper, which surveys the development of mathematical tools for inferring answers to causal queries such as the effects of interventions as well as direct and indirect effects. Formulating a causal problem goes through the following stages: modelling the problem as a causal graph that encodes our assumptions, identifying an expression for the causal effect, estimating that effect from data, and refuting the estimate by testing the robustness of the assumptions.
In the estimation stage, techniques such as propensity score matching, inverse propensity weighting, and instrumental variables can be employed to estimate effects that account for causality; a sketch of one of these follows.
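As a minimal sketch of one such technique, the snippet below applies inverse propensity weighting to synthetic data. The variable names, coefficients, and the logistic propensity model are all illustrative assumptions, not a prescribed recipe:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000

# Confounder x influences both treatment assignment and outcome.
x = rng.normal(size=n)
d = (x + rng.normal(size=n) > 0).astype(int)   # binary treatment
y = 2.0 * d + 3.0 * x + rng.normal(size=n)     # true effect of d is 2.0

# Estimate propensity scores P(D = 1 | x) with a logistic model.
ps = LogisticRegression().fit(x.reshape(-1, 1), d).predict_proba(x.reshape(-1, 1))[:, 1]

# Inverse propensity weighting estimator of the average treatment effect.
ate_ipw = np.mean(d * y / ps - (1 - d) * y / (1 - ps))
print(f"IPW estimate of ATE: {ate_ipw:.2f}")   # should land near 2.0
```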
Graphical Models and Causality

There is a common realization that all causal effects are identifiable when a causal model is Markovian, that is, the graph is acyclic and the error terms are jointly independent. In fact, Microsoft built a library, DoWhy, to facilitate a principled way of modelling efficient and robust causal models for critical domains.
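As a hedged illustration of how the four stages map onto DoWhy's API, the sketch below runs the full model-identify-estimate-refute loop on synthetic data; the dataset, graph, and method choices are assumptions made for the example, not the only options the library offers:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Synthetic data: 'quality' confounds both treatment and outcome.
rng = np.random.default_rng(0)
n = 5000
quality = rng.normal(size=n)
reviews = (quality + rng.normal(size=n) > 0).astype(int)   # binary treatment
sales = 2.0 * reviews + 3.0 * quality + rng.normal(size=n)
df = pd.DataFrame({"reviews": reviews, "sales": sales, "quality": quality})

# 1. Model: encode our assumptions as a causal graph (DOT format).
model = CausalModel(
    data=df,
    treatment="reviews",
    outcome="sales",
    graph="digraph { quality -> reviews; quality -> sales; reviews -> sales; }",
)

# 2. Identify: derive an estimand (here, backdoor adjustment on 'quality').
estimand = model.identify_effect()

# 3. Estimate: fit the identified estimand.
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # should be close to the true effect of 2.0

# 4. Refute: stress-test the estimate with a placebo treatment.
refutation = model.refute_estimate(
    estimand, estimate, method_name="placebo_treatment_refuter"
)
print(refutation)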
Machine learning algorithms lean on the assumption of independent and identically distributed data, abbreviated as i.i.d., which has its roots in the statistical learning theory framework. This means that if I were to split my dataset into training and test sets, I would be assuming the data distribution is the same across both and that there are no dependencies between samples. It does not mean that the random variables must take the same or similar values. Statistical analysis is typified by regression, estimation, and hypothesis-testing techniques over a given distribution. We then infer associations among variables and estimate the probabilities of past and future events. Causal analysis, however, goes a step further to examine and infer probabilities under changing conditions, that is, changes induced by treatments and external interventions. The core idea governing causality can be understood via the following terms:
A change in X implies a change in Y, not vice versa; the relationship is directional, NOT bi-directional:
X → Y
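To make directionality concrete, here is a small simulation sketch (the structural equations and coefficients are invented for illustration): intervening on X shifts the distribution of Y, while intervening on Y leaves X untouched.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Structural model: X -> Y (Y depends on X, never the other way around).
def simulate(do_x=None, do_y=None):
    x = rng.normal(size=n) if do_x is None else np.full(n, do_x)
    y = 2.0 * x + rng.normal(size=n) if do_y is None else np.full(n, do_y)
    return x, y

# Intervening on X shifts the distribution of Y...
_, y_do_x = simulate(do_x=1.0)
# ...but intervening on Y leaves X unchanged.
x_do_y, _ = simulate(do_y=1.0)

print(f"E[Y | do(X=1)] = {y_do_x.mean():.2f}")  # ≈ 2.0, moved by the intervention
print(f"E[X | do(Y=1)] = {x_do_y.mean():.2f}")  # ≈ 0.0, unaffected
```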
Factor X is referred to as the treatment factor and Y is the outcome we are interested in. Let's suppose that D is a binary treatment taking values of 0 or 1; then the average treatment effect is:

ATE = E[Y(1) − Y(0)] = E[Y(1)] − E[Y(0)]

where Y(1) and Y(0) denote the potential outcomes with and without the treatment.
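As a toy illustration (synthetic data with a made-up true effect of 1.5), under a randomized treatment the ATE can be estimated by a simple difference in group means:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Randomized binary treatment D and outcome Y with a true effect of 1.5.
d = rng.integers(0, 2, size=n)
y = 1.5 * d + rng.normal(size=n)

# Under randomization, the difference in group means is an unbiased
# estimate of ATE = E[Y(1)] - E[Y(0)].
ate_hat = y[d == 1].mean() - y[d == 0].mean()
print(f"Estimated ATE: {ate_hat:.3f}")  # close to 1.5
```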
The fact that, for any given unit, one can only observe one of the two potential outcomes, Y(1) or Y(0), in the real world makes establishing causality hard. Let's take an example and assume there is a question in the marketing domain: do user-generated reviews cause product sales? Should we conclude that as positive reviews and ratings increase, consumers will opt for a purchase? Would that be sensible? Not really. Imagine there are other features unknown to a marketing campaign, such as unobserved product quality, that may be pushing both sales and reviews. Or it could be right timing and improved macroeconomic conditions. Correlation will only tell us how strongly these events are related; it cannot tell us the source of causality. We need to derive causal inferences by ruling out alternative hypotheses.
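The reviews-and-sales scenario can be sketched in simulation. In the hypothetical data below, reviews have no true effect on sales, yet a naive comparison suggests a large one until we adjust for the confounder (made observable here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# Unobserved product quality drives BOTH reviews and sales;
# reviews have NO true effect on sales in this simulation.
quality = rng.normal(size=n)
reviews = (quality + rng.normal(size=n) > 0).astype(int)
sales = 3.0 * quality + rng.normal(size=n)

# A naive comparison suggests a large "effect" of reviews on sales...
naive = sales[reviews == 1].mean() - sales[reviews == 0].mean()

# ...but adjusting for quality (regressing sales on both variables)
# recovers a coefficient near the true value of zero.
X = np.column_stack([np.ones(n), reviews, quality])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)

print(f"Naive difference in means: {naive:.2f}")    # substantially > 0
print(f"Adjusted effect estimate:  {beta[1]:.2f}")  # ≈ 0
```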
The path to explainable and generalizable AI lies at the level of a functional understanding of the model rather than a low-level algorithmic understanding of it.
Causal information is understood to have a stronger effect on human thinking than non-causal information.