I didn’t get the association results I was looking for. What else can I do?
Machine learning, and unsupervised machine learning in particular, is like a box of chocolates: you never know exactly what you will get. Exploring a dataset is therefore an iterative process, so try out different ideas and refine your models as you go. Here are some tips:
- Iterate on your model by changing its parameters. One parameter that can yield different results is the search strategy, i.e., the measure used to prioritize the discovered associations. You can choose leverage, lift, coverage, support, or confidence, and rules with higher values for the chosen measure will float to the top of your result set. Leverage, along with confidence and lift, is one of the key measures for getting relevant results in most use cases. However, no rule of thumb applies in all cases. Select the strategy that best fits your main purpose, based on your domain knowledge, and iterate until you arrive at a satisfactory result.
- Set minimum thresholds for measures according to your main goals. For example, you can specify a minimum support to filter out insignificant rules, leaving the rules that are more relevant to your needs.
- Modify your dataset by applying feature engineering.
- Stratification is key: try to stratify your data as much as possible. For example, market basket/POS (Point of Sale) data from a supermarket chain can be grouped per store and even per season, while medical records from various patients can be arranged based on contributing factors such as age and gender. Segmenting your data also helps you avoid what is commonly known as Simpson’s paradox, where a pattern seen in the combined data weakens or reverses within the individual groups.
- Remove categories or groups of items that return too many obvious rules. That way, other more interesting patterns can rise to the top of your results.
- Remove anomalies to further clean your dataset. Although association rules shouldn’t be affected by outliers, cleaning your dataset is good general practice and can improve the model.
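
To make the first two tips more concrete, here is a minimal Python sketch of how the five measures are commonly defined and how switching the ranking measure or raising the minimum support changes which rules come out on top. It is an illustration only, not the implementation used by any particular tool: the toy transactions and the helper names (`support`, `rule_measures`, `rank_rules`) are made up for this example, and only single-item rules are considered.

```python
from itertools import permutations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(antecedent, consequent, transactions):
    """Common textbook definitions of the measures for antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(antecedent | consequent, transactions)
    return {
        "support": s_ac,                # how often the whole rule occurs
        "coverage": s_a,                # how often the antecedent occurs
        "confidence": s_ac / s_a,       # P(consequent | antecedent)
        "lift": s_ac / (s_a * s_c),     # ratio to the independence baseline
        "leverage": s_ac - s_a * s_c,   # difference from the independence baseline
    }


def rank_rules(transactions, measure="leverage", min_support=0.0):
    """Rank all single-item rules by one measure, dropping rules below min_support."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        m = rule_measures({a}, {c}, transactions)
        # Skip rules that never occur or fall below the support threshold.
        if m["support"] > 0 and m["support"] >= min_support:
            rules.append((a, c, m))
    return sorted(rules, key=lambda r: r[2][measure], reverse=True)


if __name__ == "__main__":
    # Compare how the top of the list changes with the chosen measure.
    for measure in ("leverage", "confidence", "support"):
        a, c, m = rank_rules(transactions, measure=measure)[0]
        print(f"top rule by {measure}: {a} -> {c} ({m[measure]:.2f})")

    # A higher minimum support leaves fewer, more frequent rules.
    frequent = rank_rules(transactions, min_support=0.5)
    print(len(frequent), "rules with support >= 0.5")
```

Running the same ranking on each segment of your data separately (for example, per store or per season) is one simple way to apply the stratification tip and compare how rules differ between groups.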