This is the collection screen that creates the variables used later for synthetic data generation, clustering, and Sankey aggregation.
Where did you first notice or learn about the hanging or mounting solution?
Which sources did you use before deciding? Select all that apply.
Which one source most influenced your final product choice?
Where did you buy, and how many days passed from first need to purchase?
The table below shows respondent-level rows generated from the questionnaire structure before clustering and Sankey aggregation.
| ID | J1 | J2 count | J3 | J4 | J6 days | Brand | Assigned group |
|---|---|---|---|---|---|---|---|
We propose using Latent Class Clustering for the final segmentation. It is well suited to questionnaire data because it estimates hidden shopper groups from observed response patterns and gives each respondent a probability of belonging to each class.
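The membership probabilities mentioned above follow from Bayes' rule once a latent class model has been fitted. The sketch below is a minimal illustration, not the project's estimation code: it assumes two classes, two categorical items, and made-up parameter values, and the names `class_posteriors`, `priors`, and `item_probs` are hypothetical.

```python
def class_posteriors(responses, priors, item_probs):
    """Return P(class | responses), assuming items are independent within a class.

    responses:  dict mapping item name -> observed category
    priors:     list of class weights (sum to 1)
    item_probs: one dict per class, mapping item -> {category: probability}
    """
    joint = []
    for prior, probs in zip(priors, item_probs):
        p = prior
        for item, category in responses.items():
            p *= probs[item][category]  # conditional independence assumption
        joint.append(p)
    total = sum(joint)
    return [p / total for p in joint]


# Illustrative parameters only; real values would come from the fitted model.
priors = [0.5, 0.5]
item_probs = [
    {"discovery": {"search": 0.8, "store": 0.2},
     "purchase": {"online": 0.9, "offline": 0.1}},
    {"discovery": {"search": 0.2, "store": 0.8},
     "purchase": {"online": 0.3, "offline": 0.7}},
]
posteriors = class_posteriors(
    {"discovery": "search", "purchase": "online"}, priors, item_probs
)
assigned_class = posteriors.index(max(posteriors))  # modal assignment
```

A respondent is typically assigned to the class with the highest posterior, while the full probability vector shows how clear-cut that assignment is.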
J2 is still used, but not as a raw multi-select string. Each respondent keeps one row, and selected touchpoints are converted into behavioral indicators such as total touchpoint count and channel-family counts for search, social or content, marketplace, offline, and brand-owned sources.
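As a rough sketch of that conversion, the snippet below turns one respondent's J2 selections into a total count plus one count per channel family. The touchpoint labels and the `CHANNEL_FAMILY` mapping are hypothetical placeholders, since the actual J2 answer options are not listed here.

```python
from collections import Counter

# Hypothetical touchpoint-to-family mapping (assumption, not the real J2 options).
CHANNEL_FAMILY = {
    "google_search": "search",
    "youtube": "social_content",
    "instagram": "social_content",
    "amazon": "marketplace",
    "hardware_store": "offline",
    "brand_website": "brand_owned",
}
FAMILIES = ["search", "social_content", "marketplace", "offline", "brand_owned"]


def j2_indicators(selected):
    """Convert one respondent's multi-select answers into count features."""
    unique = set(selected)  # a touchpoint counts once even if listed twice
    family_counts = Counter(CHANNEL_FAMILY[t] for t in unique)
    row = {"j2_count": len(unique)}
    for family in FAMILIES:
        row["j2_" + family] = family_counts.get(family, 0)
    return row


example = j2_indicators(["google_search", "youtube", "instagram", "amazon"])
```

Each respondent still occupies a single row; the multi-select answer simply expands into several numeric columns.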
Discovery, research, purchase, J2 touchpoint behavior, and decision duration are used to infer journey classes. Brand, mission, trigger, age, and gender are held out of the model and used only after grouping to profile the classes.
The final number of classes should be selected by balancing statistical fit, class size, and interpretability. For the demonstration, four journey classes are used because they produce a clear and reviewable Sankey story.
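One possible way to make the fit and class-size criteria concrete is to rank candidate solutions by BIC after dropping any solution with a very small class. The sketch below assumes BIC as the fit statistic and a 5% minimum class share; both are illustrative choices, and the log-likelihoods, parameter counts, and class sizes are invented numbers, not results from this study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def shortlist(candidates, n_obs, min_share=0.05):
    """Drop solutions with a class below min_share, then sort the rest by BIC."""
    kept = [
        c for c in candidates
        if min(c["class_sizes"]) / n_obs >= min_share
    ]
    return sorted(kept, key=lambda c: bic(c["log_lik"], c["n_params"], n_obs))


# Made-up candidate solutions for 1,000 respondents (illustration only).
candidates = [
    {"k": 3, "log_lik": -5200.0, "n_params": 20, "class_sizes": [400, 350, 250]},
    {"k": 4, "log_lik": -5100.0, "n_params": 27, "class_sizes": [300, 300, 250, 150]},
    {"k": 5, "log_lik": -5080.0, "n_params": 34, "class_sizes": [300, 300, 250, 120, 30]},
]
ranked = shortlist(candidates, n_obs=1000)
```

Interpretability is not captured by this ranking; the shortlisted solutions would still need an analyst's review before a class count is chosen.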
The Sankey is generated from synthetic respondent-level questionnaire data after clustering. Filter by group to review how each journey type moves from Discovery through Research and Purchase to Brand.
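The aggregation behind such a diagram can be sketched as counting respondents on each link between adjacent stages, optionally restricted to one group. The field names (`discovery`, `research`, `purchase`, `brand`, `group`) and the sample rows below are placeholders for illustration, not the actual dataset schema or values.

```python
from collections import Counter

STAGES = ["discovery", "research", "purchase", "brand"]


def sankey_links(rows, group=None):
    """Count respondents per (source node, target node) across adjacent stages."""
    links = Counter()
    for row in rows:
        if group is not None and row["group"] != group:
            continue  # apply the group filter
        for src, dst in zip(STAGES, STAGES[1:]):
            links[((src, row[src]), (dst, row[dst]))] += 1
    return links


# Placeholder respondent rows (illustration only).
rows = [
    {"group": 1, "discovery": "search", "research": "reviews",
     "purchase": "marketplace", "brand": "Brand A"},
    {"group": 1, "discovery": "search", "research": "reviews",
     "purchase": "store", "brand": "Brand B"},
    {"group": 2, "discovery": "social", "research": "video",
     "purchase": "marketplace", "brand": "Brand A"},
]
all_links = sankey_links(rows)
group_2_links = sankey_links(rows, group=2)
```

Each key pairs a stage with its value so that the same label appearing at two stages stays a distinct node, and the counts map directly to link widths in a Sankey chart.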
These cards are generated from the same respondent rows as the Sankey. Names are analyst labels applied after reviewing channel patterns, depth, speed, missions, triggers, brands, and demographics.