graph-hackathon-writeup

Markdown writeup for internal Nordstrom jekyll site.

Hackathon Graph Recommendations

How we built a graph model for product recommendations in two days.

Author: Seth Dimick
The Team: Jim Liu, Aaron Flower, Ola Hungerford, Thomas Fritchman, Fan Tu

The Idea

Using a property graph model to surface relevant content to users is now common practice for many digital experiences, from social media to retail. While no graph technology solution has been implemented at Nordstrom for personalization (as of today), a one-step Markov chain, or transition matrix, was implemented in October 2017 to provide shoppers with a personalized sort of Homepage content on the mobile web experience and yielded significant conversion lift. While the transition matrix implementation proved successful in one instance, the approach is incredibly hard to scale or iterate upon using relational data structures.

To expand upon this proven success, our Summer 2018 Nordstrom Hackathon team, Graphathon, built a neo4j graph database from our website clickstream data. The MVP graph model included product view and purchase data connected by shopper interactions for adult men's shoes. Our demonstrated use case associates similar shoppers via live clickstream data in order to present personalized product recommendations using several strategies.

The Graph

For our project, we designed a graph that leveraged both the well-documented property graph method and the proven Markov chain method from the Homepage. The result is a schema with three types of nodes and three types of relationships.

Basic Schema

Product nodes (green) represent all adult male shoe Style IDs viewed in the past 30 days. The Product nodes are shared by the property graph of shopper views and purchases and the Markov chain of shoppers' next product view transitions.

The property graph functionality is enabled by relating Product nodes to the ShopperSession nodes (red). ShopperSession nodes represent browsing sessions determined by our existing clickstream logic, and they are related to Product nodes by VIEWED and PURCHASED relationships, or edges.

A Markov chain model was then created by relating Product nodes to each other with an intermediary NextView node (purple). The NextView nodes have shopper and session information and connect to Product nodes with NEXT edges. Counting relevant NextView nodes provides the "probabilities" of a shopper transitioning from one product view to another. (The intermediary NextView nodes are not theoretically necessary, as shopper and session information could be attributed to the NEXT edges, but the nodes were added to the schema to increase query performance when filtering the Markov chain by shopper attributes and counting the number of relationships.)
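To make the counting step concrete, here is a minimal Python sketch of how transition "probabilities" could be derived from NextView records. The record layout `(shopper_id, from_style, to_style)` and the toy values are assumptions for illustration only, not the hackathon data or code.

```python
from collections import Counter, defaultdict

# Hypothetical NextView records: (shopper_id, from_style, to_style).
next_views = [
    ("s1", "A", "B"),
    ("s2", "A", "B"),
    ("s3", "A", "C"),
    ("s1", "B", "C"),
]

def transition_probabilities(records):
    """Count transitions per source style, then normalize each row to sum to 1."""
    counts = defaultdict(Counter)
    for _shopper, src, dst in records:
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

probs = transition_probabilities(next_views)
```

In this toy data two of the three transitions out of product A lead to B, so the estimated probability of A to B is 2/3. Filtering `records` by a shopper attribute before counting corresponds to filtering the NextView nodes in the graph.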

Schema with Data

The Data

Our graph model for the hackathon was derived from two tables of Snowplow clickstream data: product views and order items. To transform these relational tables into a directed graph, we joined the Product Views table to itself by shopper and session, creating a relationship between each page view and that shopper's next sequential page view for all adult men's shoes. The data was transformed and saved to S3 with some simple ETL code, then loaded into neo4j by a Cypher (graph query language) load script.
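The self-join can be pictured with a small Python sketch. This is not the team's ETL code: the row layout `(shopper_id, session_id, view_sequence, style_id)` is an assumed stand-in for the Product Views table, and grouping plus pairing consecutive rows stands in for joining each view to the view with the next sequence number in the same session.

```python
from collections import defaultdict

# Hypothetical Product Views rows: (shopper_id, session_id, view_sequence, style_id).
product_views = [
    ("s1", "sess1", 1, "A"),
    ("s1", "sess1", 2, "B"),
    ("s1", "sess1", 3, "C"),
    ("s2", "sess9", 1, "A"),
    ("s2", "sess9", 2, "C"),
]

def next_view_pairs(rows):
    """Pair each view with the next sequential view in the same shopper session."""
    by_session = defaultdict(list)
    for shopper, session, seq, style in rows:
        by_session[(shopper, session)].append((seq, style))

    pairs = []
    for (shopper, session), views in sorted(by_session.items()):
        views.sort()  # order views within the session by sequence number
        for (_, src), (_, dst) in zip(views, views[1:]):
            pairs.append((shopper, session, src, dst))
    return pairs

pairs = next_view_pairs(product_views)
```

Each resulting tuple carries the information one NextView node would hold (shopper and session) plus the two products it connects.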

The data source for our recommendation queries is also notable. If taken to production, graph recommendations will be the first consumer of Personalization's real-time serverless streaming data pipeline that was built to populate the recently viewed products tray. Our graph-based recommendation strategies (cypher queries) are parameterized by this real-time data to provide personalized recommendations based on shoppers' in-session clickstream events leading up to and including the current product page view.

The Strategies

In the short time of the hackathon we developed three recommendation strategies based on the graph model. The first strategy, Previous to Next, is a Markov-centric approach that considers two transitions, or three sequential product views. The other two strategies were designed as competitors for the existing strategies People Also Viewed and People Also Bought.

Previous Product to Next Product

The Previous to Next strategy was designed to leverage information about the shopper's in-session journey in addition to information about the current product view.

In the above graph visualization of the strategy, the Product node (green) in the center represents a current product being viewed. The Product node on the left is the product viewed immediately before the current product. A portion of the NextView nodes (purple) between the previous and current Product nodes represent the shoppers who followed the same transition as our current shopper. The remaining NextView nodes in the graph show us where those same shoppers went for their next product view immediately following the current product view (which includes the possibility of going back to the previous product). These NextView nodes can be traced to all the Product nodes around the perimeter and counted for each associated Style ID, which gives a sorted list of recommended products to view next through the cypher query below.

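// $style2 is the previously viewed product, $style1 is the current product
MATCH (prev_product:Product {styleid:$style2})
MATCH (curr_product:Product {styleid:$style1})
// Follow shoppers who made the same prev -> current transition to their next view
MATCH (prev_product)-->(user_nv1:NextView)-->(curr_product)-->(user_nv2:NextView)-->(rec:Product)
  WHERE user_nv1.shopper_id = user_nv2.shopper_id
// Rank candidate products by the number of matching next-view transitions
RETURN rec.styleid AS styleid, count(user_nv2) AS count
  ORDER BY count DESC
  LIMIT 10;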
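For readers less familiar with Cypher, the query above can be traced on toy data in plain Python. This is only an in-memory sketch of the same logic: the `(shopper_id, from_style, to_style)` tuples are hypothetical NextView records, and the style IDs are made up.

```python
from collections import Counter

# Hypothetical NextView records: (shopper_id, from_style, to_style).
next_views = [
    ("s1", "P", "C"), ("s1", "C", "X"),
    ("s2", "P", "C"), ("s2", "C", "X"),
    ("s3", "P", "C"), ("s3", "C", "Y"),
    ("s4", "Q", "C"), ("s4", "C", "Z"),  # arrived from a different product
]

def previous_to_next(records, prev_style, curr_style, limit=10):
    """Rank next products among shoppers who also went prev_style -> curr_style."""
    # One entry per NextView that matches the prev -> current transition.
    shoppers_in = [s for s, src, dst in records
                   if src == prev_style and dst == curr_style]
    counts = Counter()
    for shopper in shoppers_in:
        for s, src, dst in records:
            # The same shopper's next views leaving the current product.
            if s == shopper and src == curr_style:
                counts[dst] += 1
    return counts.most_common(limit)

recs = previous_to_next(next_views, "P", "C")
```

Shopper `s4` reached the current product from a different previous product, so product Z is never recommended here, which is the point of conditioning on the previous view.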

People Also [ Bought | Viewed ]

The People Also Viewed and People Also Bought strategies were designed to compete with existing strategies while utilizing the advantages of the graph model.

Similar to the previous strategy's graph representation, in the above graph the central Product node (green) represents the current product a shopper is viewing. Unlike the previous strategy, we don't consider what previous product this shopper is coming from, but only consider all shoppers who have viewed the same product and return which products they viewed next, represented by the surrounding NextView (purple) and Product nodes. From the products viewed next, we then attain a measurement of popularity to sort our recommended products by counting the number of ShopperSession nodes (red) connected to our Product nodes with a PURCHASED or VIEWED edge. The simple cypher query below accomplishes this logic.

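// Products viewed next after the current product, via the intermediary NextView nodes
MATCH (curr_product:Product {styleid:$style1})-->(:NextView)-->(rec:Product)
// Use VIEWED instead of PURCHASED for the People Also Viewed variant
MATCH (rec)<-[:PURCHASED]-(sessions:ShopperSession)
// DISTINCT keeps a session from being counted once per NextView path
RETURN rec.styleid, count(DISTINCT sessions) AS buyers
  ORDER BY buyers DESC
  LIMIT 10;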
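The same ranking can be sketched in plain Python on toy data. The record layouts below, `(shopper_id, from_style, to_style)` for next views and `(session_id, style_id)` for purchases, are assumptions for illustration and not the actual graph contents.

```python
from collections import Counter

# Hypothetical NextView records: (shopper_id, from_style, to_style).
next_views = [("s1", "C", "X"), ("s2", "C", "X"), ("s3", "C", "Y"), ("s4", "Q", "Z")]

# Hypothetical PURCHASED edges: (session_id, style_id).
purchases = [("sessA", "X"), ("sessB", "Y"), ("sessC", "Y"),
             ("sessD", "Y"), ("sessE", "Z")]

def people_also_bought(next_views, purchases, curr_style, limit=10):
    """Rank products viewed after curr_style by number of purchasing sessions."""
    # Candidate products: anything viewed immediately after the current product.
    candidates = {dst for _shopper, src, dst in next_views if src == curr_style}
    buyers = Counter()
    for _session, style in set(purchases):  # count each session-product pair once
        if style in candidates:
            buyers[style] += 1
    return buyers.most_common(limit)

recs = people_also_bought(next_views, purchases, "C")
```

Passing view edges in place of `purchases` gives the People Also Viewed variant. Note that the next-view step only selects candidates; popularity alone decides the order.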

The Implementation

Architecture

During the hackathon we were able to implement our new recommendation strategies using the existing development environments for Recbot (Personalization's recommendation service) and Product Page. After populating the graph, we added functionality to Recbot that calls the neo4j database with our three strategy queries. The recommendations are triggered by the existing call from Product Page to Recbot with the Shopper ID and the current product's Style ID. Recbot uses the Shopper ID to query its streaming data for the Style IDs of the shopper's recently viewed products, and passes those Style IDs, along with the current Style ID from the Product Page, as parameters for the queries. Recbot then returns the recommended styles from the neo4j response, along with offer service data, to the Product Page for rendering, in the same format as its existing recommendation strategies.
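As a rough illustration of the parameter step, the sketch below shows how a service might turn the recently viewed list and the current Style ID into the `$style1` and `$style2` query parameters. The function name `build_query_params`, the most-recent-first ordering, and the fallback behavior are all assumptions; this is not Recbot's actual code.

```python
def build_query_params(recently_viewed, current_style):
    """Build Cypher parameters from a shopper's in-session views.

    Assumes recently_viewed is ordered most recent first and may include
    the product currently being viewed (both are hypothetical conventions).
    """
    # The previous product is the most recent view that is not the current one.
    previous = next((s for s in recently_viewed if s != current_style), None)

    params = {"style1": current_style}  # current product, as in the queries
    if previous is not None:
        params["style2"] = previous  # previous product, used by Previous to Next
    return params

params = build_query_params(["B", "A"], "C")
```

When no previous product is available, `style2` is omitted, and a caller could fall back to a strategy that needs only the current product, such as People Also Viewed.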

Development Experience

The new recommendations were surfaced in the Product Page development environment through the tried and true recommendation treatments. Below you can see a rendered adult male shoe page with the new Previous to Next recommendation strategy on the right-hand side of the page.

Outcomes

While our project did not take the podium in the hackathon judging, we managed to build a near-production-ready, customer-facing product on an entirely new data structure in two days. Through the process we found that working with a graph provides incredible agility to iterate on both the query strategies and the graph schema itself, and that graph models can provide highly personalized experiences for shoppers, which can be optimized for various outcomes. I was glad to hear Ola and Personalization plan to build off our work to test graph strategies in production post-anniversary.

Working with graph visualizations and the cypher query language also sparked countless ideas on how to leverage our existing data while working on our hack, and I look forward to utilizing graph technology across many use cases with many data sets in the future.