@jarutis
Last active May 16, 2016 10:13

Item recommender

Recommend items to members based on previous clicks/transactions, on-boarding info, and third-party data (Facebook).

Application areas

Personal content box in editorial

Additional channel in feed

Component of search relevancy calculation

Personalized content emails

Implementation

Member segmentation (optional)

Required for new members, where we have to rely on on-boarding, marketing attribution, and third-party (Facebook) data. Optional for old members, for whom we can use transactional data to recommend at the individual level.

Item segmentation

An additional difficulty here is item uniqueness. One possible way to solve this is to put items into buckets. We have size, price, brand, and catalog readily available, but I doubt that those features contain enough information. Stitch Fix uses item descriptions and comments to extract additional topics from unstructured text (code available, but in a pretty alpha state: https://github.com/cemoody/lda2vec).
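A minimal sketch of the bucketing idea, assuming items arrive as plain dictionaries with brand, catalog, size, and price fields. The field names, the price band edges, and the sample items are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical price band upper edges; real cut-offs would come from the data.
PRICE_BANDS = [5, 15, 40, 100]


def price_band(price):
    """Return the index of the first band whose upper edge is >= price."""
    for i, upper in enumerate(PRICE_BANDS):
        if price <= upper:
            return i
    return len(PRICE_BANDS)


def bucket_key(item):
    """Build a bucket key from the readily available structured features."""
    return (item["brand"], item["catalog"], item["size"], price_band(item["price"]))


def bucket_items(items):
    """Group item ids by bucket key."""
    buckets = defaultdict(list)
    for item in items:
        buckets[bucket_key(item)].append(item["id"])
    return dict(buckets)


# Illustrative data only.
items = [
    {"id": 1, "brand": "Zara", "catalog": "dresses", "size": "M", "price": 12.0},
    {"id": 2, "brand": "Zara", "catalog": "dresses", "size": "M", "price": 14.5},
    {"id": 3, "brand": "Zara", "catalog": "dresses", "size": "M", "price": 60.0},
]
buckets = bucket_items(items)
```

With buckets this coarse, two items in the same bucket can still look very different, which is the reason for doubting that these features carry enough information.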

Item vectorisation

Instead of doing item segmentation, we could go a slightly different route and do item vectorisation. With a vectorised representation of our content, we could identify the region of that vector space which is of most interest to a member and simply search for items within that region. Train a neural network to categorise images based on Vinted data (or take an already trained network and fine-tune it, which is much easier with most of the same benefits). Throw out the classification layer and you have item vectorisation based on images. Text could be used as well. https://github.com/AKSHAYUBHAT/VisualSearchServer

Matching

Collaborative filtering

Plenty of available implementations which we could use as a starting point. For example http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html.
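To make the idea concrete, here is a toy item-to-item co-occurrence recommender in plain Python. It is not the matrix factorisation (ALS) approach from the Spark link, just the simplest form of collaborative filtering, and the member names and items are invented.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(interactions):
    """Count how often each pair of items was clicked by the same member."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in interactions.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(member, interactions, counts, n=3):
    """Score unseen items by co-occurrence with the member's own items."""
    seen = set(interactions[member])
    scores = defaultdict(int)
    for item in seen:
        for other, c in counts[item].items():
            if other not in seen:
                scores[other] += c
    # Highest score first; ties broken by item name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Illustrative click history only.
interactions = {
    "anna": ["coat", "scarf"],
    "berta": ["coat", "scarf", "boots"],
    "cecile": ["coat", "boots"],
    "dana": ["scarf", "hat"],
}
counts = cooccurrence(interactions)
recs = recommend("anna", interactions, counts)
```

Here "anna" gets boots ahead of hat, because boots co-occur with both of her items. A member with no history gets nothing back, which is why new members need the segmentation step.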

Some open questions:

Do we need real-time, or is batch good enough? https://github.com/brkyvz/streaming-matrix-factorization

Do we calculate suggestions for all registered members? (not sure about speed)

Nearest neighbor

Requires some sort of data structure to query n-dimensional space. Lucene has some support as of version 6.0; not sure about Elasticsearch. http://stackoverflow.com/questions/5751114/nearest-neighbors-in-high-dimensional-data
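As a baseline before picking an index structure, a brute-force scan with cosine similarity shows what the query has to return. This sketch is linear in the number of items, so it is only a reference point; the three-dimensional vectors and item names are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, vectors, k=2):
    """Return the ids of the k vectors most similar to query (linear scan)."""
    ranked = sorted(
        vectors,
        key=lambda item_id: cosine(query, vectors[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Illustrative item vectors only; real ones would come from the network.
vectors = {
    "red_dress": [0.9, 0.1, 0.0],
    "pink_dress": [0.8, 0.2, 0.1],
    "black_boots": [0.0, 0.1, 0.9],
}
result = nearest([1.0, 0.0, 0.0], vectors, k=2)
```

Any approximate index we end up choosing can be checked against this exact scan on a sample of queries.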

Impact

Broad application area. Increased value per impression should result in sales uplift. Higher sales of niche items.

Summary

High effort, high impact case

Similar items

Find similar style items.

Application areas

“Losers” email

For good quality items we have a lot of buyers coming after them. As our transaction flow (or most of the flows) is pretty long, there can be a queue of members trying to get one item. At that moment we know those members have a high propensity to buy. We could suggest similar items to everyone who has a failed transaction.

Improve similar items channel on item page

Implementation

As before two approaches are possible.

Item segmentation

This is how similar items currently work, I believe. We could expand the data that we use to define the same segment.

Item vectorisation

It seems natural that the most beneficial source of information for similar items should be item photos. If we vectorise item photos, we could search for similar items based on visual appearance.

Impact

Additional sales channel. Could be used to sell items which have low exposure elsewhere (old items). Sales uplift.

Summary

We already have this, but I think it could be greatly improved. Similar items could be used more extensively to sell items which were not sold through other channels.

Suggest price in the upload form

Application areas

Suggest price during item upload

Suggest to decrease price after some time without sale

Like inverse auction

Implementation

Based on current item distribution

A very simple implementation could be just calculating the mean and standard deviation per segment (brand/catalog) and suggesting a price based on the corresponding normal distribution.
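A rough sketch of the per-segment approach, assuming the two statistics are mean and standard deviation and that the suggestion is a band of one standard deviation around the mean. The field names, the band width, and the sample prices are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def segment_stats(items):
    """Compute (mean, standard deviation) of price per (brand, catalog) segment."""
    prices = defaultdict(list)
    for item in items:
        prices[(item["brand"], item["catalog"])].append(item["price"])
    # stdev needs at least two observations, so thinner segments are skipped.
    return {seg: (mean(p), stdev(p)) for seg, p in prices.items() if len(p) >= 2}


def suggest_price(brand, catalog, stats, width=1.0):
    """Return (low, typical, high), or None when the segment is unknown."""
    if (brand, catalog) not in stats:
        return None
    mu, sigma = stats[(brand, catalog)]
    return (max(0.0, mu - width * sigma), mu, mu + width * sigma)


# Illustrative data only.
items = [
    {"brand": "Zara", "catalog": "dresses", "price": 8.0},
    {"brand": "Zara", "catalog": "dresses", "price": 10.0},
    {"brand": "Zara", "catalog": "dresses", "price": 12.0},
]
stats = segment_stats(items)
suggestion = suggest_price("Zara", "dresses", stats)
```

Prices are usually skewed, so a normal distribution is a crude fit; medians and quantiles per segment may be a more robust variant of the same idea.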

Based on linear regression

Use Bayesian optimisation to suggest when to decrease price

Impact

Reduce the number of incorrectly priced items, leading to higher sales.

Real time tips for photos on upload

Application areas

Provide realtime feedback during item photo taking in our apps.

Implementation

Collect tagged images from support

We had this during our experiment with a third party (Vidmantas). Sadly we deleted the dataset :(

Train a neural network to classify problems with photos.

For this task we will likely have to fully train the network; a pretrained one will probably not work because the problem domain is different. There is a lot of information on this and there are ready-made network architectures, but I think it is not a very easy task. (Bleeding edge network TensorFlow implementation https://github.com/tensorflow/models/tree/master/inception)

In the best case do image recognition in the phone

There have been a few attempts to reduce NN memory and computational power requirements. Worth checking out if considering a phone implementation. http://songhan.github.io/SqueezeNet-Deep-Compression/

Impact

Higher seller retention, less frustration from bad photo bans.

Increased uploads

Visual search

Take a photo with a phone, find items on Vinted https://github.com/AKSHAYUBHAT/VisualSearchServer

Churn prediction

Not sure what to do if we know member is likely to leave. http://spark.apache.org/docs/latest/ml-classification-regression.html#survival-regression

Fraud detection

Subtask of anomaly detection

Anomaly detection in event tracking, infrastructure, metrics, member behavior

Monster from Etsy https://anomaly.io/detect-anomalies-skyline/ They said they were working on a cleaner solution, not sure about the status.
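For a sense of the simplest possible detector, here is a rolling z-score check in plain Python. This is not Skyline's algorithm, only a basic three-sigma rule over a sliding window; the window size, threshold, and sample series are assumptions.

```python
from statistics import mean, stdev


def anomalies(series, window=5, threshold=3.0):
    """Flag indices whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` points."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Illustrative metric values only, e.g. events per minute.
signups = [100, 102, 98, 101, 99, 100, 250, 101]
flagged = anomalies(signups)
```

Note that the spike at index 6 inflates the window's standard deviation afterwards, so points right after an anomaly are harder to flag. That is one reason real systems combine several detectors.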

Forum sentiment alert

Alert early about trending negative/positive topics in the forums. Use word2vec and logistic regression. Did some experiments earlier: easy for feedback, harder for the forum. https://github.com/linanqiu/word2vec-sentiments/blob/master/word2vec-sentiment.ipynb
