@cigrainger
Created April 28, 2014 21:54
Determining the novelty of patents using topic models
===
The novelty measure builds on work by Kaplan & Vakili (2013), who use topic models to find 'breakthrough technologies'.
The novelty measure ($\lambda_{p}$) for each patent $p$ in time period $y$ is the sum of the novelty scores ($\gamma_{ty}$) for period $y$ over every topic $t$ whose proportion in $p$ meets the cutoff score $c$. It is computed with a simple algorithm:
1. For each topic-period, count the patents whose proportion of that topic meets the threshold $c$ (where $\beta_{pt}$ is the proportion of topic $t$ in patent $p$'s distribution over topics, and $P_{y}$ is the set of patents in period $y$): $$\theta_{ty}=\sum_{p \in P_{y}} x_{pt} \text{ where } x_{pt} = \begin{cases} 1 & \text{if } \beta_{pt} \ge c \\ 0 & \text{if } \beta_{pt} \lt c \end{cases}$$
2. To find the novelty score for each topic-period ($\gamma_{ty}$), identify the first period in which $\theta_{ty}\ge 1$ ($y_{init}$) and set $\gamma_{ty}$ to 1 there; identify the period of full diffusion, i.e. the period in which $\theta_{ty}$ is largest ($y_{\max[\theta_{t}]}$), and set $\gamma_{ty}$ to 0 from that period on; then set $\gamma_{ty}$ for each intervening period to one minus the ratio of the cumulative patent count up to $y$ to the cumulative count at $y_{\max[\theta_{t}]}$: $$\gamma_{ty} = \begin{cases} 1 & \text{if } y = y_{init} \\ 1-\frac{\sum^{y}_{i = y_{init}}\theta_{ti}}{\sum^{y_{\max[\theta_{t}]}}_{i = y_{init}}\theta_{ti}} & \text{if } y_{init} \lt y \lt y_{\max[\theta_{t}]} \\ 0 & \text{if } y \ge y_{\max[\theta_{t}]} \end{cases}$$
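3. Calculate the novelty measure for each patent $p$ in period $y_{p}$ by summing the novelty scores of every topic whose proportion in $p$ meets the cutoff: $$\lambda_{p} = \sum_{t \,:\, \beta_{pt} \ge c} \gamma_{t y_{p}}$$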
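As a purely hypothetical illustration (the counts are invented, not data from the source), suppose a topic $t$ has $\theta_{t1} = 1$, $\theta_{t2} = 1$, $\theta_{t3} = 3$ and $\theta_{t4} = 2$ over four consecutive periods. Then $y_{init} = 1$ and $y_{\max[\theta_{t}]} = 3$, so step 2 gives:

$$\begin{aligned} \gamma_{t1} &= 1 \\ \gamma_{t2} &= 1 - \frac{\theta_{t1} + \theta_{t2}}{\theta_{t1} + \theta_{t2} + \theta_{t3}} = 1 - \frac{1 + 1}{1 + 1 + 3} = 0.6 \\ \gamma_{t3} &= \gamma_{t4} = 0 \end{aligned}$$

If a patent $p$ from period 2 meets the cutoff only for $t$ and one other topic $u$ with (again hypothetically) $\gamma_{u2} = 0.25$, step 3 gives $\lambda_{p} = 0.6 + 0.25 = 0.85$.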
This provides us with a scalar novelty measure ($\lambda_{p}$) for each patent $p$.
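The three steps can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the original author's implementation: the input format (a list of `(patent_id, period, topic_proportions)` tuples with integer periods), the function names, and the tie-breaking rule for $y_{\max[\theta_{t}]}$ (the earliest peak wins) are all hypothetical. When $y_{init}$ and $y_{\max[\theta_{t}]}$ coincide, the first and last cases of step 2 overlap; the sketch checks $y = y_{init}$ first, which is also an assumption.

```python
from collections import defaultdict

# Assumed (hypothetical) input format:
# (patent_id, period, topic_proportions), where topic_proportions[t] is beta_pt.


def topic_period_counts(patents, c):
    """Step 1: theta[t][y] = number of patents in period y with beta_pt >= c."""
    theta = defaultdict(lambda: defaultdict(int))
    for _pid, period, proportions in patents:
        for topic, beta in enumerate(proportions):
            if beta >= c:
                theta[topic][period] += 1
    return theta


def topic_novelty_scores(theta):
    """Step 2: gamma[(t, y)] for every topic-period with theta_ty >= 1."""
    gamma = {}
    for topic, counts in theta.items():
        periods = sorted(counts)  # only periods with at least one patent
        y_init = periods[0]
        # Peak period y_max; ties go to the earliest period (an assumption).
        y_max = max(periods, key=lambda y: (counts[y], -y))
        total = sum(counts[y] for y in periods if y <= y_max)
        cumulative = 0
        for y in periods:
            cumulative += counts[y]
            if y == y_init:  # checked first, so y_init == y_max yields 1
                gamma[(topic, y)] = 1.0
            elif y < y_max:
                gamma[(topic, y)] = 1.0 - cumulative / total
            else:
                gamma[(topic, y)] = 0.0
    return gamma


def novelty_measure(patents, gamma, c):
    """Step 3: lambda_p = sum of gamma_{t, y_p} over topics with beta_pt >= c."""
    return {
        pid: sum(gamma[(t, y)] for t, beta in enumerate(props) if beta >= c)
        for pid, y, props in patents
    }


# Hypothetical toy data, invented for illustration: three topics, 2000-2002.
patents = [
    ("A", 2000, [0.6, 0.4, 0.0]),
    ("B", 2001, [0.5, 0.1, 0.4]),
    ("C", 2001, [0.4, 0.6, 0.0]),
    ("D", 2002, [0.1, 0.5, 0.4]),
    ("E", 2002, [0.0, 0.3, 0.7]),
    ("F", 2002, [0.0, 0.9, 0.1]),
]
c = 0.3
theta = topic_period_counts(patents, c)
gamma = topic_novelty_scores(theta)
lam = novelty_measure(patents, gamma, c)
print(lam)  # expected: A=2.0, B=1.0, C=0.6 (up to float rounding), D=E=F=0.0
```

In this toy data, topic 1 first appears in 2000 and peaks in 2002 with cumulative count 5, so its 2001 score is $1 - 2/5 = 0.6$. Patent C (2001) carries topics 0 and 1, and topic 0 already peaked in 2001, giving $\lambda_{C} = 0 + 0.6$.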
References
---
Kaplan, S. & Vakili, K. 2013. "Studying Breakthrough Innovations Using Topic Modeling: A Test Using Nanotechnology Patents." *Available at SSRN*.