Last active
August 29, 2015 13:59
Download the GSB faculty profiles, then intersect their research terms to get a basic idea of whether two faculty members are connected. The scraping is in the first half of the document, and the processing is in the second half.
# Python 2 script (print statements, two-argument str.translate)
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time, csv

def isReady(browser):
    # True once the browser reports that the page has finished loading
    return browser.execute_script("return document.readyState")=="complete"

browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://www.gsb.stanford.edu/facultyprofiles") # Load page
while not isReady(browser):
    time.sleep(1)
print browser.title

# Collect name, rank and profile link for everyone listed on the index page
faculty = browser.find_elements_by_xpath('//span[@class="views-field views-field-field-person-last-name-1"]//a')
ranks = browser.find_elements_by_xpath('//span[@class="views-field views-field-field-official-rank"]')
allfaculty = [{'Label': f.text, 'Rank': r.text, 'Link': f.get_attribute('href')} for (f,r) in zip(faculty, ranks)]

# Visit each profile page and store its summary blurb
for (i,d) in enumerate(allfaculty):
    browser.get(d['Link']) # Load page
    while not isReady(browser):
        time.sleep(1)
    try:
        d["Blurb"] = browser.find_elements_by_xpath('//div[@id="profile-summary-callout"]')[0].text
    except Exception:
        # No summary callout on this profile
        d["Blurb"] = ""
    d["ID"] = i
    allfaculty[i] = d

# Write the node list
with open('gsb_nodes.csv','w') as f:
    dw = csv.DictWriter(f,fieldnames=['ID','Label','Rank','Link','Blurb'])
    dw.writeheader()
    for d in allfaculty:
        for k in d:
            try:
                d[k] = d[k].encode('UTF-8')
            except Exception:
                pass # non-string values such as the integer ID are left as-is
        dw.writerow(d)

from nltk.corpus import stopwords
import nltk, string
porter = nltk.PorterStemmer()
# Rebinds the name 'stopwords' from the corpus reader to a plain list of words
stopwords = nltk.corpus.stopwords.words('english')
stopwords.append('research')
stopwords.append('interest')

# Reduce each blurb to a set of stemmed, non-stopword terms
for i,d in enumerate(allfaculty):
    b = d["Blurb"]
    tokens = nltk.word_tokenize(b.translate(string.maketrans("",""), string.punctuation).lower())
    s = set()
    for t in set(tokens):
        if t not in stopwords:
            s.add(porter.stem(t.lower()))
    allfaculty[i]["blurbset"] = s

# Write one edge per pair of faculty whose term sets overlap
with open('gsb_edges.csv','w') as f:
    dw = csv.DictWriter(f,fieldnames=['Source','Target','Weight'])
    dw.writeheader()
    for i,b in enumerate(allfaculty):
        for bb in allfaculty[i+1:]:
            intersection = b["blurbset"].intersection(bb["blurbset"])
            if len(intersection)>0:
                # Print the real IDs rather than the offset into the slice
                print b["ID"],bb["ID"],len(intersection)
                dw.writerow({'Source': b["ID"], 'Target': bb["ID"], 'Weight': len(intersection)})
    print i
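The edge-weighting idea in the second half of the script can be tried without a browser or NLTK. The following is a minimal sketch, not the script's own method: it assumes a tiny hypothetical stopword list (`STOPWORDS`), skips stemming entirely, and uses made-up sample blurbs. The helper names `term_set` and `edges` are also introduced here for illustration only.

```python
import itertools
import string

# Hypothetical, deliberately small stopword list (the script uses NLTK's English list)
STOPWORDS = {"the", "of", "and", "in", "on", "a", "is", "research", "interest"}

def term_set(blurb):
    """Lowercase, strip ASCII punctuation, and drop stopwords (no stemming)."""
    cleaned = "".join(ch for ch in blurb.lower() if ch not in string.punctuation)
    return {w for w in cleaned.split() if w not in STOPWORDS}

def edges(nodes):
    """Yield (source_id, target_id, weight) for each pair sharing at least one term."""
    sets = [(n["ID"], term_set(n["Blurb"])) for n in nodes]
    for (id_a, set_a), (id_b, set_b) in itertools.combinations(sets, 2):
        shared = set_a & set_b
        if shared:
            yield (id_a, id_b, len(shared))

# Made-up blurbs, used only to exercise the functions
sample = [
    {"ID": 0, "Blurb": "Auction design and the economics of regulation."},
    {"ID": 1, "Blurb": "The economics of regulation; mechanism design."},
    {"ID": 2, "Blurb": "Machine learning in healthcare."},
]
edge_list = list(edges(sample))
print(edge_list)
```

Using `itertools.combinations` visits each unordered pair once, which plays the same role as the `allfaculty[i+1:]` slice in the script. Because the weight is a raw count of shared terms, long blurbs tend to pick up more and heavier edges than short ones.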
ID | Label | Rank | Link | Blurb | |
---|---|---|---|---|---|
0 | Aaker, Jennifer, | Professor | http://www.gsb.stanford.edu/users/jaaker | A social psychologist and marketer, Jennifer Aaker is the General Atlantic Professor of Marketing at Stanford University’s Graduate School of Business.Her research spans time, money and happiness. She focuses on questions such as: What actually makes people happy, as opposed to what they think makes them happy? How can small acts create infectious action, and how can such effects be fueled by social media? She is widely published in the leading scholarly journals in psychology and marketing, and her work has been featured in a variety of media including The Economist, The New York Times, Wall Street Journal, Washington Post, BusinessWeek, Forbes, CBS Money Watch, NPR, Science, Inc, and Cosmopolitan. Aaker teaches in many of Stanford’s Executive Education programs as well as MBA electives including Designing Happiness, How to Tell a Story, as well as Brands, Experience & Social Technology (BEST). Recipient of the Distinguished Teaching Award, Citibank Best Teacher Award, George Robbins Best Teacher Award and both the Spence and Fletcher Jones Faculty Scholar Awards, she has also taught at UC Berkeley, UCLA and Columbia. Most recently she has co-authored the award winning book, The Dragonfly Effect: Quick Effective Powerful Ways to Harness Social Media for Impact | |
1 | Abbey, Douglas, | Lecturer | http://www.gsb.stanford.edu/users/dabbey | Doug Abbey is actively involved in commercial and residential real estate investment and development throughout the U.S. He is a leader in a number of nonprofit organizations related to affordable housing and land use issues. He works with GSB students to evaluate career opportunities in real estate and to expose them to research and educational opportunities in the field. While the focus of his course is real estate investment, students are introduced to broader issues of how land use decisions are created through a combination of market forces, demographics, and regulation, and how resulting land use patterns impact housing affordability and integration or isolation of households by income level. | |
2 | Abrahams, Matt, | Lecturer | http://www.gsb.stanford.edu/users/mattabra | Matt received his undergraduate degree in psychology from Stanford; his graduate degree in communication studies from UC Davis; and his secondary education teaching credential from SFSU. He is currently a member of the Management Communication Association (where he recently received a "Rising Star" award) as well as the National and Western States Communication Associations. | |
3 | Admati, Anat, | Professor | http://www.gsb.stanford.edu/users/admati | Anat Admati is the George G.C. Parker Professor of Finance and Economics at the Graduate School of Business, Stanford University. She has written extensively on information dissemination in financial markets, trading mechanisms, portfolio management, financial contracting, and, most recently, on corporate governance and banking. Since 2010, she has been active in the policy debate on financial regulation, particularly capital regulation, writing research and policy papers and commentary. She is a coauthor of the book, “The Bankers’ New Clothes: What’s Wrong with Banking and What to Do about It". http://bankersnewclothes.com/ Professor Admati received her BS from the Hebrew University in Jerusalem and her MA, MPhil and PhD from Yale University. She is the recipient of a Sloan Research Fellowship, a Batterymarch Fellowship, and multiple research grants. She is a fellow of the Econometric Society, and has served as a board member of the American Finance Association and on multiple editorial boards. She also serves on the FDIC Systemic Resolution Advisory Committee. | |
4 | Allen, Dick, | Lecturer | http://www.gsb.stanford.edu/users/dpallen | I teach "Managing Growing Enterprises", a course offered to students who aspire to manage their own business. The class uses a case discussion format, and the “star” of each case is present to offer feedback and insights. My objective is to get students to think about business situations they will face. The issues are not strategic; they are, instead, the common day-to-day problems that occupy much of a manager’s time. We role-play with some frequency, using this technique to put students in the CEO's shoes. I enjoy contact with the students, and I encourage them to meet with me. This enables me to get to know students individually and allows them the opportunity to discuss issues of interest to them. | |
5 | Arrillaga, Laura, | Lecturer | http://www.gsb.stanford.edu/users/lauraa | Laura Arrillaga-Andreessen is the Founder, Chairman Emeritus and former Chairman (1998-2008) of SV2 (Silicon Valley Social Venture Fund) and the Founder and Chairman of Stanford Center on Philanthropy and Civil Society. Her past research includes both institutional and individual philanthropy, corporate and venture philanthropy, global social investing and cross-sector collaboration, giving circles and community foundations. She is currently focusing on the intersection of technology, innovation and philanthropy. | |
6 | Athey, Susan, | Professor | http://www.gsb.stanford.edu/users/athey | Susan Athey’s research is in the areas of industrial organization, microeconomic theory, and applied econometrics. Her current research focuses on the design of auction-based marketplaces and the economics of the internet, primarily on online advertising and the economics of the news media. She has also studied dynamic mechanisms and games with incomplete information, comparative statics under uncertainty, and econometric methods for analyzing auction models. | |
7 | Aubry, Rick, | Lecturer | http://www.gsb.stanford.edu/users/raubry | Rick Aubry has led one of America’s leading social enterprises, Rubicon Programs, for over 20 years. His work at the GSB focuses on social entrepreneurship to effect positive social change throughout the world. His class focuses on social entrepreneurs creating change in the most challenging communities internationally and the U.S. Social entrepreneurs as guest lecturers are the core classroom experience. Students work directly, in class and on projects, with the world's leading practitioners, learning from these amazing world changers, inspiring unexpected action and thinking amongst students who might never have imagined they could be a part of changing the world. | |
8 | Bannick, Mathew, | Lecturer | http://www.gsb.stanford.edu/users/mbannick | Mr. Matthew J. Bannick is a Managing Partner and Director at Omidyar Network. Mr. Bannick leads all aspects of Omidyar's operations and strategy and works closely with the co-founders and Board of Directors to ensure that Omidyar Network achieves its long-term mission and strategic objectives. He has been the President at eBay International of Ebay Inc. since December 2004. Mr. Bannick has a wide range of executive, international, and multi-sector experience to his leadership at Omidyar Network. He was a member of executive staff and served in a series of senior executive roles at eBay from 1999 to 2007. Among Mr. Bannick’s most significant accomplishments was leading eBay's early international expansion efforts. Mr. Bannick served as eBay Inc.'s General Manager of Global Online Payments. He also served as the Senior Vice President there from October 2002 until December 2004. From December 2000 to October 2002, Mr. Bannick served as eBay's Senior Vice President and the General Manager at eBay International. He also enhanced eBay's position in Europe through the acquisition of Paris-based iBazar and made additional acquisitions and investments in South Korea, China, Australia, and Latin America, laying the foundation for its international success. After eBay acquired PayPal in 2002, Mr. Bannick served as its first post-acquisition General Manager and President. He was also the Chief Executive Officer there. Under Mr. Bannick’s leadership, PayPal's revenue more than tripled in its first two years with eBay. | |
9 | Barnett, William, | Professor | http://www.gsb.stanford.edu/users/fbarnett | William Barnett studies competition among organizations and how organizations and industries evolve globally. He is conducting a large-scale project that seeks to explain why and how some firms grow rapidly in globalizing markets. His prior research includes studies of how strategic differences and strategic change among organizations affect their growth, performance, and survival. This research includes empirical studies of technical, regulatory, and ideological changes among organizations, and how these changes affect competitiveness over time and across markets. His studies span a range of industries and contexts, including organizations in computers, telecommunications, research and development, software, semiconductors, disk drives, newspaper publishing, beer brewing, banking, and concerning the environment. He is best known for his work on "Red Queen competition", where firms learn from competition and so become stronger competitors over time. Follow Professor Barnett on Twitter @BarnettTalks. | |
10 | Baron, David, | Professor Emeritus | http://www.gsb.stanford.edu/users/dbaron | David Baron has published in the fields of industrial organization, economic theory, political science, business strategy, operations research, statistics, and finance. He has authored over 100 articles and 3 books, one of which is in its 5th edition. His principal research interests have been the theory of the firm, the economics of regulation, mechanism design and its applications, political economics, and nonmarket strategy. His current research focuses on political economics and strategy in the business environment. | |
11 | Barth, Mary, | Professor | http://www.gsb.stanford.edu/users/fbarth | Professor Mary Barth’s research focuses on financial accounting and reporting issues, particularly topics of interest to accounting standard setters. Such topics include using fair values in financial reporting, stock-based compensation, recognition versus disclosure, asset securitizations, asset revaluations, the information roles of accruals and cash flows, the relation between financial statement quality and cost of capital, and issues related to global financial reporting and convergence. | |
12 | Bayati, Mohsen, | Assistant Professor | http://www.gsb.stanford.edu/users/bayati | Mohsen Bayati has two main research interests: machine learning and statistical models for large data, and their applications in healthcare. In particular, he designs methods based on graphical models, probability theory, and statistical physics, and applies them in data-driven healthcare (predictive models, optimization, and decisions). | |
13 | Beaver, William, | Professor Emeritus | http://www.gsb.stanford.edu/users/fbeaver |