Dario Taraborelli dartar

Visualizing the topic and accessibility of scholarly articles cited in Wikipedia

Building on a previously released dataset of citations with identifiers across all Wikipedia language editions, we explore how the DOIs cited in Wikipedia are distributed by topic and by accessibility.

Topic

We assign a topic to each publication by looking at the main topic(s) of the Wikipedia articles that cite it. Topics are determined by...
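The assignment rule can be sketched in Python as a tally of the citing articles' topics per DOI. This is a minimal sketch, not the method used for the dataset. It assumes citations arrive as (article title, DOI) pairs and that each article's main topics are already known as a list. Both input formats and the helper names `assign_topics` and `main_topic` are hypothetical, and how topics are determined is left open, as in the text.

```python
from collections import Counter, defaultdict


def assign_topics(citations, article_topics):
    """Map each cited DOI to a Counter of the main topics of the articles citing it.

    citations: iterable of (article_title, doi) pairs (hypothetical input format)
    article_topics: dict mapping article_title -> list of main topics (hypothetical)
    """
    doi_topics = defaultdict(Counter)
    for article, doi in citations:
        # DOIs are case-insensitive, so normalize them before using them as keys
        key = doi.strip().lower()
        for topic in article_topics.get(article, []):
            doi_topics[key][topic] += 1
    return dict(doi_topics)


def main_topic(topic_counts):
    """Return the topic contributed by the most citing articles, or None if there is none."""
    if not topic_counts:
        return None
    # most_common() keeps first-seen order among equal counts, so ties go to the earliest topic
    return topic_counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Illustrative inputs only; the DOIs use the 10.1000 example prefix
    citations = [
        ("Photosynthesis", "10.1000/ABC"),
        ("Chlorophyll", "10.1000/abc"),
        ("Black hole", "10.1000/xyz"),
    ]
    topics = {
        "Photosynthesis": ["Biology", "Chemistry"],
        "Chlorophyll": ["Biology"],
        "Black hole": ["Physics"],
    }
    by_doi = assign_topics(citations, topics)
    for doi, counts in by_doi.items():
        print(doi, main_topic(counts))
```

Counting topics across all citing articles, rather than keeping only the first one seen, lets a DOI cited from several articles fall under the topic most of them share. Ties fall back to insertion order, which is an arbitrary choice here.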

Accessibility

We determine the accessibility of each publication (Open Access vs Closed Access) by looking up the DOI in data provided by Unpaywall.
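A minimal sketch of the lookup and the resulting Open Access vs. Closed Access tally follows. It assumes per-DOI records shaped like responses from Unpaywall's REST API (v2), which take the DOI in the URL path with an `email` query parameter and return JSON that includes a boolean `is_oa` field. Treating a DOI with no Unpaywall record as Closed Access is an assumption, and so are the helper names. The analysis may have handled unmatched DOIs differently.

```python
from collections import Counter
from urllib.parse import urlencode

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build the Unpaywall v2 lookup URL for one DOI (the API asks callers for an email)."""
    return UNPAYWALL_API + doi + "?" + urlencode({"email": email})


def access_status(record):
    """Classify one parsed Unpaywall record.

    record: dict parsed from Unpaywall's JSON, or None when the DOI was not found.
    Assumption: a missing record or a missing/false "is_oa" counts as Closed Access.
    """
    if record and record.get("is_oa"):
        return "Open Access"
    return "Closed Access"


def access_shares(records):
    """Return the fraction of records in each access class (empty dict for no records)."""
    counts = Counter(access_status(r) for r in records)
    total = sum(counts.values())
    if not total:
        return {}
    return {status: n / total for status, n in counts.items()}


if __name__ == "__main__":
    print(unpaywall_url("10.1038/nature12373", "me@example.org"))
    # Illustrative records only, not real Unpaywall results
    sample = [{"is_oa": True}, {"is_oa": False}, None, {"is_oa": True}]
    print(access_shares(sample))
```

Fetching each URL (for example with `urllib.request.urlopen`) and passing the parsed JSON to `access_shares` yields the Open Access share. For large DOI lists, Unpaywall's downloadable data snapshot avoids issuing one request per DOI.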

	#! /usr/bin/env python
	"""
	This module provides classes for querying Google Scholar and parsing
	returned results. It currently only processes the first results
	page. It is not a recursive crawler.
	"""
	# Version: 1.3 -- $Date: 2012-02-01 16:51:16 -0800 (Wed, 01 Feb 2012) $
	#
	# ChangeLog
	# ---------

	SELECT
	LEFT(timestamp,7) AS month,
	SUM(CASE WHEN country NOT IN ("US", "Inv") THEN pageviews END) AS ex_us_total,
	SUM(CASE WHEN is_spider = 0 AND `is_automata` = 0 AND country NOT IN ("US", "Inv") THEN pageviews END) AS ex_us_human,
	SUM(CASE WHEN is_spider = 0 AND `is_automata` = 0 AND country NOT IN ("US", "Inv") AND `access_method` = "Desktop" THEN pageviews END) AS ex_us_human_desktop,
	SUM(CASE WHEN is_spider = 0 AND `is_automata` = 0 AND country NOT IN ("US", "Inv") AND `access_method` = "Mobile web" THEN pageviews END) AS ex_us_human_mobile,
	SUM(CASE WHEN is_spider = 0 AND `is_automata` = 0 AND country NOT IN ("US", "Inv") AND `refering_site` = "Google" THEN pageviews END) AS ex_us_human_google,
	SUM(CASE WHEN is_spider = 0 AND `is_automata` = 0 AND country NOT IN ("US", "Inv") AND `access_method` = "Desktop" AND `refering_site` = "Google" THEN pageviews END) AS ex_us_human_desktop_google,
	SUM(CASE WHEN is_spider = 0 AND `is_automata` = 0 AND country NOT IN ("US", "Inv") AND `access_meth

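	#!/usr/bin/env python3
	# Read DOIs from stdin (one per line) and open a connection to the DOI resolver for each.
	# http.client is the Python 3 name of the httplib module.
	import http.client
	import sys

	base_url = "dx.doi.org"
	for line in sys.stdin:
	    doi = line.strip()
	    if not doi:
	        continue  # skip blank lines
	    # The resolver takes the DOI as the URL path: dx.doi.org/<DOI>
	    url = base_url + "/" + doi
	    conn = http.client.HTTPConnection(base_url)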

	The use of content generated by ChatGPT in Wikipedia, Google search results, and for training a new language model could have negative consequences. In the case of Wikipedia, the use of machine-generated content could lead to the inclusion of false or misleading information on the site, undermining its credibility and reliability. Additionally, the use of machine-generated content could make it more difficult for human editors to verify the accuracy and reliability of the information on the site, potentially leading to a decline in the overall quality of Wikipedia's content.

	Similarly, the use of content generated by ChatGPT in Google search results could lead to the inclusion of false or misleading information in search results, undermining the credibility and reliability of the search engine. Additionally, the use of machine-generated content could make it more difficult for users to verify the accuracy and reliability of the information they find through Google, potentially leading to a decline in the ove