rossmounce · August 29, 2015 14:12
diff --git a/reply1.txt b/reply1.txt
 Thanks for your feedback Rod. I really value it.

 I don't pretend to have all the answers. All of the academic content discovery 
 services are fairly murky about how they actually index things, 
 as I'm sure you know (Google Scholar perhaps being the most open-ish about how it does things?).

 > how comparable are PLoS and Zootaxa from the perspective of search engines?

 I am not a search engine. I am a human researcher. Whether a paper is 
 published in Nature, Science, PLOS ONE or Zootaxa, it is the same to me - 
 this is a logical and defensible position. I get what you're asking but as 
 I've never had a job at a search engine I'm afraid I don't have much insight 
 there.

 > you used a complete set of Zootaxa PDFs obtained from the NHM? 

 Yes, that information is in the paper. Metadata about those PDFs is in the 
 supplementary materials on figshare. As you know, I cannot easily 'prove' I 
 had the full set of PDFs because copyright restrictions do not allow me to 
 repost the entire dataset publicly online. This would infringe the copyright 
 of Magnolia Press. I can however repost the entire set of PLOS ONE articles 
 analysed as they were all published under CC BY or CC0.

 > articles that are both open access and behind a paywall?

 Yes. This is acknowledged in the paper. Regardless of whether a paper is open 
 access to the general public, it could still be privately indexed by content 
 search providers, and that private full-text index made available during 
 search. Discoverability is not access. Paywalls can be made semi-permeable, 
 letting requests from known IP addresses through (e.g. Google Scholar's 
 indexing crawlers and bots) whilst denying access to non-subscribers at 
 other IP addresses.
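
 To make the "semi-permeable" idea concrete, here is a minimal Python sketch. 
 It is an assumption about what such a rule could look like, not a description 
 of how Magnolia Press, Google Scholar or any other service actually works. The 
 function name and the allowlisted network are hypothetical; the addresses come 
 from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical allowlist of crawler networks. 192.0.2.0/24 is a range
# reserved for documentation, not a real crawler address block.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def may_read_full_text(client_ip: str, is_subscriber: bool) -> bool:
    """Return True if the full text should be served for this request."""
    address = ipaddress.ip_address(client_ip)
    # A request counts as a known crawler if its address falls inside
    # any allowlisted network.
    is_known_crawler = any(address in network for network in CRAWLER_NETWORKS)
    # Subscribers and allowlisted crawlers get through; everyone else
    # is stopped at the paywall.
    return is_subscriber or is_known_crawler
```

 Under this sketch a crawler at an allowlisted address can index the full text 
 while a non-subscriber elsewhere cannot read it, which is the sense in which 
 discoverability is not access.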

 > Perhaps a better question is how the open access subset of Zootaxa compares to PLoS?

 I'm sorry if I didn't make the hypothesis I was testing clear. I want to 
 test the discoverability of articles (regardless of OA or not). Yes, it does 
 seem reasonable to pre-suppose that open access articles might be advantaged, 
 but until we prove that with data I can't just make that assumption. If you 
 know of any other research that demonstrates superiority of discoverability 
 of OA research (not citation, views, downloads) then please let me know; I 
 should cite it in this paper.

 > confounding different media (PDF versus HTML) with different degrees of access?

 I agree. This could certainly be one of the causative mechanisms of the 
 observed low recall of Zootaxa in Google Scholar. The point is, the observed 
 effect (poor discoverability in Google Scholar) is real regardless of the 
 cause [You're welcome to dispute the data given in the tables, but since I 
 did the searches only a few days ago I doubt the results have changed]. If 
 the cause is that Zootaxa does not provide HTML, then the obvious solution is 
 that Zootaxa should provide HTML full-text. Or just accept low 
 discoverability in Google Scholar :S
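
 For readers unfamiliar with the term, "recall" here means the fraction of 
 articles known to be relevant that a search service actually returns. A 
 minimal Python sketch, with made-up article identifiers rather than data from 
 the paper, and a hypothetical function name:

```python
def recall(relevant_ids: set, retrieved_ids: set) -> float:
    """Fraction of the known-relevant articles that the search returned."""
    if not relevant_ids:
        raise ValueError("recall is undefined when no articles are relevant")
    # Only hits that are genuinely relevant count towards recall;
    # irrelevant hits are ignored here (they would affect precision).
    return len(relevant_ids & retrieved_ids) / len(relevant_ids)


# Illustrative identifiers only: four articles are known (from the full
# text) to contain the search term, and the search engine returns two of them.
known_relevant = {"a1", "a2", "a3", "a4"}
search_hits = {"a2", "a4", "b9"}
print(recall(known_relevant, search_hits))  # 0.5
```

 A low value means many articles that contain the search term in their full 
 text never appear in the results, whatever the underlying cause.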

 > Did you talk to Zhi-Qiang Zhang (editor of Zootaxa)?

 Yes. I emailed him this morning.

 I'm very pleased Magnolia Press have recently adopted DOIs, are moving to the 
 OJS platform, and have adopted the CC BY licence for hybrid open access 
 articles. These are all good moves towards better publishing. Given the 
 results here, perhaps they should also look at providing full text HTML or 
 XML, to continue their progress. They are an extremely important publisher of 
 taxonomy.

 > You are making various statements about how you think search engines access 
 content, it would be interesting to actually know.

 I agree, and also feel uncomfortable about the lack of evidence but services 
 like Scopus, WoK, MAS, MS *are* non-transparent, proprietary, opaque systems. I 
 can't really change that. I certainly see that as a problem. Academia sorely 
 needs an open, transparent system of indexing peer-reviewed published content.

 > ...there is a world of difference...

 Yes. I agree there is a vast difference in funding between fields. I'm not 
 entirely sure that difference prevents Magnolia Press from publishing full 
 text HTML on their OJS platform. Other, similar "shoe-string" (your words not 
 mine!) operations also produce full text HTML on OJS, albeit not quite at 
 the scale of Zootaxa & Phytotaxa. But surely this research could be used as 
 evidence to ask for more funding? Here is objective evidence showing that 
 more money is needed to do more useful taxonomic publishing to maximize 
 return on investment. (?)

 Prior to this research I was not aware of anything (aside from cited papers 
 on OA citation, downloads, views advantage) that proves with real data that 
 publishing in PLOS ONE provides excellent discoverability of research (in 
 Google Scholar), substantially better than at other journals. That's why I've 
 published this. I think people need to know about this. I think it's 
 important. Incidentally this paper doesn't directly test whether 
 discoverability has anything to do with OA. That needs follow-up work to 
 demonstrate. 

 This is merely a first-pass demonstration that born-digital journal content 
 can have substantially different discoverability in academic search engines, 
 depending on where it's published (Making a conscious effort here not to 
 overstate what I've done).