jaimeiniesta · February 4, 2012 10:47
diff --git a/01_w3clove_restful_api_draft.txt b/01_w3clove_restful_api_draft.txt
 This is a draft of the upcoming W3Clove RESTful API.

 For now, it allows you to submit a sitemap or webpage URL for validation, view the results, and request a re-check later.

 It does not support user authentication yet, so you cannot manage your list of sitemaps the way you can on the web site.

 URI params will be passed URL-encoded; here they appear without encoding for legibility.
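A minimal Python sketch of that encoding, using only the standard library's `urllib.parse.urlencode`; the helper name `encode_params` is made up for illustration:

```python
from urllib.parse import urlencode


def encode_params(params: dict) -> str:
    """Encode params as an application/x-www-form-urlencoded string (hypothetical helper)."""
    # urlencode percent-encodes reserved characters such as ':' and '/'
    return urlencode(params)


print(encode_params({"url": "http://example.com/sitemap.xml"}))
# prints: url=http%3A%2F%2Fexample.com%2Fsitemap.xml
```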

 Single entry point
 ==================

 Shows the entry points for sitemap and webpage submissions:

 POST /api/sitemaps
 POST /api/web_pages

 Sitemap submission
 ==================

 # POST /api/sitemaps
 # params:
  * url=http://example.com/sitemap.xml

 Creates the sitemap and returns the URL where the resource can be retrieved.
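As a sketch only, the Python below builds a submission request with `urllib.request` without sending it. It assumes the `url` param travels in a form-encoded POST body, and `API_ROOT` is a hypothetical placeholder host, since this draft only fixes the `/api/...` paths.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; the draft only specifies the /api/... paths.
API_ROOT = "http://w3clove.example"


def sitemap_submission(sitemap_url: str) -> Request:
    """Build a POST /api/sitemaps request for the given sitemap URL."""
    body = urlencode({"url": sitemap_url}).encode("ascii")
    return Request(
        API_ROOT + "/api/sitemaps",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = sitemap_submission("http://example.com/sitemap.xml")
print(req.get_method(), req.full_url)
# prints: POST http://w3clove.example/api/sitemaps
```

Sending it would be a matter of `urllib.request.urlopen(req)`. The draft does not yet say whether the new resource's URL comes back in the response body or in a `Location` header.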

 Sitemap data
 ============

 # GET /api/sitemaps?url=http://example.com/sitemap.xml
 Returns the sitemap data (described in 02_sitemap_data.txt).
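Looking a sitemap up by its own URL means the `url` query param has to be URL-encoded. A small sketch, again assuming the hypothetical `API_ROOT` host:

```python
from urllib.parse import urlencode

# Hypothetical host; the draft only specifies the /api/... paths.
API_ROOT = "http://w3clove.example"


def sitemap_data_url(sitemap_url: str) -> str:
    """Return the GET /api/sitemaps URL that looks a sitemap up by its URL."""
    return API_ROOT + "/api/sitemaps?" + urlencode({"url": sitemap_url})


print(sitemap_data_url("http://example.com/sitemap.xml"))
# prints: http://w3clove.example/api/sitemaps?url=http%3A%2F%2Fexample.com%2Fsitemap.xml
```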

 Sitemap rescraping
 ==================

 # POST /api/sitemaps
 # params:
  * url=http://example.com/sitemap.xml
  * reprocess=true

 Asks for the sitemap to be re-scraped, resetting its state so it will be scraped again.
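A re-scrape is the same POST as a submission plus `reprocess=true`. This sketch assumes a form-encoded body and the hypothetical `API_ROOT` host. Sending the literal string `"true"` is also an assumption about how the API parses booleans.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; the draft only specifies the /api/... paths.
API_ROOT = "http://w3clove.example"


def sitemap_rescrape(sitemap_url: str) -> Request:
    """Build a POST /api/sitemaps request asking for the sitemap to be re-scraped."""
    # reprocess=true is what distinguishes a re-scrape from a first submission
    body = urlencode({"url": sitemap_url, "reprocess": "true"}).encode("ascii")
    # urllib adds a form-encoded Content-Type header itself when the request is sent
    return Request(API_ROOT + "/api/sitemaps", data=body, method="POST")
```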

 Webpage submission
 ==================

 # POST /api/web_pages
 # params:
  * url=http://example.com

 Creates the webpage and returns the URL where the resource can be retrieved.
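Webpage submission mirrors sitemap submission, so one hypothetical helper, `submission`, can cover both collections. This sketch assumes form-encoded POST bodies and the placeholder `API_ROOT` host:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; the draft only specifies the /api/... paths.
API_ROOT = "http://w3clove.example"
COLLECTIONS = {"sitemaps", "web_pages"}


def submission(collection: str, resource_url: str, reprocess: bool = False) -> Request:
    """Build a POST /api/<collection> request (hypothetical helper)."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    params = {"url": resource_url}
    if reprocess:
        params["reprocess"] = "true"
    body = urlencode(params).encode("ascii")
    return Request(f"{API_ROOT}/api/{collection}", data=body, method="POST")


req = submission("web_pages", "http://example.com")
```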

 Webpage data
 ============

 # GET /api/web_pages?url=http://example.com
 Returns the webpage data (described in 03_web_page_data.txt).
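This is a sketch of a client fetching that data. The draft does not specify a response format, so the code assumes a JSON object, and the host is again the hypothetical `API_ROOT`. Only the URL builder runs offline; `fetch_web_page` needs a live server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical host; the draft only specifies the /api/... paths.
API_ROOT = "http://w3clove.example"


def web_page_data_url(page_url: str) -> str:
    """Return the GET /api/web_pages URL that looks a webpage up by its URL."""
    return API_ROOT + "/api/web_pages?" + urlencode({"url": page_url})


def fetch_web_page(page_url: str) -> dict:
    """GET the webpage data, assuming the API answers with a JSON object."""
    with urlopen(web_page_data_url(page_url)) as response:
        return json.load(response)
```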

 Webpage revalidation
 ====================

 # POST /api/web_pages
 # params:
  * url=http://example.com
  * reprocess=true

 Asks for a re-validation of the webpage, resetting its state so it will be validated again.
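Because a re-validation resets the page's state, a client will probably poll the data endpoint until validation finishes. This sketch assumes the data arrives as a dict with the `status` field listed in 03_web_page_data.txt, and it treats `validating` as the only in-progress status (an assumption). It takes the fetch step as a parameter, for example a wrapper around GET /api/web_pages, so it can run without a server. The interval and attempt limit are arbitrary.

```python
import time
from typing import Callable

# Assumption: "validating" is the only in-progress status; these two are final.
FINAL_STATUSES = {"validated", "validation_failed"}


def wait_for_validation(fetch: Callable[[], dict], interval: float = 5.0,
                        max_attempts: int = 60) -> dict:
    """Call fetch() until the webpage leaves the 'validating' status.

    fetch is any callable returning the decoded webpage data as a dict.
    """
    for attempt in range(max_attempts):
        page = fetch()
        if page["status"] in FINAL_STATUSES:
            return page
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait only between attempts
    raise TimeoutError(f"webpage still validating after {max_attempts} attempts")
```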
diff --git a/02_sitemap_data.txt b/02_sitemap_data.txt
 A sitemap exposes the following information:

 * url, text, like: "http://example.com/sitemap.xml"
 * status, string, can be one of:
  - scraping            # sitemap has been created in the database and is in the scraping queue
  - scraping_failed     # scraping could not be completed
  - validating          # some webpages of this sitemap are pending validation
  - validated_partially # sitemap validation has finished but some of its webpages could not be validated
  - validated           # sitemap validation has finished and all its webpages could be validated
 * web_pages_count, integer
 * web_pages, array of its scraped web_pages with basic info and links to them
 * web_pages_pending_validation_count, integer
 * validation_errors_count, integer, sum of all validation errors of its web_pages
 * validation_warnings_count, integer, sum of all validation warnings of its web_pages
 * validation_errors, array of errors found for all its web_pages. Each entry in the array will contain:
  - message_id, string, identifies the error type
  - text, string, explains the error
  - times, integer, how many times this error is found on the scraped web_pages of this sitemap
 * validation_warnings, array similar to validation_errors but referring to warnings reported in the validation
 * created_at, datetime
 * updated_at, datetime
 * scraped_at, datetime

 There should also be a way to get the web_pages affected by each particular error or warning.
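For illustration, a client could use these fields to tell whether a sitemap is still being processed and which errors repeat most. This Python sketch assumes the data arrives as a dict keyed by the field names above. The sample `message_id` values and texts are invented.

```python
from typing import List, Tuple

# Statuses after which no more scraping or validation is pending.
FINISHED = {"scraping_failed", "validated_partially", "validated"}


def is_finished(sitemap: dict) -> bool:
    """True once the sitemap is no longer scraping or validating."""
    return sitemap["status"] in FINISHED


def most_frequent_errors(sitemap: dict, limit: int = 3) -> List[Tuple[str, int]]:
    """Return (message_id, times) pairs for the most repeated errors."""
    errors = sorted(sitemap["validation_errors"], key=lambda e: e["times"], reverse=True)
    return [(e["message_id"], e["times"]) for e in errors[:limit]]


# Sample data; message ids and texts are made up.
sample = {
    "status": "validating",
    "validation_errors": [
        {"message_id": "a", "text": "first error", "times": 2},
        {"message_id": "b", "text": "second error", "times": 7},
    ],
}
print(is_finished(sample), most_frequent_errors(sample))
# prints: False [('b', 7), ('a', 2)]
```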
diff --git a/03_web_page_data.txt b/03_web_page_data.txt
 A web_page exposes the following information:

 * url, text, like: "http://example.com"
 * status, string, can be one of:
  - validating
  - validation_failed
  - validated
 * validation_errors_count, integer
 * validation_warnings_count, integer
 * validation_errors, array of errors reported in the validation. Each entry in the array will contain:
  - message_id, string, identifies the error type
  - text, string, explains the error
  - times, integer, how many times this error is found on the web_page
  - lines, array of integers, line numbers where this error is found on the web_page
 * validation_warnings, array similar to validation_errors but referring to warnings reported in the validation
 * created_at, datetime
 * updated_at, datetime
 * validated_at, datetime
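Because each error carries its `lines`, a client can turn a webpage's data into a readable report. This sketch assumes the data is a dict keyed by the field names above. The sample error is invented; real values are whatever the API reports.

```python
from typing import List


def error_report(web_page: dict) -> List[str]:
    """Format each validation error as 'message_id: text (N times, lines ...)'."""
    report = []
    for error in web_page["validation_errors"]:
        lines = ", ".join(str(n) for n in error["lines"])
        report.append(f'{error["message_id"]}: {error["text"]} ({error["times"]} times, lines {lines})')
    return report


# Made-up sample data.
page = {
    "validation_errors": [
        {"message_id": "x1", "text": "sample error", "times": 2, "lines": [10, 42]},
    ],
}
for line in error_report(page):
    print(line)
# prints: x1: sample error (2 times, lines 10, 42)
```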