Created
February 4, 2012 10:47
W3Clove RESTful API draft
| This is a draft of the upcoming W3Clove RESTful API. | |
| For now it lets you submit a sitemap or webpage URL for validation, see the results, and ask for re-checking later. | |
| It doesn't yet support user authentication, so you can't manage your list of sitemaps as you can on the web site. | |
| URI params will be passed URL-encoded; here they appear without encoding for legibility. | |
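As an illustration of the URL-encoding note above, here is a minimal Python sketch that builds the GET URL for a sitemap. The host name in BASE_URL is a placeholder (the draft does not name one), and the helper name sitemap_data_url is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host: the draft does not say where the API will live.
BASE_URL = "https://w3clove.example"


def sitemap_data_url(sitemap_url: str) -> str:
    """Return the GET URL for a sitemap, with the url param URL-encoded."""
    query = urlencode({"url": sitemap_url})
    return f"{BASE_URL}/api/sitemaps?{query}"


print(sitemap_data_url("http://example.com/sitemap.xml"))
# prints https://w3clove.example/api/sitemaps?url=http%3A%2F%2Fexample.com%2Fsitemap.xml
```

The unencoded form used in this draft, /api/sitemaps?url=http://example.com/sitemap.xml, is only a reading aid; the encoded form is what would go over the wire.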
| Single entry point | |
| ================== | |
| Shows the entry points for sitemap and webpage submissions: | |
| POST /api/sitemaps | |
| POST /api/web_pages | |
| Sitemap submission | |
| ================== | |
| # POST /api/sitemaps | |
| # params: | |
| * url=http://example.com/sitemap.xml | |
| Creates the sitemap and returns the URL where you can get the resource | |
| Sitemap data | |
| ============ | |
| # GET /api/sitemaps?url=http://example.com/sitemap.xml | |
| Returns the sitemap data | |
| Sitemap rescraping | |
| ================== | |
| # POST /api/sitemaps | |
| # params: | |
| * url=http://example.com/sitemap.xml | |
| * reprocess=true | |
| Asks for re-scraping of the sitemap by resetting its state | |
| Webpage submission | |
| ================== | |
| # POST /api/web_pages | |
| # params: | |
| * url=http://example.com | |
| Creates the webpage and returns the URL where you can get the resource | |
| Webpage data | |
| ============ | |
| # GET /api/web_pages?url=http://example.com | |
| Returns the web_page data | |
| Webpage revalidation | |
| ==================== | |
| # POST /api/web_pages | |
| # params: | |
| * url=http://example.com | |
| * reprocess=true | |
| Asks for re-validation of the webpage, resetting its state so it will be re-validated | |
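The submission and reprocessing requests above share one shape, so they could be sketched with a single helper. This is only a sketch under assumptions: the host in BASE_URL is a placeholder, build_submission is a hypothetical name, and the request is built but not sent, since the draft does not describe the response format.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host: the draft does not say where the API will live.
BASE_URL = "https://w3clove.example"

# The two resources named in the draft.
RESOURCES = ("sitemaps", "web_pages")


def build_submission(resource: str, target_url: str, reprocess: bool = False) -> Request:
    """Build (but do not send) a POST to /api/sitemaps or /api/web_pages."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    params = {"url": target_url}
    if reprocess:
        # Same endpoint, plus reprocess=true, asks for re-scraping/re-validation.
        params["reprocess"] = "true"
    body = urlencode(params).encode("ascii")
    return Request(f"{BASE_URL}/api/{resource}", data=body, method="POST")


req = build_submission("web_pages", "http://example.com", reprocess=True)
print(req.get_method(), req.full_url, req.data.decode("ascii"))
```

Passing the Request to urllib.request.urlopen would perform the call; that step is left out here because the real host and response body are not specified.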
| A sitemap can show this information: | |
| * url, text, like: "http://example.com/sitemap.xml" | |
| * status, string, can be one of: | |
| - scraping # sitemap has been created in the database and is in the scraping queue | |
| - scraping_failed # scraping could not be completed | |
| - validating # some webpages of this sitemap are pending validation | |
| - validated_partially # sitemap validation has finished but some of its webpages could not be validated | |
| - validated # sitemap validation has finished and all its webpages could be validated | |
| * web_pages_count, integer | |
| * web_pages, array of its scraped web_pages with basic info and links to them | |
| * web_pages_pending_validation_count, integer | |
| * validation_errors_count, integer, sum of all validation errors of its web_pages | |
| * validation_warnings_count, integer, sum of all validation warnings of its web_pages | |
| * validation_errors, array of errors found for all its web_pages. Each entry in the array will contain: | |
| - message_id, string, identifies the error type | |
| - text, string, explains the error | |
| - times, integer, how many times this error is found on the scraped web_pages of this sitemap | |
| * validation_warnings, array similar to validation_errors but referring to warnings reported in the validation | |
| * created_at, datetime | |
| * updated_at, datetime | |
| * scraped_at, datetime | |
| There should also be a way to get the web_pages that have each particular error and warning. |
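A client would likely branch on the status field listed above. The sketch below assumes the sitemap data arrives as JSON with those field names (the draft does not fix a serialization), and it treats scraping_failed as a finished state, which is one reading of the status list. The sample payload is made up for illustration.

```python
import json

# Grouping of the statuses from the list above; treating scraping_failed
# as "finished" is an assumption, not something the draft states.
IN_PROGRESS = {"scraping", "validating"}
FINISHED = {"scraping_failed", "validated_partially", "validated"}


def summarize_sitemap(payload: str) -> str:
    """Turn a sitemap payload (assumed to be JSON) into a one-line summary."""
    sitemap = json.loads(payload)
    url, status = sitemap["url"], sitemap["status"]
    if status in IN_PROGRESS:
        pending = sitemap.get("web_pages_pending_validation_count", 0)
        total = sitemap.get("web_pages_count", 0)
        return f"{url}: {status}, {pending} of {total} page(s) pending validation"
    if status in FINISHED:
        errors = sitemap.get("validation_errors_count", 0)
        warnings = sitemap.get("validation_warnings_count", 0)
        return f"{url}: {status}, {errors} error(s), {warnings} warning(s)"
    raise ValueError(f"unknown sitemap status: {status!r}")


# Made-up sample data, only to exercise the function.
sample = json.dumps({
    "url": "http://example.com/sitemap.xml",
    "status": "validating",
    "web_pages_count": 10,
    "web_pages_pending_validation_count": 3,
})
print(summarize_sitemap(sample))
```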
| A web_page can show this information: | |
| * url, text, like: "http://example.com" | |
| * status, string, can be one of: | |
| - validating | |
| - validation_failed | |
| - validated | |
| * validation_errors_count, integer | |
| * validation_warnings_count, integer | |
| * validation_errors, array of errors reported in the validation. Each entry in the array will contain: | |
| - message_id, string, identifies the error type | |
| - text, string, explains the error | |
| - times, integer, how many times this error is found on the web_page | |
| - lines, array of integers, line numbers where this error is found on the web_page | |
| * validation_warnings, array similar to validation_errors but referring to warnings reported in the validation | |
| * created_at, datetime | |
| * updated_at, datetime | |
| * validated_at, datetime |
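To show how the validation_errors entries above might be consumed, here is a small sketch that flattens them into one row per line number. The TypedDict mirrors the fields listed in this draft; the function name messages_by_line and the sample message_id values are made up for illustration.

```python
from typing import TypedDict


class ValidationMessage(TypedDict):
    """One entry of validation_errors or validation_warnings for a web_page."""
    message_id: str
    text: str
    times: int
    lines: list[int]


def messages_by_line(messages: list[ValidationMessage]) -> list[tuple[int, str, str]]:
    """Expand each message into (line, message_id, text) rows, sorted by line."""
    rows = [
        (line, message["message_id"], message["text"])
        for message in messages
        for line in message["lines"]
    ]
    return sorted(rows)


# Made-up sample entries; real message_id values are not given in the draft.
errors: list[ValidationMessage] = [
    {"message_id": "a", "text": "first kind of error", "times": 2, "lines": [12, 3]},
    {"message_id": "b", "text": "second kind of error", "times": 1, "lines": [5]},
]
for line, message_id, text in messages_by_line(errors):
    print(f"line {line}: [{message_id}] {text}")
```

The same function would work for validation_warnings, since the draft describes that array as having the same shape.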