First, you must obtain a Session ID. This is done by logging in to the site with your email and password.
Set the FORUM_DOMAIN variable to the domain of the forum you wish to scrape.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"email": "[email protected]",
"password": "XXX"
},
"method":"User.login"
}
The Session ID is contained in the session_id field of the response, inside the result object.
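The login step can be scripted. What follows is a minimal Python sketch using only the standard library. The helper names build_request, call_api, and extract_session_id are illustrative and not part of the Enjin API. The check for an "error" member assumes the server follows the JSON-RPC 2.0 convention for failures, which this guide does not confirm.

```python
import json
import urllib.request

# Replace FORUM_DOMAIN by hand, as described above.
API_URL = "https://www.FORUM_DOMAIN.com/api/v1/api.php"


def build_request(method, params, request_id="12345"):
    """Build the JSON-RPC 2.0 body shared by every request in this guide."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def call_api(method, params, url=API_URL):
    """POST one request and return the decoded JSON response (needs network access)."""
    body = json.dumps(build_request(method, params)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_session_id(response):
    """Return result.session_id from a User.login response."""
    if "error" in response:  # assumption: failures use the JSON-RPC "error" member
        raise RuntimeError(f"login failed: {response['error']}")
    return response["result"]["session_id"]
```

A typical use would be extract_session_id(call_api("User.login", {"email": ..., "password": ...})), with the returned value kept for the later requests.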
Before obtaining individual forum IDs, you must obtain the Module ID for the forum module (herein {{$ForumModuleID}}) you wish to scrape. This can be obtained in the admin panel of your site under "Modules". Using the left side panel, you can filter to the type "Forum Board". Make a list of the Module IDs you wish to scrape. You can then follow this process for each module.
Send a request to the Forum.getCategoriesAndForums method. The preset_id field is the Module ID of the forum module you wish to scrape. The session_id field is the Session ID you obtained in the previous step.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"preset_id": "{{$ForumModuleID}}",
"session_id": "{{$SessionID}}"
},
"method":"Forum.getCategoriesAndForums"
}
Forum IDs are contained in the forum_id field, which appears in two places inside the result object. The categories object is keyed by category ID; each category is itself an object keyed by forum ID, and each forum is an object containing the forum_id field. The subforums object is keyed by forum ID; each value is a list of forums, which are objects containing the forum_id field.
Together, these should comprise all Forum IDs for the module.
{
"result": {
"subforums": {
"4511155": [
{
//...
"forum_id": "0000000"
//...
}
]
},
"categories": {
"0000000": {
"0000001": {
//...
"forum_id": "0000001"
},
"0000002": {
//...
"forum_id": "0000002"
//...
}
},
"0000001": {
"0000003": {
//...
"forum_id": "0000003"
//...
}
}
}
},
"id": "12345",
"jsonrpc": "2.0"
}
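To make the nesting concrete, here is a small Python sketch that walks both the categories and subforums objects of a decoded response and gathers every forum_id. The function name collect_forum_ids is illustrative. The "or {}" fallback is a defensive assumption in case an empty mapping is returned as an empty list or null.

```python
def collect_forum_ids(response):
    """Return a sorted list of every forum_id in a Forum.getCategoriesAndForums response."""
    result = response["result"]
    forum_ids = set()

    # categories: {category_id: {forum_id: {..., "forum_id": "..."}}}
    categories = result.get("categories") or {}  # assumption: may be empty or null
    for forums in categories.values():
        for forum in forums.values():
            forum_ids.add(forum["forum_id"])

    # subforums: {forum_id: [{..., "forum_id": "..."}, ...]}
    subforums = result.get("subforums") or {}
    for forums in subforums.values():
        for forum in forums:
            forum_ids.add(forum["forum_id"])

    return sorted(forum_ids)
```

A set is used so that a forum listed in both places is only scraped once.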
Once you have scraped the Forum IDs, these can be used to scrape the Thread IDs. From this point on in the API, responses are paginated. Luckily, the API provides a pages field in the response, which tells us how many pages there are. For each response, we can store this value and check that it is greater than or equal to the page number we are about to request. We can then loop through each page, and scrape the Thread IDs from each page.
Send a request to the Forum.getForum method. The forum_id field is the Forum ID you wish to scrape. The session_id field is the Session ID you obtained in the previous step. The page field is the page number you wish to scrape.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"session_id": "{{$SessionID}}",
"forum_id": "{{$ForumID}}",
"page": "1"
},
"method":"Forum.getForum"
}
Thread IDs are contained in the thread_id field of the response. The result object contains two lists, threads and sticky; each entry is an object containing the thread_id field, and the sticky list holds the sticky threads.
Be sure to continue to scrape the next page until the page field is equal to the pages field.
{
"result": {
"sticky": [
{
"thread_id": "0000000"
//...
}
],
"threads": [
{
"thread_id": "0000001"
//...
},
{
"thread_id": "0000002"
//...
}
],
"page": "1",
"pages": 1
},
"id": "12345",
"jsonrpc": "2.0"
}
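The pagination loop for Forum.getForum can be sketched as follows. The call argument is a hypothetical transport hook, any function taking (method, params) and returning the decoded response, so the loop can be exercised without network access. Deduplicating IDs is a precaution based on the assumption that sticky threads may be repeated on every page.

```python
def collect_thread_ids(call, session_id, forum_id):
    """Request every page of a forum and return its thread IDs in first-seen order."""
    thread_ids = []
    seen = set()
    page = 1
    while True:
        response = call(
            "Forum.getForum",
            {"session_id": session_id, "forum_id": forum_id, "page": str(page)},
        )
        result = response["result"]
        for thread in result.get("sticky", []) + result.get("threads", []):
            thread_id = thread["thread_id"]
            if thread_id not in seen:  # assumption: stickies may repeat per page
                seen.add(thread_id)
                thread_ids.append(thread_id)
        # "page" is a string and "pages" a number in the sample, so compare as ints.
        if page >= int(result["pages"]):
            break
        page += 1
    return thread_ids
```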
Once you have scraped the Thread IDs, these can be used to scrape the Thread data. This process will need to be repeated for each obtained Thread ID.
These responses will also be paginated. Again, the API provides a pages field in the response, which tells us how many pages there are. For each response, we can store this value and check that it is greater than or equal to the page number we are about to request. We can then loop through each page, and scrape the Thread data from each page.
Send a request to the Forum.getThread method. The thread_id field is the Thread ID you wish to scrape. The session_id field is the Session ID you obtained in the previous step. The page field is the page number you wish to scrape.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"session_id": "{{$SessionID}}",
"thread_id": "{{$ThreadID}}",
"page": "1"
},
"method":"Forum.getThread"
}
Be sure to continue to scrape the next page until the page field is equal to the pages field.
It is advisable to download any assets contained in the posts as well, as those hosted on Enjin servers will likely be deleted when the site goes down.
{
"result": {
"thread": {
//...
},
"posts": [
{
"post_id": "000000001",
"post_time": "1500574914",
"post_content": "...",
"post_content_html": "...",
"post_content_clean": "...",
"post_user_id": "00000001",
"show_signature": "0",
"last_edit_time": "0",
"post_votes": "0",
"post_unhidden": "0",
"post_admin_hidden": "0",
"post_locked": "0",
"last_edit_user": "0",
"votes": null,
"post_username": "...",
"avatar": "https:\/\/assets-cloud.enjin.com\/users\/00000001\/avatar\/medium.00000001.jpeg",
"user_online": false,
"user_votes": "0",
"user_posts": "8",
"url": "https:\/\/www.enjin.com\/ajax.php?s=redirect&cmd=forum-post&mobile=1&preset=00000001&id=00000001"
}
],
"total_items": "2",
"pages": 1
},
"id": "12345",
"jsonrpc": "2.0"
}
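Below is a hedged Python sketch of both tasks for one thread: fetching every page, and listing asset URLs worth downloading. The call argument is a hypothetical (method, params) transport hook. The asset_urls helper assumes that post_content_html holds ordinary HTML with img tags, which the sample response does not show, so treat it as a starting point only.

```python
from html.parser import HTMLParser


def fetch_thread_pages(call, session_id, thread_id):
    """Return the list of raw Forum.getThread responses, one per page."""
    responses = []
    page = 1
    while True:
        response = call(
            "Forum.getThread",
            {"session_id": session_id, "thread_id": thread_id, "page": str(page)},
        )
        responses.append(response)
        if page >= int(response["result"]["pages"]):
            break
        page += 1
    return responses


class _ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


def asset_urls(post):
    """List image URLs in a post body (assumed HTML) plus the poster's avatar."""
    parser = _ImgSrcParser()
    parser.feed(post.get("post_content_html") or "")
    urls = list(parser.urls)
    if post.get("avatar"):
        urls.append(post["avatar"])
    return urls
```

The URLs returned by asset_urls could then be downloaded with any HTTP client and stored next to the saved responses.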
Before obtaining news posts, you must obtain the Module ID for the news module (herein {{$NewsModuleID}}) you wish to scrape. This can be obtained in the admin panel of your site under "Modules". Using the left side panel, you can filter to the type "News / Blog". Make a list of the Module IDs you wish to scrape. You can then follow this process for each module.
These responses will also be paginated. The responses here, however, do not contain a page value. Instead, for each response, we must test the length of result[]. If it is 0, we have reached the end of the news posts, and that final response should be discarded since it contains no useful data. If it is greater than 0, we save the current response and continue to the next page.
Send a request to the News.getNews method. The preset_id field is the Module ID you wish to scrape. The session_id field is the Session ID you obtained in the previous step. The page field is the page number you wish to scrape.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"session_id": "{{$SessionID}}",
"preset_id": "{{$NewsModuleID}}",
"page": "1"
},
"method":"News.getNews"
}
Be sure to continue to scrape the next page until result[] is of length 0.
{
"result": [
{
"preset_id": "{{$NewsModuleID}}",
"article_id": "0000001",
"user_id": "0000001",
"num_comments": "0",
"timestamp": "1347150882",
"status": "1",
"title": "...",
"content": "...",
"commenting_mode": "0",
"ordering": "101",
"sticky": "0",
"last_updated": null,
"username": "",
"displayname": "..."
}
],
"id": "12345",
"jsonrpc": "2.0"
}
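Because News.getNews reports no page count, the loop stops on the first empty result. The sketch below illustrates this. The call argument is a hypothetical (method, params) transport hook, and max_pages is an invented safety cap rather than an API parameter.

```python
def collect_news(call, session_id, news_module_id, max_pages=10000):
    """Return all articles of one news module, stopping at the first empty page."""
    articles = []
    for page in range(1, max_pages + 1):
        response = call(
            "News.getNews",
            {"session_id": session_id, "preset_id": news_module_id, "page": str(page)},
        )
        batch = response["result"]
        if not batch:
            # Empty page: end of the news posts, nothing worth saving.
            break
        articles.extend(batch)
    return articles
```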
For whatever reason, unlike all other module types, the Enjin API provides a method to obtain all ticket modules for a given site. This makes our job slightly easier, as we do not have to manually obtain these IDs from the admin panel. We can instead obtain them from the endpoint, and then obtain the tickets for each module.
Obtaining the tickets can only be done with a site API key (herein {{$APIKey}}). Per Enjin's documentation, this is found in the same place in which the API is enabled.
To enable your API, visit your admin panel / settings / API area. The content on this page includes your base API URL, your secret API key, and the API mode. Ensure that the API mode is set to "Public".
Send a request to the Tickets.getModules method. The api_key field is the API key you obtained in the previous step.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"api_key": "{{$APIKey}}"
},
"method":"Tickets.getModules"
}
The Ticket Module IDs are listed twice: once as the keys of the result field, and once as the preset_id of each member of ...questions[].
{
"result": {
"{{$TicketModuleID}}": {
"module_name": "...",
"questions": [
{
"id": "0001",
"site_id": "{{$SiteID}}",
"preset_id": "{{$TicketModuleID}}",
"type": "text",
"label": "...",
"required": "1",
"bold": "0",
"help_text": "...",
"order": "0",
"other_options": {
"bbcode": "0",
"lines": "4",
"min": "1",
"max": "100"
},
"options": null,
"conditions": null,
"condition_qualify": "all_true",
"system": "1"
}
]
}
},
"id": "12345",
"jsonrpc": "2.0"
}
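Reading the module IDs from the keys of result is the simpler of the two options. The short sketch below does this and keeps each module_name alongside its ID, which can help when naming output files. The function name ticket_modules is illustrative.

```python
def ticket_modules(response):
    """Map each Ticket Module ID (a key of result) to its module_name."""
    result = response.get("result") or {}  # assumption: may be empty or null
    return {
        module_id: module.get("module_name")
        for module_id, module in result.items()
    }
```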
Using each obtained Ticket Module ID, the tickets for each module can be obtained. These responses will also be paginated. The responses here do contain a page value. We can simply continue until our page value is greater than the total number of pages.
Send a request to the Tickets.getTickets method. The preset_id field is the Ticket Module ID you wish to scrape. The session_id field is the Session ID you obtained in the previous step. The page field is the page number you wish to scrape.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"session_id": "{{$SessionID}}",
"preset_id": "{{$TicketModuleID}}",
"status": "all",
"page" : "1"
},
"method":"Tickets.getTickets"
}
Be sure to continue until result.pagination.last_page is less than the page you are about to scrape.
{
"result": {
"results": [
{
"id": "0000001",
"code": "a3d65661",
"site_id": "{{$SiteID}}",
"preset_id": "{{$TicketModuleID}}",
"subject": "...",
"created": "1673888558",
"status": "open",
"assignee": "00000001",
"updated": "1673888558",
"requester": "00000002",
"priority": "1",
"extra_questions": "...",
"status_change": "1673888558",
"email": null,
"viewers": false,
"createdHTML": "8 hours ago",
"updatedHTML": "8 hours ago",
"requesterHTML": "...",
"assigneeText": "...",
"assigneeHTML": "...",
"priority_name": "Low",
"replies_count": 0,
"private_reply_count": 0
}
],
"pagination": {
"page": "1",
"nr_pages": 4,
"nr_results": "92",
"first_page": 1,
"last_page": 4
}
},
"id": "12345",
"jsonrpc": "2.0"
}
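For tickets, the stopping condition comes from result.pagination.last_page. A minimal sketch of the loop follows. The call argument is a hypothetical (method, params) transport hook, and the function name is illustrative.

```python
def collect_tickets(call, session_id, ticket_module_id):
    """Return every ticket of one module, across all pages and statuses."""
    tickets = []
    page = 1
    while True:
        response = call(
            "Tickets.getTickets",
            {
                "session_id": session_id,
                "preset_id": ticket_module_id,
                "status": "all",
                "page": str(page),
            },
        )
        result = response["result"]
        tickets.extend(result["results"])
        # Stop once the page just fetched is the last one reported.
        if page >= int(result["pagination"]["last_page"]):
            break
        page += 1
    return tickets
```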
Before obtaining applications, you must obtain the possible application types. This is the only way to ensure all Application IDs are later obtained, as this endpoint requires an input type.
Send a request to the Applications.getTypes method. No parameters are needed for this endpoint.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
},
"method":"Applications.getTypes"
}
The types used later will be the keys listed under result, not the values.
{
"result": {
"open": "Open",
"approved": "Approved",
"rejected": "Rejected",
"general": "General",
"archive": "Archive",
"trash": "Trash",
"my-applications": "My Apps"
},
"id": "12345",
"jsonrpc": "2.0"
}
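As a brief illustration of the keys-versus-values point, the following sketch returns the type keys in the order the response lists them. The function name application_types is illustrative.

```python
def application_types(response):
    """Return the keys of result (e.g. "my-applications"), not the display labels."""
    return list(response["result"].keys())
```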
The Site ID (herein {{$SiteID}}) is not required for the Applications.getList endpoint, but it will ensure that we only obtain applications associated with the site we are scraping.
Send a request to the Site.getStats method. No parameters are needed for this endpoint.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
},
"method":"Site.getStats"
}
The Site ID is listed under result.latest_user.site_id.
{
"result": {
"total_users": "1",
"latest_user": {
"site_id": "{{$SiteID}}",
"user_id": "0000001",
"access": "2",
"datejoined": "1673893006",
"lastseen": "1673897992",
"post_count": "0",
"forum_votes": "0",
"forum_up_votes": "0",
"forum_down_votes": "0",
"banned_date": "0",
"banned_expiration": "0",
"allow_issue_warnings": null,
"allow_issue_punishments": null,
"banned_by": "system",
"banned_by_id": "0",
"banned_reason": ""
}
},
"id": "12345",
"jsonrpc": "2.0"
}
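The lookup can be written as a small helper. This sketch assumes latest_user is always present, as in the sample above. It raises a clear error otherwise, since the guide does not say what the response looks like for a site without users.

```python
def extract_site_id(response):
    """Return result.latest_user.site_id from a Site.getStats response."""
    latest_user = response["result"].get("latest_user")
    if not latest_user:  # assumption: could be missing on a site with no users
        raise ValueError("Site.getStats response has no latest_user")
    return latest_user["site_id"]
```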
Once you have the Site ID and the Application Types, you can obtain the Application IDs. To ensure all Application IDs are obtained, this process must be repeated for each application type.
Furthermore, the Applications.getList endpoint is paginated. To ensure all results are obtained, this process must be repeated for each page. The endpoint does not return the total number of pages, so the process must be repeated until result.items[] is of length 0. This final response of length 0 should then be discarded, and the process should be repeated for the next application type.
Send a request to the Applications.getList method. The session_id is the Session ID obtained earlier. The type is the Application Type obtained earlier. The site_id is the Site ID obtained earlier. The page is the page number, starting at 1.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"session_id": "{{$SessionID}}",
"type": "{{$type}}",
"site_id": "{{$SiteID}}",
"page": "1"
},
"method":"Applications.getList"
}
The Application ID is listed at the beginning of each application under result.items[].application_id.
{
"result": {
"items": [
{
"application_id": "0000001",
"site_id": "{{$SiteID}}",
"preset_id": "0000001",
"title": "...",
"user_ip": "xxx.xxx.xxx.xxx",
"is_mine": false,
"can_manage": true,
"created": "1668401890",
"updated": "1671750293",
"read": true,
"comments": 0,
"read_comments": null,
"app_comments": "1",
"admin_comments": "1",
"site_name": "...",
"user_id": "0000001",
"is_online": false,
"admin_online": false,
"username": "Belle!",
"avatar": "https:\/\/s3.amazonaws.com\/files.enjin.com\/{{$SiteID}}\/site_logo\/medium.png",
"admin_user_id": "0000002",
"admin_username": "...",
"admin_avatar": "https:\/\/cravatar.eu\/helmavatar\/autumn_carrots\/74.png",
"site_logo": "https:\/\/s3.amazonaws.com\/files.enjin.com\/{{$SiteID}}\/site_logo\/medium.png"
}
],
"total": "1"
},
"id": "12345",
"jsonrpc": "2.0"
}
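The two nested loops, over application types and over pages, can be sketched like this. The call argument is a hypothetical (method, params) transport hook, and max_pages is an invented safety cap. IDs are deduplicated on the assumption that one application may be listed under more than one type.

```python
def collect_application_ids(call, session_id, site_id, types, max_pages=10000):
    """Return every application_id found across all given types and pages."""
    application_ids = []
    seen = set()
    for app_type in types:
        for page in range(1, max_pages + 1):
            response = call(
                "Applications.getList",
                {
                    "session_id": session_id,
                    "type": app_type,
                    "site_id": site_id,
                    "page": str(page),
                },
            )
            items = response["result"]["items"]
            if not items:
                break  # empty page: move on to the next application type
            for item in items:
                application_id = item["application_id"]
                if application_id not in seen:  # assumption: types may overlap
                    seen.add(application_id)
                    application_ids.append(application_id)
    return application_ids
```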
Using the Application IDs obtained from the previous endpoint, you can obtain the applications themselves. To ensure all applications are obtained, this process must be repeated for each application ID.
Send a request to the Applications.getApplication method. The session_id is the Session ID obtained earlier. The application_id is the Application ID obtained earlier.
POST https://www.FORUM_DOMAIN.com/api/v1/api.php
content-type: application/json
{
"jsonrpc":"2.0",
"id":"12345",
"params":{
"session_id": "{{$SessionID}}",
"application_id": "{{$ApplicationID}}"
},
"method":"Applications.getApplication"
}
An example response is given below. Note that the keys in user_data correspond to fields in the form. The raw HTML of any form pages should be saved before Enjin goes offline to ensure that the original questions can be correlated with these responses.
{
"result": {
"application_id": "{{$ApplicationID}}",
"site_id": "{{$SiteID}}",
"preset_id": "0000001",
"title": "...",
"user_ip": "xxx.xxx.xxx.xxx",
"is_mine": false,
"can_manage": true,
"created": "1668401890",
"updated": "1671750293",
"read": true,
"comments": 0,
"app_comments": "1",
"admin_comments": "1",
"site_name": "...",
"user_id": "0000001",
"is_online": false,
"username": "...",
"avatar": "https:\/\/s3.amazonaws.com\/files.enjin.com\/{{$SiteID}}\/site_logo\/medium.png",
"admin_user_id": "0000002",
"admin_online": false,
"admin_username": "...",
"admin_avatar": "...",
"site_logo": "https:\/\/s3.amazonaws.com\/files.enjin.com\/{{$SiteID}}\/site_logo\/medium.png",
"user_data": {
"8szicgnohx": [
"..."
],
"xy8oih250y": "...",
"51md9eq5q5": 19,
"nicd7towfu": "...",
"aiaypctyj6": [
"..."
],
"io0odjma55": "...",
"aor8go2of7": "...",
"q62x889j70": "...",
"guysviozjo": [
"..."
],
"5osw7rr977": "...",
"vuxwi9zygo": [
"..."
],
"27s0622qqw": "..."
},
"is_archived": false,
"is_trashed": false,
"allow_app_comments": "1",
"post_app_comments": true,
"allow_admin_comments": true
},
"id": "12345",
"jsonrpc": "2.0"
}
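Repeating the request for each Application ID and archiving the raw responses might look like the sketch below. The call argument is a hypothetical (method, params) transport hook. The output directory layout and file naming are arbitrary choices, not something the API prescribes.

```python
import json
from pathlib import Path


def save_applications(call, session_id, application_ids, out_dir):
    """Fetch each application and write its raw response to its own JSON file."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for application_id in application_ids:
        response = call(
            "Applications.getApplication",
            {"session_id": session_id, "application_id": application_id},
        )
        target = out_path / f"application_{application_id}.json"
        # Keep the whole response so the opaque user_data keys are preserved.
        target.write_text(json.dumps(response, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Storing the unmodified response keeps the user_data keys intact, so they can later be matched against the saved form HTML.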