2024 Edit - In the comments, there are good alternatives. When this gist was written, there were few alternatives to the Slack Bolt package.
Some gotchas from my recent experience of building a serverless Next.JS + Bolt.JS Slack App on Vercel.
Note that if you're building an app that you want to distribute to other workspaces, AFAIK you need to build an API. So, Next.JS is used here to help with the public API. The alternative to an API is using "socket mode".
- When building out the API, Bolt ONLY uses the `/slack/events` endpoint. The Slack config settings will suggest you provide a different endpoint, like `/slack/commands` for Slash Commands. That would work with a different SDK, such as the Python one, but Bolt for JS uses `/slack/events` for everything. You can still use Bolt functions like `app.command()`; just remember to put the `/slack/events` endpoint in every Request URL field of the Slack config.
- Serverless is not officially supported by Slack with Bolt for JS. It is possible, as this Vercel serverless Next.JS + Bolt app shows. However, beware that the project is only good for an app that always responds IMMEDIATELY. If your app calls any 3rd-party endpoints or does anything that takes a second or two, Slack will throw an `operation_timeout` error.
- Serverless with Bolt puts you in a bind
  - If you respond immediately to an event (to avoid Slack timing out after 3 seconds), any time-consuming code you have will get prematurely terminated when the serverless function ends. This happens because Next + Bolt sees that the endpoint gave a response, thinks the work is done, and terminates anything you still have running in the callback. Also, note that Bolt with `processBeforeResponse: true` will purposefully delay the `ack()` until the entire callback is done. On the one hand, this makes sure your function does not terminate early; on the other hand, the `ack()` may not get sent within Slack's 3-second timeout.
  - If you avoid responding immediately to give your serverless function time to finish, Slack will time out after 3 seconds and show the user an error. Strangely, with Vercel in this case you might still be able to run everything you wanted to in the callback and post a new message, but Slack will show the user an error all the same.
- Fixes for the serverless issue above (with long-running tasks):
  - Send time-consuming work to a queueing system (e.g., AWS SQS), but that adds a lot of complexity on top of Bolt, a framework that was supposed to make things simple!
    - I ultimately ended up using a variation of this method, described at the bottom of this document: send the long-running job to a separate Next.JS endpoint.
  - Ditch Slack's Bolt for JS framework. Slack has no plans to fix this long-running task issue with serverless Bolt. However, Slack's Python framework (Bolt for Python) does have a feature, "lazy listeners", that makes serverless with long tasks actually work.
  - Alternatively, Vercel is currently working on an example Slack app that runs serverlessly on Vercel and has all the required auth functions that Bolt has.
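Why a single `/slack/events` URL is enough: Slack tells its request types apart by body shape, not by path. Slash commands arrive form-encoded with a `command` field, interactivity payloads arrive form-encoded with a JSON `payload` field, and Events API requests arrive as JSON with a top-level `type` (including the one-time `url_verification` challenge). The classifier below is a hypothetical sketch of that idea, not Bolt's actual internals, and assumes you have the raw body and `Content-Type` header in hand.

```typescript
// Hypothetical helper (not part of Bolt): route every Slack request type that
// hits one endpoint by inspecting the body instead of the URL path.
export type SlackRequestKind =
  | "url_verification"
  | "event"
  | "slash_command"
  | "interactive"
  | "unknown";

export function classifySlackRequest(contentType: string, rawBody: string): SlackRequestKind {
  const type = contentType.toLowerCase();

  if (type.startsWith("application/json")) {
    // Events API: JSON body with a top-level `type`
    try {
      const parsed = JSON.parse(rawBody) as { type?: string };
      if (parsed.type === "url_verification") return "url_verification";
      if (parsed.type === "event_callback") return "event";
    } catch {
      // Malformed JSON falls through to "unknown"
    }
    return "unknown";
  }

  if (type.startsWith("application/x-www-form-urlencoded")) {
    const form = new URLSearchParams(rawBody);
    // Slash commands carry a `command` field such as "/command-a"
    if (form.has("command")) return "slash_command";
    // Buttons, modals, shortcuts, etc. carry a JSON-encoded `payload` field
    if (form.has("payload")) return "interactive";
  }

  return "unknown";
}
```

With Bolt you never write this yourself: you register `app.command()`, `app.event()`, `app.action()` and so on, and Bolt's receiver does the dispatching behind the one URL.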
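The serverless bind described above can be summarized with a toy timing model. This is a hypothetical sketch, not Bolt code: a listener is a list of steps, `processBeforeResponse` decides whether the HTTP response goes out at the `ack()` step or only after the last step, and any work still running after the response is what the serverless platform may cut off. The 3000 ms deadline is Slack's acknowledgement window; step durations are made up, and cold starts and network latency are ignored.

```typescript
// Toy model (not Bolt code) of one listener running on a serverless platform.
export type Step = { kind: "ack" } | { kind: "work"; ms: number };

export interface Outcome {
  responseAtMs: number; // when the HTTP response to Slack leaves the function
  slackTimedOut: boolean; // response missed Slack's 3-second window
  workAtRiskMs: number; // work still running after the response was sent
}

export const SLACK_ACK_DEADLINE_MS = 3000;

export function simulateListener(steps: Step[], processBeforeResponse: boolean): Outcome {
  let elapsedMs = 0;
  let ackAtMs: number | undefined;
  for (const step of steps) {
    if (step.kind === "ack") {
      if (ackAtMs === undefined) ackAtMs = elapsedMs; // only the first ack() matters
    } else {
      elapsedMs += step.ms;
    }
  }
  const totalMs = elapsedMs;

  // processBeforeResponse: true  -> respond only after the whole listener finishes.
  // processBeforeResponse: false -> respond as soon as ack() is called.
  // No ack() at all              -> Slack never gets a response.
  const responseAtMs =
    ackAtMs === undefined ? Number.POSITIVE_INFINITY : processBeforeResponse ? totalMs : ackAtMs;

  return {
    responseAtMs,
    slackTimedOut: responseAtMs > SLACK_ACK_DEADLINE_MS,
    // On serverless, anything still running after the response may be frozen or killed.
    workAtRiskMs: Math.max(0, totalMs - responseAtMs),
  };
}
```

For example, `[{ kind: "ack" }, { kind: "work", ms: 5000 }]` responds at 0 ms but leaves 5000 ms of work exposed with `processBeforeResponse: false`, and responds at 5000 ms (a Slack timeout) with `true`. Neither setting works for a long task, which is why the work has to move somewhere else.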
- StackOverflow - "How to avoid slack command timeout error?"
- StackOverflow - ack() does not send immediately, waits for entire workflow to finish before sending
- GitHub Issues - Preventing AWS Lambdas from self-terminating when an ack() is sent
- Vercel SlackBot WITHOUT using Bolt - Beware that this is a simple example and does not do the Slack install (OAuth) for you; Bolt would be able to handle this automatically. Vercel says they're working on another example Slack app that would also do OAuth for you.
- Modified Bolt.JS for web frameworks - Not sure how well this works with serverless, but it seems to be built to work better with Next.JS and similar frameworks. Built by a guy who works on Bolt.
- Next.JS + Bolt boilerplate - I linked this further above. The Next.JS setup seems to work nicely. It works with serverless only if your app responds instantly to events. Note that this is slightly different from the "Modified Bolt.JS" project, in that it does use a "custom receiver". The modified Bolt.JS project purposefully avoids a custom receiver.
I did manage to get long-running serverless tasks to work using the "Next.JS + Bolt boilerplate" linked above. The approach is basically: hand the long-running work to a separate job so the original function can return quickly, and run that job through a Next.JS endpoint to keep the infrastructure simple.
- My project lives in the Vercel world. While Vercel does use AWS Lambda behind the scenes, I was not interested in setting up extra AWS infrastructure to run AWS SQS jobs.
- The workaround in Vercel was to create another Next.JS endpoint, which would run as a separate Vercel function. This separate "worker" function would still use Bolt, but only for sending things to Slack (e.g., posting messages), not for listening to events.
- The initial API function would send a network request to this second "worker" function with all the data from Slack about the event. It does NOT `await` the `axios.post`, because awaiting the post would mean waiting for the entire "worker" function to finish, defeating the whole point. Instead it "fires and forgets" the request.
  - Something important to note: the `axios.post` request can actually get interrupted by the Lambda terminating before the request is sent. So, the hacky fix is to use `await new Promise(resolve => setTimeout(resolve, 500))` after `axios.post` to ensure the request is sent off.
- For the "worker" function I used `res.end()` to end the function. Don't use `res.status(200)` on its own: it only sets the status code without sending the response, so the worker function would just hang until it times out after 30 or 60 seconds.
  - Don't forget that this worker function is also a public endpoint, so it needs the same level of validation as the Slack endpoint.
    - In the example below I crudely used an arbitrary `INTERNAL_WORKER_TOKEN` to only accept requests coming from the internal function. There's probably a more robust way to do this.
A portion of the file at /pages/api/[[...route]].ts
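```typescript
import axios from 'axios'

// `app` is the Bolt App instance created earlier in this file.
// Slack Slash Command for /command-a
app.command('/command-a', async ({ ack, body, context, say }) => {
  // Let the user know we're working on it
  const workingOnItMessage = await say({
    text: `:building_construction: Working on this long running task.`,
  })

  // Fire-and-forget POST to /api/worker with everything the worker needs.
  // Deliberately NOT awaited: awaiting would wait for the whole worker to finish.
  axios
    .post(
      'https://your-app-url-here.vercel.app/api/worker',
      {
        command: '/command-a',
        body,
        context,
        workingOnItMessage,
        internalWorkerToken: process.env.INTERNAL_WORKER_TOKEN,
      },
      { headers: { 'Content-Type': 'application/json' } }
    )
    // Log failures instead of leaving an unhandled promise rejection
    .catch((err) => console.error('Worker request failed:', err))

  await ack()

  // HACK - Give the un-awaited axios.post time to actually get sent out
  // before the serverless function is terminated
  await new Promise((resolve) => setTimeout(resolve, 500))
})
```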
A portion of the (same) file at /pages/api/[[...route]].ts
```typescript
import type { NextApiRequest, NextApiResponse } from 'next'

// `router` and `runCommandA` are defined elsewhere in this file.
router.post('/api/worker', async (req: NextApiRequest, res: NextApiResponse) => {
  // Check that the request is coming from an internal serverless function
  if (!req.body.internalWorkerToken || req.body.internalWorkerToken !== process.env.INTERNAL_WORKER_TOKEN) {
    return res.status(401).end()
  }

  const command: string = req.body.command
  const workingOnItMessage: any = req.body.workingOnItMessage
  const slackReqBody: any = req.body.body
  const context: any = req.body.context

  try {
    if (command === '/command-a') {
      await runCommandA({ body: slackReqBody, context, workingOnItMessage })
    }
  } finally {
    // Force this worker function to terminate now, even if the command threw
    res.end()
  }
})
```
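The `INTERNAL_WORKER_TOKEN` check above works, but a static token, once leaked, lets anyone call the worker with any payload forever. One possibly more robust option, modelled loosely on Slack's own request signing (an HMAC-SHA256 over a timestamp plus the raw body), is to sign each forwarded payload with a shared secret and have the worker verify the signature and reject stale timestamps. The helper names, the 60-second window, and the secret's source (e.g. a hypothetical `INTERNAL_WORKER_SECRET` env var) are assumptions, not part of the original setup.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical helpers; `secret` would come from something like
// process.env.INTERNAL_WORKER_SECRET (an assumed env var).
export function signWorkerPayload(rawBody: string, timestampSec: number, secret: string): string {
  // Bind the signature to both the exact body string and the send time
  return createHmac("sha256", secret).update(`${timestampSec}:${rawBody}`).digest("hex");
}

export function verifyWorkerPayload(
  rawBody: string,
  timestampSec: number,
  signatureHex: string,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000),
  maxAgeSec = 60,
): boolean {
  // Reject stale or future-dated requests to limit replays
  if (!Number.isFinite(timestampSec) || Math.abs(nowSec - timestampSec) > maxAgeSec) {
    return false;
  }
  const expected = Buffer.from(signWorkerPayload(rawBody, timestampSec, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The sender would send the timestamp and signature (e.g. in headers) alongside the body, and the worker must verify against the exact string that was signed. Since Next.JS API routes parse JSON bodies by default, one option is to send the signed JSON as a string field and only `JSON.parse` it after verification succeeds.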
For the installation store database, Upstash seemed like the quickest and easiest to set up with Vercel. It has a Vercel integration that worked nicely, and the free plan (10K commands a day) looks perfect for small Slack apps. I paired it with ioredis for the `fetchInstallation`, `storeInstallation`, and `deleteInstallation` functions. One issue I ran into: the installation fetch that Bolt runs when the serverless function starts handling a request did not cause timeouts, but fetching the installation again inside a handler re-ran the network request to Upstash, and that extra round trip pushed the app into Slack timeout issues. The crude solution was to store the installation locally on the serverless function. That way, between Bolt auto-running `fetchInstallation` and an event handler like `app.command` running, the installation object is still handy without another network request to Upstash.
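A minimal sketch of that in-memory cache, assuming `fetchFromStore` wraps the Upstash/ioredis lookup. The `InstallationLike` and `InstallationQueryLike` shapes are hypothetical stand-ins for Bolt's installation types (only the fields used here), and the key format is an assumption that must match whatever `storeInstallation` writes.

```typescript
// Hypothetical stand-ins for Bolt's installation types; only the fields used here.
export interface InstallationLike {
  team?: { id: string };
  enterprise?: { id: string };
  bot?: { token: string };
}
export interface InstallationQueryLike {
  teamId?: string;
  enterpriseId?: string;
  isEnterpriseInstall?: boolean;
}

// Hypothetical key format; it must match what storeInstallation writes to Upstash.
export function installationKey(query: InstallationQueryLike): string {
  return query.isEnterpriseInstall && query.enterpriseId
    ? `installation:enterprise:${query.enterpriseId}`
    : `installation:team:${query.teamId ?? "unknown"}`;
}

// Module-scope cache: survives across invocations only while this instance stays warm.
const installationCache = new Map<string, InstallationLike>();

export function makeCachedFetchInstallation(
  fetchFromStore: (key: string) => Promise<InstallationLike>,
): (query: InstallationQueryLike) => Promise<InstallationLike> {
  return async (query) => {
    const key = installationKey(query);
    const cached = installationCache.get(key);
    if (cached) return cached; // no Upstash round trip
    const installation = await fetchFromStore(key);
    installationCache.set(key, installation);
    return installation;
  };
}

// Call from deleteInstallation so this instance stops serving a removed installation.
export function evictInstallation(query: InstallationQueryLike): void {
  installationCache.delete(installationKey(query));
}
```

Each cold start still pays for one Upstash round trip, and other warm instances keep their own copies until they are recycled, so a deleted installation can linger briefly elsewhere. In practice `fetchFromStore` would wrap the ioredis `get` call plus `JSON.parse`; that wiring is left out here.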
Using a Next.JS endpoint isn't the most robust way to run a job. ServerlessQ looks pretty nice for serverless with a good Vercel integration. AWS SQS looks like overkill, so if I transition away from a Next.JS endpoint, ServerlessQ is the way I'm leaning.
Additional notes
Having used this method for a little while now, I've found that the job doesn't always trigger. I'm pretty sure this is just due to the /api function ending before the request was sent to the job, so the setTimeout waiting period likely needs to be increased.
I'm personally just going to skip straight to using a queueing service, since this code is already delicate timing-wise. But if that is not an issue, it does work about 80% of the time, particularly when the app is in regular use.
Also, the `ack()` + `setTimeout()` lines at the end of /api/slack/events can probably be combined into an `await Promise.all()`. That starts both at once and awaits both, so the `ack()` is guaranteed to complete before the function returns, potentially reducing how often the Slack timeout message shows up. Curious if there's a "smarter" way to check the status of the `axios.post` and end the function once the network request was sent (before receiving a response).
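One possible answer to that question, sketched under assumptions: instead of `axios.post` plus a fixed 500 ms sleep, use Node's built-in `http`/`https` request and resolve as soon as the `ClientRequest` emits `finish`, which Node emits once the request headers and body have been handed off to the operating system. That is earlier than any response and usually earlier than 500 ms, but it is still not a delivery guarantee if the platform freezes the function right away, so the sketch keeps a timeout fallback. `postAndResolveWhenSent` and its return values are hypothetical names, not an existing library API.

```typescript
import * as http from "node:http";
import * as https from "node:https";
import { Buffer } from "node:buffer";

export type SendStatus = "sent" | "timeout" | "error";

// Hypothetical helper: POST JSON and resolve once Node has handed the whole
// request to the OS ("finish"), without waiting for the worker's response.
export function postAndResolveWhenSent(
  url: string,
  payload: unknown,
  maxWaitMs = 1000,
): Promise<SendStatus> {
  const body = JSON.stringify(payload);
  const target = new URL(url);
  const options: https.RequestOptions = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body),
    },
  };

  return new Promise<SendStatus>((resolve) => {
    let settled = false;
    function settle(status: SendStatus): void {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(status);
    }
    // Fallback so the function is never held open indefinitely
    const timer = setTimeout(() => settle("timeout"), maxWaitMs);

    // Drain (and ignore) any response that arrives while the function is alive
    const onResponse = (res: http.IncomingMessage): void => {
      res.resume();
    };
    const req =
      target.protocol === "https:"
        ? https.request(target, options, onResponse)
        : http.request(target, options, onResponse);

    req.on("finish", () => settle("sent")); // headers + body flushed to the OS
    req.on("error", () => settle("error")); // also stops late errors crashing the process
    req.end(body);
  });
}

// Usage inside the app.command handler, with the same URL and payload as before:
// await Promise.all([ack(), postAndResolveWhenSent(workerUrl, workerPayload)])
```

Resolving on `finish` rather than on a response keeps the fire-and-forget behavior, while replacing the guessed 500 ms with an actual "the bytes have left Node" signal.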