@ottokruse
Created May 25, 2022 07:55
Create an AWS Lambda Layer that includes pandas and pyarrow
// Creating an AWS Lambda Layer with pandas and pyarrow is harder than it might seem:
// a plain `pip install pandas pyarrow` produces a deployment package larger than 250 MB (unzipped),
// which exceeds the AWS Lambda size limit.
// This snippet trims that deployment package down so that it fits (and still works).
import * as lambda from "aws-cdk-lib/aws-lambda";

const layerInstallCommand = [
  "bash",
  "-c",
  [
    "mkdir /asset-output/python",
    "pip install -r requirements.txt -t /asset-output/python",
    "rm -rf /asset-output/python/botocore", // The Lambda runtime already ships botocore
    "rm -rf /asset-output/python/boto3", // The Lambda runtime already ships boto3
    "rm -rf /asset-output/python/bin", // No need to execute CLI binaries
    "find /asset-output/python -name '*.so' -type f -exec strip \"{}\" \\;", // Strip symbols from shared libraries to shrink them (copied from the AWS Data Wrangler build)
    "find /asset-output/python -d -regex '.*/tests' -exec rm -r {} +", // Get rid of tests
    "find /asset-output/python -d -regex '.*/__pycache__' -exec rm -r {} +", // Get rid of __pycache__ directories
    "find /asset-output/python -type f -regex '^.*\\.py[co]$' -delete", // Get rid of stray compiled .pyc/.pyo files
  ].join(" && "),
];

// `this` is the enclosing Stack or Construct in which the layer is defined
const pandasLayer = new lambda.LayerVersion(this, "PandasLayer", {
  code: lambda.Code.fromAsset(
    "../pandas_layer", // Point this at a directory containing a requirements.txt that lists pandas and pyarrow
    {
      bundling: {
        image: lambda.Runtime.PYTHON_3_9.bundlingImage,
        command: layerInstallCommand,
      },
    }
  ),
  compatibleRuntimes: [lambda.Runtime.PYTHON_3_9],
  compatibleArchitectures: [
    lambda.Architecture.X86_64, // Match this to the architecture of the machine you run `cdk synth` on (!)
  ],
});
@varunvilva-kickdrum

@ottokruse Could you share your requirements.txt? I tried to add pyarrow manually, following Stack Overflow answers and several Medium articles, but it keeps failing with "Cannot find the pyarrow.libs". Is pyarrow.libs missing from the layers built by this snippet?
I am using Python 3.13 and want to use pyarrow ^20.0.0.

@ottokruse
Author

@varunvilva-kickdrum Unfortunately I don't have the requirements.txt file anymore, but I have a better solution for you:

Instead of building a custom layer, use the AWS SDK for pandas (formerly AWS Data Wrangler) managed layer. It includes pyArrow pre-built and optimized for Lambda, and it now supports Python 3.13!

For Python 3.13, add this layer ARN to your Lambda function:

arn:aws:lambda:<region>:336392948345:layer:AWSSDKPandas-Python313:4

Just replace <region> with your actual region (e.g., us-east-1).

This is much simpler than building a custom pyArrow layer and should resolve your "Cannot find the pyarrow.libs" error.

Documentation: https://aws-sdk-pandas.readthedocs.io/en/stable/layers.html

Note: Make sure your Lambda function has at least 512MB of memory when using this layer.
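
As a rough illustration of filling in the region, here is a minimal TypeScript sketch that builds the ARN quoted above for a given region. The helper name `pandasLayerArn` and the region check are hypothetical and not part of any AWS library. It assumes the standard `aws` partition and that layer version 4 is still the one you want, so check the documentation linked above for the current version.

```typescript
// Account ID, layer name and version are copied from the ARN quoted above;
// the version in particular may be outdated by the time you read this.
const AWS_SDK_PANDAS_ACCOUNT = "336392948345";
const LAYER_NAME = "AWSSDKPandas-Python313";
const LAYER_VERSION = 4;

// Hypothetical helper: returns the managed layer ARN for one region.
// Assumes the standard "aws" partition (not aws-cn or aws-us-gov).
function pandasLayerArn(region: string): string {
  // Loose sanity check on the region format, e.g. "us-east-1" or "ap-southeast-2".
  if (!/^[a-z]{2}(-[a-z]+)+-\d+$/.test(region)) {
    throw new Error(`Unexpected region format: ${region}`);
  }
  return `arn:aws:lambda:${region}:${AWS_SDK_PANDAS_ACCOUNT}:layer:${LAYER_NAME}:${LAYER_VERSION}`;
}

console.log(pandasLayerArn("us-east-1"));
// prints "arn:aws:lambda:us-east-1:336392948345:layer:AWSSDKPandas-Python313:4"
```

In a CDK app, the resulting string could be passed to `lambda.LayerVersion.fromLayerVersionArn(this, "AwsSdkPandasLayer", arn)` and the returned layer added to the function's `layers`, together with `memorySize: 512` to follow the note above.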

@varunvilva-kickdrum

Thanks so much, @ottokruse ! Really appreciate your help and the suggestion — it worked perfectly. 🙏

@ottokruse
Author

Nice!

@khoa-tran-hds

Great solution, thank you for the detailed info!
