```ts
// Creating an AWS Lambda Layer with pandas and pyarrow is harder than it might seem:
// simply running `pip install pandas pyarrow` produces a deployment package larger than
// 250 MB, which AWS Lambda does not allow.
// In this snippet, the deployment package is trimmed down so that it fits (and still works).
import * as lambda from "aws-cdk-lib/aws-lambda";

const layerInstallCommand = [
  "bash",
  "-c",
  [
    "mkdir /asset-output/python",
    "pip install -r requirements.txt -t /asset-output/python",
    "rm -rf /asset-output/python/botocore", // The Lambda runtime already includes botocore
    "rm -rf /asset-output/python/boto3", // The Lambda runtime already includes boto3
    "rm -rf /asset-output/python/bin", // No need to execute CLI binaries
    "find /asset-output/python -name '*.so' -type f -exec strip \"{}\" \\;", // Strip symbols from shared libraries to save space (copied from the AWS Data Wrangler build)
    "find /asset-output/python -depth -regex '.*/tests' -exec rm -r {} +", // Get rid of tests
    "find /asset-output/python -depth -regex '.*/__pycache__' -exec rm -r {} +", // Get rid of __pycache__ directories
    "find /asset-output/python -type f -regex '^.*\\.py[co]$' -delete", // Get rid of compiled bytecode files
  ].join(" && "),
];

const pandasLayer = new lambda.LayerVersion(this, "PandasLayer", {
  code: lambda.Code.fromAsset(
    "../pandas_layer", // Point this at a directory with a requirements.txt that lists pandas and pyarrow
    {
      bundling: {
        image: lambda.Runtime.PYTHON_3_9.bundlingImage,
        command: layerInstallCommand,
      },
    }
  ),
  compatibleRuntimes: [lambda.Runtime.PYTHON_3_9],
  compatibleArchitectures: [
    lambda.Architecture.X86_64, // Match this to the architecture of the machine you run `cdk synth` on (!)
  ],
});
```
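For completeness, here is a minimal sketch of attaching the resulting layer to a function. The construct ID, handler, and asset path below are placeholders (not from the original gist); the only requirement from the snippet above is that the function's runtime and architecture match the layer's.

```ts
// Sketch only: wire the trimmed pandas/pyarrow layer into a Python function.
const pandasFunction = new lambda.Function(this, "PandasFunction", {
  runtime: lambda.Runtime.PYTHON_3_9, // must be one of the layer's compatibleRuntimes
  architecture: lambda.Architecture.X86_64, // must match the layer's compatibleArchitectures
  handler: "index.handler", // placeholder handler
  code: lambda.Code.fromAsset("../lambda_src"), // placeholder asset path
  layers: [pandasLayer],
});
```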
Nice!
@varunvilva-kickdrum Unfortunately I don't have the requirements.txt file anymore, but I have a better solution for you:
Instead of building a custom layer, use the AWS SDK for pandas (formerly AWS Data Wrangler) managed layer. It includes PyArrow pre-built and optimized for Lambda, and it now supports Python 3.13!
For Python 3.13, add this layer ARN to your Lambda function:
`arn:aws:lambda:<region>:336392948345:layer:AWSSDKPandas-Python313:4`
Just replace `<region>` with your actual AWS region (e.g., us-east-1).
This is much simpler than building a custom PyArrow layer and should resolve your "Cannot find the pyarrow.libs" error.
Documentation: https://aws-sdk-pandas.readthedocs.io/en/stable/layers.html
Note: Make sure your Lambda function has at least 512 MB of memory when using this layer.
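If you are on CDK, here is a minimal sketch of referencing that managed layer by ARN. us-east-1 is just an example region, the construct IDs, handler, and asset path are placeholders, and `Runtime.PYTHON_3_13` assumes a recent aws-cdk-lib version that defines it.

```ts
import * as lambda from "aws-cdk-lib/aws-lambda";

// Reference the managed AWS SDK for pandas layer by its ARN.
const awsSdkPandasLayer = lambda.LayerVersion.fromLayerVersionArn(
  this,
  "AwsSdkPandasLayer",
  "arn:aws:lambda:us-east-1:336392948345:layer:AWSSDKPandas-Python313:4"
);

const fn = new lambda.Function(this, "Python313Function", {
  runtime: lambda.Runtime.PYTHON_3_13,
  handler: "index.handler", // placeholder handler
  code: lambda.Code.fromAsset("./lambda_src"), // placeholder asset path
  layers: [awsSdkPandasLayer],
  memorySize: 512, // per the note above, give the function at least 512 MB
});
```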
Great solution, thank you for the detailed info!
Thanks so much, @ottokruse ! Really appreciate your help and the suggestion — it worked perfectly. 🙏