timesler/deploy_dolly_v2.ipynb

Created April 21, 2023 23:03

Deploy Dolly v2.0 to SageMaker

wllamb commented Apr 25, 2023

Did you have any timeout issues when using this deployment? My deployed endpoint answered one question successfully, but most other requests ended up timing out. Thank you, though: before I found the additions to requirements.txt and ran in 8-bit, my previous deployments were running out of memory.

Author

timesler commented Apr 25, 2023

> Did you have any timeout issues when using this deployment? My deployed endpoint answered one question successfully, but most other requests ended up timing out. Thank you, though: before I found the additions to requirements.txt and ran in 8-bit, my previous deployments were running out of memory.

Yep, I've had similar problems with certain prompts. I plan on testing the 7B version to see if it can respond to more complex prompts fast enough to avoid SageMaker's timeout limit. I think it should be as simple as changing the 12b's to 7b's in the notebook, and it probably doesn't need to be loaded in 8-bit either.
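The notebook does not render on this page, so the following is only a minimal sketch of the change described above, assuming the model is loaded with Hugging Face `transformers`. The helper `load_settings` is hypothetical and is not taken from the notebook. `databricks/dolly-v2-12b` and `databricks/dolly-v2-7b` are the public Hub model IDs, and keeping 8-bit loading only for the 12B model mirrors the guess in the comment rather than a measured requirement.

```python
def load_settings(size: str = "12b") -> dict:
    """Return a Hub model ID and from_pretrained kwargs for a Dolly v2 size.

    Hypothetical helper: the names and defaults are assumptions for illustration.
    """
    if size not in {"3b", "7b", "12b"}:
        raise ValueError(f"unknown Dolly v2 size: {size!r}")

    # device_map="auto" lets accelerate place the weights on the available GPU(s).
    kwargs = {"device_map": "auto"}

    # Assumption: only the 12B model needs 8-bit weights to fit in GPU memory.
    # load_in_8bit requires the bitsandbytes and accelerate packages.
    if size == "12b":
        kwargs["load_in_8bit"] = True

    return {"model_id": f"databricks/dolly-v2-{size}", "kwargs": kwargs}


# Intended use inside the inference code (not executed here):
#   from transformers import AutoModelForCausalLM
#   settings = load_settings("7b")
#   model = AutoModelForCausalLM.from_pretrained(
#       settings["model_id"], **settings["kwargs"]
#   )
```

Whether the 7B model really fits without quantization depends on the instance type, so treat the 8-bit switch as something to verify on the target hardware.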

hooman650 commented Apr 29, 2023

Thanks for the example. We have deployed many conversational models on SageMaker. The challenge is that an endpoint deployed this way does not stream the response, so longer responses often time out.

ulisseshen commented May 3, 2023

> Thanks for the example. We have deployed many conversational models on SageMaker. The challenge is that an endpoint deployed this way does not stream the response, so longer responses often time out.

You can try another conversational pattern for your server/client, such as a WebSocket.
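A WebSocket server is beyond what fits in a short snippet, so here is a different, simpler workaround for the same timeout problem: have the client request the answer in small pieces, so that each individual request stays short. This is a sketch under stated assumptions, not something from the notebook. `generate_in_chunks` and the `invoke` callable are hypothetical names, and the approach assumes the deployed inference code accepts a cap on new tokens and returns only the newly generated text.

```python
from typing import Callable


def generate_in_chunks(
    invoke: Callable[[str, int], str],
    prompt: str,
    chunk_tokens: int = 64,
    max_chunks: int = 8,
) -> str:
    """Build a long completion from several short endpoint calls.

    `invoke(text, max_new_tokens)` is a caller-supplied function that sends
    `text` to the endpoint and returns only the continuation. A real version
    might wrap boto3's sagemaker-runtime `invoke_endpoint`; the payload format
    depends on the inference script, so it is left to the caller.
    """
    text = prompt
    for _ in range(max_chunks):
        piece = invoke(text, chunk_tokens)
        if not piece:
            # An empty continuation is treated as "the model has finished".
            break
        # Feed everything generated so far back in as the next prompt.
        text += piece
    # Return only the generated part, without the original prompt.
    return text[len(prompt):]
```

The trade-off is that every call reprocesses the growing prompt, so total compute goes up even though each request is less likely to hit the timeout.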

IChr1 commented May 30, 2023 (edited)

Has anyone used an inference config with the code above so that the model can handle embeddings?

ybm11 commented Jul 24, 2023

Thanks for sharing, this is helping me a lot in trying to figure this topic out.
One question: why is there a mismatch between the transformers version in the requirements.txt file and the one in the SageMaker model creation command? What is the difference, and why does it make sense for them to differ?
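One plausible reading, assuming the notebook uses the SageMaker Python SDK's `HuggingFaceModel`: the `transformers_version` argument only selects which prebuilt container image is used, while a `requirements.txt` packaged in the model archive's `code/` directory is pip-installed when the container starts. The two can therefore differ, with the pinned requirement taking effect at runtime. The sketch below illustrates that relationship; every version number in it is an illustrative placeholder rather than the notebook's actual value, and `effective_transformers_version` is a hypothetical helper.

```python
# Arguments of the kind passed to sagemaker.huggingface.HuggingFaceModel.
# They pick the prebuilt container image (placeholder versions).
container_args = {
    "transformers_version": "4.26",
    "pytorch_version": "1.13",
    "py_version": "py39",
}

# Contents of code/requirements.txt inside model.tar.gz (placeholder pins).
# These packages are installed on top of the image at container startup.
requirements_txt = "\n".join([
    "transformers==4.28.1",
    "accelerate",
    "bitsandbytes",
])


def effective_transformers_version(container_args: dict, requirements_txt: str) -> str:
    """Return the transformers version the inference code would import.

    A `transformers==X` pin in requirements.txt overrides the version that
    ships in the container image; without a pin, the image's version is used.
    """
    for line in requirements_txt.splitlines():
        line = line.strip()
        if line.startswith("transformers=="):
            return line.split("==", 1)[1]
    return container_args["transformers_version"]


print(effective_transformers_version(container_args, requirements_txt))
```

Under this reading, the image version only has to be one that SageMaker offers, and the requirements file upgrades the library to whatever the model needs.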
