20241122 AWS SageMaker JupyterLab (or any other IDE): set up GitHub username and password
  • Don't use the email you registered with GitHub for commits. Instead, use the noreply proxy email GitHub provides for this purpose: go to 'Settings - Emails' in your GitHub account and you'll find it there (set it as your commit email with `git config user.email`).
  • Don't use your GitHub login password for commits either. Instead, go to 'Settings - Developer Settings - Personal access tokens', create a token, and use it as your password when pushing. Since fine-grained tokens are still in preview, I'm using a classic token for now.

🟒 Different Levels of AWS Resources for Machine Learning Model Training and Deployment

  1. πŸ‘‰ EC2 Instances: Full User Control (Least Pre-built Content)
    With EC2, you have complete control over the entire setup (see the sketch after this list). You need to:
    • Start an EC2 instance (e.g., GPU-enabled for training deep learning models).
    • Install dependencies manually (e.g., Python, ML libraries like PyTorch or TensorFlow).
    • Copy or configure the training script, and handle the training data management (downloading data from S3 or other sources).
    • Run the training process manually using your own code.
    • Manage all aspects of the environment, scaling, and resource management.
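A minimal sketch of the first step only (launching a GPU instance programmatically), assuming boto3 and placeholder values for the region, AMI, and key pair; everything after launch (installing libraries, copying the training script, pulling data from S3) would still be done by hand:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical Deep Learning AMI ID
    InstanceType="g4dn.xlarge",       # GPU instance type for training
    KeyName="my-key-pair",            # hypothetical EC2 key pair
    MinCount=1,
    MaxCount=1,
)
print("Launched:", response["Instances"][0]["InstanceId"])
```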

To apply distributed training for the AWS SageMaker Linear Learner algorithm, you would typically rely on SageMaker's built-in distributed training capabilities. The Linear Learner algorithm supports distributed training by scaling across multiple instances and using multiple GPUs or CPU cores.

How to Apply Distributed Training for the Linear Learner Algorithm in SageMaker

1. Using SageMaker Pre-built Containers with Distributed Training

The SageMaker Linear Learner algorithm provides a straightforward way to train across multiple instances: set the instance_count parameter to more than 1 (see the sketch below).
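
A minimal sketch with the SageMaker Python SDK; the IAM role ARN, S3 bucket, and instance type are placeholders, and the key setting is instance_count=2:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical IAM role

# AWS-managed container image for the built-in Linear Learner algorithm
image_uri = image_uris.retrieve("linear-learner", session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=2,              # >1 enables distributed training across instances
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/linear-learner/output",  # hypothetical bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    predictor_type="binary_classifier",
    mini_batch_size=200,
)
estimator.fit({"train": "s3://my-bucket/linear-learner/train"})  # hypothetical S3 prefix
```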

Steps:
  • Uninstall all VS Code extensions
    Delete C:\Users\*\.vscode\extensions folder
    Reinstall extensions

  • Remove Jupyter kernels (e.g., with `jupyter kernelspec remove <kernel_name>`)

(base) PS D:\github\udacity-nd009t-capstone-starter> jupyter kernelspec list
Available kernels:
  • ⚠️🟒 Issue: training error
[1,mpirank:0,algo-1]<stderr>:../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
[1,mpirank:0,algo-1]<stderr>:../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
[1,mpirank:0,algo-1]<stderr>:../aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
...
[1,mpirank:1,algo-2]<stdout>:  File "train.py", line 675, in <module>
[1,mpirank:1,algo-2]<stdout>:    main(task)
[1,mpirank:1,algo-2]<stdout>:  File "train.py", line 572, in main
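
This CUDA assertion usually means a target label handed to NLLLoss/CrossEntropyLoss is negative or >= the number of classes. A small sanity check that can be run before training (an illustrative helper, not part of the original train.py), assuming the labels are a PyTorch tensor:

```python
import torch

def check_labels(labels: torch.Tensor, n_classes: int) -> None:
    """Fail fast if any label would trip the nll_loss `t >= 0 && t < n_classes` assertion."""
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= n_classes:
        raise ValueError(f"labels must lie in [0, {n_classes - 1}], got min={lo}, max={hi}")

# usage (hypothetical tensor and class count):
# check_labels(train_labels, n_classes=133)
```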

βœ…βœ…βœ… My working code: Create WebDataset from local data files to local .tar files

## example code for webdataset
import webdataset as wds
import io
import json
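
The original snippet is truncated above; the following is a minimal sketch of the same idea using wds.ShardWriter, with hypothetical local image files and a labels.json mapping, not the author's full code:

```python
import json
import os
import webdataset as wds

# Hypothetical inputs: a folder of JPEGs and a {"file.jpg": class_id} mapping.
image_dir = "./images"
with open("./labels.json") as f:
    labels = json.load(f)

# Write samples into local shards data-000000.tar, data-000001.tar, ...
with wds.ShardWriter("data-%06d.tar", maxcount=1000) as sink:
    for fname, label in labels.items():
        with open(os.path.join(image_dir, fname), "rb") as g:
            image_bytes = g.read()
        sink.write({
            "__key__": os.path.splitext(fname)[0],  # sample key, no extension
            "jpg": image_bytes,                     # raw JPEG bytes
            "cls": str(label).encode("utf-8"),      # label stored as bytes
        })
```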

⚠️ Issue

(awsmle_py310) PS D:\github\udacity-cd13926-Building-Apps-Amazon-Bedrock-exercises\Experiments> aws bedrock invoke-model `
>> --model-id anthropic.claude-3-5-sonnet-20240620-v1:0 `
>> --body file://claude_input.json output.json

usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:
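
One likely cause (an assumption on my part; the note above is cut off) is that InvokeModel is exposed under the bedrock-runtime service, not bedrock, so the CLI rejects the subcommand and prints its usage text. A minimal sketch of the same call through boto3, assuming claude_input.json already contains the Anthropic Messages-format request body:

```python
import json
import boto3

# InvokeModel lives on the "bedrock-runtime" client, not "bedrock".
client = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is a placeholder

with open("claude_input.json") as f:
    body = f.read()

response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=body,
    contentType="application/json",
    accept="application/json",
)
print(json.dumps(json.loads(response["body"].read()), indent=2))
```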
  • ☝️ Check my Google Docs


🟒⚠️ It is hard to launch (via roslaunch) the Gazebo world that I created with udacity_office.launch.

🟒 ChatGPT output (could be wrong; verify carefully)

To continuously export SAP HANA data in real time, you can use several methods depending on the target system and the purpose. Here are some of the most common approaches:

1. Smart Data Integration (SDI)

  • Use Case: Real-time data replication and transformation.
  • How: SDI allows you to create real-time data replication tasks between SAP HANA and other systems. You can define data flows that continuously export data from HANA and send it to another system, such as another HANA instance or a non-HANA database.
  • Steps:
    1. Set up a Data Provisioning Agent.
    2. Configure the SDI connection to the target system.