Created
January 20, 2019 05:44
Dataflow lifecycle | |
concerns when migrating from on-premises to Google Cloud
code snippet to troubleshoot and diagnose | |
Part 2 - Hands-on with tools | |
Role of Cloud Architect | |
plans, designs and builds the infrastructure for an org to host their workload on GCP; able to plan to scale; | |
scalability and automation | |
The Importance of Hands-on Practice | |
Practice | |
Core Management Services | |
Cloud Resource Manager(Quotas, IAM, Billing) | |
Management Services: (IMPORTANT FOR EXAM) | |
Organization Node and Folders | |
Org -> Folders -> Projects -> Resources | |
Org - Highest root node for all GCP resources; Org Admin(Highest level, useful for Auditing), Org Owner(reserved for G suite super admin) | |
Folder - Group projects under org; share common IAM policies; Roles granted to folder | |
Quotas | |
caps on resources you can create; ex: 48 CPU per region, 5 static IP's per project; prevent unexpected spikes in usage; | |
3 Types - Resources per project, API rate limit requests per project, Per region | |
Increasing Quota caps - soft caps can be raised by request; support ticket or self service form; quota can be viewed on console; proactively request | |
Labels | |
Method of organization and segregation (projects & folders); labels are a tool for organizing GCP resources; any resource can be labeled (via console, gcloud or API)
64 labels per resource; key:value pair; Ex: Environment - env:prod, env:test; Owner or POC - owner:matt, contact:devops; Team or cost center - team:research, | |
team:marketing; App component - component:backend, component:frontend; Resource set - state:readyfordeletion, state:inuse | |
Tags(only for network/VPC resources, affect resource operations) | |
IAM & Admin -> Select entire Org -> Add -> Select Role -> Resource Manager, Select Folder Admin and Org Admin -> Create Folder | |
Viewer, Editor, Owner - Primitive Permissions for GCP resources | |
Using CLI: | |
gcloud projects get-iam-policy pwnet-test1 --format json > iam.json | |
ls | |
nano iam.json | |
gcloud projects set-iam-policy pwnet-test1 iam.json | |
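The edit step in nano amounts to adding or changing an entry under bindings. As a rough sketch, assuming the standard IAM policy JSON layout, the file might look like this after granting one extra user the Viewer role. The member addresses and the etag value are placeholders, not values from these notes. The etag returned by get-iam-policy should be left as downloaded so that set-iam-policy can detect concurrent changes.

```shell
# Write an illustrative policy file; a real iam.json comes from get-iam-policy.
# Both member addresses and the etag are hypothetical placeholders.
cat > iam-example.json <<'EOF'
{
  "bindings": [
    {
      "role": "roles/owner",
      "members": ["user:admin@example.com"]
    },
    {
      "role": "roles/viewer",
      "members": ["user:new-user@example.com"]
    }
  ],
  "etag": "PLACEHOLDER_ETAG",
  "version": 1
}
EOF

# List the roles that the policy file grants.
grep '"role"' iam-example.json
```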
Create Custom role for additional permissions | |
Service accounts | |
Project Creator, Billing Account Creator access required for creating new projects | |
gcloud config list | |
gsutil ls gs://pwnet-bucket1 | |
gsutil cp gs://pwnet-bucket1/* . # copy all objects in the bucket to the current directory
gsutil cp file.txt gs://pwnet-bucket1 # access denied if the instance's Cloud Storage API access scope is read-only; edit the instance to grant Read Write access
IAM Best Practices | |
Use Principle of least privilege | |
Restrict service account access | |
Restrict Service account admin role
Careful with Owner role (Owner can change IAM policy) | |
Rotate service account keys periodically | |
Auditing - Cloud Audit Logs, Export Logs to cloud storage, restrict log access | |
Billing: | |
The bigger you scale, the greater the number of resources to bill for
Billing roles defined in IAM | |
Org is top in Hierarchy; Billing accounts are linked to projects (requires the Billing Account User role)
view billing info - 1.web console 2.export to cloud storage and big query 3.set budgets and alerts | |
find all charges that were more than 3 dollars: | |
SELECT product, resource_type, start_time, end_time, | |
cost, project_id, project_name, project_labels_key, currency, currency_conversion_rate, | |
usage_amount, usage_unit | |
FROM `cloud-training-prod-bucket.arch_infra.billing_data` | |
WHERE (cost > 3) | |
find which product had the highest total number of records: | |
SELECT product, COUNT(*) AS record_count
FROM `cloud-training-prod-bucket.arch_infra.billing_data`
GROUP BY product
ORDER BY record_count DESC
LIMIT 200
which product most frequently cost more than a dollar: | |
SELECT product, cost, COUNT(*) AS charge_count
FROM `cloud-training-prod-bucket.arch_infra.billing_data`
WHERE (cost > 1)
GROUP BY cost, product
ORDER BY charge_count DESC
LIMIT 200
Stackdriver | |
suite of tools for monitoring, logging, and tracking diagnostics for apps; native monitoring of both GCP and AWS; Dynamically discover all GCP resources | |
1.Monitoring - monitor metrics, health checks, dashboard and alerts etc | |
2.Logging - audit of activity | |
3.Error Reporting - identify and understand app errors | |
4.Trace - app engine find bottlenecks | |
5.Debugger - find/fix code errors in prod | |
Benefits - Multicloud monitoring, Identify trends and prevent problems before they occur, Centralized logging, Better signal-noise ratio, Find & fix problems faster | |
3rd party integrations (SRE vendors) - BMC, Splunk, PagerDuty, Tenable, HipChat, netskope | |
Best practice - single project for stackdriver monitoring, determine monitoring needs in advance, IAM controls | |
Concepts: | |
Pricing - Basic and Premium (separate from GCP account status); applies only to monitoring; new accounts get a 30 day free trial
$8 per month per resource; 30 days log retention, 500 time series per chargeable resource, 250 metric types per project | |
Stackdriver agent - software installed on VMs; recommended, not required; agentless gets CPU, disk/network traffic, and uptime info; the agent accesses
additional resource and application info; requires premium tier; monitors many 3rd party apps (Apache, Kafka, MySQL, Nginx, Tomcat etc)
$ curl -sSO https://dl.google.com/cloudagents/install-monitoring-agent.sh | |
$ sudo bash install-monitoring-agent.sh | |
Resources -> Metric Explorer, Cloud Storage etc | |
Groups -> Create Groups (select a project with group of instances) | |
Dashboard -> Create Dashboard | |
Explore resource, Alerting, uptime checks | |
Stackdriver logging: | |
Concepts - repository for log data and events; store, search, analyze, monitor and alert; collect platform, system and app logs(agent); realtime/batch | |
Associated by project; Log entry - record status or event; Log - named collection of log entries; retention period | |
Audit Log Types - 1.Admin Activity(automatically turned on, requires IAM role logging/Logs viewer or Project viewer, always enabled no charge), | |
2.Data Access(create modify or read user-provided data, requires IAM role logging/Private Logs viewer or Project Owner, Disabled charged on usage) | |
Retention - Admin activity 400 days; Data access logs 7/30 days, Non audit logs 7/30 days | |
Allotment - 50Gb per project / 50+14.25MB premium, overage charge $0.50 per GB per project | |
Exporting logging data - 1.Cloud storage 2.BigQuery 3.stream to other source (pub/sub); requires project/destination bucket; create a filter;
choose destination; filter and destination held in a sink | |
Best practices - search for specific values, use adv filters, use adv viewing interface | |
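As a worked example of the allotment and overage figures in these notes (50 GB per project, $0.50 per GB beyond that), the sketch below computes the charge for a project that ingests 80 GB of logs in a month. The 80 GB usage figure is made up for illustration.

```shell
# Allotment and rate come from the notes; usage_gb is a hypothetical value.
allotment_gb=50
rate_per_gb=0.50
usage_gb=80

# Only the volume above the free allotment is charged.
overage_cost=$(awk -v u="$usage_gb" -v a="$allotment_gb" -v r="$rate_per_gb" \
  'BEGIN { over = (u > a) ? u - a : 0; printf "%.2f", over * r }')
echo "Overage charge: \$${overage_cost}"
```

With these inputs the chargeable volume is 30 GB, so the script reports an overage of $15.00.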
HandsOn: | |
view logs | |
filter(basic/advanced views) | |
turn on real time viewing | |
export logs to cloud storage/big query | |
enable data access logs | |
gcloud projects get-iam-policy pwnet-test2 --format yaml > policy.yaml
ls | |
nano policy.yaml # add auditConfigs: | |
gcloud projects set-iam-policy pwnet-test2 policy.yaml | |
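The auditConfigs section added in the editing step might look like the sketch below, assuming Data Access logs are wanted for every service. The file name is hypothetical and the stanza is written to its own file only for illustration; in practice it is added to the downloaded policy file next to the existing bindings and etag.

```shell
# Illustrative stanza only: enables all three Data Access log types
# for all services. Merge it into the real policy file before set-iam-policy.
cat > audit-config-example.yaml <<'EOF'
auditConfigs:
- auditLogConfigs:
  - logType: ADMIN_READ
  - logType: DATA_READ
  - logType: DATA_WRITE
  service: allServices
EOF

cat audit-config-example.yaml
```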
Trace, Error Reporting, and Debugger Concepts | |
Error reporting - real time error monitoring; automatic and real time analysis; automatically enabled in App Engine; | |
Trace - find performance bottlenecks(latency); collect data from GAE, LB, or apps with Stackdriver Trace SDK;automatically enabled in App Engine | |
Debugger - Inspect app state without stopping or slowing app; doesn't require additional log statements; automatically enabled in App Engine standard
GCP Core Building Blocks | |
Google Cloud Storage - Unstructured data, virtually limitless size, Pay per use not allocation, primary unit is bucket, object inside bucket | |
Storage Class - Regional, Multi-regional, Nearline, Coldline | |
Changing storage class - cannot change from multi-regional to regional or vice versa; use gsutil to change the class of an existing object or move the object to another bucket
gsutil(FOR CLOUD STORAGE) | |
https://cloud.google.com/storage/docs/gsutil | |
gsutil mb -l us-central1 -c nearline gs://pwnet-test1-test | |
gsutil ls -l gs://pwnet-test1-test | |
Cloud Storage Security | |
Access Management principles - IAM and ACL | |
IAM - granted at projects, resource or bucket level; Roles - Primitive, Standard Storage roles (independently from ACLs), Legacy roles (work with ACLs) | |
ACLs - can be applied to buckets/objects; objects inherit ACLs from the default bucket ACL
Best Practice - use IAM over ACL (enterprise-grade access control, leaves audit trail); use ACL to grant access to an object without access to the bucket
signed URLs - timed access to object data (temporary access without a Google account)
storage.cloud.google.com/bucketname | |
Assign IAM role to bucket | |
via console | |
gsutil iam ch user:USER_EMAIL:objectCreator,objectViewer gs://pwnet-test1-test # USER_EMAIL = the user's email address
gsutil iam ch -d user:USER_EMAIL:objectCreator,objectViewer gs://pwnet-test1-test # -d removes the roles
Assign ACL role to bucket and objects | |
via console | |
gsutil acl ch -u USER_EMAIL:O gs://pwnet-test1-test
gsutil acl ch -d USER_EMAIL gs://pwnet-test1-test
gsutil acl ch -u USER_EMAIL:O gs://pwnet-test1-test/3.png # only access to object
gsutil -m acl ch -u USER_EMAIL:O gs://pwnet-test1-test/*
Mixed owner/read permissions | |
Storage Legacy Bucket Owner - create, upload, delete file but cannot view the contents | |
Storage Object Creator, Storage Object viewer - create, upload and view but cannot delete | |
signed URLs | |
APIs & Services -> Create Credentials -> Service account key -> New service account -> select name and role -> create (JSON file downloaded) | |
ssh session -> upload file -> mv pwnet-test1-xedefdefefe.json pwnet-cert.json | |
gsutil signurl -d 10m pwnet-cert.json gs://pwnet-test1-test/3.png | |
get the URL from output and give it to user who need access to the object | |
Object Versioning and Lifecycle Management Concepts | |
Object versioning - retrieve objects that are deleted or overwritten; applied at bucket level; disabled by default; when enabled objects archived | |
versions increase bucket size; archived versions retain ACLs; Versioning properties - Generation (object content change), Metageneration
Object Lifecycle management | |
Sets TTL on an object(to delete version/downgrade storage class); Applied to bucket level ; implemented with combination of rules, conditions, actions | |
Rule - Specify set of conditions in order to take action | |
Condition - criteria to meet before action; Age, CreatedBefore, IsLive, MatchesStorageClass, NumberOfNewerVersions
Actions - Delete, SetStorageClass | |
gsutil versioning help | |
gsutil versioning get gs://pwnet-test1-test | |
gsutil versioning set on gs://pwnet-test1-test | |
gsutil ls -a gs://pwnet-test1-test | |
gsutil lifecycle get gs://pwnet-test1-test > policy.json | |
edit the file to change the rule | |
gsutil lifecycle set policy.json gs://pwnet-test1-test | |
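A lifecycle policy file combines the rules, conditions and actions described above. The sketch below writes an example with two rules: one moves Regional objects older than 30 days to Nearline, the other deletes archived versions once three newer versions exist. The thresholds and the file name are arbitrary choices for illustration; in the JSON, the NumberOfNewerVersions condition is spelled numNewerVersions.

```shell
# Example lifecycle policy; ages and version counts are illustrative only.
cat > policy-example.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30, "matchesStorageClass": ["REGIONAL"]}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"isLive": false, "numNewerVersions": 3}
    }
  ]
}
EOF

# Count the rules defined in the file (one "action" per rule).
grep -c '"action"' policy-example.json
```

A file like this would then be applied with gsutil lifecycle set, as in the commands above.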
Bucket and Object Command Line A-Z | |
gsutil ls -al gs://pwnet-test1-test #gets metageneration | |
gsutil -m rewrite -s NEARLINE gs://pwnet-test1-test/* # turn versioning off first, then rewrite objects to a different storage class
gsutil acl ch -u AllUsers:R gs://pwnet-test1-test/file.txt # shows as public link on console | |
Interconnecting Networks | |
Worldwide private network; communication between regions and on-premises never touches public internet; networking handled differently than others. | |
SDN - traditional network(manage network hardware, high mgmt overhead req) SDN(Everything is virtualized) | |
single global/cross region VPC; global internal DNS/load balancing/firewalls/routes; global public DNS; Rapid scaling with global LB(Layer 7/HTTP); | |
Subnets within VPC group resources by region/zone; IP range between subnets dynamically expandable. | |
Extend Google Private Network to On-premises - VPN, Cloud Interconnect, Direct Peering | |
Connecting your Network to Google | |
1. Dedicated Interconnect - Physically connect on-premise network to GCP VPC via Google Edge location; Useful for Hybrid env, High bandwidth traffic; | |
Must be at supported peering location; can be direct with Google or ISP; $1700 per 10 Gbps link, up to 80 Gbps total; Reduced egress fees
Use Cases - On-premise data processing, low latency needs, | |
2. Peering - connect business directly to google; 70+ location in 33 countries for Direct peering; Exchange BGP routes; Direct and Carrier Peering; | |
Does not connect to internet; Also saves on egress fees; 10 Gbps per link (direct), variable for carrier; Use case Ex: Private API access
3. Cloud VPN - Site to site VPN connection over IPSec; connect internal network to GCP over encrypted tunnel over public internet; Up to 1.5 Gbps per tunnel; | |
Can use multiple tunnels for increased performance; Static and dynamic routes(using Cloud Router); Supports IKEv1 and IKEv2 using shared secret; | |
connect on-premises to GCP or connect two different VPCs on GCP; No site-to-client option available.
CloudVPN | |
connect on-premise network to GCP VPC; IPSec connection over VPN over public internet; traffic encrypted by one gateway, decrypted by other gateway. | |
99.9% SLA, Site-to-site only; Up to 1.5 Gbps per tunnel, can have multiple tunnels; Static and Dynamic routes
Use case - Connect to on-premises or connect 2 different VPC network on GCP | |
Requirement - VPN Gateway on both ends(peer), Peer Gateway must have static IP; Non conflicting CIDR range/subnet with rest of network | |
Cloud Router - Static vs Dynamic routing; Static: create routing table for existing and new routes, can't re-route if link fails; Dynamic: networks
automatically discover topology changes via BGP; can re-route if link fails
To use Dynamic routing, change dynamic routing mode to Global on VPC network. | |
Google ASN(65000-65001) and BGP address(169.254.0.1-169.254.0.2) required | |
Tunnel IP is static IP of other VPN Gateway | |
Add BGP session for Dynamic Routing | |
gsutil cp gs://gcp-course-exercise-scripts/vpn-exercise-script.sh . | |
bash vpn-exercise-script.sh | |
Virtual Networks | |
VPC Concepts | |
subnets are region bound and can span multiple zones
isolated per project; but can share between projects with Shared VPC | |
Quotas - Hardcap of 7000 VMs in a VPC; IPv4 unicast traffic only; Most other quotas can be increased by request | |
Network Tags - primary method of segmenting network traffic access; apply to firewall and network routes; individual instances are tagged | |
Firewall - single firewall for entire VPC; manage both ingress and egress traffic; Deny all Ingress, Allow all egress; Conditions - source/target, port, protocols, Tags | |
create firewall rules - ssh-icmp-instance2(Tag: restrict-access); internal-allow-all; ssh-allow; ping-allow | |
Firewall Rules via Command Line - vnc-desktop | |
firewall rule for port 5901 | |
vnc-allow; Target tag vnc-server; tcp:5901 # get the command line and paste in CLI | |
gcloud compute firewall-rules create vnc-allow ...... | |
gcloud compute instances add-tags vnc-desktop --tags vnc-server | |
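The notes elide the flags of the firewall-rules command. One plausible form, matching the rule name, target tag and port listed above, is sketched below as a dry run that only prints the command. The network name (default) and the source range are assumptions, and 0.0.0.0/0 should be narrowed to known client addresses for anything beyond a lab.

```shell
# Dry run: build the command as a string and print it instead of executing it.
# "--network default" and the source range are assumptions for illustration.
FIREWALL_CMD="gcloud compute firewall-rules create vnc-allow \
--network default --direction INGRESS --action allow \
--rules tcp:5901 --target-tags vnc-server --source-ranges 0.0.0.0/0"

echo "$FIREWALL_CMD"
```

Only instances carrying the vnc-server tag are matched by the rule, which is why the add-tags step is needed.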
Routes - software based, not limited by hardware; routes traffic leaving VMs; special case for advanced routing Many-to-one route, Proxy server; | |
Routes+firewall rules combine to determine traffic access | |
Shared VPC Concepts | |
share VPC across projects within Org(Cross Project Networking) | |
Host project - project hosting the shared VPC; Service project - project with permission to shared VPC; Standalone project - project not using shared VPC; | |
Shared VPC admin - IAM role for admin of shared VPC; Service project admin - project admin of shared VPC service project | |
Use cases - Separation of projects for access control/billing, but need access to same VPC environment; 2 tier web service; Hybrid cloud scenario
IAM roles - Org Admin, Shared VPC admin(Org level role), Network user-compute.networkUser(Project level role) | |
The Compute Shared VPC Admin role must be granted to the user who enables Shared VPC;
gsutil cp gs://gcp-course-exercise-scripts/firewall-exercise-script.sh . | |
Compute Engine Deep Dive | |
GCE, GKE, GAE all run on VMs; Single VM, Force multipliers, Automation, Autoscaling, Managed Instance Groups, Load Balancer, Custom Image, Disk manipulation, | |
Metadata, Startup/Shutdown scripts,Snapshots, Persistent Disks, gcloud commands | |
Disk concepts - Single root disk for OS; Persistent(most common, default, Not Directly attached) or Local SSD (Directly attached) or Cloud Storage Buckets | |
Persistent - 64 TB in total, Scope of access zone, no RAID config necessary | |
Local SSD - cannot be boot device, encrypted (Google-supplied keys only), 375 GB in size (can attach up to 8), must be created at instance creation
Cloud Storage Bucket - Not a root disk, Encrypted, Lower performance | |
Disks are zone bound | |
gcloud compute disks create disk03 --size 50GB --type pd-standard | |
sudo lsblk | |
sudo growpart /dev/sda 1 | |
sudo resize2fs /dev/sda1 | |
Images Concepts | |
Images - create new instances, configure instance templates; access across projects | |
Snapshots - periodic incremental backup of existing disk/instance, access only from within same project | |
Images created from Persistent disk, another image in same project, image shared from another project, compressed image from cloud storage | |
Image families(group related images together); Deprecating images - transition user away from older unsupported version in manageable way, Deprecation | |
states: Deprecated, Obsolete, Deleted, Active(command only) | |
Sharing and moving images - Require Compute Engine Image User role to host project; For managed instance group, service account must be granted role | |
Export image to cloud storage - export image as a tar.gz to cloud storage(only linux); share with Image User role is preferable | |
Hands On - Custom Images | |
gcloud compute images describe-from-family webserver | |
gcloud compute images deprecate webserver-base --state ACTIVE | |
Snapshot Concepts | |
For windows use VSS snapshots; run fstrim before taking snapshot(linux) | |
gcloud compute disks snapshot website --zone us-central1-a --snapshot-names=website-backup-2 | |
gcloud compute snapshots list | |
gcloud compute snapshots describe website-backup | |
Startup and Shutdown scripts | |
ease management of a large number of VMs; easily and programmatically customize VMs; key component of instance groups and scaling capabilities
always run as root/administrator; Input methods - Direct (script field in instance properties), Link to script on Cloud Storage | |
Shutdown scripts - great for managed instance group/autoscaler; Ex: copy processed data to cloud storage, backup logs etc; Good to pair with preemptible | |
Metadata server - Built into GCP; Manage config and env variables programmatically; Default and custom values; Key/value pair | |
Metadata -> startup-script-url gs://pwnet-bucket1/startup_script.sh | |
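A startup script referenced through startup-script-url is an ordinary shell script stored in a bucket. The sketch below writes a small example that installs a web server and reads a custom metadata value from the metadata server. The package choice and the site-name metadata key are hypothetical; the script is only syntax-checked here, not executed.

```shell
# Write an example startup script; it would run as root on the instance at boot.
cat > startup_script.sh <<'EOF'
#!/bin/bash
apt-get update
apt-get install -y apache2

# Read a custom metadata value; "site-name" is a hypothetical key.
SITE_NAME=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/site-name")
echo "Hello from ${SITE_NAME}" > /var/www/html/index.html
EOF

# Check the script for syntax errors without running it.
bash -n startup_script.sh && echo "syntax OK"
```

After copying the file to the bucket with gsutil cp, the startup-script-url metadata key points the instance at it.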
Elastic Cloud Infrastructure: Scaling and Automation | |
Load Balancing and Instance Groups | |
Force Multipliers - Automation and Scaling - Scalable, Automatic | |
Repeatable, documented, scalable, necessary for large architecture, reduce complexity
Load Balancer, Instance Group, Autoscaling | |
Load Balancer - distributes user network requests among a pool of instances; single frontend point of access; SDN; Global or regional in scope; | |
traffic subject to firewalls; Types - Global External LB(HTTP/s, SSL/TCP Proxy), Regional External LB(Network TCP/UDP), Regional Internal LB | |
HTTP(S) LB - Manages HTTP(s) requests; Global scope, IPv4 and IPv6, Distribute traffic by location or content requested; Paired with IG for backend; | |
Native support for websocket protocol | |
Network LB - non HTTP(s); balance requests by IP protocol data; Forwarding rules, Target pool | |
Network Internal LB - private LB; used with multi-tier app; affects cloud router dynamic routing | |
Instance Group and Autoscaling | |
IG - group of instances; Manages as a group not one at a time; Managed IG and Unmanaged IG; | |
Features - Autoscale, work with LB, Health check-ASG; Require Instance templates(Define Group config, Global); From template create Managed IG | |
Networking - subject to firewall rules for allowed traffic; essential for LB; LB->Backend Service->Backend->IG ; | |
Health checks - Auto healing; Managed IG only; if instance or service fails, delete and recreate identical | |
Updating Managed IG - Managed Instance Group updater; | |
Autoscaling - automatically scales IG; Managed IG only; Set by autoscaling policy; Set metric and threshold; set min and max instance count | |
AS based on CPU usage, HTTP load balancing usage, Stackdriver monitoring metric, Multiple metrics | |
simulate Load Testing via instance | |
ab -n 500000 -c 1000 http://xx.xxx.xxx.xxx/ | |
Cloud Deployment Manager Concepts | |
Infra deployment service; Automates creation/deployment of GCP resources(configuration files and templates); Standardize and repeatable; | |
Used by Cloud Launcher to create easy one-click deployments
How it works - Deploy with command line only; IaC; calls on API resources; Configuration file YAML format; contain resource section followed by list | |
of resources; Resource components - Name, Type, Properties; Templates - config file contains templates; Python or JINJA2 format; reusable | |
Manifest - Read only output of final config; Includes config Yaml, imported templates, expanded resource list; use for troubleshooting | |
vm.yaml | |
API call needs project name | |
gcloud deployment-manager deployments create test-deployment --config vm.yaml | |
gcloud deployment-manager deployments delete test-deployment | |
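A vm.yaml for the deployment above could look like the following sketch, which follows the resources / name / type / properties layout described in these notes. The resource name, zone, machine type and image family are assumptions, and MY_PROJECT is a placeholder: the machine type and network are given as full API URLs, which is why the project name is needed.

```shell
# Write an example Deployment Manager config; replace MY_PROJECT with a real
# project ID before deploying. All values are illustrative.
cat > vm-example.yaml <<'EOF'
resources:
- name: test-vm
  type: compute.v1.instance
  properties:
    zone: us-central1-a
    machineType: https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/us-central1-a/machineTypes/f1-micro
    disks:
    - deviceName: boot
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/family/debian-12
    networkInterfaces:
    - network: https://www.googleapis.com/compute/v1/projects/MY_PROJECT/global/networks/default
      accessConfigs:
      - name: External NAT
        type: ONE_TO_ONE_NAT
EOF

cat vm-example.yaml
```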
GKE/GAE Exam Perspective | |
Infrastructure, how to build, how to manage, best practices | |
GKE/GAE are managed Infra, developer/code focused; High level understanding of GKE/GAE; when to choose one of them over other options; | |
Managing app engine versions, resizing k8s engine cluster | |
Containers | |
Container Resources | |
Container Builder, Container Registry, GKE | |
Container/Kubernetes Engine Cluster | |
gcloud container clusters create bookshelf --zone us-central1-a --machine-type f1-micro --num-nodes 3
gcloud command to input for changing the size of node pool | |
gcloud container clusters resize bookshelf --size 5 | |
gcloud command to change machine type without stopping cluster | |
migrate the instances to different node | |
gcloud container clusters delete bookshelf | |
App Engine Resources and Management | |
Cloud Source Repository - Private Git repo hosted on GCP; Use with stackdriver to debug info alongside your code; connect to github/bitbucket; | |
source code browser | |
GAE Management - Cloud shell(preview in local env without deploying); versions+split traffic(Rollout update slowly); | |
Firewall rules act differently(Default allow all, control access from IP ranges, cannot filter traffic type, Block malicious IP); | |
Best Practices - Break app into microservices; Rollout update slowly with split traffic; Use blue-green deployment model | |
Go to App Engine directory, create sandbox env using "dev_appserver.py ./app.yaml" | |
Build and Deploy a Scalable Company Website | |
Deploy a Cloud Network Monitoring Service to Monitor On-Premises Network |