Google Cloud Certified - Professional Cloud Developer
Cloud computing has five fundamental attributes
- Customers get resources on demand and self-service
- Customers can access the resources over the network
- The provider has a pool of resources, so the customer doesn't know the exact location of a resource
- Customers can scale resources rapidly
- Customers pay only for what they use
Waves of the trend towards cloud computing
- Colocation: you own the servers, but they are hosted in a provider's facility
- Virtualized data centers: you still control the infrastructure, but it is virtualized
- Container-based architecture: more versatile than virtualization
Kinds of services
Traditional on-premises (legacy)
Infrastructure as a service (IaaS)
- You rent the hardware, but you are still in charge of everything else. You deploy your app in a virtual machine.
- You pay for what you allocate
- (Amazon EC2, DigitalOcean)
Platform as a service (PaaS)
- You rent the hardware and the platform software, and you build your application on top of the platform
- You pay for what you use
- (Heroku, Salesforce)
Software as a Service (SaaS)
- You rent everything; you just use the software (PayPal, Facebook)
Regions and Zones
Regions are independent areas that contain zones. The round-trip latency between two points in the same region is under 5 milliseconds. A zone is the minimum area for a failure, so to get a resilient application, it must be deployed across multiple zones.
As of 2019 there were 20 regions.
A Compute Engine VM instance resides in a specific zone
Resources
There are regional and multi-regional resources. Multi-regional resources have higher latency but are more fault-tolerant. These services offer regional and multi-regional deployments:
- Google App Engine
- Google Cloud Datastore
- Google Cloud Storage
- Google BigQuery
Resource hierarchy
- Organization node
- A policy can be defined
- When first created, all users can create projects and billing accounts
- Folder (optional)
- A policy can be defined
- Can contain more Folders
- If a folder exists, an organization node must exist as well
- Project
- A policy can be defined
- It is the basis for enabling and using services
- It can have different owners and users
- They are billed and managed separately.
- It has to have:
- Project ID (chosen by you, globally unique, immutable)
- Project Name (need not be unique)
- Project number (assigned by Google, unique, immutable)
- Some resources allow a policy to be defined on them
- A resource belongs to only one project
Policies are inherited across the tree
A policy applied at a lower level cannot remove access granted at a higher level. If several rules apply, the less restrictive one prevails.
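The inheritance rule can be sketched in a few lines of Python (an illustrative model, not a real GCP API; the accounts are made-up examples):

```python
# Illustrative sketch: effective IAM access is the union of the policies
# granted at each level of the resource hierarchy. Grants are additive;
# a lower level cannot revoke what a higher level granted.
def effective_members(role, policies_along_path):
    """policies_along_path: list of {role: set(members)} dicts, ordered
    from the organization node down to the resource."""
    members = set()
    for policy in policies_along_path:
        members |= policy.get(role, set())
    return members

org = {"roles/viewer": {"user:alice@example.com"}}
project = {"roles/viewer": {"user:bob@example.com"}}
# Both alice (granted at the org) and bob (granted at the project) can view.
print(effective_members("roles/viewer", [org, project]))
```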
Google Cloud billing
- Billed in seconds (compute and data processing):
- Compute Engine
- Kubernetes Engine
- Cloud Dataproc (Hadoop, the open source big data system, as a service)
- App Engine flexible environment VMs
- Discount for every machine used more than 25% of a month (sustained use)
- Discount for long-term workloads (committed use)
- Discount for preemptible use
- Pay only for the resources you need
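The sustained-use discount can be made concrete with a small calculation; the tier rates below match the historically published N1 tiers, but treat the exact numbers as illustrative:

```python
# Sketch of sustained-use discount math: once usage passes 25% of the
# month, each additional quarter of the month is billed at a steeper
# discount. Rates here are the historical N1 tiers (illustrative).
TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]

def effective_multiplier(fraction_of_month):
    """Average price multiplier for a VM running this fraction of a month."""
    billed, remaining = 0.0, fraction_of_month
    for width, rate in TIERS:
        used = min(width, remaining)
        billed += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return billed / fraction_of_month

# A VM that runs the entire month pays an average of 70% of the base
# rate, i.e. the 30% full-month discount mentioned later in these notes.
print(round(1 - effective_multiplier(1.0), 2))  # 0.3
```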
Open APIs
- Cloud Bigtable uses the Apache HBase interface
- Cloud Dataproc offers Hadoop as a managed service
- TensorFlow: software library for machine learning
- Kubernetes and Google Kubernetes Engine can mix microservices across different clouds.
- Google Stackdriver lets customers monitor workloads across multiple cloud providers
Security
- Cryptographic signatures over the boot stack:
- BIOS
- bootloader
- base operating system image
- Physical security in data centers
- Google also hosts some servers in third-party data centers, with extra physical protections
- Encryption of inter-service communication
- Services communicate with RPC calls
- Cryptographic privacy for RPCs
- Use of hardware cryptographic accelerators
- User identity protection, beyond login and password:
- device used to log in
- location of login
- second-factor authentication
- Encryption of the storage media
- Google Front End (GFE)
- TLS encryption
- Protection against Denial of Service
- absorb many DoS attacks by nature of the scale
- multi-tier and multi-layer protection
- Intrusion detection
- Software development practices:
- central review control
- two-party review of new code
Service categorization
- Compute
- Storage
- Big Data
- Machine Learning
- Networking
- Operations/Tools
Budgets and alerts
- Budgets are defined at the billing account
- Alerts at a % of the budget
- Billing data can be exported, for example to BigQuery or Cloud Storage
- Reports can be defined per project or per service
- Quotas are applied at the project level
- A rate quota resets after a specific time
- An allocation quota caps the number of resources
- Increases to either kind of quota are requested from Google Cloud Support
Google Cloud Identity and Access Management
Defines who can do what on which resource; like an ACL, it allows somebody to perform an action on a resource.
Who:
- Google account
- Google group
- Service account
- They authenticate using keys
- Cloud Identity or G Suite Domain
Can do what:
- Managed by roles; a permission has the form service.resource.verb
- Primitive roles (apply to the whole project):
- Owner
- Editor
- Viewer
- Billing Administrator
- Predefined roles grant particular permissions on particular services
- Custom roles can only be used at the project or organization level
On which resource
Identities can be synced one-way from an existing LDAP directory (e.g., with Google Cloud Directory Sync)
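As a sketch of how "who", "can do what", and "on which resource" tie together, this is the general JSON shape of an IAM policy as returned by getIamPolicy (the member identities and etag value are made-up examples):

```python
# An IAM policy attached to a resource: each binding maps one role to
# the list of identities ("who") that hold it on that resource.
policy = {
    "bindings": [
        {
            "role": "roles/storage.objectViewer",
            "members": [
                "user:alice@example.com",
                "serviceAccount:app@my-project.iam.gserviceaccount.com",
                "group:auditors@example.com",
            ],
        }
    ],
    "etag": "BwWKmjvelug=",  # made-up example value
}

for binding in policy["bindings"]:
    print(binding["role"], "->", len(binding["members"]), "members")
```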
Interaction with Google Cloud Platform
- Cloud Platform Console
- Cloud Shell / Cloud SDK
- gcloud tool
- gsutil tool
- bq tool
- Gives a temporary Compute Engine virtual machine instance running Debian
- 5GB of persistent disk storage mounted in $HOME
- Built in authorization for access to projects and resources
- Cloud Console Mobile App
- REST-based API
- use JSON as interchange format
- use OAuth 2.0 for authentication and authorization
- Cloud Client Libraries / Google API Client Libraries (Latest)
- Cloud marketplace
- offered by Google and third parties
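The REST-based API interaction can be illustrated by constructing (without sending) a Cloud Storage JSON API request; the bucket name and token here are placeholders:

```python
# Sketch: every GCP service exposes a JSON REST API. This builds (but
# does not send) the Cloud Storage JSON API request to list objects in
# a bucket; in a real call the OAuth 2.0 access token goes in the
# Authorization header shown.
from urllib.parse import urlencode

def list_objects_request(bucket, prefix=None, token="<oauth2-access-token>"):
    url = f"https://storage.googleapis.com/storage/v1/b/{bucket}/o"
    if prefix:
        url += "?" + urlencode({"prefix": prefix})
    headers = {"Authorization": f"Bearer {token}"}  # OAuth 2.0 bearer token
    return url, headers

url, headers = list_objects_request("my-bucket", prefix="logs/")
print(url)  # https://storage.googleapis.com/storage/v1/b/my-bucket/o?prefix=logs%2F
```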
Virtual Private Cloud Networking
VPC networks are global; subnets are regional.
A VPC belongs to a Google Cloud Platform Project.
Virtual Private Cloud networks have routing tables to forward traffic between instances. There is a global distributed firewall, configurable per Compute Engine instance and definable by metadata tags (for example: all instances with the "WEB" tag accept incoming traffic on ports 80 and 443).
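The tag-based firewall idea can be sketched as a toy matcher (illustrative only; the real firewall also handles protocols, source ranges, priorities, and more):

```python
# Toy model of tag-based firewall rules: a rule allows ingress on given
# ports for any instance carrying the rule's target tag.
def is_allowed(instance_tags, port, rules):
    """rules: list of (target_tag, allowed_ports) pairs."""
    return any(tag in instance_tags and port in ports
               for tag, ports in rules)

# One rule, as in the example above: instances tagged "WEB" accept 80/443.
rules = [("WEB", {80, 443})]
assert is_allowed({"WEB"}, 443, rules)       # web server: allowed
assert not is_allowed({"DB"}, 443, rules)    # untagged database VM: blocked
```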
With VPC peering you can establish connectivity between VPCs in different Google Cloud Platform projects
Ways to connect to on-premises networks:
- Direct Peering
- Put a router in the same public data center as a Google point of presence and route traffic to an on-premises system
- It is not covered by a Google SLA
- Carrier Peering
- Connection through a partner's network
- Dedicated Interconnect
- A dedicated, private connection to Google, covered by an SLA of up to 99.99%
- Partner Interconnect
- Connection through a supported service provider
- Useful if your physical location cannot reach a Dedicated Interconnect facility, or if some downtime can be tolerated
Compute Engine
- CPU, memory, amount of storage, and OS can be selected and changed later
- Persistent storage, which can be resized with no downtime
- A Windows or Linux premade image as well as a custom image can be run
- Billed per second
- Discount per incremental minute if you run more than 25% of the month (30% discount if you run the entire month)
- Up to 57% discount if you commit to 1 or 3 years of CPU usage (committed use)
- Discount if you use a preemptible machine (it can be stopped if its resources are needed elsewhere)
In 2019 the maximum number of virtual CPUs was 96 (zone dependent) and the maximum memory size was 624 GB; a megamem machine type can handle 1.4 TB.
Autoscaling allows you to add and remove virtual machines for your application based on load metrics.
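Target-utilization autoscaling, conceptually, just solves for the number of VMs that keeps average load at or below a target; a minimal sketch (the numbers are made up):

```python
# Conceptual sketch of target-utilization autoscaling: given the total
# load and a per-VM utilization target, pick enough VMs so that average
# utilization stays at or below the target.
import math

def target_vm_count(current_total_load, target_utilization_per_vm):
    return max(1, math.ceil(current_total_load / target_utilization_per_vm))

# Total load equivalent to 5.2 fully busy VMs, 60% target utilization:
print(target_vm_count(5.2, 0.6))  # 9
```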
Cloud Load Balancing
Provides cross-region load balancing, including automatic multi-region failover:
- Global HTTP(S) load balancing (Layer 7)
- Global SSL Proxy
- Layer 4, for SSL traffic that is not HTTP
- Specific port numbers
- Global TCP Proxy
- Layer 4, no HTTP, no SSL
- Specific port numbers
- Regional load balancing
- UDP traffic
- Any port number
Cloud DNS
Google also operates 8.8.8.8, a free public DNS resolver for the web.
Cloud DNS is a managed DNS service. It is programmable using the GCP Console, the command-line interface, or the API.
Cloud CDN (Content delivery Network)
Enabled by a single checkbox in the load balancer configuration
Cloud Storage
- binary large-object storage addressed by unique keys
- Data is encrypted at rest, on the server side
- The objects are immutable
- It is useful when large-object storage is needed
- A service is available to send large amounts of data offline, via hard drives or USB flash drives
- The files are organized into buckets (location and name are picked by the user)
- Object versioning is available (turned off by default)
- Offers lifecycle management policy
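A lifecycle management policy is expressed as a JSON document set on the bucket; this sketch shows the general shape (the ages and storage class are example values):

```python
# Shape of a Cloud Storage lifecycle policy: move objects to Nearline
# after 30 days, then delete them after 365 days.
import json

lifecycle = {
    "lifecycle": {
        "rule": [
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
             "condition": {"age": 30}},
            {"action": {"type": "Delete"},
             "condition": {"age": 365}},
        ]
    }
}
print(json.dumps(lifecycle, indent=2))
```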
Cloud Storage Classes
- Multi-Regional
- Stores your data in at least two geographic locations separated by at least 160 km
- Used for frequently accessed data
- Regional
- Lets you store your data in a single region
- Often used with Compute Engine and Kubernetes Engine
- Nearline
- Ideal when you access or modify your data about once a month
- Coldline
- Ideal when you access your data about once a year
Ways to bring data to Cloud Storage
- gsutil
- Drag & drop in the Cloud Console
- Online Storage Transfer Service
- Schedules batch transfers from another endpoint (Google Cloud or another provider)
- Offline Transfer Appliance
Cloud Storage integration
- Import and export tables to/from BigQuery
- Startup scripts, images, and general object storage for Compute Engine
- Logs and images storage from App Engine
- Datastore Backups
- Import and export tables from Cloud SQL
Cloud Bigtable
It is a fully managed NoSQL database for terabyte-scale applications
- It uses HBase API
- Compatible with Hadoop ecosystems
- Streams to Cloud Dataflow Streaming, Spark Streaming, and Storm, or feeds batch processes
- It is the same database that powers Google Search, Analytics, Maps, and Gmail
When to use Cloud Bigtable
- There is a large amount of data (petabytes)
- Data is changing fast
- Strong relational semantics are not required
- Data is naturally ordered by time
- You run asynchronous batch or real-time processing
- You run machine learning algorithms.
- You don’t need multi-row transactions
To sum up, it handles massive workloads with low latency and high throughput. It is appropriate for operational and analytical applications and IoT.
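Since Bigtable sorts rows lexicographically by key and the notes call out time-ordered data, a common row-key pattern is an entity prefix plus a zero-padded reversed timestamp, so the newest rows sort first; a small sketch (the timestamp bound is an assumption for the example):

```python
# Row-key design sketch for time-ordered data in Bigtable: rows sort
# lexicographically by key, so subtracting the timestamp from a fixed
# bound makes the newest readings sort first under each device prefix.
MAX_TS = 10**13  # assumption: millisecond timestamps stay below this bound

def row_key(device_id, ts_millis):
    return f"{device_id}#{MAX_TS - ts_millis:013d}"

# Three readings inserted out of order still sort newest-first by key:
keys = sorted(row_key("sensor-1", t) for t in (1000, 3000, 2000))
print(keys[0])  # key built from the newest timestamp (3000)
```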
Cloud SQL
Managed RDBMS (Relational Data Base Management System).
Offers MySQL and PostgreSQL databases as a service:
- Replication:
- From Google instances to Google instances
- From non-Google instances to Google instances
- From Google instances to non-Google instances
- Up to 7 backups per instance
- encrypted data
- Vertical scaling (read and write)
- Horizontal scaling (read)
- Google security
- network firewall
- Up to 10 TB of storage
Cloud SQL integration
- With App Engine, using standard drivers
- With Compute Engine, using external IP addresses
- With external applications and clients
Cloud Spanner
- Horizontally scalable RDBMS
- Automatic replication
- Strong and global consistency
- Managed instances with high availability
- Uses SQL
It is appropriate if you need:
- A RDBMS with joins and secondary indexes
- High availability
- Strong global consistency
- Database size up to Petabytes
- Many IOPS
Cloud Datastore
Fully managed NoSQL database designed for application backends
Highly scalable
Automatic scaling
Supports terabyte-scale databases
Supports multi-row transactions
Benefits of Cloud Datastore
Local development tools
Includes a free daily quota
RESTful interface
Atomic transactions (ACID)
High availability of reads and writes
Massive scaling with high performance
Flexible storage and querying of data (SQL-like language)
Encryption at rest
Fully managed with no downtime
Google Kubernetes Engine
- Managed, production-ready environment for deploying containerized applications
- Grants high availability
- Runs Kubernetes, thus ensuring portability across clouds and on-premises
- Includes auto node repair, auto upgrade, and autoscaling
- Regional clusters with multiple masters and node storage replication across multiple zones
Google Kubernetes Engine On-Prem
It is GKE that runs on-premises
- kubernetes best practices pre-loaded
- easy update to latest Kubernetes Engine
Stackdriver
Built-in logging and monitoring solution for Google Cloud Platform
- Logging
- View, filter, and search logs
- Define metrics based on logs
- Incorporate logs into alerts
- Export logs to:
- BigQuery
- Cloud Storage
- Cloud Pub/Sub
- Monitoring
- Metrics collection
- Dashboarding
- Alerting solutions
- Debugger
- Connects the production application with its source code and takes snapshots of variable values
- Error Reporting
- Tracks and groups errors, and notifies when new ones are detected
- Trace/Profiler
- Observe call latency between functions, CPU, and memory usage
App Engine
It is a Platform as a service for scalable Applications
Designed for Backend applications and mobile backends
There is a free daily use quota
Provides:
- NoSQL Datastore
- memcache
- load balancing
- health checks
- application logging
- User authentication API
Scales automatically depending on the amount of traffic
Preconfigured with:
- Java 7
- Python 2.7
- Go
- PHP
- (Specific versions are supported)
Persistent storage with queries, sorting and transactions
Restrictions:
- No writing to the local file system
- All requests time out at 60 seconds
- Third-party software is limited
There is a simulated sandbox to emulate App Engine on your local computer; from there you can deploy to App Engine in production.
Security Scanner
Automatically scans for and detects common web application vulnerabilities
App Engine Flexible
Runs in a container instead of a sandbox (Docker inside Compute Engine)
Customizable container
Instances are auto health-checked
Critical backward-compatible operating system updates are automatically applied
Instances are restarted every week
App Engine Flexible can access App Engine services
Support for:
- Java 8
- Servlet 3.1
- Jetty 9
- Python 2.7
- Node.js
- Go
Cloud Endpoints
Distributed API management system. It works with APIs that implement the OpenAPI specification (formerly Swagger)
- User authentication
- Automated deployment
- Logging and monitoring
- API Keys
- Easy integration
Supported platforms for Cloud Endpoints
- App Engine Flexible environment
- Kubernetes Engine
- Compute Engine
- Android
- iOS
- Javascript
Apigee
Platform for developing and managing API proxies
- Helps you secure and monetize APIs
Cloud Source Repository
It is a Git Repository hosted on Google Cloud Platform
Includes integration with Stackdriver Debugger without slowing down the users
Allows any number of Git repositories
Integration with Github and Bitbucket repositories
Cloud Functions
Single-purpose functions that respond to events, without you managing a server or runtime:
- from Cloud Storage
- from Pub/Sub
- HTTP invocations for synchronous execution
Written in JavaScript, Python, or Go; JavaScript functions execute in a Node.js environment
You are billed to the nearest 100 milliseconds, and only while the code is running.
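A minimal sketch of an HTTP-triggered function in Python: in production the argument is a flask.Request, so a tiny stub stands in here to exercise the handler locally.

```python
# Minimal sketch of an HTTP-triggered Cloud Function in Python.
def hello_http(request):
    """In production, 'request' is a flask.Request; only .args is used here."""
    name = request.args.get("name", "World")
    return f"Hello, {name}!"

class FakeRequest:
    """Local stand-in with the one attribute the handler uses."""
    def __init__(self, args):
        self.args = args

print(hello_http(FakeRequest({"name": "GCP"})))  # Hello, GCP!
```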
Deployment Manager
Infrastructure management service that automates the creation and management of resources.
You create a template in YAML or Python, and Deployment Manager performs the actions needed to deploy the environment the template describes
Cloud Dataproc
A managed way to run Hadoop, Spark, Hive and Pig on Google Cloud Platform. A Hadoop cluster will be built in 90 seconds or less.
It can be monitored with Stackdriver
Preemptible instances can be used to make clusters cheaper.
Once the data is in your cluster, you can use Spark to mine it, for example to discover patterns through machine learning.
Cloud Dataflow
When the data arrives in real time or has an unpredictable size, Dataflow is a good choice. It is used to build data pipelines in both batch and streaming modes:
- Resource Management
- On demand (autoscale)
- Intelligent work-scheduling
- Autoscaling (horizontal)
- Unified programming Model
- Open Source
- Monitoring
- Integrated
- Cloud Storage
- Cloud Pub/Sub
- Cloud Datastore
- Cloud Bigtable
- BigQuery
- extensions to Kafka and HDFS
- Reliable & Consistent Processing
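Conceptually, a Dataflow pipeline under the unified programming model chains transforms such as ParDo and GroupByKey; a pure-Python sketch of that model (not the real Apache Beam API) using the classic word count:

```python
# Pure-Python sketch of the pipeline model: a ParDo-style flat-map
# followed by a GroupByKey-style aggregation.
from collections import Counter

def par_do(records, fn):
    """Apply fn to each element and flatten the results (like ParDo)."""
    return [out for record in records for out in fn(record)]

def group_and_count(pairs):
    """Group (key, value) pairs by key and count them (like GroupByKey + combine)."""
    return dict(Counter(key for key, _ in pairs))

lines = ["to be", "or not to be"]
words = par_do(lines, lambda line: [(w, 1) for w in line.split()])
print(group_and_count(words))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```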
BigQuery
It is a fully managed data warehouse. It provides near-real-time analysis of hundreds of terabytes.
Uses SQL.
Features:
- Data import from Cloud Datastore and Cloud Storage
- Streaming ingestion
- Global availability
- Security and permissions
- Cost controls
- Highly available
- Super fast performance
- Integrations:
- Cloud Dataflow
- Spark
- Hadoop
- Export to google products
- There are some limits on datasets and queries
- Discount for continuous usage
- Petabytes of database size
Cloud Pub/Sub
Many-to-many asynchronous messaging
Applications subscribe to topics
Integration with Cloud Dataflow
Guarantees at-least-once delivery at low latency
- Highly scalable
- Encryption
- Replicated storage (replicated in multiple servers and in multiple zones)
- Message queue by topic
- end to end acknowledgement
- Fan out
Suitable for:
- building blocks in Dataflow, IoT or Marketing analytics
- Push notification for cloud-based applications
- Connect applications (Compute Engine and App Engine)
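The many-to-many, fan-out behaviour can be sketched with an in-memory toy (illustrative only; the real service adds durability, replicated storage, and acknowledgements):

```python
# Toy in-memory model of the Pub/Sub pattern: publishers and subscribers
# are decoupled through named topics, and every subscription on a topic
# gets its own copy of each message (fan-out).
from collections import defaultdict

class MiniPubSub:
    def __init__(self):
        self.subs = defaultdict(list)   # topic -> list of subscriber queues

    def subscribe(self, topic):
        queue = []
        self.subs[topic].append(queue)
        return queue

    def publish(self, topic, message):
        for queue in self.subs[topic]:  # fan-out to every subscription
            queue.append(message)

bus = MiniPubSub()
a = bus.subscribe("sensor-readings")
b = bus.subscribe("sensor-readings")
bus.publish("sensor-readings", {"temp": 21})
print(a, b)  # both subscriptions received their own copy
```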
Cloud Datalab
Lets you use Jupyter notebooks to explore, analyze, and visualize data on Google Cloud Platform
Provides an interactive Python interface ready to use for data exploration
Integrations:
- BigQuery
- Compute Engine
- Cloud Storage
Multilanguage Support:
Pay per use pricing
Interactive data visualization
Git-based control version, linkable with GitHub and Bitbucket
Open Source
IPython support
When to use:
- Documentation
- Visualization
- Analyze BigQuery, Compute Engine, and Cloud Storage data using Python, SQL, and JavaScript
TensorFlow
TensorFlow is an open source software library for machine learning.
Cloud Vision API
Analyzes images with a REST API
- Detect inappropriate content
- Analyze sentiment
- Extract text
- Get keywords
Cloud Speech API
- Recognizes over 80 languages
- Can return text in real time
- Highly accurate
- Access from any device
Cloud Natural Language API
- Reveals the structure and meaning of text
- Syntax analysis
- Identify nouns, verbs, adjectives
- Recognize people and places
- Extract information about items mentioned in text
- Integrate with Cloud Storage
- Available in English, Spanish and Japanese
- Integrated in REST API
- Sentiment analysis
Cloud Translation API
- Translate arbitrary strings between thousands of language pairs
- Language detection
Cloud Video Intelligence API
- Annotate the contents of video
- Detects scene changes
- Flag inappropriate content
Cloud CDN (Content Delivery Network)
- Cache load-balanced frontend content that comes from Compute Engine
- Cache static content that is served from Cloud Storage