Sunday, January 17, 2021

A brief introduction to Cloud Computing Concepts

Learn about cloud computing concepts, the advantages and disadvantages of cloud computing, and the cloud service models IaaS, PaaS, and SaaS


What Is Cloud Computing?

Cloud computing is the on-demand delivery of compute power, database storage, applications, and other IT resources through a cloud services platform via the internet with pay-as-you-go pricing. Cloud computing provides developers and IT departments with the ability to focus on what matters most and avoid undifferentiated work like procurement, maintenance, and capacity planning. As cloud computing has grown in popularity, several different models and deployment strategies have emerged to help meet the specific needs of different users. Each type of cloud service and deployment method provides you with different levels of control, flexibility, and management. Understanding the differences between Infrastructure as a Service, Platform as a Service, and Software as a Service, as well as what deployment strategies you can use, can help you decide what set of services is right for your needs.

Cloud computing:

  • Uses Internet technologies to offer scalable and elastic services. The term “elastic computing” refers to the ability to acquire computing resources dynamically and to support a variable workload.
  • The resources used for these services can be metered, and users can be charged only for the resources they use.
  • The maintenance and security are ensured by service providers.
  • The service providers can operate more efficiently due to specialization and centralization.
  • Lower costs for the cloud service provider are passed on to cloud users.
  • Data is stored closer to the site where it is used, but in a device- and location-independent manner.
  • The data storage strategy can increase reliability, as well as security, and can lower communication costs.
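The elasticity and metering ideas in the bullets above can be sketched in a few lines of Python. The function names, capacities, and rates here are hypothetical illustrations, not any provider's actual pricing model:

```python
import math

def desired_instances(current_load, capacity_per_instance, minimum=1):
    """Elastic scaling rule: acquire just enough instances for the current workload."""
    needed = math.ceil(current_load / capacity_per_instance)
    return max(minimum, needed)

def metered_charge(hours_used, rate_per_hour):
    """Pay-as-you-go billing: charge only for the resources actually used."""
    return hours_used * rate_per_hour
```

For example, a workload of 950 requests per second against instances that each handle 100 would call for 10 instances; when the load drops, the same rule releases them, and the metered charge shrinks accordingly.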

Who is using cloud computing?

Organizations of every type, size, and industry are using the cloud for a wide variety of use cases, such as data backup, disaster recovery, email, virtual desktops, software development and testing, big data analytics, and customer-facing web applications. For example, healthcare companies are using the cloud to develop more personalized treatments for patients. Financial services companies are using the cloud to power real-time fraud detection and prevention. And video game makers are using the cloud to deliver online games to millions of players around the world.

Six Advantages and Benefits of Cloud Computing

  • Trade capital expenses for variable expense
  • Benefit from massive economies of scale
  • Stop guessing capacity
  • Increase speed and agility
  • Stop spending money on running and maintaining data centers
  • Go global in minutes

Why could cloud computing succeed where other paradigms have failed?

  • It is in a better position to exploit recent advances in software, networking, storage, and processor technologies promoted by the same companies who provide cloud services.
  • It is focused on enterprise computing; its adoption by industrial organizations, financial institutions, government, and so on could have a huge impact on the economy.
  • A cloud consists of a homogeneous set of hardware and software resources.
  • The resources are in a single administrative domain (AD). Security, resource management, fault-tolerance, and quality of service are less challenging than in a heterogeneous environment with resources in multiple ADs.


Types of Cloud Computing

The three main types of cloud computing include Infrastructure as a Service, Platform as a Service, and Software as a Service. Each type of cloud computing provides different levels of control, flexibility, and management so that you can select the right set of services for your needs.   

Infrastructure as a Service (IaaS)

Infrastructure as a Service, sometimes abbreviated as IaaS, contains the basic building blocks for cloud IT and typically provides access to networking features, computers (virtual or on dedicated hardware), and data storage space. Infrastructure as a Service provides you with the highest level of flexibility and management control over your IT resources and is most similar to the existing IT resources that many IT departments and developers are familiar with today.
  • The user is able to deploy and run arbitrary software, which can include operating systems and applications.
  • The user does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of some networking components, e.g., host firewalls.
  • Services offered by this delivery model include:  server hosting, Web servers, storage, computing hardware, operating systems, virtual instances, load balancing, Internet access, and bandwidth provisioning.
 
 

Platform as a Service (PaaS)

Platform as a Service removes the need for organizations to manage the underlying infrastructure (usually hardware and operating systems) and allows you to focus on the deployment and management of your applications. This helps you be more efficient, as you don’t need to worry about resource procurement, capacity planning, software maintenance, patching, or any of the other undifferentiated heavy lifting involved in running your application.

  • Allows a cloud user  to deploy consumer-created or acquired applications using programming languages and tools supported by the service provider.
  • The user:
    • Has control over the deployed applications and, possibly, application hosting environment configurations.
    • Does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage. 
  • Not particularly useful when: 
    • The application must be portable.
    • Proprietary programming languages are used.
    • The hardware and software must be customized to improve the performance of the application.

Software as a Service (SaaS)

Software as a Service provides you with a completed product that is run and managed by the service provider. In most cases, people referring to Software as a Service are referring to end-user applications. With a SaaS offering you do not have to think about how the service is maintained or how the underlying infrastructure is managed; you only need to think about how you will use that particular piece of software. A common example of a SaaS application is web-based email where you can send and receive email without having to manage feature additions to the email product or maintaining the servers and operating systems that the email program is running on.

  • Applications are supplied by the service provider.
  • The user does not manage or control the underlying cloud infrastructure or individual application capabilities. 
  • Services offered include: 
    • Enterprise services such as: workflow management, groupware and collaboration, supply chain, communications, digital signature, customer relationship management (CRM), desktop software, financial management, geo-spatial, and search.
    • Web 2.0 applications such as: metadata management, social networking, blogs, wiki services, and portal services.
  • Not suitable for real-time applications or for those where data is not allowed to be hosted externally.
  • Examples: Gmail, Google search engine.

Cloud Deployment Models

There is a range of deployment models, from all on-premises to fully deployed in the cloud. Many users begin with a new project in the cloud, and they might integrate some on-premises applications with these new projects in a hybrid architecture. They might decide to keep some legacy systems on-premises. Over time, they might migrate more and more of their infrastructure to the cloud, and they might eventually reach an all-in-the-cloud deployment.

Cloud

A cloud-based application is fully deployed in the cloud and all parts of the application run in the cloud. Applications in the cloud have either been created in the cloud or have been migrated from an existing infrastructure to take advantage of the benefits of cloud computing. Cloud-based applications can be built on low-level infrastructure pieces or can use higher level services that provide abstraction from the management, architecting, and scaling requirements of core infrastructure.

Hybrid

A hybrid deployment is a way to connect infrastructure and applications between cloud-based resources and existing resources that are not located in the cloud. The most common method of hybrid deployment is between the cloud and existing on-premises infrastructure, to extend and grow an organization's infrastructure into the cloud while connecting cloud resources to internal systems.

On-premises

Deploying resources on-premises, using virtualization and resource management tools, is sometimes called “private cloud”. On-premises deployment does not provide many of the benefits of cloud computing but is sometimes sought for its ability to provide dedicated resources. In most cases this deployment model is the same as legacy IT infrastructure, while using application management and virtualization technologies to try to increase resource utilization.

Community Cloud

The infrastructure is shared by several organizations and supports a community that has shared concerns.

Challenges for cloud computing

  • Availability of service; what happens when the service provider cannot deliver?
  • Diversity of services, data organization, and user interfaces available at different service providers limits user mobility; once a customer is hooked on one provider, it is hard to move to another. Standardization efforts at NIST!
  • Data confidentiality and auditability, a serious problem.
  • Data transfer bottleneck; many applications are data-intensive.
  • Performance unpredictability, one of the consequences of resource sharing. 
  • How to use resource virtualization and performance isolation for QoS guarantees?
  • How to support elasticity, the ability to scale up and down quickly?
  • Resource management;  are self-organization and self-management  the solution?
  • Security and confidentiality;  major concern.
  • Addressing these challenges provides good research opportunities!

 

Sunday, January 10, 2021

What is inclusive leadership? What are the characteristics of inclusive leadership?

Inclusive leadership and the characteristics of inclusive leadership

An inclusive leader uses the EACH attributes (Empowerment, Accountability, Courage, and Humility) both to increase awareness of these challenges and to overcome them with appropriate action. The components of EACH support inclusion by valuing people's individuality and finding common ground:
  • Empowerment allows people to do things their way.
  • Accountability holds people responsible for their own actions.
  • Courage helps people put group interests above personal ones.
  • Humility fosters connections by encouraging people to learn from one another and demonstrate vulnerability and trust.

Characteristics of inclusive leadership:

  • Leadership is about influencing others to achieve a common goal.
  • Often, leadership is not as complicated as we think it is.
  • Anyone can lead, whether you have authority over others or not. No matter where you are in your career, or whatever your status or position, you can lead.
  • Leadership requires simple actions that anyone can take; for example, being willing to stand out from the crowd, support a new idea, or ask a difficult question when no one else is asking.
  • And most notably, "followers" are also leaders. The first follower turns a lone nut into a leader! Followers are leaders in their own right and in fact, inclusive leaders make space for others to lead, by following them.
  • Inclusive leadership positively impacts everyone, no matter whether you are a man or a woman, old or young, or of a particular race, color, or nationality. Anyone can be an inclusive leader, and everyone benefits from inclusion.
  • Inclusive leaders value the diverse talents and experiences of the people they influence or who are on their teams.
  • Inclusive leaders do not stereotype or alienate people they influence or who are on their teams, or make them feel reluctant to share ideas that set them apart, which can lead to groupthink.
  • When inclusive leadership is effective, people feel more included and are more likely to go above and beyond the call of duty, suggest new ideas, and find new ways of getting work done.
  • Inclusive leaders are aware of their own biases and assumptions, take action, and execute the EACH method: Empower your direct reports and team members, hold them Accountable, be Courageous, and show Humility as a leader.
     
Inclusion values both:
  • Uniqueness: Standing out from the crowd (coworkers, colleagues, team members, peers) and being and feeling recognized for what's distinct about you.
  • Belongingness: Being and feeling accepted as part of the crowd, regardless of your differences or similarities with others.

 


Sunday, December 20, 2020

Data Cleaning, Normalization, and Enhancement

Data Cleaning, Normalization, and Enhancement

Data cleaning, normalization, and enhancement techniques aim to address the quality of data sets. This can be measured in a number of ways; we will define each of them below by referring to the concepts we have seen in the previous sections.

  • Validity refers to whether the values in the data set are of acceptable data types (e.g., integer, fractional number, or text), fall within acceptable ranges (e.g., between 0 and 100), are from an approved list of options (e.g., "Approved" or "Rejected"), are non-empty, and so on.
  • Consistency refers to whether there are contradictory entries within a single data set or across data sets (e.g., if the same customer identifier is associated with different values in an address column).
  • Uniformity refers to whether the values found in records represent measurements in the same units (within the data set or across data sets).
  • Accuracy refers to how well the values in each record represent the properties of the real-world object to which the record corresponds. In general, improving accuracy requires some external reference against which the data can be compared.
  • Completeness refers to whether there are any missing values in the records. Missing data is very difficult to replace without going back and collecting it again; however, it is possible to introduce new values (such as "Unknown") as placeholders that reflect the fact that information is missing.
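As a rough illustration, validity and completeness checks like those above can be sketched in Python. The field names and acceptance rules here ("score" between 0 and 100, "status" from an approved list) are hypothetical:

```python
def check_record(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    score = record.get("score")
    if not isinstance(score, int) or not 0 <= score <= 100:
        problems.append("invalid score")       # validity: type and range
    if record.get("status") not in ("Approved", "Rejected"):
        problems.append("invalid status")      # validity: approved list of options
    for key, value in record.items():
        if value in (None, ""):
            problems.append(f"missing {key}")  # completeness: non-empty values
    return problems
```

A clean record such as `{"score": 87, "status": "Approved"}` yields an empty problem list; a record with an out-of-range score or an unapproved status does not.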


Common Data Transformations

Forms of common data transformation

The basic data transformation types are enumerated and discussed in detail below:

Union transformations take two data sets as their input and produce an output data set that contains all entries found in either input data set. The output data set must have at least as many records as the larger of the two input data sets.

Intersection transformations take two data sets as their input and produce an output data set that contains only those entries found in both input data sets. The output data set can have at most as many records as the smaller of the two input data sets.

Difference transformations take two data sets as their input and produce a data set that contains only those records found in the first data set but not the second data set. The output data set must have at most as many records as the number of records in the first input data set.
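Using two small in-memory data sets of customer IDs, the three set-style transformations above can be sketched in Python (real systems apply these to whole tables, but the record-count properties are the same):

```python
# Two small data sets of customer IDs, represented as Python sets.
ds_a = {"c1", "c2", "c3"}
ds_b = {"c2", "c3", "c4"}

union = ds_a | ds_b          # every entry found in either input
intersection = ds_a & ds_b   # only entries found in both inputs
difference = ds_a - ds_b     # entries in the first input but not the second
```

Note that the union has at least as many records as the larger input, while the intersection and difference can never exceed their respective bounds described above.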

Selection transformations involve extracting some portion of the data based on zero or more filtering conditions or criteria. A selection transformation might return the entire original data set (e.g., if the criteria are already satisfied by all the records in the input data set), but it cannot return a result that is larger than the original data set.

A filtering condition within a selection transformation usually consists of a logical expression that is either true or false for each record. The condition can reference the values found in each record using their corresponding attribute/column names; it can also contain arithmetic operators (addition, subtraction, multiplication, division, and so on), relational operators (equality, comparison, and so on), and logical operations (such as "and" and "or"). In some database management systems, more complex conditions can be defined (e.g., ones that do text search).
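A selection with a compound filtering condition might look like the following sketch, where the record fields ("status", "amount") and the threshold are hypothetical:

```python
records = [
    {"customer": "c1", "amount": 120, "status": "Approved"},
    {"customer": "c2", "amount": 40,  "status": "Rejected"},
    {"customer": "c3", "amount": 75,  "status": "Approved"},
]

# Filtering condition: status is "Approved" AND amount is at least 50.
# The condition combines a relational operator (>=) with a logical "and".
selected = [r for r in records if r["status"] == "Approved" and r["amount"] >= 50]
```

The result can never contain more records than the input; here it keeps two of the three records.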

Projection transformations involve converting every record in a data set in some way to produce a new data set. A projection transformation always produces the same number of records in the output data set as there were in the input data set. The conversion itself might throw away some attributes/columns or might introduce new ones. The definition of a projection transformation can use arithmetic and other operations to transform the values inside the input data set's records into the values within the records of the output data set.

Renaming transformations simply rename one or more of the attributes/columns. They are usually combined with projection and selection transformations so that the output data sets can have informative attribute/column names.
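A projection combined with a renaming can be sketched as follows; the column names and the cents-to-dollars conversion are illustrative assumptions:

```python
records = [
    {"first": "Ada",  "last": "Lovelace", "amount_cents": 1250},
    {"first": "Alan", "last": "Turing",   "amount_cents": 990},
]

# Projection: exactly one output record per input record. It drops the
# "first"/"last" columns, derives a new "name" column, and effectively
# renames "amount_cents" to "amount_dollars" after a unit conversion.
projected = [
    {"name": f'{r["first"]} {r["last"]}', "amount_dollars": r["amount_cents"] / 100}
    for r in records
]
```

Unlike a selection, a projection always preserves the number of records; only the attributes change.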

The advanced data transformation types are enumerated and discussed in detail below for your reference and review.

Aggregation transformations involve combining the values within a particular attribute/column across all records in a data set. Examples of tasks for which this may be useful include counting the number of records in a data set, taking the sum of all values in a column, finding the maximum value across all values in a column, and so on. In its basic form, an aggregation transformation produces a data set with exactly one record.

In some languages and database management systems, it is possible to group the records using a particular attribute (which we call the grouping attribute) when performing an aggregation. In that case, the aggregation operation is applied separately to each collection of records that shares the same value of the grouping attribute, and the number of records in the output data set corresponds to the number of unique values found in the grouping attribute/column.
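A grouped aggregation can be sketched in plain Python (an SQL `GROUP BY` or a pandas `groupby` expresses the same idea); the `region` and `amount` columns are hypothetical:

```python
sales = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 50},
    {"region": "east", "amount": 25},
]

# Group by "region" (the grouping attribute) and sum "amount" per group;
# the output has one record per unique value of the grouping attribute.
totals = {}
for row in sales:
    totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
```

Three input records with two distinct regions produce exactly two aggregated records.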

Join transformations, in their most general form, take two input data sets and return their Cartesian product. Thus, the number of entries in the output data set may be larger (even significantly larger) than the number of entries in each of the two input data sets. It is common to combine join transformations with selection transformations in order to pair corresponding records using their identifiers (or other attributes) even when the records are found in different data sets. One example of this is matching the purchase records in a purchases data set with the corresponding customer records in a customers data set.
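Pairing the Cartesian product with a selection on matching identifiers gives the familiar equi-join; in this sketch the `cust_id` key and the sample records are hypothetical:

```python
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Alan"}]
purchases = [{"cust_id": 1, "item": "book"}, {"cust_id": 1, "item": "pen"},
             {"cust_id": 3, "item": "lamp"}]

# Cartesian product of the two data sets, followed by a selection that
# keeps only the pairs whose identifiers match (an equi-join).
joined = [
    {**c, **p}
    for c in customers
    for p in purchases
    if c["cust_id"] == p["cust_id"]
]
```

The full Cartesian product here would have 2 × 3 = 6 entries; the selection trims it down to the two pairs that actually correspond, and a purchase with no matching customer (cust_id 3) drops out entirely.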

 

Sources of data available both inside and outside of the organization and Data source terminologies

Internal and External data sources available in a company

Potential sources of data can vary across scenarios, and it would be easy to miss an opportunity. One way this can be alleviated is by keeping in mind a comprehensive taxonomy of common data sources.

Internal Data Sources Available to an Organization

 Internal data sets and data sources are those that can be derived in whole from the existing data or activities that exist entirely within the organization. Breakdowns of the different categories of potential data sources within an organization are reproduced here for your review and reference.

Existing data sets already being generated and/or stored in digital form can include the following.

  • Structured data (e.g., personnel or accounting records, sales and transactions)
  • Semi-structured or unstructured data (e.g., a data warehouse, or social media posts made by the organization)
  • Metadata of existing data sets

Definition: The term metadata typically refers to information about a data set or individual entries in that data set. Most data sets have at least some metadata associated with them. Examples include the date and time of creation, how the data is structured or organized, or permissions that determine who can access or modify it.
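Most file systems expose exactly this kind of metadata. As a small illustration, the sketch below creates a file and then reads its size and last-modification time; the file name is an arbitrary choice for the example:

```python
import os
import time

# Write a small file, then read some of its metadata: size in bytes
# and the last-modification timestamp recorded by the file system.
path = "example.txt"
with open(path, "w") as f:
    f.write("hello metadata")

info = os.stat(path)
size_bytes = info.st_size             # how large the data is
modified = time.ctime(info.st_mtime)  # when it was last changed
os.remove(path)
```

Data sets stored in databases or data warehouses carry analogous metadata, such as schema definitions and access permissions.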

Assets and business activities within the organization that can potentially be surveyed, measured, and/or tracked to generate new data sets include those enumerated below.

  • Tracking information and measurements
    • Existing assets (e.g., current inventory of manufactured goods)
    • Internal events (e.g., sales figures for products)
    • Interactions with other organizations (e.g., subcontractors or partner organizations)
    • External opportunities
  • Exploratory or diagnostic experiments conducted within the organization
  • Crowd-sourced data

External Data Sources Available to an Organization

Breakdowns of the different categories of potential external data sources are reproduced here for your review and reference.
  • Acquired or purchased data sets
    • Data sets provided in structured form by commercial organizations (e.g., Nielsen Holdings) that may be relevant to the business question
    • Data streams of structured information to which it may be possible to subscribe for a fee (e.g., website traffic analysis services such as Google Analytics)
  • Data provided by customers
    • Social networking and social media services often provide APIs that can be used to collect information posted by customers (both on their own accounts and on the organization’s accounts)
    • Direct communications from customers, including email, can be a rich data source
  • Free, open, or publicly accessible data sources
    • Some private organizations provide data sets via online portals (though it is important to check any restrictions on the use of that data in the license that accompanies it)
    • Many data sets are provided by governments, government agencies (such as the US Census Bureau) and non-profit organizations via open data portals
  • Other data published or publicly accessible (e.g., online) in unstructured form, as long as its use does not violate the applicable terms and licenses
    • Data that can be manually collected and curated into a structured form
    • Data published online that can be automatically parsed and collected via web scraping (usually a workflow that a data engineer is best-suited to implement)

 

Once we have chosen the assets and activities of interest within the organization that can act as data sources, we need to identify the means available to the organization to collect and possibly store the desired data. The question of what resources are required to collect and store the new or existing data is driven in part by the characteristics of the business question being addressed. For example, is a one-time decision being made, or is a new and ongoing process being introduced within the organization? Will a new unit within the organization be responsible for acting on these data sources? We introduce several characterizations of data sources that can help navigate these issues.


Definition: A static or one-time data source or data set consists of a fixed quantity of data that can be retrieved once. Such a data set may have been collected via a one-time survey or study. It may also have been obtained via the commissioning of an outside consulting firm or through a one-time purchase from a vendor.

Definition: A transactional data source or data set typically consists of entries that describe events (i.e., changes that occur as a result of a transaction) that have a specified time and may refer to one or more reference objects. These events normally correspond to verbs.

Typical categories of transactional data include financial (e.g., invoices and payments), operational (e.g., tasks assigned or completed), and logistical (e.g., orders and deliveries).

Definition: A real-time or streaming source of data (also known as a data feed) is one from which data is being generated continuously at some (possibly high) rate.

In some cases, the streaming data may be delivered directly to the organization, in which case the organization must maintain an automated infrastructure that can determine where to store this data by provisioning internal (or cloud-based) storage resources as appropriate. In other cases, an organization may have the option to sample a data stream as necessary.
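Sampling a stream whose total length is unknown in advance is a classic problem; one standard technique is reservoir sampling (Algorithm R), sketched below. This is a general illustration of stream sampling, not a method prescribed by the text:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of up to k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir first
        else:
            j = random.randint(0, i)   # replace an existing slot with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample
```

The reservoir uses constant memory no matter how long the stream runs, which is what makes it practical for high-rate data feeds.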

Definition: A data warehouse is a system used for retrieving, integrating, and storing data from multiple sources so that it may be used for reporting and analysis.

A data warehouse normally integrates data from a number of sources (including static, transactional, and streaming data), and some amount of quality control or cleansing may be performed on this data before it is used.

Definition: The provenance of a data set or stream (or an item therein) is a record of its origin and, possibly, its lifespan. This includes to whom the data set can be attributed, at what location and time it was created, from what other data sets it was derived, and by what process this derivation was accomplished.

Data provenance can be tracked at a coarse granularity  (i.e., for entire data sets) or at a fine granularity (i.e., for every individual entry within the data set). The provenance information associated with a data set, or an entry within a data set, could constitute a part of its metadata.

The World Wide Web Consortium’s PROV standard lays out a well-defined format for provenance documents (which has been implemented in many machine-readable digital representations and for which many software tools exist that allow users to edit, combine, and visualize provenance documents). Each document can be a record of the provenance of one or more data sets, and is broken down into entities (e.g., data sets, individual data entries, reports, and so on), actors (e.g., data analysts, or automated algorithms making decisions according to a schedule and/or some set of rules), and activities (e.g., events that generate data entries or data sets, data analyses executed using certain tools, data-driven operational decisions made by the organization, and so on).

 

 

Saturday, December 19, 2020

Descriptive, Predictive, and Prescriptive analytics classification questions

Descriptive, Predictive, and Prescriptive analytics classification questions

 Please read about Descriptive, Predictive, and Prescriptive analytics here: https://triksbuddy.blogspot.com/2020/12/what-are-different-types-of-analytics.html

Then try to answer the following questions and check them against the answer key at the bottom:

1. A classification of our customers into four quantiles according to their profitability in the last four quarters. 

A. Descriptive
B. Predictive
C. Prescriptive

 

2.  A classification of our customers into four quantiles according to their expected profitability in the following four quarters.

A. Descriptive
B. Predictive
C. Prescriptive

 

3. A model that assigns a credit limit to each customer such that it optimizes our bank’s expected profits in the next four quarters.

A. Descriptive
B. Predictive
C. Prescriptive

 

4. A list of our best 10 customers on the basis of their sales growth in the last quarter.

A. Descriptive
B. Predictive
C. Prescriptive

 

5. A list of the 10 customers that are most likely to leave our company in the next two quarters.

A. Descriptive
B. Predictive
C. Prescriptive

 

6. A model that assigns to each credit card transaction a score that represents its probability of being a fraudulent transaction.

A. Descriptive
B. Predictive
C. Prescriptive

 

7. A model that outputs a preventive maintenance schedule of airplane engines such that it minimizes our airline’s annual maintenance and downtime expenditure.

A. Descriptive
B. Predictive
C. Prescriptive

 

8. A model that schedules the timing of the posting of an individual’s tweets so as to maximize the daily number of associated retweets.

A. Descriptive
B. Predictive
C. Prescriptive

 

9. A list of students that are in high risk of dropping out of our university in the next two semesters.

A. Descriptive
B. Predictive
C. Prescriptive

 

10. A model that suggests an individualized student degree completion path such that it minimizes the likelihood that the student will quit her studies before completing her degree.

A. Descriptive
B. Predictive
C. Prescriptive

 

 

Answers:

1. A 

2. B  

3. C  

4.  A

5. B

6. B

7. C

8. C

9. B

10. C