bregman-arie/devops-exercises

GitHub: bregman-arie/devops-exercises

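A comprehensive bank of more than 2,600 DevOps and SRE practice questions that helps engineers systematically learn the full technology stack, from Linux and networking to cloud platforms and container orchestration.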

Stars: 83234 | Forks: 19752

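:information_source: This repo contains questions and exercises on various technical topics, sometimes related to DevOps and SRE.

:bar_chart: There are currently **2624** exercises and questions.

:warning: You can use these to prepare for an interview, but most of the questions and exercises don't represent an actual interview. Please read the [FAQ page](faq.md) for more details.

:stop_sign: If you are interested in pursuing a career as a DevOps engineer, learning some of the concepts mentioned here will be useful, but you should know it's not about learning all the topics and technologies mentioned in this repository.

:pencil: You can add more exercises by submitting pull requests :) Read about the contribution guidelines [here](CONTRIBUTING.md).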

DevOps	Git	Network	Hardware	Kubernetes
Software Development	Python	Go	Perl	Regex
Cloud	AWS	Azure	Google Cloud Platform	OpenStack
Operating System	Linux	Virtualization	DNS	Shell Scripting
Databases	SQL	Mongo	Testing	Big Data
CI/CD	Certificates	Containers	OpenShift	Storage
Terraform	Puppet	Distributed	Questions you can ask	Ansible
Observability	Prometheus	Circle CI		Grafana
Argo	Soft Skills	Security	System Design
Chaos Engineering	Misc	Elastic	Kafka	NodeJs

## DevOps Applications

KubePrep

Linux Master

System Design Hero

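## Network

In general, what do you need in order to communicate?

- A common language (so both ends can understand each other)
- A way to address whoever you want to communicate with
- A connection (so the content of the communication can reach the recipient)

What is TCP/IP?

A set of protocols that define how two or more devices communicate with each other. To learn more about TCP/IP, read [here](http://www.penguintutor.com/linux/basic-network-reference).

What is Ethernet?

Ethernet simply refers to the most common type of local area network (LAN) in use today. A LAN, in contrast to a WAN (wide area network) that spans a larger geographical area, is a connected network of computers in a small area, such as your office, a college campus, or even your home.

What is a MAC address? What is it used for?

A MAC address is a unique identifier for an individual network interface. Every frame sent on Ethernet comes from one MAC address and is sent to another. When a network adapter receives a frame, it compares the frame's destination MAC address with its own MAC address and processes the frame only if they match (or if the destination is a broadcast or subscribed multicast address).

When is this MAC address used?: ff:ff:ff:ff:ff:ff

When a device sends a frame to the broadcast MAC address (FF:FF:FF:FF:FF:FF), it is delivered to all stations on the local network. Ethernet broadcasts are used, for example, by ARP to resolve IP addresses to MAC addresses at the data link layer.

What is an IP address?

An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification, and location addressing.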

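Explain subnet mask and give an example

A subnet mask is a 32-bit number that masks an IP address and divides it into a network part and a host part. It is formed by setting all network bits to "1" and all host bits to "0". For example, the mask 255.255.255.0 (/24) applied to 192.168.1.10 gives the network 192.168.1.0 and leaves 8 bits for hosts. Within a given network, two of the available host addresses are always reserved and cannot be assigned to any host: the first address, which is the network address (a.k.a. network ID), and the last address, which is the network broadcast address. [Example](https://github.com/philemonnwanne/projects/tree/main/exercises/exe-09)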
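The reserved first and last addresses can be computed directly from a prefix. The following is a minimal sketch using Go's standard `net/netip` package; the function name `subnetInfo` and the sample prefix are illustrative assumptions, not part of the original exercise.

```go
package main

import (
	"fmt"
	"net/netip"
)

// subnetInfo returns the network address, the broadcast address and the
// number of assignable host addresses for an IPv4 prefix such as
// "192.168.1.77/26". (Hypothetical helper, written for illustration.)
func subnetInfo(cidr string) (netip.Addr, netip.Addr, int, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Addr{}, netip.Addr{}, 0, err
	}
	if !prefix.Addr().Is4() {
		return netip.Addr{}, netip.Addr{}, 0, fmt.Errorf("%s is not an IPv4 prefix", cidr)
	}

	// Masked() zeroes the host bits, which yields the network address.
	network := prefix.Masked().Addr()

	// Setting every host bit to 1 yields the broadcast address.
	hostBits := 32 - prefix.Bits()
	octets := network.As4()
	for i := 0; i < hostBits; i++ {
		octets[3-i/8] |= 1 << (i % 8)
	}
	broadcast := netip.AddrFrom4(octets)

	// The first and last addresses are reserved, so subtract two.
	hosts := (1 << hostBits) - 2
	if hosts < 0 {
		hosts = 0 // /31 and /32 have no conventional host range
	}
	return network, broadcast, hosts, nil
}

func main() {
	network, broadcast, hosts, err := subnetInfo("192.168.1.77/26")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(network, broadcast, hosts) // 192.168.1.64 192.168.1.127 62
}
```

A /26 mask leaves 6 host bits, so the block holds 64 addresses, of which 62 can be assigned to hosts.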

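What is a private IP address? In which scenarios/system designs should one use it?

Private IP addresses are assigned to hosts within the same network so they can communicate with each other. They come from ranges reserved for internal use (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16) and are not routable on the public internet, so a device that only has a private address cannot be reached directly from an outside network. For example, if I live in a dormitory and want my dorm mates to join a game server I host, I would ask them to join via my server's private IP address, since the network is local to the dormitory.

What is a public IP address? In which scenarios/system designs should one use it?

A public IP address is an address that is reachable from the internet. If you host a game server and want your friends to join, you give them your public IP address so their computers can identify and locate your network and server and establish a connection. You would not need a public IP address if you were playing with friends connected to the same network as you; in that case you would use a private IP address. For someone outside to connect to a server located inside your network, you have to set up port forwarding, which tells your router to pass traffic arriving from the public side on a given port to the server on your internal network.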

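Explain the OSI model. What layers are there? What is each layer responsible for?

- Application: the user end (HTTP is here)
- Presentation: establishes context between application-layer entities (encryption is here)
- Session: establishes, manages and terminates connections
- Transport: transfers variable-length data sequences from a source host to a destination host (TCP and UDP are here)
- Network: transfers datagrams from one network to another (IP is here)
- Data link: provides a link between two directly connected nodes (MAC is here)
- Physical: the electrical and physical specification of the data connection (bits are here)

You can read more about the OSI model at [penguintutor.com](http://www.penguintutor.com/linux/basic-network-reference).

For each of the following, determine to which OSI layer it belongs:

* Error correction
* Packet routing
* Cables and electrical signals
* MAC address
* IP address
* Terminating connections
* 3-way handshake

Answers:

* Error correction - Data link
* Packet routing - Network
* Cables and electrical signals - Physical
* MAC address - Data link
* IP address - Network
* Terminating connections - Session
* 3-way handshake - Transport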

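Which delivery schemes are you familiar with?

- Unicast: one-to-one communication with one sender and one receiver.
- Broadcast: sending a message to everyone in the network. The address ff:ff:ff:ff:ff:ff is used for broadcasting. Two common protocols that use broadcast are ARP and DHCP.
- Multicast: sending a message to a group of subscribers. It can be one-to-many or many-to-many.

What is CSMA/CD? Is it used in modern Ethernet networks?

CSMA/CD stands for Carrier Sense Multiple Access / Collision Detection. Its main focus is managing access to a shared medium/bus on which only one host can transmit at a given point in time.

CSMA/CD algorithm:

1. Before sending a frame, a host checks whether another host is already transmitting.
2. If no one is transmitting, it starts transmitting the frame.
3. If two hosts transmit at the same time, a collision occurs.
4. Both hosts stop sending and emit a "jam signal" that notifies everyone a collision occurred.
5. Each host waits a random amount of time before sending again.
6. Once that random time has passed, each host tries to send its frame again, and the cycle repeats.

In modern switched, full-duplex Ethernet every link is its own collision-free channel, so CSMA/CD is effectively no longer used; it only matters on legacy half-duplex links and hubs.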

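Describe the following network devices and the differences between them:

* Router
* Switch
* Hub

Routers, switches and hubs are all network devices used to connect other devices, but each operates differently and has its own use cases:

1. Router: a network device that connects multiple network segments together. It operates at the network layer (Layer 3) of the OSI model and uses routing protocols to direct data between networks. Routers use IP addresses to identify devices and route packets to the correct destination.
2. Switch: a network device that connects multiple devices on a LAN. It operates at the data link layer (Layer 2) of the OSI model and uses MAC addresses to identify devices and direct frames to the correct destination. A switch lets devices on the same network communicate more efficiently and prevents the collisions that can occur when several devices send data at the same time.
3. Hub: a network device that connects multiple devices into a single shared segment, without segmenting the network. Unlike a switch, it operates at the physical layer (Layer 1) of the OSI model and simply repeats incoming data to every device connected to it, whether or not that device is the intended recipient. This means collisions can occur and network efficiency suffers. Hubs are generally not used in modern networks, because switches are more efficient and provide better performance.

What is a "collision domain"?

A collision domain is a network segment in which devices can interfere with each other by attempting to transmit at the same time. When two devices transmit simultaneously, a collision can occur, causing data to be lost or corrupted. All devices in a collision domain share the same bandwidth, and any device can potentially interfere with the transmissions of the others.

What is a "broadcast domain"?

A broadcast domain is a network segment in which every device can reach all the others by sending a broadcast message. A broadcast message is sent to all devices in the network rather than to a specific device. In a broadcast domain, every device receives and processes broadcast messages, regardless of whether the message was intended for it.

Three computers are connected to a switch. How many collision domains are there? How many broadcast domains?

Three collision domains (one per switch port) and one broadcast domain.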

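How does a router work?

A router is a physical or virtual appliance that passes information between two or more packet-switched computer networks. A router inspects a given packet's destination Internet Protocol address (IP address), calculates the best way for it to reach its destination, and then forwards it accordingly.

What is NAT?

Network Address Translation (NAT) is a process in which one or more local IP addresses are translated into one or more global IP addresses, and vice versa, in order to provide internet access to the local hosts.

What is a proxy? How does it work? Why do we need it?

A proxy server acts as a gateway between you and the internet. It is an intermediary server that separates end users from the websites they browse. If you use a proxy server, internet traffic flows through the proxy on its way to the address you requested. The response then comes back through the same proxy server (there are exceptions to this rule), and the proxy forwards the data received from the website to you. Proxy servers provide varying levels of functionality, security and privacy depending on your use case, needs or company policy.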

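What is TCP? How does it work? What is the 3-way handshake?

TCP (Transmission Control Protocol) is a connection-oriented transport-layer protocol that provides reliable, ordered delivery of a stream of bytes. Before any data is exchanged, the client and the server establish a connection with the TCP 3-way handshake, which works as follows:

- The client sends a SYN packet over the IP network to a server on the same or an external network. The purpose of this packet is to ask whether the server is open for new connections.
- The target server must have an open port that can accept new connections. When the server receives the SYN packet from the client, it responds with a SYN/ACK packet, which both acknowledges the client's SYN and carries the server's own SYN.
- The client receives the SYN/ACK from the server and responds with an ACK packet. The connection is now established in both directions.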
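In application code the handshake is performed by the operating system whenever a TCP connection is opened. The following sketch uses Go's standard `net` package on the loopback interface; the helper name `echoOnce` is a hypothetical choice, and the comments describe where the handshake happens rather than exposing individual SYN/ACK packets.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// echoOnce listens on a free loopback port, connects to it, sends msg and
// returns what the server side received. (Hypothetical helper.)
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS pick one
	if err != nil {
		return "", err
	}
	defer ln.Close()

	received := make(chan string, 1)
	go func() {
		// Accept returns a connection whose handshake has completed.
		conn, err := ln.Accept()
		if err != nil {
			received <- ""
			return
		}
		defer conn.Close()
		data, _ := io.ReadAll(conn) // read until the client closes
		received <- string(data)
	}()

	// Dial returns once the kernel has exchanged SYN, SYN/ACK and ACK.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	if _, err := conn.Write([]byte(msg)); err != nil {
		conn.Close()
		return "", err
	}
	conn.Close() // closing lets the server's ReadAll finish
	return <-received, nil
}

func main() {
	got, err := echoOnce("hello")
	fmt.Println(got, err)
}
```

Running a packet capture on the loopback interface while this executes is one way to watch the three handshake packets precede the data.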

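What is round-trip delay or round-trip time?

From [Wikipedia](https://en.wikipedia.org/wiki/Round-trip_delay): "the length of time it takes for a signal to be sent plus the length of time it takes for an acknowledgment of that signal to be received".

Bonus question: what is the RTT of a LAN?

How does an SSL handshake work?

The SSL/TLS handshake is the process of establishing a secure connection between a client and a server. The steps below describe TLS 1.2 with RSA key exchange; TLS 1.3 uses a shorter exchange.

1. The client sends a Client Hello message to the server, which includes the client's SSL/TLS protocol version, a list of the cipher suites the client supports, and a random value.
2. The server responds with a Server Hello message, which includes the chosen SSL/TLS protocol version and cipher suite, a random value, and a session ID.
3. The server sends a Certificate message, which contains the server's certificate.
4. The server sends a Server Hello Done message, which indicates that the server has finished sending messages for the Server Hello phase.
5. The client sends a Client Key Exchange message, which contains the pre-master secret encrypted with the server's public key. Both sides derive the session keys from the pre-master secret and the two random values.
6. The client sends a Change Cipher Spec message, which notifies the server that the client will now send messages protected with the newly negotiated keys.
7. The client sends an encrypted Finished message (shown as "Encrypted Handshake Message" in packet captures), which lets the server verify that the handshake was not tampered with.
8. The server sends a Change Cipher Spec message, which notifies the client that the server will now send messages protected with the newly negotiated keys.
9. The server sends its own encrypted Finished message.
10. The client and the server can now exchange application data.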

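What is the difference between TCP and UDP?

TCP establishes a connection between the client and the server and guarantees the order and delivery of packets, whereas UDP establishes no connection and does not handle packet ordering or retransmission. This makes UDP more lightweight than TCP and a good candidate for services such as streaming. [Penguintutor.com](http://www.penguintutor.com/linux/basic-network-reference) provides a good explanation.

Which TCP/IP protocols are you familiar with?

Explain "default gateway"

A default gateway serves as the access point, or IP router, that a networked computer uses to send information to a computer in another network or on the internet.

What is ARP? How does it work?

ARP stands for Address Resolution Protocol. When you try to ping an IP address on your local network, say 192.168.1.1, your system has to turn the IP address 192.168.1.1 into a MAC address. This involves using ARP to resolve the address, hence its name. Systems keep an ARP lookup table that stores which IP addresses are associated with which MAC addresses. When trying to send a packet to an IP address, the system first consults this table to see whether it already knows the MAC address. If a cached value exists, ARP is not used; otherwise the system broadcasts an ARP request and the owner of the IP address replies with its MAC address.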

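What is TTL? What does it help to prevent?

- TTL (Time to Live) is a value in an IP (Internet Protocol) packet that determines how many hops, or routers, the packet can pass through before it is discarded. Each time a router forwards the packet, the TTL value is decreased by one. When the TTL reaches zero, the packet is dropped and an ICMP (Internet Control Message Protocol) message is sent back to the sender, indicating that the packet has expired.
- TTL prevents packets from circulating in the network indefinitely, which would cause congestion and degrade network performance.
- In particular, it stops packets from being trapped forever in routing loops, where they travel between the same set of routers without ever reaching their destination.
- TTL can also play a small role in security: some protocols accept packets only if they arrive with an expected TTL (for example 255 from a directly connected neighbor), which makes it harder for distant hosts to inject spoofed packets.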
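The decrement-and-drop behaviour can be sketched as a small simulation. This is an illustrative model only: the function `forward` and its parameters are hypothetical, and real routers additionally emit an ICMP Time Exceeded message when they drop the packet.

```go
package main

import "fmt"

// forward simulates a packet that has to cross `routers` routers with the
// given initial TTL. It returns how many routers forwarded the packet and
// whether it was delivered. (Hypothetical model, for illustration.)
func forward(ttl, routers int) (forwarded int, delivered bool) {
	for forwarded < routers {
		ttl-- // every router decrements the TTL before forwarding
		if ttl == 0 {
			return forwarded, false // expired: this router drops the packet
		}
		forwarded++
	}
	return forwarded, true
}

func main() {
	fmt.Println(forward(64, 5)) // 5 true
	fmt.Println(forward(3, 5))  // 2 false
}
```

Tools such as traceroute rely on exactly this behaviour: they send packets with TTL 1, 2, 3, ... and note which router reports each expiry.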

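What is DHCP? How does it work?

It stands for Dynamic Host Configuration Protocol and allocates IP addresses, subnet masks and gateways to hosts. This is how it works:

* A host entering the network broadcasts a message in search of a DHCP server (DHCP DISCOVER)
* The DHCP server sends back an offer containing the lease time, subnet mask, IP address, etc. (DHCP OFFER)
* Depending on which offer it accepts, the client broadcasts a reply that lets all DHCP servers know (DHCP REQUEST)
* The server sends an acknowledgment (DHCP ACK)

Read more [here](https://linuxjourney.com/lesson/dhcp-overview).

Can you have two DHCP servers on the same network? How does it work?

It is possible to have two DHCP servers on the same network, but they must be configured carefully to prevent conflicts.

- When two DHCP servers are configured on the same network, there is a risk that both hand out the same IP address or other settings to different devices, which causes conflicts and connectivity problems. In addition, if the servers are configured with different network settings or options, devices on the network may receive conflicting or inconsistent configuration.
- In some cases, however, two DHCP servers on the same network are needed, for example in large networks where a single DHCP server cannot handle all the requests. In that case the servers can be configured to serve different IP address ranges or different subnets, so they do not interfere with each other.

What is SSL tunneling? How does it work?

SSL (Secure Sockets Layer) tunneling is a technique for establishing a secure, encrypted connection between two endpoints over an insecure network such as the internet. The tunnel is created by encapsulating traffic inside an SSL connection, which provides confidentiality, integrity and authentication.

This is how SSL tunneling works:

1. The client initiates an SSL connection to the server, which starts a handshake to establish the SSL session.
2. During the handshake, the client and the server negotiate the encryption parameters (such as the cipher and key length) and exchange digital certificates to authenticate each other.
3. The client then sends traffic through the SSL tunnel to the server, which decrypts it and forwards it to its destination.
4. The server sends traffic back through the SSL tunnel to the client, which decrypts it and passes it to the application.

What is a socket? Where can you see the list of sockets in your system?

- A socket is a software endpoint that enables two-way communication between processes over a network. Sockets provide a standardized interface for network communication, allowing applications to send and receive data across the network. To view the list of open sockets on a Linux system, run ***netstat -an*** (or `ss -a` on newer systems).
- This command displays all open sockets along with their protocol, local address, foreign address and state.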

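What is IPv6? Why should we consider using it if we have IPv4?

IPv6 (Internet Protocol version 6) is the latest version of the Internet Protocol (IP), which is used to identify devices on a network and communicate with them. IPv6 addresses are 128-bit addresses written in hexadecimal notation, such as 2001:0db8:85a3:0000:0000:8a2e:0370:7334.

There are several reasons to consider IPv6 over IPv4:

1. Address space: IPv4 has a limited address space (about 4.3 billion addresses), which has already been exhausted in many parts of the world. IPv6 provides a vastly larger space of 2^128 addresses.
2. Security: IPv6 was designed with IPsec in mind, which can provide end-to-end encryption and authentication for network traffic, although using IPsec remains optional.
3. Performance: IPv6 includes features that can help network performance, such as multicast routing, which allows a single packet to be sent to multiple destinations at once.
4. Simplified network configuration: IPv6 includes features such as stateless autoconfiguration, which lets a device configure its own IPv6 address without a DHCP server.
5. Better mobility support: IPv6 includes features such as Mobile IPv6, which lets a device keep its IPv6 address as it moves between networks.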
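The 128-bit size and the hexadecimal notation can be checked with Go's standard `net/netip` package. In this small sketch the helper name `canonicalIPv6` is a hypothetical choice; it parses an address and returns the shortened text form, in which leading zeros are dropped and one run of all-zero groups is replaced by `::`.

```go
package main

import (
	"fmt"
	"net/netip"
)

// canonicalIPv6 parses an IPv6 address and returns its shortened text form.
// (Hypothetical helper, for illustration.)
func canonicalIPv6(s string) (string, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "", err
	}
	if !addr.Is6() {
		return "", fmt.Errorf("%s is not an IPv6 address", s)
	}
	return addr.String(), nil
}

func main() {
	full := "2001:0db8:85a3:0000:0000:8a2e:0370:7334"
	short, err := canonicalIPv6(full)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(short)                               // 2001:db8:85a3::8a2e:370:7334
	fmt.Println(netip.MustParseAddr(full).BitLen()) // 128
}
```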

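What is a VLAN?

A VLAN (Virtual Local Area Network) is a logical network that groups together a set of devices on a physical network, regardless of their physical location. VLANs are created by configuring network switches to assign a specific VLAN ID to frames sent by devices connected to a particular port or group of ports on the switch.

What is MTU?

MTU stands for Maximum Transmission Unit. It is the size of the largest PDU (Protocol Data Unit) that can be sent in a single transaction.

What happens if you send a packet that is bigger than the MTU?

With IPv4, a router can fragment the packet and forward all the fragments, unless the packet has the "Don't Fragment" flag set, in which case the router drops it and reports the problem to the sender via ICMP. With IPv6, routers never fragment: the oversized packet is dropped and an ICMPv6 "Packet Too Big" message is sent back to the sender, which has to send smaller packets.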
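The arithmetic behind IPv4 fragmentation can be sketched as follows. The sketch assumes a 20-byte IPv4 header without options and uses the rule that every fragment except the last must carry a multiple of 8 payload bytes (fragment offsets are counted in 8-byte units); the function name `fragmentSizes` is hypothetical.

```go
package main

import "fmt"

// fragmentSizes returns the payload size of each fragment needed to carry
// `payload` bytes over a link with the given MTU and IP header length.
// (Hypothetical helper, for illustration.)
func fragmentSizes(payload, mtu, headerLen int) []int {
	// Largest payload per fragment, rounded down to a multiple of 8 bytes.
	maxData := (mtu - headerLen) / 8 * 8
	if maxData <= 0 {
		return nil // the MTU cannot even hold the header plus 8 bytes
	}
	var sizes []int
	for payload > maxData {
		sizes = append(sizes, maxData)
		payload -= maxData
	}
	if payload > 0 {
		sizes = append(sizes, payload) // last fragment carries the remainder
	}
	return sizes
}

func main() {
	// 3980 bytes of payload over a 1500-byte MTU with a 20-byte header.
	fmt.Println(fragmentSizes(3980, 1500, 20)) // [1480 1480 1020]
}
```

Each fragment gets its own IP header, so fragmentation adds overhead in addition to the reassembly work at the destination.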

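True or False? Ping uses UDP because it doesn't care about a reliable connection

False. Ping actually uses ICMP (Internet Control Message Protocol), a network protocol used to send diagnostic messages and control messages related to network communication.

What is SDN?

- SDN stands for Software-Defined Networking. It is an approach to network management that emphasizes centralizing network control, enabling administrators to manage network behavior through a software abstraction.
- In a traditional network, devices such as routers, switches and firewalls are configured and managed individually using specialized software or command-line interfaces. SDN, by contrast, separates the network control plane from the data plane, allowing administrators to manage network behavior through a centralized software controller.

What is ICMP? What is it used for?

ICMP stands for Internet Control Message Protocol. It is a protocol used for diagnostic and control purposes in IP networks. It is part of the Internet Protocol suite and operates at the network layer.

ICMP messages are used for a variety of purposes, including:

1. Error reporting: ICMP messages report errors that occur in the network, such as a packet that could not be delivered to its destination.
2. Ping: ICMP is used to send ping messages, which test whether a host or network is reachable and measure the round-trip time of packets.
3. Path MTU discovery: ICMP is used to discover the Maximum Transmission Unit (MTU) of a path, which is the largest packet that can be transmitted without fragmentation.
4. Traceroute: ICMP is used by the traceroute utility to trace the path that packets take through the network.
5. Router discovery: ICMP is used to discover the routers in a network.

What is NAT? How does it work?

NAT stands for Network Address Translation. It is a way to map multiple local private addresses to a public one before transferring the information. Organizations that want multiple devices to share a single IP address use NAT, as do most home routers. For example, your computer's private IP could be 192.168.1.100, but your router maps the traffic to its public IP (e.g. 1.1.1.1). Any device on the internet would see the traffic coming from your public IP (1.1.1.1) instead of your private IP (192.168.1.100).

Which port number does each of the following protocols use?

* SSH
* SMTP
* HTTP
* DNS
* HTTPS
* FTP
* SFTP

Answers:

* SSH - 22
* SMTP - 25
* HTTP - 80
* DNS - 53
* HTTPS - 443
* FTP - 21
* SFTP - 22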

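Which factors affect network performance?

Several factors can affect network performance, including:

1. Bandwidth: the available bandwidth of a network connection can significantly affect its performance. Networks with limited bandwidth can experience slow data transfer rates, high latency and poor responsiveness.
2. Latency: latency is the delay that occurs when data travels from one point in a network to another. High latency degrades network performance, especially for real-time applications such as video conferencing and online gaming.
3. Network congestion: when too many devices use a network at the same time, congestion can occur, leading to slow data transfer rates and poor performance.
4. Packet loss: packet loss occurs when packets are dropped in transit. It slows the network down and lowers overall performance.
5. Network topology: the physical layout of a network, including the placement of switches, routers and other devices, can affect performance.
6. Network protocol: different protocols have different performance characteristics. For example, TCP is a reliable protocol that guarantees delivery, but the overhead of error checking and retransmission can also make it slower.
7. Network security: security measures such as firewalls and encryption can affect performance, especially if they require significant processing power or introduce extra latency.
8. Distance: the physical distance between devices can affect performance, especially on wireless networks, where signal strength and interference influence connectivity and data transfer rates.

What is APIPA?

APIPA (Automatic Private IP Addressing) is a set of IP addresses from which a device assigns itself an address when the main DHCP server is unreachable.

What IP range does APIPA use?

APIPA uses the IP range 169.254.0.1 - 169.254.255.254.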
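A quick way to recognize such an address in code is to test it against the 169.254.0.0/16 block. This is a minimal sketch with Go's standard `net/netip` package; the helper name `isAPIPA` is a hypothetical choice.

```go
package main

import (
	"fmt"
	"net/netip"
)

// apipaBlock is the link-local block that APIPA addresses are drawn from.
var apipaBlock = netip.MustParsePrefix("169.254.0.0/16")

// isAPIPA reports whether s is a valid IP address inside the APIPA block.
// (Hypothetical helper, for illustration.)
func isAPIPA(s string) bool {
	addr, err := netip.ParseAddr(s)
	return err == nil && apipaBlock.Contains(addr)
}

func main() {
	fmt.Println(isAPIPA("169.254.10.20")) // true
	fmt.Println(isAPIPA("192.168.1.10"))  // false
}
```

Seeing such an address on an interface usually means the host asked for a DHCP lease and did not get one.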

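#### Control Plane and Data Plane

What does "control plane" refer to?

The control plane is the part of the network that decides how to route and forward packets to different locations.

What does "data plane" refer to?

The data plane is the part of the network that actually forwards the data/packets.

What does "management plane" refer to?

It refers to monitoring and management functions.

To which plane (data, control, ...) does creating routing tables belong?

Control plane.

Explain Spanning Tree Protocol (STP).

What is link aggregation? Why is it used?

What is asymmetric routing? How do you deal with it?

Which overlay (tunnel) protocols are you familiar with?

What is GRE? How does it work?

What is VXLAN? How does it work?

What is SNAT?

Explain OSPF.

OSPF (Open Shortest Path First) is a routing protocol that can be implemented on various types of routers. Most modern routers support OSPF, including those from vendors such as Cisco, Juniper and Huawei. The protocol is designed to work with IP-based networks, both IPv4 and IPv6. It uses a hierarchical network design in which routers are grouped into areas, each with its own topology map and routing table. This design reduces the amount of routing information that has to be exchanged between routers and improves network scalability.

The 4 types of OSPF routers are:

* Internal router
* Area border router
* Autonomous system boundary router
* Backbone router

Learn more about OSPF router types: https://www.educba.com/ospf-router-types/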

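What is latency?

Latency is the time it takes for information to travel from its source to its destination.

What is bandwidth?

Bandwidth is the capacity of a communication channel: it measures how much data the channel can handle over a specific period of time. More bandwidth means more traffic can be handled and therefore more data transferred.

What is throughput?

Throughput is the measurement of the actual amount of data transferred over a given period of time across a transmission channel.

When performing a search query, what is more important, latency or throughput? And how do you ensure that when managing a global infrastructure?

Latency. To have good latency, a search query should be forwarded to the closest data center.

When uploading a video, what is more important, latency or throughput? And how do you ensure that?

Throughput. To have good throughput, the upload stream should be routed to an underutilized link.

What other considerations (besides latency and throughput) are there when forwarding requests?

* Keeping caches updated (which means the request might not be forwarded to the closest data center)

Explain Spine & Leaf

"Spine & Leaf" is a network topology commonly used in data center environments to connect multiple switches and manage network traffic efficiently. It is also known as the "spine-leaf" architecture or "leaf-spine" topology. The design provides high bandwidth, low latency and scalability, which makes it well suited to modern data centers that handle large volumes of data and traffic.

A Spine & Leaf network has two main types of switches:

* Spine switches: high-performance switches arranged in the spine layer. They act as the core of the network, and each spine switch is connected to every leaf switch in the data center.
* Leaf switches: switches connected to end devices such as servers, storage arrays and other network equipment. Each leaf switch is connected to every spine switch in the data center. This creates a non-blocking, full-mesh connectivity between leaf and spine switches, so any leaf switch can communicate with any other leaf switch at maximum throughput.

The Spine & Leaf architecture has become increasingly popular in data centers because it meets the demands of modern cloud computing, virtualization and big data applications with a scalable, high-performance and reliable network infrastructure.

What is network congestion? What can cause it?

Network congestion occurs when there is too much data to transmit on a network and not enough capacity to handle the demand.
This can lead to increased latency and packet loss. The causes vary: high network usage, large file transfers, malware, hardware issues or network design problems.
To prevent network congestion, it is important to monitor your network usage and implement strategies to limit or manage demand.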

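What can you tell me about the UDP packet format? What about the TCP packet format? How do they differ?

What is the exponential backoff algorithm? Where is it used?

Using Hamming code, what would be the code word for the following data word 100111010001101?

00110011110100011101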
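The code word above can be reproduced with a short encoder. This sketch assumes the common convention of even parity with the parity bits stored at positions 1, 2, 4, 8, ... (counting from the left, starting at 1); the function name `hammingEncode` is hypothetical.

```go
package main

import "fmt"

// hammingEncode returns the even-parity Hamming code word for a string of
// '0'/'1' data bits, with parity bits at positions 1, 2, 4, 8, ...
// (Hypothetical helper, for illustration.)
func hammingEncode(data string) string {
	// Find the number of parity bits r such that 2^r >= len(data) + r + 1.
	r := 0
	for (1 << r) < len(data)+r+1 {
		r++
	}
	n := len(data) + r
	code := make([]byte, n+1) // 1-indexed; code[0] is unused

	// Data bits fill every position that is not a power of two.
	next := 0
	for pos := 1; pos <= n; pos++ {
		if pos&(pos-1) == 0 {
			continue // power of two: reserved for a parity bit
		}
		code[pos] = data[next] - '0'
		next++
	}

	// The parity bit at position p covers every position whose index has
	// bit p set; it is chosen so that the covered bits XOR to zero.
	for p := 1; p <= n; p <<= 1 {
		var parity byte
		for pos := 1; pos <= n; pos++ {
			if pos&p != 0 {
				parity ^= code[pos]
			}
		}
		code[p] = parity
	}

	out := make([]byte, n)
	for pos := 1; pos <= n; pos++ {
		out[pos-1] = code[pos] + '0'
	}
	return string(out)
}

func main() {
	fmt.Println(hammingEncode("100111010001101")) // 00110011110100011101
}
```

A 15-bit data word needs 5 parity bits, which is why the code word is 20 bits long.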

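Give examples of protocols found in the application layer

* Hypertext Transfer Protocol (HTTP) - used for web pages on the internet
* Simple Mail Transfer Protocol (SMTP) - email transmission
* Telecommunications Network (TELNET) - terminal emulation that allows a client to access a telnet server
* File Transfer Protocol (FTP) - facilitates the transfer of files between any two machines
* Domain Name System (DNS) - domain name translation
* Dynamic Host Configuration Protocol (DHCP) - allocates IP addresses, subnet masks and gateways to hosts
* Simple Network Management Protocol (SNMP) - gathers data on devices on the network

Give examples of protocols found in the network layer

* Internet Protocol (IP) - assists in routing packets from one machine to another
* Internet Control Message Protocol (ICMP) - lets you know what is going on, for example error messages and debugging information

What is HSTS?

HTTP Strict Transport Security is a web server directive that tells user agents and web browsers how to handle connections to the site, by means of a response header sent at the very beginning and back to the browser. It forces connections over HTTPS encryption and disregards any script's call to load a resource in that domain over HTTP. Read more [here](https://www.globalsign.com/en/blog/what-is-hsts-and-how-do-i-use-it#:~:text=HTTP%20Strict%20Transport%20Security%20(HSTS,and%20back%20to%20the%20browser.)

#### Network - Misc

What is the Internet? Is it the same as the World Wide Web?

The internet refers to a network of networks that transfers huge amounts of data around the globe.
The World Wide Web is an application that runs on millions of servers on top of the internet and is accessed through what is known as a web browser.

What is an ISP?

An ISP (Internet Service Provider) is a company that provides internet access to its customers.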

## Operating System

### Operating System Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
|Fork 101|Fork|[Link](topics/os/fork_101.md)|[Link](topics/os/solutions/fork_101_solution.md)|
|Fork 102|Fork|[Link](topics/os/fork_102.md)|[Link](topics/os/solutions/fork_102_solution.md)|

### Operating System - Self Assessment

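What is an operating system?

From the book "Operating Systems: Three Easy Pieces": "responsible for making it easy to run programs (even allowing you to seemingly run many at the same time), allowing programs to share memory, enabling programs to interact with devices, and other fun stuff like that".

#### Operating System - Process

Can you explain what a process is?

A process is a running program. A program is a set of one or more instructions, and the program (or process) is executed by the operating system.

If you had to design an API for processes in an operating system, what would this API look like?

It would support the following:

* Create - allow creating new processes
* Delete - allow removing/destroying processes
* State - allow checking the state of a process, whether it is running, stopped, waiting, etc.
* Stop - allow stopping a running process

How is a process created?

* The OS reads the program's code and any additional relevant data
* The program's code is loaded into memory or, more specifically, into the address space of the process
* Memory is allocated for the program's stack (a.k.a. run-time stack). The OS also initializes the stack with data such as argv, argc and the parameters to main()
* Memory is allocated for the program's heap, which is required for dynamically allocated data such as linked lists and hash tables
* I/O initialization tasks are performed; on Unix/Linux-based systems, for example, each process gets 3 file descriptors (input, output and error)
* The OS runs the program, starting from main()

True or False? The loading of the program into memory is done eagerly (all at once)

False. It used to be true, but today's operating systems load lazily, which means that at first only the pieces the process needs in order to run are loaded.

What are the different states of a process?

* Running - it is executing instructions
* Ready - it is ready to run, but for various reasons it is on hold
* Blocked - it is waiting for some operation to complete, for example an I/O disk request

What are some reasons for a process to become blocked?

- I/O operations (e.g. reading from a disk)
- Waiting for a packet from the network

What is Inter-Process Communication (IPC)?

Inter-process communication (IPC) refers to the mechanisms provided by an operating system that allow processes to manage shared data.

What is "time sharing"?

Even on a system with a single physical CPU, it is possible to let multiple users work on it and run programs. This is achieved with time sharing: computing resources are shared in a way that makes it seem to each user that the system has multiple CPUs, when in fact it is one CPU shared by applying multiprogramming and multitasking.

What is "space sharing"?

It is, to some extent, the opposite of time sharing. In time sharing, a resource is used by one entity for a while and then the same resource is used by another entity; in space sharing, the space is divided among multiple entities, but it is not handed back and forth between them.
Each part is used by one entity until that entity decides to give it up. Take storage as an example: a file is yours until you decide to delete it.

Which component determines which process runs at a given moment?

The CPU scheduler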

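#### Operating System - Memory

What is "virtual memory" and what purpose does it serve?

Virtual memory gives each process its own address space, which the operating system maps onto physical RAM, and it lets the system combine RAM with temporary space on disk. When RAM runs low, the OS moves data from RAM to a space on disk called the page file (or swap). Moving data out to the page file frees up RAM so the computer can complete its work. Since disk is much slower than RAM, a computer with more RAM generally has to page less and its programs run faster. https://www.minitool.com/lib/virtual-memory.html

What is demand paging?

Demand paging is a memory management technique in which a page is loaded into physical memory only when a process accesses it. Loading pages on demand optimizes memory usage and reduces startup latency and space overhead, at the cost of some delay the first time a page is accessed. Overall, it is a cost-effective approach to managing memory in an operating system.

What is copy-on-write?

Copy-on-write (COW) is a resource management concept that aims to reduce unnecessary copying of information. It is used, for instance, by implementations of the POSIX fork system call, which creates a duplicate of the calling process.

The idea:

1. If a resource is shared between 2 or more entities (for example a memory segment shared between 2 processes), the resource does not need to be copied for every entity; instead, each entity gets READ access to the shared resource. (The shared segment is marked read-only. Think of every entity holding a pointer to the location of the shared resource, which it can dereference to read its value.)
2. If one entity were to perform a WRITE operation on the shared resource, a problem would arise, since the change would be visible to every other entity sharing it. (Think of a process modifying some variables on the stack or allocating data on the heap: these changes would also apply to all other processes, which is definitely undesirable behavior.)
3. As a solution, only when a WRITE operation is about to be performed on the shared resource is the resource first copied, and the change is then applied to the copy.

What is a kernel, and what does it do?

The kernel is the part of the operating system that is responsible for tasks such as:

* Allocating memory
* Scheduling processes
* Controlling the CPU

True or False? Some pieces of code in the kernel are loaded into protected areas of memory so applications can't overwrite them.

True

What is POSIX?

POSIX (Portable Operating System Interface) is a set of standards that define the interface between a Unix-like operating system and application programs.

Explain what a semaphore is and what its role is in operating systems.

A semaphore is a synchronization primitive used in operating systems and concurrent programming to control access to shared resources. It is a variable or abstract data type that acts as a counter or signaling mechanism for managing access to a resource by multiple processes or threads.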
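The counter behaviour of a semaphore can be sketched in Go, where a buffered channel is a common way to build a counting semaphore (this is a Go idiom, not an operating system API). The function `maxConcurrent` is a hypothetical helper that measures how many goroutines were ever inside the guarded section at the same time.

```go
package main

import (
	"fmt"
	"sync"
)

// maxConcurrent starts `tasks` goroutines that all want to enter a guarded
// section, allows at most `limit` of them inside at once, and returns the
// highest concurrency that was observed. (Hypothetical helper.)
func maxConcurrent(tasks, limit int) int {
	sem := make(chan struct{}, limit) // counting semaphore with `limit` permits

	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks while `limit` holders exist
			defer func() { <-sem }() // release the permit on the way out

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	// The observed peak never exceeds the number of permits.
	fmt.Println(maxConcurrent(50, 3) <= 3)
}
```

With a limit of 1 the semaphore behaves like a mutex; larger limits model a pool of identical resources.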

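What is a cache? What is a buffer?

Cache: usually used when processes read from and write to disk, to make the process faster by keeping similar data used by different programs easily accessible. Buffer: a reserved area in RAM used to hold data temporarily.

## Virtualization

What is virtualization?

Virtualization uses software to create an abstraction layer over computer hardware that allows the hardware elements of a single computer (processors, memory, storage and more) to be divided into multiple virtual computers, commonly called virtual machines (VMs).

What is a hypervisor?

Red Hat: "A hypervisor is software that creates and runs virtual machines (VMs). A hypervisor, sometimes called a virtual machine monitor (VMM), isolates the hypervisor operating system and resources from the virtual machines and enables the creation and management of those VMs." Read more [here](https://www.redhat.com/en/topics/virtualization/what-is-a-hypervisor)

What types of hypervisors are there?

Hosted hypervisors and bare-metal hypervisors.

What are the advantages and disadvantages of a bare-metal hypervisor over a hosted hypervisor?

Because it has its own drivers and direct access to the hardware components, a bare-metal hypervisor will usually have better performance, along with better stability and scalability. On the other hand, there may be limitations on loading (any) drivers, so a hosted hypervisor will usually benefit from better hardware compatibility.

What types of virtualization are there?

- Operating system virtualization
- Network functions virtualization
- Desktop virtualization

Is containerization a type of virtualization?

Yes, it is operating-system-level virtualization, in which the kernel is shared and allows multiple isolated user-space instances.

How did the introduction of virtual machines change the industry and the way applications were deployed?

The introduction of virtual machines allowed companies to deploy multiple business applications on the same hardware, with each application securely isolated from the others and each running on its own separate operating system.

#### Virtual Machines

Do we need virtual machines in the age of containers? Are they still relevant?

Yes, virtual machines are still relevant even in the age of containers. While containers provide a lightweight and portable alternative to virtual machines, they do have certain limitations; for example, containers share the host kernel. Virtual machines still matter because they offer stronger isolation and security, can run different operating systems, and are well suited to legacy applications.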

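## Prometheus

What is Prometheus? What are some of Prometheus's main features?

Prometheus is a popular open-source systems monitoring and alerting toolkit, originally developed at SoundCloud. It is designed to collect and store time-series data and to allow querying and analysis of that data with a powerful query language called PromQL. Prometheus is frequently used to monitor cloud-native applications, microservices and other modern infrastructure.

Some of the main features of Prometheus include:

1. Data model: Prometheus uses a flexible data model that allows users to organize and label their time-series data in a way that makes sense for their particular use case. Labels are used to identify different dimensions of the data, such as the source of the data or the environment in which it was collected.
2. Pull-based architecture: Prometheus uses a pull-based model to collect data from targets, meaning that the Prometheus server actively scrapes its targets for metrics at regular intervals. Compared with a push-based model, in which every target pushes data to the server, this makes it easy to tell when a target is down and keeps the scrape configuration in one place.
3. Time-series database: Prometheus stores all of its data in a time-series database, which allows users to perform queries over time ranges and to aggregate and analyze their data in various ways. The database is optimized for write-heavy workloads and can handle a high volume of data with low latency.
4. Alerting: Prometheus includes a powerful alerting system that allows users to define rules based on their metrics data and to send alerts when certain conditions are met. Alerts can be sent via email, chat or other channels, and can be customized to include specific details about the problem.
5. Visualization: Prometheus has a built-in expression browser for ad-hoc queries and simple graphs. Its former dashboard builder, PromDash, has been deprecated, and dashboards are nowadays usually built with Grafana.

Overall, Prometheus is a powerful and flexible tool for monitoring and analyzing systems and applications, and it is widely used in the industry for cloud-native monitoring and observability.

In what scenarios might it be better NOT to use Prometheus?

From the Prometheus documentation: "if you need 100% accuracy, such as for per-request billing".

Describe Prometheus's architecture and components

The Prometheus architecture consists of four major components:

1. Prometheus Server: The Prometheus server is responsible for collecting and storing metrics data. It has a simple built-in storage layer that allows it to store time-series data in a time-ordered database.
2. Client Libraries: Prometheus provides a range of client libraries that enable applications to expose their metrics data in a format that can be ingested by the Prometheus server. These libraries are available for a range of programming languages, including Java, Python and Go.
3. Exporters: Exporters are software components that expose existing metrics from third-party systems and make them available for ingestion by the Prometheus server. Exporters exist for a range of popular technologies, including MySQL, PostgreSQL and Apache.
4. Alertmanager: The Alertmanager component is responsible for processing alerts generated by the Prometheus server. It can handle alerts from multiple sources and provides a range of features for deduplicating, grouping and routing alerts to appropriate channels.

Overall, the Prometheus architecture is designed to be highly scalable and resilient. The server and client libraries can be deployed in a distributed fashion to support monitoring across large-scale, highly dynamic environments.

Can you compare Prometheus to other solutions, for example InfluxDB?

Compared with other monitoring solutions such as InfluxDB, Prometheus is known for its high performance and scalability. It can handle large volumes of data and integrates easily with other tools in the monitoring ecosystem. InfluxDB, on the other hand, is known for its ease of use and simplicity. It has a user-friendly interface and provides easy-to-use APIs for collecting and querying data.

Another popular solution, Nagios, is a more traditional monitoring system that is built around host and service checks, which it schedules itself, rather than around time-series metrics. Nagios has been around for a long time and is known for its stability and reliability. However, compared with Prometheus, Nagios lacks some of the more advanced features, such as a multi-dimensional data model and a powerful query language.

Overall, the choice of a monitoring solution depends on the specific needs and requirements of the organization. While Prometheus is a great choice for large-scale monitoring and alerting, InfluxDB may be a better fit for smaller environments that value ease of use and simplicity. Nagios remains a solid choice for organizations that prioritize stability and reliability over advanced features.

What is an alert?

In Prometheus, an alert is a notification triggered when a specific condition or threshold is met. Alerts can be configured to fire when certain metrics cross a given threshold or when specific events occur. Once an alert fires, it can be routed to various channels, such as email, pager or chat, to notify the relevant teams or individuals to take appropriate action. Alerts are a critical component of any monitoring system, because they allow teams to proactively detect and respond to issues before they affect users or cause system downtime.

What is an instance? What is a job?

In Prometheus, an instance refers to a single target that is being monitored, for example a single server or service. A job is a set of instances that perform the same function, such as a set of web servers serving the same application. Jobs allow you to define and manage a group of targets together. In essence, an instance is an individual target that Prometheus collects metrics from, while a job is a collection of similar instances that can be managed as a group.

What core metric types does Prometheus support?

Prometheus supports several types of metrics, including:

1. Counter: A monotonically increasing value used for tracking counts of events or samples. Examples include the number of requests processed or the total number of errors encountered.
2. Gauge: A value that can go up or down, such as CPU usage or memory usage. Unlike counters, gauge values can be arbitrary, meaning they can go up and down based on changes in the system being monitored.
3. Histogram: A set of observations or events that are divided into buckets based on their value. Histograms help in analyzing the distribution of a metric, such as request latencies or response sizes.
4. Summary: A summary is similar to a histogram, but instead of buckets, it provides a set of quantiles for the observed values. Summaries are useful for monitoring the distribution of request latencies or response sizes over time.

Prometheus also supports various functions and operators for aggregating and manipulating metrics, such as sum, max, min and rate. These features make it a powerful tool for monitoring and alerting on system metrics.

What is an exporter? What is it used for?

An exporter serves as a bridge between a third-party system or application and Prometheus, making it possible for Prometheus to monitor that system or application and collect data from it.

The exporter acts as a server that listens on a specific network port for scrape requests from Prometheus. It collects metrics from the third-party system or application and transforms them into a format that Prometheus understands. The exporter then exposes these metrics to Prometheus via an HTTP endpoint, making them available for collection and analysis.

Exporters are commonly used to monitor various types of infrastructure components, such as databases, web servers and storage systems. For example, there are exporters for popular databases such as MySQL and PostgreSQL, as well as for web servers such as Apache and Nginx.

Overall, exporters are a critical component of the Prometheus ecosystem: they allow monitoring of a wide range of systems and applications and give the platform a high degree of flexibility and extensibility.

What are some Prometheus best practices?

Here are three of them:

1. Label carefully: Careful and consistent labeling of metrics is crucial for effective querying and alerting. Labels should be clear, concise, and include all relevant information about the metric.
2. Keep metrics simple: The metrics exposed by exporters should be simple and focus on a single aspect of the system being monitored. This helps avoid confusion and ensures that the metrics are easily understandable by all members of the team.
3. Use alerting sparingly: While alerting is a powerful feature of Prometheus, it should be used sparingly and only for the most critical issues. Setting up too many alerts can lead to alert fatigue and result in important alerts being ignored. It is recommended to set up only the most important alerts and adjust the thresholds over time based on the actual frequency of alerts.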

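How do you get the total number of requests in a given period of time?

To get the total number of requests in a given period of time with Prometheus, combine the *sum* function with the *increase* function. The following query returns the total number of requests over the last hour:

`sum(increase(http_requests_total[1h]))`

In this query, *http_requests_total* is the name of the counter that tracks the total number of HTTP requests, and *increase* calculates how much each time series of that counter grew over the last hour. The *sum* function then adds up the series, giving you the total number of requests in that hour. Note that *rate(http_requests_total[1h])* would return the average number of requests per second over the hour, not the total. You can adjust the time range by changing the duration; for example, use *increase(http_requests_total[1d])* to get the total for the last day.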
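How a "total over a window" is derived from raw counter samples can be sketched in a few lines. This is a simplified model and not Prometheus's actual implementation: it assumes the samples are given in time order, it treats any drop in value as a counter reset (for example after a process restart), and it skips the extrapolation to the window boundaries that PromQL performs. The function name `counterIncrease` is hypothetical.

```go
package main

import "fmt"

// counterIncrease returns how much a counter grew across the given samples,
// treating any drop in value as a counter reset. (Hypothetical helper.)
func counterIncrease(samples []float64) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i] - samples[i-1]
		if delta < 0 {
			// After a reset the counter restarts from zero, so everything
			// it shows now was accumulated since the reset.
			delta = samples[i]
		}
		total += delta
	}
	return total
}

func main() {
	// The counter was restarted between the second and third sample.
	samples := []float64{100, 150, 20, 60}
	fmt.Println(counterIncrease(samples)) // 110

	// Dividing the increase by the window length gives a per-second rate.
	fmt.Println(counterIncrease(samples) / 3600)
}
```

The last line shows the relationship between the two views of a counter: the per-second rate is the increase divided by the length of the window in seconds.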

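What does HA in Prometheus mean?

HA stands for High Availability. It means the system is designed to be highly reliable and always available, even in the face of failures or other issues. In practice, this usually involves running two or more identically configured Prometheus servers that scrape the same targets independently of each other. The servers do not synchronize their data; they all send their alerts to Alertmanager, which deduplicates them, and a load balancer or a long-term storage layer can sit in front of them for queries. With HA in place, monitoring and alerting keep working even if one server is lost to a hardware or software failure or a network issue.

How do you join two metrics?

PromQL has no *join()* function. Two metrics are joined by applying a binary operator between them, and the vector matching keywords *on* and *ignoring* control which labels must be equal for two series to be paired. For many-to-one or one-to-many matches, add *group_left* or *group_right*.

For example, the following query pairs the series of *error_count_total* and *request_count_total* that have the same *service* and *instance* label values and returns the error ratio for each pair:

`error_count_total / on(service, instance) request_count_total`

Wrapping the expression in an aggregation, for example `sum by (service) (...)`, then combines the joined series per service.

How do you write a query that returns the value of a label?

*label_values* is a Grafana templating function rather than part of PromQL: in a Grafana dashboard variable, `label_values(http_requests_total, method)` returns all values of the *method* label found on the *http_requests_total* metric.

In Prometheus itself you can get the same information in two ways. A query that aggregates by the label, such as `count by (method) (http_requests_total)`, returns one series per distinct *method* value. Alternatively, the HTTP API endpoint `/api/v1/label/method/values` returns the list of values of the *method* label.

How do you convert cpu_user_seconds to CPU usage in percentage?

The counter *process_cpu_user_seconds_total* accumulates the CPU seconds a process has spent in user mode. Applying *rate* to it already divides by the elapsed time and yields the CPU seconds used per second, so to get a percentage you only need to divide by the number of CPU cores and multiply by 100. The formula is:

`100 * sum by (instance) (rate(process_cpu_user_seconds_total{job="<job-name>"}[<time-period>])) / <num-cpu-cores>`

Here, *<job-name>* is the name of the job you want to query, *<time-period>* is the time range you want to query (e.g. *5m*, *1h*), and *<num-cpu-cores>* is the number of CPU cores on the machine you are querying.

For example, to get the CPU usage in percentage over the last 5 minutes for a job named *my-job* that runs on a machine with 4 CPU cores, use the following query:

`100 * sum by (instance) (rate(process_cpu_user_seconds_total{job="my-job"}[5m])) / 4`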

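## Go

What are some characteristics of the Go programming language?

* Strong and static typing - the type of a variable cannot change over time, and it has to be defined at compile time
* Simplicity
* Fast compile times
* Built-in concurrency
* Garbage collected
* Platform independent
* Compiles to a standalone binary - anything you need to run your application is compiled into one binary, which is very useful for version management at runtime

Go also has a good community.

What is the difference between var x int = 2 and x := 2?

The result is the same: a variable with the value 2. With var x int = 2 we set the variable type to integer explicitly, while with x := 2 we let Go figure out the type by itself.

True or False? In Go we can redeclare variables, and once a variable is declared we must use it.

False. We can't redeclare variables, but yes, we must use declared variables.

Which Go libraries have you used?

This should be answered based on your own usage, but one example is:

* fmt - formatted I/O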

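What is wrong with the following block of code? How do you fix it?

```go
func main() {
    var x float32 = 13.5
    var y int
    y = x
}
```

Go does not convert between numeric types implicitly, so a float32 value cannot be assigned to an int variable. Use an explicit conversion, y = int(x), and make sure y is actually used afterwards, since unused variables are a compile error as well.

The following block of code tries to convert the integer 101 to a string, but instead we get "e". Why is that? How do you fix it?

```go
package main

import "fmt"

func main() {
    var x int = 101
    var y string
    y = string(x)
    fmt.Println(y)
}
```

The conversion looks up which Unicode code point has the value 101 and uses that character as the string. If you want to get "101", you should use the package "strconv" and replace y = string(x) with y = strconv.Itoa(x)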

What is wrong with the following code?:

```go
package main

func main() {
    var x = 2
    var y = 3
    const someConst = x + y
}
```

Constants in Go can only be declared using constant expressions, but `x`, `y` and their sum are variables. The compiler reports:
const initializer x + y is not a constant

What will be the output of the following block of code?:

```go
package main

import "fmt"

const (
	x = iota
	y = iota
)
const z = iota

func main() {
	fmt.Printf("%v\n", x)
	fmt.Printf("%v\n", y)
	fmt.Printf("%v\n", z)
}
```

Output: 0 1 0

Go's iota identifier is used in const declarations to simplify definitions of incrementing numbers. Because it can be used in expressions, it provides a generality beyond that of simple enumerations.
`x` and `y` are in the first const group, where iota counts 0 and 1; `z` is in a second declaration, where iota starts again from 0.
[Iota page in Go Wiki](https://github.com/golang/go/wiki/Iota)

What is _ used for in Go?

It lets you discard values you don't need, so you don't have to declare a variable for every return value. It is called the [blank identifier](https://golang.org/doc/effective_go.html#blank).
[answer in SO](https://stackoverflow.com/questions/27764421/what-is-underscore-comma-in-a-go-declaration#answer-27764432)

What will be the output of the following block of code?:

```go
package main

import "fmt"

const (
	_ = iota + 3
	x
)

func main() {
	fmt.Printf("%v\n", x)
}
```

Output: 4. The first constant is declared with the value `3` (`iota + 3` with iota equal to 0), and `x` repeats the same expression with iota equal to 1, so it has the value `4`.

What will be the output of the following block of code?:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		time.Sleep(time.Second * 2)
		fmt.Println("1")
		wg.Done()
	}()

	go func() {
		fmt.Println("2")
	}()

	wg.Wait()
	fmt.Println("3")
}
```

Output: 2 1 3 (each on its own line). The second goroutine is not tracked by the WaitGroup, but it prints long before the first one finishes its two-second sleep.

[Article about sync/waitgroup](https://tutorialedge.net/golang/go-waitgroup-tutorial/)

[Golang package sync](https://golang.org/pkg/sync/)

What will be the output of the following block of code?:

```go
package main

import (
	"fmt"
)

func mod1(a []int) {
	for i := range a {
		a[i] = 5
	}

	fmt.Println("1:", a)
}

func mod2(a []int) {
	a = append(a, 125) // !

	for i := range a {
		a[i] = 5
	}

	fmt.Println("2:", a)
}

func main() {
	s1 := []int{1, 2, 3, 4}
	mod1(s1)
	fmt.Println("1:", s1)

	s2 := []int{1, 2, 3, 4}
	mod2(s2)
	fmt.Println("2:", s2)
}
```

Output:

```
1: [5 5 5 5]
1: [5 5 5 5]
2: [5 5 5 5 5]
2: [1 2 3 4]
```

In `mod1`, `a` refers to the same underlying array as `s1`, so assigning to `a[i]` changes the values seen through `s1` too. In `mod2`, `append` exceeds the capacity of the slice and allocates a new underlying array, so the loop only changes the copy held by `a`, not `s2`.

[Article about arrays](https://golangbot.com/arrays-and-slices/), [Blog post about `append`](https://blog.golang.org/slices)

What will be the output of the following block of code?:

```go
package main

import (
	"container/heap"
	"fmt"
)

// An IntHeap is a min-heap of ints.
type IntHeap []int

func (h IntHeap) Len() int           { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

func (h *IntHeap) Push(x interface{}) {
	// Push and Pop use pointer receivers because they modify the slice's length,
	// not just its contents.
	*h = append(*h, x.(int))
}

func (h *IntHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[0 : n-1]
	return x
}

func main() {
	h := &IntHeap{4, 8, 3, 6}
	heap.Init(h)
	heap.Push(h, 7)

	fmt.Println((*h)[0])
}
```

Output: 3

[Golang container/heap package](https://golang.org/pkg/container/heap/)

## Mongo

What are the advantages of MongoDB? Or in other words, why choose MongoDB over other NoSQL implementations?

MongoDB's advantages are as follows:

- Schemaless
- Easy to scale out
- No complex joins
- The structure of a single object is clear

What is the difference between SQL and NoSQL?

The main difference is that SQL databases are structured (data is stored in tables with rows and columns, like an Excel spreadsheet), while NoSQL databases are unstructured, and the way data is stored varies with the type of NoSQL database: key-value pairs, documents, and so on.

In what scenarios would you prefer to use NoSQL/Mongo over SQL?

* Heterogeneous data which changes often
* Data consistency and integrity are not the top priority
* The database needs to scale rapidly

What is a document? What is a collection?

* A document is a record in MongoDB. It is stored in BSON (Binary JSON) format and is the basic unit of data in MongoDB.
* A collection is a group of related documents stored in a single MongoDB database.

What is an aggregator?

* An aggregator is a framework in MongoDB that performs operations on a set of data to return a single computed result.

What is better? Embedded documents or referenced?

* There is no definitive answer; it depends on the specific use case and requirements. As a rule of thumb, embedded documents provide atomic updates, while referenced documents allow for better normalization.

Have you performed data retrieval optimizations in Mongo? If not, can you think about ways to optimize a slow data retrieval?

* Some ways to optimize data retrieval in MongoDB are indexing, proper schema design, query optimization and database load balancing.

##### Queries

Explain this query: db.books.find({"name": /abc/})

Explain this query: db.books.find().sort({x:1})

What is the difference between find() and find_one()?

* `find()` returns a cursor over all documents that match the query conditions.
* `find_one()` (named `findOne()` in the mongo shell, `find_one()` in PyMongo) returns only one document that matches the query conditions, or null if no match is found.

How can you export data from Mongo DB?

* mongoexport
* programming languages

## SQL

### SQL Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Functions vs. Comparisons | Query Improvements | [Exercise](topics/sql/improve_query.md) | [Solution](topics/sql/solutions/improve_query.md) |

### SQL Self Assessment

What is SQL?

SQL (Structured Query Language) is a standard language for relational databases (like MySQL, MariaDB, ...).
It's used for reading, updating, removing and creating data in a relational database.

How is SQL different from NoSQL?

When is it best to use SQL? NoSQL?

SQL - Best used when data integrity is crucial. SQL is typically used by many businesses, and in areas within the finance field, due to its ACID compliance.

NoSQL - Great if you need to scale things quickly. NoSQL was designed with web applications in mind, so it works well if you need to quickly spread the same information across multiple servers. Additionally, since NoSQL does not adhere to the strict table-with-columns-and-rows structure that relational databases require, you can store different data types together.

##### Practical SQL - Basics

For these questions, we will be using the Customers and Orders tables shown below:

**Customers**

Customer_ID | Customer_Name | Items_in_cart | Cash_spent_to_Date
------------ | ------------- | ------------- | -------------
100204 | John Smith | 0 | 20.00
100205 | Jane Smith | 3 | 40.00
100206 | Bobby Frank | 1 | 100.20

**ORDERS**

Customer_ID | Order_ID | Item | Price | Date_sold
------------ | ------------- | ------------- | ------------- | -------------
100206 | A123 | Rubber Ducky | 2.20 | 2019-09-18
100206 | A123 | Bubble Bath | 8.00 | 2019-09-18
100206 | Q987 | 80-Pack TP | 90.00 | 2019-09-20
100205 | Z001 | Cat Food - Tuna Fish | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Chicken | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Beef | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Kitty quesadilla | 10.00 | 2019-08-05
100204 | X202 | Coffee | 20.00 | 2019-04-29

How would I select all fields from this table?

Select *
From Customers;

How many items are in John's cart?

Select Items_in_cart
From Customers
Where Customer_Name = 'John Smith';

What is the sum of all the cash spent across all customers?

Select SUM(Cash_spent_to_Date) as SUM_CASH
From Customers;

How many people have items in their cart?

Select count(1) as Number_of_People_w_items
From Customers
where Items_in_cart > 0;
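The basic queries above can be tried out without a database server. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for whichever relational database you use; the in-memory database and the column types are assumptions made for illustration.

```python
import sqlite3

# In-memory database holding the Customers table from the exercise.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT, "
    "Items_in_cart INTEGER, Cash_spent_to_Date REAL)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        (100204, "John Smith", 0, 20.00),
        (100205, "Jane Smith", 3, 40.00),
        (100206, "Bobby Frank", 1, 100.20),
    ],
)

# Items in John's cart (string literals use single quotes in SQL).
johns_items = conn.execute(
    "SELECT Items_in_cart FROM Customers WHERE Customer_Name = 'John Smith'"
).fetchone()[0]

# Total cash spent across all customers.
total_cash = conn.execute(
    "SELECT SUM(Cash_spent_to_Date) AS SUM_CASH FROM Customers"
).fetchone()[0]

# Number of customers with at least one item in their cart.
people_with_items = conn.execute(
    "SELECT COUNT(1) FROM Customers WHERE Items_in_cart > 0"
).fetchone()[0]

print(johns_items, round(total_cash, 2), people_with_items)
```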

How would you join the customer table to the order table?

You would join them on the unique key. In this case, the unique key is Customer_ID in both the Customers table and Orders table

How would you show which customer ordered which items?

Select c.Customer_Name, o.Item
From Customers c
Left Join Orders o
On c.Customer_ID = o.Customer_ID;

Using a with statement, how would you show who ordered cat food, and the total amount of money spent?

with cat_food as (
Select Customer_ID, SUM(Price) as TOTAL_PRICE
From Orders
Where Item like '%Cat Food%'
Group by Customer_ID
)
Select Customer_name, TOTAL_PRICE
From Customers c
Inner JOIN cat_food f
ON c.Customer_ID = f.Customer_ID
where c.Customer_ID in (Select Customer_ID from cat_food);

Although this was a simple statement, the "with" clause really shines when a complex query needs to be run on a table before joining to another. With statements are nice because you create a pseudo temp table when running your query, instead of creating a whole new table.

The sum of all the purchases of cat food wasn't readily available, so we used a with statement to create the pseudo table to retrieve the sum of the prices spent by each customer, then joined the table normally.
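To see the with statement in action, here is a small sketch that again uses Python's `sqlite3` module as a stand-in database, loaded with the exercise data. It leaves out the final `in (...)` filter, on the assumption that the inner join already restricts the result to customers present in `cat_food`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER, Customer_Name TEXT);
CREATE TABLE Orders (Customer_ID INTEGER, Order_ID TEXT, Item TEXT, Price REAL);
INSERT INTO Customers VALUES
  (100204, 'John Smith'), (100205, 'Jane Smith'), (100206, 'Bobby Frank');
INSERT INTO Orders VALUES
  (100206, 'A123', 'Rubber Ducky', 2.20),
  (100206, 'A123', 'Bubble Bath', 8.00),
  (100206, 'Q987', '80-Pack TP', 90.00),
  (100205, 'Z001', 'Cat Food - Tuna Fish', 10.00),
  (100205, 'Z001', 'Cat Food - Chicken', 10.00),
  (100205, 'Z001', 'Cat Food - Beef', 10.00),
  (100205, 'Z001', 'Cat Food - Kitty quesadilla', 10.00),
  (100204, 'X202', 'Coffee', 20.00);
""")

# The CTE aggregates cat food spending per customer before the join.
cat_food_buyers = conn.execute("""
WITH cat_food AS (
    SELECT Customer_ID, SUM(Price) AS TOTAL_PRICE
    FROM Orders
    WHERE Item LIKE '%Cat Food%'
    GROUP BY Customer_ID
)
SELECT c.Customer_Name, f.TOTAL_PRICE
FROM Customers c
INNER JOIN cat_food f ON c.Customer_ID = f.Customer_ID
""").fetchall()

print(cat_food_buyers)  # [('Jane Smith', 40.0)]
```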


Which of the following queries would you use?

SELECT count(*)
FROM shawarma_purchases
WHERE YEAR(purchased_at) = '2017'

vs.

SELECT count(*)
FROM shawarma_purchases
WHERE purchased_at >= '2017-01-01' AND purchased_at <= '2017-12-31'

SELECT count(*)
FROM shawarma_purchases
WHERE purchased_at >= '2017-01-01' AND purchased_at <= '2017-12-31'

When you wrap the column in a function (`YEAR(purchased_at)`), the database has to evaluate it for every row, which typically means scanning the whole table, as opposed to using an index on the column as it is, in its natural state.
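The difference can be observed by asking the database for its query plan. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in; `strftime('%Y', ...)` plays the role of `YEAR(...)`, and the index name `idx_purchased_at` is made up for the example. Other databases expose similar information through their own `EXPLAIN` output, and the exact wording of the plan differs between engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shawarma_purchases (id INTEGER PRIMARY KEY, purchased_at TEXT)"
)
# Hypothetical index on the date column.
conn.execute("CREATE INDEX idx_purchased_at ON shawarma_purchases (purchased_at)")


def plan(sql):
    """Return the 'detail' column of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Function applied to the column: the index cannot be used for a range lookup.
function_plan = plan(
    "SELECT count(*) FROM shawarma_purchases "
    "WHERE strftime('%Y', purchased_at) = '2017'"
)

# Plain comparison on the column: the index can narrow down the rows.
range_plan = plan(
    "SELECT count(*) FROM shawarma_purchases "
    "WHERE purchased_at >= '2017-01-01' AND purchased_at <= '2017-12-31'"
)

print(function_plan)
print(range_plan)
```

In SQLite, the first plan is typically reported as a `SCAN` and the second as a `SEARCH` that uses the index.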

## OpenStack

What components/projects of OpenStack are you familiar with?

I’m most familiar with several core OpenStack components:

- Nova for compute resource provisioning, including VM lifecycle management.
- Neutron for networking, focusing on creating and managing networks, subnets, and routers.
- Cinder for block storage, used to attach and manage storage volumes.
- Keystone for identity services, handling authentication and authorization.

I’ve implemented these in past projects, configuring them for scalability and security to support multi-tenant environments.

Can you tell me what each of the following services/projects is responsible for?

- Nova
- Neutron
- Cinder
- Glance
- Keystone

* Nova - Manage virtual instances
* Neutron - Manage networking by providing Network as a service (NaaS)
* Cinder - Block Storage
* Glance - Manage images for virtual machines and containers (search, get and register)
* Keystone - Authentication service across the cloud

Identify the service/project used for each of the following:

* Copy or snapshot instances
* GUI for viewing and modifying resources
* Block Storage
* Manage virtual instances

* Glance - Images Service. Also used for copying or snapshotting instances
* Horizon - GUI for viewing and modifying resources
* Cinder - Block Storage
* Nova - Manage virtual instances

What is a tenant/project?

In OpenStack, a project (formerly known as a tenant) is a fundamental unit of ownership and isolation for resources like virtual machines, storage volumes, and networks. Each project is owned by a specific user or group of users and provides a way to manage and segregate resources within a shared cloud environment. This ensures that one project's resources are not accessible to another unless explicitly shared.

Determine true or false:

* OpenStack is free to use
* The service responsible for networking is Glance
* The purpose of tenant/project is to share resources between different projects and users of OpenStack

* OpenStack is free to use - **True**. OpenStack is open-source software released under the Apache 2.0 license.
* The service responsible for networking is Glance - **False**. Neutron is the service responsible for networking. Glance is the image service.
* The purpose of tenant/project is to share resources between different projects and users of OpenStack - **False**. The primary purpose is to isolate resources.

Describe in detail how you bring up an instance with a floating IP

To launch an instance with a floating IP, you would follow these steps:

1. **Create a Network and Subnet:** First, ensure you have a private network and subnet for your instances.
2. **Create a Router:** Create a router and connect it to the public (external) network and your private subnet.
3. **Launch an Instance:** Launch a new instance, attaching it to your private network. It will receive a private IP address from the subnet.
4. **Allocate a Floating IP:** Allocate a new floating IP address from the public network pool to your project.
5. **Associate the Floating IP:** Associate the allocated floating IP with the private IP address of your instance. This allows the instance to be accessible from the internet.

You get a call from a customer saying: "I can ping my instance but can't connect (ssh) to it". What might be the problem?

If you can ping an instance but cannot SSH into it, the issue is likely related to one of the following:

* **Security Group Rules:** The security group attached to the instance may not have a rule allowing inbound traffic on TCP port 22 (the default SSH port).
* **Firewall on the Instance:** A firewall running on the instance itself (like `iptables` or `firewalld`) might be blocking the SSH port.
* **SSH Service:** The SSH daemon (`sshd`) on the instance might not be running or could be misconfigured.
* **Incorrect SSH Key:** You might be using the wrong private key to connect to the instance.
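A quick way to narrow down these causes is to probe TCP port 22 directly, since ping only proves ICMP reachability. The following is a rough sketch using Python's standard `socket` module; the function name `probe_tcp_port` and the three labels it returns are made up for this example, and the mapping from label to root cause is a heuristic rather than a definitive diagnosis.

```python
import socket


def probe_tcp_port(host, port, timeout=3.0):
    """Classify a TCP port as 'open', 'refused' or 'filtered'.

    open     - something accepted the connection (look at sshd config or keys next)
    refused  - the host answered but nothing listens there (sshd may be down),
               or a firewall actively rejects the connection
    filtered - no usable answer (packets are probably dropped, e.g. by a
               security group or a firewall)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts and unreachable-network errors.
        return "filtered"
```

For example, `probe_tcp_port("203.0.113.10", 22)` would probe the SSH port of an instance reachable at that (placeholder) floating IP.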

What types of networks does OpenStack support?

OpenStack Neutron supports several network types:

* **Local:** A local network is isolated to a single compute node and cannot be shared between multiple nodes.
* **Flat:** A flat network is a simple, non-VLAN-tagged network that is shared across all compute nodes.
* **VLAN:** A VLAN network uses 802.1q tagging to create isolated layer-2 broadcast domains.
* **VXLAN:** VXLAN (Virtual Extensible LAN) is an overlay network technology that encapsulates layer-2 frames in UDP packets, allowing for a large number of isolated networks.
* **GRE:** GRE (Generic Routing Encapsulation) is another overlay network technology that can be used to create private networks over a public network.

How do you debug OpenStack storage issues? (tools, logs, ...)

To debug storage issues in OpenStack (Cinder), you can use the following:

* **Logs:** Check the Cinder service logs (e.g., `/var/log/cinder/cinder-volume.log`, `/var/log/cinder/cinder-api.log`) for error messages.
* **Cinder CLI:** Use the `cinder` command-line tool to check the status of volumes, snapshots, and storage backends.
* **Database:** Inspect the Cinder database to check for inconsistencies in volume states or metadata.
* **Backend Storage:** Check the logs and status of the underlying storage system (e.g., LVM, Ceph, NFS) to identify issues with the storage itself.

How do you debug OpenStack compute issues? (tools, logs, ...)

To debug compute issues in OpenStack (Nova), you can use the following:

* **Logs:** Check the Nova service logs (e.g., `/var/log/nova/nova-compute.log`, `/var/log/nova/nova-api.log`, `/var/log/nova/nova-scheduler.log`) for error messages.
* **Nova CLI:** Use the `nova` command-line tool to check the status of instances, hosts, and services.
* **Instance Console Log:** View the console log of a specific instance to see boot-up messages and other output.
* **Hypervisor:** Check the logs and status of the underlying hypervisor (e.g., KVM, QEMU) to identify issues with virtualization.

#### OpenStack Deployment & TripleO

Have you deployed OpenStack in the past? If yes, can you describe how you did it?

There are several ways to deploy OpenStack, depending on the scale and complexity of the environment. Some common methods include:

* **DevStack:** A script-based installer designed for development and testing purposes. It deploys OpenStack from the latest source code.
* **Packstack:** A utility that uses Puppet modules to deploy OpenStack on CentOS or RHEL. It is suitable for proof-of-concept and small-scale production environments.
* **Kolla-Ansible:** A set of Ansible playbooks that deploy OpenStack services as Docker containers. This method is highly scalable and recommended for production deployments.
* **OpenStack-Ansible:** A collection of Ansible playbooks that deploy OpenStack services directly on bare metal or virtual machines.

Are you familiar with TripleO? How is it different from Devstack or Packstack?

You can read about TripleO right [here](https://docs.openstack.org/tripleo-docs/latest)

#### OpenStack Compute

Can you describe Nova in detail?

* Used to provision and manage virtual instances
* It supports multi-tenancy at different levels - logging, end-user control, auditing, etc.
* Highly scalable
* Authentication can be done using an internal system or LDAP
* Supports multiple types of block storage
* Tries to be hardware and hypervisor agnostic

What do you know about Nova architecture and components?

* nova-api - the server which serves the metadata and compute APIs
* The different Nova components communicate by using a queue (usually RabbitMQ) and a database
* A request for creating an instance is inspected by nova-scheduler, which determines where the instance will be created and run
* nova-compute is the component responsible for communicating with the hypervisor to create the instance and manage its lifecycle

#### OpenStack Networking (Neutron)

Explain Neutron in detail

* One of the core components of OpenStack and a standalone project
* Neutron is focused on delivering networking as a service
* With Neutron, users can set up networks in the cloud and configure and manage a variety of network services
* Neutron interacts with:
  * Keystone - authorizes API calls
  * Nova - Nova communicates with Neutron to plug NICs into a network
  * Horizon - supports networking entities in the dashboard and also provides a topology view which includes networking details

Explain each of the following components:

- neutron-dhcp-agent
- neutron-l3-agent
- neutron-metering-agent
- neutron-*-agent
- neutron-server

* neutron-l3-agent - L3/NAT forwarding (provides external network access for VMs, for example)
* neutron-dhcp-agent - DHCP services
* neutron-metering-agent - L3 traffic metering
* neutron-*-agent - manages local vSwitch configuration on each compute node (based on the chosen plugin)
* neutron-server - exposes the networking API and passes requests to other plugins if required

Explain these network types:

- Management Network
- Guest Network
- API Network
- External Network

* Management Network - used for internal communication between OpenStack components. Any IP address in this network is accessible only within the datacenter
* Guest Network - used for communication between instances/VMs
* API Network - used for services API communication. Any IP address in this network is publicly accessible
* External Network - used for public communication. Any IP address in this network is accessible by anyone on the internet

In which order should you remove the following entities?

* Network
* Port
* Router
* Subnet

- Port
- Subnet
- Router
- Network

There are many reasons for that. One, for example: you can't remove a router if there are active ports assigned to it.

What is a provider network?

A provider network is a network that is created by an OpenStack administrator and maps directly to an existing physical network in the data center. It allows for direct layer-2 connectivity to instances and is typically used for providing external network access or for connecting to specific physical networks.

What components and services exist for L2 and L3?

* **L2 (Layer 2):** The primary L2 component is the `neutron-openvswitch-agent` (or a similar agent for other plugins), which runs on each compute node and manages the local virtual switch (e.g., Open vSwitch). It is responsible for connecting instances to virtual networks and enforcing security group rules.
* **L3 (Layer 3):** The `neutron-l3-agent` is responsible for providing L3 services like routing and floating IPs. It manages virtual routers that connect private networks to external networks.

What is the ML2 plug-in? Explain its architecture

ML2 (Modular Layer 2) is a framework that allows OpenStack to simultaneously utilize a variety of layer-2 networking technologies. It replaces the monolithic plugins for individual network types and provides a more flexible and extensible architecture. ML2 uses a combination of `Type` drivers (for network types like VLAN, VXLAN, etc.) and `Mechanism` drivers (for connecting to different network mechanisms like Open vSwitch, Linux Bridge, etc.).

What is the L2 agent? How does it work and what is it responsible for?

The L2 agent is a service that runs on each compute node and is responsible for wiring virtual networks to instances. It communicates with the Neutron server to get the network topology and then configures the local virtual switch (e.g., Open vSwitch) to connect instances to the correct networks. It also enforces security group rules by configuring the virtual switch.

What is the L3 agent? How does it work and what is it responsible for?

The L3 agent is responsible for providing layer-3 networking services, such as routing and floating IPs. It runs on network nodes and manages virtual routers that connect private networks to external networks. The L3 agent creates network namespaces for each router to provide isolation and then configures routing rules and NAT to enable traffic to flow between networks.

Explain what the Metadata agent is responsible for

The Metadata agent is responsible for providing metadata (e.g., instance ID, hostname, public keys) to instances. It runs on network nodes and acts as a proxy between instances and the Nova metadata service. When an instance requests metadata, the request is forwarded to the Metadata agent, which then retrieves the information from Nova and returns it to the instance.

What networking entities does Neutron support?

Neutron supports a variety of networking entities, including:

* **Network:** An isolated layer-2 broadcast domain.
* **Subnet:** A block of IP addresses that can be assigned to instances.
* **Port:** A connection point for attaching a single device, such as an instance, to a virtual network.
* **Router:** A logical entity that connects multiple layer-2 networks.
* **Floating IP:** A public IP address that can be associated with an instance to provide external connectivity.
* **Security Group:** A collection of firewall rules that control inbound and outbound traffic to instances.

How do you debug OpenStack networking issues? (tools, logs, ...)

To debug networking issues in OpenStack (Neutron), you can use the following:

* **Logs:** Check the Neutron service logs (e.g., `/var/log/neutron/neutron-server.log`, `/var/log/neutron/openvswitch-agent.log`, `/var/log/neutron/l3-agent.log`) for error messages.
* **Neutron CLI:** Use the `neutron` command-line tool to check the status of networks, subnets, ports, routers, and other networking entities.
* **`ip netns`:** Use the `ip netns` command to inspect network namespaces and the network configurations within them.
* **`ovs-vsctl` and `ovs-ofctl`:** Use these tools to inspect the configuration and flow tables of Open vSwitch bridges.
* **`tcpdump`:** Use `tcpdump` to capture and analyze network traffic on various interfaces to identify connectivity issues.

#### OpenStack - Glance

Explain Glance in detail

* Glance is the OpenStack image service
* It handles requests related to instance disks and images
* Glance is also used for creating snapshots for quick instance backups
* Users can use Glance to create new images or upload existing ones

Describe Glance architecture

* glance-api - responsible for handling image API calls such as retrieval and storage. It consists of two APIs:
  1. registry-api - responsible for internal requests
  2. user API - can be accessed publicly
* glance-registry - responsible for handling image metadata requests (e.g. size, type, etc.). This component is private, which means it's not available publicly
* metadata definition service - API for custom metadata
* database - for storing image metadata
* image repository - for storing images. This can be a filesystem, Swift object storage, HTTP, etc.

#### OpenStack - Swift

Explain Swift in detail

* Swift is the Object Store service: a highly available, distributed and eventually consistent store designed for storing a lot of data
* Swift distributes data across multiple servers while writing it to multiple disks
* One can choose to add additional servers to scale the cluster, all while Swift maintains the integrity of the information and the data replication

Can users store by default an object of 100GB in size?

Not by default. Object Storage API limits the maximum to 5GB per object but it can be adjusted.

Explain the following with regard to Swift:

* Container
* Account
* Object

- Container - Defines a namespace for objects
- Account - Defines a namespace for containers
- Object - Data content (e.g. image, document, ...)

True or False? There can be two objects with the same name in the same container but not in two different containers

False. Two objects can have the same name if they are in different containers.
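The namespace rules can be pictured with a toy model. The sketch below is plain Python and not Swift's API: the nested dictionary, the `put_object` helper and the account name `AUTH_demo` are all invented for illustration. It assumes the default behavior where writing to an existing object name in the same container replaces the object.

```python
# Toy model of the namespace: account -> container -> object name -> data.
store = {}


def put_object(account, container, name, data):
    """Store data under account/container/name, replacing any existing object."""
    store.setdefault(account, {}).setdefault(container, {})[name] = data


put_object("AUTH_demo", "photos", "report.txt", b"photo notes")
# Same object name in a different container: both objects exist side by side.
put_object("AUTH_demo", "docs", "report.txt", b"quarterly report")
# Same object name in the same container: the earlier object is replaced.
put_object("AUTH_demo", "docs", "report.txt", b"revised report")

print(sorted(store["AUTH_demo"]))                   # ['docs', 'photos']
print(store["AUTH_demo"]["photos"]["report.txt"])   # b'photo notes'
print(store["AUTH_demo"]["docs"]["report.txt"])     # b'revised report'
```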

#### OpenStack - Cinder

Explain Cinder in detail

* Cinder is the OpenStack Block Storage service
* It basically provides users with storage resources they can consume with other services such as Nova
* One of the most used implementations of storage supported by Cinder is LVM
* From the user's perspective this is transparent, which means the user doesn't know where, behind the scenes, the storage is located or what type of storage is used

Describe Cinder's components

* cinder-api - receives API requests
* cinder-volume - manages attached block devices
* cinder-scheduler - responsible for selecting the storage backend on which a volume will be created

#### OpenStack - Keystone

Can you describe the following concepts with regard to Keystone?

- Role
- Tenant/Project
- Service
- Endpoint
- Token

- Role - A list of rights and privileges determining what a user or a project can perform
- Tenant/Project - Logical representation of a group of resources isolated from other groups of resources. It can be an account, organization, ...
- Service - An endpoint which the user can use for accessing different resources
- Endpoint - A network address which can be used to access a certain OpenStack service
- Token - Used for accessing resources while describing, by means of a scope, which resources can be accessed

What are the properties of a service? In other words, how is a service identified?

Using:

- Name
- ID number
- Type
- Description

Explain the following:

- PublicURL
- InternalURL
- AdminURL

- PublicURL - Publicly accessible through the public internet
- InternalURL - Used for communication between services
- AdminURL - Used for administrative management

What is a service catalog?

A list of services and their endpoints

#### OpenStack Advanced - Services

Describe each of the following services:

* Swift
* Sahara
* Ironic
* Trove
* Aodh
* Ceilometer

* Swift - highly available, distributed, eventually consistent object/blob store
* Sahara - Manage Hadoop Clusters
* Ironic - Bare Metal Provisioning
* Trove - Database as a service that runs on OpenStack
* Aodh - Alarms Service
* Ceilometer - Track and monitor usage

Identify the service/project used for each of the following:

* Database as a service which runs on OpenStack
* Bare Metal Provisioning
* Track and monitor usage
* Alarms Service
* Manage Hadoop Clusters
* highly available, distributed, eventually consistent object/blob store

* Database as a service which runs on OpenStack - Trove
* Bare Metal Provisioning - Ironic
* Track and monitor usage - Ceilometer
* Alarms Service - Aodh
* Manage Hadoop Clusters - Sahara
* highly available, distributed, eventually consistent object/blob store - Swift

#### OpenStack Advanced - Keystone

Can you describe Keystone service in detail?

* You can't have OpenStack deployed without Keystone
* It provides identity, policy and token services
* The authentication provided is for both users and services
* The authorization supported is token-based and user-based
* There is a policy defined based on RBAC, stored in a JSON file, and each line in that file defines the level of access to apply

Describe Keystone architecture

* There is a service API and an admin API through which Keystone gets requests
* Keystone has four backends:
  * Token Backend - Temporary tokens for users and services
  * Policy Backend - Rules management and authorization
  * Identity Backend - Users and groups (either standalone DB, LDAP, ...)
  * Catalog Backend - Endpoints
* It has a pluggable environment where you can integrate with:
  * LDAP
  * KVS (Key Value Store)
  * SQL
  * PAM
  * Memcached

Describe the Keystone authentication process

* Keystone gets a call/request and checks whether it's from an authorized user, using username, password and authURL
* Once confirmed, Keystone provides a token
* A token contains a list of the user's projects, so there is no need to authenticate every time; the token can be submitted instead
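As an illustration of the first step, the sketch below builds the JSON body for a password-based, project-scoped token request as used by the Keystone v3 Identity API. It only constructs the payload and does not contact a server; the helper name `build_password_auth` and the user, password and project values are placeholders. In a real deployment, this body would be sent with a POST to `/v3/auth/tokens` under the authURL, and the token comes back in the `X-Subject-Token` response header.

```python
import json


def build_password_auth(username, password, project, domain="Default"):
    """Build the request body for a project-scoped password authentication."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            # The scope states which project the token should be valid for.
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }


# Placeholder credentials, for illustration only.
body = build_password_auth("demo", "secret", "demo-project")
print(json.dumps(body, indent=2))
```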

#### OpenStack Advanced - Compute (Nova)

What does each of the following do?

* nova-api
* nova-compute
* nova-conductor
* nova-cert
* nova-consoleauth
* nova-scheduler

* nova-api - responsible for managing requests/calls
* nova-compute - responsible for managing instance lifecycle
* nova-conductor - Mediates between nova-compute and the database so nova-compute doesn't access it directly
* nova-cert - Manages X509 certificates for secure communication
* nova-consoleauth - Authorizes tokens for users to access instance consoles
* nova-scheduler - Determines which compute host an instance should be launched on based on a set of filters and weights
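The "filters and weights" idea behind nova-scheduler can be sketched in a few lines. This is a toy model and not Nova's code: the host dictionaries, the resource fields and the single "most free RAM" weigher are assumptions chosen for the example.

```python
def schedule(hosts, vcpus, ram_mb):
    """Pick a host for an instance: filter first, then weigh the survivors."""
    # Filtering: drop hosts that cannot fit the requested resources.
    candidates = [
        h for h in hosts
        if h["free_vcpus"] >= vcpus and h["free_ram_mb"] >= ram_mb
    ]
    if not candidates:
        return None  # no valid host found
    # Weighing: prefer the candidate with the most free RAM.
    return max(candidates, key=lambda h: h["free_ram_mb"])["name"]


# Hypothetical compute hosts.
hosts = [
    {"name": "compute-1", "free_vcpus": 2, "free_ram_mb": 4096},
    {"name": "compute-2", "free_vcpus": 8, "free_ram_mb": 16384},
    {"name": "compute-3", "free_vcpus": 16, "free_ram_mb": 8192},
]

print(schedule(hosts, vcpus=4, ram_mb=2048))    # compute-2
print(schedule(hosts, vcpus=32, ram_mb=2048))   # None
```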

What types of Nova proxies are you familiar with?

* Nova-novncproxy - Access through VNC connections
* Nova-spicehtml5proxy - Access through SPICE
* Nova-xvpvncproxy - Access through a VNC connection

#### OpenStack Advanced - Networking (Neutron)

Explain BGP dynamic routing

BGP (Border Gateway Protocol) is a standardized exterior gateway protocol used to exchange routing and reachability information among autonomous systems on the internet. In OpenStack, BGP can be used to dynamically advertise floating IP addresses and project networks to physical routers, eliminating the need for static routes and enabling more scalable and resilient network architectures.

What is the role of network namespaces in OpenStack?

Network namespaces are a Linux kernel feature that provides isolated network stacks for different processes. In OpenStack, network namespaces are used to isolate the network resources of different virtual routers and other networking services. This ensures that each router has its own set of interfaces, routing tables, and firewall rules, preventing conflicts and providing a secure multi-tenant environment.

#### OpenStack Advanced - Horizon

Can you describe Horizon in detail?

* Django-based project focusing on providing an OpenStack dashboard and the ability to create additional customized dashboards
* You can use it to access the resources of the different OpenStack services - instances, images, networks, ...
* By accessing the dashboard, users can list, create, remove and modify the different resources
* It's also highly customizable and you can modify or add to it based on your needs

What can you tell about Horizon architecture?

* The API is backward compatible
* There are three types of dashboards: user, system and settings
* It provides core support for all OpenStack core projects such as Neutron, Nova, etc. (out of the box, no need to install extra packages or plugins)
* Anyone can extend the dashboards and add new components
* Horizon provides templates and core classes from which one can build their own dashboard

## Puppet

What is Puppet? How does it work?

* Puppet is a configuration management tool ensuring that all systems are configured to a desired and predictable state.

Explain Puppet architecture

* Puppet has a primary-secondary node architecture. The clients are distributed across the network and communicate with the primary server, where the Puppet modules are present. The client agent sends a certificate with its ID to the server; the server then signs that certificate and sends it back to the client. This authentication allows for secure and verifiable communication between the client and the server.

Can you compare Puppet to other configuration management tools? Why did you choose to use Puppet?

* Puppet is often compared to other configuration management tools like Chef, Ansible, SaltStack, and cfengine. The choice to use Puppet often depends on an organization's needs, such as ease of use, scalability, and community support.

Explain the following:

* Module
* Manifest
* Node

* Modules - are a collection of manifests, templates, and files
* Manifests - are the actual code for configuring the clients
* Node - allows you to assign specific configurations to specific nodes

Explain Facter

* Facter is a standalone tool in Puppet that collects information about a system and its configuration, such as the operating system, IP addresses, memory, and network interfaces. Facter is integrated into Puppet, and its facts can be used within Puppet manifests to make decisions about how resources should be managed and to customize the behavior of Puppet based on the characteristics of the system.

What is MCollective?

* MCollective is a middleware system that integrates with Puppet to provide orchestration, remote execution, and parallel job execution capabilities.

Do you have experience with writing modules? Which module have you created and for what?

Explain what Hiera is

* Hiera is a hierarchical data store in Puppet that is used to separate data from code, allowing data to be more easily separated, managed, and reused.

## Elastic

What is the Elastic Stack?

The Elastic Stack consists of:

* Elasticsearch
* Kibana
* Logstash
* Beats
* Elastic Hadoop
* APM Server

Elasticsearch, Logstash and Kibana are also known as the ELK stack.

Explain what Elasticsearch is

From the official [docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/documents-indices.html): "Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents"

What is Logstash?

From the [blog](https://logit.io/blog/post/the-top-50-elk-stack-and-elasticsearch-interview-questions): "Logstash is a powerful, flexible pipeline that collects, enriches and transports data. It works as an extract, transform & load (ETL) tool for collecting log messages."

Explain what beats are

Beats are lightweight data shippers. These data shippers are installed on the client where the data resides. Examples of beats: Filebeat, Metricbeat, Auditbeat. There are many more.

What is Kibana?

From the official docs: "Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch indices. You can easily perform advanced data analysis and visualize your data in a variety of charts, tables, and maps."

Describe what happens from the moment an app logged some information until it's displayed to the user in a dashboard when the Elastic stack is used

The process may vary based on the chosen architecture and the processing you may want to apply to the logs. One possible workflow is:

1. The data logged by the application is picked up by Filebeat and sent to Logstash
2. Logstash processes the log based on the defined filters. Once done, the output is sent to Elasticsearch
3. Elasticsearch stores the document it got, and the document is indexed for quick future access
4. The user creates visualizations in Kibana which are based on the indexed data
5. The user creates a dashboard which is composed of the visualizations created in the previous step

##### Elasticsearch

What is a data node?

This is where data is stored and also where different processing takes place (e.g. when you search for data).

What is a master node?

Part of a master node's responsibilities:

* Track the status of all the nodes in the cluster
* Verify replicas are working and the data is available from every data node
* Make sure there are no hot nodes (no data node that works much harder than other nodes)

While there can be multiple master nodes, in reality only one of them is the elected master node.

What is an ingest node?

A node which is responsible for processing the data according to an ingest pipeline. In case you don't need to use Logstash, this node can receive data from Beats and process it, similarly to how it can be processed in Logstash.

What is a coordinating only node?

From the official docs: Coordinating only nodes can benefit large clusters by offloading the coordinating node role from data and master-eligible nodes. They join the cluster and receive the full cluster state, like every other node, and they use the cluster state to route requests directly to the appropriate place(s).

How is data stored in Elasticsearch?

* Data is stored in an index
* The index is spread across the cluster using shards

What is an Index?

An index in Elasticsearch is in most cases compared to a whole database from the SQL/NoSQL world.
You can choose to have one index hold all the data of your app, or have multiple indices where each index holds a different type of data from your app (e.g. an index for each service your app is running).

The official docs also offer a great explanation (in general, it's really good documentation, as every project should have): "An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data"

Explain Shards

An index is split into shards and documents are hashed to a particular shard. Each shard may be on a different node in a cluster and each one of the shards is a self contained index.
This allows Elasticsearch to scale to an entire cluster of servers.
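How a document ends up on a particular shard can be sketched as "hash the routing value, then take it modulo the number of primary shards". The snippet below is only an approximation of that idea: `zlib.crc32` stands in for the hash function Elasticsearch actually uses, the document ID is assumed to be the routing value, and `shard_for` is a made-up helper name.

```python
import zlib
from collections import Counter


def shard_for(doc_id, number_of_primary_shards):
    """Map a document ID to a shard number using a stand-in hash function."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


# The same ID always lands on the same shard, which is how a later read
# for that ID knows where to look.
assignments = [shard_for(f"doc-{i}", 3) for i in range(1000)]
print(Counter(assignments))
```

A consequence of this scheme is that changing the number of primary shards changes the result of the modulo, so existing documents would no longer be found where the formula points.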

What is an Inverted Index?

From the official docs: "An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in."
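A tiny inverted index can be built in a few lines. This is a simplified sketch: it lowercases the text and splits on whitespace, whereas a real analyzer does considerably more (stemming, stop words, and so on). The sample documents and the `build_inverted_index` name are invented for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map every unique word to the set of document IDs it occurs in."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)

print(sorted(index["quick"]))  # [1, 3]
print(sorted(index["the"]))    # [1, 2]
```

Looking up a word is then a dictionary access instead of a scan over every document.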

What is a Document?

Continuing with the comparison to SQL/NoSQL, a document in Elasticsearch is a row in a table in the case of SQL, or a document in a collection in the case of NoSQL. As in NoSQL, a document is a JSON object which holds data on a unit in your app. What this unit is depends on your app. If your app is related to books, then each document describes a book. If your app is about shirts, then each document is a shirt.

You check the health of your elasticsearch cluster and it's red. What does it mean? What can cause the status to be yellow instead of green?

Red means some data is unavailable in your cluster: at least one primary shard is unassigned.

Yellow means that all primary shards are assigned but some replica shards are not. You can be in this state if you have a single node and your indices have replicas, since a replica is not placed on the same node as its primary.

Green means that all shards in the cluster are assigned to nodes and your cluster is healthy.
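The relationship between shard assignment and the health color can be written down as a small decision function. This is a simplified model for illustration, not how Elasticsearch computes health internally: shards are represented as hypothetical `(is_primary, is_assigned)` tuples.

```python
def cluster_health(shards):
    """Derive a health color from a list of (is_primary, is_assigned) tuples."""
    # Any unassigned primary shard means some data is unavailable.
    if any(is_primary and not assigned for is_primary, assigned in shards):
        return "red"
    # All primaries are fine, but at least one replica has no node.
    if any(not assigned for _, assigned in shards):
        return "yellow"
    return "green"


# Single node, one index with one primary and one replica:
# the replica cannot be placed, so the cluster is yellow.
print(cluster_health([(True, True), (False, False)]))  # yellow
print(cluster_health([(True, True), (False, True)]))   # green
print(cluster_health([(True, False), (False, True)]))  # red
```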

True or False? Elasticsearch indexes all data in every field and each indexed field has the same data structure for unified and quick query ability

False. From the official docs: "Each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees."

What reserved fields does a document have?

* _index
* _id
* _type

Explain Mapping

What are the advantages of defining your own mapping? (or: when would you use your own mapping?)

* You can optimize fields for partial matching
* You can define custom formats of known fields (e.g. date)
* You can perform language-specific analysis

Explain Replicas

In a network/cloud environment where failures can be expected any time, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

Can you explain Term Frequency & Document Frequency?

Term Frequency is how often a term appears in a given document, and Document Frequency is the number of documents in which the term appears. Both are used to determine the relevance of a term: the more often it appears in a document and the rarer it is across all documents, the more relevant it is, which is roughly Term Frequency / Document Frequency.
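
A small sketch of the classic TF-IDF calculation, where the division by Document Frequency is softened with a logarithm. This is the textbook formula under simplifying assumptions, not Elasticsearch's actual scoring function; the helper names and the sample corpus are made up.

```python
import math


def term_frequency(term: str, doc_tokens: list) -> float:
    """Share of the document's tokens that are the given term."""
    return doc_tokens.count(term) / len(doc_tokens)


def document_frequency(term: str, corpus: list) -> int:
    """Number of documents that contain the term at least once."""
    return sum(1 for doc in corpus if term in doc)


def tf_idf(term: str, doc_tokens: list, corpus: list) -> float:
    df = document_frequency(term, corpus)
    if df == 0:
        return 0.0
    # A term present in every document gets log(1) = 0, i.e. no relevance.
    return term_frequency(term, doc_tokens) * math.log(len(corpus) / df)


corpus = [
    ["elastic", "search", "search"],
    ["elastic", "stack"],
    ["elastic", "kibana"],
]

if __name__ == "__main__":
    print(tf_idf("elastic", corpus[0], corpus))
    print(tf_idf("search", corpus[0], corpus))
```

In this corpus "elastic" appears everywhere and scores zero, while "search" is both frequent in the first document and rare overall, so it scores higher.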

You check "Current Phase" under "Index lifecycle management" and you see it's set to "hot". What does it mean?

"The index is actively being written to". More about the phases [here](https://www.elastic.co/guide/en/elasticsearch/reference/7.6/ilm-policy-definition.html)

What does this command do? curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'{ "name": "John Doe" }'

It creates the customer index if it doesn't exist and adds a new document with the field name set to "John Doe". The document gets the ID 1 because the ID is given in the request path (`_doc/1`).

What will happen if you run the previous command twice? What about running it 100 times?

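1. The existing document is replaced, so if the name value was different it would update "name" to the new value
2. In any case, each run bumps the version field by one (e.g. running it 100 times on a new document leaves it at version 100)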

What is the Bulk API? What would you use it for?

The Bulk API is used when you need to index multiple documents. For a high number of documents it is significantly faster than individual requests, since there are fewer network round trips.
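
To make the idea concrete, here is a sketch that builds the newline-delimited JSON body the `_bulk` endpoint expects: one action line followed by one source line per document, with a trailing newline. The `build_bulk_body` helper, the index name and the documents are assumptions for illustration, and actually sending the body (e.g. with `curl` and `Content-Type: application/x-ndjson`) is left out.

```python
import json


def build_bulk_body(index: str, docs: dict) -> str:
    """Build an NDJSON payload that indexes every document in one request."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "customer",
    {"1": {"name": "John Doe"}, "2": {"name": "Jane Doe"}},
)

if __name__ == "__main__":
    print(body, end="")
```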

##### Query DSL

Explain Elasticsearch query syntax (Booleans, Fields, Ranges)

Explain what is Relevance Score

Explain Query Context and Filter Context

From the official docs: "In the query context, a query clause answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a relevance score in the _score meta-field." "In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data"
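
The two contexts typically show up together in a `bool` query: clauses under `must` run in query context and affect the score, while clauses under `filter` only include or exclude documents. The sketch below builds such a request body as a Python dictionary; the field names (`title`, `status`, `publish_date`) are hypothetical.

```python
import json

search_body = {
    "query": {
        "bool": {
            # Query context: contributes to the relevance score.
            "must": [{"match": {"title": "search"}}],
            # Filter context: yes/no only, no score is calculated.
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"publish_date": {"gte": "2015-01-01"}}},
            ],
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(search_body, indent=2))
```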

Describe how the architecture of a production environment with large amounts of data would be different from a small-scale environment

There are several possible answers for this question. One of them is as follows: A small-scale architecture of Elastic will consist of the Elastic Stack as it is. This means we will have Beats, Logstash, Elasticsearch and Kibana.
A production environment with large amounts of data can include some kind of buffering component (e.g. Redis or RabbitMQ) and also a security component such as Nginx.

##### Logstash

What are Logstash plugins? What plugin types are there?

* Input Plugins - how to collect data from different sources
* Filter Plugins - processing data
* Output Plugins - push data to different outputs/services/platforms

What is grok?

A Logstash filter plugin that parses unstructured text (such as log lines) into structured fields, converting information from one format into another.
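
Grok patterns are essentially named regular expressions, so the idea can be sketched with Python's `re` module and named groups. This is an analogy, not grok itself: the log line format and the field names below are assumptions, roughly mirroring a grok pattern such as `%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}`.

```python
import re

# Each named group plays the role of one grok capture.
LOG_PATTERN = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>[A-Z]+) "
    r"(?P<request>\S+) "
    r"(?P<bytes>\d+) "
    r"(?P<duration>\d+(?:\.\d+)?)"
)


def parse_line(line: str):
    """Return the extracted fields, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_line("55.3.244.1 GET /index.html 15824 0.043"))
    print(parse_line("not a log line"))
```

A line that does not match returns `None` here; in Logstash the comparable outcome is an event tagged with `_grokparsefailure`.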

How does grok work?

What grok patterns are you familiar with?

What is `_grokparsefailure`?

How do you test or debug grok patterns?

What are Logstash Codecs? What codecs are there?

##### Kibana

What can you find under "Discover" in Kibana?

The raw data as it is stored in the index. You can search and filter it.

You see in Kibana, after clicking on Discover, "561 hits". What does it mean?

The total number of documents matching the search. If no query is used, it is simply the total number of documents.

What can you find under "Visualize"?

"Visualize" is where you can create visual representations for your data (pie charts, graphs, ...)

What visualization types are supported/included in Kibana?

What visualization type would you use for statistical outliers?

Describe in detail how you create a dashboard in Kibana

#### Filebeat

What is Filebeat?

Filebeat is used to monitor the logging directories inside VMs, or is mounted as a sidecar when exporting logs from containers, and then forwards these logs onward for further processing, usually to Logstash.

If one is using ELK, is it a must to also use Filebeat? In what scenarios is it useful to use Filebeat?

Filebeat is a typical component of the ELK stack, since it was developed by Elastic to work with the other products (Logstash and Kibana). It's possible to send logs directly to Logstash, though this often requires code changes in the application. Particularly for legacy applications with little test coverage, it might be a better option to use Filebeat, since you don't need to make any changes to the application code.

What is a harvester?

Read [here](https://www.elastic.co/guide/en/beats/filebeat/current/how-filebeat-works.html#harvester)

True or False? A single harvester harvests multiple files, according to the limits set in filebeat.yml

False. One harvester harvests one file.

What are filebeat modules?

These are pre-configured modules for specific types of logging locations (e.g. Traefik, Fargate, HAProxy) that make it easy to configure log forwarding with Filebeat. They have different configurations based on where you're collecting logs from.

#### Elastic Stack

How do you secure an Elastic Stack?

You can generate certificates with the utilities Elastic provides and change the configuration to enable certificate-based security.

## Distributed

Explain Distributed Computing (or Distributed System)

According to Martin Kleppmann: "Many processes running on many machines...only message-passing via an unreliable network with variable delays, and the system may suffer from partial failures, unreliable clocks, and process pauses." Another definition: "Systems that are physically separated, but logically connected"

What can cause a system to fail?

* Network
* CPU
* Memory
* Disk

Do you know what the "CAP theorem" is? (aka Brewer's theorem)

According to the CAP theorem, it's not possible for a distributed data store to provide more than two of the following at the same time:

* Availability: Every request receives a response (it doesn't have to be the most recent data)
* Consistency: Every request receives a response with the latest/most recent data
* Partition tolerance: Even if some of the messages between nodes are lost/dropped, the system keeps running

What are the problems with the following design? How to improve it?

1. The transition can take time. In other words, noticeable downtime.
2. The standby server is a waste of resources - if the first application server is running, then the standby does nothing

What are the problems with the following design? How to improve it?

Issues: If the load balancer dies, we lose the ability to communicate with the application.

Ways to improve:

* Add another load balancer
* Use DNS A record for both load balancers
* Use message queue

What is "Shared-Nothing" architecture?

It's an architecture in which data is stored in and retrieved from a single, non-shared source, usually exclusively connected to one node, as opposed to architectures where the request can get to one of many nodes and the data is retrieved from one shared location (storage, memory, ...).

Explain the Sidecar Pattern (Or sidecar proxy)

## Misc

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Highly Available "Hello World" | [Exercise](topics/devops/ha_hello_world.md) | [Solution](topics/devops/solutions/ha_hello_world.md)

What happens when you type in a URL in an address bar in a browser?

1. The browser looks up the IP address for the domain name, checking in the following order:
    * Browser cache
    * Operating system cache
    * The DNS server configured on the user's system (can be ISP DNS, public DNS, ...)
2. If it couldn't find a DNS record locally, a full DNS resolution is started.
3. It connects to the server using the TCP protocol
4. The browser sends an HTTP request to the server
5. The server sends an HTTP response back to the browser
6. The browser renders the response (e.g. HTML)
7. The browser then sends subsequent requests as needed to the server to get the embedded links, javascript and images in the HTML, and then steps 3 to 5 are repeated.

TODO: add more details!
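
The first part of that flow can be sketched without touching the network: split the URL into the pieces the browser needs (host for DNS, port for TCP, path for the HTTP request). The `plan_request` helper and the default-port table are assumptions made for this example; real browsers do far more (caching, TLS, HTTP/2, and so on).

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def plan_request(url: str) -> dict:
    """Derive what to resolve, where to connect and what to send for a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    # host -> DNS lookup, (host, port) -> TCP connection, request -> sent over it
    return {"host": parts.hostname, "port": port, "request": request}


if __name__ == "__main__":
    plan = plan_request("https://example.com/docs?page=2")
    print(plan["host"], plan["port"])
    print(plan["request"])
```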

#### API

Explain what is an API

I like this definition from [blog.christianposta.com](https://blog.christianposta.com/microservices/api-gateways-are-going-through-an-identity-crisis): "An explicitly and purposefully defined interface designed to be invoked over a network that enables software developers to get programmatic access to data and functionality within an organization in a controlled and comfortable way."

What is an API specification?

From [swagger.io](https://swagger.io/resources/articles/difference-between-api-documentation-specification): "An API specification provides a broad understanding of how an API behaves and how the API links with other APIs. It explains how the API functions and the results to expect when using the API"

True or False? API Definition is the same as API Specification

False. From [swagger.io](https://swagger.io/resources/articles/difference-between-api-documentation-specification): "An API definition is similar to an API specification in that it provides an understanding of how an API is organized and how the API functions. But the API definition is aimed at machine consumption instead of human consumption of APIs."

What is an API gateway?

An API gateway is like the gatekeeper that controls how different parts talk to each other and how information is exchanged between them. The API gateway provides a single point of entry for all clients, and it can perform several tasks, including routing requests to the appropriate backend service, load balancing, security and authentication, rate limiting, caching, and monitoring. By using an API gateway, organizations can simplify the management of their APIs, ensure consistent security and governance, and improve the performance and scalability of their backend services. They are also commonly used in microservices architectures, where there are many small, independent services that need to be accessed by different clients.

What are the advantages of using/implementing an API gateway?

Advantages:

- Simplifies API management: Provides a single entry point for all requests, which simplifies the management and monitoring of multiple APIs.
- Improves security: Able to implement security features like authentication, authorization, and encryption to protect the backend services from unauthorized access.
- Enhances scalability: Can handle traffic spikes and distribute requests to backend services in a way that maximizes resource utilization and improves overall system performance.
- Enables service composition: Can combine different backend services into a single API, providing more granular control over the services that clients can access.
- Facilitates integration with external systems: Can be used to expose internal services to external partners or customers, making it easier to integrate with external systems and enabling new business models.

What is a Payload in API?

What is Automation? How is it related to or different from Orchestration?

Automation is the act of automating tasks to reduce human intervention or interaction with IT technology and systems.
While automation focuses on the task level, Orchestration is the process of automating processes and/or workflows which consist of multiple tasks that usually span multiple systems.

Tell me about interesting bugs you've found and also fixed

What is a Debugger and how does it work?

What services might an application have?

* Authorization
* Logging
* Authentication
* Ordering
* Front-end
* Back-end
...

What is Metadata?

Data about data. Basically, it describes the type of information that an underlying data will hold.

You can use one of the following formats: JSON, YAML, XML. Which one would you use? Why?

I can't answer this for you :)

What's KPI?

What's OKR?

What's DSL (Domain Specific Language)?

Domain Specific Languages (DSLs) are customised languages that represent a domain in such a way that domain experts can easily interpret them.

What's the difference between KPI and OKR?

#### YAML

What is YAML?

Data serialization language used by many technologies today like Kubernetes, Ansible, etc.

True or False? Any valid JSON file is also a valid YAML file

True. Because YAML is a superset of JSON.

What is the format of the following data?

    {
        applications: [
            {
                name: "my_app",
                language: "python",
                version: 20.17
            }
        ]
    }

JSON

What is the format of the following data?

    applications:
      - app: "my_app"
        language: "python"
        version: 20.17

YAML

How do you write a multi-line string in YAML? What use cases is it good for?

    someMultiLineString: |
      look mama
      I can write a multi-line string
      I love YAML

It's good for use cases like writing a shell script where each line of the script is a different command.

What is the difference between `someMultiLineString: |` and `someMultiLineString: >`?

Using `>` folds the multi-line string into a single line, while `|` preserves the line breaks:

    someMultiLineString: >
      This is actually
      a single line
      do not let appearances fool you

What are placeholders in YAML?

They allow you to reference values instead of writing them directly, and they are used like this:

    username: {{ my.user_name }}

How can you define multiple YAML components in one file?

Using this: `---`

For example:

    document_number: 1
    ---
    document_number: 2

#### Firmware

Explain what is a firmware

[Wikipedia](https://en.wikipedia.org/wiki/Firmware): "In computing, firmware is a specific class of computer software that provides the low-level control for a device's specific hardware. Firmware, such as the BIOS of a personal computer, may contain basic functions of a device, and may provide hardware abstraction services to higher-level software such as operating systems."

## Cassandra

When running a Cassandra cluster, how often do you need to run nodetool repair in order to keep the cluster consistent?

* Within the columnFamily GC-grace Once a week
* Less than the compacted partition minimum bytes
* Depends on the compaction strategy

## HTTP

What is HTTP?

[Avinetworks](https://avinetworks.com/glossary/layer-7/): HTTP stands for Hypertext Transfer Protocol. HTTP uses TCP port 80 to enable internet communication. It is part of the Application Layer (L7) in OSI Model.

Describe HTTP request lifecycle

* Resolve host by request to DNS resolver
* Client SYN
* Server SYN+ACK
* Client ACK
* HTTP request
* HTTP response

True or False? HTTP is stateful

False. It doesn't maintain state between incoming requests.

What does an HTTP request look like?

It consists of:

* Request line - the request type (method), the target path and the HTTP version
* Headers - content info like length, encoding, etc.
* Body (not always included)
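
The three parts can be seen by splitting a raw request by hand. The sketch below is a deliberately simplified parser (no chunked encoding, no repeated headers), and the `parse_http_request` helper and the sample request are made up for illustration.

```python
def parse_http_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.1 request into request line, headers and body."""
    # A blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "target": target,
        "version": version,
        "headers": headers,
        "body": body,
    }


raw_request = (
    b"POST /customers HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 20\r\n"
    b"\r\n"
    b'{"name": "John Doe"}'
)
parsed = parse_http_request(raw_request)

if __name__ == "__main__":
    print(parsed["method"], parsed["target"], parsed["version"])
    print(parsed["headers"])
    print(parsed["body"])
```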

What HTTP method types are there?

* GET
* POST
* HEAD
* PUT
* DELETE
* CONNECT
* OPTIONS
* TRACE

What HTTP response codes are there?

* 1xx - Informational
* 2xx - Success
* 3xx - Redirect
* 4xx - Error, client fault
* 5xx - Error, server fault

What is HTTPS?

HTTPS is a secure version of the HTTP protocol used to transfer data between a web browser and a web server. It encrypts the communication using SSL/TLS encryption to ensure that the data is private and secure. Learn more: https://www.cloudflare.com/learning/ssl/why-is-http-not-secure/

Explain HTTP Cookies

HTTP is stateless. To share state, we can use Cookies. TODO: explain what is actually a Cookie

What is HTTP Pipelining?

You get "504 Gateway Timeout" error from an HTTP server. What does it mean?

The server didn't receive a response from another server it communicates with in a timely manner.

What is a proxy?

A proxy is a server that acts as a middleman between a client device and a destination server. It can help improve privacy, security, and performance by hiding the client's IP address, filtering content, and caching frequently accessed data.

Proxies can be used for load balancing, distributing traffic across multiple servers to help prevent server overload and improve website or application performance. They can also be used for data analysis, as they can log requests and traffic, providing useful insights into user behavior and preferences.

What is a reverse proxy?

A reverse proxy is a type of proxy server that sits between a client and a server, but it is used to manage traffic going in the opposite direction of a traditional forward proxy. In a forward proxy, the client sends requests to the proxy server, which then forwards them to the destination server. However, in a reverse proxy, the client sends requests to the destination server, but the requests are intercepted by the reverse proxy before they reach the server.

Reverse proxies are commonly used to improve web server performance, provide high availability and fault tolerance, and enhance security by preventing direct access to the back-end server. They are often used in large-scale web applications and high-traffic websites to manage and distribute requests to multiple servers, resulting in improved scalability and reliability.

When you publish a project, you usually publish it with a license. What types of licenses are you familiar with and which one do you prefer to use?

Explain what is "X-Forwarded-For"

[Wikipedia](https://en.wikipedia.org/wiki/X-Forwarded-For): "The X-Forwarded-For (XFF) HTTP header field is a common method for identifying the originating IP address of a client connecting to a web server through an HTTP proxy or load balancer."
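
A minimal sketch of reading the header, assuming the common `client, proxy1, proxy2` layout where each proxy appends the address it received the request from. The `client_ip_from_xff` helper is hypothetical.

```python
from typing import Optional


def client_ip_from_xff(header_value: str) -> Optional[str]:
    """Return the left-most address in an X-Forwarded-For header value."""
    addresses = [part.strip() for part in header_value.split(",") if part.strip()]
    return addresses[0] if addresses else None


if __name__ == "__main__":
    print(client_ip_from_xff("203.0.113.7, 10.0.0.2, 10.0.0.3"))
```

Since clients can set this header themselves, it is usually only trusted when the request arrives through a proxy or load balancer you control.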

#### Load Balancers

What is a load balancer?

A load balancer accepts (or denies) incoming network traffic from a client, and based on some criteria (application related, network, etc.) it distributes those communications out to servers (at least one).

Why use a load balancer?

* Scalability - using a load balancer, you can add more servers in the backend to handle more requests/traffic from the clients, as opposed to using one server.
* Redundancy - if one server in the backend dies, the load balancer will keep forwarding the traffic/requests to the second server so users won't even notice one of the servers in the backend is down.

What load balancer techniques/algorithms are you familiar with?

* Round Robin
* Weighted Round Robin
* Least Connection
* Weighted Least Connection
* Resource Based
* Fixed Weighting
* Weighted Response Time
* Source IP Hash
* URL Hash
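
Three of these techniques are simple enough to sketch in a few lines. The classes and function below are illustrative only (the names and the use of `zlib.crc32` as the hash are assumptions); real load balancers also handle health checks, weights and concurrency.

```python
import itertools
import zlib


class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server that currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


def source_ip_hash(client_ip, servers):
    """Map a client IP to a server so the same client lands on the same server."""
    return servers[zlib.crc32(client_ip.encode("utf-8")) % len(servers)]


if __name__ == "__main__":
    rr = RoundRobin(["a", "b", "c"])
    print([rr.pick() for _ in range(4)])
    print(source_ip_hash("203.0.113.7", ["a", "b", "c"]))
```

Round robin ignores how busy each server is, least connections reacts to current load, and source IP hashing trades even distribution for client-to-server affinity.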

What are the drawbacks of round robin algorithm in load balancing?

* A simple round robin algorithm knows nothing about the load and the spec of each server it forwards requests to. It is possible that multiple heavy-workload requests will get to the same server while other servers get only lightweight requests, which results in one server doing most of the work and maybe even crashing at some point because it is unable to handle all the heavy requests on its own.
* Each request from the client creates a whole new session. This might be a problem in scenarios where you would like to perform multiple operations and the server has to know the result of a previous operation, i.e. be somewhat aware of its history with the client. In round robin, the first request might hit server X, while the second request might hit server Y and ask to continue processing data that was already processed on server X.

What is an Application Load Balancer?

In which scenarios would you use ALB?

At what layers can a load balancer operate?

L4 and L7

Can you perform load balancing without using a dedicated load balancer instance?

Yes, you can use DNS for performing load balancing.

What is DNS load balancing? What are its advantages? When would you use it?

#### Load Balancers - Sticky Sessions

What are sticky sessions? What are their pros and cons?

Recommended read:

* [Red Hat Article](https://access.redhat.com/solutions/900933)

Cons:

* Can cause uneven load on instances (since requests are routed to the same instances)

Pros:

* Ensures in-proc sessions are not lost when a new request is created

Name one use case for using sticky sessions

You would like to make sure the user doesn't lose the current session data.

What do sticky sessions use for enabling the "stickiness"?

Cookies. There are application based cookies and duration based cookies.

Explain application-based cookies

* Generated by the application and/or the load balancer
* Usually allow custom data to be included

Explain duration-based cookies

* Generated by the load balancer
* The session is no longer sticky once the duration has elapsed

#### Load Balancers - Load Balancing Algorithms

Explain each of the following load balancing techniques

* Round Robin
* Weighted Round Robin
* Least Connection
* Weighted Least Connection
* Resource Based
* Fixed Weighting
* Weighted Response Time
* Source IP Hash
* URL Hash

Explain the use case for connection draining

To ensure that a Classic Load Balancer stops sending requests to instances that are de-registering or unhealthy, while keeping the existing connections open, use connection draining. This enables the load balancer to complete in-flight requests made to instances that are de-registering or unhealthy. The maximum timeout value can be set between 1 and 3,600 seconds on both GCP and AWS.

#### Licenses

Are you familiar with "Creative Commons"? What do you know about it?

The Creative Commons license is a set of copyright licenses that allow creators to share their work with the public while retaining some control over how it can be used. The license was developed as a response to the restrictive standards of traditional copyright laws, which limited access to creative works. It allows creators to choose the terms under which their works can be shared, distributed, and used by others. There are six main types of Creative Commons licenses, each with different levels of restrictions and permissions:

* Attribution (CC BY): Allows others to distribute, remix, and build upon the work, even commercially, as long as they credit the original creator.
* Attribution-ShareAlike (CC BY-SA): Allows others to remix and build upon the work, even commercially, as long as they credit the original creator and release any new creations under the same license.
* Attribution-NoDerivs (CC BY-ND): Allows others to distribute the work, even commercially, but they cannot remix or change it in any way and must credit the original creator.
* Attribution-NonCommercial (CC BY-NC): Allows others to remix and build upon the work, but they cannot use it commercially and must credit the original creator.
* Attribution-NonCommercial-ShareAlike (CC BY-NC-SA): Allows others to remix and build upon the work, but they cannot use it commercially, must credit the original creator, and must release any new creations under the same license.
* Attribution-NonCommercial-NoDerivs (CC BY-NC-ND): Allows others to download and share the work, but they cannot use it commercially, remix or change it in any way, and must credit the original creator.

The licenses promote creativity, innovation, and collaboration, while respecting the rights of creators and encouraging the responsible use of creative works. More information: https://creativecommons.org/licenses/

Explain the differences between copyleft and permissive licenses

In copyleft, any derivative work must use the same licensing, while in permissive licensing there is no such condition. GPL-3 is an example of a copyleft license, while BSD is an example of a permissive license.

#### Random

How does a search engine work?

How does auto completion work?

What is faster than RAM?

CPU cache. [Source](https://www.enterprisestorageforum.com/hardware/cache-memory/)

What is a memory leak?

A memory leak is a programming error that occurs when a program fails to release memory that is no longer needed, causing the program to consume increasing amounts of memory over time. Leaks can lead to a variety of problems, including system crashes, performance degradation, and instability. They usually come from memory or references that are allocated and never released, such as caches that grow without bound.

What is your favorite protocol?

* SSH
* HTTP
* DHCP
* DNS
* ...

What is Cache API?

What is the C10K problem? Is it relevant today?

https://idiallo.com/blog/c10k-2016

## Storage

What types of storage are there?

* File
* Block
* Object

Explain Object Storage

- Data is divided into self-contained objects
- Objects can contain metadata

What are the pros and cons of object storage?

Pros:

- Usually with object storage, you pay for what you use, as opposed to other storage types where you pay for the storage space you allocate
- Scalable storage: object storage is mostly based on a model where what you use is what you get, and you can add storage as needed

Cons:

- Usually performs slower than other types of storage
- No granular modification: to change an object, you have to re-create it

What are some use cases for using object storage?

Explain File Storage

- File storage is used for storing data in files, in a hierarchical structure
- Some of the devices for file storage: hard drive, flash drive, cloud-based file storage
- Files are usually organized in directories

What are the pros and cons of File Storage?

Pros:

- Users have full control of their own files and can run a variety of operations on them: delete, read, write and move.
- Security mechanisms give users better control over things such as file locking

What are some examples of file storage?

* Local filesystem
* Dropbox
* Google Drive

What types of storage devices are there?

Explain IOPS

Explain storage throughput

What is a filesystem?

A file system is a way for computers and other electronic devices to organize and store data files. It provides a structure that helps to organize data into files and directories, making it easier to find and manage information. A file system is crucial for providing a way to store and manage data in an organized manner.

Commonly used file systems:

Windows:

* NTFS
* exFAT

Mac OS:

* HFS+
* APFS

Explain Dark Data

Explain MBR

## Questions you can ask

A list of questions you as a candidate can ask the interviewer during or after the interview. These are only suggestions, use them carefully. Not every interviewer will be able (or happy) to answer these, which should perhaps be a red flag for you regarding working in such a place, but that's really up to you.

What do you like about working here?

How does the company promote personal growth?

What is the current level of technical debt you are dealing with?

Be careful when asking this question - all companies, regardless of size, have some level of tech debt. Phrase the question in the light that all companies have to deal with this, but you want to see the current pain points they are dealing with.
This is a great way to figure how managers deal with unplanned work, and how good they are at setting expectations with projects.

Why should I NOT join you? (or 'what don't you like about working here?')

What was your favorite project you've worked on?

This can give you insights in some of the cool projects a company is working on, and if you would enjoy working on projects like these. This is also a good way to see if the managers are allowing employees to learn and grow with projects outside of the normal work you'd do.

If you could change one thing about your day to day, what would it be?

Similar to the tech debt question, this helps you identify any pain points with the company. Additionally, it can be a great way to show how you'd be an asset to the team.
For example, if they mention they have problem X, and you've solved that in the past, you can show how you'd be able to mitigate that problem.

Let's say that we agree and you hire me to this position, after X months, what do you expect that I have achieved?

Not only will this tell you what is expected from you, it will also provide a big hint about the type of work you are going to do in the first months of your job.

## Testing

Explain white-box testing

Explain black-box testing

What are unit tests?

Unit tests are a software testing technique that involves systematically breaking down a system and testing each individual part of the assembly. These tests are automated and can be run repeatedly to allow developers to catch edge case scenarios or bugs quickly while developing. The main objective of unit tests is to verify that each function produces the proper outputs given a set of inputs.

What types of tests would you run to test a web application?

Explain test harness?

What is A/B testing?

What is network simulation and how do you perform it?

What types of performance tests are you familiar with?

Explain the following types of tests:

* Load Testing
* Stress Testing
* Capacity Testing
* Volume Testing
* Endurance Testing

## Regex

Given a text file, perform the following exercises

#### Extract

Extract all the numbers

- "\d+"

Extract the first word of each line

- "^\w+"

Bonus: extract the last word of each line

- "\w+(?=\W*$)" (in most cases, depends on line formatting)

Extract all the IP addresses

- "\b(?:\d{1,3}\.){3}\d{1,3}\b" IPv4: (this pattern looks for a 1 to 3 digit sequence followed by a dot three times, then one more 1 to 3 digit sequence)

Extract dates in the format of yyyy-mm-dd or yyyy-dd-mm

Extract email addresses

- "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
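
The extraction patterns in this section can be tried out with Python's `re` module. This is a sketch on made-up sample text; line-anchored patterns need `re.MULTILINE` so that `^` and `$` apply to every line rather than to the whole string.

```python
import re

text = (
    "alice 10.0.0.1 alice@example.com 42\n"
    "bob 192.168.1.20 bob.smith@mail.example.org 7\n"
)

# Every run of digits, including the octets of the IP addresses.
numbers = re.findall(r"\d+", text)
# First and last word of each line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_words = re.findall(r"\w+(?=\W*$)", text, flags=re.MULTILINE)
# IPv4-looking addresses and email addresses.
ips = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)
emails = re.findall(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", text)

if __name__ == "__main__":
    print(first_words, last_words)
    print(ips)
    print(emails)
```

Note that the IPv4 pattern only checks the shape of the address, so it would also accept out-of-range values such as 999.999.999.999.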

#### Replace

Replace tabs with four spaces

Replace 'red' with 'green'

## System Design

Explain what a "single point of failure" is.

A "single point of failure" is a part of a system or organization that, if it were to fail, would cause the entire system to fail or significantly disrupt its operation. In other words, it is a vulnerability where there is no backup in place to compensate for the failure.

What is CDN?

A CDN (Content Delivery Network) is responsible for distributing content geographically. Part of it is what is known as edge locations, aka cache proxies, which allow users to get their content quickly thanks to caching and geographical distribution.

Explain Multi-CDN

In a single CDN setup, all of the content originates from one content delivery network.
In multi-CDN, content is distributed across multiple different CDNs, each might be on a completely different provider/cloud.

What are the benefits of Multi-CDN over a single CDN?

* Resiliency: Relying on one CDN means no redundancy. With multiple CDNs you don't need to worry about your CDN being down
* Flexibility in Costs: Using one CDN ties you to the specific rates of that CDN. With multiple CDNs you can consider using less expensive CDNs to deliver the content.
* Performance: With Multi-CDN there is a bigger potential for choosing better locations which are closer to the client requesting the content
* Scale: With multiple CDNs, you can scale services to support more extreme conditions

Explain "3-Tier Architecture" (including pros and cons)

A "3-Tier Architecture" is a pattern used in software development for designing and structuring applications. It divides the application into 3 interconnected layers: Presentation, Business logic and Data storage.

PROS:

* Scalability
* Security
* Reusability

CONS:

* Complexity
* Performance overhead
* Cost and development time

Explain Mono-repo vs. Multi-repo. What are the pros and cons of each approach?

In a Mono-repo, all the code for an organization is stored in a single, centralized repository.

PROS (Mono-repo):

* Unified tooling
* Code sharing

CONS (Mono-repo):

* Increased complexity
* Slower cloning

In a Multi-repo setup, each component is stored in its own separate repository. Each repository has its own version control history.

PROS (Multi-repo):

* Simpler to manage
* Different teams and developers can work on different parts of the project independently, making parallel development easier.

CONS (Multi-repo):

* Code duplication
* Integration challenges

What are the drawbacks of monolithic architecture?

* Not well suited to frequent code changes or to deploying new features quickly
* Not designed for today's infrastructure (like public clouds)
* Scaling a team that works on a monolithic architecture is more challenging
* If a single component in this architecture fails, then the entire application fails.

What are the advantages of microservices architecture over a monolithic architecture?

* Each service can fail individually without escalating into an application-wide outage.
* Each service can be developed and maintained by a separate team, and that team can choose its own tools and programming language.

What's a service mesh?

It is a layer that facilitates communication management and control between microservices in a containerized application. It handles tasks such as load balancing, encryption, and monitoring.

Explain "Loose Coupling"

In "Loose Coupling", components of a system communicate with each other while knowing little about each other's internal workings. This improves scalability and ease of modification in complex systems.

What is a message queue? When is it used?

It is a communication mechanism used in distributed systems to enable asynchronous communication between different components. It is generally used when the systems use a microservices approach.
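
As a rough illustration of asynchronous communication through a queue, here is a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for a real message broker, which is an assumption made for brevity: the producer and consumer never call each other directly, and the bounded queue applies back-pressure when the consumer falls behind.

```python
import queue
import threading

# A bounded in-process queue stands in for a real message broker (assumption).
tasks = queue.Queue(maxsize=5)
results = []

def producer(count):
    """Publish `count` messages, then a sentinel marking the end of the stream."""
    for number in range(count):
        tasks.put(number)  # blocks while the queue is full (back-pressure)
    tasks.put(None)        # sentinel: no more work

def consumer():
    """Pull messages one at a time, only when ready to handle the next one."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.append(item * item)

producer_thread = threading.Thread(target=producer, args=(10,))
consumer_thread = threading.Thread(target=consumer)
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()

print(results)
```

With a single consumer the results come out in the order the messages were published. A real broker adds persistence, delivery guarantees and multiple consumers, none of which this sketch attempts to model.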

#### Scalability

Explain Scalability

The ability to easily grow in size and capacity based on demand and usage.

Explain Elasticity

The ability to grow, but also to shrink, based on what is required.

Explain Disaster Recovery

Disaster recovery is the process of restoring critical business systems and data after a disruptive event. The goal is to minimize the impact and resume normal business activities quickly. This involves creating a plan, testing it, backing up critical data, and storing it in safe locations. In case of a disaster, the plan is then executed, backups are restored, and systems are hopefully brought back online. The recovery process may take hours or days depending on the damage to the infrastructure. This makes business planning important, as a well-designed and tested disaster recovery plan can minimize the impact of a disaster and keep operations going.

Explain Fault Tolerance and High Availability

Fault Tolerance - The ability to self-heal and return to normal capacity. Also the ability to withstand a failure and remain functional.

High Availability - Being able to access a resource (in some use cases, using different platforms)

What is the difference between high availability and Disaster Recovery?

[wintellect.com](https://www.wintellect.com/high-availability-vs-disaster-recovery): "High availability, simply put, is eliminating single points of failure and disaster recovery is the process of getting a system back to an operational state when a system is rendered inoperative. In essence, disaster recovery picks up when high availability fails, so HA first."
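
To see why eliminating single points of failure raises availability, here is a back-of-the-envelope sketch. It assumes component failures are independent, which real systems only approximate, and the 99% figure in the usage note is hypothetical.

```python
def parallel_availability(single, replicas):
    """Availability of `replicas` redundant components where any one suffices.

    Assumes independent failures: the group is down only if all replicas are down.
    """
    return 1 - (1 - single) ** replicas

def serial_availability(*components):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result

if __name__ == "__main__":
    print(parallel_availability(0.99, 2))
    print(serial_availability(0.99, 0.99))
```

Under these assumptions, two redundant servers that are each available 99% of the time give roughly 99.99% together, while two such components that both must be up give roughly 98%.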

Explain Vertical Scaling

Vertical Scaling is the process of adding resources to increase the power of existing servers. For example, adding more CPUs, adding more RAM, etc.

What are the disadvantages of Vertical Scaling?

With vertical scaling alone, the component still remains a single point of failure. In addition, there is a hardware limit: once a machine cannot take any more resources, you can no longer scale vertically.

Which type of cloud services usually support vertical scaling?

Databases, cache. It's common mostly for non-distributed systems.

Explain Horizontal Scaling

Horizontal Scaling is the process of adding more resources that will be able to handle requests as one unit.

What is the disadvantage of Horizontal Scaling? What is often required in order to perform Horizontal Scaling?

A load balancer. You can add more resources, but if you would like them to be part of the process, you have to serve them the requests/responses. Also, data inconsistency is a concern with horizontal scaling.
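
The role of the load balancer can be sketched with a simple round-robin policy. This is an illustrative toy, not how any particular load balancer is implemented; the class name and the backend names are hypothetical.

```python
import itertools
from collections import Counter

class RoundRobinBalancer:
    """Hands each incoming request to the next backend in turn."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)

    def route(self, request_id):
        """Return the (backend, request_id) pair chosen for this request."""
        return next(self._cycle), request_id

# Hypothetical backend names; adding a server means adding an entry here.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = Counter(balancer.route(i)[0] for i in range(9))
print(dict(assignments))
```

Round robin spreads requests evenly but ignores how busy each backend is. Policies such as least-connections, as well as health checks, address that in real deployments.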

Explain in which use cases you would use vertical scaling and in which use cases you would use horizontal scaling

Explain Resiliency and the ways in which a system can be made more resilient

Explain "Consistent Hashing"

How would you update each of the services in the following drawing without having app (foo.com) downtime?

What is the problem with the following architecture and how would you fix it?

The load on the producers or consumers may be high, which can cause them to hang or crash.
Instead of working in "push mode", the consumers can pull tasks only when they are ready to handle them. This can be fixed by using a streaming platform like Kafka, Kinesis, etc. The platform handles the high load/traffic and passes tasks/messages to consumers only when they are ready to receive them.

Users report that there is a huge spike in processing time when adding a little more data to process as input. What might be the problem?

How would you scale the architecture from the previous question to hundreds of users?

#### Cache

What is "cache"? In which cases would you use it?

What is "distributed cache"?

What is a "cache replacement policy"?

Take a look [here](https://en.wikipedia.org/wiki/Cache_replacement_policies)

Which cache replacement policies are you familiar with?

You can find a list [here](https://en.wikipedia.org/wiki/Cache_replacement_policies)

Explain the following cache policies:

* FIFO
* LIFO
* LRU

Read about it [here](https://en.wikipedia.org/wiki/Cache_replacement_policies)
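
As one concrete illustration, an LRU (least recently used) policy can be sketched with `collections.OrderedDict`. This is a minimal sketch rather than a production cache; the class name `LRUCache` and its methods are chosen here for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a "use"
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def keys(self):
        return list(self._items)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # over capacity: "b" is evicted
print(cache.keys())
```

A FIFO policy would differ only in that reads do not reorder entries, so the same sequence would evict "a" (the oldest insertion) instead of "b".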

Why not write everything to a cache instead of a database/datastore?

Caching and databases serve different purposes and are optimized for different use cases. Caching is used to speed up read operations by storing frequently accessed data in memory or on a fast storage medium. By keeping data close to the application, caching reduces the latency and overhead of accessing data from a slower, more distant storage system such as a database or disk. On the other hand, databases are optimized for storing and managing persistent data. Databases are designed to handle concurrent read and write operations, enforce consistency and integrity constraints, and provide features such as indexing and querying.

#### Migrations

How do you prepare for a migration? (or plan a migration)

You can mention:

* roll-back & roll-forward
* cut over
* dress rehearsals
* DNS redirection

Explain the "Branch by Abstraction" technique

#### Design a system

Can you design a video streaming website?

Can you design a photo upload website?

How would you build a URL shortener?

#### More System Design Questions

Additional exercises can be found in the [system-design-notebook repository](https://github.com/bregman-arie/system-design-notebook).

## Hardware

What is a CPU?

A central processing unit (CPU) performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).

What is RAM?

RAM (Random Access Memory) is the hardware in a computing device where the operating system (OS), application programs and data in current use are kept so they can be quickly reached by the device's processor. RAM is the main memory in a computer. It is much faster to read from and write to than other kinds of storage, such as a hard disk drive (HDD), solid-state drive (SSD) or optical drive.

What is a GPU?

A GPU, or Graphics Processing Unit, is a specialized electronic circuit designed to expedite image and video processing for display on a computer screen.

What is an embedded system?

An embedded system is a computer system (a combination of a computer processor, computer memory, and input/output peripheral devices) that has a dedicated function within a larger mechanical or electronic system. It is embedded as part of a complete device, often including electrical or electronic hardware and mechanical parts.

Can you give an example of an embedded system?

A common example of an embedded system is a microwave oven's digital control panel, which is managed by a microcontroller. When dedicated to a single task, a Raspberry Pi can also serve as an embedded system.

What types of storage are there?

There are several types of storage, including hard disk drives (HDDs), solid-state drives (SSDs), and optical drives (CD/DVD/Blu-ray). Other types of storage include USB flash drives, memory cards, and network-attached storage (NAS).

What are some considerations DevOps teams should keep in mind when selecting hardware for their job?

Choosing the right DevOps hardware is essential for ensuring streamlined CI/CD pipelines, timely feedback loops, and consistent service availability. Here's a distilled guide on what DevOps teams should consider:

1. **Understanding Workloads**:
   - **CPU**: Consider the need for multi-core or high-frequency CPUs based on your tasks.
   - **RAM**: Enough memory is vital for activities like large-scale coding or intensive automation.
   - **Storage**: Evaluate storage speed and capacity. SSDs might be preferable for swift operations.
2. **Expandability**:
   - **Horizontal Growth**: Check if you can boost capacity by adding more devices.
   - **Vertical Growth**: Determine if upgrades (like RAM, CPU) to individual machines are feasible.
3. **Connectivity Considerations**:
   - **Data Transfer**: Ensure high-speed network connections for activities like code retrieval and data transfers.
   - **Speed**: Aim for low-latency networks, particularly important for distributed tasks.
   - **Backup Routes**: Think about having backup network routes to avoid downtime.
4. **Consistent Uptime**:
   - Plan for hardware backups like RAID configurations, backup power sources, or alternate network connections to ensure continuous service.
5. **System Compatibility**:
   - Make sure your hardware aligns with your software, operating system, and intended platforms.
6. **Power Efficiency**:
   - Hardware that uses energy efficiently can reduce costs in the long term, especially in large setups.
7. **Safety Measures**:
   - Explore hardware-level security features, such as TPM, to enhance protection.
8. **Overseeing & Control**:
   - Tools like ILOM can be beneficial for remote handling.
   - Make sure the hardware can be seamlessly monitored for health and performance.
9. **Budgeting**:
   - Consider both initial expenses and long-term costs when budgeting.
10. **Support & Community**:
    - Choose hardware from reputable vendors known for reliable support.
    - Check for available drivers, updates, and community discussions around the hardware.
11. **Planning Ahead**:
    - Opt for hardware that can cater to both present and upcoming requirements.
12. **Operational Environment**:
    - **Temperature Control**: Ensure cooling systems to manage heat from high-performance units.
    - **Space Management**: Assess hardware size considering available rack space.
    - **Reliable Power**: Factor in consistent and backup power sources.
13. **Cloud Coordination**:
    - If you're leaning towards a hybrid cloud setup, focus on how local hardware will mesh with cloud resources.
14. **Life Span of Hardware**:
    - Be aware of the hardware's expected lifetime and when you might need replacements or upgrades.
15. **Optimized for Virtualization**:
    - If utilizing virtual machines or containers, ensure the hardware is compatible and optimized for such workloads.
16. **Adaptability**:
    - Modular hardware allows individual component replacements, offering more flexibility.
17. **Avoiding Single Vendor Dependency**:
    - Try to prevent reliance on a single vendor unless there are clear advantages.
18. **Eco-Friendly Choices**:
    - Prioritize sustainably produced hardware that's energy-efficient and environmentally responsible.

In essence, DevOps teams should choose hardware that is compatible with their tasks, versatile, gives good performance, and stays within their budget. Furthermore, long-term considerations such as maintenance, potential upgrades, and compatibility with impending technological shifts must be prioritized.

What is the role of hardware in disaster recovery planning and implementation?

Hardware is critical in disaster recovery (DR) solutions. While the broader scope of DR includes things like standard procedures, norms, and human roles, it's the hardware that keeps business processes running smoothly. Here's an outline of how hardware works with DR:

1. **Storing Data and Ensuring Its Duplication**:
   - **Backup Equipment**: Devices like tape storage, backup servers, and external HDDs keep essential data stored safely at a different location.
   - **Disk Arrays**: Systems such as RAID offer a safety net. If one disk crashes, the others compensate.
2. **Alternate Systems for Recovery**:
   - **Backup Servers**: These step in when the main servers falter, maintaining service flow.
   - **Traffic Distributors**: Devices like load balancers share traffic across servers. If a server crashes, they reroute users to operational ones.
3. **Alternate Operation Hubs**:
   - **Ready-to-use Centers** (hot sites): Locations equipped and primed to take charge immediately when the main center fails.
   - **Basic Facilities** (cold sites): Locations with necessary equipment but lacking recent data, taking longer to activate.
   - **Semi-prepped Facilities** (warm sites): Locations somewhat prepared with select systems and data, taking a moderate duration to activate.
4. **Power Backup Mechanisms**:
   - **Instant Power Backup**: Devices like UPS offer power during brief outages, ensuring no abrupt shutdowns.
   - **Long-term Power Solutions**: Generators keep vital systems operational during extended power losses.
5. **Networking Equipment**:
   - **Backup Internet Connections**: Having alternatives ensures connectivity even if one provider faces issues.
   - **Secure Connection Tools**: Devices ensuring safe remote access, especially crucial during DR situations.
6. **On-site Physical Setup**:
   - **Organized Housing**: Structures like racks to neatly store and manage hardware.
   - **Emergency Temperature Control**: Backup cooling mechanisms to counter server overheating in HVAC malfunctions.
7. **Alternate Communication Channels**:
   - **Satellite Phones**: Handy when regular communication methods falter.
   - **Direct Communication Devices**: Devices like radios, useful when primary systems are down.
8. **Protection Mechanisms**:
   - **Electronic Barriers & Alert Systems**: Devices like firewalls and intrusion detection keep DR systems safeguarded.
   - **Physical Entry Control**: Systems controlling entry and monitoring, ensuring only cleared personnel have access.
9. **Uniformity and Compatibility in Hardware**:
   - It's simpler to manage and replace equipment in emergencies if hardware configurations are consistent and compatible.
10. **Equipment for Trials and Upkeep**:
    - DR drills might use specific equipment to ensure the primary systems remain unaffected. This verifies the equipment's readiness and capacity to manage real crises.

In summary, while software and human interventions are important in disaster recovery operations, it is the hardware that provides the underlying support. It is critical for efficient disaster recovery plans to keep this hardware resilient, duplicated, and routinely assessed.

What is a RAID?

RAID is an acronym that stands for "Redundant Array of Independent Disks." It is a technique that combines numerous hard drives into a single device known as an array in order to improve performance, expand storage capacity, and/or offer redundancy to prevent data loss. RAID levels (for example, RAID 0, RAID 1, and RAID 5) provide varied benefits in terms of performance, redundancy, and storage efficiency.
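
To make the redundancy idea concrete, here is a simplified sketch of the XOR parity used by parity-based levels such as RAID 5. It works on tiny in-memory byte strings with made-up contents. Real arrays operate on stripes and rotate parity across disks, which this sketch does not model.

```python
def xor_blocks(*blocks):
    """XOR equally sized blocks byte by byte and return the result."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for index, byte in enumerate(block):
            result[index] ^= byte
    return bytes(result)

# Hypothetical 4-byte blocks stored on three data disks.
disk_a = b"\x01\x02\x03\x04"
disk_b = b"\x10\x20\x30\x40"
disk_c = b"\xaa\xbb\xcc\xdd"

# The parity block is the XOR of all data blocks.
parity = xor_blocks(disk_a, disk_b, disk_c)

# If disk_b is lost, XOR-ing the survivors with the parity rebuilds it.
rebuilt_b = xor_blocks(disk_a, disk_c, parity)
print(rebuilt_b == disk_b)
```

The rebuild works because XOR-ing a value with itself cancels out, so only the missing block remains. This is also why single-parity schemes tolerate exactly one failed disk.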

What is a microcontroller?

A microcontroller is a small integrated circuit that controls certain tasks in an embedded system. It typically includes a CPU, memory, and input/output peripherals.

What is a Network Interface Controller or NIC?

A Network Interface Controller (NIC) is a piece of hardware that connects a computer to a network and allows it to communicate with other devices.

What is a DMA?

Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU). DMA enables devices to send data to and receive data from the main memory in a computer. It does this while still allowing the CPU to perform other tasks.

What is a Real-Time Operating System?

A real-time operating system (RTOS) is an operating system (OS) for real-time computing applications that processes data and events that have critically defined time constraints. An RTOS is distinct from a time-sharing operating system, such as Unix, which manages the sharing of system resources with a scheduler, data buffers, or fixed task prioritization in a multitasking or multiprogramming environment. Processing time requirements need to be fully understood and bound rather than just kept as a minimum. All processing must occur within the defined constraints. Real-time operating systems are event-driven and preemptive, meaning the OS can monitor the relevant priority of competing tasks, and make changes to the task priority. Event-driven systems switch between tasks based on their priorities, while time-sharing systems switch the task based on clock interrupts.

List of interrupt types

There are six classes of interrupts possible:

* External
* Machine check
* I/O
* Program
* Restart
* Supervisor call (SVC)

## Big Data

Explain what exactly Big Data is

As defined by Doug Laney:

* Volume: Extremely large volumes of data
* Velocity: Real time, batch, streams of data
* Variety: Various forms of data, structured, semi-structured and unstructured
* Veracity or Variability: Inconsistent, sometimes inaccurate, varying data

What is DataOps? How is it related to DevOps?

DataOps seeks to reduce the end-to-end cycle time of data analytics, from the origin of ideas to the literal creation of charts, graphs and models that create value. DataOps combines Agile development, DevOps and statistical process controls and applies them to data analytics.

What is Data Architecture?

An answer from [talend.com](https://www.talend.com/resources/what-is-data-architecture): "Data architecture is the process of standardizing how organizations collect, store, transform, distribute, and use data. The goal is to deliver relevant data to people who need it, when they need it, and help them make sense of it."

Explain the different formats of data

* Structured - data that has a defined format and length (e.g. numbers, words)
* Semi-structured - doesn't conform to a specific format but is self-describing (e.g. XML, SWIFT)
* Unstructured - does not follow a specific format (e.g. images, text messages)

What is a Data Warehouse?

[Wikipedia's explanation on Data Warehouse](https://en.wikipedia.org/wiki/Data_warehouse) [Amazon's explanation on Data Warehouse](https://aws.amazon.com/data-warehouse)

What is Data Lake?

[Data Lake - Wikipedia](https://en.wikipedia.org/wiki/Data_lake)

Can you explain the difference between a data lake and a data warehouse?

What is "Data Versioning"? What models of "Data Versioning" are there?

What is ETL?

#### Apache Hadoop

Explain what Hadoop is

[Apache Hadoop - Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop)

Explain Hadoop YARN

Responsible for managing the compute resources in clusters and scheduling users' applications

Explain Hadoop MapReduce

A programming model for large-scale data processing
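
The programming model can be illustrated with the classic word count, written here as plain Python functions rather than against the Hadoop API. The function names and sample documents are hypothetical. A real job distributes the map and reduce phases across the cluster, with the framework handling the shuffle between them.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input records.
documents = ["big data is big", "data is everywhere"]

mapped = [pair for document in documents for pair in map_phase(document)]
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(word_counts)
```

Because each map call sees only one record and each reduce call sees only one key, both phases can run in parallel on different machines.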

Explain Hadoop Distributed File System (HDFS)

* Distributed file system providing high aggregate bandwidth across the cluster.
* For a user it looks like a regular file system structure, but behind the scenes it's distributed across multiple machines in a cluster
* Typical file sizes are in the terabyte range, and it can scale to support millions of files
* It's fault tolerant, which means it provides automatic recovery from faults
* It's best suited for running long batch operations rather than live analysis

What do you know about HDFS architecture?

[HDFS Architecture](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html)

* Master-slave architecture
* Namenode - master, Datanodes - slaves
* Files split into blocks
* Blocks stored on datanodes
* Namenode controls all metadata

## Ceph

Explain what Ceph is

Ceph is an Open-Source Distributed Storage System designed to provide excellent performance, reliability, and scalability. It's often used in cloud computing environments and Data Centers.

True or False? Ceph favors consistency and correctness over performance

True

Which services or types of storage does Ceph support?

* Object (RGW)
* Block (RBD)
* File (CephFS)

What is RADOS?

* Reliable Autonomic Distributed Object Storage
* Provides low-level data object storage service
* Strong Consistency
* Simplifies design and implementation of higher layers (block, file, object)

Describe RADOS software components

* Monitor
  * Central authority for authentication, data placement, policy
  * Coordination point for all other cluster components
  * Protect critical cluster state with Paxos
* Manager
  * Aggregates real-time metrics (throughput, disk usage, etc.)
  * Host for pluggable management functions
  * 1 active, 1+ standby per cluster
* OSD (Object Storage Daemon)
  * Stores data on an HDD or SSD
  * Services client IO requests

What is the workflow of retrieving data from Ceph?

The workflow is as follows:

1. The client sends a request to the Ceph cluster to retrieve data:

   > **Client could be any of the following**
   >> * Ceph Block Device
   >> * Ceph Object Gateway
   >> * Any third party Ceph client

2. The client retrieves the latest cluster map from the Ceph Monitor
3. The client uses the CRUSH algorithm to map the object to a placement group. The placement group is then assigned to an OSD.
4. Once the placement group and the OSD Daemon are determined, the client can retrieve the data from the appropriate OSD
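
The two-step placement in this workflow (object to placement group, placement group to OSDs) can be sketched as follows. This is deliberately NOT the real CRUSH algorithm, which also takes the cluster map hierarchy and device weights into account. Ranking OSDs by a per-placement-group hash is a simple stand-in, and all names and numbers here are hypothetical.

```python
import hashlib

def stable_hash(text):
    """Deterministic hash of a string (a stand-in for Ceph's own hash function)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def object_to_pg(object_name, pg_count):
    """Step 1: map an object name to a placement group id."""
    return stable_hash(object_name) % pg_count

def pg_to_osds(pg_id, osd_ids, replicas):
    """Step 2 (simplified stand-in for CRUSH): pick `replicas` distinct OSDs.

    OSDs are ranked by a hash of (pg_id, osd_id) and the top ones are chosen,
    so every client computes the same answer without a central lookup table.
    """
    ranked = sorted(osd_ids, key=lambda osd: stable_hash(f"{pg_id}:{osd}"), reverse=True)
    return ranked[:replicas]

# Hypothetical cluster: 128 placement groups, 6 OSDs, 3 replicas.
pg_id = object_to_pg("my-object", pg_count=128)
osds = pg_to_osds(pg_id, osd_ids=[0, 1, 2, 3, 4, 5], replicas=3)
print(pg_id, osds)
```

The point of the sketch is that placement is computed rather than looked up: any client holding the same cluster map derives the same OSDs for a given object. The first OSD in the list plays the role of the primary.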

What is the workflow of writing data to Ceph?

The workflow is as follows:

1. The client sends a request to the Ceph cluster to write data
2. The client retrieves the latest cluster map from the Ceph Monitor
3. The client uses the CRUSH algorithm to map the object to a placement group. The placement group is then assigned to a Ceph OSD Daemon dynamically.
4. The client sends the data to the primary OSD of the determined placement group. If the data is stored in an erasure-coded pool, the primary OSD is responsible for encoding the object into data chunks and coding chunks, and distributing them to the other OSDs.

What are "Placement Groups"?

Describe in detail the following: Objects -> Pool -> Placement Groups -> OSDs

What is OMAP?

What is a metadata server? How does it work?

## Packer

What is Packer? What is it used for?

In general, Packer automates the creation of machine images. It lets you focus on configuration prior to deployment, while building the images. In most cases this allows you to start instances much faster.

Does Packer follow a "configuration->deployment" model or a "deployment->configuration" model?

A configuration->deployment model, which has some advantages like:

1. Deployment Speed - you configure once prior to deployment instead of configuring every time you deploy. This allows you to start instances/services much quicker.
2. More immutable infrastructure - with configuration->deployment, deployments are unlikely to differ much from one another since most of the configuration is done prior to the deployment. Issues like dependency errors are handled/discovered prior to deployment in this model.

## Release

Explain Semantic Versioning

[This](https://semver.org/) page explains it perfectly:

Given a version number MAJOR.MINOR.PATCH, increment the:

* MAJOR version when you make incompatible API changes
* MINOR version when you add functionality in a backwards compatible manner
* PATCH version when you make backwards compatible bug fixes

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
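
The increment rules can be sketched as a small helper. The function name `bump` is hypothetical, and the sketch handles only plain MAJOR.MINOR.PATCH strings, ignoring pre-release and build metadata labels. Per the Semantic Versioning specification, lower-order parts reset to zero when a higher-order part is incremented.

```python
def bump(version, part):
    """Return `version` with the given part ("major", "minor" or "patch") incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"          # incompatible API change
    if part == "minor":
        return f"{major}.{minor + 1}.0"    # backwards compatible functionality
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # backwards compatible bug fix
    raise ValueError(f"unknown version part: {part!r}")

if __name__ == "__main__":
    print(bump("1.4.2", "patch"))
    print(bump("1.4.2", "minor"))
    print(bump("1.4.2", "major"))
```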

## Certificates

If you are looking for a way to prepare for a certain exam, this is the section for you. Here you'll find a list of certificates, each of which references a separate file with focused questions that will help you prepare for the exam. Good luck :)

#### AWS

* [Cloud Practitioner](certificates/aws-cloud-practitioner.md) (Latest update: 2020)
* [Solutions Architect Associate](certificates/aws-solutions-architect-associate.md) (Latest update: 2021)
* [Cloud SysOps Administration Associate](certificates/aws-cloud-sysops-associate.md) (Latest update: Oct 2022)

#### Azure

* [AZ-900](certificates/azure-fundamentals-az-900.md) (Latest update: 2021)

#### Kubernetes

* [Certified Kubernetes Administrator (CKA)](topics/kubernetes/CKA.md) (Latest update: 2022)

## Other DevOps and SRE Projects

## Credits

Thanks to all of our amazing [contributors](https://github.com/bregman-arie/devops-exercises/graphs/contributors) who make it easy for everyone to learn new things :)

Logos credits can be found [here](credits.md)

## License

[![License: CC BY-NC-ND 3.0](https://img.shields.io/badge/License-CC%20BY--NC--ND%203.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-nd/3.0/)
```
