leandromoreira/ffmpeg-libav-tutorial

GitHub: leandromoreira/ffmpeg-libav-tutorial

A step-by-step FFmpeg libav programming tutorial that goes from audio/video basics to working code for remuxing, transcoding and adaptive streaming.

Stars: 11011 | Forks: 1018

[🇨🇳](/README-cn.md "Simplified Chinese") [🇰🇷](/README-ko.md "Korean") [🇪🇸](/README-es.md "Spanish") [🇻🇳](/README-vn.md "Vietnamese") [🇧🇷](/README-pt.md "Portuguese") [🇷🇺](/README-ru.md "Russian")

[![license](https://img.shields.io/badge/license-BSD--3--Clause-blue.svg)](https://img.shields.io/badge/license-BSD--3--Clause-blue.svg)

I was looking for a tutorial or book that would teach me how to start using [FFmpeg](https://www.ffmpeg.org/) as a library (a.k.a. libav), and then I found the ["How to write a video player in less than 1k lines"](http://dranger.com/ffmpeg/) tutorial. Unfortunately it was deprecated, so I decided to write this one.

Most of the code here is in C **but don't worry**: you can easily understand it and apply it to your preferred language. FFmpeg libav has bindings for many languages, such as [python](https://pyav.org/) and [go](https://github.com/imkira/go-libav), and even if your language doesn't have one you can still use it through `ffi` (here's an example with [Lua](https://github.com/daurnimator/ffmpeg-lua-ffi/blob/master/init.lua)).

We'll start with a quick lesson about what video, audio, codec and container are, then go through a crash course on the `FFmpeg` command line, and finally we'll write code. Feel free to skip directly to the section [Learn FFmpeg libav the Hard Way](#learn-ffmpeg-libav-the-hard-way).

Some people say that Internet video streaming is the future of traditional TV. In any case, FFmpeg is something worth studying.

__Table of Contents__

* [Intro](#intro)
  * [video - what you see!](#video---what-you-see)
  * [audio - what you listen!](#audio---what-you-listen)
  * [codec - shrinking data](#codec---shrinking-data)
  * [container - a comfy place for audio and video](#container---a-comfy-place-for-audio-and-video)
* [FFmpeg - command line](#ffmpeg---command-line)
  * [FFmpeg command line tool 101](#ffmpeg-command-line-tool-101)
* [Common video operations](#common-video-operations)
  * [Transcoding](#transcoding)
  * [Transmuxing](#transmuxing)
  * [Transrating](#transrating)
  * [Transsizing](#transsizing)
  * [Bonus Round: Adaptive Streaming](#bonus-round-adaptive-streaming)
  * [Going beyond](#going-beyond)
* [Learn FFmpeg libav the Hard Way](#learn-ffmpeg-libav-the-hard-way)
  * [Chapter 0 - The infamous hello world](#chapter-0---the-infamous-hello-world)
    * [FFmpeg libav architecture](#ffmpeg-libav-architecture)
  * [Chapter 1 - syncing audio and video](#chapter-1---syncing-audio-and-video)
  * [Chapter 2 - remuxing](#chapter-2---remuxing)
  * [Chapter 3 - transcoding](#chapter-3---transcoding)

# Intro

## video - what you see!

If you have a sequence of images and change them at a given frequency (let's say [24 images per second](https://www.filmindependent.org/blog/hacking-film-24-frames-per-second/)), you will create an [illusion of movement](https://en.wikipedia.org/wiki/Persistence_of_vision). In summary, this is the very basic idea behind a video: **a series of pictures / frames running at a given rate**.

flip book

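Contemporary illustration (1886)

## audio - what you listen!

Although a muted video can express a variety of feelings, adding sound to it brings more pleasure to the experience.

Sound is a vibration that propagates as a wave of pressure through the air or any other transmission medium, such as a gas, liquid or solid.

![audio analog to digital](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/CPT-Sound-ADC-DAC.svg/640px-CPT-Sound-ADC-DAC.svg.png "audio analog to digital")

## codec - shrinking data

But if we chose to pack millions of images into a single file and call it a movie, we might end up with a huge file. Let's do the math.

Suppose we are creating a video with a resolution of `1080 x 1920` (height x width), that we'll spend `3 bytes` per pixel (the minimal point on a screen) to encode the color (or [24 bit color](https://en.wikipedia.org/wiki/Color_depth#True_color_.2824-bit.29), which gives us 16,777,216 different colors), and that this video runs at `24 frames per second` and is `30 minutes` long.

```
toppf = 1080 * 1920 //total_of_pixels_per_frame
cpp = 3 //cost_per_pixel
tis = 30 * 60 //time_in_seconds
fps = 24 //frames_per_second

required_storage = tis * fps * toppf * cpp
```

This video would require `268,738,560,000 bytes`, that is approximately `250.28 GiB` (about 268.7 GB) of storage, or `1.19 Gbps` of bandwidth! That's why we need to use a [CODEC](https://github.com/leandromoreira/digital_video_introduction#how-does-a-video-codec-work).

## container - a comfy place for audio and video

A container is a **single file that holds all the streams** (mostly audio and video) and also provides **synchronization and general metadata**, such as title, resolution, etc.

Usually we can infer the format of a file by looking at its extension: for instance, a `video.webm` is probably a video using the [`webm`](https://www.webmproject.org/) container.

![container](/img/container.png)

# FFmpeg - command line

To work with multimedia we can use the AMAZING tool/library called [FFmpeg](https://www.ffmpeg.org/). Chances are you already use it directly or indirectly (do you use [Chrome?](https://www.chromium.org/developers/design-documents/video)).

It has a command line program called `ffmpeg`, a very simple yet powerful binary. For instance, you can convert from `mp4` to the container `avi` just by typing the following command:

```
$ ffmpeg -i input.mp4 output.avi
```

We just did a **remuxing** here, which is converting from one container to another. Technically FFmpeg could also be doing a transcoding, but we'll talk about that later.

## FFmpeg command line tool 101

FFmpeg does have [documentation](https://www.ffmpeg.org/ffmpeg.html) that does a great job of explaining how it works.

```
# you can also look for the documentation using the command line

ffmpeg -h full | grep -A 10 -B 10 avoid_negative_ts
```

To make things short, the FFmpeg command line program expects the following argument format to perform its actions, `ffmpeg {1} {2} -i {3} {4} {5}`, where:

1. global options
2. input file options
3. input url
4. output file options
5. output url

Parts 2, 3, 4 and 5 can be repeated as many times as you need. It's easier to understand this argument format in action:

```
# WARNING: this file is around 300MB
$ wget -O bunny_1080p_60fps.mp4 http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4

# The comments sit above the command because a comment placed after
# a trailing backslash would break the line continuation.
#   global options: -y
#   input options:  -c:a libfdk_aac
#   input url:      -i bunny_1080p_60fps.mp4
#   output options: -c:v libvpx-vp9 -c:a libvorbis
#   output url:     bunny_1080p_60fps_vp9.webm
$ ffmpeg \
-y \
-c:a libfdk_aac \
-i bunny_1080p_60fps.mp4 \
-c:v libvpx-vp9 -c:a libvorbis \
bunny_1080p_60fps_vp9.webm
```

This command takes an input `mp4` file containing two streams (an audio stream encoded with the `aac` CODEC and a video stream encoded with the `h264` CODEC) and converts it to `webm`, changing its audio and video CODECs too.

We could simplify the command above, but then be aware that FFmpeg will adopt or guess the default values for you. For instance, when you just type `ffmpeg -i input.avi output.mp4`, which audio/video CODEC does it use to produce `output.mp4`?

Werner Robitza wrote a must read/execute [tutorial about encoding and editing with FFmpeg](http://slhck.info/ffmpeg-encoding-course/#/).

# Common video operations

While working with audio/video we usually perform a set of tasks on the media.

## Transcoding

![transcoding](/img/transcoding.png)

**What?** the act of converting one of the streams (audio or video) from one CODEC to another.

**Why?** sometimes some devices (TVs, smartphones, consoles, etc.) don't support X but do support Y, and newer CODECs provide a better compression rate.

**How?** converting an `H264` (AVC) video to an `H265` (HEVC).

```
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-c:v libx265 \
bunny_1080p_60fps_h265.mp4
```

## Transmuxing

![transmuxing](/img/transmuxing.png)

**What?** the act of converting from one format (container) to another.

**Why?** sometimes some devices (TVs, smartphones, consoles, etc.) don't support X but do support Y, and sometimes newer containers provide modern required features.

**How?** converting a `mp4` to a `ts`.

```
# -c copy tells ffmpeg to skip decoding and encoding
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-c copy \
bunny_1080p_60fps.ts
```

## Transrating

![transrating](/img/transrating.png)

**What?** the act of changing the bit rate, or producing other renditions.

**Why?** people will try to watch your video over a `2G` (edge) connection on a less powerful smartphone, or over a `fiber` Internet connection on their 4K TVs, therefore you should offer more than one rendition of the same video with different bit rates.

**How?** producing a rendition with a bit rate between 964K and 3856K.

```
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-minrate 964K -maxrate 3856K -bufsize 2000K \
bunny_1080p_60fps_transrating_964_3856.mp4
```

Usually we'll be using transrating together with transsizing. Werner Robitza wrote another must read/execute [series of posts about FFmpeg rate control](http://slhck.info/posts/).

## Transsizing

![transsizing](/img/transsizing.png)

**What?** the act of converting from one resolution to another. As said before, transsizing is often used with transrating.

**Why?** for about the same reasons as transrating.

**How?** converting a `1080p` resolution to `480p`.

```
# 480p refers to the height, so the height is fixed at 480 and the width
# is derived from the aspect ratio (-2 keeps it divisible by 2, which
# many encoders require for yuv420p)
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-vf scale=-2:480 \
bunny_1080p_60fps_transsizing_480.mp4
```
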
## Bonus Round: Adaptive Streaming

![adaptive streaming](/img/adaptive-streaming.png)

**What?** the act of producing many resolutions (bit rates), splitting the media into chunks and serving them via http.

**Why?** to provide a flexible media that can be watched on a low end smartphone or on a 4K TV. It's also easy to scale and deploy, but it can add latency.

**How?** creating an adaptive WebM using DASH.

```
# video streams
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 160x90 -b:v 250k -keyint_min 150 -g 150 -an -f webm -dash 1 video_160x90_250k.webm

$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 320x180 -b:v 500k -keyint_min 150 -g 150 -an -f webm -dash 1 video_320x180_500k.webm

$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 640x360 -b:v 750k -keyint_min 150 -g 150 -an -f webm -dash 1 video_640x360_750k.webm

$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 640x360 -b:v 1000k -keyint_min 150 -g 150 -an -f webm -dash 1 video_640x360_1000k.webm

$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 1280x720 -b:v 1500k -keyint_min 150 -g 150 -an -f webm -dash 1 video_1280x720_1500k.webm

# audio streams
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:a libvorbis -b:a 128k -vn -f webm -dash 1 audio_128k.webm

# the DASH manifest
$ ffmpeg \
 -f webm_dash_manifest -i video_160x90_250k.webm \
 -f webm_dash_manifest -i video_320x180_500k.webm \
 -f webm_dash_manifest -i video_640x360_750k.webm \
 -f webm_dash_manifest -i video_640x360_1000k.webm \
 -f webm_dash_manifest -i video_1280x720_1500k.webm \
 -f webm_dash_manifest -i audio_128k.webm \
 -c copy -map 0 -map 1 -map 2 -map 3 -map 4 -map 5 \
 -f webm_dash_manifest \
 -adaptation_sets "id=0,streams=0,1,2,3,4 id=1,streams=5" \
 manifest.mpd
```

PS: I stole this example from the [Instructions to playback Adaptive WebM using DASH](http://wiki.webmproject.org/adaptive-streaming/instructions-to-playback-adaptive-webm-using-dash).

## Going beyond

There are [many, many other usages for FFmpeg](https://github.com/leandromoreira/digital_video_introduction/blob/master/encoding_pratical_examples.md#split-and-merge-smoothly). I use it together with *iMovie* to produce/edit some videos for YouTube, and you can certainly use it professionally.

# Learn FFmpeg libav the Hard Way

Since [FFmpeg](#ffmpeg---command-line) is so useful as a command line tool for essential tasks over media files, how can we use it in our own programs?

FFmpeg is [composed of several libraries](https://www.ffmpeg.org/doxygen/trunk/index.html) that can be integrated into our own programs. Usually, when you install FFmpeg, it automatically installs all these libraries. I'll be referring to this set of libraries as **FFmpeg libav**.

## Chapter 0 - The infamous hello world

This hello world actually won't show the message `"hello world"` in the terminal :tongue: Instead, we're going to **print out information about the video**, such as its format (container), duration, resolution and audio channels, and in the end we'll **decode some frames and save them as image files**.

### FFmpeg libav architecture

But before we start to code, let's learn how the **FFmpeg libav architecture** works and how its components communicate with each other.

Here's a diagram of the process of decoding a video:

![ffmpeg libav architecture - decoding process](/img/decoding.png)

You'll first need to load your media file into a component called [`AVFormatContext`](https://ffmpeg.org/doxygen/trunk/structAVFormatContext.html) (the video container is also known as format). It actually doesn't fully load the whole file: it often only reads the header.

Once we have loaded the minimal **header of our container**, we can access its streams (think of them as rudimentary audio and video data). Each stream will be available in a component called [`AVStream`](https://ffmpeg.org/doxygen/trunk/structAVStream.html).

Suppose our video has two streams: an audio stream encoded with the [AAC CODEC](https://en.wikipedia.org/wiki/Advanced_Audio_Coding) and a video stream encoded with the [H264 (AVC) CODEC](https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC). From each stream we can extract **pieces (slices) of data** called packets, which are loaded into components named [`AVPacket`](https://ffmpeg.org/doxygen/trunk/structAVPacket.html).

The **data inside the packets is still coded** (compressed), and in order to decode the packets we need to pass them to a specific [`AVCodec`](https://ffmpeg.org/doxygen/trunk/structAVCodec.html).

The `AVCodec` will decode them into an [`AVFrame`](https://ffmpeg.org/doxygen/trunk/structAVFrame.html), and finally this component gives us **the uncompressed frame**. Note that the same terminology/process is used for both audio and video streams.

### Requirements

Since some people were [facing issues while compiling or running the examples](https://github.com/leandromoreira/ffmpeg-libav-tutorial/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+compiling), **we're going to use [`Docker`](https://docs.docker.com/install/) as our development/runner environment**. We'll also use the big buck bunny video, so if you don't have it locally just run the command `make fetch_small_bunny_video`.

### Chapter 0 - code walkthrough

We'll skip some details, but don't worry: the [source code is available on github](/0_hello_world.c).

We're going to allocate memory for the component [`AVFormatContext`](http://ffmpeg.org/doxygen/trunk/structAVFormatContext.html), which will hold information about the format (container).

```
AVFormatContext *pFormatContext = avformat_alloc_context();
```

Now we're going to open the file, read its header and fill the `AVFormatContext` with minimal information about the format (notice that usually the codecs are not opened). The function used to do this is [`avformat_open_input`](http://ffmpeg.org/doxygen/trunk/group__lavf__decoding.html#ga31d601155e9035d5b0e7efedc894ee49). It expects an `AVFormatContext`, a `filename` and two optional arguments: the [`AVInputFormat`](https://ffmpeg.org/doxygen/trunk/structAVInputFormat.html) (if you pass `NULL`, FFmpeg will guess the format) and the [`AVDictionary`](https://ffmpeg.org/doxygen/trunk/structAVDictionary.html) (the options for the demuxer). The call takes the form `avformat_open_input(&pFormatContext, filename, NULL, NULL)`.

We can print the format name and the media duration:

```
// duration is an int64_t, so print it with PRId64 (from <inttypes.h>)
printf("Format %s, duration %" PRId64 " us", pFormatContext->iformat->long_name, pFormatContext->duration);
```

To access the `streams`, we need to read data from the media. The function [`avformat_find_stream_info`](https://ffmpeg.org/doxygen/trunk/group__lavf__decoding.html#gad42172e27cddafb81096939783b157bb) does that. Now `pFormatContext->nb_streams` will hold the number of streams and `pFormatContext->streams[i]` will give us the `i`-th stream (an [`AVStream`](https://ffmpeg.org/doxygen/trunk/structAVStream.html)).

```
avformat_find_stream_info(pFormatContext, NULL);
```

Now we'll loop through all the streams.

```
for (unsigned int i = 0; i < pFormatContext->nb_streams; i++)
{
  //
}
```

For each stream, we're going to keep the [`AVCodecParameters`](https://ffmpeg.org/doxygen/trunk/structAVCodecParameters.html), which describes the properties of the codec used by stream `i`.

```
AVCodecParameters *pLocalCodecParameters = pFormatContext->streams[i]->codecpar;
```

With the codec properties we can look up the proper CODEC by querying the function [`avcodec_find_decoder`](https://ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga19a0ca553277f019dd5b0fec6e1f9dca), which finds the registered decoder for the codec id and returns an [`AVCodec`](http://ffmpeg.org/doxygen/trunk/structAVCodec.html), the component that knows how to en**CO**de and **DEC**ode the stream.

```
// recent FFmpeg releases return a pointer to const here
const AVCodec *pLocalCodec = avcodec_find_decoder(pLocalCodecParameters->codec_id);
```

Now we can print information about the codecs.

```
// specific for video and audio
if (pLocalCodecParameters->codec_type == AVMEDIA_TYPE_VIDEO) {
  printf("Video Codec: resolution %d x %d", pLocalCodecParameters->width, pLocalCodecParameters->height);
} else if (pLocalCodecParameters->codec_type == AVMEDIA_TYPE_AUDIO) {
  // `channels` belongs to the older API this tutorial targets;
  // newer FFmpeg releases expose the count through `ch_layout` instead
  printf("Audio Codec: %d channels, sample rate %d", pLocalCodecParameters->channels, pLocalCodecParameters->sample_rate);
}
// general (bit_rate is an int64_t)
printf("\tCodec %s ID %d bit_rate %" PRId64, pLocalCodec->long_name, pLocalCodec->id, pLocalCodecParameters->bit_rate);
```

With the codec, we can allocate memory for the [`AVCodecContext`](https://ffmpeg.org/doxygen/trunk/structAVCodecContext.html), which will hold the context for our decode/encode process. But then we need to fill this codec context with the CODEC parameters; we do that with [`avcodec_parameters_to_context`](https://ffmpeg.org/doxygen/trunk/group__lavc__core.html#gac7b282f51540ca7a99416a3ba6ee0d16).

Once we have filled the codec context, we need to open the codec. We call the function [`avcodec_open2`](https://ffmpeg.org/doxygen/trunk/group__lavc__core.html#ga11f785a188d7d9df71621001465b0f1d) and then we can use it.

```
// pCodec and pCodecParameters are the pLocalCodec and pLocalCodecParameters
// values kept for the video stream found in the loop above
AVCodecContext *pCodecContext = avcodec_alloc_context3(pCodec);
avcodec_parameters_to_context(pCodecContext, pCodecParameters);
avcodec_open2(pCodecContext, pCodec, NULL);
```

Now we're going to read the packets from the stream and decode them into frames, but first we need to allocate memory for both components, the [`AVPacket`](https://ffmpeg.org/doxygen/trunk/structAVPacket.html) and the [`AVFrame`](https://ffmpeg.org/doxygen/trunk/structAVFrame.html).

```
AVPacket *pPacket = av_packet_alloc();
AVFrame *pFrame = av_frame_alloc();
```

Let's feed our packets from the streams with the function [`av_read_frame`](https://ffmpeg.org/doxygen/trunk/group__lavf__decoding.html#ga4fdb3084415a82e3810de6ee60e46a61) while it has packets.

```
while (av_read_frame(pFormatContext, pPacket) >= 0) {
  //...
}
```

Let's **send the raw data packet** (compressed frame) to the decoder, through the codec context, using the function [`avcodec_send_packet`](https://ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga58bc4bf1e0ac59e27362597e467efff3).

```
avcodec_send_packet(pCodecContext, pPacket);
```

And let's **receive the raw data frame** (uncompressed frame) from the decoder, through the same codec context, using the function [`avcodec_receive_frame`](https://ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga11e6542c4e66d3028668788a1a74217c).

```
avcodec_receive_frame(pCodecContext, pFrame);
```

We can print the frame number, the [PTS](https://en.wikipedia.org/wiki/Presentation_timestamp), the DTS, the [frame type](https://en.wikipedia.org/wiki/Video_compression_picture_types) and so on.

```
// pts and pkt_dts are int64_t values, hence PRId64 (from <inttypes.h>).
// frame_number, key_frame, coded_picture_number and display_picture_number
// come from the older API this tutorial targets; newer FFmpeg releases
// may deprecate or drop some of them.
printf(
    "Frame %c (%d) pts %" PRId64 " dts %" PRId64 " key_frame %d [coded_picture_number %d, display_picture_number %d]",
    av_get_picture_type_char(pFrame->pict_type),
    pCodecContext->frame_number,
    pFrame->pts,
    pFrame->pkt_dts,
    pFrame->key_frame,
    pFrame->coded_picture_number,
    pFrame->display_picture_number
);
```

Finally we can save our decoded frame as a [simple gray image](https://en.wikipedia.org/wiki/Netpbm_format#PGM_example). The process is very simple: we'll use `pFrame->data`, where the index is related to the [planes Y, Cb and Cr](https://en.wikipedia.org/wiki/YCbCr), and we just pick `0` (Y) to save our gray image.

```
save_gray_frame(pFrame->data[0], pFrame->linesize[0], pFrame->width, pFrame->height, frame_filename);

static void save_gray_frame(unsigned char *buf, int wrap, int xsize, int ysize, char *filename)
{
    FILE *f;
    int i;
    // "wb": the pixel data is binary, so avoid newline translation
    f = fopen(filename, "wb");
    // writing the minimal required header for a pgm file format
    // portable graymap format -> https://en.wikipedia.org/wiki/Netpbm_format#PGM_example
    fprintf(f, "P5\n%d %d\n%d\n", xsize, ysize, 255);

    // writing line by line (wrap is the stride, which may be wider than xsize)
    for (i = 0; i < ysize; i++)
        fwrite(buf + i * wrap, 1, xsize, f);
    fclose(f);
}
```

And voilà! Now we have a gray scale image of about 2MB:

![saved frame](/img/generated_frame.png)

## Chapter 1 - syncing audio and video

Before we move on to [coding a transcoding example](#chapter-3---transcoding), let's talk about **timing**, or how a video player knows the right time to play a frame.

In the last example, we saved some frames that can be seen here:

![frame 0](/img/hello_world_frames/frame0.png)
![frame 1](/img/hello_world_frames/frame1.png)
![frame 2](/img/hello_world_frames/frame2.png)
![frame 3](/img/hello_world_frames/frame3.png)
![frame 4](/img/hello_world_frames/frame4.png)
![frame 5](/img/hello_world_frames/frame5.png)

When we're designing a video player we need to **play each frame at a given pace**, otherwise it would be hard to watch the video pleasantly, either because it's playing too fast or too slow.

Therefore we need to introduce some logic to play each frame smoothly. For that matter, each frame has a **presentation timestamp** (PTS), which is an increasing number factored in a **timebase**, a rational number (whose denominator is known as the **timescale**) divisible by the **frame rate (fps)**.

It's easier to understand by looking at some examples, so let's simulate some scenarios.

For `fps=60/1` and `timebase=1/60000`, each PTS will increase by `timescale / fps = 1000`, therefore the **PTS real time** for each frame could be (supposing it started at 0):

* `frame=0, PTS = 0, PTS_TIME = 0`
* `frame=1, PTS = 1000, PTS_TIME = PTS * timebase = 0.016`
* `frame=2, PTS = 2000, PTS_TIME = PTS * timebase = 0.033`

For almost the same scenario but with a timebase equal to `1/60`:

* `frame=0, PTS = 0, PTS_TIME = 0`
* `frame=1, PTS = 1, PTS_TIME = PTS * timebase = 0.016`
* `frame=2, PTS = 2, PTS_TIME = PTS * timebase = 0.033`
* `frame=3, PTS = 3, PTS_TIME = PTS * timebase = 0.050`

For `fps=25/1` and `timebase=1/75`, each PTS will increase by `timescale / fps = 3` and the PTS time could be:

* `frame=0, PTS = 0, PTS_TIME = 0`
* `frame=1, PTS = 3, PTS_TIME = PTS * timebase = 0.04`
* `frame=2, PTS = 6, PTS_TIME = PTS * timebase = 0.08`
* `frame=3, PTS = 9, PTS_TIME = PTS * timebase = 0.12`
* ...
* `frame=24, PTS = 72, PTS_TIME = PTS * timebase = 0.96`
* ...
* `frame=4064, PTS = 12192, PTS_TIME = PTS * timebase = 162.56`

Now with the `pts_time` we can find a way to render this in sync with the audio `pts_time` or with a system clock. FFmpeg libav provides this information through its API:

- fps = [`AVStream->avg_frame_rate`](https://ffmpeg.org/doxygen/trunk/structAVStream.html#a946e1e9b89eeeae4cab8a833b482c1ad)
- tbr = [`AVStream->r_frame_rate`](https://ffmpeg.org/doxygen/trunk/structAVStream.html#ad63fb11cc1415e278e09ddc676e8a1ad)
- tbn = [`AVStream->time_base`](https://ffmpeg.org/doxygen/trunk/structAVStream.html#a9db755451f14e2bf590d4b85d82b32e6)

Just out of curiosity, the frames we saved were sent in DTS order (frames: 1,6,4,2,3,5) but played back in PTS order (frames: 1,2,3,4,5,6). Also, notice how cheap B-frames are in comparison to P- or I-frames.

```
LOG: AVStream->r_frame_rate 60/1
LOG: AVStream->time_base 1/60000
...
LOG: Frame 1 (type=I, size=153797 bytes) pts 6000 key_frame 1 [DTS 0]
LOG: Frame 2 (type=B, size=8117 bytes) pts 7000 key_frame 0 [DTS 3]
LOG: Frame 3 (type=B, size=8226 bytes) pts 8000 key_frame 0 [DTS 4]
LOG: Frame 4 (type=B, size=17699 bytes) pts 9000 key_frame 0 [DTS 2]
LOG: Frame 5 (type=B, size=6253 bytes) pts 10000 key_frame 0 [DTS 5]
LOG: Frame 6 (type=P, size=34992 bytes) pts 11000 key_frame 0 [DTS 1]
```

## Chapter 2 - remuxing

Remuxing is the act of changing from one format (container) to another. For instance, we can change an [MPEG-4](https://en.wikipedia.org/wiki/MPEG-4_Part_14) video to an [MPEG-TS](https://en.wikipedia.org/wiki/MPEG_transport_stream) one without much pain using FFmpeg:

```
ffmpeg -i input.mp4 -c copy output.ts
```

It'll demux the mp4 but it won't decode or encode it (`-c copy`), and in the end it'll mux it into an `mpegts` file. If you don't provide the format with `-f`, ffmpeg will try to guess it from the file's extension.

The general usage of FFmpeg or libav follows a pattern/architecture or workflow:

* **[protocol layer](https://ffmpeg.org/doxygen/trunk/protocols_8c.html)** - it accepts an `input` (a `file` for instance, but it could be an `rtmp` or `HTTP` input as well)
* **[format layer](https://ffmpeg.org/doxygen/trunk/group__libavf.html)** - it `demuxes` its content, revealing mostly metadata and its streams
* **[codec layer](https://ffmpeg.org/doxygen/trunk/group__libavc.html)** - it `decodes` its compressed stream data (*optional*)
* **[pixel layer](https://ffmpeg.org/doxygen/trunk/group__lavfi.html)** - it can also apply some `filters` to the raw frames (like resizing) (*optional*)
* and then it does the reverse path
* **[codec layer](https://ffmpeg.org/doxygen/trunk/group__libavc.html)** - it `encodes` (or `re-encodes` or even `transcodes`) the raw frames (*optional*)
* **[format layer](https://ffmpeg.org/doxygen/trunk/group__libavf.html)** - it `muxes` (or `remuxes`) the raw streams (the compressed data)
* **[protocol layer](https://ffmpeg.org/doxygen/trunk/protocols_8c.html)** - and finally the muxed data is sent to an `output` (another file or maybe a remote server over the network)

![ffmpeg libav workflow](/img/ffmpeg_libav_workflow.jpeg)

Now let's code an example using libav to provide the same effect as `ffmpeg -i input.mp4 -c copy output.ts`.

We're going to read from an input (`input_format_context`) and change it to another output (`output_format_context`).

```
AVFormatContext *input_format_context = NULL;
AVFormatContext *output_format_context = NULL;
```

We start with the usual memory allocation and by opening the input format. For this specific case, we're going to open an input file and allocate memory for an output file.

```
if ((ret = avformat_open_input(&input_format_context, in_filename, NULL, NULL)) < 0) {
  fprintf(stderr, "Could not open input file '%s'", in_filename);
  goto end;
}
if ((ret = avformat_find_stream_info(input_format_context, NULL)) < 0) {
  fprintf(stderr, "Failed to retrieve input stream information");
  goto end;
}

avformat_alloc_output_context2(&output_format_context, NULL, NULL, out_filename);
if (!output_format_context) {
  fprintf(stderr, "Could not create output context\n");
  ret = AVERROR_UNKNOWN;
  goto end;
}
```

We're going to remux only the video, audio and subtitle types of streams, so we're holding which streams we'll be using in an array of indexes.

```
number_of_streams = input_format_context->nb_streams;
// av_calloc zero-initializes the array; older code used av_mallocz_array,
// which newer FFmpeg releases no longer provide
streams_list = av_calloc(number_of_streams, sizeof(*streams_list));
if (!streams_list) {
  ret = AVERROR(ENOMEM);
  goto end;
}
```

Just after we have allocated the required memory, we're going to loop through all the streams, and for each one we need to create a new output stream in our output format context, using the [avformat_new_stream](https://ffmpeg.org/doxygen/trunk/group__lavf__core.html#gadcb0fd3e507d9b58fe78f61f8ad39827) function. Notice that we're marking all the streams that aren't video, audio or subtitle so we can skip them afterwards.

```
for (i = 0; i < input_format_context->nb_streams; i++) {
  AVStream *out_stream;
  AVStream *in_stream = input_format_context->streams[i];
  AVCodecParameters *in_codecpar = in_stream->codecpar;
  if (in_codecpar->codec_type != AVMEDIA_TYPE_AUDIO &&
      in_codecpar->codec_type != AVMEDIA_TYPE_VIDEO &&
      in_codecpar->codec_type != AVMEDIA_TYPE_SUBTITLE) {
    // -1 marks a stream we will skip while copying packets
    streams_list[i] = -1;
    continue;
  }
  streams_list[i] = stream_index++;
  out_stream = avformat_new_stream(output_format_context, NULL);
  if (!out_stream) {
    fprintf(stderr, "Failed allocating output stream\n");
    ret = AVERROR_UNKNOWN;
    goto end;
  }
  ret = avcodec_parameters_copy(out_stream->codecpar, in_codecpar);
  if (ret < 0) {
    fprintf(stderr, "Failed to copy codec parameters\n");
    goto end;
  }
}
```

Now we can create the output file.

```
if (!(output_format_context->oformat->flags & AVFMT_NOFILE)) {
  ret = avio_open(&output_format_context->pb, out_filename, AVIO_FLAG_WRITE);
  if (ret < 0) {
    fprintf(stderr, "Could not open output file '%s'", out_filename);
    goto end;
  }
}

ret = avformat_write_header(output_format_context, NULL);
if (ret < 0) {
  fprintf(stderr, "Error occurred when opening output file\n");
  goto end;
}
```

After that, we can copy the streams, packet by packet, from our input to our output streams. We'll loop while it has packets (`av_read_frame`), and for each packet we need to re-calculate the PTS and DTS to finally write it (`av_interleaved_write_frame`) to our output format context.

```
while (1) {
  AVStream *in_stream, *out_stream;
  ret = av_read_frame(input_format_context, &packet);
  if (ret < 0)
    break;
  in_stream = input_format_context->streams[packet.stream_index];
  if (packet.stream_index >= number_of_streams || streams_list[packet.stream_index] < 0) {
    av_packet_unref(&packet);
    continue;
  }
  packet.stream_index = streams_list[packet.stream_index];
  out_stream = output_format_context->streams[packet.stream_index];
  /* copy packet */
  packet.pts = av_rescale_q_rnd(packet.pts, in_stream->time_base, out_stream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
  packet.dts = av_rescale_q_rnd(packet.dts, in_stream->time_base, out_stream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
  packet.duration = av_rescale_q(packet.duration, in_stream->time_base, out_stream->time_base);
  // https://ffmpeg.org/doxygen/trunk/structAVPacket.html#ab5793d8195cf4789dfb3913b7a693903
  packet.pos = -1;

  //https://ffmpeg.org/doxygen/trunk/group__lavf__encoding.html#ga37352ed2c63493c38219d935e71db6c1
  ret = av_interleaved_write_frame(output_format_context, &packet);
  if (ret < 0) {
    fprintf(stderr, "Error muxing packet\n");
    break;
  }
  av_packet_unref(&packet);
}
```

To finalize, we need to write the stream trailer to the output media file with the [av_write_trailer](https://ffmpeg.org/doxygen/trunk/group__lavf__encoding.html#ga7f14007e7dc8f481f054b21614dfec13) function.

```
av_write_trailer(output_format_context);
```

Now we're ready to test it, and the first test will be a format (video container) conversion from an MP4 to an MPEG-TS video file. We're basically reproducing the command line `ffmpeg -i input.mp4 -c copy output.ts` with libav.

```
make run_remuxing_ts
```

It's working!!! Don't you trust me?! You shouldn't, we can check it with `ffprobe`:

```
ffprobe -i remuxed_small_bunny_1080p_60fps.ts

Input #0, mpegts, from 'remuxed_small_bunny_1080p_60fps.ts':
  Duration: 00:00:10.03, start: 0.000000, bitrate: 2751 kb/s
  Program 1
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
    Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 60 fps, 60 tbr, 90k tbn, 120 tbc
    Stream #0:1[0x101]: Audio: ac3 ([129][0][0][0] / 0x0081), 48000 Hz, 5.1(side), fltp, 320 kb/s
```

To sum up what we did here in a graph, we can revisit our initial [idea about how libav works](https://github.com/leandromoreira/ffmpeg-libav-tutorial#ffmpeg-libav-architecture), showing that we skipped the codec part.

![remuxing libav components](/img/remuxing_libav_components.png)

Before we end this chapter, I'd like to show an important part of the remuxing process: **you can pass options to the muxer**. Let's say we want to deliver the [MPEG-DASH](https://developer.mozilla.org/en-US/docs/Web/Apps/Fundamentals/Audio_and_video_delivery/Setting_up_adaptive_streaming_media_sources#MPEG-DASH_Encoding) format; for that we need to use [fragmented mp4](https://stackoverflow.com/a/35180327) (sometimes referred to as `fmp4`) instead of MPEG-TS or plain MPEG-4.

With the [command line we can do that easily](https://developer.mozilla.org/en-US/docs/Web/API/Media_Source_Extensions_API/Transcoding_assets_for_MSE#Fragmenting).

```
ffmpeg -i non_fragmented.mp4 -movflags frag_keyframe+empty_moov+default_base_moof fragmented.mp4
```

The libav version is almost as easy as the command line: we just need to pass the options when writing the output header, just before copying the packets.

```
AVDictionary* opts = NULL;
av_dict_set(&opts, "movflags", "frag_keyframe+empty_moov+default_base_moof", 0);
ret = avformat_write_header(output_format_context, &opts);
```

We can now generate this fragmented mp4 file:

```
make run_remuxing_fragmented_mp4
```

But to make sure that I'm not lying to you, you can use the amazing site/tool [gpac/mp4box.js](http://download.tsi.telecom-paristech.fr/gpac/mp4box.js/filereader.html) or the site [http://mp4parser.com/](http://mp4parser.com/) to see the differences. First load up the "common" mp4.

![mp4 boxes](/img/boxes_normal_mp4.png)

As you can see, it has a single `mdat` atom/box, **the place where the video and audio frames are**. Now load the fragmented mp4 to see how it spreads the `mdat` boxes.

![fragmented mp4 boxes](/img/boxes_fragmente_mp4.png)

## Chapter 3 - transcoding

In this chapter we're going to create a minimalist transcoder, written in C, that can convert videos coded in H264 to H265 using the **FFmpeg/libav** libraries, specifically [libavcodec](https://ffmpeg.org/libavcodec.html), libavformat and libavutil.

![media transcoding flow](/img/transcoding_flow.png)

### Transmuxing

Let's start with the simple transmuxing operation and then build upon this code. The first step is to **load the input file**.

```
// Allocate an AVFormatContext
avfc = avformat_alloc_context();
// Open an input stream and read the header (note the address-of: the
// function takes a pointer to the context pointer).
avformat_open_input(&avfc, in_filename, NULL, NULL);
// Read packets of a media file to get stream information.
avformat_find_stream_info(avfc, NULL);
```

Now we're going to set up the decoder. The `AVFormatContext` gives us access to all the `AVStream` components, and for each one of them we can get its `AVCodec` and create the particular `AVCodecContext`. Finally we can open the given codec so we can proceed to the decoding process.

```
for (unsigned int i = 0; i < avfc->nb_streams; i++)
{
  AVStream *avs = avfc->streams[i];
  const AVCodec *avc = avcodec_find_decoder(avs->codecpar->codec_id);
  AVCodecContext *avcc = avcodec_alloc_context3(avc);
  avcodec_parameters_to_context(avcc, avs->codecpar);
  avcodec_open2(avcc, avc, NULL);
}
```

We need to prepare the output media file for transmuxing as well. We first **allocate memory** for the output `AVFormatContext`. We create **each stream** in the output format. In order to pack the stream properly, we **copy the codec parameters** from the decoder.

We **set the flag** `AV_CODEC_FLAG_GLOBAL_HEADER`, which tells the encoder that it can use global headers, and finally we open the output **file for writing** and persist the headers. These steps rely on the same calls used in Chapter 2: `avformat_alloc_output_context2`, `avformat_new_stream`, `avcodec_parameters_copy`, `avio_open` and `avformat_write_header`.

We read the `AVPacket`s from the input format context (no decoding happens in this step), adjust the timestamps and write each packet properly to the output file. Even though the function `av_interleaved_write_frame` says "write frame", we are storing the packet. We finish the transmuxing process by writing the stream trailer to the file.

```
AVPacket *input_packet = av_packet_alloc();

while (av_read_frame(decoder_avfc, input_packet) >= 0)
{
  av_packet_rescale_ts(input_packet, decoder_video_avs->time_base, encoder_video_avs->time_base);
  if (av_interleaved_write_frame(encoder_avfc, input_packet) < 0) break;
}

av_write_trailer(encoder_avfc);
```

### Transcoding

The previous section showed a simple transmuxer program. Now we're going to add the capability to encode files; specifically, we're going to enable it to transcode videos from `h264` to `h265`.

After we have prepared the decoder, but before we arrange the output media file, we're going to set up the encoder.

* Create the video `AVStream` in the encoder, [`avformat_new_stream`](https://www.ffmpeg.org/doxygen/trunk/group__lavf__core.html#gadcb0fd3e507d9b58fe78f61f8ad39827)
* Use the `AVCodec` called `libx265`, [`avcodec_find_encoder_by_name`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__encoding.html#gaa614ffc38511c104bdff4a3afa086d37)
* Create the `AVCodecContext` based on the created codec, [`avcodec_alloc_context3`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__core.html#gae80afec6f26df6607eaacf39b561c315)
* Set up basic attributes for the transcoding session, and
* Open the codec and copy parameters from the context to the stream, [`avcodec_open2`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__core.html#ga11f785a188d7d9df71621001465b0f1d) and [`avcodec_parameters_from_context`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__core.html#ga0c7058f764778615e7978a1821ab3cfe)

```
AVRational input_framerate = av_guess_frame_rate(decoder_avfc, decoder_video_avs, NULL);
AVStream *video_avs = avformat_new_stream(encoder_avfc, NULL);

char *codec_name = "libx265";
char *codec_priv_key = "x265-params";
// we're going to use internal options for the x265
// they disable the scene change detection and fix the
// GOP on 60 frames.
char *codec_priv_value = "keyint=60:min-keyint=60:scenecut=0";

const AVCodec *video_avc = avcodec_find_encoder_by_name(codec_name);
AVCodecContext *video_avcc = avcodec_alloc_context3(video_avc);
// encoder codec params
av_opt_set(video_avcc->priv_data, codec_priv_key, codec_priv_value, 0);
video_avcc->height = decoder_video_avcc->height;
video_avcc->width = decoder_video_avcc->width;
video_avcc->pix_fmt = video_avc->pix_fmts[0];
// control rate (the minimum rate must not exceed the maximum rate)
video_avcc->bit_rate = 2 * 1000 * 1000;
video_avcc->rc_buffer_size = 4 * 1000 * 1000;
video_avcc->rc_max_rate = 2500 * 1000;
video_avcc->rc_min_rate = 2 * 1000 * 1000;
// time base
video_avcc->time_base = av_inv_q(input_framerate);
video_avs->time_base = video_avcc->time_base;

avcodec_open2(video_avcc, video_avc, NULL);
avcodec_parameters_from_context(video_avs->codecpar, video_avcc);
```

We need to expand our decoding loop for the video stream transcoding:

* Send the compressed `AVPacket` to the decoder, [`avcodec_send_packet`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga58bc4bf1e0ac59e27362597e467efff3)
* Receive the uncompressed `AVFrame`, [`avcodec_receive_frame`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga11e6542c4e66d3028668788a1a74217c)
* Start to transcode this raw frame,
* Send the raw frame, [`avcodec_send_frame`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga9395cb802a5febf1f00df31497779169)
* Receive the compressed `AVPacket`, based on our codec, [`avcodec_receive_packet`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga5b8eff59cf259747cf0b31563e38ded6)
* Set up the timestamp, [`av_packet_rescale_ts`](https://www.ffmpeg.org/doxygen/trunk/group__lavc__packet.html#gae5c86e4d93f6e7aa62ef2c60763ea67e)
* Write it to the output file, [`av_interleaved_write_frame`](https://www.ffmpeg.org/doxygen/trunk/group__lavf__encoding.html#ga37352ed2c63493c38219d935e71db6c1)

```
AVFrame *input_frame = av_frame_alloc();
AVPacket *input_packet = av_packet_alloc();

while (av_read_frame(decoder_avfc, input_packet) >= 0)
{
  int response = avcodec_send_packet(decoder_video_avcc, input_packet);
  while (response >= 0) {
    response = avcodec_receive_frame(decoder_video_avcc, input_frame);
    if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
      break;
    } else if (response < 0) {
      return response;
    }
    if (response >= 0) {
      // the decoded frame goes to the *encoder* context
      encode(encoder_avfc, decoder_video_avs, encoder_video_avs, encoder_video_avcc, input_frame, input_packet->stream_index);
    }
    av_frame_unref(input_frame);
  }
  av_packet_unref(input_packet);
}
av_write_trailer(encoder_avfc);

// used function
int encode(AVFormatContext *avfc, AVStream *dec_video_avs, AVStream *enc_video_avs, AVCodecContext *video_avcc, AVFrame *input_frame, int index) {
  AVPacket *output_packet = av_packet_alloc();
  int response = avcodec_send_frame(video_avcc, input_frame);

  while (response >= 0) {
    response = avcodec_receive_packet(video_avcc, output_packet);
    if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
      break;
    } else if (response < 0) {
      av_packet_free(&output_packet);
      return -1;
    }

    output_packet->stream_index = index;
    output_packet->duration = enc_video_avs->time_base.den / enc_video_avs->time_base.num / dec_video_avs->avg_frame_rate.num * dec_video_avs->avg_frame_rate.den;

    av_packet_rescale_ts(output_packet, dec_video_avs->time_base, enc_video_avs->time_base);
    response = av_interleaved_write_frame(avfc, output_packet);
  }
  av_packet_unref(output_packet);
  av_packet_free(&output_packet);
  return 0;
}
```

We converted the media stream from `h264` to `h265`. As expected, the `h265` version of the media file is smaller than the `h264` one, and the [created program](/3_transcoding.c) is capable of more than that:

```
  /*
   * H264 -> H265
   * Audio -> remuxed (untouched)
   * MP4 - MP4
   */
  StreamingParams sp = {0};
  sp.copy_audio = 1;
  sp.copy_video = 0;
  sp.video_codec = "libx265";
  sp.codec_priv_key = "x265-params";
  sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0";

  /*
   * H264 -> H264 (fixed gop)
   * Audio -> remuxed (untouched)
   * MP4 - MP4
   */
  StreamingParams sp = {0};
  sp.copy_audio = 1;
  sp.copy_video = 0;
  sp.video_codec = "libx264";
  sp.codec_priv_key = "x264-params";
  sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";

  /*
   * H264 -> H264 (fixed gop)
   * Audio -> remuxed (untouched)
   * MP4 - fragmented MP4
   */
  StreamingParams sp = {0};
  sp.copy_audio = 1;
  sp.copy_video = 0;
  sp.video_codec = "libx264";
  sp.codec_priv_key = "x264-params";
  sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
  sp.muxer_opt_key = "movflags";
  sp.muxer_opt_value = "frag_keyframe+empty_moov+delay_moov+default_base_moof";

  /*
   * H264 -> H264 (fixed gop)
   * Audio -> AAC
   * MP4 - MPEG-TS
   */
  StreamingParams sp = {0};
  sp.copy_audio = 0;
  sp.copy_video = 0;
  sp.video_codec = "libx264";
  sp.codec_priv_key = "x264-params";
  sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
  sp.audio_codec = "aac";
  sp.output_extension = ".ts";

  /* WIP :P  -> it's not playing on VLC, the final bit rate is huge
   * H264 -> VP9
   * Audio -> Vorbis
   * MP4 - WebM
   */
  //StreamingParams sp = {0};
  //sp.copy_audio = 0;
  //sp.copy_video = 0;
  //sp.video_codec = "libvpx-vp9";
  //sp.audio_codec = "libvorbis";
  //sp.output_extension = ".webm";
```
