GitHub: tafia/quick-xml
A high-performance, near zero-copy XML parsing and generation library written in Rust, with Serde serialization support.
Stars: 1476 | Forks: 277
# quick-xml

[crates.io](https://crates.io/crates/quick-xml)
[docs.rs](https://docs.rs/quick-xml)
[codecov](https://codecov.io/gh/tafia/quick-xml)
[Rust 1.56.0](https://blog.rust-lang.org/2021/10/21/Rust-1.56.0.html)
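High-performance XML pull reader/writer.

The reader:
- is almost zero-copy (it uses `Cow` whenever possible)
- is easy on memory allocation (the API provides a way to reuse buffers)
- supports various encodings (with the `encoding` feature), namespace resolution, and special characters.

The syntax is inspired by [xml-rs](https://github.com/netvl/xml-rs).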
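
To make the "almost zero-copy" point concrete, here is a minimal standard-library sketch of the `Cow` pattern: borrow the input when nothing has to be rewritten, and allocate only when it does. The function `unescape_minimal` is a hypothetical helper written for this illustration, not quick-xml's own unescaping code, and it only handles the five predefined XML entities.

```rust
use std::borrow::Cow;

// The five predefined XML entities and the characters they stand for.
const ENTITIES: [(&str, char); 5] = [
    ("&lt;", '<'),
    ("&gt;", '>'),
    ("&amp;", '&'),
    ("&apos;", '\''),
    ("&quot;", '"'),
];

/// Hypothetical helper: replaces predefined entities in `raw`.
/// Returns a borrowed slice when the input contains no `&` at all,
/// so the common case costs no allocation.
fn unescape_minimal(raw: &str) -> Cow<'_, str> {
    if !raw.contains('&') {
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(pos) = rest.find('&') {
        // Copy everything before the ampersand unchanged.
        out.push_str(&rest[..pos]);
        let tail = &rest[pos..];
        match ENTITIES.iter().find(|(name, _)| tail.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &tail[name.len()..];
            }
            // Not a known entity: keep the ampersand as it is.
            None => {
                out.push('&');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

Calling `unescape_minimal("plain text")` hands back a borrowed view of the input, while `unescape_minimal("a &lt; b")` has to allocate a new `String`.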
## Example
### Reader
```rust
use quick_xml::events::Event;
use quick_xml::reader::Reader;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag2>Test 2</tag2>
             </tag1>"#;
let mut reader = Reader::from_str(xml);
reader.config_mut().trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();

// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
    // NOTE: this is the generic case when we don't know about the input BufRead.
    // when the input is a &str or a &[u8], we don't actually need to use another
    // buffer, we could directly call `reader.read_event()`
    match reader.read_event_into(&mut buf) {
        Err(e) => panic!("Error at position {}: {:?}", reader.error_position(), e),
        // exits the loop when reaching end of file
        Ok(Event::Eof) => break,

        Ok(Event::Start(e)) => {
            match e.name().as_ref() {
                b"tag1" => println!("attributes values: {:?}",
                                    e.attributes().map(|a| a.unwrap().value)
                                    .collect::<Vec<_>>()),
                b"tag2" => count += 1,
                _ => (),
            }
        }
        Ok(Event::Text(e)) => txt.push(e.decode().unwrap().into_owned()),

        // There are several other `Event`s we do not consider here
        _ => (),
    }
    // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
    buf.clear();
}
```
### Writer
```rust
use quick_xml::events::{Event, BytesEnd, BytesStart};
use quick_xml::reader::Reader;
use quick_xml::writer::Writer;
use std::io::Cursor;

let xml = r#"<this_tag k1="v1" k2="v2"><child>text</child></this_tag>"#;
let mut reader = Reader::from_str(xml);
reader.config_mut().trim_text(true);
let mut writer = Writer::new(Cursor::new(Vec::new()));
loop {
    match reader.read_event() {
        Ok(Event::Start(e)) if e.name().as_ref() == b"this_tag" => {
            // creates a new element ... alternatively we could reuse `e` by calling
            // `e.into_owned()`
            let mut elem = BytesStart::new("my_elem");

            // copy the existing attributes onto the new element
            elem.extend_attributes(e.attributes().map(|attr| attr.unwrap()));

            // add a new my-key="some value" attribute
            elem.push_attribute(("my-key", "some value"));

            // writes the event to the writer
            assert!(writer.write_event(Event::Start(elem)).is_ok());
        },
        Ok(Event::End(e)) if e.name().as_ref() == b"this_tag" => {
            assert!(writer.write_event(Event::End(BytesEnd::new("my_elem"))).is_ok());
        },
        Ok(Event::Eof) => break,
        // we can either move or borrow the event to write, depending on your use-case
        Ok(e) => assert!(writer.write_event(e).is_ok()),
        Err(e) => panic!("Error at position {}: {:?}", reader.error_position(), e),
    }
}

let result = writer.into_inner().into_inner();
let expected = r#"<my_elem k1="v1" k2="v2" my-key="some value"><child>text</child></my_elem>"#;
assert_eq!(result, expected.as_bytes());
```
## Serde
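When the `serialize` feature is enabled, quick-xml can be used with serde's `Serialize`/`Deserialize` traits.
The mapping between XML and Rust types, and in particular the syntax that lets you distinguish
*elements* from *attributes*, is described in detail in the documentation for [deserialization](https://docs.rs/quick-xml/latest/quick_xml/de/).
### Parsing the "value" of a tag
If you have an input of the form `<foo abc="xyz">bar</foo>` and you want to get at the `bar`,
you can use either the special name `$text` or the special name `$value`: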
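```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Foo {
    // `@` marks a field that maps to an XML attribute
    #[serde(rename = "@abc")]
    pub abc: String,
    // `$text` maps to the text content of the element
    #[serde(rename = "$text")]
    pub body: String,
}
```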
Read about the difference between the two in the [documentation](https://docs.rs/quick-xml/latest/quick_xml/de/index.html#difference-between-text-and-value-special-names).
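### Performance
Note that although the serde support does not focus on performance (there are several unnecessary copies), it is still about 10 times faster than serde-xml-rs.
# Features
- `encoding`: support for non-UTF-8 XML
- `serialize`: support for serde `Serialize`/`Deserialize`
## Performance
Benchmarking is hard, and the results depend on your input file and your machine.
On my particular file, quick-xml is around **50 times faster** than the [xml-rs](https://crates.io/crates/xml-rs) crate.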
```
// quick-xml benches
test bench_quick_xml ... bench: 198,866 ns/iter (+/- 9,663)
test bench_quick_xml_escaped ... bench: 282,740 ns/iter (+/- 61,625)
test bench_quick_xml_namespaced ... bench: 389,977 ns/iter (+/- 32,045)
// same bench with xml-rs
test bench_xml_rs ... bench: 14,468,930 ns/iter (+/- 321,171)
// serde-xml-rs vs serialize feature
test bench_serde_quick_xml ... bench: 1,181,198 ns/iter (+/- 138,290)
test bench_serde_xml_rs ... bench: 15,039,564 ns/iter (+/- 783,485)
```
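
Taking the figures in the listing above at face value, the speed-ups can be recomputed directly. The short sketch below does that arithmetic; `speedup` is a hypothetical helper, and the constants are copied from the listing. For this particular run the ratios come out a little above the rounded figures quoted in the text, which fits the caveat that results vary with input and machine.

```rust
/// Hypothetical helper: how many times faster `fast_ns` is than `slow_ns`,
/// with both values given in nanoseconds per iteration.
fn speedup(slow_ns: u64, fast_ns: u64) -> f64 {
    slow_ns as f64 / fast_ns as f64
}

/// xml-rs vs quick-xml on the plain benchmark (roughly 72.8).
fn reader_speedup() -> f64 {
    speedup(14_468_930, 198_866)
}

/// serde-xml-rs vs the quick-xml `serialize` feature (roughly 12.7).
fn serde_speedup() -> f64 {
    speedup(15_039_564, 1_181_198)
}
```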
For a comparison of features and performance, you can also look at RazrFalcon's [parser comparison table](https://github.com/RazrFalcon/roxmltree#parsing).
## Contribute
Any PR is welcome!
## License
MIT