GitHub: ipanalytics/ASNforge
# ASNForge
ASNForge builds reproducible ASN and prefix-origin intelligence artifacts for IP enrichment, routing analytics, and security data pipelines. It compiles public registry and routing inputs into a compact IP-to-ASN MaxMind DB, canonical ASN tables, prefix-origin snapshots, build metadata, checksums, and release-ready archives.
```bash
asnforge inspect-asn
asnforge stats
asnforge version
```
Common flags:
| Flag | Default | Description |
| --- | --- | --- |
| `--config` | `config/public-safe.yaml` | Build configuration |
| `--out` | `release/current` | Release output directory |
| `--cache` | `data/cache` | Source cache directory |
| `--build-id` | UTC timestamp | Explicit build identifier |
| `--schema-version` | `asnforge.v0.1` | Artifact schema version |
| `--private-asn-policy` | config value | `flag`, `drop`, or `keep` |
| `--moas-policy` | config value | `mark_ambiguous`, `most_observed`, or `lowest_asn` |
| `--mmdb` | `<out>/asnforge.mmdb` | MMDB path for build or inspect |
| `--skip-download` | `false` | Use cached/local source files |
| `--strict` | `false` | Treat quality warnings as build failures |
| `--format` | `text` | `text` or `json` where supported |
## Outputs
A successful build writes a complete release directory:
```text
release/current/
├── asnforge.mmdb
├── asnforge.mmdb.gz
├── asnforge-asn.jsonl
├── asnforge-asn.jsonl.gz
├── asnforge-asn.csv
├── asnforge-asn.csv.gz
├── asnforge-prefixes.jsonl
├── asnforge-prefixes.jsonl.gz
├── asnforge-prefixes.csv
├── asnforge-prefixes.csv.gz
├── metadata.json
├── checksums.txt
├── quality-report.md
├── asnforge-diff.json
└── manifest.json
```
| File | Description |
| --- | --- |
| `asnforge.mmdb` | Compact MaxMind DB for IP -> ASN profile lookup |
| `asnforge-asn.jsonl` | Canonical ASN profile table |
| `asnforge-asn.csv` | CSV form of the ASN profile table |
| `asnforge-prefixes.jsonl` | Canonical prefix-origin snapshot |
| `asnforge-prefixes.csv` | CSV form of the prefix-origin snapshot |
| `metadata.json` | Build metadata, source hashes, artifact hashes, summary, quality verdict |
| `checksums.txt` | SHA256 checksums for release artifacts |
| `quality-report.md` | Human-readable build and quality report |
| `asnforge-diff.json` | Baseline or release-to-release diff shape |
| `manifest.json` | Machine-readable artifact manifest |
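
A consumer may want to check a downloaded release against `checksums.txt` before using it. The sketch below is one way to do that, assuming the file follows the common `sha256sum` layout (`<hex digest>  <file name>` per line); the actual layout is not specified in this README, and `verify_release` is a hypothetical helper name, not part of ASNForge.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_release(release_dir: str, checksums_name: str = "checksums.txt") -> list[str]:
    """Return the names of listed files that are missing or do not match.

    Assumption: each line looks like "<hex digest>  <file name>".
    """
    root = Path(release_dir)
    failures = []
    for line in (root / checksums_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = root / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(name)
    return failures
```

An empty list from `verify_release("release/current")` would indicate that every listed artifact matches its recorded digest.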
## Data Formats
Every primary record includes:
- `schema_version`
- `build_id`
These fields make joins explicit and prevent accidental mixing of incompatible builds.
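
To illustrate how these two fields can guard a join, the following sketch refuses to proceed when records come from more than one build. It assumes only that each record is a dict carrying `schema_version` and `build_id`; `build_key` and `read_jsonl` are hypothetical helper names.

```python
import json


def read_jsonl(path):
    """Parse one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def build_key(records):
    """Return the single (schema_version, build_id) pair shared by all records.

    Raises ValueError when the records are empty or span more than one build.
    """
    keys = {(r["schema_version"], r["build_id"]) for r in records}
    if len(keys) != 1:
        raise ValueError(f"expected exactly one build, found {sorted(keys)}")
    return keys.pop()
```

Before joining two tables, a pipeline could compare `build_key(left_rows)` with `build_key(right_rows)` and stop if they differ.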
### ASN Profile
`asnforge-asn.jsonl` and `asnforge-asn.csv` contain one row per ASN:
```json
{
  "schema_version": "asnforge.v0.1",
  "build_id": "local-dev",
  "asn": 15169,
  "asn_name": "Google LLC",
  "asn_org": "Google",
  "asn_type": "cloud",
  "asn_tags": ["cloud", "dns", "manual-override", "search"],
  "registration_country": "US",
  "rir": "arin",
  "asn_confidence": 100
}
```
`registration_country` is the registry allocation country from RIR delegated data. It is not user, host, or service geolocation.
### Prefix Origin
`asnforge-prefixes.jsonl` and `asnforge-prefixes.csv` preserve routing observation state:
```json
{
  "prefix": "8.8.8.0/24",
  "origin_asns": [15169],
  "selected_origin_asn": 15169,
  "moas": false,
  "origin_policy": "most_observed",
  "observation_count": 2,
  "source_collectors": ["ris-rrc00", "routeviews2"],
  "prefix_confidence": 90,
  "rpki_state": "unknown"
}
```
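
The relationship between `origin_asns`, `selected_origin_asn`, `moas`, and the `--moas-policy` values can be sketched as below. The policy semantics here are assumptions made for illustration (this README does not define them): `most_observed` picks the origin seen most often and breaks ties by lowest ASN, `lowest_asn` picks the numerically smallest origin, and `mark_ambiguous` leaves the selection empty. `select_origin` is a hypothetical function, not ASNForge's implementation.

```python
from collections import Counter


def select_origin(observed_origins, policy="most_observed"):
    """Pick an origin ASN from a list with one origin ASN per observation.

    Returns (selected_origin_asn or None, moas flag).
    Assumed semantics only; see the caveats in the surrounding text.
    """
    counts = Counter(observed_origins)
    if not counts:
        raise ValueError("no observations")
    moas = len(counts) > 1
    if not moas:
        return next(iter(counts)), False
    if policy == "mark_ambiguous":
        return None, True
    if policy == "lowest_asn":
        return min(counts), True
    if policy == "most_observed":
        # Highest count wins; ties fall back to the lowest ASN for determinism.
        return min(counts, key=lambda asn: (-counts[asn], asn)), True
    raise ValueError(f"unknown policy: {policy}")
```

With a single observed origin, every policy returns that origin and `moas` is false, which matches the example record above.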
### MMDB Record
The MMDB is prefix-keyed and optimized for local IP enrichment:
```json
{
  "schema_version": "asnforge.v0.1",
  "build_id": "local-dev",
  "asn": 15169,
  "asn_name": "Google LLC",
  "asn_org": "Google",
  "asn_type": "cloud",
  "asn_tags": ["cloud", "dns", "manual-override", "search"],
  "registration_country": "US",
  "rir": "arin",
  "moas": false,
  "asn_confidence": 100
}
```
Detailed MOAS state, origin arrays, collector observations, and field-level provenance belong in the prefix and ASN tables. Keeping those fields out of the MMDB preserves data-section deduplication and keeps the database compact.
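
For local enrichment, a lookup against the MMDB might look like the sketch below. It assumes the third-party `maxminddb` Python package and the record fields shown above; `enrich_ip` and `lookup_file` are hypothetical helper names, and the default path simply combines the documented `--out` default with the artifact file name.

```python
def enrich_ip(reader, ip):
    """Return a flat enrichment dict for one IP, or None if the IP is not covered.

    `reader` is any object with a get(ip) method, such as maxminddb.Reader.
    """
    record = reader.get(ip)
    if record is None:
        return None
    return {
        "ip": ip,
        "asn": record.get("asn"),
        "asn_name": record.get("asn_name"),
        "asn_type": record.get("asn_type"),
        "moas": record.get("moas", False),
        "build_id": record.get("build_id"),
    }


def lookup_file(ip, mmdb_path="release/current/asnforge.mmdb"):
    """Open the MMDB, enrich a single IP, and close the file again."""
    import maxminddb  # third-party: pip install maxminddb

    with maxminddb.open_database(mmdb_path) as reader:
        return enrich_ip(reader, ip)
```

Long-running services would typically open the reader once and reuse it across lookups rather than calling `lookup_file` per IP.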
## Operational Notes
- Builds are deterministic for the same inputs, config, schema version, and build id.
- ASN rows are sorted by numeric ASN.
- Prefix rows are sorted by IP family, address bytes, and prefix length.
- CSV list fields use semicolon-separated stable ordering.
- `metadata.json` records source paths, URLs, SHA256 hashes, sizes, generated time, artifact hashes, and quality summary.
- `validate --strict` is intended for CI and release workflows.
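
The ordering rules above can be reproduced by a consumer that needs to re-sort or re-serialize rows. This is a minimal sketch using the standard `ipaddress` module; it assumes "stable ordering" for CSV list fields means sorted values, which this README does not state explicitly, and both function names are hypothetical.

```python
import ipaddress


def prefix_sort_key(prefix):
    """Order by IP family, then network address bytes, then prefix length."""
    network = ipaddress.ip_network(prefix, strict=True)
    return (network.version, network.network_address.packed, network.prefixlen)


def csv_list(values):
    """Join a list field into one semicolon-separated CSV cell.

    Assumption: values are sorted before joining to keep output stable.
    """
    return ";".join(str(value) for value in sorted(values))
```

Sorting with `sorted(prefixes, key=prefix_sort_key)` places all IPv4 prefixes before IPv6 ones and orders a covering prefix ahead of its more specific prefixes that share the same network address.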
## Source Profiles
| Profile | Purpose | Network required |
| --- | --- | --- |
| `config/local-dev.yaml` | Deterministic development and CI fixture build | No |
| `config/public-safe.yaml` | Public-safe release profile using RIR delegated files, bgp.tools exports, and static ipanalytics ASN signal feeds | Yes |
| `config/research-caida.yaml` | Public-safe sources plus optional CAIDA ASRank, AS2Org, and AS relationships bulk files | Yes, plus operator-provided CAIDA files |
The research CAIDA profile is separate because CAIDA datasets have their own acceptable-use, citation, and redistribution terms. CAIDA fields are written to the ASN JSONL/CSV artifacts and are intentionally excluded from the compact MMDB.
Default CAIDA research inputs:
| Dataset | File |
| --- | --- |
| AS2Org | `https://publicdata.caida.org/datasets/as-organizations/latest.as-org2info.txt.gz` |
| AS relationships | Latest `*.as-rel2.txt.bz2` resolved from `https://publicdata.caida.org/datasets/as-relationships/serial-2/` |
| ASRank | Operator-provided CSV path or URL; ASRank API crawling is not used |
The repository includes a monthly/manual `release-caida` workflow that publishes a prerelease tagged `research-caida-YYYYMMDD-HHMMSSZ`.
## Use Cases
- IP enrichment in SIEM, fraud, abuse, and traffic analytics systems.
- Local ASN profile joins in data warehouses and stream processors.
- Prefix-origin snapshots for routing analytics and MOAS review.
- Reproducible release artifacts for internal security data pipelines.
- Build-time validation of third-party registry and routing source changes.
## Scope
ASNForge v0.1 focuses on the pipeline shape: parsers, normalized models, deterministic outputs, compact MMDB generation, metadata, validation, and release automation.
Classification is conservative. `asn_type` is a scored operational classification, not an authoritative registry fact. Confidence describes source agreement, completeness, or observation strength; it is not a risk score.
## Limitations
- Native MRT parsing is not implemented in v0.1.
- Prefix-origin input is normalized CSV/TSV.
- RPKI state defaults to `unknown`.
- ASN classification is intentionally sparse without optional enrichment sources.
- Live routing data varies by collector and collection time.
## Directory Structure
```text
.
├── cmd/asnforge/          # CLI entry point
├── internal/asn/          # ASN models, classification, private/reserved policy
├── internal/bgp/          # Prefix-origin parser and aggregation
├── internal/build/        # Build pipeline, metadata, quality, diff
├── internal/config/       # Config loading and CLI options
├── internal/download/     # Source download, hashing, source state
├── internal/mmdb/         # MaxMind DB writer and inspector
├── internal/output/       # JSONL, CSV, gzip, checksums
├── internal/rir/          # RIR delegated parser
├── internal/smoke/        # Smoke test runner
├── config/                # Build profiles
├── schemas/               # JSON Schemas
├── examples/              # Overrides, smoke cases, deterministic testdata
├── docs/                  # Data source and artifact documentation
└── .github/workflows/     # CI and release automation
```
## Deployment
Release artifacts are intended to be published through GitHub Releases, not committed to the repository.
```bash
./asnforge build --config config/public-safe.yaml --out release/current
./asnforge validate --out release/current --strict
```
For internal deployments, run the same commands in CI and publish `release/current/*` to object storage, package registries, or internal artifact repositories.
## Documentation
- [Data Sources](docs/DATA_SOURCES.md)
- [Third-Party Data](docs/THIRD_PARTY_DATA.md)
- [Classification](docs/CLASSIFICATION.md)
- [MMDB Output](docs/MMDB_OUTPUT.md)
- [Release Artifacts](docs/RELEASE_ARTIFACTS.md)
- [Confidence](docs/CONFIDENCE.md)
## License
ASNForge is licensed under the [Apache License 2.0](LICENSE).
## Disclaimer
ASNForge aggregates registry and observed routing data for defensive, analytical, and operational use. Routing data is observational and may differ by collector and collection time.