ricotrevisan/bubble_ex

GitHub: ricotrevisan/bubble_ex

BubbleEx is an Elixir library for analyzing Bubble.io apps. It supports reverse-engineered database schema export, secret scanning, and app log querying.


# BubbleEx

BubbleEx is a set of utilities for scanning bubble.io apps. It can:

- Check whether a URL is a bubble.io app
- Check whether a Bubble app runs on a dedicated instance
- Build the db structure and export it as DBML, SQL (Postgres/SQLite/T-SQL), Ecto, Zod, Xano, or Convex
- Check for exposed endpoints
- Scan for exposed secrets (built-in Native scanner or Trufflehog)
- Query app logs for monitoring and debugging
- Deep-search nested data structures for specific values

## Error handling

Every public function returns `{:ok, result}` or `{:error, %BubbleEx.Error{}}`. `BubbleEx.Error` is a single struct with a `:kind` (one of a closed set of atoms such as `:not_a_bubble_app`, `:unauthorized`, `:invalid_input`, `:parse_failed`, `:cli_missing`, `:request_failed`), a human-readable `:message`, and a `:context` map. Pattern-match on `kind` to handle failures uniformly:

```
case BubbleEx.fetch_app("some-app") do
  {:ok, app} -> app
  {:error, %BubbleEx.Error{kind: :not_a_bubble_app}} -> :not_bubble
  {:error, %BubbleEx.Error{} = error} -> Logger.warning(Exception.message(error))
end
```

## Installation

Add `bubble_ex` to the list of dependencies in `mix.exs`:

```
def deps do
  [{:bubble_ex, "~> 0.3"}]
end
```

Documentation is available online. If you use BubbleEx alongside Phoenix, you may run into errors from `floki`. You must remove the `only: :test` restriction on the `floki` dependency for it to work.

## Configuration

BubbleEx can be configured in your application config:

```
config :bubble_ex,
  logs: [
    default_endpoint: "https://bubble.io/appeditor/get_jetstream_logs",
    default_timeout: 30_000,
    default_app_version: "live",
    pool_max_connections: 10,
    pool_timeout: 30_000
  ],
  apps: [
    default_timeout: 10_000,
    max_body_length: 100_000_000
  ]
```

## Database structure and schema export

BubbleEx reconstructs a Bubble app's data model (data types, i.e. tables, plus option sets, fields, types, and relationships) and renders it in several schema formats. Pass `:format` to `fetch_app/2`; the rendered schema is returned under the `:schema` key.

```
{:ok, app} = BubbleEx.fetch_app("my-app", format: :postgres)
IO.puts(app.schema)
# CREATE SCHEMA IF NOT EXISTS "custom";
#
# CREATE TABLE "custom"."Survey Response" (
#   "answer" text,
#   ...
#   "_id" text,
#   PRIMARY KEY ("_id")
# );
#
# ALTER TABLE "custom"."Survey Response"
#   ADD FOREIGN KEY ("status") REFERENCES "option"."Status Type" ("Display");
```

### Available formats

| `:format` | Output |
|-------------|--------|
| `:dbml` | DBML ([dbdiagram.io](https://dbdiagram.io) / [dbml.org](https://dbml.org)) |
| `:postgres` | PostgreSQL DDL |
| `:sqlite` | SQLite DDL |
| `:tsql` | SQL Server / Azure SQL T-SQL DDL |
| `:ecto` | Ecto schema modules + migrations |
| `:zod` | Zod (TypeScript) validation schemas |
| `:xano` | Xano table schema import JSON |
| `:convex` | Convex `schema.ts` |

### Naming

By default, output uses the app's human-readable display names (`naming: :proper`). Pass `naming: :id` to use Bubble's internal identifiers instead:

```
{:ok, app} = BubbleEx.fetch_app("my-app", format: :ecto, naming: :id)
```

### What is and isn't preserved

Each encoder maps Bubble's model as faithfully as the target allows. Scalar references become real foreign keys (SQL/Ecto) or id fields. Bubble *list* fields become native arrays where supported (for example `text[]` in Postgres) and JSON/text columns otherwise. Option sets (enums) are emitted as lookup tables or string fields: Bubble's payload does not include the options' *member values*, so they cannot become native database enums. External (`:api`) data types are omitted. Each format's output documents its own downgrades inline.

### DBML / database diagram (legacy option)

The original DBML path is unchanged and is still available through its own option:

```
{:ok, app} = BubbleEx.fetch_app("my-app", dbml: true)
app.dbml      # DBML text
app.dbdiagram # same content
```

### Custom formats

Formats are pluggable. Each format is a module that implements the `BubbleEx.Db.Encoder` behaviour, `encode(db_map, opts) :: {:ok, String.t()} | {:error, %BubbleEx.Error{}}`, operating on the generic map produced by `BubbleEx.Db.Reader.parse/1`, and is registered in `BubbleEx.Db.Encoder`. To add a new target, implement the behaviour and register its `:format` atom.

## Scanning for secrets

Secret scanning is pluggable through the `BubbleEx.Secrets` behaviour. The default adapter, `BubbleEx.Secrets.Trufflehog`, shells out to the optional `trufflehog` CLI. You can swap in your own scanner per call (the `adapter:` option) or globally:

```
config :bubble_ex, :secrets_adapter, MyApp.CustomScanner
```

### BubbleEx.Secrets.Native

`BubbleEx.Secrets.Native` is a pure-Elixir, zero-dependency offline scanner that needs no CLI. It is a good baseline for environments where `trufflehog` is unavailable, or when you want a fast, dependency-free first pass. It detects fewer secret types than Trufflehog and **does not perform live verification**: every finding has `verified: false` and should be treated as a potential secret pending further review.

**Select it globally:**

```
config :bubble_ex, :secrets_adapter, BubbleEx.Secrets.Native
```

**Or per call:**

```
{:ok, findings} =
  BubbleEx.Secrets.scan(payload, adapter: BubbleEx.Secrets.Native)
```

**Detectors (run by default):** AWS credentials, GitHub personal access tokens, Stripe keys, Slack tokens, Google API keys, JWTs, and PEM private key headers. A base64 decoding pass re-scans the decoded values. An additional entropy tier is available, but it is **opt-in** (off by default):

```
{:ok, findings} =
  BubbleEx.Secrets.scan(payload,
    adapter: BubbleEx.Secrets.Native,
    entropy: true
  )
```

**Finding structure** (atom keys):

```
%{
  detector: "github_pat", # string identifying the detector
  raw: "ghp_...",         # the matched string
  redacted: "ghp_…abcd",
  # Path elements are strings (map keys) OR integers (list indices),
  # e.g. ["plugins", 0, "token"] for a secret inside the first list item.
  path: ["plugins", 0, "token"],
  decoder: :plain,        # :plain | :base64
  verified: false,        # always false; no live check is performed
  confidence: :high       # :high (regex/base64) | :low (entropy)
}
```

### Prerequisites

The default adapter (`BubbleEx.Secrets.Trufflehog`) requires the `trufflehog` CLI on your `PATH` ([installation instructions](https://github.com/trufflesecurity/trufflehog)). When it is not installed, scans return `{:error, %BubbleEx.Error{kind: :cli_missing}}` instead of raising. If you need a CLI-free alternative, use `BubbleEx.Secrets.Native`.

### Synchronous scanning

For a simple synchronous scan:

```
# Using an Elixir map
payload = %{"_id" => "app_123", "data" => "content to scan"}
{:ok, results} = BubbleEx.scan_payload_for_secrets(payload)

# Process the results
Enum.each(results, fn item ->
  IO.puts("Found secret: #{item["DetectorType"]}")
end)
```

### Asynchronous scanning with the Server

For long-running scans, you can run the scan asynchronously with `Server`:

#### Starting the Server

First, add the Server to the supervision tree in your application.ex file:

```
def start(_type, _args) do
  children = [
    # ...other children
    {BubbleEx.Server, []}
  ]

  opts = [strategy: :one_for_one, name: YourApp.Supervisor]
  Supervisor.start_link(children, opts)
end
```

Alternatively, start it manually:

```
{:ok, _pid} = BubbleEx.Server.start_link()
```

#### Using the Server

```
# Start a scan and get a ref
payload = %{"_id" => "app_123", "data" => "content to scan"}
{:ok, ref} = BubbleEx.start_scan_for_secrets(payload)

# The calling process receives progress messages:
receive do
  {:scan_started, ^ref} -> IO.puts("Scan started")
  {:scan_output, ^ref, output} -> IO.puts("Scan progress: #{output}")
  {:scan_completed, ^ref,
   results} -> IO.puts("Scan completed with #{length(results)} findings")
  {:scan_error, ^ref, error} -> IO.puts("Scan error: #{inspect(error)}")
  {:scan_cancelled, ^ref} -> IO.puts("Scan was cancelled")
end

# Check the status
{:ok, status} = BubbleEx.scan_status(ref)

# Cancel the scan if needed
:ok = BubbleEx.cancel_scan(ref)
```

## Querying app logs

BubbleEx can query Bubble.io application logs for monitoring, debugging, and analysis.

### Prerequisites

You need a valid Bubble session cookie to authenticate against the Bubble API. To get one, log in to your Bubble account and extract the session cookie from your browser.

### Basic usage

```
# Fetch logs from the last hour
{:ok, logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123..."
)

# Access the log entries
IO.inspect(logs.logs)
```

### Filtering by time range

```
# Fetch logs from the last 30 minutes
{after_time, before_time} = BubbleEx.Logs.time_range({:minutes, 30})

{:ok, logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123...",
  after: after_time,
  before: before_time
)

# Fetch logs from the last day
{after_time, before_time} = BubbleEx.Logs.time_range(:last_day)

{:ok, logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123...",
  after: after_time,
  before: before_time
)
```

### Filtering by log type

```
# Fetch error logs only
{:ok, error_logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123...",
  tags: BubbleEx.Logs.preset_filter(:errors, "my-app")
)

# Fetch workflow-related logs only
{:ok, workflow_logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123...",
  tags: BubbleEx.Logs.preset_filter(:workflows, "my-app")
)

# Fetch API-related logs
{:ok, api_logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123...",
  tags: BubbleEx.Logs.preset_filter(:api, "my-app")
)
```

### Available preset filters

- `:errors` - Error and failure messages
- `:workflows` - Workflow execution logs
- `:api` - HTTP requests and API workflows
- `:database` - Database operations
- `:plugins` - Plugin console output and errors
- `:scheduled` - Scheduled task execution
- `:all` - All log types (default)

### Custom filtering

```
# Custom tag filtering
{:ok, custom_logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123...",
  tags: %{
    message: ["running event", "event completed"],
    appname: "my-app",
    app_version: "live"
  }
)
```

### Querying different app versions

```
# Fetch logs from the test version
{:ok, test_logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123...",
  app_version: "test"
)

# Fetch logs from the development version
{:ok, dev_logs} = BubbleEx.fetch_logs("my-app",
  cookie: "bubble_session=abc123...",
  app_version: "development"
)
```

### Asynchronous usage

For applications that need to query logs asynchronously or handle a large volume of log requests, here are some patterns:

#### Fetching logs asynchronously with Task

```
# Fetch logs asynchronously
task = Task.async(fn ->
  BubbleEx.fetch_logs("my-app",
    cookie: System.get_env("BUBBLE_COOKIE"),
    tags: BubbleEx.Logs.preset_filter(:errors, "my-app")
  )
end)

# Do other work...

# Collect the result
case Task.await(task, 30_000) do
  {:ok, logs} -> IO.puts("Found #{length(logs.logs)} error logs")
  {:error, reason} -> IO.puts("Error fetching logs: #{inspect(reason)}")
end
```

#### Periodic log monitoring with a GenServer

```
defmodule MyApp.LogMonitor do
  use GenServer
  # Logger macros must be required before use
  require Logger

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts) do
    app_id = Keyword.fetch!(opts, :app_id)
    cookie = Keyword.fetch!(opts, :cookie)
    interval = Keyword.get(opts, :interval, 60_000) # 1 minute

    schedule_check(interval)
    {:ok, %{app_id: app_id, cookie: cookie, interval: interval}}
  end

  @impl true
  def handle_info(:check_logs, state) do
    {after_time, before_time} = BubbleEx.Logs.time_range({:minutes, 5})

    case BubbleEx.fetch_logs(state.app_id,
           cookie: state.cookie,
           after: after_time,
           before: before_time,
           tags: BubbleEx.Logs.preset_filter(:errors, state.app_id)
         ) do
      {:ok, %{logs: logs}} when length(logs) > 0 ->
        # Handle error logs - send alerts, store in database, etc.
        handle_error_logs(logs)

      {:ok, _} ->
        # No errors found
        :ok

      {:error, reason} ->
        # Log monitoring error
        Logger.error("Failed to fetch logs: #{inspect(reason)}")
    end

    schedule_check(state.interval)
    {:noreply, state}
  end

  defp schedule_check(interval) do
    Process.send_after(self(), :check_logs, interval)
  end

  defp handle_error_logs(logs) do
    # Process error logs...
    Enum.each(logs, fn log ->
      Logger.warning("App error detected: #{inspect(log)}")
    end)
  end
end

# Start the monitor
{:ok, _pid} = MyApp.LogMonitor.start_link(
  app_id: "my-app",
  cookie: System.get_env("BUBBLE_COOKIE")
)
```

#### Batch processing multiple apps

```
defmodule MyApp.LogAggregator do
  def fetch_logs_for_apps(app_configs) do
    app_configs
    |> Task.async_stream(
      fn %{app_id: app_id, cookie: cookie} ->
        {after_time, before_time} = BubbleEx.Logs.time_range(:last_hour)

        case BubbleEx.fetch_logs(app_id, cookie: cookie, after: after_time, before: before_time) do
          {:ok, logs} -> {:ok, app_id, logs}
          {:error, reason} -> {:error, app_id, reason}
        end
      end,
      max_concurrency: 5,
      timeout: 30_000
    )
    |> Enum.to_list()
  end
end

# Usage
apps = [
  %{app_id: "app1", cookie: "cookie1"},
  %{app_id: "app2", cookie: "cookie2"},
  %{app_id: "app3", cookie: "cookie3"}
]

results = MyApp.LogAggregator.fetch_logs_for_apps(apps)

results
|> Enum.each(fn
  {:ok, {:ok, app_id, logs}} -> IO.puts("#{app_id}: #{length(logs.logs)} logs")
  {:ok, {:error, app_id, reason}} -> IO.puts("#{app_id}: Error - #{inspect(reason)}")
end)
```

### Performance optimization

BubbleEx uses an HTTP connection pool to improve performance when making multiple log requests:

```
# Configure connection pooling (optional)
config :bubble_ex,
  logs: [
    pool_max_connections: 20, # Maximum concurrent connections
    pool_timeout: 30_000      # Pool timeout in milliseconds
  ]
```

The connection pool is managed automatically and reuses connections across requests, which significantly improves performance for applications that query logs frequently.

### Security considerations

- Never log or expose session cookies in your application logs
- Store cookies in environment variables or securely in configuration
- Use proper error handling to avoid leaking authentication details
- Consider implementing cookie rotation for long-running applications
- When using the asynchronous patterns, make sure timeouts are handled properly to avoid hung processes

## Deep search

BubbleEx provides deep search for traversing and searching complex nested data structures, which is particularly useful when analyzing Bubble.io application data.

### Basic usage

```
# Search nested data for a specific value
data = %{
  "user" => %{
    "name" => "John Doe",
    "email" => "john@example.com",
    "settings" => %{
      "theme" => "dark",
      "notifications" => ["email", "push"]
    }
  },
  "posts" => [
    %{"title" => "First Post", "content" => "Hello world"},
    %{"title" => "Second Post", "content" => "Another post"}
  ]
}

# Find all paths containing "email"
paths = BubbleEx.DeepSearch.find_all_paths(data, "email")
# Returns: [["user", "settings", "notifications", 0],
#           ["user", "email"]]

# Find all paths containing "Post"
paths = BubbleEx.DeepSearch.find_all_paths(data, "Post")
# Returns: [["posts", 1, "title"], ["posts", 0, "title"]]
```

### Understanding path results

The returned paths are lists in which:

- String elements represent map keys
- Integer elements represent list indices

**Note on `get_in/2` usage:** Paths that contain only map keys work with `get_in/2` as-is, but paths that contain list indices do not, because Elixir's Access behaviour does not accept integer keys for lists. For list access, use manual traversal or `Enum.at/2`.

```
# Map-only paths work with get_in
data = %{"users" => %{"admin" => %{"name" => "Alice"}}}
[path] = BubbleEx.DeepSearch.find_all_paths(data, "Alice")
value = get_in(data, path) # Returns: "Alice"

# Paths with list indices need manual traversal
data = %{"items" => ["first", "second", "third"]}
[["items", 1]] = BubbleEx.DeepSearch.find_all_paths(data, "second")
value = data["items"] |> Enum.at(1) # Returns: "second"
```

### Practical examples with Bubble.io data

```
# Search Bubble app data for a specific field ID
{:ok, app_data} = BubbleEx.fetch_bubble_app("myapp")
field_paths = BubbleEx.DeepSearch.find_all_paths(app_data, "_id_1234567890")

# Find all references to a specific user
user_refs = BubbleEx.DeepSearch.find_all_paths(app_data, "user_abc123")

# Locate API endpoints
api_paths = BubbleEx.DeepSearch.find_all_paths(app_data, "api/1.1/")

# Find database table references
table_paths = BubbleEx.DeepSearch.find_all_paths(app_data, "data_type_")
```

### Working with search results

```
# Extract and process all matching values
data = fetch_complex_data()
paths = BubbleEx.DeepSearch.find_all_paths(data, "secret_")

# Get all actual values (get_in/2 only works for paths made solely of map keys)
values = Enum.map(paths, fn path -> {path, get_in(data, path)} end)

# Group paths by depth
grouped = Enum.group_by(paths, &length/1)

# Find top-level occurrences only
top_level = Enum.filter(paths, fn path -> length(path) == 1 end)
```

### Performance considerations

- The function performs a full traversal of the data structure
- For very large datasets, consider implementing pagination or streaming
- Results are returned in reverse order of discovery (deepest first)
- String matching is case-sensitive for performance reasons
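
### Resolving mixed paths

The note on `get_in/2` in "Understanding path results" leaves mixed paths (string keys plus integer indices) to manual traversal. The sketch below shows one way to do that with a single reduce. `PathWalker` and `fetch_at/2` are hypothetical names chosen for illustration; they are not part of BubbleEx, and the code assumes paths contain only string map keys and non-negative list indices, as described above.

```elixir
defmodule PathWalker do
  @moduledoc """
  Hypothetical helper (not part of BubbleEx): resolves a path made of
  string map keys and integer list indices against nested data.
  """

  # Returns {:ok, value} when every segment resolves, :error otherwise.
  def fetch_at(data, path) when is_list(path) do
    Enum.reduce_while(path, {:ok, data}, fn
      # String segment: the current node must be a map that has the key.
      key, {:ok, %{} = map} when is_binary(key) ->
        case Map.fetch(map, key) do
          {:ok, value} -> {:cont, {:ok, value}}
          :error -> {:halt, :error}
        end

      # Integer segment: the current node must be a list with that index.
      index, {:ok, list} when is_integer(index) and index >= 0 and is_list(list) ->
        case Enum.fetch(list, index) do
          {:ok, value} -> {:cont, {:ok, value}}
          :error -> {:halt, :error}
        end

      # Any other combination means the path does not fit the data.
      _segment, _acc ->
        {:halt, :error}
    end)
  end
end

data = %{"items" => ["first", "second", "third"]}

PathWalker.fetch_at(data, ["items", 1])
# => {:ok, "second"}

PathWalker.fetch_at(data, ["items", 9])
# => :error
```

Returning `{:ok, value} | :error` instead of a bare value distinguishes a stored `nil` from a path that does not resolve. If raising on a mismatch is acceptable, the standard library's `Access.at/1` is an alternative: `get_in(data, ["items", Access.at(1)])` also returns `"second"`.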

Tags: Bubble.io, Elixir, StruQ, secret scanning, data extraction, reverse engineering