Arizonal8/Keyword-Search-Regular-Expressions
# Week 5: Keyword Search and Regular Expressions
## Overview
This lab explores keyword searching across a forensic disk image using EnCase.
Both RAW (direct disk) and Indexed searching strategies are applied and
compared. The investigation searches for the string
`55-501165 Digital Forensics` across multiple encoding formats, deleted data,
system files, and embedded image metadata.
## Why This Is Necessary
Keyword searching is one of the fastest ways to locate relevant evidence on a
large drive. Investigators use it to find incriminating communications,
financial data, credentials, and other targeted content. Understanding the
difference between RAW and Indexed search methods — and knowing where each
fails — is essential for a thorough investigation.
## Tools Used
- **EnCase Forensic** — keyword and index search engine
- **Regular expression engine** — for pattern-based searching
- **ANSI and Unicode search support** — to cover multiple text encodings
## Key Concepts Covered
- Difference between an **item** (file containing a hit) and a **hit** (each
individual occurrence of the keyword)
- Searching across multiple encodings: ANSI, UTF-8, UTF-16 LE/BE
- Finding keywords in deleted files, system files ($MFT), and image EXIF data
- RAW search: scans disk directly; must be repeated for every new keyword
- Indexed search: builds a searchable database; instant subsequent searches;
does not search raw binary content of media files directly
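
To illustrate why the encoding choice matters, the following minimal sketch prints the byte sequence a raw search has to look for under each encoding. It uses Python's standard codecs and is not EnCase's own implementation; the dictionary labels and the helper name `keyword_patterns` are illustrative assumptions.

```python
KEYWORD = "55-501165 Digital Forensics"

# Display label -> Python codec name (labels chosen for this sketch only).
ENCODINGS = {
    "ANSI (Latin-1)": "latin-1",
    "UTF-8": "utf-8",
    "UTF-16 LE": "utf-16-le",
    "UTF-16 BE": "utf-16-be",
}


def keyword_patterns(keyword):
    """Return the raw byte pattern of the keyword under each encoding."""
    return {label: keyword.encode(codec) for label, codec in ENCODINGS.items()}


patterns = keyword_patterns(KEYWORD)
for label, raw in patterns.items():
    # Show the pattern length and its first eight bytes in hex.
    print(f"{label:15} {len(raw):3d} bytes  {raw[:8].hex()}")
```

Because the keyword contains only ASCII characters, the ANSI and UTF-8 patterns are byte-identical, and a UTF-8 BOM only adds a prefix at the start of the file. The two UTF-16 variants interleave a zero byte with every character, in opposite orders, so each needs its own pattern.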
## Investigation Findings
### Keyword Searched
`55-501165 Digital Forensics`
- Encodings enabled: ANSI Latin-1, Unicode
- Case sensitive: No
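
A pattern-based variant of this search can be sketched with a regular expression. The example below uses Python's `re` syntax, which may differ from EnCase's GREP syntax, and the pattern itself is a hypothetical generalisation (any two-digit, six-digit code shaped like the one in the keyword, followed by the phrase), not the expression used in the lab.

```python
import re

# Hypothetical pattern: "NN-NNNNNN" followed by "Digital Forensics",
# case-insensitive, tolerating variable whitespace between words.
PATTERN = re.compile(rb"\d{2}-\d{6}\s+digital\s+forensics", re.IGNORECASE)


def find_matches(data):
    """Return (byte offset, matched text) for every match in single-byte text."""
    return [(m.start(), m.group().decode("ascii")) for m in PATTERN.finditer(data)]


# Made-up sample buffer, not taken from the lab image.
sample = b"Header 55-501165 Digital Forensics ... 12-345678 DIGITAL  forensics"
matches = find_matches(sample)
for offset, text in matches:
    print(offset, repr(text))
```

This sketch only covers single-byte encodings; UTF-16 data would need a pattern that accounts for the interleaved zero bytes.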
### Results
- **Items found:** 14
- **Hits found:** 21
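
The relationship between the two counts can be sketched as follows: an item is counted once per file that contains the keyword, while a hit is counted once per occurrence. The file names and contents here are invented for illustration and are unrelated to the lab image; `count_items_and_hits` is a hypothetical helper.

```python
def count_items_and_hits(files, keyword):
    """Return (items, hits) for a case-insensitive search over byte buffers."""
    needle = keyword.lower().encode("latin-1")
    hits_per_file = {}
    for name, data in files.items():
        occurrences = data.lower().count(needle)  # non-overlapping occurrences
        if occurrences:
            hits_per_file[name] = occurrences
    return len(hits_per_file), sum(hits_per_file.values())


# Made-up sample data: one file with one hit, one with two, one with none.
sample_files = {
    "a.txt": b"55-501165 Digital Forensics",
    "b.txt": b"x 55-501165 digital forensics y 55-501165 DIGITAL FORENSICS",
    "c.txt": b"nothing relevant here",
}
items, hits = count_items_and_hits(sample_files, "55-501165 Digital Forensics")
print(f"items={items} hits={hits}")
```

A file with several occurrences adds one item but several hits, which is why the hit count can exceed the item count.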
### File Types Containing the Keyword
| File | Location | Type |
|------|----------|------|
| df-ansi.txt | D:\Windows\Keywords\ | ANSI text |
| df-utf-8.txt | D:\Windows\Keywords\ | UTF-8 text |
| df-utf8-bom.txt | D:\Windows\Keywords\ | UTF-8 with BOM |
| df-utf-16-le.txt | D:\Windows\Keywords\ | UTF-16 Little Endian |
| df-utf-16-be.txt | D:\Windows\Keywords\ | UTF-16 Big Endian |
| df-deleted.txt | D:\$Recycle.Bin\ | Deleted file (recovered) |
| df-erased.txt | Unallocated space | Erased but recoverable |
| CatwithEXIF.jpg | D:\Pictures\ | EXIF metadata |
| WinterHex.jpg | D:\Pictures\ | Embedded hex content |
| $MFT (multiple) | System | Master File Table entries |
### RAW vs Indexed Search Comparison
| Capability | RAW Search | Indexed Search |
|------------|-----------|----------------|
| New keyword speed | Slow (full disk re-scan) | Instant |
| Searches deleted data | Yes | Limited |
| Searches $MFT | Yes | No |
| Searches image EXIF | Yes | Yes (metadata) |
| Searches inside PDFs/DOCX | No (content is compressed or encoded) | Yes |
| Searches raw image bytes | Yes | No |
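
The trade-off in the table can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not a description of how EnCase builds or queries its index: the raw search rescans every byte for each keyword, while the index is built once from extracted text and only sees what was tokenised. All function names and sample data are hypothetical.

```python
import re
from collections import defaultdict


def raw_search(blobs, keyword):
    """Scan every byte of every blob; the cost is paid again per keyword."""
    needle = keyword.lower().encode("latin-1")
    return sorted(name for name, data in blobs.items() if needle in data.lower())


def build_index(documents):
    """Tokenise extracted text once; later lookups are dictionary reads."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(name)
    return index


def indexed_search(index, keyword):
    """Return documents containing every token of the keyword."""
    tokens = re.findall(r"[a-z0-9]+", keyword.lower())
    sets = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*sets)) if sets else []


# Made-up data: "slack.bin" stands in for bytes that were never text-extracted.
blobs = {
    "note.txt": b"55-501165 Digital Forensics",
    "slack.bin": b"\x00\x00" + b"55-501165 digital forensics" + b"\xff",
}
documents = {"note.txt": "55-501165 Digital Forensics"}

raw_result = raw_search(blobs, "55-501165 Digital Forensics")
indexed_result = indexed_search(build_index(documents), "55-501165 Digital Forensics")
print(raw_result, indexed_result)
```

In this model the raw scan finds the keyword in both buffers, whereas the index only returns the file whose text was extracted and tokenised, mirroring the coverage gap described in the table.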
## Screenshots
### Search Configuration

*EnCase keyword search configuration showing ANSI and Unicode encodings
selected for the search term '55-501165 Digital Forensics'.*
### Search Results Summary

*Keyword Hits panel showing 14 items and 21 hits, demonstrating the
distinction between files containing hits and individual hit occurrences.*
### Results Analysis

*Full results panel showing keyword hits across text files, image EXIF
metadata, deleted files, and system structures including the $MFT.*
### RAW vs Indexed Comparison

*Comparison of RAW and Indexed search results highlighting the differences
in coverage and performance between the two search strategies.*
## Learning Outcomes
- Configure and execute a keyword search across multiple text encodings
- Differentiate between items and hits in search results
- Understand why both RAW and Indexed searches are needed in thorough
investigations
- Locate keywords in deleted files, EXIF metadata, and system structures
- Know the limitations of each search method to avoid missing evidence