Entity Extraction
How PlotLens reads your manuscripts and turns them into a story bible — characters, locations, events, items, organizations, and concepts.
Entity extraction reads every document you upload and builds your story bible automatically — every character, place, and beat, with the source text behind each one.
When to use this
You don’t run extraction on purpose. It runs as soon as you upload a document. Read this page when you want to understand what PlotLens found, fix something it got wrong, or decide whether to re-run it on a manuscript you’ve revised.
If a character is missing, a place is duplicated, or a name is mis-tagged, the sections below show how to fix it.
What PlotLens extracts
PlotLens recognizes six entity types. Every extracted record is one of these:
- Characters — people, beings, or anything with agency. Captures traits, role (protagonist, antagonist, supporting, minor, mentioned), and status (alive, deceased, unknown, transformed, missing). Example: Eilan, “the Stranger from Chapter 3.”
- Locations — physical places. Captures location type (continent, country, region, city, town, village, building, room, landmark, natural feature) and parent location. Example: Highmoor Keep (building) inside the Northern Reach (region).
- Events — significant happenings: battles, ceremonies, journeys, deaths, discoveries, plot beats. Captures participants, location, outcome, and timestamp. Example: the Siege of Caer Tarn.
- Items — weapons, artifacts, and objects with narrative weight. Example: the Splintered Crown.
- Organizations — kingdoms, factions, agencies, companies, families treated as a collective. Examples: House Vorel, the Merchant Guild.
- Concepts — magic systems, prophecies, laws, lore. Examples: the Threefold Binding, the Treaty of Ashford.
Anything outside these six types is ignored. If you want to track something that doesn’t fit (a recurring metaphor, a stylistic note), use a canon rule instead.
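Because the set of types is closed, it maps naturally onto an enumeration. The sketch below is a hypothetical data model, not PlotLens's actual schema: `EntityType`, `Role`, `Status`, `Entity`, and `parse_type` are illustrative names. It encodes the character roles and statuses listed above and the rule that any label outside the six types is dropped.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class EntityType(Enum):
    CHARACTER = "character"
    LOCATION = "location"
    EVENT = "event"
    ITEM = "item"
    ORGANIZATION = "organization"
    CONCEPT = "concept"


class Role(Enum):  # character roles listed above
    PROTAGONIST = "protagonist"
    ANTAGONIST = "antagonist"
    SUPPORTING = "supporting"
    MINOR = "minor"
    MENTIONED = "mentioned"


class Status(Enum):  # character statuses listed above
    ALIVE = "alive"
    DECEASED = "deceased"
    UNKNOWN = "unknown"
    TRANSFORMED = "transformed"
    MISSING = "missing"


@dataclass
class Entity:
    name: str
    type: EntityType
    # Type-specific attributes, e.g. {"location_type": "building", "parent": "Northern Reach"}
    attributes: Dict[str, Any] = field(default_factory=dict)


def parse_type(label: str) -> Optional[EntityType]:
    """Map a raw label to one of the six types; anything else is ignored (None)."""
    try:
        return EntityType(label.strip().lower())
    except ValueError:
        return None
```

Here `parse_type("Location")` returns `EntityType.LOCATION`, while `parse_type("metaphor")` returns `None`: the case the page says to handle with a canon rule.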
How extraction works
Extraction runs as a layered pipeline, not a single AI call:
- Rule-based pass — gazetteers and patterns catch known proper nouns and structured names quickly and cheaply.
- Named entity recognition — a spaCy NER model adds high-precision matches for people, places, and organizations.
- LLM pass — for everything the first two layers miss (and for narrative-aware judgments like “is this an event?”), an LLM reads each chunk in context.
After extraction, PlotLens deduplicates results, merges obvious aliases (Jon → Jonathan), and stores the supporting text spans (called attestations) so every entity links back to where it came from.
The whole pipeline runs in the background. You’ll see progress on the document while it’s working; entities appear in the entity list as they’re confirmed.
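The layering idea (cheap passes first, expensive passes only for what is left, then alias merging and attestation tracking) can be sketched as below. This is a simplified illustration under stated assumptions, not PlotLens's code: `gazetteer_layer`, `run_pipeline`, and the `Layer` signature are hypothetical, only the rule-based layer is implemented, and deduplication here only catches identical spans. The comment shows how an NER layer could wrap spaCy's real `Doc.ents` API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int]
Hit = Tuple[str, str, int, int]  # (surface text, label, start, end)
# A layer sees the chunk plus the spans earlier layers already claimed.
Layer = Callable[[str, Set[Span]], List[Hit]]


@dataclass
class Entity:
    name: str
    label: str
    attestations: List[Span] = field(default_factory=list)  # where it appears


def gazetteer_layer(known: Dict[str, str]) -> Layer:
    """Rule-based pass: whole-word matches against a list of known names."""
    def run(chunk: str, claimed: Set[Span]) -> List[Hit]:
        hits = []
        for name, label in known.items():
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", chunk):
                hits.append((name, label, m.start(), m.end()))
        return hits
    return run


# An NER layer could wrap spaCy, e.g.
#   [(e.text, e.label_, e.start_char, e.end_char) for e in nlp(chunk).ents]
# and an LLM layer would only be asked about text outside `claimed`.


def run_pipeline(chunk: str, layers: List[Layer], aliases: Dict[str, str]) -> Dict[str, Entity]:
    claimed: Set[Span] = set()
    entities: Dict[str, Entity] = {}
    for layer in layers:  # cheapest layer first, most expensive last
        for text, label, start, end in layer(chunk, claimed):
            if (start, end) in claimed:  # identical span already found upstream
                continue
            claimed.add((start, end))
            canonical = aliases.get(text, text)  # merge obvious aliases: Jon -> Jonathan
            entity = entities.setdefault(canonical, Entity(canonical, label))
            entity.attestations.append((start, end))
    return entities
```

Running it on "Jonathan rode to Highmoor Keep. Jon did not look back." with an alias map of `{"Jon": "Jonathan"}` yields one Jonathan record with two attestations and one Highmoor Keep record.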
Confidence scores and what they mean
Every extracted entity gets a confidence score from 0.0 to 1.0, which falls into one of four bands:
- 0.9–1.0 — Explicitly named with detailed context. Trust these by default.
- 0.7–0.9 — Named with surrounding context. Usually correct; worth a glance.
- 0.5–0.7 — Implied or only briefly mentioned. Review these before relying on them.
- Below 0.5 — Speculative. PlotLens flags these for the verification queue.
The confidence badge on each entity is color-coded so you can scan a list and find the ones that need your attention.
A second signal, diegetic status, tells you how the story itself presents a claim: as fact, rumor, lie, dream, retcon, red herring, unreliable POV, or speculation. It is independent of extraction confidence. A rumor can be extracted with very high confidence and still be a rumor.
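The two signals are orthogonal fields, which a small sketch makes concrete. The band names, `confidence_band`, `Claim`, and `Diegetic` are hypothetical, and the page does not say which band a score of exactly 0.9 or 0.7 belongs to; this sketch assumes each band's lower bound is inclusive.

```python
from dataclasses import dataclass
from enum import Enum


class Diegetic(Enum):
    FACT = "fact"
    RUMOR = "rumor"
    LIE = "lie"
    DREAM = "dream"
    RETCON = "retcon"
    RED_HERRING = "red herring"
    UNRELIABLE_POV = "unreliable POV"
    SPECULATION = "speculation"


def confidence_band(score: float) -> str:
    """Bucket an extraction score; lower bounds are treated as inclusive (an assumption)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {score}")
    if score >= 0.9:
        return "trust"
    if score >= 0.7:
        return "glance"
    if score >= 0.5:
        return "review"
    return "verification queue"


@dataclass
class Claim:
    text: str
    confidence: float   # how sure extraction is that the text says this
    diegetic: Diegetic  # how the story itself frames it


# High-confidence extraction of something the story presents as a rumor.
rumor = Claim("Eilan died in the Siege of Caer Tarn", 0.96, Diegetic.RUMOR)
```

`confidence_band(rumor.confidence)` lands in the top band even though `rumor.diegetic` is `RUMOR`: neither field tells you anything about the other.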
How to review and edit entities
- Open your project and click Entities in the sidebar.
- Filter by type (Characters, Locations, etc.) or by review status to find what you want.
- Click an entity to open its detail drawer.
- Edit the name, type, description, aliases, or type-specific attributes inline.
- Use the Attestations tab to see the source text behind every claim.
- Save. Your edits override the extracted values and persist through future re-runs.
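One way edits can survive re-runs is to store them apart from extracted values and layer them on top when reading. The sketch below assumes that design; `EntityRecord` and its methods are hypothetical and do not describe how PlotLens stores records internally.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class EntityRecord:
    extracted: Dict[str, Any] = field(default_factory=dict)  # rewritten on every re-run
    overrides: Dict[str, Any] = field(default_factory=dict)  # written only by the author

    def edit(self, key: str, value: Any) -> None:
        self.overrides[key] = value

    def re_extract(self, fresh: Dict[str, Any]) -> None:
        # A re-run replaces extracted values but never touches overrides.
        self.extracted = dict(fresh)

    def view(self) -> Dict[str, Any]:
        # Manual edits win field by field; everything else comes from extraction.
        return {**self.extracted, **self.overrides}
```

If the author sets Eilan's status to "deceased" and a later re-run extracts "alive" plus a new role, the view keeps "deceased" and picks up the new role.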
You can also create entities by hand from the entity list if PlotLens missed one. Manually created entities are labeled as such but otherwise behave the same way as extracted ones.
Merge duplicates
When PlotLens finds two entities that are likely the same — say “Captain Vell” and “Vell” — it surfaces a merge suggestion.
- Open the entity list and look for the conflict indicator on duplicate rows.
- Click Review merge on the suggestion.
- Compare the two records side-by-side. The one with stronger attestations is proposed as the survivor.
- Click Confirm merge to combine them, or Keep separate to dismiss.
Merges are reversible. Open the merged entity, find the History tab, and click Undo merge to split them back apart.
PlotLens auto-merges duplicate pairs whose match confidence is 0.9 or higher. Pairs below that threshold come to you as merge suggestions.
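The routing rule and the survivor choice can be sketched as follows. `Candidate`, `route_pair`, and `merge` are hypothetical names, and "stronger attestations" is approximated here as "more attestations", which is an assumption; the page does not define the comparison.

```python
from dataclasses import dataclass
from typing import List, Tuple

AUTO_MERGE_THRESHOLD = 0.9  # from the paragraph above


@dataclass
class Candidate:
    name: str
    attestations: List[str]


def route_pair(a: Candidate, b: Candidate, match_confidence: float) -> Tuple[str, Candidate]:
    """Decide between auto-merge and a suggestion, and propose a survivor."""
    # Assumption: the record with more attestations is the stronger one.
    survivor = a if len(a.attestations) >= len(b.attestations) else b
    action = "auto_merge" if match_confidence >= AUTO_MERGE_THRESHOLD else "suggest"
    return action, survivor


def merge(survivor: Candidate, other: Candidate) -> Candidate:
    # Keeps attestations from both records. Returns a new record and leaves
    # both inputs untouched, so undoing the merge means restoring the originals.
    return Candidate(survivor.name, survivor.attestations + other.attestations)
```

With "Captain Vell" attested three times and "Vell" once, a 0.93 match auto-merges into Captain Vell, while a 0.8 match becomes a suggestion.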
Re-run extraction
Re-extract a document when you've revised it heavily, or when you've changed your project settings and want fresh results.
- Open the document detail page.
- Click Re-extract entities.
- Confirm. Existing mentions and embeddings for that document are cleared and rebuilt.
Re-running extraction does not delete your manual edits to entity records. It rebuilds the source links underneath them.
If extraction fails (rate limit, provider outage, malformed document), PlotLens retries automatically up to five times. If every retry fails, the document is marked Extraction failed with an error message. Click Re-extract to try again.
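A retry loop of this kind can be sketched as below. The page only states the retry count; the exponential backoff, the delay values, and the names `with_retries` and `ExtractionFailed` are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ExtractionFailed(Exception):
    """Raised once every retry is exhausted (the 'Extraction failed' state)."""


def with_retries(job: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run job once, then retry up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return job()
        except Exception as exc:  # rate limit, provider outage, malformed document, ...
            if attempt == max_retries:
                raise ExtractionFailed(f"gave up after {max_retries} retries: {exc}") from exc
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, 8s, 16s by default
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting. Note that retrying cannot fix a malformed document; a real system would likely retry only transient errors.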
Plan availability
Entity extraction is available on every plan. What changes is the cap on how many entities one project can hold:
| Plan | Entities per project |
|---|---|
| Free | 50 |
| Lite | 300 |
| Plus | 1,500 |
| Pro | Unlimited |
| Small Team | Unlimited |
| Studio | Unlimited |
| Production | Unlimited |
| Enterprise | Unlimited |
Hit the cap and new extractions stop saving entities until you delete some, merge duplicates, or upgrade. PlotLens warns you when you’re close.
Extraction doesn't consume validations or count against any monthly quota. The entity cap is the only plan limit that applies to it.
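The cap check amounts to a small admission rule. The caps below come from the plan table; the warning threshold (90% of the cap) and the function name `admit` are hypothetical, since the page only says PlotLens warns you "when you're close".

```python
from typing import Dict, Optional, Tuple

# Caps from the plan table; None means unlimited.
ENTITY_CAPS: Dict[str, Optional[int]] = {
    "Free": 50, "Lite": 300, "Plus": 1500,
    "Pro": None, "Small Team": None, "Studio": None, "Production": None, "Enterprise": None,
}

WARN_RATIO = 0.9  # hypothetical: how close to the cap counts as "close"


def admit(plan: str, current_count: int, new_entities: int) -> Tuple[int, bool]:
    """Return (how many new entities can be saved, whether to warn the user)."""
    cap = ENTITY_CAPS[plan]
    if cap is None:
        return new_entities, False
    room = max(cap - current_count, 0)
    accepted = min(new_entities, room)  # anything past the cap is not saved
    warn = current_count + accepted >= WARN_RATIO * cap
    return accepted, warn
```

A Free project holding 48 entities can save only 2 of 5 new ones, and the user is warned; on Pro and above every new entity is saved.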
Limits & edge cases
- Extraction failed — The document is marked failed after five automatic retries. Open it and click Re-extract to try again.
- Low-confidence backlog — Entities below 0.5 land in the verification queue. Confirm or delete them in batches from the entity list.
- Coreference gaps — If pronoun resolution fails on a chunk, mentions are still extracted but won’t be grouped into a single character. You can merge them manually.
- Hallucinated entities — A blocklist filters out junk like bare articles (“The”), stop words, and acronyms mis-tagged as characters. If you still see one, delete it; it won’t come back on re-extraction.
- Hitting the entity cap — New extractions stop adding entities to the project. Merge duplicates, delete unused records, or upgrade your plan.
- Fair-use throttling — Extremely high extraction volume can trigger fair-use limits and a 429 (Too Many Requests) response. If you hit this during legitimate work, wait before retrying or contact support.
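The blocklist described under "Hallucinated entities" can be sketched as a simple filter. The word lists, the acronym pattern, and the names `is_junk` and `filter_candidates` are illustrative assumptions, not PlotLens's actual blocklist; the `user_deleted` set models the rule that an entity you delete does not return on re-extraction.

```python
import re
from typing import List, Set, Tuple

ARTICLES = {"the", "a", "an"}
STOP_WORDS = {"he", "she", "it", "they", "this", "that", "and", "but", "or"}
ACRONYM = re.compile(r"[A-Z]{2,5}")  # short all-caps token


def is_junk(name: str, entity_type: str) -> bool:
    token = name.strip()
    lowered = token.lower()
    if lowered in ARTICLES or lowered in STOP_WORDS:
        return True
    # A short all-caps token tagged as a character is usually a mis-tagged acronym.
    return entity_type == "character" and ACRONYM.fullmatch(token) is not None


def filter_candidates(candidates: List[Tuple[str, str]], user_deleted: Set[str]) -> List[Tuple[str, str]]:
    """Drop blocklisted junk and anything the author has already deleted."""
    return [(n, t) for n, t in candidates
            if not is_junk(n, t) and n.lower() not in user_deleted]
```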
Common pitfalls
- Reviewing every entity by hand. Don’t. Filter to the under-0.7 confidence band and start there. The 0.9+ entities almost never need editing.
- Deleting duplicates instead of merging them. Merging preserves attestations from both records. Deleting throws them away.
- Editing before re-extracting. Manual edits survive re-extraction, but mentions are rebuilt, so attestations may shift. Re-extract first, then edit.
- Treating diegetic status as a confidence score. A rumor isn’t a low-confidence extraction; it’s a high-confidence extraction of a rumor. Use the right signal.
- Expecting anything outside the six types to be extracted. PlotLens only tracks characters, locations, events, items, organizations, and concepts. Stylistic notes and reader-facing metadata belong in canon rules.