Before we get into my contributions, here is a link to the paper that my mentor/adviser wrote on the original iteration of this project.
Intro / Goals
The VISION+ project uses a large language model to pull out important concepts from patent text and builds interactive network visualizations so you can see how patents relate through 4 key lenses: components, behaviors, engineering principles, and functions. The version I worked on (VISION+AI) takes the existing pipeline and ports it to run on Ollama with Gemma locally. The goal was to keep the same analytical “lenses” and visualizations while moving the brain of the system to a local-only model. It also adds the option to point the script at a remote Ollama instance (e.g., over Tailscale) when I want to run analysis from a laptop against a stronger machine. This write-up summarizes what the tool does and the changes I made to get there.
What VISION Does: Four Lenses and a Graph
The system treats each patent as a bag of concepts and then connects patents to each other through the four analytical “lenses”:
| Lens | Focus | Example terms |
|---|---|---|
| Component | Physical parts, tangible objects | gear, cable, sensor, motor, housing, spring |
| Behavior | Properties and characteristics | elastic, conductive, thermal, rigid, flexible |
| Engineering principle | Core technical concepts | efficiency, stability, resonance, friction, leverage |
| Function | Actions and operations | rotate, connect, transmit, filter, control |
A local LLM (Gemma, via Ollama) reads patent text and, for each lens, returns a short list of terms plus importance weights (0.0–1.0). Those are stored in SQLite; a separate step builds force-directed graphs (D3.js) that link keywords to patents. So you get four interactive visualizations—one per lens—with filtering by keyword and CPC-style group, search, zoom, and double-click to highlight neighbors.
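The shape of that per-lens extraction can be pictured with a small sketch. The dict below is purely illustrative: the real data lives in the SQLite tables, and the terms and weights here are made-up examples, not project output.

```python
# Illustrative shape of one patent's extracted concepts, one list per lens.
# Real storage is the SQLite patent_keywords table; these names and weights
# are made-up examples, not actual output.
patent_concepts = {
    "component": [("gear", 0.9), ("sensor", 0.8), ("housing", 0.6)],
    "behavior": [("elastic", 0.85), ("conductive", 0.7)],
    "engineering_principle": [("resonance", 0.75), ("friction", 0.6)],
    "function": [("rotate", 0.9), ("filter", 0.65)],
}

# Every importance weight should fall in [0.0, 1.0], as the prompts require.
assert all(0.0 <= w <= 1.0
           for terms in patent_concepts.values() for _, w in terms)
```

Each lens's list then becomes keyword–patent edges in the force-directed graph for that lens.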
Changes Overview
I took on VISION when the analysis pipeline was still driven by non-negative matrix factorization (NMF), with no LLM or other neural/AI component. All of the AI-based analysis in this project is work I added: the move from NMF to LLM-powered keyword extraction, the Claude-based version (VISION+CLAUDE), and the local Ollama/Gemma version (VISION+AI), including the prompts, response parsing, and integration with the existing database and visualizations. The four-lens framework and the visualization stack were already part of VISION; the shift to using an LLM to populate those lenses and the design of that pipeline are mine.
From Claude to Gemma
Replacing the LLM backend
- New class: `OllamaPatentAnalyzer`
  - Talks to Ollama’s `/api/generate` endpoint (prompt plus model name, with `stream: false` for a single response).
  - On startup, hits `/api/tags` to check that the chosen model (e.g. `gemma3:12b`) is present, and prints clear errors if the server is down or the model is missing.
  - Uses a 6,000-character window of patent text per request (smaller than the 8,000 used for Claude) to stay within local model context limits.
- Prompts tuned for Gemma
  Gemma tends to do better with very explicit, task-style instructions. I rewrote the per-lens prompts into a strict “Task / Instructions / Example output” format and added a hard constraint that terms must be one word to keep parsing reliable. Each prompt spells out the expected line format: `term score`, with the score in [0.0, 1.0].
- Robust model output parsing
  Local models don’t always stick to the format. I added `parse_llm_response` to proactively skip lines that look like task text, separators, or bullet/numbered items. Duplicates are removed, and the list is capped at 15 terms per lens so the rest of the pipeline stays consistent.
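A hedged sketch of that cleanup step, as a re-creation of the behavior described above rather than the project’s actual `parse_llm_response` (the noise heuristics and regexes here are illustrative):

```python
import re

def parse_llm_response(text, max_terms=15):
    """Pull 'term score' pairs out of a model reply, skipping noise lines.
    Illustrative re-creation, not the project's exact implementation."""
    terms, seen = [], set()
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines and separator runs like "---" or "===".
        if not line or set(line) <= {"-", "=", "*"}:
            continue
        # Skip lines that echo the task text back.
        if line.lower().startswith(("task", "instruction", "example")):
            continue
        # Strip bullet/number prefixes like "- ", "1. ", "* ".
        line = re.sub(r"^(\d+[.)]|[-*•])\s*", "", line)
        # Expect exactly "term score": one word, score in [0.0, 1.0].
        m = re.match(r"^([A-Za-z][A-Za-z-]*)\s+([01](?:\.\d+)?)$", line)
        if not m:
            continue
        term, score = m.group(1).lower(), float(m.group(2))
        if term in seen or not 0.0 <= score <= 1.0:
            continue  # drop duplicates and out-of-range weights
        seen.add(term)
        terms.append((term, score))
        if len(terms) >= max_terms:  # cap per lens
            break
    return terms
```

For example, a reply like `"Task: parts\n---\n1. gear 0.9\ngear 0.9\nmotor 0.8"` would come back as `[("gear", 0.9), ("motor", 0.8)]`: the echoed task line and separator are skipped and the duplicate is dropped.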
Remote Ollama and Tailscale
- I didn’t want to be tied to running the script only on the machine that runs Ollama, so the script accepts `--ollama-url` and a convenience flag, `--tailscale-ip`. If you pass `--tailscale-ip 100.x.x.x`, it sets the Ollama base URL to `http://100.x.x.x:11434`. That way I can run the analysis on a laptop (or other thin client) while Ollama and the heavy model sit on a desktop or server on the same Tailscale network. The README and in-code comments mention this as an option.
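The flag-to-URL wiring is small; a sketch of how it can look (the flag names come from the CLI described here, but the helper and the example IP are illustrative):

```python
import argparse

def resolve_ollama_url(args):
    """Pick the Ollama base URL: --tailscale-ip wins over --ollama-url,
    which in turn defaults to the local server. Illustrative helper."""
    if args.tailscale_ip:
        return f"http://{args.tailscale_ip}:11434"  # Ollama's default port
    return args.ollama_url

parser = argparse.ArgumentParser()
parser.add_argument("--ollama-url", default="http://localhost:11434")
parser.add_argument("--tailscale-ip", help="Tailscale IP of the Ollama host")

args = parser.parse_args(["--tailscale-ip", "100.64.0.7"])
print(resolve_ollama_url(args))  # http://100.64.0.7:11434
```

With no flags, the analyzer falls back to `http://localhost:11434`, so local runs need no extra configuration.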
CLI and docs
- The argument parser already supported `--model`, `--batch-size`, `--visualize`, `--lens`, `--top-keywords`, `--top-patents`, etc. I added `--ollama-url` and `--tailscale-ip` and wired them through so the analyzer uses the chosen base URL.
- I wrote the VISION+AI README (in `VISION+AI/README`): environment setup (venv, pip), Ollama install and `ollama pull gemma2:9b` (or `gemma3:12b`), patent file layout (`patents/*_all.txt`), the three-phase workflow (import → analyze → visualize), a full CLI reference, design notes (local-only data, modular components, incremental processing), usage examples, and troubleshooting (connection, memory, speed, visualization size). The root `README.md` was updated to point to the current version (VISION+AI) and to the older variants (MatLab, LDA, CLAUDE) in OldVersions.
Tech Snapshot
| Layer | Choice |
|---|---|
| Language / runtime | Python 3 (venv), argparse for CLI |
| Database | SQLite (patents, patent_keywords, visualizations) |
| LLM | Ollama + Gemma (default gemma3:12b), configurable URL |
| Visualization | D3.js (force-directed graph), jQuery UI (search/filters), HTML/JSON output |
| Data flow | patents/*_all.txt → import → analyze (Ollama) → visualize → output/*.html + *.json |
Design Choices
- Local-first
All patent text and extracted keywords stay on your machine (or your Tailscale network). No cloud API and no API keys for the main analysis path. The trade-off is slower runtime, but in exchange developers and users keep full control over their data and environment.
- Modular layout
Database, importer, analyzer, and visualization are separate classes. Swapping Claude for Ollama only touched the analyzer and the CLI that constructs it; the rest of the pipeline stayed the same.
- Incremental runs
The DB tracks which patents are already analyzed. You can add new *_all.txt files, re-run import and analyze, and only new patents get sent to the LLM; then you regenerate visualizations as needed.
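A sketch of that incremental check, assuming a simple `analyzed` marker on the patents table (the column name and schema here are hypothetical; the project’s real schema may track this differently):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema for tracking analysis status.
conn.execute("CREATE TABLE patents (id TEXT PRIMARY KEY, analyzed INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO patents (id, analyzed) VALUES (?, ?)",
                 [("US111", 1), ("US222", 0), ("US333", 0)])

# Only patents not yet analyzed get sent to the LLM on the next run.
pending = [row[0] for row in
           conn.execute("SELECT id FROM patents WHERE analyzed = 0 ORDER BY id")]
print(pending)  # ['US222', 'US333']

# After a successful analysis pass, mark them done so re-runs skip them.
conn.executemany("UPDATE patents SET analyzed = 1 WHERE id = ?",
                 [(p,) for p in pending])
```

This is what makes re-running `--analyze` after dropping in new `*_all.txt` files cheap: already-processed patents are filtered out before any LLM call.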
Workflow / Outcome
In practice I run: (1) `--import` once per collection, (2) `--analyze` with a batch size that fits my RAM (and optionally `--tailscale-ip` when I’m not on the Ollama host), then (3) `--visualize` to refresh the four lens graphs. The interactive HTML lets me filter by keyword and CPC-style group, search for a patent or term, and double-click a node to see its neighborhood; shift-click still opens the patent on Google Patents. The outcome is a local, privacy-preserving version of the same high-level workflow, with the flexibility to run the script anywhere and point it at a remote Ollama instance when that’s useful.
Miscellaneous / Addendum
- Repo layout
The current version lives in VISION+AI (Ollama/Gemma). OldVersions holds the original MatLab app, VISION+ (LDA), and VISION+CLAUDE (Anthropic). The root README summarizes this.
- Attribution
The four-lens framework, database schema, and D3-based visualization design come from the existing VISION project. My contributions are the Ollama/Gemma backend, the Tailscale-friendly remote URL option, the stricter prompts and parsing for local models, and the VISION+AI README and root README update.