About four months ago, I discovered Thu Vu’s video on extracting knowledge graphs from text using GPT-4. The concept was transformative: watching minimal code convert unstructured meeting transcriptions and podcast content into visual, structured knowledge networks made complex information instantly clearer.
Think of knowledge graphs as a way to represent information the way your brain naturally organizes it, through connections and relationships. Instead of reading through pages of text to understand how concepts relate, you see a visual map where everything is connected.
“from seeing the world as a machine to understanding it as a network”.
This principle applies perfectly to knowledge management. Understanding information isn’t about memorizing facts; it’s about seeing how they connect.
📖 From text to structure: A simple example
Consider this sentence: “Sarah works at TechCorp in San Francisco and reports to Michael, who is the VP of Engineering.”
When you read this, your brain automatically extracts:
- Entities: Sarah (person), TechCorp (company), San Francisco (location), Michael (person)
- Relationships: Sarah WORKS_AT TechCorp, TechCorp LOCATED_IN San Francisco, Sarah REPORTS_TO Michael, Michael HAS_ROLE VP of Engineering
A knowledge graph captures this structure explicitly, making it:
- Searchable: Find all people who work at TechCorp
- Analyzable: Discover reporting hierarchies
- Connectable: Link information across multiple documents
This becomes more valuable as your text corpus grows, because every new document adds connections to entities the graph already knows. What’s overwhelming as unstructured text becomes navigable as a connected graph.
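To make the Sarah example concrete, here is a minimal sketch that stores the four relationships as plain triples and answers the "searchable" and "analyzable" questions from the list above. The `Triple` type and the helper names `sources_of` and `reporting_chain` are made up for this illustration; they are not part of the project's code.

```python
from collections import namedtuple

# A triple is (source, relation, target); the values mirror the example sentence.
Triple = namedtuple("Triple", ["source", "relation", "target"])

triples = [
    Triple("Sarah", "WORKS_AT", "TechCorp"),
    Triple("TechCorp", "LOCATED_IN", "San Francisco"),
    Triple("Sarah", "REPORTS_TO", "Michael"),
    Triple("Michael", "HAS_ROLE", "VP of Engineering"),
]


def sources_of(triples, relation, target):
    """Return every source linked to `target` by `relation`."""
    return [t.source for t in triples if t.relation == relation and t.target == target]


def reporting_chain(triples, person):
    """Follow REPORTS_TO edges upward from `person`, nearest manager first."""
    managers = {t.source: t.target for t in triples if t.relation == "REPORTS_TO"}
    chain, seen = [], {person}
    # `seen` guards against a cycle in badly extracted data
    while person in managers and managers[person] not in seen:
        person = managers[person]
        seen.add(person)
        chain.append(person)
    return chain


print(sources_of(triples, "WORKS_AT", "TechCorp"))  # people who work at TechCorp
print(reporting_chain(triples, "Sarah"))            # Sarah's reporting line
```

A list of triples is enough for a handful of sentences. With a larger corpus you would likely move to a graph library or database, but the queries keep the same shape.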
🎯 Why knowledge graphs solve real problems
Traditional text processing treats documents as isolated blocks of information. Knowledge graphs transform this by making connections first-class citizens:
For meeting notes and transcriptions:
Instead of searching through hours of meeting notes to remember “who mentioned that project deadline,” you can query the graph: “Show me all deadlines mentioned by Sarah in Q1 meetings.” The relationships are already extracted and structured.
For research and learning:
When watching educational content or reading technical documentation, knowledge graphs automatically build a concept map. You can see how ideas connect, which concepts are central, and what you might be missing.
For organizational knowledge:
Companies have information scattered across documents, wikis, and conversations. Knowledge graphs unify this by extracting entities and relationships regardless of where they appear, creating a living map of organizational knowledge.
🎥 Building on proven approaches
Thu Vu’s implementation demonstrates knowledge graph extraction using GPT-4 and LangChain’s LLMGraphTransformer. The video provides valuable context for understanding how high-level abstractions simplify the extraction process through framework integration.
Watching this implementation in action reveals both the power of abstraction layers and the questions they leave unanswered. While the code is concise and effective, understanding what happens beneath those abstractions becomes important when you need to:
- Debug unexpected extraction results
- Optimize prompts for specific domains
- Adapt the system for different LLM providers
- Control costs at scale
This article explores building a similar system from the ground up, focusing on Gemini’s structured output capabilities. The goal isn’t to replace framework-based approaches, but to understand the fundamental mechanisms that make knowledge graph extraction work.
🔄 The extraction challenge
Here’s the fundamental problem: knowledge is typically created and stored as natural language (documents, transcripts, articles), but it’s most useful when structured as a graph.
Traditionally, building knowledge graphs required:
- Manual annotation by domain experts
- Complex rule-based extraction systems
- Expensive natural language processing pipelines
- Continuous maintenance as information changes
Modern LLMs changed this equation. They can read text and extract structured information with surprising accuracy, understanding context, resolving ambiguities, and identifying relationships that traditional NLP systems would miss.
This is what made that initial discovery so compelling: sophisticated knowledge extraction became accessible through relatively straightforward code.
“Data isn’t information. Information, unlike data, is useful. While there’s a gulf between data and information, there’s a wide ocean between information and knowledge. What turns the gears in our brains isn’t information, but ideas, inventions, and inspiration”.
LLMs don’t just generate text; they can impose structure on chaos, turning massive text corpora into navigable knowledge networks.
🛠️ Understanding knowledge graphs through direct implementation
While tools like LangChain’s LLMGraphTransformer make this process accessible, working directly with the underlying mechanisms provides deeper insight into how structured extraction actually functions. This understanding becomes valuable when you need to debug extraction issues, optimize prompt engineering, or adapt the system for specific use cases.
Modern LLM APIs now offer structured output capabilities that make knowledge graph extraction more straightforward than you might expect. Understanding these fundamentals helps you make informed decisions about when to use abstraction layers and when to work directly with the APIs.
❓ Why build a focused implementation?
Working with high-level abstractions is efficient for production systems, but there are specific situations where understanding the underlying mechanisms becomes essential:
💡 It reveals the core patterns
When you strip away multi-provider support and framework overhead, the fundamental logic of knowledge graph extraction becomes clear. You can see exactly how prompt engineering drives the quality of extracted entities and relationships.
🔍 It enables precise control
Direct API access lets you fine-tune every aspect of the extraction process. You can iterate on prompt design, adjust validation rules, and optimize for your specific domain without navigating through abstraction layers.
⚡ It builds transferable knowledge
Understanding how structured output works at the API level prepares you to work with any LLM provider. The principles remain consistent even as tools and frameworks evolve.
🛠️ Core architecture: Four focused layers
📝 Extraction layer: Where structure meets language
Modern LLMs with structured output support can accept schema definitions directly. This eliminates manual JSON parsing and validation:
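import google.generativeai as genai

# KnowledgeGraph is the Pydantic model from the data models section;
# SYSTEM_PROMPT holds the extraction instructions.
class GeminiExtractor:
    def __init__(self, api_key: str, model_name: str):
        # Register the API key with the SDK
        genai.configure(api_key=api_key)
        self.model = genai.GenerativeModel(
            model_name=model_name,
            generation_config={
                # Ask for JSON that conforms to the schema instead of free text
                "response_mime_type": "application/json",
                "response_schema": KnowledgeGraph  # Pydantic model
            },
            system_instruction=SYSTEM_PROMPT
        )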
The prompt engineering component requires careful consideration:
Key requirements for consistent extraction:
- Node identifiers should be semantic (elon_musk rather than person_1)
- Relationship types should follow clear conventions (WORKS_AT not works_at)
- Coreference resolution rules must be explicit (ensuring “John” and “he” map to the same entity)
These constraints significantly improve debuggability and graph quality. Generic numeric IDs make troubleshooting extraction issues challenging.
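One way to keep these conventions honest is to check the model's output after each extraction. The sketch below is an assumption about how such a check could look, not the project's actual validation: the regular expressions and the `check_conventions` helper are hypothetical, and the "generic ID" rule is only a heuristic (it would also flag a legitimate ID such as `apollo_11`).

```python
import re

# Assumed conventions, derived from the requirements listed above
SEMANTIC_ID = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")    # e.g. elon_musk
GENERIC_ID = re.compile(r"^[a-z]+_\d+$")                       # e.g. person_1
RELATION_TYPE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")   # e.g. WORKS_AT


def check_conventions(node_ids, relation_types):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for node_id in node_ids:
        if not SEMANTIC_ID.match(node_id):
            problems.append(f"node id not snake_case: {node_id}")
        elif GENERIC_ID.match(node_id):
            problems.append(f"node id looks generic: {node_id}")
    for rel in relation_types:
        if not RELATION_TYPE.match(rel):
            problems.append(f"relationship type not UPPER_SNAKE_CASE: {rel}")
    return problems


print(check_conventions(["elon_musk", "person_1"], ["WORKS_AT", "works_at"]))
```

Logging these problems per extraction gives you a quick signal on whether a prompt change helped or hurt.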
🗂️ Data models: Defining the structure
Clear data models establish the contract between the LLM and your application:
from pydantic import BaseModel


class Node(BaseModel):
    id: str  # Semantic identifier
    label: str  # Human-readable name
    type: str  # Entity category


class Relationship(BaseModel):
    id: str
    type: str  # Relationship category
    source_node_id: str  # Must match the id of an existing Node
    target_node_id: str  # Must match the id of an existing Node


class KnowledgeGraph(BaseModel):
    nodes: list[Node]
    relationships: list[Relationship]
Built-in deduplication at the model level handles cases where the LLM generates redundant relationships, a common occurrence that needs systematic handling.
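The article doesn't show how the deduplication is implemented, so treat the following as one possible approach rather than the project's code. It treats two relationships as duplicates when they share source, type, and target, even if the LLM gave them different IDs. A plain dataclass stands in for the Pydantic `Relationship` model so the sketch runs without dependencies, and `dedupe_relationships` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    # Stand-in for the Pydantic model, with the same fields
    id: str
    type: str
    source_node_id: str
    target_node_id: str


def dedupe_relationships(relationships):
    """Keep the first relationship for each (source, type, target) triple."""
    seen = set()
    unique = []
    for rel in relationships:
        key = (rel.source_node_id, rel.type, rel.target_node_id)
        if key not in seen:
            seen.add(key)
            unique.append(rel)
    return unique


rels = [
    Relationship("r1", "WORKS_AT", "sarah", "techcorp"),
    Relationship("r2", "WORKS_AT", "sarah", "techcorp"),  # same edge, different id
    Relationship("r3", "REPORTS_TO", "sarah", "michael"),
]
print([r.id for r in dedupe_relationships(rels)])
```

With Pydantic, the same logic could presumably run inside a validator on `KnowledgeGraph`, so every parsed response is cleaned before the rest of the application sees it.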
🎨 Visualization layer: Making graphs explorable
Effective visualization transforms raw graph data into actionable insight:
Consistency through color-coding: Using deterministic color assignment (hashing node types to a color palette) ensures that entity types maintain consistent visual representation across different graphs. This consistency helps users build mental models quickly.
Interactive physics simulation: Graph layout algorithms like ForceAtlas2 create natural-feeling visualizations where connected nodes cluster together and isolated nodes spread apart. This spatial organization often reveals patterns not obvious in the raw data.

Dual rendering modes: Supporting both HTML string generation and file output enables flexible integration, whether you are embedding the graph in a web application or generating a standalone visualization.
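The color-coding idea is easy to sketch. The version below hashes the node type and uses the result to pick from a palette. The palette values and the `color_for` name are placeholders chosen for this example, and the normalization (trimming and lowercasing) is an assumption about what "same type" should mean.

```python
import hashlib

# Hypothetical palette; any list of hex colors works
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def color_for(node_type: str) -> str:
    """Map a node type to a palette color, stable across runs and machines."""
    # hashlib is used instead of the built-in hash(), which is salted
    # per process for strings and would change colors between runs
    normalized = node_type.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(PALETTE)
    return PALETTE[index]


for node_type in ["Person", "Company", "Location"]:
    print(node_type, color_for(node_type))
```

Two different types can land on the same color when the palette is small, so a larger palette or an explicit override for your most common types may be worth adding.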
💻 Interface layer: Streamlining the workflow
A clear interface flow reduces friction:
- Configuration (API credentials)
- Input submission
- Multi-view results:
  - Visual graph exploration
  - Raw data inspection
  - Statistical overview
 

💰 Cost considerations for production use
Understanding the economics helps with scaling decisions:
Gemini 2.5 Flash-Lite pricing structure (as quoted at the time of writing):
- Input: $0.075 per 1M tokens
- Output: $0.30 per 1M tokens
Practical example: A 500-word article extraction:
- ~600 input tokens
- ~300 output tokens
- Cost: ~$0.0001 per extraction
This cost structure makes knowledge graph extraction viable for high-volume applications and real-time processing scenarios.
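The arithmetic behind that estimate is simple enough to script, which helps when you want to plug in your own token counts. The prices below are the ones quoted above and may have changed since; the `extraction_cost` helper and the 10,000-document batch are assumptions for illustration.

```python
# Prices in USD per 1M tokens, as quoted in this article (check current pricing)
INPUT_PRICE_PER_M = 0.075
OUTPUT_PRICE_PER_M = 0.30


def extraction_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one extraction call."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


per_article = extraction_cost(600, 300)
print(f"${per_article:.6f} per extraction")
print(f"${per_article * 10_000:.2f} per 10,000 extractions")
```

For the 500-word example this works out to about $0.000135, which rounds to the ~$0.0001 figure above. Note that output tokens account for two thirds of that cost, so keeping the schema compact matters more than trimming the input.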
🚀 Getting started: Minimal setup
Get the source code
# Clone repository without downloading all files
git clone --depth 1 --filter=blob:none --sparse git@github.com:jebucaro/blog-code.git
# Navigate to repository
cd blog-code
# Download only the folder you need
git sparse-checkout set python/2025-10-create-a-knowledge-graph-from-text-with-gemini
Configure the project
# Environment configuration
echo "GEMINI_API_KEY=your_key_here" > .env
uv sync
Launch the application with uv
# Launch application
uv run streamlit run src/nodus/main.py
Launch the application with Docker
# 1. Ensure Docker Desktop is running, then build the image
docker build -t nodus:latest .
# 2. Run with environment variable
docker run -p 8501:8501 -e GEMINI_API_KEY=your_key_here nodus:latest
# 3. Or use .env file
docker run -p 8501:8501 --env-file .env nodus:latest
# 4. Access the app at http://localhost:8501

🎯 What you can build from here
This implementation gives you the foundation, but the possibilities extend far beyond basic extraction:
Immediate enhancements:
- Connect to document databases for automatic knowledge extraction
- Build query interfaces to explore your graphs conversationally
- Implement graph merging to combine knowledge from multiple sources
- Add temporal tracking to see how knowledge evolves over time
Real-world applications:
- Personal knowledge management from articles, videos, and notes
- Research literature mapping and gap analysis
- Organizational memory systems
- Automated documentation from code and conversations
🔗 Explore the code
The complete implementation is available on GitHub. The codebase is intentionally minimal and well-documented, perfect for learning and adaptation.
Key files to explore:
- extractor.py - Core extraction logic and prompt engineering
- visualizer.py - Graph rendering and layout algorithms
- app.py - Streamlit interface and workflow orchestration
💬 What will you extract?
Knowledge graphs transform how we interact with information. Whether you’re managing research, organizing meeting notes, or building AI applications, structured knowledge extraction opens new possibilities.
What’s the first text you’ll convert into a knowledge graph? Share your use case or questions on LinkedIn.


