SEO · February 5, 2026 · 14 min read

Programmatic GEO Pages at Scale Using Search Intent Clustering

How to build scalable local SEO pages using AI validation and intent clustering—without spam penalties

Author

Akshay Dahiya

Growth & MarTech Specialist

For years, "local SEO at scale" has meant one thing: Duplicate a service page, swap the city name, repeat 500 times.

That approach worked—until it didn't.

Today, AI-powered search engines, spam classifiers, and quality systems can detect thin GEO pages instantly. Worse, they don't just ignore them—they penalize entire site sections through intent dilution and internal cannibalization.

The next generation of GEO expansion isn't about cities. It's about search intent clusters, semantic differentiation, and AI validation.

This article walks through:
  • Why traditional programmatic GEO pages fail
  • How to use Google Search Console data to discover real local intent
  • How to cluster keywords using TF-IDF + KMeans
  • How to score cannibalization risk before publishing
  • How to inject GEO context into templates without duplication
  • How to validate pages using AI before they go live

This is not beginner SEO. This is programmatic search engineering.

Why "City + Service" Pages Are Dead

Classic GEO strategy assumes:

  • "Plumber in Berlin"
  • "Plumber in Munich"
  • "Plumber in Hamburg"

…are distinct intents.

They are not.

From a modern search system's perspective, these pages:

  • Share identical entity sets
  • Answer the same user intent
  • Compete with each other internally
  • Add no new informational value

AI search engines interpret this as scaled content abuse, even when done "manually."

The result:

  • Pages index but never rank
  • Rankings fluctuate wildly
  • Crawl budget is wasted
  • Core pages lose authority

The fix is not better writing. The fix is intent separation.

GEO Expansion Starts With Intent, Not Location

A city is not an intent. An intent is a problem + context + expectation.

Example (local SEO):

  • "emergency plumber near me"
  • "licensed commercial plumber berlin"
  • "plumbing inspection for rental property"
  • "after-hours plumbing service"

These are distinct intents, even within the same city.

Modern GEO strategy answers this question first: What different local intents exist—and which locations express them differently?

To answer that, we start with real query data, not keyword tools.

Step 1: Extract GEO Queries From Google Search Console

Your best GEO dataset already exists inside Google Search Console.

From GSC, export:

  • Queries
  • Pages
  • Clicks
  • Impressions
  • Country / city (if available)

What you're looking for:

  • Queries that include location modifiers
  • Queries that imply local intent without naming a city
  • Queries triggering multiple pages (early cannibalization signal)

This dataset becomes the foundation for clustering.
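If you export the GSC queries report as CSV, a short pandas filter surfaces the location-modified subset. A minimal sketch — the column names, sample rows, and modifier list are illustrative; adjust them to match your actual export:

```python
import pandas as pd

# Hypothetical rows standing in for a real export; in practice you would
# load them with pd.read_csv("queries.csv") (column names vary by export).
df = pd.DataFrame({
    "query": [
        "emergency plumber berlin",
        "plumber near me",
        "acme plumbing login",       # brand query, excluded in Step 2
        "plumbing inspection munich",
    ],
    "clicks": [40, 120, 300, 15],
    "impressions": [900, 5000, 1200, 400],
})

# Location modifiers to match; extend with your own city list.
LOCATION_MODIFIERS = ["berlin", "munich", "hamburg", "near me"]

pattern = "|".join(LOCATION_MODIFIERS)
geo_df = df[df["query"].str.contains(pattern, case=False)]

print(geo_df["query"].tolist())
```

Queries that imply local intent without a modifier ("plumber open now") won't match a pattern like this, so review the unmatched remainder by hand before discarding it.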

Step 2: Clean and Normalize GEO Keywords

Before clustering, normalize your data:

  • Remove brand terms
  • Standardize city/state names
  • Strip stopwords ("best," "top," "cheap" unless intent-defining)
  • Keep service + context phrases intact

Bad normalization destroys intent signals. Be conservative.
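A conservative normalizer might look like the sketch below. The brand, alias, and stopword sets are placeholders for your own lists; note how deliberately short the stopword list is, since words like "best" or "cheap" can be intent-defining:

```python
import re

BRAND_TERMS = {"acme"}                                  # your brand tokens
CITY_ALIASES = {"muenchen": "munich", "nyc": "new york"}  # spelling variants
STOPWORDS = {"the", "a", "for", "top"}                  # keep this list short

def normalize_query(query: str) -> str:
    """Lowercase, drop brand terms and stopwords, standardize city names."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    out = []
    for tok in tokens:
        if tok in BRAND_TERMS or tok in STOPWORDS:
            continue
        out.append(CITY_ALIASES.get(tok, tok))
    return " ".join(out)

print(normalize_query("Top ACME plumber Muenchen"))  # -> "plumber munich"
```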

Step 3: Vectorize Queries Using TF-IDF

Now we convert queries into vectors based on semantic weight, not frequency.

Using scikit-learn:

from sklearn.feature_extraction.text import TfidfVectorizer

# queries: the normalized query strings from Steps 1-2
vectorizer = TfidfVectorizer(
    ngram_range=(1, 3),   # capture phrases like "emergency plumber berlin"
    min_df=2,             # drop one-off queries
    stop_words='english'
)

vectors = vectorizer.fit_transform(queries)

Why TF-IDF?

  • It downweights generic terms ("service," "company")
  • It emphasizes differentiating phrases ("emergency," "inspection," "commercial")
  • It works well with short search queries

At this stage, each query is a mathematical representation of intent.

Step 4: Cluster Search Intent Using KMeans

Now comes the core step: intent clustering.

from sklearn.cluster import KMeans

kmeans = KMeans(
    n_clusters=8,        # starting point; tune to your dataset
    random_state=42,     # reproducible cluster assignments
    n_init='auto'
)

clusters = kmeans.fit_predict(vectors)

Each cluster represents:

  • A distinct local intent
  • Expressed across multiple cities or regions
  • Often hidden when looking at keywords individually

Example clusters might resolve into:

  • Emergency services
  • Commercial services
  • Compliance/inspection
  • Pricing/comparison
  • Same-day availability
  • Residential maintenance

These clusters—not cities—become your page types.
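One practical note: n_clusters=8 is a starting point, not a rule. A quick silhouette scan over candidate values of k gives you a defensible cluster count for your dataset. A sketch on toy queries (swap in your own vectors from Step 3):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy queries standing in for your normalized GSC export.
queries = [
    "emergency plumber berlin", "24 7 emergency plumber",
    "urgent plumber near me", "commercial plumber berlin",
    "licensed commercial plumbing", "commercial pipe repair",
    "plumbing inspection rental", "rental property plumbing inspection",
]

vectors = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(queries)

# Scan candidate cluster counts; keep the k with the best silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(vectors)
    score = silhouette_score(vectors, labels)
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```

Silhouette is a heuristic, not ground truth; sanity-check the winning k against the cluster interpretation step before committing to it.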

Step 5: Interpret Clusters (This Is Where Most Fail)

Clustering is useless without interpretation.

For each cluster:

  • List top-weighted terms
  • Review representative queries
  • Identify the intent narrative

Example:

Cluster keywords:

  • "after hours"
  • "emergency"
  • "24/7"
  • "urgent"

Intent:

Immediate-response service with high urgency and low price sensitivity.

This cluster deserves:

  • Its own page template
  • Its own conversion messaging
  • Its own internal linking logic

Step 6: Cannibalization Risk Scoring (Before Publishing)

This is the step most SEO teams skip—and pay for later.

Before generating pages, score how similar each new page would be to existing ones.

Concept: If two pages target overlapping entity sets and intent vectors, they will cannibalize.

Simple similarity scoring approach:

from sklearn.metrics.pairwise import cosine_similarity

# vec_a, vec_b: single-row TF-IDF vectors (e.g., vectors[i] from Step 3)
def cannibalization_score(vec_a, vec_b):
    return cosine_similarity(vec_a, vec_b)[0][0]

High similarity = high risk.

You should:

  • Set a similarity threshold (e.g., >0.75 = do not publish)
  • Merge or differentiate content before launch
  • Prevent index bloat proactively

This alone can save months of cleanup.
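Extending that scoring function into a pre-publish gate is straightforward: vectorize every candidate page, compute pairwise similarities, and flag any pair above the threshold. A sketch with illustrative page copy (real pipelines should score full draft content, not one-line summaries):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Candidate page copy keyed by slug (illustrative).
candidates = {
    "emergency-plumber-berlin": "24/7 emergency plumber in Berlin urgent callout",
    "emergency-plumber-hamburg": "24/7 emergency plumber in Hamburg urgent callout",
    "plumbing-inspection-berlin": "licensed plumbing inspection for rental property Berlin",
}

THRESHOLD = 0.75  # above this, treat the pair as a cannibalization risk

slugs = list(candidates)
vectors = TfidfVectorizer().fit_transform(candidates.values())
sims = cosine_similarity(vectors)

# Every unordered pair over the threshold gets flagged for merge/differentiate.
risky = [
    (slugs[a], slugs[b], round(float(sims[a, b]), 2))
    for a in range(len(slugs)) for b in range(a + 1, len(slugs))
    if sims[a, b] > THRESHOLD
]
print(risky)
```

Here the two near-identical emergency pages get flagged, while the inspection page passes — exactly the separation the intent clusters are meant to enforce.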

Step 7: Programmatic Template Injection (Without Duplication)

Now we generate pages—but not by swapping city names.

Each page template includes:

  • Intent-specific core content
  • Location-specific context blocks
  • Unique entity relationships

Example logic:

def generate_page(intent, location, data):
    return f"""
    <h1>{intent['title']} in {location}</h1>

    <p>{intent['core_description']}</p>

    <p>In {location}, this service is commonly required for
    {data['local_context']}.</p>

    <ul>
        {''.join(f"<li>{fact}</li>" for fact in data['unique_facts'])}
    </ul>
    """

Key rule: The intent stays constant. The context changes.

This creates semantic differentiation, not duplication.
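To make the rule concrete, here is the same template logic (condensed to plain strings) called with one intent and two hypothetical location contexts — the intent-level copy is identical, only the context block differs:

```python
def generate_page(intent, location, data):
    # Intent-level copy stays fixed; location-level context varies.
    return (
        f"<h1>{intent['title']} in {location}</h1>\n"
        f"<p>{intent['core_description']}</p>\n"
        f"<p>In {location}, this service is commonly required for "
        f"{data['local_context']}.</p>"
    )

intent = {
    "title": "Emergency Plumbing",
    "core_description": "24/7 response for burst pipes and urgent leaks.",
}

berlin = generate_page(intent, "Berlin",
                       {"local_context": "Altbau buildings with aging pipework"})
hamburg = generate_page(intent, "Hamburg",
                        {"local_context": "flood-prone basements near the Elbe"})

print(berlin)
```

The local-context data is the hard part: it has to come from real sources (service logs, local regulations, building stock), or the pages collapse back into duplication.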

Step 8: AI Validation Before Indexing

Before publishing at scale, validate pages using AI.

You are checking:

  • Does this page answer a distinct question?
  • Is it semantically redundant with another page?
  • Would an AI search engine cite this as a unique source?

Common validation prompts:

  • "What is the primary intent of this page?"
  • "Is this content meaningfully different from X?"
  • "Summarize the unique value in one sentence."
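A minimal, provider-agnostic scaffold for running these prompts: build the requests programmatically, then gate publishing on the answers. The gating heuristic below is deliberately naive — a production pipeline would request structured verdicts (e.g., JSON) from the model — and the function names and phrase matching are illustrative assumptions, not a specific vendor's API:

```python
VALIDATION_PROMPTS = [
    "What is the primary intent of this page?",
    "Is this content meaningfully different from {other_url}?",
    "Summarize the unique value in one sentence.",
]

def build_validation_requests(page_text: str, other_url: str) -> list[str]:
    """Attach the page copy to each validation prompt."""
    return [
        f"{prompt.format(other_url=other_url)}\n\nPAGE:\n{page_text}"
        for prompt in VALIDATION_PROMPTS
    ]

def should_publish(answers: list[str]) -> bool:
    # Naive gate: block the page if any answer flags redundancy.
    # Replace with structured verdict parsing for real use.
    return not any("not meaningfully different" in a.lower() for a in answers)

requests = build_validation_requests(
    "Emergency plumber page copy...", "/emergency-plumber-berlin"
)
print(len(requests))
```

Sending each request to whatever LLM client you use (and logging the answers next to the page draft) gives you an auditable record of why each page shipped or didn't.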

Pages that fail validation:

  • Don't get indexed
  • Get merged
  • Get restructured

This step turns scale from a liability into an advantage.

Why This Works With AI Search Engines

AI-driven search systems evaluate:

  • Intent clarity
  • Entity differentiation
  • Coverage depth

By clustering intents first:

  • You align with how AI understands search
  • You reduce internal competition
  • You create pages that map cleanly to answer synthesis

This is why fewer, better GEO pages now outperform thousands of thin ones.

The New GEO Playbook

Old GEO SEO:
  • City-based duplication
  • Keyword volume obsession
  • Reactive cleanup

Modern GEO SEO:
  • Intent clustering
  • Pre-launch risk scoring
  • AI-assisted validation
  • Entity-driven differentiation

The output isn't "more pages." It's more relevance per page.

Key Takeaways
  • Traditional "city + service" GEO pages are detected as spam by AI search engines
  • Modern GEO strategy starts with intent clustering, not location lists
  • Use TF-IDF and KMeans to discover distinct local intents in GSC data
  • Score cannibalization risk before publishing using cosine similarity
  • Generate templates with intent-specific content + location-specific context
  • Validate pages with AI to ensure semantic differentiation before indexing

Final Thoughts: Scale Without Spam

Programmatic SEO isn't dead. Unintelligent scale is.

The teams winning local and GEO search today:

  • Think like data scientists
  • Build like engineers
  • Validate like editors
  • Optimize for AI systems—not just rankings

If your GEO strategy starts with: "How many cities can we cover?"

You're already behind.

The right question is:

"How many distinct local intents can we own—without overlap?"

Answer that, and scale becomes safe again.

Akshay Dahiya

Growth & MarTech Specialist

Digital marketing professional with 7+ years of experience in SEO, analytics, and marketing automation. Founder of MarAI and passionate about building tools that solve real marketing problems.