IEXDG Nexus Infrastructure

GHL Nexus — Continuous Crawl System

Always-on pipeline that pulls Dr. DNicole's HighLevel state into the IEXDG RAG, renders it to a live dashboard, and feeds it to every downstream content/automation decision. Shallow crawl every 2 hours + daily deep crawl, auto-ingested, zero manual steps once configured.

Built Apr 13, 2026 · Robert Dove · 31 memory files · 10,438 RAG chunks
  • 1,952 Contacts
  • 27 Pipelines
  • 109 Workflows
  • 23 Calendars
  • 329 Tags
  • 184 Contact CF
  • 57 Forms
  • 7 Users
  • 0 Active Opps (across all pipelines)
  • 0 Bookings (next 60 days)
  • 0 Conversations (SMS/chat inactive)
  • 10,438 RAG Chunks
Architecture

🛰️ The crawl → ingest → query pipeline

[ GHL services.leadconnectorhq.com API ]
                    |
                    v
┌─────────────────────────────────────────────────────┐
│ HOURLY SHALLOW CRAWL (ghl_nexus_crawl.py)           │
│ Windows Task: IEXDG_GHL_Nexus_Crawl                 │
│ Cadence: every 2 hours                              │
│ Pulls: location, users, pipelines, calendars,       │
│        workflows, tags, custom_fields, forms,       │
│        social_accounts, social_posts, 500 contacts  │
│ Output: nexus_crawls/iexdg_ghl_YYYYMMDD_HHMMSS.json │
│         nexus_crawls/iexdg_ghl_latest.json          │
│         nexus_crawls/iexdg_ghl_latest_summary.md    │
└─────────────────────────────────────────────────────┘
                    |
                    v
┌─────────────────────────────────────────────────────┐
│ DAILY DEEP CRAWL (ghl_nexus_deep_crawl.py)          │
│ Windows Task: IEXDG_GHL_Deep_Crawl                  │
│ Cadence: daily 3:00 AM                              │
│ Pulls: ALL contacts (paginated), workflow details,  │
│        opportunities per pipeline, calendar         │
│        appointments, form submissions,              │
│        conversations, campaigns, email templates    │
│ Output: nexus_crawls/iexdg_ghl_deep_*.json          │
│         nexus_crawls/iexdg_ghl_deep_latest.json     │
│         nexus_crawls/nexus_*_YYYYMMDD_HHMMSS.md     │
└─────────────────────────────────────────────────────┘
                    |
                    v
┌─────────────────────────────────────────────────────┐
│ RAG INGEST                                          │
│ Each section emits a markdown summary               │
│ subprocess call to IEXDG/rag/ingest_session.py      │
│ type=operations, auto-chunks + embeds               │
│ Current: 10,438 chunks / 9.39 MB                    │
└─────────────────────────────────────────────────────┘
                    |
                    v
┌─────────────────────────────────────────────────────┐
│ DOWNSTREAM CONSUMERS                                │
│ - IEXDG_Nexus_Dashboard.html (live-state UI)        │
│ - daily_content_drop.py (context-aware content)     │
│ - tactic_picker.py (goal → tactic match)            │
│ - dnicole_visual_sentinel.py (brand rules)          │
│ - MCP server (future: Agent Studio tool)            │
└─────────────────────────────────────────────────────┘
Schedule

⏰ Windows Scheduled Tasks — registered Apr 13

| Task | Script | Cadence | State |
| --- | --- | --- | --- |
| IEXDG_GHL_Nexus_Crawl | ghl_nexus_crawl.py | Every 2 hours, starting :05 past the hour | RUNNING |
| IEXDG_GHL_Deep_Crawl | ghl_nexus_deep_crawl.py | Daily 3:00 AM ET | READY — next run Apr 14 3:00 AM |
Why this cadence

Shallow every 2 hours = fast drift detection (new contacts, new tags, newly scheduled social posts) without tripping API rate limits. Deep daily = full pagination of 1,952 contacts + 109 workflow bodies + pipeline opportunities, which takes ~15 minutes and would be wasteful to run every 2 hours.
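The cadence math can be sanity-checked with a quick sketch. The page size (100) and the per-section call counts are assumptions based on the deep-crawl capture list; response latency and the RAG ingest step push the total toward the observed ~15 minutes.

```python
import math

# Page size is an assumption; per-section call counts come from the
# deep-crawl capture list in this document.
PAGE_SIZE = 100
calls = {
    "contact_pages": math.ceil(1952 / PAGE_SIZE),   # full contact pagination
    "workflow_bodies": 109,
    "pipeline_opportunity_pulls": 27,
    "calendar_event_pulls": 10,
    "form_submission_pulls": 30,
    "misc": 10,  # conversations, campaigns, templates, location
}
total_calls = sum(calls.values())
minutes = total_calls * 2 / 60  # at the 2 s minimum spacing

print(f"{total_calls} calls ~= {minutes:.1f} min of pure API pacing")
```

Roughly 200 requests at a 2-second floor is several minutes of pacing alone, which is why the deep pass stays nightly.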

Capture

📦 Everything the crawl captures

Hourly shallow

  • Location profile + business info
  • 7 users + roles + scopes
  • 27 pipelines (names + stages)
  • 23 calendars + groups
  • 109 workflows (id + name + status)
  • 329 tags (id + name)
  • 184 contact custom fields
  • 13 opportunity custom fields
  • 57 forms (basic metadata)
  • 6 social accounts + expiry state
  • 50 most recent social posts
  • First 500 contacts (sample)
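A minimal sketch of how one of these shallow-crawl requests might be assembled. The endpoint path, parameter names, and Version header value are assumptions based on common LeadConnector v2 usage, not lifted from ghl_nexus_crawl.py.

```python
from urllib.parse import urlencode

BASE = "https://services.leadconnectorhq.com"

def build_request(path, token, params=None):
    """Assemble URL + headers for a LeadConnector v2 call.

    Header names/values here are assumptions; confirm against the
    crawl script's own request layer.
    """
    url = f"{BASE}{path}"
    if params:
        url += "?" + urlencode(params)
    headers = {
        "Authorization": f"Bearer {token}",  # PIT token
        "Version": "2021-07-28",             # assumed API version pin
        "Accept": "application/json",
    }
    return url, headers

# The 500-contact sample would be a handful of pages like this one:
url, headers = build_request("/contacts/", "pit-XXXX",
                             {"locationId": "LOC_ID", "limit": 100})
```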

Daily deep

  • ALL 1,952 contacts (paginated)
  • 109 workflow FULL bodies (triggers + actions)
  • 27 pipelines × opportunities per stage
  • 10 calendars × upcoming 60-day events
  • 30 forms × last 10 submissions each
  • Last 50 conversations + messages
  • Active campaigns
  • Email templates
  • Notes per-contact (first 50)
  • Tasks per-contact (first 50)
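The "ALL contacts (paginated)" pass boils down to a cursor walk. A generic sketch — the fetcher signature is an assumption that wraps whatever pagination parameter the endpoint actually uses (startAfterId, offset, page token, ...):

```python
def paginate(fetch_page, limit=100):
    """Collect every row from a paginated listing.

    `fetch_page(cursor, limit) -> (rows, next_cursor)` is an assumed
    signature; the real script owns the endpoint-specific details.
    """
    cursor, rows_out = None, []
    while True:
        rows, cursor = fetch_page(cursor, limit)
        rows_out.extend(rows)
        if not rows or cursor is None:
            break
    return rows_out
```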
Findings

🚨 What the first full crawl revealed

Critical insight

She has the infrastructure of a 7-figure consultancy — 1,952 contacts, 27 pipelines, 109 workflows, 329 tags, 23 calendars, 184 contact custom fields, 57 forms, 6 social accounts — but the engine isn't running: zero active opportunities, zero upcoming bookings in the next 60 days, zero conversations (SMS/chat).

Pipeline inventory (27 named)

Some active and strategic, many appear stale. Surfaced names include PGCOC Coffee Connection 2025, Sales Pipeline, Scorecard Leads, Standard Seminars & Workshops, Women Consulting Corporate Live! 2024, YOUR PROMOTION Pipeline, 7-FigurED — plus 20 more. Consolidation to 3-5 (Pulse → Reset → Transformation → Speaking → Custom) is the high-leverage move.

Workflow inventory (109)

Mixed published/draft/archived. The Culture Pulse tier flows are present but need verification against Workflow AI Builder rebuild. iexdg_weaponization_audit.py (in build queue) will map active vs zombie per workflow.

Calendar inventory (23, zero upcoming)

Start with Clarity Strategy Call, IEXDG Book Discovery Call, IEXDG Book Custom Series Planning, City of Tucson Leadership & Employee, Booking Request, IEXDG Book Coaching Track, Chat with Dr. DNicole, + 16 more. Zero bookings across all of them in the next 60 days = traffic isn't reaching the booking flow, or bookings are happening outside GHL.

Social posting cadence

Only 10 posts ever published. Last published: Jan 30, 2026. 73 days of silence. As of Apr 13 there are 28 DRAFT posts queued across LinkedIn, Facebook, Instagram, and Google Business Profile — all with Apr 13 Visual Standard-compliant images, awaiting her review.

Flow

🔁 Every crawl cycle — step by step

1. Windows Task triggers
   Windows Task Scheduler fires ghl_nexus_crawl.py or ghl_nexus_deep_crawl.py on the registered cadence. PIT Token pit-8e4c1579… authenticates as IEXDG location user.
2. API calls with 2s delay
   Each request includes a 1-2s sleep to respect rate limits. A 429 response triggers a backoff of 5s × attempt, up to 3 retries. This prevents the same ban pattern that cost BSP and BB multiple hours of troubleshooting.
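The retry policy in this step can be sketched as follows. The `do_request()` callable returning an object with a `.status_code` is an assumption standing in for the scripts' actual HTTP layer.

```python
import time

MAX_RETRIES = 3

def backoff_delay(attempt: int) -> int:
    """Delay after the Nth 429: 5 s x attempt (5, 10, 15)."""
    return 5 * attempt

def call_with_backoff(do_request):
    """Retry a section up to 3 times on 429, then give up on it.

    Sketch only; `do_request` is an assumed wrapper around the
    crawl's HTTP layer.
    """
    for attempt in range(1, MAX_RETRIES + 1):
        resp = do_request()
        if resp.status_code != 429:
            return resp
        time.sleep(backoff_delay(attempt))
    return None  # section skipped this cycle; next crawl retries it
```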
3. Structured JSON saved
   Every run writes a timestamped JSON: nexus_crawls/iexdg_ghl_YYYYMMDD_HHMMSS.json. The run also overwrites the "latest" pointer file, iexdg_ghl_latest.json, so downstream tools always know where to look.
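Step 3's snapshot-plus-pointer write might look like this, sketched as a plain overwritten copy so downstream tools can always open the same fixed path on Windows:

```python
import json
import shutil
import time
from pathlib import Path

def save_snapshot(state: dict, out_dir="TOOLS/nexus_crawls",
                  prefix="iexdg_ghl"):
    """Write a timestamped snapshot, then refresh the *_latest.json pointer.

    Paths follow the Files section; the copy-based pointer is an
    illustrative choice, not confirmed from the script itself.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    snap = out / f"{prefix}_{stamp}.json"
    snap.write_text(json.dumps(state, indent=2))
    # Overwrite the fixed "latest" path downstream tools read.
    shutil.copyfile(snap, out / f"{prefix}_latest.json")
    return snap
```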
4. Markdown summary emitted
   Each section (pipelines, workflows, calendars, forms, conversations, etc.) also emits a human-readable nexus_*_YYYYMMDD_HHMMSS.md file. These are what get chunked into the RAG.
5. RAG ingestion
   Subprocess call to IEXDG/rag/ingest_session.py --type operations --file PATH. Text gets chunked + embedded (sentence-transformers all-MiniLM-L6-v2, 384-dim). Stored in iexdg_knowledge.db.
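The embedding half needs sentence-transformers, but the chunking half of the ingest can be sketched standalone. The 200-word window and 40-word overlap are assumptions — ingest_session.py owns the real parameters.

```python
def chunk_text(text, max_words=200, overlap=40):
    """Split a markdown summary into overlapping word-window chunks.

    Window and overlap sizes are illustrative. Each chunk then gets
    embedded (all-MiniLM-L6-v2, 384-dim) and stored in
    iexdg_knowledge.db by the real ingest script.
    """
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```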
6. Dashboard auto-refresh
   generate_nexus_dashboard.py reads iexdg_ghl_latest.json and emits STRATEGY/IEXDG_Nexus_Dashboard.html — a live UI with metrics grid, pipelines table, workflow status, social accounts, and tag cloud.
Files

📂 Every file in the crawl system

Scripts

| Script | Purpose |
| --- | --- |
| TOOLS/automation_scripts/ghl_nexus_crawl.py | Hourly shallow crawl. 13 sections captured. |
| TOOLS/automation_scripts/ghl_nexus_deep_crawl.py | Daily deep crawl. 8 intensive passes. |
| TOOLS/automation_scripts/ghl_pull_messages_notes_tasks.py | Conversation messages, per-contact notes/tasks (run on-demand). |
| TOOLS/automation_scripts/generate_nexus_dashboard.py | Renders IEXDG_Nexus_Dashboard.html from latest snapshot. |
| TOOLS/automation_scripts/schedule_tasks.ps1 | PowerShell that registers the 2 Windows Scheduled Tasks. |
| TOOLS/automation_scripts/setup_ghl_hourly_crawl.bat | Alternative CMD wrapper (when PS blocked). |

Output locations

| Path | What |
| --- | --- |
| TOOLS/nexus_crawls/iexdg_ghl_YYYYMMDD_HHMMSS.json | Each shallow crawl, timestamped |
| TOOLS/nexus_crawls/iexdg_ghl_latest.json | Pointer to newest shallow crawl |
| TOOLS/nexus_crawls/iexdg_ghl_deep_YYYYMMDD_HHMMSS.json | Each deep crawl, timestamped |
| TOOLS/nexus_crawls/iexdg_ghl_deep_latest.json | Pointer to newest deep crawl |
| TOOLS/nexus_crawls/iexdg_ghl_latest_summary.md | Human-readable shallow summary |
| TOOLS/nexus_crawls/nexus_*_YYYYMMDD_HHMMSS.md | Deep crawl per-section markdown (what gets RAG-ingested) |
| STRATEGY/IEXDG_Nexus_Dashboard.html | Live UI regenerated from latest shallow |
| IEXDG/rag/iexdg_knowledge.db | SQLite RAG database with embeddings |

Memory files (what the Nexus remembers)

| File | Content |
| --- | --- |
| iexdg_apr13_breakthroughs.md | GHL unblock + HeyGen + Pip Decks wins |
| iexdg_apr13_deep_crawl_findings.md | 1,952 contacts, engine not running |
| iexdg_ghl_marketing_audit_apr13.md | 5 root causes of low marketing score |
| iexdg_apr13_autonomous_session.md | Continuous crawl architecture + what auto-runs |
| iexdg_apr13_timeline_and_bsp_lessons.md | BSP pattern mapping → IEXDG Nexus |
| iexdg_apr13_visual_fix_applied.md | 28 drafts regenerated per her Visual Standard |
Lineage

🛰️ BSP Morpheus → IEXDG Nexus pattern inheritance

The IEXDG crawl system is built from proven BSP patterns. The BSP Morpheus VM runs 398 scripts, 98 systemd timers, and 165 databases. Every pattern below already ships in production on BSP; its IEXDG translation is either live or queued as planned.

| BSP Pattern | BSP Implementation | IEXDG Translation |
| --- | --- | --- |
| Pulse | nexus_pulse.py (3× daily unified state aggregator) | Shallow crawl every 2 hours (same role) |
| Sentinel | nexus_sentinel.py (30-min self-healing monitor) | Planned: dnicole_visual_sentinel.py (already live for visuals) |
| Pattern Learner | nexus_pattern_learner.py (weekly long-memory) | Planned: dnicole_pattern_learner.py |
| Evolution Engine | nexus_evolution.py (champion-challenger A/B) | Planned: Tactic picker A/B across post variants |
| Guardian Hashes | guardian_hashes.db (file integrity) | Planned: Protect dnicole_brand_rules.json, pipdecks_knowledge.json |
| Weaponization Audit | nexus_weaponization_audit.py (new Apr 13 on BSP) | Planned: iexdg_weaponization_audit.py (109 workflows / 27 pipelines / 329 tags) |
| Named Cluster | Zeus (zeus-closure-bot, zeus-daniel-notify, zeus-health-check) | DNicole cluster: dnicole-pulse, dnicole-visual-sentinel, dnicole-first-comment-bot, dnicole-ghl-crawler |
Query

🔍 How to query the Nexus state

Via RAG semantic search

cd C:\Users\djbob\Documents\Belay\IEXDG\rag
python search_kb.py "which pipelines have 0 opportunities"
python search_kb.py "all workflows with 'welcome' in name"
python search_kb.py "contacts tagged ate_nurture_complete"

Via JSON (exact data)

import json

with open('TOOLS/nexus_crawls/iexdg_ghl_latest.json') as f:
    state = json.load(f)

pipelines = state['pipelines']['data']['pipelines']

Via dashboard (visual)

Open STRATEGY/IEXDG_Nexus_Dashboard.html in a browser. Auto-refreshes with every scheduled run. Metrics grid + pipelines table + workflow status + social accounts + tag cloud.

Downstream

🍳 What the crawl feeds

Content Drop (nightly 2 AM)

  • Reads her 329 tags to understand sector distribution
  • Checks which workflows are active before referencing them
  • Surfaces pipeline stage names for content context
  • Pulls latest social post history to avoid repetition

Tactic Picker

  • Picks Pip Decks tactic per content goal + channel
  • Deduplicates across 28-day rolling window
  • Reads ELCC pillar mappings from RAG
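The 28-day dedup above could be sketched as follows. The history shape (tactic name → date last used) is an assumption about the picker's usage log; candidates are taken to arrive pre-ranked by goal fit.

```python
from datetime import date, timedelta

def pick_tactic(candidates, history, today, window_days=28):
    """Return the first candidate tactic not used in the rolling window.

    `history` maps tactic name -> date last used (assumed shape).
    """
    cutoff = today - timedelta(days=window_days)
    for tactic in candidates:
        last = history.get(tactic)
        if last is None or last < cutoff:
            return tactic
    return candidates[0]  # everything is recent: fall back to best fit
```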

Visual Sentinel

  • Reads dnicole_brand_rules.json SSOT
  • Validates every Ideogram/NanoBanana prompt pre-gen
  • Blocks sloppy images from ever reaching drafts

Dashboard / HTML Reports

  • Real-time GHL state for Robert
  • Marketing audit findings
  • Weekly progress briefs to Dr. DNicole

Planned: First Comment Bot

  • Reads post_id + publish timestamp from crawl
  • Fires 30-60 sec after publish
  • Picks 1 of 4 comment variants per her Apr 13 directive

Planned: Weaponization Audit

  • Audits 109 workflows → active vs zombie
  • Audits 27 pipelines → used vs stale
  • Audits 329 tags → living vs orphaned
  • Outputs a deletion/consolidation plan
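A sketch of the active-vs-zombie split the planned audit would make. The per-workflow activity map is an assumed input the audit would derive from the deep-crawl JSON (e.g. enrollments or opportunities touched); only the status/id/name fields are taken from the shallow crawl's workflow records.

```python
def classify_workflows(workflows, activity_by_id=None):
    """Split the workflow inventory into active vs zombie candidates.

    Sketch of the planned iexdg_weaponization_audit.py pass: anything
    unpublished, or published with no recorded activity, gets flagged.
    """
    activity = activity_by_id or {}
    active, zombies = [], []
    for wf in workflows:
        published = wf.get("status") == "published"
        busy = activity.get(wf["id"], 0) > 0
        (active if published and busy else zombies).append(wf["name"])
    return active, zombies
```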
Safety

🛡️ Rate-limit + scope hygiene

| Rule | Why |
| --- | --- |
| 2 sec minimum between API calls | Prevents the IP-block pattern that cost us 4 months on the main GHL API (Cloudflare block Feb–Apr 13). |
| Max 20 req/min | Standard cross-client policy from memory/operating_rules.md. |
| 429 backoff: 5s × attempt | Retry with 3 max attempts before giving up on that section. |
| Read-only by default | PIT token scope: read everywhere, write ONLY on Social Planner posts (the one write operation needed today). |
| No destructive operations | No DELETE, no PATCH on contacts, no write on Location profile (scope lacks locations.write anyway). |
| Log every run | Every crawl writes a timestamped JSON; snapshots can be diffed against prior runs to detect drift. |
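The first three rules compose into one limiter. A sketch with an injectable clock so the policy is testable without waiting; the production scripts themselves just sleep inline between calls, so this class is an illustrative consolidation, not their actual code.

```python
import time
from collections import deque

class RateLimiter:
    """Enforce >= 2 s between calls AND <= 20 calls per rolling minute."""

    def __init__(self, min_gap=2.0, per_minute=20,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap
        self.per_minute = per_minute
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls

    def _prune(self, now):
        # Drop timestamps that have aged out of the 60 s window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()

    def wait(self):
        now = self.clock()
        # Rule 1: minimum spacing since the previous call.
        if self.calls and now - self.calls[-1] < self.min_gap:
            self.sleep(self.min_gap - (now - self.calls[-1]))
            now = self.clock()
        self._prune(now)
        # Rule 2: at most per_minute calls in any rolling 60 s window.
        if len(self.calls) >= self.per_minute:
            self.sleep(self.calls[0] + 60 - now)
            now = self.clock()
            self._prune(now)
        self.calls.append(now)
```

Call `limiter.wait()` immediately before each request; both rules are then satisfied no matter how bursty the calling code is.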
Gaps

⚠️ Known gaps in the crawl

Next expansion pass

Add these endpoints to the deep crawl schedule. Target: 90%+ GHL state coverage in a single nightly pass.