Banking AI Document Assistant

Document Sources · IBM FileNet P8 (Production) · AWS S3 (POC)

📄 eStatements · 400 docs

⚖️ Dispute Cases · 250 docs

📢 Complaints · 200 docs

🔧 Account Maintenance · 150 docs

🔍 Digital PDF → pdfplumber

🖨️ Scanned PDF → AWS Textract OCR

1,000

Total Documents

~4,000

Text Chunks

256-d

Embedding Dims

<30s

New Doc → Queryable

~$0.10

Ingestion Cost

~$1/mo

Running Cost

↓

Real-Time Ingestion Pipeline — Single Run, Both Stores Kept in Sync

☁️ AWS S3

PDF uploaded with full metadata tags (32 fields)

→

🐍 ingest.py

Downloads PDF, extracts text, splits it into 500-token chunks, embeds via Titan

dual write

⇊

📦 ChromaDB

Vector chunks upserted · semantic content search · ~3,948 chunks · 256-dim embeddings

🗄️ ClickHouse Cloud · ReplacingMergeTree

Full metadata row upserted · aggregation engine · deduplicates automatically on re-run

Before: generate CSV → manually import to ClickHouse (one-off, goes stale) | Now: python rag/ingest.py → ChromaDB + ClickHouse updated together, every time, for every new document
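
A re-run of the dual write stays safe only if each chunk and each document maps to the same key every time. The sketch below illustrates that idea with plain dictionaries standing in for ChromaDB and ClickHouse. The names `chunk_id` and `dual_write`, and the ID scheme itself, are assumptions for illustration and are not taken from ingest.py.

```python
import hashlib
from typing import Dict, List


def chunk_id(doc_id: str, index: int) -> str:
    """Deterministic ID: the same document and chunk position always give the same key."""
    return hashlib.sha256(f"{doc_id}:{index}".encode("utf-8")).hexdigest()[:16]


def dual_write(
    doc_id: str,
    chunks: List[str],
    metadata: Dict[str, str],
    vector_store: Dict[str, str],
    metadata_store: Dict[str, Dict[str, str]],
) -> None:
    """Write chunks and the metadata row in one pass (in-memory stand-ins, assumed)."""
    for i, text in enumerate(chunks):
        # Same key on every run, so a re-run overwrites instead of duplicating.
        vector_store[chunk_id(doc_id, i)] = text
    # One metadata row per document, keyed by document ID.
    metadata_store[doc_id] = dict(metadata)


vectors: Dict[str, str] = {}
rows: Dict[str, Dict[str, str]] = {}
for _ in range(2):  # simulate running the ingestion twice
    dual_write("DSP00047", ["chunk a", "chunk b"], {"branch": "Leeds"}, vectors, rows)
```

After both runs the stores still hold two chunks and one row. On the ClickHouse side, ReplacingMergeTree removes rows with the same sorting key during background merges, so a query issued right after a re-run may briefly see both versions unless it uses FINAL.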

↓

Two query tracks

✅ Live Now

Track 1 · Pre-POC — Aggregation via ClickHouse NL→SQL

PDF metadata extracted during upload
Customer ID, RM, Branch, Case Summary, Dates → 32 columns

Stored in ClickHouse Cloud
Columnar DB — optimised for billion-row analytics

Nova Lite converts NL → SQL
Schema-aware prompt → valid ClickHouse query

Nova Lite formats results
Raw rows → professional markdown answer with insights
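
The schema-aware prompt step above can be sketched as a small prompt builder plus a read-only check on the model's output. The table name `documents`, the column names and types, and both function names are assumptions for illustration. The call to Nova Lite is left out.

```python
from typing import Dict


def build_sql_prompt(question: str, table: str, columns: Dict[str, str]) -> str:
    """Embed the table schema in the prompt so the model only uses real columns."""
    schema = "\n".join(f"  {name} {ctype}" for name, ctype in columns.items())
    return (
        "You write ClickHouse SQL.\n"
        f"Table: {table}\n"
        f"Columns:\n{schema}\n"
        "Return a single SELECT statement and nothing else.\n"
        f"Question: {question}\n"
    )


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT statement before sending it to the database."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


# Hypothetical subset of the 32 metadata columns.
COLUMNS = {"customer_id": "String", "rm": "String", "branch": "String", "created_date": "Date"}
prompt = build_sql_prompt("Which RM handled the most disputes?", "documents", COLUMNS)
```

Checking the generated SQL before execution is a design choice worth considering here, since the query text comes from a model rather than from a developer.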

🟢 ClickHouse Cloud

🤖 Nova Lite NL→SQL

Full dataset scan

No TOP_K limit

✓ Aggregation questions → ClickHouse

• "How many complaints raised each year?"
• "Which RM handled the most disputes?"
• "Total compensation paid per branch"
• "Breakdown of dispute types referred to Ombudsman"

query router

⇄

🚀 Full POC

Track 2 · Full RAG — Content Search via ChromaDB

PDF text extracted & chunked
500-token chunks, 80-token overlap

Embedded via Bedrock Titan
256-dim vectors stored in ChromaDB (~3,948 chunks)

Semantic vector search
Top-K chunks retrieved + metadata filters applied

Nova Lite generates answer
Grounded in retrieved chunks, cites doc IDs
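
The 500-token chunks with 80-token overlap can be sketched as a sliding window that advances 420 tokens at a time. This is a minimal illustration: the function name `chunk_tokens` is an assumption, and a whitespace split stands in for whatever tokenizer the pipeline actually uses.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 80) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap  # 420 with the defaults
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Assumption: whitespace tokens as a stand-in for real model tokens.
tokens = [str(i) for i in range(1000)]
chunks = chunk_tokens(tokens)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.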

🔵 AWS Bedrock

🤖 Nova Lite LLM

📦 ChromaDB

🔗 LangChain

✓ Content questions → ChromaDB RAG

• "Summarise Mathew Little's complaint"
• "Why was dispute DSP00047 lost?"
• "What did the customer say in this case?"
• "Show high priority complaints from Leeds"

↓

Smart Query Router — Automatic Intent Detection

Every banker question is classified by keyword intent before it reaches any backend. No manual switching: the router selects the engine automatically.

how many · count · total · breakdown · per year · by branch · which RM · average · trend

↳ Any of these keywords → routed to ClickHouse NL→SQL (full dataset scan, no limit)

↳ All other questions → routed to ChromaDB semantic vector search (summarise, explain, what did, show case)
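
The routing rule above can be sketched as a single regular expression over the listed keywords. The function name `route` and the return labels are assumptions for illustration. The keyword list is copied from the router description.

```python
import re

AGGREGATION_KEYWORDS = [
    "how many", "count", "total", "breakdown", "per year",
    "by branch", "which rm", "average", "trend",
]

# Word boundaries keep "count" from matching inside "account".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in AGGREGATION_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def route(question: str) -> str:
    """Return 'clickhouse' for aggregation questions, 'chromadb' for everything else."""
    return "clickhouse" if _PATTERN.search(question) else "chromadb"
```

For example, `route("How many complaints raised each year?")` selects ClickHouse, while `route("Summarise Mathew Little's complaint")` falls through to ChromaDB. Strict word boundaries also mean "trends" would not match "trend", so a real router may want stemming or a wider keyword list.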

⚡ Why Two Engines?

ClickHouse — scans ALL 1,000+ docs, perfect for counts & trends.
ChromaDB — semantic search for what was said inside a specific case.
Neither alone is sufficient — together they cover every question type.

↓

Phase 1 · Chat Interface — Streamlit Community Cloud

✅ Live Now

💬 Natural Language Chat

Bankers type questions in plain English. No SQL, no file browsing, no training required. Responses include source document citations with direct links back to PDFs in S3 / FileNet. All monetary amounts displayed in USD ($).

Intent detection

Auto metadata filters

Source citations

PDF deep links

🔐 Password gate

☁️ Streamlit Community Cloud

Free cloud hosting. Deploys directly from GitHub. Secrets managed via Streamlit Cloud Secrets Manager — no credentials in code.

Free tier hosting

GitHub auto-deploy

st.secrets → os.environ
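
The st.secrets → os.environ step can be sketched as a small copy function that runs once at app start. The function name `export_secrets` is an assumption. It takes any mapping so it can be exercised without Streamlit installed.

```python
import os
from typing import Mapping


def export_secrets(secrets: Mapping[str, object]) -> None:
    """Copy top-level string secrets into the process environment."""
    for key, value in secrets.items():
        # Nested sections (tables in secrets.toml) are skipped in this sketch.
        if isinstance(value, str):
            # setdefault leaves any variable that is already set untouched.
            os.environ.setdefault(key, value)


# In the Streamlit app this would be called as:
#   import streamlit as st
#   export_secrets(st.secrets)
export_secrets({"DEMO_REGION_SKETCH": "us-east-1", "section": {"nested": "ignored"}})
```

Exporting to the environment lets libraries that read credentials from environment variables work unchanged between local runs and the hosted app.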

⚠️ Limitations

~15s cold start after idle

Occasional 500 errors on wake

~$0.001 per question

Hosting: $0

↕ parallel run during transition

Phase 2 · Serverless API — AWS Lambda + API Gateway

🔨 In Progress

🌐

askmybank.ai

GitHub Pages UI

→

🔀

API Gateway

HTTP API · HTTPS

→

Lambda Function

FastAPI + Mangum · 512MB

→

☁️ S3 + NumPy

vectors.npy · ~4MB

🤖 Bedrock

Nova Lite · Titan

🗄️ ClickHouse

NL→SQL aggregations

↙

⏰

EventBridge

Ping every 5 min → always warm

λ Lambda — Always Warm

RAG object initialised once per container. Warm invocations reuse it and skip the cold start. EventBridge pings every 5 minutes to keep a container warm, so cold starts are rare.

~100ms warm response

~3s cold start (rare)

No 500 errors
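
The once-per-container initialisation can be sketched with a module-level object that is built lazily and reused across invocations. The `Rag` class, `get_rag`, and the event shape for questions are assumptions for illustration. This is a plain Lambda handler rather than the FastAPI + Mangum setup named above.

```python
from typing import Any, Dict, Optional


class Rag:
    """Hypothetical stand-in for the real RAG object (vectors, clients, prompts)."""

    instances_created = 0

    def __init__(self) -> None:
        Rag.instances_created += 1  # expensive setup would happen here

    def answer(self, question: str) -> str:
        return f"answer to: {question}"


_rag: Optional[Rag] = None  # lives as long as the Lambda container does


def get_rag() -> Rag:
    """Build the RAG object on first use, then reuse it on warm invocations."""
    global _rag
    if _rag is None:
        _rag = Rag()
    return _rag


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    rag = get_rag()
    # Scheduled EventBridge events carry "source": "aws.events".
    if event.get("source") == "aws.events":
        return {"statusCode": 200, "body": "warm"}
    return {"statusCode": 200, "body": rag.answer(event.get("question", ""))}


ping = handler({"source": "aws.events"}, None)
reply = handler({"question": "Why was dispute DSP00047 lost?"}, None)
```

Returning early on the warm-up ping keeps the scheduled invocations cheap while still triggering the one-time setup.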

☁️ S3 + NumPy Vectors

All ~4,000 chunk embeddings (256-dim) stored as vectors.npy in S3 (~4MB). Downloaded once on cold start, cached in Lambda memory for all warm calls. Cosine similarity search across ~4K vectors takes <5ms.

No external accounts

Pure AWS-native

~$0.0001/month S3 cost
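
The cosine similarity search over the cached array can be sketched in a few lines of NumPy. The function name `top_k_cosine` is an assumption, and the tiny array below stands in for the contents of vectors.npy, which would normally be read with `np.load` after the S3 download.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 5):
    """Return indices and scores of the k rows most similar to the query."""
    q = query / max(float(np.linalg.norm(query)), 1e-12)
    norms = np.maximum(np.linalg.norm(vectors, axis=1, keepdims=True), 1e-12)
    scores = (vectors / norms) @ q  # cosine similarity per row
    k = min(k, scores.shape[0])
    top = np.argsort(-scores)[:k]  # highest score first
    return top, scores[top]


# Toy 2-d vectors in place of the real 256-dim embeddings.
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, scores = top_k_cosine(np.array([1.0, 0.0]), vectors, k=2)
```

At roughly 4,000 rows a brute-force matrix product like this is small enough that an approximate index adds little, which fits the single-file design described above.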

💰 Phase 2 Running Cost

Lambda: $0 (1M req free)

API Gateway: ~$0.001/mo

S3 vectors: ~$0.0001/mo

EventBridge: $0 (free tier)

Bedrock: ~$0.001/query

↓

Production Migration — One Swap Per Component

📁 S3 → IBM FileNet P8

One loader swap. FileNet REST API / CMIS replaces boto3 S3 calls. All chunking and embedding code unchanged.
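
A loader swap of this kind is easiest when ingestion depends on a small interface rather than on boto3 directly. The sketch below assumes a hypothetical `DocumentLoader` protocol with a single `fetch` method. The in-memory loader is a stand-in, and no real S3 or FileNet calls are shown.

```python
from typing import Dict, Protocol


class DocumentLoader(Protocol):
    """Anything that can return the raw PDF bytes for a document ID."""

    def fetch(self, doc_id: str) -> bytes: ...


class InMemoryLoader:
    """Stand-in loader for this sketch; an S3 or FileNet loader would expose the same method."""

    def __init__(self, docs: Dict[str, bytes]) -> None:
        self._docs = docs

    def fetch(self, doc_id: str) -> bytes:
        return self._docs[doc_id]


def ingest(doc_id: str, loader: DocumentLoader) -> int:
    """Downstream code only sees bytes, so chunking and embedding stay unchanged."""
    pdf_bytes = loader.fetch(doc_id)
    return len(pdf_bytes)  # placeholder for extract, chunk, embed


size = ingest("DSP00047", InMemoryLoader({"DSP00047": b"%PDF-1.7"}))
```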

🤖 Nova Lite → Claude Haiku

One-line config change. Better reasoning for complex banking queries and regulated-industry compliance.

📦 ChromaDB → pgvector

Scale to multi-user production on PostgreSQL RDS. Same LangChain interface — no chain code changes.

🏦 IBM watsonx (Optional)

Full on-premises IBM stack if data residency or enterprise licensing requires it.

Full Tech Stack

☁️ AWS S3

🤖 AWS Bedrock

✨ Amazon Nova Lite

🔢 Titan Embeddings v2

📦 ChromaDB

🗄️ ClickHouse Cloud

🔄 ReplacingMergeTree

🔗 LangChain

🐍 Python 3.9

🌐 Streamlit Community Cloud

🔐 Password Gate

📄 pdfplumber

🖨️ AWS Textract (OCR)

💱 USD Currency

🏦 IBM FileNet P8 (Prod)

🧠 IBM watsonx (Optional)

🔐 IAM Role-Based Access

📋 Audit Logging

⚡ FastAPI + Mangum (Phase 2)

λ AWS Lambda (Phase 2)

🔀 API Gateway HTTP API (Phase 2)

⏰ EventBridge Warming (Phase 2)

🔢 NumPy Vector Search (Phase 2)
