Asad Ghafoor

AI/ML

Document Processing AI

An AI-powered document processing system that extracts tables from uploaded documents and scraped web pages, stores them as embeddings, and supports semantic search and RAG-based querying over the resulting structured data.

System design

Next.js UI, Web Scraper, Document U…, Table Extr…, LangChain, Vector Store, RAG Pipeline, PostgreSQL

Key features

  • Document upload (Excel, CSV, PDF, Word) alongside automated web scraping
  • Neural network-powered table extraction with advanced OCR
  • Vector storage of tables using LangChain + embeddings
  • RAG pipeline to answer natural language queries on tables
  • Web UI built with Next.js (TypeScript)
  • Containerized microservices deployed with Kubernetes
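
The vector storage and RAG features listed above can be illustrated with a minimal sketch. This is not the project's implementation: it uses only the Python standard library, and a toy bag-of-words vector stands in for a real embedding model and for the LangChain vector store. The names `TableIndex`, `row_to_text`, `embed`, and `build_prompt`, along with the sample tables, are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a term-count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def row_to_text(table: str, header: list[str], row: list[str]) -> str:
    # Serialize one table row so the column names travel with the values.
    cells = ", ".join(f"{h}: {v}" for h, v in zip(header, row))
    return f"{table} | {cells}"


class TableIndex:
    """In-memory stand-in for a vector store of table rows."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, Counter]] = []

    def add_table(self, name: str, header: list[str], rows: list[list[str]]) -> None:
        for row in rows:
            text = row_to_text(name, header, row)
            self._entries.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Rank stored rows by similarity to the query and return the top k.
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    # Assemble the retrieved rows into a prompt for a language model.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these table rows:\n{joined}\n\nQuestion: {query}"


index = TableIndex()
index.add_table(
    "sales",
    ["region", "quarter", "revenue"],
    [["North", "Q1", "120"], ["South", "Q1", "95"], ["North", "Q2", "140"]],
)
index.add_table("staff", ["name", "role"], [["Ada", "engineer"], ["Lin", "analyst"]])

question = "revenue for North region in Q2"
top = index.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Serializing each row together with its column names is one common way to make tabular data retrievable by natural language queries. In a real deployment the `embed` function would call an embedding model, the index would live in a persistent vector store, and the prompt would be passed to a language model to generate the answer.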

Technologies

LangChain, Neural Networks, Next.js (TypeScript), FastAPI (Python), PostgreSQL, Web Scraping, Docker, Kubernetes