Overview
The AI Document Analyzer is an internal tool that uses Azure OpenAI (GPT-4o) to extract structured information from PDF, DOCX, and XLSX documents.
Key Features
- Automated extraction: Identifies key entities, dates, amounts, and clauses
- Multi-format support: Handles PDF, Word, and Excel files
- Structured output: Exports results as JSON or CSV
- Batch processing: Handles up to 100 documents per session
- Azure integration: Works with Azure Blob Storage and Azure Cognitive Services (now Azure AI services)
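The structured-output and batch-processing features can be illustrated with a minimal Python sketch. It is not the tool's actual code: the `ExtractionResult` record, its field names, and the helper functions are hypothetical and chosen only to mirror the entity, date, amount, and clause categories listed above. The sketch assumes list-valued fields are joined with "; " when flattened to CSV.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field

MAX_BATCH_SIZE = 100  # batch limit stated in the feature list


@dataclass
class ExtractionResult:
    """Hypothetical result record; the real schema may differ."""
    document: str
    entities: list[str] = field(default_factory=list)
    dates: list[str] = field(default_factory=list)
    amounts: list[str] = field(default_factory=list)
    clauses: list[str] = field(default_factory=list)


FIELDNAMES = ["document", "entities", "dates", "amounts", "clauses"]


def to_json(results: list[ExtractionResult]) -> str:
    """Serialize results as a JSON array of objects."""
    return json.dumps([asdict(r) for r in results], indent=2)


def to_csv(results: list[ExtractionResult]) -> str:
    """Serialize results as CSV, one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for result in results:
        row = asdict(result)
        # Flatten list-valued fields so each one fits in a single CSV cell.
        for key in FIELDNAMES[1:]:
            row[key] = "; ".join(row[key])
        writer.writerow(row)
    return buffer.getvalue()


def split_into_sessions(paths: list[str], max_batch: int = MAX_BATCH_SIZE) -> list[list[str]]:
    """Split a list of document paths into chunks that respect the batch limit."""
    return [paths[i:i + max_batch] for i in range(0, len(paths), max_batch)]
```

With this layout, 250 queued documents would be split into three sessions of 100, 100, and 50 documents.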
Technical Stack
- Backend: Python (FastAPI), Azure OpenAI (GPT-4o)
- Frontend: React + TypeScript
- Infrastructure: Azure Container Apps, Azure Blob Storage
- CI/CD: GitHub Actions → Azure Container Registry
Getting Started
Deploy the application using the ARM templates in the /infra folder. Before deploying, confirm that your Azure subscription has enough Azure OpenAI quota for the model deployment.
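As a rough illustration, an ARM template can be deployed with the Azure CLI command `az deployment group create`. The Python sketch below only assembles and runs that command. The resource group name and the template file name `infra/main.json` are placeholders, since this README does not list the actual files in /infra, and the sketch assumes the Azure CLI is installed and already logged in. It does not check Azure OpenAI quota.

```python
import subprocess
from typing import Optional


def build_deploy_command(
    resource_group: str,
    template_file: str,
    parameters_file: Optional[str] = None,
) -> list[str]:
    """Build the argument list for `az deployment group create`."""
    command = [
        "az", "deployment", "group", "create",
        "--resource-group", resource_group,
        "--template-file", template_file,
    ]
    if parameters_file is not None:
        # The Azure CLI reads parameters from a file when the path is prefixed with "@".
        command += ["--parameters", f"@{parameters_file}"]
    return command


def deploy(resource_group: str, template_file: str, parameters_file: Optional[str] = None) -> None:
    """Run the deployment; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(
        build_deploy_command(resource_group, template_file, parameters_file),
        check=True,
    )


# Example call (placeholder names, not taken from the repository):
# deploy("rg-doc-analyzer", "infra/main.json")
```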
Limitations
Documents must currently be under 50 MB. Scanned PDFs must first go through OCR pre-processing with Azure Document Intelligence.
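A client could check these limits before uploading. The following is a minimal sketch rather than the tool's own validation logic: the function name, the `is_scanned` flag (which the caller would have to determine), and the reading of 50 MB as 50 × 1024 × 1024 bytes are all assumptions.

```python
from pathlib import Path

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumes "50 MB" means 50 MiB
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx"}


def check_document(name: str, size_bytes: int, is_scanned: bool = False) -> list[str]:
    """Return a list of problems that would block processing; empty means OK."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    # Documents must be strictly under the size limit.
    if size_bytes >= MAX_FILE_BYTES:
        problems.append("file is 50 MB or larger")
    if suffix == ".pdf" and is_scanned:
        problems.append("scanned PDF: run OCR with Azure Document Intelligence first")
    return problems
```

For example, `check_document("contract.pdf", 1_000_000)` returns an empty list, while a scanned PDF or a 60 MB workbook would each return one problem message.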