Real-Time Ingestion

Real-Time Content Ingestion API Guide

The Real-Time Ingestion API lets developers upload text content directly for immediate processing and vectorization. Unlike batch upload methods, this API processes content in real time, which makes it suitable for dynamic content that must be available for search or retrieval immediately.

API Endpoint

POST /ingestion/v1/real_time_upload

Base URL: https://platform.ai.gloo.com

Authentication

The API requires a JWT token with specific claims to authorize access.

The token must be associated with a Client ID that has access to the specified publisher. The system checks that the organization associated with your Client ID has permission to access the publisher specified in your request.

Headers

Authorization: Bearer <your_access_token>
Content-Type: application/json
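A minimal sketch of assembling this request with Python's standard library, combining the base URL, endpoint path, and headers listed above. The function name build_ingestion_request is hypothetical, and how you obtain the access token is outside its scope; it only prepares the request and does not send it.

```python
import json
import urllib.request

# Values taken from the endpoint and base URL documented above.
BASE_URL = "https://platform.ai.gloo.com"
UPLOAD_PATH = "/ingestion/v1/real_time_upload"


def build_ingestion_request(access_token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the real-time upload endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + UPLOAD_PATH,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a call such as urllib.request.urlopen(req), which raises urllib.error.HTTPError for 4xx and 5xx responses.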


JWT Token Requirements

Your JWT token must include:

  • A sub claim containing your API client ID
  • A scope claim that includes api/access

The API client must be associated with an API key that belongs to the same organization as the publisher you're uploading content for.
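Before calling the API, it can help to confirm locally that a token carries the two claims listed above. The sketch below decodes the JWT payload without verifying the signature (the server still performs the real validation). The helper names are hypothetical, and it assumes scope is a space-delimited string in the OAuth 2.0 style, while also accepting a JSON array.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the JWT payload claims WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def ingestion_claim_problems(token: str) -> list[str]:
    """List reasons the token fails the documented sub/scope requirements."""
    claims = decode_jwt_claims(token)
    problems = []
    if not claims.get("sub"):
        problems.append("missing 'sub' claim (API client ID)")
    scope = claims.get("scope", "")
    # Assumption: scope is space-delimited (OAuth 2.0 style); a list is accepted too.
    scopes = scope.split() if isinstance(scope, str) else list(scope)
    if "api/access" not in scopes:
        problems.append("'scope' claim does not include api/access")
    return problems
```

An empty list means both claims are present; it says nothing about whether the client's organization owns the target publisher.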


NOTE: You can only send content to your own publisher, so double-check the publisher_id field in the request body for accuracy.

Request Body


The request body should be a JSON object with the following structure:


{
  "content": "This is the full text content that needs to be processed and indexed for search. It can be as long as needed to represent the document content.",
  "filename": "sample_document_name.txt",
  "producer_id": "producer-123",
  "publisher_id": "550e8400-e29b-41d4-a716-446655440000",
  "denomination": "Catholic",
  "evergreen": true,
  "drm": ["aspen", "kallm"],
  "author": ["Jane Doe", "John Smith"],
  "isbn": "978-3-16-148410-0",
  
  "item_title": "Main Document Title",
  "item_subtitle": "An Informative Subtitle",
  "item_image": "https://example.com/images/document-cover.jpg",
  "item_url": "https://example.com/original-content",
  "item_file": "https://example.com/downloads/document.pdf",
  "item_summary": "A brief summary of the document's content and purpose.",
  "item_number": "DOC-2023-001",
  "item_extra": "Additional information about this item",
  "item_tags": ["documentation", "api", "tutorial", "reference"],
  
  "h2_title": "Section Heading",
  "h2_subtitle": "Section Subheading",
  "h2_image": "https://example.com/images/section-image.jpg",
  "h2_url": "https://example.com/section",
  "h2_file": "https://example.com/downloads/section.pdf",
  "h2_summary": "Summary of this specific section",
  "h2_number": "2.1",
  "h2_extra": "Additional section metadata",
  
  "h3_title": "Subsection Heading",
  "h3_subtitle": "Subsection Subheading",
  "h3_image": "https://example.com/images/subsection-image.jpg",
  "h3_url": "https://example.com/subsection",
  "h3_file": "https://example.com/downloads/subsection.pdf",
  "h3_summary": "Summary of this specific subsection",
  "h3_number": "2.1.3",
  "h3_extra": "Additional subsection metadata",
  
  "type": "Article",
  "duration": "15 minutes",
  "pages": "12",
  "publication_date": "2023-07-15",
  "hosted_url": "https://cdn.example.com/hosted-content",
  "pub_type": "technical"
}
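A minimal sketch for the common case of ingesting a local text file, assuming the file is UTF-8 encoded and that its own name is an acceptable filename value. payload_from_file is a hypothetical helper: any optional field from the example above can be passed as a keyword argument (including filename, to override the default), and keywords set to None are left out of the body.

```python
from pathlib import Path


def payload_from_file(path: str, publisher_id: str, **metadata) -> dict:
    """Build a request body from a UTF-8 text file plus optional metadata fields."""
    file_path = Path(path)
    payload = {
        "publisher_id": publisher_id,
        "content": file_path.read_text(encoding="utf-8"),
        # Assumption: reuse the file's own name as the custom filename.
        "filename": file_path.name,
    }
    # Drop metadata left as None so unset optional fields are simply omitted.
    payload.update({key: value for key, value in metadata.items() if value is not None})
    return payload
```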

Required and Optional Fields

Core Required Fields

Field | Type | Description
publisher_id | UUID string | UUID of the publisher associated with the content. The publisher must belong to your organization.
content | String | The actual text content to be ingested and chunked
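The two core fields can be checked locally before a request is sent. This sketch (the name required_field_problems is hypothetical) confirms both fields are present, that publisher_id parses as a UUID, and, as an added assumption, that content is a non-empty string. It cannot confirm that the publisher belongs to your organization; only the server can.

```python
import uuid

REQUIRED_FIELDS = ("publisher_id", "content")


def required_field_problems(payload: dict) -> list[str]:
    """Check the core required fields locally before calling the API."""
    problems = [f"missing required field '{name}'" for name in REQUIRED_FIELDS if name not in payload]
    content = payload.get("content")
    # Assumption: empty or whitespace-only content is treated as a mistake.
    if "content" in payload and (not isinstance(content, str) or not content.strip()):
        problems.append("'content' must be a non-empty string")
    if "publisher_id" in payload:
        try:
            uuid.UUID(str(payload["publisher_id"]))
        except ValueError:
            problems.append("'publisher_id' is not a valid UUID")
    return problems
```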


All Available Metadata


Metadata Fields

Apart from content, the fields below are optional, but supplying them improves later retrieval of the ingested content.

Field | Type | Required | Description
content | String | Yes | The actual text content to be ingested and chunked
filename | String | No | Custom filename for this content
type | String | No | Content type (e.g., article, blog, tutorial)
item_title | String | No* | Title of the content item
item_subtitle | String | No | Subtitle of the content item
item_summary | String | No | Brief summary of the content
item_image | String | No | URL of an image associated with the content
item_url | String | No | URL of the original content
item_file | String | No | URL of a file associated with the content
item_number | String | No | Identifying number for the content item
item_extra | String | No | Additional information about the content item
item_tags | Array or String | No | Tags associated with the content (an array or a comma-separated string)
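Because item_tags accepts either an array or a comma-separated string, a client may want to normalize its own input before sending it. A minimal sketch with a hypothetical helper name that always yields a clean list:

```python
def as_string_list(value) -> list[str]:
    """Normalize an 'Array or String' field such as item_tags into a clean list."""
    if value is None:
        return []
    if isinstance(value, str):
        # A comma-separated string, e.g. "api, tutorial"
        items = value.split(",")
    else:
        items = list(value)
    # Trim whitespace and drop empty entries, such as one left by a trailing comma.
    return [item.strip() for item in items if item.strip()]
```

Sending an array is the safer choice whenever a single value may itself contain a comma (for example an author written as "Doe, Jane"), since the comma-separated form cannot distinguish that from two values.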

Author and Publishing Information

Field | Type | Required | Description
author | Array or String | No | Author(s) of the content (an array or a comma-separated string)
isbn | String | No | ISBN, if the content is from a book
publication_date | String | No | Date when the content was published (recommended format: YYYY-MM-DD)
producer_id | String | No | ID of the content producer
denomination | String | No | Religious denomination (if applicable)
pub_type | String | No | Publication type
hosted_url | String | No | URL where the content is hosted
pages | String | No | Number of pages (for documents)
duration | String | No | Duration (for audio/video content)
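To follow the recommended YYYY-MM-DD format for publication_date, a small helper (hypothetical name) can convert date or datetime objects and normalize strings such as "2023-7-15". It rejects other layouts rather than guessing at them.

```python
from datetime import date, datetime


def format_publication_date(value) -> str:
    """Return publication_date in the recommended YYYY-MM-DD form."""
    if isinstance(value, datetime):  # check first: datetime is a subclass of date
        value = value.date()
    if isinstance(value, date):
        return value.isoformat()
    # Otherwise parse a string; strptime raises ValueError for other layouts.
    return datetime.strptime(value, "%Y-%m-%d").date().isoformat()
```

Note that pages and duration are also typed as strings, so numeric values such as a page count should be sent as, for example, str(12).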

Hierarchical Structure (for organized content)

Field | Type | Required | Description
h2_title | String | No | Title for level 2 heading/section
h2_subtitle | String | No | Subtitle for level 2 heading/section
h2_image | String | No | Image URL for level 2 heading/section
h2_url | String | No | URL for level 2 heading/section
h2_file | String | No | File URL for level 2 heading/section
h2_summary | String | No | Summary for level 2 heading/section
h2_number | String | No | Number for level 2 heading/section
h2_extra | String | No | Additional info for level 2 heading/section
h3_title | String | No | Title for level 3 heading/section
h3_subtitle | String | No | Subtitle for level 3 heading/section
h3_image | String | No | Image URL for level 3 heading/section
h3_url | String | No | URL for level 3 heading/section
h3_file | String | No | File URL for level 3 heading/section
h3_summary | String | No | Summary for level 3 heading/section
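One way to use these fields, assuming each section of a larger document is uploaded as its own request so that its h2_ fields describe that section, is sketched below. This per-section pattern is an illustration, not a documented requirement, and heading_fields and section_payloads are hypothetical helpers. The accepted attribute names come from the h2_/h3_ fields shown in the request body example.

```python
HEADING_ATTRS = ("title", "subtitle", "image", "url", "file", "summary", "number", "extra")


def heading_fields(level: int, **attrs: str) -> dict[str, str]:
    """Map e.g. heading_fields(2, title="Intro") to {"h2_title": "Intro"}."""
    if level not in (2, 3):
        raise ValueError("only h2 and h3 fields are documented")
    unknown = sorted(set(attrs) - set(HEADING_ATTRS))
    if unknown:
        raise ValueError(f"unsupported heading attributes: {unknown}")
    return {f"h{level}_{name}": value for name, value in attrs.items()}


def section_payloads(publisher_id: str, item_title: str, sections: list[tuple[str, str]]) -> list[dict]:
    """Build one upload body per (section heading, section text) pair."""
    payloads = []
    for number, (heading, text) in enumerate(sections, start=1):
        payload = {"publisher_id": publisher_id, "content": text, "item_title": item_title}
        # Number sections 1, 2, ... as strings, since h2_number is a String field.
        payload.update(heading_fields(2, title=heading, number=str(number)))
        payloads.append(payload)
    return payloads
```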