Version 6 (supported)

Text extraction

This module provides a framework for extracting text content from various file formats, such as PDFs and Office documents. The extracted content can be used programmatically or made available directly on your File objects.

Installation

    composer require silverstripe/textextraction

GitHub repository

https://github.com/silverstripe/silverstripe-textextraction

Configuration

Configuration options, including enabling extraction for DataObjects, managing cached content length, swapping cache backends, and configuring PDF text extraction.
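
As a rough illustration of what enabling extraction on File objects can look like, here is a minimal YAML config sketch. It assumes the extension class is namespaced as `SilverStripe\TextExtraction\Extension\FileTextExtractable` and that it is applied to `SilverStripe\Assets\File`. Neither name is stated on this page, so check the Configuration page for your installed version before relying on it.

```yaml
# Assumed class names; verify against the Configuration page for your version.
SilverStripe\Assets\File:
  extensions:
    - SilverStripe\TextExtraction\Extension\FileTextExtractable
```

After a config change like this, the site's configuration cache typically needs a flush before the extension takes effect.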

Usage

Methods for extracting text from a file path or a File object, and through the FileTextExtractable extension.

Tika

Extracting text with Apache Tika, in either CLI or REST server mode.