PDF Scraping

PDF Scraping was added to V³ Solutions’ arsenal of Information Management tools to deliver the valuable information that is effectively “locked away” in PDF documents. Our due diligence of the available PDF Scraping providers was guided by the exacting standards we apply to our Information Delivery solutions. Our research and subsequent pilot testing led us to PDF Scraping from AddToIt, which offers a robust PDF Scraping engine flexible enough to handle the many challenges of oilfield information. Our search for a PDF Scraping partner was based on three necessary elements.

Contextual Output – unlike many PDF products on the market, the tool had to produce XML-based output, which gives the extracted data meaning.

Scripting Language – we insisted on a tool with a scripting engine, giving us the power to integrate PDF scraping into our Information Delivery framework.

Proven Technology – we recognized that “true” PDF scraping is a very difficult programming challenge, so we insisted on a PDF scraping provider that had already proven its technology in other industries.

[tab name=’Features’]

Template-based: Templates are created to identify the data to be scraped (extracted) from the PDF files.
XML-based: Information scraped from the PDF files is stored in XML files, providing a contextual basis to the information for further processing.
Reliability: The AddToIt PDF Scraping technology has proven to be very reliable, approaching 99.9% accuracy and repeatability.
Flexibility: The AddToIt PDF Scraping system contains a scripting engine, allowing additional structures to be scraped.
Performance: Large PDF files (drilling/completion reports) are scraped in seconds.
XML Standards: Information scraped from the PDF files can be normalized to standard XML data sets, e.g. WITSML-type XML standards.
Web-Based Service: The AddToIt PDF Scraping web engine offers complete flexibility in creating solutions for the oilfield.

[/tab]

[tab name=’Screen Shots’]

Simple Structures

Complex Structures

Normalized Data

[/tab]

[end_tabset]
