File Format Processing

Libraries for parsing and manipulating specific text formats.

Python tool for converting files and office documents to Markdown.
(Python standard library) Parse TOML files.
Library for converting documents into structured data.
A library capable of splitting, merging, cropping, and transforming PDF pages.
A visual rendering engine for HTML and CSS that can export to PDF.
High-performance document extraction library with a Rust core, supporting 62+ formats including PDF, Office, images with OCR, HTML, email, and archives.
Pdfminer.six is a community-maintained fork of the original PDFMiner.
Utilities for converting to and working with CSV.
Reads, queries and modifies Microsoft Word 2007/2008 docx files.
A module for Tabular Datasets in XLS, CSV, JSON, YAML.
A Python implementation of John Gruber’s Markdown.
A Python module for creating Excel .xlsx files.
A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
Python library for creating and updating PowerPoint (.pptx) files.
A fast, full-featured Markdown parser written in pure Python.
YAML implementations for Python.
A powerful library for reading and editing PDF files, based on qpdf.
Edits a docx document using a Jinja2 template.
Parsing and analyzing ELF files and DWARF debugging information.
Markdown parser with 100% CommonMark support, extensions, and syntax plugins.
Provides one API for reading, manipulating, and writing csv, ods, xls, xlsx, and xlsm files.
A fast PDF library for text extraction, image extraction, and markdown conversion, powered by Rust.
A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
Allows rapid creation of rich PDF documents.
