File Format Processing / Text & Documents

1. markitdown (Markdown; 119,566 stars; updated 2026-04-20): Python tool for converting files and Office documents to Markdown.
2. tomllib (Data Formats; built-in; 72,556 stars; updated 2026-05-02): Python standard-library module (Python 3.11+) for parsing TOML files. It can read TOML but not write it.
3. docling (General; 59,006 stars; updated 2026-04-30): Library for converting documents such as PDF, DOCX, PPTX, and HTML into structured data that can be exported to formats like Markdown and JSON.
4. pypdf (PDF; 9,977 stars; updated 2026-04-30): A pure-Python library for splitting, merging, cropping, and transforming PDF pages.
5. weasyprint (PDF; 8,924 stars; updated 2026-04-28): A visual rendering engine for HTML and CSS that can export to PDF.
6. kreuzberg (General; 8,193 stars; updated 2026-05-02): High-performance document extraction library with a Rust core, supporting 62+ formats including PDF, Office, images with OCR, HTML, email, and archives.
7. pdfminer.six (PDF; 6,966 stars; updated 2026-03-13): A community-maintained fork of the original PDFMiner for extracting text and layout information from PDF files.
8. csvkit (Data Formats; 6,367 stars; updated 2026-03-26): A suite of command-line utilities for converting to and working with CSV.
9. python-docx (MS Office; 5,558 stars; updated 2025-06-16): Reads, queries, and modifies Microsoft Word 2007/2008 .docx files.
10. tablib (General; 4,750 stars; updated 2026-04-06): A module for working with tabular datasets and importing or exporting them in formats such as XLS, CSV, JSON, and YAML.
11. markdown (Markdown; 4,203 stars; updated 2026-02-09): A Python implementation of John Gruber’s Markdown.
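Below is a minimal sketch of converting Markdown to HTML with Python-Markdown (the `markdown` package). The input text is invented. The bundled `tables` extension is enabled by name to show how extensions are switched on.

```python
import markdown

source = "# Title\n\nSome *emphasis* and a [link](https://example.com)."
html = markdown.markdown(source)
print(html)
# <h1>Title</h1>
# <p>Some <em>emphasis</em> and a <a href="https://example.com">link</a>.</p>

# Extensions are enabled by name; "tables" ships with the package.
table_src = "| a | b |\n|---|---|\n| 1 | 2 |"
table_html = markdown.markdown(table_src, extensions=["tables"])
```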
12. xlsxwriter (MS Office; 3,936 stars; updated 2026-03-22): A Python module for creating Excel .xlsx files.
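The following minimal sketch writes a small workbook with XlsxWriter into memory using the `in_memory` option. The sheet name and values are invented. XlsxWriter only writes files. It stores the formula, and Excel calculates it when the file is opened.

```python
import io
import zipfile
import xlsxwriter

buf = io.BytesIO()
# in_memory avoids temporary files, which suits writing to a BytesIO.
workbook = xlsxwriter.Workbook(buf, {"in_memory": True})
worksheet = workbook.add_worksheet("Sales")
bold = workbook.add_format({"bold": True})

worksheet.write_row(0, 0, ["Item", "Qty"], bold)
worksheet.write_row(1, 0, ["Apples", 3])
worksheet.write_row(2, 0, ["Pears", 5])
worksheet.write_formula(3, 1, "=SUM(B2:B3)")
workbook.close()  # the file contents are produced on close()

# An .xlsx file is a ZIP archive of XML parts.
names = zipfile.ZipFile(io.BytesIO(buf.getvalue())).namelist()
```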
13. xlwings (MS Office; 3,344 stars; updated 2026-04-27): A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
14. python-pptx (MS Office; 3,332 stars; updated 2024-08-07): Python library for creating and updating PowerPoint (.pptx) files.
15. mistune (Markdown; 3,022 stars; updated 2026-04-13): A fast, full-featured, pure-Python Markdown parser.
16. pyyaml (Data Formats; 2,885 stars; updated 2025-09-25): A YAML parser and emitter for Python.
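The sketch below shows a minimal PyYAML round trip. The document content is invented. `safe_load` and `safe_dump` are used because they handle only plain Python types such as dicts, lists, strings, and numbers. The unrestricted loaders can build arbitrary objects and are not suitable for untrusted input.

```python
import yaml

doc = """
name: example
tags: [pdf, docx]
limits:
  max_pages: 50
"""

data = yaml.safe_load(doc)  # builds only plain Python objects
print(data["limits"]["max_pages"])  # 50

# Emit YAML again, keeping the original key order.
text = yaml.safe_dump(data, sort_keys=False)
```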
17. pikepdf (PDF; 2,707 stars; updated 2026-04-28): A powerful library for reading and editing PDF files, based on qpdf.
18. docxtpl (MS Office; 2,626 stars; updated 2025-11-13): Generates and edits .docx documents from Jinja2 templates.
19. pyelftools (General; 2,233 stars; updated 2026-05-01): Parsing and analyzing ELF files and DWARF debugging information.
20. markdown-it-py (Markdown; 1,299 stars; updated 2026-02-18): Markdown parser with 100% CommonMark support, extensions, and syntax plugins.
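This minimal markdown-it-py sketch covers rendering, token parsing, and enabling an extra rule. It uses the strict `"commonmark"` preset and then switches on the `table` rule, which plain CommonMark does not include. The input strings are invented.

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")  # strict CommonMark preset
html = md.render("# Title\n\n- one\n- two\n")

# parse() exposes the flat token stream used by plugins and renderers.
tokens = md.parse("# Title")
print([t.type for t in tokens])  # ['heading_open', 'inline', 'heading_close']

# Enable GFM-style tables on top of CommonMark (enable() returns the parser).
md_tables = MarkdownIt("commonmark").enable("table")
table_html = md_tables.render("| a | b |\n|---|---|\n| 1 | 2 |\n")
```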
21. pyexcel (MS Office; 1,281 stars; updated 2025-12-10): Provides a single API for reading, manipulating, and writing csv, ods, xls, xlsx, and xlsm files.
22. pdf_oxide (PDF; 717 stars; updated 2026-05-01): A fast PDF library for text extraction, image extraction, and Markdown conversion, powered by Rust.
23. openpyxl (MS Office; External): A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
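Below is a minimal openpyxl sketch that writes a workbook and reads it back. The sheet name and values are invented. openpyxl saves formulas as text and does not evaluate them, so the formula cell reads back as the formula string.

```python
import io
from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws.title = "Inventory"
ws.append(["Item", "Qty"])
ws.append(["Apples", 3])
ws.append(["Pears", 5])
ws["B4"] = "=SUM(B2:B3)"  # stored as a formula, not evaluated by openpyxl

buf = io.BytesIO()
wb.save(buf)

# Read it back and pull plain values row by row.
buf.seek(0)
ws2 = load_workbook(buf)["Inventory"]
rows = list(ws2.iter_rows(min_row=1, max_row=3, values_only=True))
print(rows)  # [('Item', 'Qty'), ('Apples', 3), ('Pears', 5)]
```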
24. reportlab (PDF; External): Enables rapid creation of rich PDF documents.