1
markitdown
Markdown
154,675
2026-05-26
Markdown
File Format Processing
Text & Documents
→
Python tool for converting files and office documents to Markdown.
2
difflib
General
73,285
2026-06-16
General
Text Processing
Text & Documents
Built-in
→
(Python standard library) Helpers for computing deltas.
3
mimetypes
Text & Documents
73,285
2026-06-16
File Manipulation
Text & Documents
Built-in
→
(Python standard library) Map filenames to MIME types.
4
pathlib
Text & Documents
73,285
2026-06-16
File Manipulation
Text & Documents
Built-in
→
(Python standard library) A cross-platform, object-oriented path library.
5
tomllib
Data Formats
73,285
2026-06-16
Data Formats
File Format Processing
Text & Documents
Built-in
→
(Python standard library) Parse TOML files.
6
docling
General
61,699
2026-06-16
General
File Format Processing
Text & Documents
→
Library for converting documents into structured data.
7
pypdf
PDF
10,065
2026-06-16
PDF
File Format Processing
Text & Documents
→
A library capable of splitting, merging, cropping, and transforming PDF pages.
8
weasyprint
PDF
9,286
2026-06-16
PDF
File Format Processing
Text & Documents
→
A visual rendering engine for HTML and CSS that can export to PDF.
9
kreuzberg
General
8,498
2026-06-16
General
File Format Processing
Text & Documents
→
High-performance document extraction library with a Rust core, supporting 62+ formats including PDF, Office, images with OCR, HTML, email, and archives.
10
watchdog
Text & Documents
7,371
2026-05-07
File Manipulation
Text & Documents
→
API and shell utilities to monitor file system events.
11
pdfminer.six
PDF
6,991
2026-03-13
PDF
File Format Processing
Text & Documents
→
A community-maintained fork of the original PDFMiner.
12
csvkit
Data Formats
6,390
2026-06-08
Data Formats
File Format Processing
Text & Documents
→
Utilities for converting to and working with CSV.
13
xmltodict
Text & Documents
5,741
2026-06-15
HTML Manipulation
Text & Documents
→
Makes working with XML feel like working with JSON.
14
python-docx
MS Office
5,639
2025-06-16
MS Office
File Format Processing
Text & Documents
→
Reads, queries and modifies Microsoft Word 2007/2008 docx files.
15
pypinyin
General
5,325
2026-04-19
General
Text Processing
Text & Documents
→
Convert Chinese hanzi (漢字) to pinyin (拼音).
16
tablib
General
4,754
2026-06-16
General
File Format Processing
Text & Documents
→
A module for tabular datasets in XLS, CSV, JSON, and YAML.
17
markdown
Markdown
4,215
2026-05-26
Markdown
File Format Processing
Text & Documents
→
A Python implementation of John Gruber’s Markdown.
18
ftfy
General
4,043
2024-10-30
General
Text Processing
Text & Documents
→
Makes Unicode text less broken and more consistent automagically.
19
sqlparse
Parser
4,008
2026-06-06
Parser
Text Processing
Text & Documents
→
A non-validating SQL parser.
20
xlsxwriter
MS Office
3,948
2026-03-22
MS Office
File Format Processing
Text & Documents
→
A Python module for creating Excel .xlsx files.
21
python-phonenumbers
Parser
3,749
2026-06-05
Parser
Text Processing
Text & Documents
→
Parsing, formatting, storing and validating international phone numbers.
22
textdistance
General
3,533
2025-04-18
General
Text Processing
Text & Documents
→
Compute distance between sequences with 30+ algorithms.
23
python-pptx
MS Office
3,422
2024-08-07
MS Office
File Format Processing
Text & Documents
→
Python library for creating and updating PowerPoint (.pptx) files.
24
xlwings
MS Office
3,363
2026-06-16
MS Office
File Format Processing
Text & Documents
→
A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
25
mistune
Markdown
3,043
2026-06-16
Markdown
File Format Processing
Text & Documents
→
A fast, full-featured pure-Python Markdown parser.
26
lxml
Text & Documents
3,037
2026-06-16
HTML Manipulation
Text & Documents
→
A very fast, easy-to-use and versatile library for handling HTML and XML.
27
python-magic
Text & Documents
2,911
2026-05-11
File Manipulation
Text & Documents
→
A Python interface to the libmagic file type identification library.
28
pyyaml
Data Formats
2,900
2025-09-25
Data Formats
File Format Processing
Text & Documents
→
YAML implementations for Python.
29
pikepdf
PDF
2,744
2026-06-08
PDF
File Format Processing
Text & Documents
→
A powerful library for reading and editing PDF files, based on qpdf.
30
docxtpl
MS Office
2,662
2025-11-13
MS Office
File Format Processing
Text & Documents
→
Edits a docx document using a Jinja2 template.
31
chardet
General
2,638
2026-05-05
General
Text Processing
Text & Documents
→
Python character encoding detector.
32
watchfiles
Text & Documents
2,505
2026-06-13
File Manipulation
Text & Documents
→
Simple, modern, and fast file watching and code reload in Python.
33
pyparsing
Parser
2,474
2026-06-01
Parser
Text Processing
Text & Documents
→
A general purpose framework for generating parsers.
34
pyquery
Text & Documents
2,380
2026-02-18
HTML Manipulation
Text & Documents
→
A jQuery-like library for parsing HTML.
35
pyelftools
General
2,249
2026-06-04
General
File Format Processing
Text & Documents
→
Parsing and analyzing ELF files and DWARF debugging information.
36
shortuuid
Unique identifiers
2,188
2025-12-01
Unique identifiers
Text Processing
Text & Documents
→
A generator library for concise, unambiguous and URL-safe UUIDs.
37
pygments
Parser
2,174
2026-06-11
Parser
Text Processing
Text & Documents
→
A generic syntax highlighter.
38
python-slugify
General
1,618
2026-01-07
General
Text Processing
Text & Documents
→
A Python slugify library that translates Unicode to ASCII.
39
pyfiglet
General
1,564
2025-08-15
General
Text Processing
Text & Documents
→
An implementation of figlet written in Python.
40
python-user-agents
Parser
1,516
2023-02-16
Parser
Text Processing
Text & Documents
→
Browser user agent parser.
41
babel
General
1,450
2026-04-20
General
Text Processing
Text & Documents
→
An internationalization library for Python.
42
markdown-it-py
Markdown
1,324
2026-05-19
Markdown
File Format Processing
Text & Documents
→
Markdown parser with 100% CommonMark support, extensions, and syntax plugins.
43
pyexcel
MS Office
1,283
2025-12-10
MS Office
File Format Processing
Text & Documents
→
Provides one API for reading, manipulating, and writing csv, ods, xls, xlsx, and xlsm files.
44
justhtml
Text & Documents
1,143
2026-06-12
HTML Manipulation
Text & Documents
→
A pure Python HTML5 parser that just works.
45
pdf_oxide
PDF
834
2026-06-16
PDF
File Format Processing
Text & Documents
→
A fast PDF library for text extraction, image extraction, and markdown conversion, powered by Rust.
46
python-nameparser
Parser
708
2026-06-11
Parser
Text Processing
Text & Documents
→
Parsing human names into their individual components.
47
markupsafe
Text & Documents
691
2025-09-27
HTML Manipulation
Text & Documents
→
Implements an XML/HTML/XHTML markup-safe string for Python.
48
unidecode
General
610
2026-01-05
General
Text Processing
Text & Documents
→
ASCII transliterations of Unicode text.
49
sqids
Unique identifiers
508
2025-03-26
Unique identifiers
Text Processing
Text & Documents
→
A library for generating short unique IDs from numbers.
50
pangu.py
General
276
2023-03-30
General
Text Processing
Text & Documents
→
Paranoid text spacing.
51
tinycss2
Text & Documents
186
2025-11-23
HTML Manipulation
Text & Documents
→
A low-level CSS parser and generator written in Python.
52
beautifulsoup
Text & Documents
External
—
HTML Manipulation
Text & Documents
→
Provides Pythonic idioms for iterating, searching, and modifying HTML or XML.
53
openpyxl
MS Office
External
—
MS Office
File Format Processing
Text & Documents
→
A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
54
reportlab
PDF
External
—
PDF
File Format Processing
Text & Documents
→
Allows rapid creation of rich PDF documents.
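As a rough illustration of three of the built-in entries in this list (difflib, mimetypes, and pathlib), here is a minimal sketch using only the Python standard library. The file names and sample strings are made up for the example and do not come from the listing.

```python
import difflib
import mimetypes
from pathlib import PurePosixPath

# difflib: compute a line-level delta between two small "documents".
old = ["alpha", "beta", "gamma"]
new = ["alpha", "beta2", "gamma"]
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt", lineterm="")
)

# difflib: similarity score between two strings, in the range 0.0 to 1.0.
ratio = difflib.SequenceMatcher(None, "markdown", "markitdown").ratio()

# mimetypes: guess a MIME type from a file name (the file need not exist).
mime_type, encoding = mimetypes.guess_type("report.pdf")

# pathlib: build and inspect a path with objects instead of string slicing.
# PurePosixPath is used so the result is the same on every platform.
path = PurePosixPath("docs") / "guide" / "intro.md"
html_name = path.with_suffix(".html").name

if __name__ == "__main__":
    print("\n".join(diff_lines))
    print(f"similarity: {ratio:.2f}")
    print(f"MIME type: {mime_type}")
    print(f"{path} -> {html_name}")
```

The third-party libraries in the list follow the same pattern of small, focused APIs, but their calls are not shown here because they vary by version.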