1
markitdown
Markdown
119,566
2026-04-20
Markdown
File Format Processing
Text & Documents
→
Python tool for converting files and office documents to Markdown.
2
difflib
General
72,556
2026-05-02
General
Text Processing
Text & Documents
Built-in
→
(Python standard library) Helpers for computing deltas.
3
mimetypes
Text & Documents
72,556
2026-05-02
File Manipulation
Text & Documents
Built-in
→
(Python standard library) Map filenames to MIME types.
4
pathlib
Text & Documents
72,556
2026-05-02
File Manipulation
Text & Documents
Built-in
→
(Python standard library) A cross-platform, object-oriented path library.
5
tomllib
Data Formats
72,556
2026-05-02
Data Formats
File Format Processing
Text & Documents
Built-in
→
(Python standard library) Parse TOML files.
6
docling
General
59,006
2026-04-30
General
File Format Processing
Text & Documents
→
Library for converting documents into structured data.
7
pypdf
PDF
9,977
2026-04-30
PDF
File Format Processing
Text & Documents
→
A library capable of splitting, merging, cropping, and transforming PDF pages.
8
weasyprint
PDF
8,924
2026-04-28
PDF
File Format Processing
Text & Documents
→
A visual rendering engine for HTML and CSS that can export to PDF.
9
kreuzberg
General
8,193
2026-05-02
General
File Format Processing
Text & Documents
→
High-performance document extraction library with a Rust core, supporting 62+ formats including PDF, Office, images with OCR, HTML, email, and archives.
10
watchdog
Text & Documents
7,328
2026-04-14
File Manipulation
Text & Documents
→
API and shell utilities to monitor file system events.
11
pdfminer.six
PDF
6,966
2026-03-13
PDF
File Format Processing
Text & Documents
→
A community-maintained fork of the original PDFMiner.
12
csvkit
Data Formats
6,367
2026-03-26
Data Formats
File Format Processing
Text & Documents
→
Utilities for converting to and working with CSV.
13
xmltodict
Text & Documents
5,734
2026-03-23
HTML Manipulation
Text & Documents
→
Makes working with XML feel like working with JSON.
14
python-docx
MS Office
5,558
2025-06-16
MS Office
File Format Processing
Text & Documents
→
Reads, queries and modifies Microsoft Word 2007/2008 docx files.
15
pypinyin
General
5,292
2026-04-19
General
Text Processing
Text & Documents
→
Convert Chinese hanzi (漢字) to pinyin (拼音).
16
tablib
General
4,750
2026-04-06
General
File Format Processing
Text & Documents
→
A module for tabular datasets in XLS, CSV, JSON, and YAML.
17
markdown
Markdown
4,203
2026-02-09
Markdown
File Format Processing
Text & Documents
→
A Python implementation of John Gruber’s Markdown.
18
ftfy
General
4,036
2024-10-30
General
Text Processing
Text & Documents
→
Makes Unicode text less broken and more consistent automagically.
19
sqlparse
Parser
4,003
2025-12-19
Parser
Text Processing
Text & Documents
→
A non-validating SQL parser.
20
xlsxwriter
MS Office
3,936
2026-03-22
MS Office
File Format Processing
Text & Documents
→
A Python module for creating Excel .xlsx files.
21
python-phonenumbers
Parser
3,734
2026-04-25
Parser
Text Processing
Text & Documents
→
Parsing, formatting, storing and validating international phone numbers.
22
textdistance
General
3,527
2025-04-18
General
Text Processing
Text & Documents
→
Compute distance between sequences with 30+ algorithms.
23
xlwings
MS Office
3,344
2026-04-27
MS Office
File Format Processing
Text & Documents
→
A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
24
python-pptx
MS Office
3,332
2024-08-07
MS Office
File Format Processing
Text & Documents
→
Python library for creating and updating PowerPoint (.pptx) files.
25
mistune
Markdown
3,022
2026-04-13
Markdown
File Format Processing
Text & Documents
→
A fast, full-featured Markdown parser written in pure Python.
26
lxml
Text & Documents
3,018
2026-04-24
HTML Manipulation
Text & Documents
→
A very fast, easy-to-use and versatile library for handling HTML and XML.
27
python-magic
Text & Documents
2,907
2026-03-03
File Manipulation
Text & Documents
→
A Python interface to the libmagic file type identification library.
28
pyyaml
Data Formats
2,885
2025-09-25
Data Formats
File Format Processing
Text & Documents
→
YAML implementations for Python.
29
pikepdf
PDF
2,707
2026-04-28
PDF
File Format Processing
Text & Documents
→
A powerful library for reading and editing PDF files, based on qpdf.
30
docxtpl
MS Office
2,626
2025-11-13
MS Office
File Format Processing
Text & Documents
→
Edits a docx document using a Jinja2 template.
31
chardet
General
2,618
2026-04-13
General
Text Processing
Text & Documents
→
Python 2/3 compatible character encoding detector.
32
watchfiles
Text & Documents
2,472
2025-11-28
File Manipulation
Text & Documents
→
Simple, modern and fast file watching and code reload in Python.
33
pyparsing
Parser
2,466
2026-04-30
Parser
Text Processing
Text & Documents
→
A general purpose framework for generating parsers.
34
pyquery
Text & Documents
2,380
2026-02-18
HTML Manipulation
Text & Documents
→
A jQuery-like library for parsing HTML.
35
pyelftools
General
2,233
2026-05-01
General
File Format Processing
Text & Documents
→
Parsing and analyzing ELF files and DWARF debugging information.
36
shortuuid
Unique identifiers
2,182
2025-12-01
Unique identifiers
Text Processing
Text & Documents
→
A generator library for concise, unambiguous and URL-safe UUIDs.
37
pygments
Parser
2,151
2026-05-02
Parser
Text Processing
Text & Documents
→
A generic syntax highlighter.
38
python-slugify
General
1,608
2026-01-07
General
Text Processing
Text & Documents
→
A Python slugify library that translates Unicode to ASCII.
39
pyfiglet
General
1,556
2025-08-15
General
Text Processing
Text & Documents
→
An implementation of figlet written in Python.
40
python-user-agents
Parser
1,515
2023-02-16
Parser
Text Processing
Text & Documents
→
Browser user agent parser.
41
babel
General
1,442
2026-04-20
General
Text Processing
Text & Documents
→
An internationalization library for Python.
42
markdown-it-py
Markdown
1,299
2026-02-18
Markdown
File Format Processing
Text & Documents
→
Markdown parser with 100% CommonMark support, extensions, and syntax plugins.
43
pyexcel
MS Office
1,281
2025-12-10
MS Office
File Format Processing
Text & Documents
→
Provides one API for reading, manipulating, and writing csv, ods, xls, xlsx, and xlsm files.
44
justhtml
Text & Documents
1,134
2026-05-01
HTML Manipulation
Text & Documents
→
A pure Python HTML5 parser that just works.
45
pdf_oxide
PDF
717
2026-05-01
PDF
File Format Processing
Text & Documents
→
A fast PDF library for text extraction, image extraction, and markdown conversion, powered by Rust.
46
python-nameparser
Parser
706
2023-09-21
Parser
Text Processing
Text & Documents
→
Parsing human names into their individual components.
47
markupsafe
Text & Documents
686
2025-09-27
HTML Manipulation
Text & Documents
→
Implements an XML/HTML/XHTML markup-safe string for Python.
48
unidecode
General
606
2026-01-05
General
Text Processing
Text & Documents
→
ASCII transliterations of Unicode text.
49
sqids
Unique identifiers
502
2025-03-26
Unique identifiers
Text Processing
Text & Documents
→
A library for generating short unique IDs from numbers.
50
pangu.py
General
273
2023-03-30
General
Text Processing
Text & Documents
→
51
tinycss2
Text & Documents
184
2025-11-23
HTML Manipulation
Text & Documents
→
A low-level CSS parser and generator written in Python.
52
beautifulsoup
Text & Documents
External
—
HTML Manipulation
Text & Documents
→
Provides Pythonic idioms for iterating, searching, and modifying HTML or XML.
53
openpyxl
MS Office
External
—
MS Office
File Format Processing
Text & Documents
→
A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
54
reportlab
PDF
External
—
PDF
File Format Processing
Text & Documents
→
Allows rapid creation of rich PDF documents.