Text & Documents Python Libraries

Row number	Project and category	Stars	Last updated	Tags
1	markitdown Markdown	170,535	2026-07-29	Markdown File Format Processing Text & Documents	→
	Python tool for converting files and office documents to Markdown.
	microsoft/github.com/microsoft/markitdown /2026-07-29
2	difflib General	73,994	2026-07-31	General Text Processing Text & Documents Built-in	→
	(Python standard library) Helpers for computing deltas.
	python/docs.python.org/3/library/difflib.html /2026-07-31
3	mimetypes Text & Documents	73,994	2026-07-31	File Manipulation Text & Documents Built-in	→
	(Python standard library) Map filenames to MIME types.
	python/docs.python.org/3/library/mimetypes.html /2026-07-31
4	pathlib Text & Documents	73,994	2026-07-31	File Manipulation Text & Documents Built-in	→
	(Python standard library) A cross-platform, object-oriented path library.
	python/docs.python.org/3/library/pathlib.html /2026-07-31
5	tomllib Data Formats	73,994	2026-07-31	Data Formats File Format Processing Text & Documents Built-in	→
	(Python standard library) Parse TOML files.
	python/docs.python.org/3/library/tomllib.html /2026-07-31
6	docling General	64,076	2026-07-31	General File Format Processing Text & Documents	→
	Library for converting documents into structured data.
	docling-project/github.com/docling-project/docling /2026-07-31
7	pypdf PDF	10,138	2026-07-30	PDF File Format Processing Text & Documents	→
	A library capable of splitting, merging, cropping, and transforming PDF pages.
	py-pdf/github.com/py-pdf/pypdf /2026-07-30
8	weasyprint PDF	9,453	2026-07-29	PDF File Format Processing Text & Documents	→
	A visual rendering engine for HTML and CSS that can export to PDF.
	Kozea/github.com/Kozea/WeasyPrint /2026-07-29
9	xberg General	8,724	2026-07-31	General File Format Processing Text & Documents	→
	High-performance document intelligence library with a Rust core, extracting text, tables, and metadata from 97+ formats including PDF, Office, images (with OCR), HTML, email, and archives.
	xberg-io/github.com/xberg-io/xberg /2026-07-31
10	watchdog Text & Documents	7,388	2026-07-30	File Manipulation Text & Documents	→
	API and shell utilities to monitor file system events.
	gorakhargosh/github.com/gorakhargosh/watchdog /2026-07-30
11	pdfminer.six PDF	7,012	2026-03-13	PDF File Format Processing Text & Documents	→
	A community-maintained fork of the original PDFMiner.
	pdfminer/github.com/pdfminer/pdfminer.six /2026-03-13
12	csvkit Data Formats	6,404	2026-07-30	Data Formats File Format Processing Text & Documents	→
	Utilities for converting to and working with CSV.
	wireservice/github.com/wireservice/csvkit /2026-07-30
13	xmltodict Text & Documents	5,746	2026-06-15	HTML Manipulation Text & Documents	→
	Makes working with XML feel like working with JSON.
	martinblech/github.com/martinblech/xmltodict /2026-06-15
14	python-docx MS Office	5,689	2025-06-16	MS Office File Format Processing Text & Documents	→
	Reads, queries and modifies Microsoft Word 2007/2008 docx files.
	python-openxml/github.com/python-openxml/python-docx /2025-06-16
15	pypinyin General	5,342	2026-04-19	General Text Processing Text & Documents	→
	Convert Chinese hanzi (漢字) to pinyin (拼音).
	mozillazg/github.com/mozillazg/python-pinyin /2026-04-19
16	tablib General	4,755	2026-07-31	General File Format Processing Text & Documents	→
	A module for Tabular Datasets in XLS, CSV, JSON, YAML.
	jazzband/github.com/jazzband/tablib /2026-07-31
17	markdown Markdown	4,228	2026-07-30	Markdown File Format Processing Text & Documents	→
	A Python implementation of John Gruber’s Markdown.
	Python-Markdown/github.com/Python-Markdown/markdown /2026-07-30
18	ftfy General	4,052	2024-10-30	General Text Processing Text & Documents	→
	Makes Unicode text less broken and more consistent automagically.
	rspeer/github.com/rspeer/python-ftfy /2024-10-30
19	sqlparse Parser	4,009	2026-07-27	Parser Text Processing Text & Documents	→
	A non-validating SQL parser.
	andialbrecht/github.com/andialbrecht/sqlparse /2026-07-27
20	xlsxwriter MS Office	3,961	2026-07-02	MS Office File Format Processing Text & Documents	→
	A Python module for creating Excel .xlsx files.
	jmcnamara/github.com/jmcnamara/XlsxWriter /2026-07-02
21	python-phonenumbers Parser	3,763	2026-07-26	Parser Text Processing Text & Documents	→
	Parsing, formatting, storing and validating international phone numbers.
	daviddrysdale/github.com/daviddrysdale/python-phonenumbers /2026-07-26
22	textdistance General	3,536	2025-04-18	General Text Processing Text & Documents	→
	Compute distance between sequences with 30+ algorithms.
	life4/github.com/life4/textdistance /2025-04-18
23	python-pptx MS Office	3,472	2024-08-07	MS Office File Format Processing Text & Documents	→
	Python library for creating and updating PowerPoint (.pptx) files.
	scanny/github.com/scanny/python-pptx /2024-08-07
24	xlwings MS Office	3,390	2026-07-31	MS Office File Format Processing Text & Documents	→
	A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
	xlwings/github.com/xlwings/xlwings /2026-07-31
25	mistune Markdown	3,057	2026-07-25	Markdown File Format Processing Text & Documents	→
	A fast, full-featured pure-Python Markdown parser.
	lepture/github.com/lepture/mistune /2026-07-25
26	lxml Text & Documents	3,044	2026-07-20	HTML Manipulation Text & Documents	→
	A very fast, easy-to-use and versatile library for handling HTML and XML.
	lxml/github.com/lxml/lxml /2026-07-20
27	pyyaml Data Formats	2,925	2026-06-17	Data Formats File Format Processing Text & Documents	→
	YAML implementations for Python.
	yaml/github.com/yaml/pyyaml /2026-06-17
28	python-magic Text & Documents	2,913	2026-07-20	File Manipulation Text & Documents	→
	A Python interface to the libmagic file type identification library.
	ahupp/github.com/ahupp/python-magic /2026-07-20
29	pikepdf PDF	2,770	2026-07-31	PDF File Format Processing Text & Documents	→
	A powerful library for reading and editing PDF files, based on qpdf.
	pikepdf/github.com/pikepdf/pikepdf /2026-07-31
30	docxtpl MS Office	2,682	2026-07-07	MS Office File Format Processing Text & Documents	→
	Edits docx documents using Jinja2 templates.
	elapouya/github.com/elapouya/python-docx-template /2026-07-07
31	chardet General	2,650	2026-07-31	General Text Processing Text & Documents	→
	Python character encoding detector.
	chardet/github.com/chardet/chardet /2026-07-31
32	watchfiles Text & Documents	2,517	2026-06-13	File Manipulation Text & Documents	→
	Simple, modern and fast file watching and code reload in Python.
	samuelcolvin/github.com/samuelcolvin/watchfiles /2026-06-13
33	pyparsing Parser	2,478	2026-07-19	Parser Text Processing Text & Documents	→
	A general purpose framework for generating parsers.
	pyparsing/github.com/pyparsing/pyparsing /2026-07-19
34	pyquery Text & Documents	2,381	2026-07-27	HTML Manipulation Text & Documents	→
	A jQuery-like library for parsing HTML.
	gawel/github.com/gawel/pyquery /2026-07-27
35	pyelftools General	2,270	2026-07-30	General File Format Processing Text & Documents	→
	Parsing and analyzing ELF files and DWARF debugging information.
	eliben/github.com/eliben/pyelftools /2026-07-30
36	pygments Parser	2,196	2026-07-28	Parser Text Processing Text & Documents	→
	A generic syntax highlighter.
	pygments/github.com/pygments/pygments /2026-07-28
37	shortuuid Unique identifiers	2,193	2026-06-20	Unique identifiers Text Processing Text & Documents	→
	A generator library for concise, unambiguous and URL-safe UUIDs.
	skorokithakis/github.com/skorokithakis/shortuuid /2026-06-20
38	python-slugify General	1,622	2026-01-07	General Text Processing Text & Documents	→
	A Python slugify library that translates Unicode to ASCII.
	un33k/github.com/un33k/python-slugify /2026-01-07
39	pyfiglet General	1,579	2026-07-04	General Text Processing Text & Documents	→
	An implementation of figlet written in Python.
	pwaller/github.com/pwaller/pyfiglet /2026-07-04
40	python-user-agents Parser	1,515	2023-02-16	Parser Text Processing Text & Documents	→
	Browser user agent parser.
	selwin/github.com/selwin/python-user-agents /2023-02-16
41	babel General	1,459	2026-07-31	General Text Processing Text & Documents	→
	An internationalization library for Python.
	python-babel/github.com/python-babel/babel /2026-07-31
42	markdown-it-py Markdown	1,347	2026-07-08	Markdown File Format Processing Text & Documents	→
	Markdown parser with 100% CommonMark support, extensions, and syntax plugins.
	executablebooks/github.com/executablebooks/markdown-it-py /2026-07-08
43	pyexcel MS Office	1,291	2026-07-01	MS Office File Format Processing Text & Documents	→
	Provides one API for reading, manipulating and writing csv, ods, xls, xlsx and xlsm files.
	pyexcel/github.com/pyexcel/pyexcel /2026-07-01
44	justhtml Text & Documents	1,147	2026-07-26	HTML Manipulation Text & Documents	→
	A pure Python HTML5 parser that just works.
	EmilStenstrom/github.com/EmilStenstrom/justhtml/ /2026-07-26
45	pdf_oxide PDF	918	2026-07-28	PDF File Format Processing Text & Documents	→
	A fast PDF library for text extraction, image extraction, and markdown conversion, powered by Rust.
	yfedoseev/github.com/yfedoseev/pdf_oxide /2026-07-28
46	html-to-markdown Text & Documents	819	2026-07-31	HTML Manipulation Text & Documents	→
	A fast, CommonMark-compliant HTML to Markdown converter with a Rust core, tolerant of malformed HTML.
	xberg-io/github.com/xberg-io/html-to-markdown /2026-07-31
47	python-nameparser Parser	713	2026-07-31	Parser Text Processing Text & Documents	→
	Parsing human names into their individual components.
	derek73/github.com/derek73/python-nameparser /2026-07-31
48	markupsafe Text & Documents	694	2025-09-27	HTML Manipulation Text & Documents	→
	Implements an XML/HTML/XHTML markup-safe string for Python.
	pallets/github.com/pallets/markupsafe /2025-09-27
49	unidecode General	611	2026-01-05	General Text Processing Text & Documents	→
	ASCII transliterations of Unicode text.
	avian2/github.com/avian2/unidecode /2026-01-05
50	sqids Unique identifiers	516	2025-03-26	Unique identifiers Text Processing Text & Documents	→
	A library for generating short unique IDs from numbers.
	sqids/github.com/sqids/sqids-python /2025-03-26
51	parsy Parser	450	2026-06-22	Parser Text Processing Text & Documents	→
	Easy, generic parser combinator library for creating parsers.
	python-parsy/github.com/python-parsy/parsy /2026-06-22
52	tree-sitter-language-pack Parser	441	2026-07-31	Parser Text Processing Text & Documents	→
	A comprehensive collection of tree-sitter parsers for 300+ languages, distributed as prebuilt wheels.
	xberg-io/github.com/xberg-io/tree-sitter-language-pack /2026-07-31
53	pangu.py General	278	2026-07-31	General Text Processing Text & Documents	→
	Paranoid text spacing.
	vinta/github.com/vinta/pangu.py /2026-07-31
54	tinycss2 Text & Documents	189	2026-06-21	HTML Manipulation Text & Documents	→
	A low-level CSS parser and generator written in Python.
	Kozea/github.com/Kozea/tinycss2 /2026-06-21
55	beautifulsoup Text & Documents	External	—	HTML Manipulation Text & Documents	→
	Provides Pythonic idioms for iterating, searching, and modifying HTML or XML.
	www.crummy.com/software/BeautifulSoup/bs4/doc/
56	openpyxl MS Office	External	—	MS Office File Format Processing Text & Documents	→
	A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
	openpyxl.readthedocs.io/en/stable/
57	reportlab PDF	External	—	PDF File Format Processing Text & Documents	→
	Allows rapid creation of rich PDF documents.
	www.reportlab.com/opensource/
