API

hotpdf.hotpdf.HotPdf([pdf_file, password, ...])

hotpdf.memory_map.MemoryMap()

hotpdf.sparse_matrix.SparseMatrix([rows, ...])

2D representation of a PDF in plain text format.
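
As an illustration only, a sparse 2D text grid of this kind can be sketched with a dictionary keyed by (row, column). The class name `SparseTextGrid` and all of its methods are hypothetical; this is not hotpdf's actual `SparseMatrix` API or implementation.

```python
from typing import Dict, Tuple


class SparseTextGrid:
    """Hypothetical sparse 2D grid of characters (not hotpdf's SparseMatrix)."""

    def __init__(self, rows: int, columns: int) -> None:
        self.rows = rows
        self.columns = columns
        # Only non-empty cells are stored, keyed by (row, column).
        self._cells: Dict[Tuple[int, int], str] = {}

    def set(self, row: int, column: int, value: str) -> None:
        if not (0 <= row < self.rows and 0 <= column < self.columns):
            raise IndexError("cell outside grid")
        self._cells[(row, column)] = value

    def get(self, row: int, column: int) -> str:
        # Cells that were never set read as an empty string.
        return self._cells.get((row, column), "")

    def row_text(self, row: int) -> str:
        # Join the stored characters of one row in column order.
        columns = sorted(c for (r, c) in self._cells if r == row)
        return "".join(self._cells[(row, c)] for c in columns)


grid = SparseTextGrid(rows=100, columns=100)
grid.set(10, 5, "h")
grid.set(10, 6, "i")
```

Storing only occupied cells keeps memory proportional to the number of characters rather than to the page area, which is the usual reason to prefer a sparse layout for mostly empty pages.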

hotpdf.span_map.SpanMap()

Hash map that stores spans and their child words for fast lookup and character grouping.
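
The idea of a hash map from span to its characters can be sketched as follows. `SimpleSpanMap`, `add`, and `text` are hypothetical names chosen for this example and do not reflect hotpdf's real `SpanMap` interface.

```python
from collections import defaultdict
from typing import DefaultDict, List


class SimpleSpanMap:
    """Hypothetical span lookup table (not hotpdf's SpanMap)."""

    def __init__(self) -> None:
        # Maps a span identifier to the characters that belong to it.
        self._spans: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, span_id: str, character: str) -> None:
        # Characters are appended in the order they are seen.
        self._spans[span_id].append(character)

    def text(self, span_id: str) -> str:
        # Unknown span identifiers yield an empty string.
        return "".join(self._spans.get(span_id, []))


span_map = SimpleSpanMap()
for ch in "hot":
    span_map.add("span-1", ch)
span_map.add("span-2", "!")
```

Looking up a span by identifier is a single dictionary access, so grouping characters back into words does not require scanning the page.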

hotpdf.trie.TrieNode()

hotpdf.trie.Trie()

hotpdf.utils

hotpdf.processor

hotpdf.data.classes.HotCharacter(value, x, ...)

A hot character is a character on a page with certain attributes.

hotpdf.data.classes.Span(characters, span_id)

A span is a group of characters that are close to each other.
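
A minimal sketch of how a positioned character and a span of such characters might be modelled is shown below. Only `value`, `x`, `characters`, and `span_id` appear in the signatures above; the `y` field, the class names `SketchCharacter` and `SketchSpan`, and the `text` method are assumptions made for this example, not hotpdf's definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchCharacter:
    """Hypothetical stand-in for a positioned character."""

    value: str
    x: int
    y: int  # assumed field; not confirmed by the listing above


@dataclass
class SketchSpan:
    """Hypothetical stand-in for a group of nearby characters."""

    characters: List[SketchCharacter] = field(default_factory=list)
    span_id: str = ""

    def text(self) -> str:
        # Order characters left to right by their x position.
        ordered = sorted(self.characters, key=lambda c: c.x)
        return "".join(c.value for c in ordered)


span = SketchSpan(
    characters=[
        SketchCharacter("d", 12, 40),
        SketchCharacter("p", 11, 40),
        SketchCharacter("f", 13, 40),
    ],
    span_id="span-1",
)
```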

hotpdf.data.classes.ElementDimension(x0, y0, ...)

ElementDimension holds the dimensions of an element in hotpdf, given as coordinates (x0, y0, ...).