html-table-parser-python3

This module consists of just one small class. Its purpose is to parse HTML tables without the help of external modules. Everything I use is part of the Python 3 standard library. Instead of installing this module, you can simply copy the class from parse.py into your own code.
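To give a rough idea of what a standard-library-only table parser involves, here is a minimal sketch built on `html.parser.HTMLParser`. It is not the code from parse.py: the class name `MiniTableParser` and its private attributes are hypothetical, and it assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class MiniTableParser(HTMLParser):
    """Hypothetical stand-in for illustration, not the class from parse.py."""

    def __init__(self):
        super().__init__()
        self.tables = []     # finished tables
        self._table = None   # rows of the table currently being read
        self._row = None     # cells of the row currently being read
        self._cell = None    # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self._table = []
        elif tag == 'tr' and self._table is not None:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (<b>, <a>, ...) also arrives here,
        # so the tags are dropped and only their text is collected.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self._table.append(self._row)
            self._row = self._cell = None
        elif tag == 'table' and self._table is not None:
            self.tables.append(self._table)
            self._table = self._row = self._cell = None


demo = MiniTableParser()
demo.feed('<table><tr><th>Land</th><th>Code</th></tr>'
          '<tr><td><b>Kanada</b></td><td>21212</td></tr></table>')
print(demo.tables)  # [[['Land', 'Code'], ['Kanada', '21212']]]
```

The sketch only shows the general approach of tracking the current table, row, and cell while the parser emits events; the real class may differ in how it handles whitespace, nesting, and malformed markup.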

import urllib.request
from pprint import pprint
from html_table_parser import HTMLTableParser  # or paste the class into your own code

target = 'http://www.twitter.com'

# get website content
req = urllib.request.Request(url=target)
with urllib.request.urlopen(req) as f:  # close the connection when done
    xhtml = f.read().decode('utf-8')

# instantiate the parser and feed it
p = HTMLTableParser()
p.feed(xhtml)
pprint(p.tables)

p.tables is a nested list: a list of tables, each containing rows, each containing cells as strings. Tags inside cells are stripped and their text content is joined. The console output for parsing all tables on the Twitter home page looks like this:

>>> 
[[['', 'Anmelden']],
 [['Land', 'Code', 'Für Kunden von'],
  ['Vereinigte Staaten', '40404', '(beliebig)'],
  ['Kanada', '21212', '(beliebig)'],
  ...
  ['3424486444', 'Vodafone'],
  ['Zeige SMS-Kurzwahlen für andere Länder']]]
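Because each table is just a list of rows, it can be processed with ordinary list operations. The sketch below assumes the first row of a table is a header row (which is true for the second table in the output above, but is not guaranteed in general) and turns the remaining rows into dictionaries. The `table` literal is sample data copied from that output, standing in for an entry such as `p.tables[1]`.

```python
# Sample data shaped like one entry of p.tables (values from the output above).
table = [
    ['Land', 'Code', 'Für Kunden von'],
    ['Vereinigte Staaten', '40404', '(beliebig)'],
    ['Kanada', '21212', '(beliebig)'],
]

# Assumption: the first row holds the column names.
header, *rows = table
records = [dict(zip(header, row)) for row in rows]

for record in records:
    print(record['Land'], record['Code'])
```

Note that `zip` silently truncates rows that are shorter or longer than the header, so tables with irregular rows need extra handling.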

kayuse/html-table-parser-python3
