DomainClassifier/params.json

2013-06-14 19:15:40 +00:00
{"name":"Domainclassifier","tagline":"DomainClassifier is a simple Python library to extract and classify Internet domains/hostnames/IP addresses from raw text files according to their existence, localization or attributes.","body":"DomainClassifier\r\n================\r\n\r\nDomainClassifier is a simple Python library to extract and classify Internet\r\ndomains/hostnames/IP addresses from raw text files according to their existence,\r\nlocalization or attributes.\r\n\r\nDomainClassifier can be used to extract Internet hosts from any free-form text.\r\n\r\n![An overview of the DomainClassifier methods](https://raw.github.com/adulau/DomainClassifier/master/doc/domainclassifier-flow.png)\r\n\r\nHow To Use It\r\n-------------\r\n\r\n```python\r\nimport domainclassifier\r\n\r\nc = domainclassifier.Extract( rawtext = \"www.xxx.com this is a text with a domain called test@foo.lu another test abc.lu something a.b.c.d.e end of 1.2.3.4 foo.be www.belnet.be http://www.cert.be/ www.public.lu www.allo.lu quuxtest www.eurodns.com something-broken-www.google.com www.google.lu trailing test www.facebook.com www.nic.ru www.youporn.com 8.8.8.8 201.1.1.1\")\r\n\r\n# extracting potentially valid domains from rawtext\r\nprint c.domain()\r\n\r\n# reduce set of potentially valid domains to existing domains\r\n# (based on SOA,A,AAAA,CNAME,MX records)\r\nprint c.validdomain(extended=True)\r\n\r\n# reduce set of valid domains with DNS records associated to a\r\n# specified country\r\nprint \"US:\"\r\nprint c.localizedomain(cc='US')\r\nprint \"LU:\"\r\nprint c.localizedomain(cc='LU')\r\nprint \"BE:\"\r\nprint c.localizedomain(cc='BE')\r\nprint \"Ranking:\"\r\nprint c.rankdomain()\r\n\r\n# extract valid IPv4 addresses (using the potential list of valid domains)\r\nprint \"List of ip addresses:\"\r\nprint c.ipaddress(extended=True)\r\n\r\n# some more filtering\r\nprint \"Include dot.lu:\"\r\nprint c.include(expression=r'\\.lu$')\r\nprint \"Exclude dot.lu:\"\r\nprint 
c.exclude(expression=r'\\.lu$')\r\n```\r\n\r\n### Sample output\r\n\r\n```python\r\n['www.xxx.com', 'foo.lu', 'abc.lu', 'a.b.c.d.e', '1.2.3.4', 'foo.be', 'www.belnet.be', 'www.cert.be', 'www.public.lu', 'www.allo.lu', 'www.eurodns.com', 'something-broken-www.google.com', 'www.google.lu', 'www.facebook.com', 'www.nic.ru', 'www.youporn.com', '8.8.8.8', '201.1.1.1']\r\n[('www.xxx.com', 'A', <DNS IN A rdata: 67.23.112.226>), ('abc.lu', 'SOA', <DNS IN SOA rdata: neptun.vo.lu. Administrator.vo.lu. 2006063001 86400 7200 2419200 3600>), ('abc.lu', 'MX', <DNS IN MX rdata: 10 proteus.vo.lu.>), ('foo.be', 'A', <DNS IN A rdata: 188.65.217.78>), ('foo.be', 'AAAA', <DNS IN AAAA rdata: 2001:6f8:202:2df::2>), ('foo.be', 'SOA', <DNS IN SOA rdata: ka.quuxlabs.com. adulau.foo.be. 2010121901 21600 3600 604800 86400>), ('foo.be', 'MX', <DNS IN MX rdata: 10 mail.foo.be.>), ('www.belnet.be', 'A', <DNS IN A rdata: 193.190.130.15>), ('www.belnet.be', 'AAAA', <DNS IN AAAA rdata: 2001:6a8:3c80:8300::15>), ('www.belnet.be', 'CNAME', <DNS IN CNAME rdata: fiorano.belnet.be.>), ('www.cert.be', 'A', <DNS IN A rdata: 193.190.198.61>), ('www.cert.be', 'AAAA', <DNS IN AAAA rdata: 2001:6a8:3c80::61>), ('www.cert.be', 'SOA', <DNS IN SOA rdata: ns.belnet.be. hostmaster.belnet.be. 
2013053039 360 180 1209600 3600>), ('www.cert.be', 'MX', <DNS IN MX rdata: 10 asp-mxa.belnet.be.>), ('www.cert.be', 'CNAME', <DNS IN CNAME rdata: cert.be.>), ('www.public.lu', 'A', <DNS IN A rdata: 194.154.200.74>), ('www.allo.lu', 'A', <DNS IN A rdata: 80.90.47.69>), ('www.eurodns.com', 'A', <DNS IN A rdata: 80.92.65.165>), ('www.google.lu', 'A', <DNS IN A rdata: 173.194.66.94>), ('www.google.lu', 'AAAA', <DNS IN AAAA rdata: 2a00:1450:400c:c03::5e>), ('www.facebook.com', 'A', <DNS IN A rdata: 31.13.64.1>), ('www.facebook.com', 'AAAA', <DNS IN AAAA rdata: 2a03:2880:10:8f07:face:b00c::1>), ('www.facebook.com', 'MX', <DNS IN MX rdata: 10 msgin.t.facebook.com.>), ('www.facebook.com', 'CNAME', <DNS IN CNAME rdata: star.c10r.facebook.com.>), ('www.nic.ru', 'A', <DNS IN A rdata: 194.85.61.42>), ('www.nic.ru', 'MX', <DNS IN MX rdata: 0 nomail.nic.ru.>), ('