XML parsing (minidom) check


Post by KHarvey » Wed Apr 03, 2013 6:13 pm

I have a custom OS that I am coding on, so I have not tried compiling or adding any XML modules to Python. I am just using the default xml.dom.

I have an XML document that looks kind of like this:
Code:
<?xml version="1.0"?>
<?xml-stylesheet href="file:///C:/Program Files (x86)/Nmap/nmap.xsl" type="text/xsl"?>
<!-- Nmap 6.01 scan initiated Wed Apr 03 09:01:01 2013 as: nmap -O -oX myscan.xml 10.1.20.1, 10.1.2.10, 10.1.2.50, 10.1.2.42, 10.1.110.30, 10.1.2.18 -->
<nmaprun scanner="nmap" args="nmap -O -n -PN -oX myscan.xml 10.1.20.1, 10.1.2.10, 10.1.2.50, 10.1.2.42, 10.1.110.30, 10.1.2.18" start="1365001261" startstr="Wed Apr 03 09:01:01 2013" version="6.01" xmloutputversion="1.04">
<scaninfo type="syn" protocol="tcp" numservices="1000" services="1,3-4,6-7,9,13,17,19-26,30,32-33,37,42-43,49,53,70,79-85,88-90,99-100,106,109-111,113,119,125,135,139,143-144,146,161,163,179,199,211-212,222,254-256,259,264,280,301,306,311,340,366,389,406-407,416-417,425,427,443-445,458,464-465,481,497,500,512-515,524,541,543-545,548,554-555,563,587,593,616-617,625,631,636,646,648,666-668,683,687,691,700,705,711,714,720,722,726,749,765,777,783,787,800-801,808,843,873,880,888,898,900-903,911-912,981,987,990,992-993,995,999-1002,1007,1009-1011,1021-1100,1102,1104-1108,1110-1114,1117,1119,1121-1124,1126,1130-1132,1137-1138,1141,1145,1147-1149,1151-1152,1154,1163-1166,1169,1174-1175,1183,1185-1187,1192,1198-1199,1201,1213,1216-1218,1233-1234,1236,1244,1247-1248,1259,1271-1272,1277,1287,1296,1300-1301,1309-1311,1322,1328,1334,1352,1417,1433-1434,1443,1455,1461,1494,1500-1501,1503,1521,1524,1533,1556,1580,1583,1594,1600,1641,1658,1666,1687-1688,1700,1717-1721,1723,1755,1761,1782-1783,1801,1805,1812,1839-1840,1862-1864,1875,1900,1914,1935,1947,1971-1972,1974,1984,1998-2010,2013,2020-2022,2030,2033-2035,2038,2040-2043,2045-2049,2065,2068,2099-2100,2103,2105-2107,2111,2119,2121,2126,2135,2144,2160-2161,2170,2179,2190-2191,2196,2200,2222,2251,2260,2288,2301,2323,2366,2381-2383,2393-2394,2399,2401,2492,2500,2522,2525,2557,2601-2602,2604-2605,2607-2608,2638,2701-2702,2710,2717-2718,2725,2800,2809,2811,2869,2875,2909-2910,2920,2967-2968,2998,3000-3001,3003,3005-3007,3011,3013,3017,3030-3031,3052,3071,3077,3128,3168,3211,3221,3260-3261,3268-3269,3283,3300-3301,3306,3322-3325,3333,3351,3367,3369-3372,3389-3390,3404,3476,3493,3517,3527,3546,3551,3580,3659,3689-3690,3703,3737,3766,3784,3800-3801,3809,3814,3826-3828,3851,3869,3871,3878,3880,3889,3905,3914,3918,3920,3945,3971,3986,3995,3998,4000-4006,4045,4111,4125-4126,4129,4224,4242,4279,4321,4343,4443-4446,4449,4550,4567,4662,4848,4899-4900,4998,5000-5004,5009,5030,5033,5050-5051,5054,5060-5061,5080,5087,5100-5102,5120,5190,5200,5214,5221-5222,5225-5226,5269,5280,5298,5357,5405,5414,5431-5432,5440,5500,5510,5544,5550,5555,5560,5566,5631,5633,5666,5678-5679,5718,5730,5800-5802,5810-5811,5815,5822,5825,5850,5859,5862,5877,5900-5904,5906-5907,5910-5911,5915,5922,5925,5950,5952,5959-5963,5987-5989,5998-6007,6009,6025,6059,6100-6101,6106,6112,6123,6129,6156,6346,6389,6502,6510,6543,6547,6565-6567,6580,6646,6666-6669,6689,6692,6699,6779,6788-6789,6792,6839,6881,6901,6969,7000-7002,7004,7007,7019,7025,7070,7100,7103,7106,7200-7201,7402,7435,7443,7496,7512,7625,7627,7676,7741,7777-7778,7800,7911,7920-7921,7937-7938,7999-8002,8007-8011,8021-8022,8031,8042,8045,8080-8090,8093,8099-8100,8180-8181,8192-8194,8200,8222,8254,8290-8292,8300,8333,8383,8400,8402,8443,8500,8600,8649,8651-8652,8654,8701,8800,8873,8888,8899,8994,9000-9003,9009-9011,9040,9050,9071,9080-9081,9090-9091,9099-9103,9110-9111,9200,9207,9220,9290,9415,9418,9485,9500,9502-9503,9535,9575,9593-9595,9618,9666,9876-9878,9898,9900,9917,9929,9943-9944,9968,9998-10004,10009-10010,10012,10024-10025,10082,10180,10215,10243,10566,10616-10617,10621,10626,10628-10629,10778,11110-11111,11967,12000,12174,12265,12345,13456,13722,13782-13783,14000,14238,14441-14442,15000,15002-15004,15660,15742,16000-16001,16012,16016,16018,16080,16113,16992-16993,17877,17988,18040,18101,18988,19101,19283,19315,19350,19780,19801,19842,20000,20005,20031,20221-20222,20828,21571,22939,23502,24444,24800,25734-25735,26214,27000,27352-27353,27355-27356,27715,28201,30000,30718,30951,31038,31337,32768-32785,33354,33899,34571-34573,35500,38292,40193,40911,41511,42510,44176,44442-44443,44501,45100,48080,49152-49161,49163,49165,49167,49175-49176,49400,49999-50003,50006,50300,50389,50500,50636,50800,51103,51493,52673,52822,52848,52869,54045,54328,55055-55056,55555,55600,56737-56738,57294,57797,58080,60020,60443,61532,61900,62078,63331,64623,64680,65000,65129,65389"/>
<verbose level="0"/>
<debugging level="0"/>
<host starttime="1365001265" endtime="1365001285"><status state="up" reason="arp-response"/>
<address addr="10.1.2.10" addrtype="ipv4"/>
<address addr="00:50:56:A4:2C:95" addrtype="mac" vendor="VMware"/>
<hostnames>
</hostnames>
<ports><extraports state="closed" count="993">
<extrareasons reason="resets" count="993"/>
</extraports>
<port protocol="tcp" portid="135"><state state="open" reason="syn-ack" reason_ttl="128"/><service name="msrpc" method="table" conf="3"/></port>
<port protocol="tcp" portid="139"><state state="open" reason="syn-ack" reason_ttl="128"/><service name="netbios-ssn" method="table" conf="3"/></port>
<port protocol="tcp" portid="445"><state state="open" reason="syn-ack" reason_ttl="128"/><service name="microsoft-ds" method="table" conf="3"/></port>
<port protocol="tcp" portid="10000"><state state="open" reason="syn-ack" reason_ttl="128"/><service name="snet-sensor-mgmt" method="table" conf="3"/></port>
<port protocol="tcp" portid="49152"><state state="open" reason="syn-ack" reason_ttl="128"/><service name="unknown" method="table" conf="3"/></port>
<port protocol="tcp" portid="49153"><state state="open" reason="syn-ack" reason_ttl="128"/><service name="unknown" method="table" conf="3"/></port>
<port protocol="tcp" portid="49154"><state state="open" reason="syn-ack" reason_ttl="128"/><service name="unknown" method="table" conf="3"/></port>
</ports>
<os><portused state="open" proto="tcp" portid="135"/>
<portused state="closed" proto="tcp" portid="1"/>
<portused state="closed" proto="udp" portid="40013"/>
<osmatch name="Microsoft Windows 7 or Windows Server 2008 SP1" accuracy="100" line="46366">
<osclass type="general purpose" vendor="Microsoft" osfamily="Windows" osgen="7" accuracy="100"><cpe>cpe:/o:microsoft:windows_7</cpe></osclass>
<osclass type="general purpose" vendor="Microsoft" osfamily="Windows" osgen="2008" accuracy="100"><cpe>cpe:/o:microsoft:windows_server_2008::sp1</cpe></osclass>
</osmatch>
</os>
<uptime seconds="4527850" lastboot="Sat Feb 09 22:17:17 2013"/>
<distance value="1"/>
<tcpsequence index="257" difficulty="Good luck!" values="3C1328EC,648FFF37,C377E3EF,1D4F32A9,762682D2,ECF8C088"/>
<ipidsequence class="Incremental" values="73BF,73C1,73C2,73C5,73C6,73C9"/>
<tcptssequence class="100HZ" values="1AFCF232,1AFCF23C,1AFCF246,1AFCF250,1AFCF25A,1AFCF264"/>
<times srtt="113" rttvar="212" to="100000"/>
</host>
<host starttime="1365001265" endtime="1365001287"><status state="up" reason="arp-response"/>
<address addr="10.1.2.42" addrtype="ipv4"/>
<address addr="00:E0:D8:10:6F:1E" addrtype="mac" vendor="Lanbit Computer"/>
<hostnames>
</hostnames>
<ports><extraports state="filtered" count="995">
<extrareasons reason="no-responses" count="995"/>
</extraports>
<port protocol="tcp" portid="22"><state state="open" reason="syn-ack" reason_ttl="200"/><service name="ssh" method="table" conf="3"/></port>
<port protocol="tcp" portid="23"><state state="open" reason="syn-ack" reason_ttl="200"/><service name="telnet" method="table" conf="3"/></port>
<port protocol="tcp" portid="25"><state state="open" reason="syn-ack" reason_ttl="200"/><service name="smtp" method="table" conf="3"/></port>
<port protocol="tcp" portid="80"><state state="open" reason="syn-ack" reason_ttl="200"/><service name="http" method="table" conf="3"/></port>
<port protocol="tcp" portid="443"><state state="open" reason="syn-ack" reason_ttl="200"/><service name="https" method="table" conf="3"/></port>
</ports>
<os><portused state="open" proto="tcp" portid="22"/>
</os>
<distance value="1"/>
<tcpsequence index="31" difficulty="Good luck!" values="14BED,14BF9,14C01,14C0B,14C15,14C1F"/>
<ipidsequence class="Incremental" values="D4F,D50,D51,D52,D53,D54"/>
<tcptssequence class="none returned (unsupported)"/>
<times srtt="2148" rttvar="2232" to="100000"/>
</host>
<host starttime="1365001265" endtime="1365001285"><status state="up" reason="arp-response"/>
<address addr="10.1.2.50" addrtype="ipv4"/>
<address addr="00:14:C2:C1:20:39" addrtype="mac" vendor="Hewlett-Packard Company"/>
<hostnames>
</hostnames>
<ports><extraports state="closed" count="983">
<extrareasons reason="resets" count="983"/>
</extraports>
<port protocol="tcp" portid="21"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="ftp" method="table" conf="3"/></port>
<port protocol="tcp" portid="22"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="ssh" method="table" conf="3"/></port>
<port protocol="tcp" portid="23"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="telnet" method="table" conf="3"/></port>
<port protocol="tcp" portid="37"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="time" method="table" conf="3"/></port>
<port protocol="tcp" portid="111"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="rpcbind" method="table" conf="3"/></port>
<port protocol="tcp" portid="113"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="ident" method="table" conf="3"/></port>
<port protocol="tcp" portid="512"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="exec" method="table" conf="3"/></port>
<port protocol="tcp" portid="513"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="login" method="table" conf="3"/></port>
<port protocol="tcp" portid="514"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="shell" method="table" conf="3"/></port>
<port protocol="tcp" portid="543"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="klogin" method="table" conf="3"/></port>
<port protocol="tcp" portid="2049"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="nfs" method="table" conf="3"/></port>
<port protocol="tcp" portid="6000"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="X11" method="table" conf="3"/></port>
<port protocol="tcp" portid="6001"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="X11:1" method="table" conf="3"/></port>
<port protocol="tcp" portid="7100"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="font-service" method="table" conf="3"/></port>
<port protocol="tcp" portid="32768"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="filenet-tms" method="table" conf="3"/></port>
<port protocol="tcp" portid="32769"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="filenet-rpc" method="table" conf="3"/></port>
<port protocol="tcp" portid="32776"><state state="open" reason="syn-ack" reason_ttl="64"/><service name="sometimes-rpc15" method="table" conf="3"/></port>
</ports>
<os><portused state="open" proto="tcp" portid="21"/>
<portused state="closed" proto="tcp" portid="1"/>
<portused state="closed" proto="udp" portid="40371"/>
<osmatch name="Linux 2.4.21" accuracy="100" line="29874">
<osclass type="general purpose" vendor="Linux" osfamily="Linux" osgen="2.4.X" accuracy="100"><cpe>cpe:/o:linux:kernel:2.4.21</cpe></osclass>
</osmatch>
<osmatch name="Linux 2.4.21 - 2.4.27" accuracy="100" line="29976">
<osclass type="general purpose" vendor="Linux" osfamily="Linux" osgen="2.4.X" accuracy="100"><cpe>cpe:/o:linux:kernel:2.4</cpe></osclass>
</osmatch>
</os>
<uptime seconds="37467516" lastboot="Wed Jan 25 16:22:51 2012"/>
<distance value="1"/>
<tcpsequence index="200" difficulty="Good luck!" values="3A937D20,3A794163,3B0F1465,3A801ED0,3ABFCA70,3B15D18F"/>
<ipidsequence class="All zeros" values="0,0,0,0,0,0"/>
<tcptssequence class="100HZ" values="DF52E308,DF52E312,DF52E31C,DF52E326,DF52E330,DF52E33A"/>
<times srtt="49" rttvar="95" to="100000"/>
</host>
<runstats><finished time="1365001292" timestr="Wed Apr 03 09:01:32 2013" elapsed="30.83" summary="Nmap done at Wed Apr 03 09:01:32 2013; 11 IP addresses (6 hosts up) scanned in 30.83 seconds" exit="success"/><hosts up="6" down="5" total="11"/>
</runstats>
</nmaprun>


I have a piece of test code that appears to be working, but I wanted to make sure that it was done correctly.
Code:
import xml.dom import minidom

withopen("myscan.xml") as nmap_scan:
   xml_nmap_dom = minidom.parse(nmap_scan)
   for xml_nmap_node in xml_nmap_dom.getElementsByTagName("host"):
      for xml_nmap_subnode in xml_nmap_node.childNodes:
         if xml_nmap_subnode.nodeName == "address":
            if "ip" in xml_nmap_subnode.getAttribute("addrtype"):
               print xml_nmap_subnode.getAttribute("addr")
         if xml_nmap_subnode.nodeName == "os":
            for xml_nmap_os_subnode in xml_nmap_subnode.childNOdes:
               if xml_nmap_os_subnode.nodeName == "osmatch":
                  if "100" in xml_nmap_os_subnode.getAttribute("accuracy"):
                     print xml_nmap_os_subnode.getAttribute("name")


Pretty much I am just pulling the IP address and the OS from the XML.
As I said this is just test code and I am just printing the variables to make sure they are correct. I will be assigning the variables to either a tuple or a dictionary for processing later on in my script.

While the code itself appears to be correct, it still looks a little off to me. I guess when I start nesting multiple for and if statements it starts to look bad.

I know I should wrap this into a class, but I haven't become comfortable with classes yet. I just haven't had the time in the last week to sit down and figure out the parts that I don't understand.

Is there a better way to handle how I am parsing the XML?

Re: XML parsing (minidom) check

Post by snippsat » Thu Apr 04, 2013 5:27 am

but I wanted to make sure that it was done correctly

It's not working as it is now; you have syntax errors in both line 1 and line 3.

I guess when I start nesting multiple for and if statements it starts to look bad.

Yes, that doesn't look so good. You should try to avoid too much nesting.
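
For reference, a corrected and flatter minidom version might look like the sketch below. Line 1 of the posted code should read `from xml.dom import minidom`, line 3 needs a space in `with open`, and `childNOdes` further down would fail at runtime. The helper names `host_ip` and `host_os` and the trimmed `SAMPLE` string are made up for illustration and are not part of the original script.

```python
from xml.dom import minidom

# Trimmed stand-in for myscan.xml (assumption: two hosts, one without an osmatch)
SAMPLE = """<nmaprun>
<host><address addr="10.1.2.10" addrtype="ipv4"/>
<address addr="00:50:56:A4:2C:95" addrtype="mac"/>
<os><osmatch name="Microsoft Windows 7 or Windows Server 2008 SP1" accuracy="100"/></os>
</host>
<host><address addr="10.1.2.42" addrtype="ipv4"/>
<os></os>
</host>
</nmaprun>"""


def host_ip(host):
    """Return the first IPv4 address of a <host> element, or None."""
    for address in host.getElementsByTagName("address"):
        if address.getAttribute("addrtype") == "ipv4":
            return address.getAttribute("addr")
    return None


def host_os(host):
    """Return the name of the first 100%-accuracy <osmatch>, or None."""
    for osmatch in host.getElementsByTagName("osmatch"):
        if osmatch.getAttribute("accuracy") == "100":
            return osmatch.getAttribute("name")
    return None


dom = minidom.parseString(SAMPLE)
# One (ip, os_name) tuple per host; os_name is None when nothing matched
results = [(host_ip(h), host_os(h)) for h in dom.getElementsByTagName("host")]
for ip, os_name in results:
    print("%s %s" % (ip, os_name))
```

Moving each lookup into its own small function keeps the main loop at one level of nesting, and `getElementsByTagName` searches descendants, so there is no need to walk `childNodes` by hand.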

I have only looked at and tested minidom a little.
Python has two stars when it comes to parsing: BeautifulSoup and lxml.
I have used both, mostly BeautifulSoup, which I have used for many years.
lxml is advanced and has a lot of features like XPath and CSS selectors. Both can read incorrect (malformed) HTML/XML, which is very important when scraping the web.
The built-in parsers for Python blow up if the HTML/XML is not correct.
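
As a small illustration of that point (a sketch, assuming Python 2.7 or later, where `xml.etree.ElementTree.ParseError` exists), the built-in ElementTree parser rejects a fragment whose tags are never closed. The `malformed` string is a made-up example, not taken from the scan file.

```python
import xml.etree.ElementTree as ET

# Assumption for illustration: <host> and <osmatch> are opened but never closed
malformed = '<host><address addr="10.1.2.50" addrtype="ipv4"/><osmatch name="Linux 2.4.21">'

try:
    ET.fromstring(malformed)
    parsed_ok = True
except ET.ParseError as error:
    # The strict built-in parser raises instead of guessing at the structure
    parsed_ok = False
    print("Parse failed: %s" % error)
```

For well-formed input such as nmap's own XML output this strictness should not matter; it mostly becomes a problem with hand-written or scraped markup.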

A quick demo with BeautifulSoup.
Code:
from bs4 import BeautifulSoup

xml = '''\
<address addr="10.1.2.50" addrtype="ipv4"/>
<address addr="10.1.2.42" addrtype="ipv4"/>
<osmatch name="Linux 2.4.21" accuracy="100" line="29874">'''

# Name the parser explicitly; the unclosed <osmatch> tag is tolerated
soup = BeautifulSoup(xml, "html.parser")
ip = soup.find_all('address')
os = soup.find('osmatch')
print([i['addr'] for i in ip]) #['10.1.2.50', '10.1.2.42']
print(os['name']) #Linux 2.4.21

Re: XML parsing (minidom) check

Post by stranac » Thu Apr 04, 2013 9:30 am

KHarvey wrote: I have a custom OS that I am coding on, so I have not tried compiling or adding any XML modules to Python. I am just using the default xml.dom


If you have to use a built-in module, I'd recommend xml.etree.ElementTree.
It's much easier to work with, and more powerful, than xml.dom.minidom.

Re: XML parsing (minidom) check

Post by KHarvey » Thu Apr 04, 2013 5:04 pm

stranac wrote:
KHarvey wrote: I have a custom OS that I am coding on, so I have not tried compiling or adding any XML modules to Python. I am just using the default xml.dom


If you have to use a built-in module, I'd recommend xml.etree.ElementTree.
It's much easier to work with, and more powerful, than xml.dom.minidom.


Sweet, thank you very much stranac.

I had not used that module before.

I could probably compile some other modules, but I am a bit of a minimalist, and sometimes it is a pain to compile modules. It took me forever to get mypyconn compiled and working.

So this is my code now (using the same XML from above):
Code:
import xml.etree.ElementTree

xml_tree = xml.etree.ElementTree.parse("myscan.xml")
xml_root = xml_tree.getroot()

for xml_host in xml_root.findall("host"):
   # find() returns only the first <address>, which is the IPv4 entry in this scan
   if xml_host.find("address").attrib["addrtype"] == "ipv4":
      print(xml_host.find("address").attrib["addr"])
   try:
      # find() returns None when a host has no <osmatch>, so .attrib raises AttributeError
      if xml_host.find("./os/osmatch").attrib["accuracy"] == "100":
         print(xml_host.find("./os/osmatch").attrib["name"])
   except AttributeError:
      pass


I had to do the exception in case there was no actual osmatch in the XML and it appears to be working. Much simpler and to the point, I like it.
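
Since the plan is to store the values in a tuple or dictionary later, here is one possible way to do that without the try/except, as a rough sketch. It loops over every `<address>` and `<osmatch>` instead of relying on the first one, and uses `.get()` so a missing attribute yields None rather than an exception. The function name `scan_to_dict` and the trimmed `SAMPLE` document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for myscan.xml (assumption: the second host has no osmatch)
SAMPLE = """<nmaprun>
<host><address addr="10.1.2.10" addrtype="ipv4"/>
<address addr="00:50:56:A4:2C:95" addrtype="mac"/>
<os><osmatch name="Microsoft Windows 7 or Windows Server 2008 SP1" accuracy="100"/></os>
</host>
<host><address addr="10.1.2.42" addrtype="ipv4"/>
<os></os>
</host>
</nmaprun>"""


def scan_to_dict(root):
    """Map each host's IPv4 address to its 100%-accuracy OS name (or None)."""
    hosts = {}
    for host in root.findall("host"):
        ip = None
        for address in host.findall("address"):
            if address.get("addrtype") == "ipv4":
                ip = address.get("addr")
                break
        os_name = None
        # findall() returns an empty list when there is no <osmatch>
        for osmatch in host.findall("./os/osmatch"):
            if osmatch.get("accuracy") == "100":
                os_name = osmatch.get("name")
                break
        if ip is not None:
            hosts[ip] = os_name
    return hosts


hosts = scan_to_dict(ET.fromstring(SAMPLE))
for ip in sorted(hosts):
    print("%s %s" % (ip, hosts[ip]))
```

With the real file, `ET.parse("myscan.xml").getroot()` would take the place of `ET.fromstring(SAMPLE)`.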

Although, since I am no longer using "with", do I need to close the xml.etree.ElementTree when I am done with it?

Re: XML parsing (minidom) check

Post by setrofim » Thu Apr 04, 2013 5:19 pm

KHarvey wrote: Although, since I am no longer using "with", do I need to close the xml.etree.ElementTree when I am done with it?

No, you don't. If you pass in a file path, ElementTree will open the file, read its entire contents, build the tree, and then close the file. So by the time you get an ElementTree object, the XML file has already been read, and there are no handles open to it.
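
To illustrate the difference, here is a minimal sketch. `parse()` also accepts an already open file object, and in that case the caller still owns the handle, which is where `with` remains useful. The temporary file and the variable names are made up for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a tiny stand-in document to a temporary file (hypothetical content)
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as handle:
    handle.write('<nmaprun><host><address addr="10.1.2.10" addrtype="ipv4"/></host></nmaprun>')

# Passing a path: parse() opens and closes the file itself
tree_from_path = ET.parse(path)

# Passing an open file object: the caller owns the handle, so "with" closes it
with open(path, "rb") as xml_file:
    tree_from_file = ET.parse(xml_file)

file_closed = xml_file.closed
# The tree lives in memory, so it is still usable after the file is closed
addr = tree_from_file.getroot().find("host/address").get("addr")
print(addr)

os.remove(path)
```

Either way, the tree object itself has nothing to close.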
