searching and enlisting thru dictionary from a file


Postby lovecodecakes » Mon Feb 18, 2013 8:19 pm

I have an HTML page that I converted to text and read into a list, and I want to list the names by popularity rank, for both boys and girls.
Uploading the file here fails with: "The upload was rejected because the uploaded file was identified as a possible attack vector."
So I am going to paste it in the following post. Copy and paste it as is into Notepad.

The code is:

Code: Select all
import re

def indexer(fname):  # fname holds the page text as a string
    diction = {}
    for line in re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', fname, re.IGNORECASE):
        diction[line[0]] = line[1:]  # rank -> (boy, girl)
    return diction


which gives the following (in short, it works nicely):
Code: Select all
'550': ('Bryson', 'Cayla'),
 '551': ('Carter', 'Brandie'),
 '552': ('Jace', 'Chantal'),
 '553': ('Don', 'Brittanie'),
 '554': ('Jimmie', 'Terri'),
 '555': ('Marquise', 'Susana'),
 '556': ('Everett', 'Araceli'),
 '557': ('Malik', 'Alexia'),
 '558': ('Lane', 'Eliza'),
 '559': ('Arnold', 'Ashlyn'),
 '56': ('Corey', 'Vanessa'),
 '560': ('Marcel', 'Edith'),
 '561': ('Johnnie', 'Joanne'),
 '562': ('Ahmad', 'Jena'),
 '563': ('Santiago', 'Skylar'),
 '564': ('Tyree', 'Kirstin'),
 '565': ('Guy', 'Shelly'),
 '566': ('Milton', 'Kiera'),
 '567': ('Salvatore', 'Katelin'),
 '568': ('Jerrod', 'Aisha'),
 '569': ('Darrius', 'Sade'),
 '57': ('Bryan', 'Kathryn'),
 '570': ('Kristian', 'Kyra'),
 '571': ('Lamont', 'Paulina'),
 '572': ('Mitchel', 'Tamika'),
 '573': ('Nigel', 'Tess'),
 '574': ('Freddie', 'Cecily'),
 '575': ('Kareem', 'Judy'),
 '576': ('Alvaro', 'Maegan'),
 '577': ('Toby', 'Betty'),
 '578': ('Jakob', 'Madeleine'),
 '579': ('Bradford', 'Allie'),
 '58': ('Ethan', 'Morgan'),
 '580': ('Blair', 'Lynn'),
 '581': ('Thaddeus', 'Sherry'),
.
.
 '997': ('Eliezer', 'Asha'),
 '998': ('Jory', 'Jada'),
 '999': ('Misael', 'Leila')}


But as you can see, every time the count rolls over after a number ending in 9, a two-digit key appears, like 579 > 58 > 580. This way I don't get the actual rank order from the dictionary. Why is that happening?


At the same time:
Code: Select all
def indexer(fname): #lister is alist arg
    diction={}
    namelist=[]
    for line in re.findall(r'<td>(\d+)</td><td>(\w+)</td>\<td>(\w+)</td>',fname,re.IGNORECASE):
        #print names
        #print line[1:]
        namelist.append(line[1:])
        #print namelist
        diction[line[0]]=namelist
        del namelist[:]
    print diction

This shows the following output:
Code: Select all
.
.
 '555': [],
 '556': [],
 '557': [],
 '558': [],
 '559': [],
 '56': [],
 '560': [],
 '561': [],
 '562': [],
 '563': [],
 '564': [],
 '565': [],
 '566': [],
 '567': [],
 '568': [],
 '569': [],
 '57': [],
 '570': [],
 '571': [],
 '572': [],
 '573': [],
 '574': [],
 '575': [],
 '576': [],
 '577': [],
 '578': [],
 '579': [],
 '58': [],
 '580': [],
 '581': [],
 '582': [],
 '583': [],
 '584': [],
 '585': [],
 '586': [],
 '587': [],
 '588': [],
 '589': [],
 '59': [],
 '590': [],
 '591': [],
 '592': [],
 '593': [],
 '594': [],
 '595': [],
 '596': [],
 '597': [],
 '598': [],
 '599': [],
 '6': [],
 '60': [],
 '600': [],
 '601': [],
 '602': [],
 '603': [],
 '604': [],
 '605': [],
 '606': [],
 '607': [],
 '608': [],
 '609': [],
 '61': [],
 '610': [],
 '611': [],
 '612': [],
 '613': [],
 '614': [],
 '615': [],
 '616': [],
 '617': [],
 '618': [],
 '619': [],
 '62': [],
 '620': [],
 '621': [],
 '622': [],
 '623': [],
 '624': [],
 '625': [],
 '626': [],
 '627': [],
 '628': [],
 '629': [],
 '63': [],
 '630': [],
 '631': [],
 '632': [],
 '633': [],
 '634': [],
 '635': [],
 '636': [],
 '637': [],
 '638': [],
 '639': [],
 '64': [],
 '640': [],
 '641': [],
 '642': [],
 '643': [],
 '644': [],
 '645': [],
 '646': [],
 '647': [],
 '648': [],
 '649': [],
 '65': [],
 '650': [],
 '651': [],
 '652': [],
 '653': [],
 '654': [],
 '655': [],
 '656': [],
 '657': [],
 '658': [],
 '659': [],
 '66': [],
 '660': [],
 '661': [],
 '662': [],
 '663': [],
 '664': [],
 '665': [],
 '666': [],
 '667': [],
 '668': [],
 '669': [],
 '67': [],
 '670': [],
 '671': [],
 '672': [],
 '673': [],
 '674': [],
 '675': [],
 '676': [],
 '677': [],
 '678': [],
 '679': [],
 '68': [],
 '680': [],
 '681': [],
 '682': [],
 '683': [],
 '684': [],
 '685': [],
 '686': [],
 '687': [],
 '688': [],
 '689': [],
 '69': [],
 '690': [],
 '691': [],
 '692': [],
 '693': [],
 '694': [],
 '695': [],
 '696': [],
 '697': [],
 '698': [],
 '699': [],
 '7': [],
 '70': [],
 '700': [],
 '701': [],
 '702': [],
 '703': [],
 '704': [],
 '705': [],
 '706': [],
 '707': [],
 '708': [],
 '709': [],
 '71': [],
 '710': [],
 '711': [],
 '712': [],
 '713': [],
 '714': [],
 '715': [],
 '716': [],
 '717': [],
 '718': [],
 '719': [],
 '72': [],
 '720': [],
 '721': [],
 '722': [],
 '723': [],
 '724': [],
 '725': [],
 '726': [],
 '727': [],
 '728': [],
 '729': [],
 '73': [],
 '730': [],
 '731': [],
 '732': [],
 '733': [],
 '734': [],
 '735': [],
 '736': [],
 '737': [],
 '738': [],
 '739': [],
 '74': [],
 '740': [],
 '741': [],
 '742': [],
 '743': [],
 '744': [],
 '745': [],
 '746': [],
 '747': [],
 '748': [],
 '749': [],
 '75': [],
 '750': [],
 '751': [],
 '752': [],
 '753': [],
 '754': [],
 '755': [],
 '756': [],
 '757': [],
 '758': [],
 '759': [],
 '76': [],
 '760': [],
 '761': [],
 '762': [],
 '763': [],
 '764': [],
 '765': [],
 '766': [],
 '767': [],
 '768': [],
 '769': [],
 '77': [],
 '770': [],
 '771': [],
 '772': [],
 '773': [],
 '774': [],
 '775': [],
 '776': [],
 '777': [],
 '778': [],
 '779': [],
 '78': [],
 '780': [],
 '781': [],
 '782': [],
 '783': [],
 '784': [],
 '785': [],
 '786': [],
 '787': [],
 '788': [],
 '789': [],
 '79': [],
 '790': [],
 '791': [],
 '792': [],
 '793': [],
 '794': [],
 '795': [],
 '796': [],
 '797': [],
 '798': [],
 '799': [],
 '8': [],
 '80': [],
 '800': [],
 '801': [],
 '802': [],
 '803': [],
 '804': [],
 '805': [],
 '806': [],
 '807': [],
 '808': [],
 '809': [],
 '81': [],
 '810': [],
 '811': [],
 '812': [],
 '813': [],
 '814': [],
 '815': [],
 '816': [],
 '817': [],
 '818': [],
 '819': [],
 '82': [],
 '820': [],
 '821': [],
 '822': [],
 '823': [],
 '824': [],
 '825': [],
 '826': [],
 '827': [],
 '828': [],
 '829': [],
 '83': [],
 '830': [],
 '831': [],
 '832': [],
 '833': [],
 '834': [],
 '835': [],
 '836': [],
 '837': [],
 '838': [],
 '839': [],
 '84': [],
 '840': [],
 '841': [],
 '842': [],
 '843': [],
 '844': [],
 '845': [],
 '846': [],
 '847': [],
 '848': [],
 '849': [],
 '85': [],
 '850': [],
 '851': [],
 '852': [],
 '853': [],
 '854': [],
 '855': [],
 '856': [],
 '857': [],
 '858': [],
 '859': [],
 '86': [],
 '860': [],
 '861': [],
 '862': [],
 '863': [],
 '864': [],
 '865': [],
 '866': [],
 '867': [],
 '868': [],
 '869': [],
 '87': [],
 '870': [],
 '871': [],
 '872': [],
 '873': [],
 '874': [],
 '875': [],
 '876': [],
 '877': [],
 '878': [],
 '879': [],
 '88': [],
 '880': [],
 '881': [],
 '882': [],
 '883': [],
 '884': [],
 '885': [],
 '886': [],
 '887': [],
 '888': [],
 '889': [],
 '89': [],
 '890': [],
 '891': [],
 '892': [],
 '893': [],
 '894': [],
 '895': [],
 '896': [],
 '897': [],
 '898': [],
 '899': [],
 '9': [],
 '90': [],
 '900': [],
 '901': [],
 '902': [],
 '903': [],
 '904': [],
 '905': [],
 '906': [],
 '907': [],
 '908': [],
 '909': [],
 '91': [],
 '910': [],
 '911': [],
 '912': [],
 '913': [],
 '914': [],
 '915': [],
 '916': [],
 '917': [],
 '918': [],
 '919': [],
 '92': [],
 '920': [],
 '921': [],
 '922': [],
 '923': [],
 '924': [],
 '925': [],
 '926': [],
 '927': [],
 '928': [],
 '929': [],
 '93': [],
 '930': [],
 '931': [],
 '932': [],
 '933': [],
 '934': [],
 '935': [],
 '936': [],
 '937': [],
 '938': [],
 '939': [],
 '94': [],
 '940': [],
 '941': [],
 '942': [],
 '943': [],
 '944': [],
 '945': [],
 '946': [],
 '947': [],
 '948': [],
 '949': [],
 '95': [],
 '950': [],
 '951': [],
 '952': [],
 '953': [],
 '954': [],
 '955': [],
 '956': [],
 '957': [],
 '958': [],
 '959': [],
 '96': [],
 '960': [],
 '961': [],
 '962': [],
 '963': [],
 '964': [],
 '965': [],
 '966': [],
 '967': [],
 '968': [],
 '969': [],
 '97': [],
 '970': [],
 '971': [],
 '972': [],
 '973': [],
 '974': [],
 '975': [],
 '976': [],
 '977': [],
 '978': [],
 '979': [],
 '98': [],
 '980': [],
 '981': [],
 '982': [],
 '983': [],
 '984': [],
 '985': [],
 '986': [],
 '987': [],
 '988': [],
 '989': [],
 '99': [],
 '990': [],
 '991': [],
 '992': [],
 '993': [],
 '994': [],
 '995': [],
 '996': [],
 '997': [],
 '998': [],
 '999': []}


The del happens after the assignment, so why is the list in the dict empty?

The site in question is http://codes.lv/codes/Google_app_engine ... y1990.html
Last edited by stranac on Mon Feb 18, 2013 9:47 pm, edited 1 time in total.
Reason: Merged the three posts to one, and replaced html source with a link
lovecodecakes
 
Posts: 56
Joined: Mon Feb 11, 2013 8:19 pm

Re: searching and enlisting thru dictionary from a file

Postby metulburr » Mon Feb 18, 2013 9:07 pm

If your overall goal is to parse HTML, have you tried BeautifulSoup or some other parser?
OS Ubuntu 14.04, Arch Linux, Gentoo, Windows 7/8
https://github.com/metulburr
metulburr
 
Posts: 1387
Joined: Thu Feb 07, 2013 4:47 pm
Location: Elmira, NY

Re: searching and enlisting thru dictionary from a file

Postby stranac » Mon Feb 18, 2013 10:00 pm

lovecodecakes wrote: But as you can see, every time the count rolls over after a number ending in 9, a two-digit key appears, like 579 > 58 > 580. This way I don't get the actual rank order from the dictionary. Why is that happening?

Dicts are unordered (in the Python 2 used here; since Python 3.7 they keep insertion order), and the order in which items are displayed when you print one is arbitrary.

The items seem to be printed sorted in your case.
'579' comes before '58' because strings (which is what your keys are) are sorted lexicographically, character by character, and '7' sorts before '8'.
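
To make the ordering concrete, here is a minimal sketch. The keys are taken from the output in the question and the variable names are only illustrative. It shows string keys sorting character by character, and the same keys sorting numerically once they are converted with int:

```python
# Rank keys as strings, the way the regex groups produce them.
keys = ['58', '579', '580', '56', '559']

# Strings compare character by character, so '579' < '58' because '7' < '8'.
by_string = sorted(keys)
print(by_string)   # ['559', '56', '579', '58', '580']

# Converting each key to int for the comparison gives the numeric order.
by_number = sorted(keys, key=int)
print(by_number)   # ['56', '58', '559', '579', '580']
```

Storing int(line[0]) as the key in the first place would have the same effect whenever the keys are sorted.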

That said, I wouldn't use a dict at all.
You can just append the names to a list, and get them by index.
The most popular one will be names[0], the next names[1], and so on...

lovecodecakes wrote: The del happens after the assignment, so why is the list in the dict empty?

Code: Select all
some_dict = {}
some_list = [1, 2, 3]
some_dict['key'] = some_list

After this, some_list and some_dict['key'] refer to the same list.
That is, they are just 2 names for the same object.
If you empty the list, the object both names point to will be empty.
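
A minimal sketch of the difference between storing the list itself and storing a copy of it. The key names 'alias' and 'copy' are made up for illustration:

```python
some_list = [1, 2, 3]
some_dict = {}

some_dict['alias'] = some_list        # the same object under a second name
some_dict['copy'] = list(some_list)   # a new list holding the same items

del some_list[:]                      # empties the one shared object in place

print(some_dict['alias'])               # []
print(some_dict['copy'])                # [1, 2, 3]
print(some_dict['alias'] is some_list)  # True
```

In the loop from the question, storing line[1:] directly, or creating a fresh list on every iteration instead of clearing one shared list, would avoid the empty values.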
Friendship is magic!

R.I.P. Tracy M. You will be missed.
stranac
 
Posts: 1096
Joined: Thu Feb 07, 2013 3:42 pm

Re: searching and enlisting thru dictionary from a file

Postby lovecodecakes » Tue Feb 19, 2013 8:21 am

Sort of like a pointer?

Re: searching and enlisting thru dictionary from a file

Postby lovecodecakes » Tue Feb 19, 2013 8:32 am

You mean something like this?:
Code: Select all
namelist=[]
def indexer(fname): #lister is alist arg
    global namelist
    #diction={}
    alist=[]
    for line in re.findall(r'<td>(\d+)</td><td>(\w+)</td>\<td>(\w+)</td>',fname,re.IGNORECASE):
        #print names
        #print line[1:]
        namelist.append(line)
        #print namelist
        #alist[line[0]]=line[1:]
    return namelist



stranac wrote:That said, I wouldn't use a dict at all.
You can just append the names to a list, and get them by index.
The most popular one will be names[0], the next names[1], and so on...

Re: searching and enlisting thru dictionary from a file

Postby lovecodecakes » Tue Feb 19, 2013 8:33 am

I have heard of it, but what is it? An external library?
metulburr wrote:If your over all goal is to parse html, have you tried BeautifulSoup or some other parser?

Re: searching and enlisting thru dictionary from a file

Postby Mekire » Tue Feb 19, 2013 9:24 am

lovecodecakes wrote: Sort of like a pointer?

Sort of exactly like a pointer:
Code: Select all
mydict = {}
mylist = [1,3,4,5,6]
mydict["a"] = mylist

print(hex(id(mylist)))
print(hex(id(mydict["a"])))
print(mylist is mydict["a"])
Code: Select all
0x22bbad0
0x22bbad0
True
As you can see, the memory addresses are the same, although whether thinking about it this way will help you is debatable.

I actually just wrote a lengthy post somewhat related to this if you are interested:
http://python-forum.org/viewtopic.php?f=10&t=272

-Mek
Mekire
 
Posts: 986
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: searching and enlisting thru dictionary from a file

Postby lovecodecakes » Tue Feb 19, 2013 9:52 am

Got it.

Re: searching and enlisting thru dictionary from a file

Postby stranac » Tue Feb 19, 2013 10:40 am

lovecodecakes wrote:You mean something like this?:
Code: Select all
namelist=[]
def indexer(fname): #lister is alist arg
    global namelist
    #diction={}
    alist=[]
    for line in re.findall(r'<td>(\d+)</td><td>(\w+)</td>\<td>(\w+)</td>',fname,re.IGNORECASE):
        #print names
        #print line[1:]
        namelist.append(line)
        #print namelist
        #alist[line[0]]=line[1:]
    return namelist



More like this:
Code: Select all
def indexer(fname):
    return re.findall(r'<td>\d+</td><td>(\w+)</td><td>(\w+)</td>', fname, re.IGNORECASE)
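
As a usage sketch for a function like the one above: the sample rows below are made up and only mimic the layout the regex expects, and the parameter is renamed to text here purely for readability. Because the pairs come back in page order, rank r sits at index r - 1:

```python
import re

def indexer(text):
    """Return (boy, girl) name pairs in the order they appear in text."""
    return re.findall(r'<td>\d+</td><td>(\w+)</td><td>(\w+)</td>', text, re.IGNORECASE)

# Made-up rows for illustration, not taken from the real page.
sample = (
    '<td>1</td><td>Michael</td><td>Jessica</td>\n'
    '<td>2</td><td>Christopher</td><td>Ashley</td>\n'
)

names = indexer(sample)
rank = 2
print(names[rank - 1])   # the pair at rank 2 sits at index 1
```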

But I would probably use an HTML parsing library, as suggested by metulburr.

lovecodecakes wrote: I have heard of it, but what is it? An external library?

Yes, BeautifulSoup is a third-party HTML parsing library.
But I would recommend lxml.html, as it is faster, more powerful, and simpler to use.
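
To give a feel for parser-based extraction without installing anything, here is a rough sketch that uses the standard library's html.parser instead of BeautifulSoup or lxml.html. The class name CellCollector and the sample rows are made up for illustration, and the sketch assumes every row and cell has a closing tag:

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of <td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row being read, or None
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._in_cell = True
            self._row.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

# Made-up rows for illustration, not taken from the real page.
sample_html = (
    '<table>'
    '<tr><td>1</td><td>Michael</td><td>Jessica</td></tr>'
    '<tr><td>2</td><td>Christopher</td><td>Ashley</td></tr>'
    '</table>'
)

parser = CellCollector()
parser.feed(sample_html)
parser.close()

# Keep rows shaped like rank, boy, girl; int keys sort numerically.
ranks = {int(r[0]): (r[1], r[2])
         for r in parser.rows if len(r) == 3 and r[0].isdigit()}
print(ranks)
```

A dedicated library would normally replace the hand-written class with a few lines, but the idea of walking rows and cells rather than matching raw text is the same.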

Re: searching and enlisting thru dictionary from a file

Postby lovecodecakes » Tue Feb 19, 2013 10:47 am

I will go through that.

Re: searching and enlisting thru dictionary from a file

Postby lovecodecakes » Tue Feb 19, 2013 1:36 pm

UPDATE: Corrected code:
Code: Select all
namelist=[]
def indexer(fname): #lister is alist arg
    global namelist
    diction={}
    #alist=[]
    for line in re.findall(r'<td>(\d+)</td><td>(\w+)</td>\<td>(\w+)</td>',fname,re.IGNORECASE):
        #print line[1:]
        namelist.append(line[1:])
        diction[line[0]]=namelist[namelist.index(line[1:])]
        #print namelist
        #print line[0]
        #alist[line[0]]=line[1:]
    return diction
    #print diction

But is there a better way than this line?
Code: Select all
diction[line[0]]=namelist[namelist.index(line[1:])]

OK, so I tried it again with a dictionary, not expecting sorting, just to look things up by key:
Code: Select all
namelist=[]
def indexer(fname): #lister is alist arg
    global namelist
    diction={}
    for line in re.findall(r'<td>(\d+)</td><td>(\w+)</td>\<td>(\w+)</td>',fname,re.IGNORECASE):
        #print line[1:]
        diction[line[0]]=namelist.append(line[1:])
    return diction


And it shows None where it should have just stored what's in line[1:], shouldn't it? The line is:
Code: Select all
diction[line[0]]=namelist.append(line[1:])

Why is it showing up with None?
'997': None,
'998': None,
'999': None}
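
For what it is worth, this matches the usual behaviour of list.append, which changes the list in place and returns None, so the dict ends up holding that None. A small sketch, reusing the variable names from the code above and one pair from the earlier output:

```python
namelist = []
diction = {}
pair = ('Bryson', 'Cayla')  # rank 550 in the output earlier in the thread

# list.append mutates the list in place and returns None.
result = namelist.append(pair)
print(result)      # None
print(namelist)    # [('Bryson', 'Cayla')]

# Storing the pair itself, rather than append's return value, keeps the data.
diction['550'] = pair
print(diction)     # {'550': ('Bryson', 'Cayla')}
```

The same reasoning suggests a simpler form for the earlier line: diction[line[0]] = line[1:] stores the pair directly, with no index lookup needed.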

