HTML Bitcoin address parsing using Beautiful Soup

+1 vote

I'm trying to use the Beautiful Soup library to parse a webpage and find Bitcoin addresses.

I've managed to pull the divs containing a generated address out of the whole HTML document:

<div class="roundpic qrcode" data-height="80" data-text="bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573" data-width="80" style="margin: auto"></div>, <div class="roundpic qrcode" data-height="160" data-text="bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573" data-width="160" style="padding: 10px"></div>

What would be the best way to isolate the address? I know its length can be anywhere from 27 to 34 characters, but it always appears between 'bitcoin:' and '?'. Is there a regex I could use?

Sep 5, 2018 in Blockchain by digger
• 26,740 points
1,308 views

3 answers to this question.

0 votes

You don't really need a regex to extract the address. The regex below only selects the matching divs; basic string operations do the rest just fine:

import re
from bs4 import BeautifulSoup

html = '''
<div class="roundpic qrcode" data-height="80" data-text="bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573" data-width="80" style="margin: auto"></div>
<div class="roundpic qrcode" data-height="160" data-text="bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573" data-width="160" style="padding: 10px"></div>
'''

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(html, 'html.parser')

# Select only the divs whose data-text attribute starts with "bitcoin:"
for div in soup.find_all('div', {'data-text': re.compile(r'^bitcoin:')}):
    # Drop the scheme prefix, then split the address from the amount
    address, amount = div.get('data-text').replace('bitcoin:', '').split('?amount=')
    print(address, amount)
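
One caveat with the split above: unpacking into two names raises a ValueError if a data-text value has no ?amount= part. Here is a minimal sketch of a more forgiving variant using str.partition. The helper name split_bitcoin_uri is made up for illustration, and it assumes each value looks like bitcoin:<address> with an optional ?query after it:

```python
def split_bitcoin_uri(data_text):
    """Return (address, query) from a 'bitcoin:<address>[?query]' string.

    Hypothetical helper, not part of Beautiful Soup. Returns None when
    the value does not start with the 'bitcoin:' prefix.
    """
    prefix = 'bitcoin:'
    if not data_text.startswith(prefix):
        return None
    # partition never raises: a missing '?' simply leaves query empty
    address, _, query = data_text[len(prefix):].partition('?')
    return address, query


print(split_bitcoin_uri('bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573'))
print(split_bitcoin_uri('bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud'))
```

Inside the loop you would call it as split_bitcoin_uri(div.get('data-text')).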
answered Sep 5, 2018 by slayer
• 29,350 points
0 votes

This is easy with Python regular expressions:

import re
# The HTML has to be a string literal; triple quotes let it contain double quotes
in_str = '''<div class="roundpic qrcode" data-height="80" data-text="bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573" data-width="80" style="margin: auto"></div>, <div class="roundpic qrcode" data-height="160" data-text="bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573" data-width="160" style="padding: 10px"></div>'''
# Lazily capture everything between "bitcoin:" and "?amount"
bitcoin_addresses = re.findall(r'bitcoin:(.*?)\?amount', in_str)
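
The pattern above captures whatever sits between the two markers. If you also want to skip values that cannot be an address, a stricter sketch is possible. It assumes a legacy Base58 address (no 0, O, I or l) and takes the 27 to 34 character range from the question; both are assumptions about this page, not a spec. A match still does not prove the address is valid, since no checksum is verified:

```python
import re

in_str = '''<div class="roundpic qrcode" data-text="bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573"></div>, <div class="roundpic qrcode" data-text="bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573"></div>'''

# Base58 alphabet: digits and letters without 0, O, I and l.
# The 27-34 length range is taken from the question, not from a spec.
# The lookahead requires the address to end at '?', '"' or end of input.
ADDRESS_RE = re.compile(r'bitcoin:([1-9A-HJ-NP-Za-km-z]{27,34})(?=[?"]|$)')

matches = ADDRESS_RE.findall(in_str)

# The page repeats the same QR code at two sizes, so drop duplicates
# while keeping first-seen order.
unique_addresses = list(dict.fromkeys(matches))
print(unique_addresses)
```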
answered Sep 5, 2018 by Him
0 votes

Regex is not meant for parsing HTML. As you have already parsed the DOM anyway, why don't you access the attribute directly? :)

# imports
from bs4 import BeautifulSoup as Soup

# parse the HTML (the parser is named explicitly to avoid a bs4 warning)
s = Soup("<html><body><div data-text='bitcoin:a'></div><div data-text='bitcoin:b'></div><div data-text='bitcoin:c'></div></body></html>", "html.parser")

# find the divs with "data-text"-attribute (has_attr replaces the old has_key)
divs = (d for d in s.find_all(name="div") if d.has_attr('data-text'))

# extract the value of "data-text"
data_texts = map(lambda x: x["data-text"], divs)

# find only bitcoins
bitcoins = filter(lambda s: s.startswith("bitcoin:"), data_texts)

# strip the prefix
extract = map(lambda s: s[8:], bitcoins)

# result (map is lazy in Python 3, so materialize it before printing)
print(list(extract))

Result:

['a', 'b', 'c']
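
Slicing off the first eight characters works for this toy input, but with the real values from the question it would leave the ?amount=... part attached. Since the value has the shape of a URI, one option is to let urllib.parse from the standard library split it. This is a sketch, assuming Python 3 and that the strings have already been read from the data-text attributes; the function name parse_payment is made up for illustration:

```python
from urllib.parse import urlsplit, parse_qs


def parse_payment(data_text):
    """Split a 'bitcoin:<address>?amount=<n>' string into its parts.

    Hypothetical helper for illustration. Returns (address, amount),
    where amount is a string, or None if no amount parameter is present.
    """
    parts = urlsplit(data_text)
    if parts.scheme != 'bitcoin':
        raise ValueError('not a bitcoin URI: %r' % data_text)
    # parse_qs maps each query key to a list of values
    amounts = parse_qs(parts.query).get('amount', [None])
    return parts.path, amounts[0]


address, amount = parse_payment('bitcoin:1JL7kugm1vDLqyzrVPAPdcbjH3PTxcPcud?amount=0.0573')
print(address, amount)
```

The amount is left as a string on purpose; converting it with decimal.Decimal rather than float avoids rounding surprises.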
answered Sep 5, 2018 by Lewis
