WebScraping with Python and BeautifulSoup: Part 1 of 3

Lillian Pierson, P.E.

To put webscraping with Python and BeautifulSoup in real-world context, imagine you’re living in New York City and a massive “bomb cyclone” hits town one winter. It takes out all power and water services. It manages to kill more than 150 people. The roads are covered in snow and debris and there is no way to bring in food and water supplies. More people have been severely hurt and will die if you can’t get them immediate medical care, water, and food.

In this case, since you’re in the United States, you don’t have to worry too much. You know that FEMA, the Department of Homeland Security, and the Red Cross all have your back. Things will be up and running within a week, with additional casualties kept to a bare minimum.

But what if you weren’t so fortunate? What if you were living in a less developed nation and got hit by a storm of such a devastating nature? What then?

Well, that’s exactly the situation that tens of millions of Filipinos find themselves in on a semi-regular basis. In less developed countries, like the Philippines, people really depend on the international community to step in and help. This help comes in the form of the International Red Cross, UN assistance, and countless other humanitarian response organizations. It also comes from hundreds of digital humanitarians who step in and provide volunteer software development and data science services.

Back when Typhoon Yolanda hit, I worked on one such digital humanitarian deployment, where we used webscraping to build a population density estimate that humanitarian organizations could use to plan out their emergency response. In this case, the Philippine government didn’t have a population map showing how many people were living in each affected area, so we had to try to make one FAST. You can read more about that activation here.

But this article is not a use case. It’s a demo to introduce you to an important and valuable skill: webscraping. More specifically, webscraping with Python and BeautifulSoup.

At the time I wrote this article there were precisely 615 active postings for webscraping jobs available on Upwork. Whether you want to up your skills for your job, pick up a little cash on a side project, or even if you want to build your own tech business, learning to scrape free-range data straight from the internet is a great superpower to have.

A series on webscraping with Python and BeautifulSoup

In today’s demo, I am going to teach you the basics of webscraping with Python and BeautifulSoup. You’re going to see the objects that make up BeautifulSoup and how to work with them.

In a follow-up demo, I’m going to teach you to work with parsed data in BeautifulSoup. In the second follow-up, you’re going to learn how to scrape a webpage and save your results to a working directory on your machine. Be sure to subscribe to my mailing list in the footer of this post so you can get those delivered straight to your inbox when they’re published.

Part 1: Working with objects in BeautifulSoup

So let’s get started with the basics on webscraping with Python and BeautifulSoup. There are 4 main object types in BeautifulSoup. Those are:

  1. BeautifulSoup object: The BeautifulSoup object is a representation of the document you’re scraping. It is easily navigable and searchable.
  2. Tag object: Tag objects correspond to XML and HTML elements in an original document. You can navigate and reference data using tag attributes.
  3. NavigableString object: A NavigableString object corresponds to a bit of text within a tag. BeautifulSoup uses the NavigableString class as a container for bits of text.
  4. Comment object: The Comment object is a special type of NavigableString object that holds the text of a comment found in the document you’re scraping (for example, an HTML comment like <!-- note -->).
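
To make those four object types a little more concrete, here is a minimal sketch of how each one can show up when you parse a small document. The HTML snippet and the variable names (html_doc, heading, heading_text, comment) are made up for illustration, and the sketch assumes the beautifulsoup4 package is installed and uses Python’s built-in "html.parser".

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical HTML snippet, invented purely for this illustration.
html_doc = """
<html><body>
<h1 class="title">Storm Report</h1>
<p id="summary">Power is <b>out</b> citywide.<!-- verify with utility --></p>
</body></html>
"""

# 1. BeautifulSoup object: represents the whole parsed document.
soup = BeautifulSoup(html_doc, "html.parser")

# 2. Tag object: corresponds to an element such as <h1>.
heading = soup.h1

# 3. NavigableString object: the bit of text inside a tag.
heading_text = heading.string

# 4. Comment object: a NavigableString subclass holding an HTML comment.
comment = soup.find(string=lambda text: isinstance(text, Comment))

print(type(soup).__name__)
print(type(heading).__name__, heading.name, heading["class"])
print(type(heading_text).__name__, heading_text)
print(type(comment).__name__, comment.strip())
```

Notice that the Tag gives you access to the element’s name and attributes, while the NavigableString and Comment objects only carry text. Checking with isinstance() is one simple way to tell them apart when you’re walking through a parsed page.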
