The most important thing when scraping Facebook content is to emulate human behavior and to target the simplest URL endpoints Facebook exposes. In this example, we will use Python for a small scraping task: collecting the fan-page users who shared a given post.
To install Python on Windows, follow the A Simple Way To Installing And Run Python And PIP On Windows guide.
You will also need to install a couple of extra Python packages using PIP:
> pip install xlsxwriter selenium
To emulate human actions, we will use Selenium, the web automation driver (you can download it from here), and apply code delays between actions such as the login process, moving to the internal post URL, and so on.
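Fixed pauses work, but slightly randomized pauses look less robotic. As a minimal sketch (the `human_delay` helper is our own naming, not part of Selenium), you could swap the fixed `sleep(3)` calls in the script below for something like:

```python
import random
from time import sleep

def human_delay(low=2.0, high=5.0):
    # Pause for a random interval to mimic human pacing between actions.
    sleep(random.uniform(low, high))
```

Calling `human_delay()` between each Selenium action then varies the pacing from run to run.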
The script will log in to Facebook and then open the targeted post through the old mobile-browser version of the Facebook URL scheme.
In this example, we target the users who shared a post, e.g. https://www.facebook.com/hahahahaaa.vn/posts/3144355232550870, so we will use the mobile-browser Facebook URL scheme for the shares action: https://m.facebook.com/browse/shares?id=3144355232550870.
Note: you only need to replace the post ID in this URL with the ID of the post you are targeting.
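To make that substitution explicit, here is a tiny helper (hypothetical, purely for illustration; `shares_url` is our own name, not a Facebook API) that builds the shares URL from a post ID:

```python
def shares_url(post_id):
    # Build the mobile-scheme "shares" URL for a given post ID.
    return "https://m.facebook.com/browse/shares?id=" + str(post_id)

print(shares_url(3144355232550870))
# https://m.facebook.com/browse/shares?id=3144355232550870
```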
The Python Code
We will locate the data using the XPath of the page elements, so you should first verify that these XPath expressions still match Facebook's current markup, and correct them if not. The code then goes as follows.
```python
#!/usr/bin/env python
# coding: utf-8
from time import sleep
import xlsxwriter
from selenium import webdriver

# Enter the credential information for your Facebook account.
# We assume chromedriver.exe sits in the same folder as this script.
url = "https://m.facebook.com/browse/shares?id=[your post id]"
usr = "user@email.com"
pwd = "password"

driver = webdriver.Chrome(executable_path=r'chromedriver.exe')
driver.get('https://m.facebook.com/login')
print("Trying to open the Facebook login page")
# Make a human-like delay...
sleep(3)

# Look for the login form fields by their ids.
username_box = driver.find_element_by_id('m_login_email')
username_box.send_keys(usr)
print("User email entered")
# Make a human-like delay...
sleep(3)
password_box = driver.find_element_by_id('m_login_password')
password_box.send_keys(pwd)
print("User password entered")
# Make a human-like delay...
sleep(3)

# Click the login button.
login_box = driver.find_element_by_name('login')
login_box.click()
# Make a human-like delay...
sleep(3)

# Switch from the mobile URL scheme to the basic (mbasic) version.
url = url.replace('https://m.', 'https://mbasic.')
# Go to the basic URL of the targeted post.
driver.get(url)
print(url)
# Make a human-like delay...
sleep(3)

# Build the worksheet for the output data.
cellindex = 1
workbook = xlsxwriter.Workbook("Output.xlsx")
worksheet = workbook.add_worksheet()
worksheet.write('A1', 'FB User Name')
worksheet.write('B1', 'FB User ID')
worksheet.write('C1', 'FB User URL')

# Build the scrape loop index and keep going until nothing more loads.
index = 0
cntnu = True
while cntnu:
    # Try to collect data until the "load more" link disappears.
    try:
        index = index + 1
        user_info = driver.find_elements_by_xpath("//*[contains(@class,'_4mn c')]/a")
        for li in user_info:
            cellindex = cellindex + 1
            link = li.get_attribute('href')
            name = li.text
            name = name.replace('\nFollow', '')
            # Restore the usual web URL.
            link = link.replace('https://m.', 'https://www.')
            user_id = ((((link.replace('https://www.facebook.com/profile.php?id=', ''))
                         .replace('/?fref=pb', ''))
                        .replace('?fref=pb', ''))
                       .replace('https://www.facebook.com/', '')).replace('&fref=pb', '')
            worksheet.write('A' + str(cellindex), name)
            worksheet.write('B' + str(cellindex), user_id)
            worksheet.write('C' + str(cellindex), link)
        # Make a human-like delay.
        sleep(3)
        # Load more shares.
        load = driver.find_element_by_xpath("//*[contains(@id,'m_more_item')]/a")
        load.click()
        # Make a human-like delay.
        sleep(5)
        print(index)
        cntnu = True
    except Exception:
        # No more data, or no more "load more" link.
        cntnu = False

workbook.close()
```
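Note that newer Selenium releases (4.x) removed the `find_element_by_*` helpers used above; on a current install you would write `driver.find_element(By.ID, 'm_login_email')` after `from selenium.webdriver.common.by import By`. Also, if the script finishes without collecting anything, Facebook's markup has probably changed. The following sketch (our own addition, reusing the `driver` session from the script above) prints how many elements each XPath currently matches, so you can spot a stale selector before a full run:

```python
# Quick sanity check for the XPath selectors used above; run it in the
# same session, after logging in and opening the mbasic shares page.
selectors = {
    "share entries": "//*[contains(@class,'_4mn c')]/a",
    "load-more link": "//*[contains(@id,'m_more_item')]/a",
}
for label, xpath in selectors.items():
    matches = driver.find_elements_by_xpath(xpath)
    print(label, "->", len(matches), "element(s) matched")
```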