One of my favorite features of Google Sheets is the ability to use the IMPORTXML function to pull in content from anywhere on the web. It’s like having your own little Google-backed scraper to conveniently pull data into your spreadsheet.
For example, if I wanted to get some data about my Google Play apps, I could use the formulas below. These are not strictly optimized – but they do show a variety of xpath queries. (note: before scraping content, its best to make sure you are not violating the T&Cs of the content owner)
Publisher Name
=IMPORTXML(URL, "//div[@itemprop='author']//span") |
Publisher Name
=IMPORTXML(URL, "//div[@itemprop='author']//span")
App Category
=MID(IMPORTXML(URL, "//a[@class='document-subtitle category']/@href"),22,999) |
App Category
=MID(IMPORTXML(URL, "//a[@class='document-subtitle category']/@href"),22,999)
App Rating
=IMPORTXML(URL, "//meta[@itemprop='ratingValue']/@content") |
App Rating
=IMPORTXML(URL, "//meta[@itemprop='ratingValue']/@content")
# Downloads - lower value
=value(LEFT(IMPORTXML(URL, "//div[@itemprop='numDownloads']"),FIND(" ",IMPORTXML(URL, "//div[@itemprop='numDownloads']"))-1)) |
# Downloads - lower value
=value(LEFT(IMPORTXML(URL, "//div[@itemprop='numDownloads']"),FIND(" ",IMPORTXML(URL, "//div[@itemprop='numDownloads']"))-1))
APK Size
=left(IMPORTXML(URL, "//div[@itemprop='fileSize']"),len(IMPORTXML(URL, "//div[@itemprop='fileSize']"))-1) |
APK Size
=left(IMPORTXML(URL, "//div[@itemprop='fileSize']"),len(IMPORTXML(URL, "//div[@itemprop='fileSize']"))-1)
Last Updated Date
=mid(IMPORTXML(URL, "//div[@itemprop='author']//div"),3,99) |
Last Updated Date
=mid(IMPORTXML(URL, "//div[@itemprop='author']//div"),3,99)
Has a Video on the Listing Page?
=if(iserror(IMPORTXML(URL, "//span[@class='video-image-wrapper']")),"","Y") |
Has a Video on the Listing Page?
=if(iserror(IMPORTXML(URL, "//span[@class='video-image-wrapper']")),"","Y")
Developer Email Address
=MID(IMPORTXML(URL, "//div[@class='details-section-contents']//a[contains(@href, 'mailto')]"),7,999) |
Developer Email Address
=MID(IMPORTXML(URL, "//div[@class='details-section-contents']//a[contains(@href, 'mailto')]"),7,999)