Find new hashtags in your niche and get their stats with shell scripting

I'm going to be using Instaloader and some shell scripting to scrape a hashtag, find what other hashtags users are using and sort them by frequency.

If users in a hashtag are all using similar hashtags, we can guess that those hashtags are relevant to your niche and worth checking out and perhaps using yourself. This is a nice way to find hidden gems: hashtags that people in your niche have discovered and that you can take advantage of. Sometimes smaller is better, as I will explain and show you.

After that we will code some shell scripts that will allow us to do some research on each of these hashtags and determine just how viable they may be with multiple factors taken into account.

  • Scrape a hashtag with Instaloader
  • Find what other hashtags users are posting to
  • Check each hashtag and get a little bit of information about them to score them

Scrape a hashtag with Instaloader

This is simple enough. We just want to choose a hashtag to work with... let's pick #marijuana (Instagram tag search result) and begin.

Opening up a terminal we have to type the following command to begin the scraping with Instaloader.

instaloader "#marijuana" --no-pictures --no-profile-pic --no-videos --no-compress-json --no-captions

After about 5 minutes we had grabbed the metadata of around 5000 posts as .json files.

Since the #marijuana hashtag is incredibly popular, most of these ~5k posts are from just the last 3 days.

Find what other hashtags users are posting to

Now that we have the data, we will code our shell script, which will loop through all the metadata files provided by Instaloader. The script grabs the user caption from each file and adds the hashtags it finds to a list.

  • Loops through each file in the #marijuana folder containing all the json data files
  • Gets the json line containing "text", which holds the post's caption
  • Adds a space in front of every #, so '#' becomes ' #', in case users are not adding a space between each hashtag and are #stuffing#like#this
  • Replaces space characters with a new line
  • Displays only lines that start with # and converts everything to lowercase
  • Adds all the hashtags to a text file
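The steps above can be sketched roughly like this. It is a minimal sketch, not the exact script from this post: the function name extract_hashtags is made up for illustration, and it assumes the uncompressed Instaloader files are pretty-printed so that the caption sits on its own line containing "text". The last sed step is an extra I added to strip trailing quotes, punctuation and escaped emoji from each hashtag; hashtags with non-ASCII characters may be cut short by it.

```shell
#!/bin/sh
# Sketch: collect every hashtag used in the captions of Instaloader JSON files.
# Assumption: each caption is on a single line of the form  "text": "..."
# extract_hashtags is a hypothetical helper name, not part of Instaloader.
extract_hashtags() {
    dir=$1    # folder with the .json files, e.g. "#marijuana"
    out=$2    # text file that receives one hashtag per line
    : > "$out"                                  # start with an empty list
    for f in "$dir"/*.json; do
        [ -e "$f" ] || continue                 # folder had no .json files
        grep '"text":' "$f" |                   # the line holding the caption
            sed -e 's/\\n/ /g' -e 's/#/ #/g' |  # escaped newlines -> space, '#' -> ' #'
            tr -s ' ' '\n' |                    # one word per line
            sed -n 's/^\(#[[:alnum:]_]\{1,\}\).*$/\1/p' |  # keep lines starting with #, trim junk
            tr '[:upper:]' '[:lower:]' >> "$out"           # lowercase and append
    done
}
```

Calling extract_hashtags "#marijuana" hashtags_list.txt would then fill hashtags_list.txt with one hashtag per line, duplicates included.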

Once the loop is finished, we can use a simple command which will sort the whole hashtags_list.txt file based on frequency of occurrence.

sort hashtags_list.txt | uniq -c | sort -nr

This turns the HUUUUGE hashtags_list.txt file (97,000 lines), full of duplicates, into a much smaller file with no duplicates, sorted by number of occurrences.
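If you only want the hashtags that show up often enough to be worth researching, the same pipeline can be wrapped with a small cut-off. This is just a sketch: rank_hashtags and the minimum count argument are my own illustrative additions, not part of the original command.

```shell
#!/bin/sh
# Sketch: rank hashtags by frequency and drop the rare ones.
# rank_hashtags and the "min" cut-off are hypothetical additions.
rank_hashtags() {
    list=$1          # file with one hashtag per line (duplicates allowed)
    min=${2:-1}      # keep only hashtags seen at least this many times
    sort "$list" |   # group identical hashtags together
        uniq -c |    # prefix each distinct hashtag with its count
        sort -nr |   # highest count first
        awk -v min="$min" '$1 >= min { print $1, $2 }'   # apply cut-off, tidy spacing
}
```

For example, rank_hashtags hashtags_list.txt 50 > hashtags_ranked.txt would keep only hashtags used at least 50 times (the file name and the threshold are arbitrary picks).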

Check each hashtag and get information about them to review the stats and compare

So now that we have a bunch of #marijuana related hashtags, let's get information about each one so we can find the diamonds.

  • Get the total posts count to the hashtag
  • Download 1000 of the latest posts
  • Get the average likes count from 1000 posts
  • Get the average comments count from 1000 posts
  • How many posts out of 1000 received 0 likes
  • How many posts out of 1000 received 0 comments
  • Count how many posts were posted today out of 1000
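The download step could look something like the sketch below. It reuses the same Instaloader flags as the first command and adds --count to stop after a fixed number of posts. The function name download_latest, the INSTALOADER override variable and the input file layout (one hashtag per line, with the leading #) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: fetch metadata for the latest posts of each hashtag in a list.
# download_latest and the INSTALOADER variable are hypothetical names;
# INSTALOADER only exists so the command can be swapped out for a dry run.
download_latest() {
    list=$1             # file with one hashtag per line, e.g. "#stoned"
    count=${2:-1000}    # how many posts to fetch per hashtag
    while IFS= read -r tag; do
        [ -n "$tag" ] || continue    # skip blank lines
        # </dev/null keeps the command from eating the rest of the list on stdin
        "${INSTALOADER:-instaloader}" "$tag" --count "$count" \
            --no-pictures --no-profile-pic --no-videos \
            --no-compress-json --no-captions < /dev/null
    done < "$list"
}
```

Setting INSTALOADER=echo before calling the function prints the commands instead of running them, which is handy for checking the list before hammering Instagram with requests.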

Here is what the shell script I quickly made up for this demonstration does. It adds the following formatted entries to a text file:
  • #hashtag - How many posts it has all together
  • The average likes count from 1000~ posts
  • Average comment count from 1000~ posts
  • How many of the 1000~ posts have 0 likes
  • How many of the 1000~ posts have 0 comments
  • What were the dates of the posts
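Here is a rough sketch of how entries like these could be computed with awk. It is not the script used for the results in this post. It assumes the likes, comments and date of each post have already been pulled out of the JSON into a plain file with one post per line in the form "likes comments YYYY-MM-DD", and that the total post count is looked up separately and passed in. The function name hashtag_stats and that row layout are made up for illustration.

```shell
#!/bin/sh
# Sketch: summarise one hashtag from a file of per-post rows.
# Assumed row format (hypothetical): likes comments YYYY-MM-DD
# hashtag_stats is an illustrative name, not the original script.
hashtag_stats() {
    tag=$1                           # e.g. "#stoned"
    total=$2                         # total posts in the hashtag, obtained elsewhere
    rows=$3                          # file with one "likes comments date" row per post
    today=${4:-$(date +%Y-%m-%d)}    # override only for testing
    awk -v tag="$tag" -v total="$total" -v today="$today" '
        NF >= 3 {
            n++
            likes += $1
            comments += $2
            if ($1 == 0) zero_likes++
            if ($2 == 0) zero_comments++
            if ($3 == today) posted_today++
        }
        END {
            if (n == 0) { print tag " - no posts found"; exit }
            printf "%s - %s posts\n", tag, total
            printf "Like avg: %.2f\n", likes / n
            printf "Comment avg: %.2f\n", comments / n
            printf "0 Likes: %d out of %d\n", zero_likes, n
            printf "0 Comments: %d out of %d\n", zero_comments, n
            printf "Posted today: %d out of %d\n", posted_today, n
        }' "$rows"
}
```

Appending the output of each call to one text file (with >>) gives a block of stats per hashtag that can be compared side by side.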

If you want a more efficient way to view the data, output the entries as CSV (Comma Separated Values) so that you can open them in a spreadsheet program like Excel. I will do this in my next tutorial on this subject with a different niche.
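As a small taste of that, writing CSV from a shell script mostly comes down to joining the values with commas. The helper names and the column order below are assumptions of mine, and the sketch does no quoting, so it only works while the values themselves contain no commas.

```shell
#!/bin/sh
# Sketch: emit stats as CSV lines. Column names and order are hypothetical.
csv_header() {
    echo 'hashtag,total_posts,like_avg,comment_avg,zero_likes,zero_comments'
}

# Join all arguments with commas (no quoting, so values must not contain commas).
csv_row() {
    (IFS=,; printf '%s\n' "$*")    # subshell keeps the IFS change local
}
```

A script would call csv_header > stats.csv once, then csv_row "$tag" "$total" "$like_avg" "$comment_avg" "$zero_likes" "$zero_comments" >> stats.csv for each hashtag (stats.csv and the variable names are placeholders).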

Interesting... so now what?

We have looked through a very popular hashtag, found similar related hashtags and gathered some interesting statistics about them. All with the power of shell scripting.

Now we have a better picture of which hashtags to investigate further and hopefully dominate our niche in.

Using the statistics we can:

  • Find hashtags that are "fresh" with a low overall amount of posts
  • Find hashtags that do not get posted to a lot, meaning your posts will stay in the search for longer and have a higher visibility
  • Find hashtags that receive a great amount of likes on average
  • Find hashtags with a low percentage of posts receiving no likes
  • Filter out the lowest performing hashtags so that we don't waste space with them
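Filters like these are easy to express with awk once the stats are in one file. The sketch below assumes a CSV with a header line and the hypothetical columns hashtag, total_posts, like_avg, comment_avg, zero_likes, zero_comments, where the zero counts are out of a sample of about 1000 posts. The function name and the thresholds are illustrative and should be tuned per niche.

```shell
#!/bin/sh
# Sketch: keep hashtags that are liked well, not too crowded, and rarely ignored.
# Assumed CSV columns (hypothetical):
#   hashtag,total_posts,like_avg,comment_avg,zero_likes,zero_comments
filter_hashtags() {
    csv=$1
    min_likes=$2        # minimum average likes
    max_total=$3        # maximum total posts in the hashtag
    max_zero_pct=$4     # maximum percentage of sampled posts with 0 likes
    sample=${5:-1000}   # how many posts were sampled per hashtag
    awk -F, -v min_likes="$min_likes" -v max_total="$max_total" \
        -v max_zero_pct="$max_zero_pct" -v sample="$sample" '
        NR == 1 { next }    # skip the header line
        {
            zero_pct = 100 * $5 / sample
            if ($3 >= min_likes && $2 <= max_total && zero_pct <= max_zero_pct)
                printf "%s,%s,%.1f\n", $1, $3, zero_pct
        }' "$csv" |
        sort -t, -k2,2nr    # best average likes first
}
```

Each output line holds the hashtag, its average likes and the percentage of sampled posts that got no likes, so 0 likes on 150 of 1000 posts shows up as 15.0.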

Some example results

#cannabiscommunity had one of the lowest 0 Comments counts at 514 out of 1000, compared to #smoke, which has 610 out of 1000. This makes sense because the word community implies people are more friendly, more likely to reply to things and be social. This would be a good keyword to focus on if you were looking for feedback or wanted to start a discussion about marijuana. Similarly, #highsociety has a low 0 Comments out of 1000 count.

#growyourown has the lowest 0 Comments out of 1000 score at 485. This seems like a hashtag fit for people interested in cultivating marijuana and they are quite social about it.

Low value hashtags: #420 and #dank had some of the highest 0 Likes out of 1000 counts. This implies that a lot of the posts in these hashtags are low value and don't resonate well with the audience. It would be worth ignoring them completely in your marijuana posts. Hashtag space is valuable, and you could replace them with something better and more targeted instead.

#dank also has 44 million posts, which is much higher than most of the other hashtags I've looked at here. This is because it is a generic term and not entirely related to marijuana. For this reason it should be completely discarded from any serious marijuana post.

#cbd has one of the lowest Like avg at 15, while most others have an average between 25 and 40. This is most likely because of how much selling is going on in that hashtag. CBD products are very popular and plenty of posts will be shilling for sales. For this reason it would be wise to avoid this hashtag if your intention is to grow your likes and your following with your marijuana posts.

#stoned has a Like avg of 73 and only 5.9 million posts. This is a hashtag you should use every time you post about marijuana, lots of people seem to enjoy browsing it.

#420life - 87.0 Like avg, 2.58 Comment avg, and only 2.6 million posts. 
#weedstagram420 - 112 Like avg, 3.24 Comment avg and 4.7 million posts.

These two hashtags are something to seriously focus on for the marijuana niche.

Wrapping it up

So now you can see how digging into the data of your hashtags can be quite interesting and valuable. We find that some of the more commonly used hashtags are completely useless or don't provide great enough returns to even bother using them.

We also find some really high performing hashtags that you might have overlooked and that should be in every single one of your posts. And that's just some quick details gleaned from the top 30 hashtags I looked at. There are still ~300 more viable candidates to research in the niche. I could go at this all day finding interesting things from the stats.

There are tons of hidden gems out there that you could be focusing on, instead of using the generic super popular ones.

Remember, if they are very popular and fast moving, your post is just going to move down the results, away from eyeballs, that much quicker.


© 2018, Sharegrams