Saturday, December 15, 2012

View Twitter profile images

A while ago I noticed that Twitter had changed the way it displays users' images.
When previewing an image, we used to have the option of a grid showing all the images the user had posted. Twitter seems to have removed that feature, so visitors can now view only one image at a time.
The missing grid view is a pain, and many people want it back, as this support thread on Twitter's developer site shows: https://dev.twitter.com/discussions/9843

So I decided to create an application that lets me get a profile's images easily.

The obvious way to go about this is to use the Twitter API (after reading the documentation, of course :-) ).
I noticed two Twitter API functions that let us reach our goal using plain HTTP GET requests.

1 - http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=CodeBTL
The user_timeline.xml command returns a simple XML file with the user's recent tweets.
The function supports additional parameters, such as count, which specifies the maximum number of tweets to return, and trim_user, which removes the user data appended to each tweet.
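As a rough sketch, the request URL above can be assembled like this. The class and method names (TimelineUrl, Build) are hypothetical, the cap of 200 simply mirrors the maximum number of status nodes mentioned later in this post, and the v1 endpoint has since been retired, so treat this only as an illustration of how the query parameters fit together.

```csharp
using System;

static class TimelineUrl
{
    // Builds a user_timeline.xml URL for the (retired) Twitter API v1.
    // count is clamped to 1..200, an assumption based on the 200-status maximum.
    public static string Build(string screenName, int count, bool trimUser)
    {
        if (string.IsNullOrEmpty(screenName))
            throw new ArgumentException("screen name is required", nameof(screenName));

        int safeCount = Math.Max(1, Math.Min(200, count));

        return "http://api.twitter.com/1/statuses/user_timeline.xml"
            + "?screen_name=" + Uri.EscapeDataString(screenName)
            + "&count=" + safeCount
            + "&trim_user=" + (trimUser ? "true" : "false");
    }
}
```

For example, TimelineUrl.Build("CodeBTL", 200, true) produces the first URL in this post with count=200&trim_user=true appended.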
XML result from the Twitter user_timeline function

Notice that for each tweet we get the tweet id and the tweet text.
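A minimal parsing sketch using System.Xml.Linq, assuming the response has the layout <statuses><status><id>…</id><text>…</text>…</status></statuses>; the TimelineParser name is hypothetical. Reading only the direct <id> child of each <status> avoids picking up a nested user id when trim_user is off.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TimelineParser
{
    // Returns (id, text) pairs, one per <status> element under the root.
    // Element() looks at direct children only, so <user><id> is ignored.
    public static List<(string Id, string Text)> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
            .Elements("status")
            .Select(s => ((string)s.Element("id"), (string)s.Element("text")))
            .ToList();
    }
}
```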

2 - http://api.twitter.com/1/statuses/show.xml?id=279580723336331264&include_entities=1
The show.xml function receives a tweet id and returns its XML description.
Like most Twitter functions, it supports additional parameters. The most important one for us is include_entities, which shows whether the tweet contains any media links, so we can take the link and display it.
XML result from the Twitter show function

When testing my application, I ran into a big problem.
Twitter limits unauthenticated requests to 150 per hour, and each of our GET requests counts as one (reference: https://dev.twitter.com/docs/rate-limiting).
Since fetching the timeline costs one request and checking each tweet with show.xml costs another, we can check fewer than 150 tweets per hour for images, which is a serious limitation.
I tried registering an application with Twitter and authenticating a user for the requests, but the limit seemed to be fixed.
The only way I saw to bypass the limit was to skip the Twitter API for this step and do some HTML scraping.

I created a simple form that lets the user enter the Twitter username whose images they want to retrieve.

Once a username is entered, I call the first URL in this post, figuring that, for now, 150 user lookups per hour were enough. The result is an XML file with up to 200 status nodes.

I iterate through all of the status texts looking for links.
Twitter converts every posted link (including links to uploaded media files) to its own t.co short-link format.
To check for such a link, we can use a regular expression that looks for "t.co/" followed by the link code:
Match m = Regex.Match(TweetText, @"(?<twitterURL>t\.co)/(?<subdir>[^\s]*)");
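Regex.Match returns only the first link, while a tweet may contain several. The following sketch collects every t.co code with Regex.Matches. The ShortLinks helper is hypothetical, and restricting codes to letters and digits is an assumption about the t.co format that also keeps trailing punctuation out of the capture.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class ShortLinks
{
    // "\." matches a literal dot, so "t.co" cannot match something like "tXco".
    // The code part is assumed to be letters and digits only.
    private static readonly Regex TcoLink =
        new Regex(@"\bt\.co/(?<subdir>[A-Za-z0-9]+)");

    // Returns every t.co code found in the tweet text, in order of appearance.
    public static List<string> FindCodes(string tweetText)
    {
        return TcoLink.Matches(tweetText)
            .Cast<Match>()
            .Select(m => m.Groups["subdir"].Value)
            .ToList();
    }
}
```

Each returned code can then be appended to "http://t.co/" to build the URL that is resolved in the next step.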

If we find a link, we can't simply download it, because t.co redirects to the location of the original link. To find out where the link actually points, we send an HTTP HEAD request with automatic redirects disabled and read the Location header of the redirect response:
    // Requires: using System; using System.Net;
    var request = (HttpWebRequest)WebRequest.Create(new Uri(@"http://t.co/" + m.Groups["subdir"].Value));
    request.Method = "HEAD";              // ask for headers only, no body
    request.AllowAutoRedirect = false;    // keep the redirect response so we can read its target
    string location;
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        // The redirect target is returned in the Location header.
        location = response.GetResponseHeader("Location");
    }


On receiving the new location, we notice that we are redirected not to the image itself but to a web page displaying it.

Currently I support two types of images: those hosted on Twitter (the URL starts with twitter.com and contains /photo/) and those hosted on Instagram (the URL starts with instagr.am).
To finish our scraping session, we need to find the correct image tag on that web page.
For Twitter, the img tag's class attribute has the value "large media-slideshow-image".
For Instagram, the img tag's class attribute has the value "photo".
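The two rules above can be expressed as a small classifier. This is a hypothetical helper (HostClassifier and ImageHost are illustrative names). It checks the URL's host rather than a raw string prefix, so a leading http:// or https:// makes no difference, and it maps each host to the class value listed above.

```csharp
using System;

enum ImageHost { Unsupported, Twitter, Instagram }

static class HostClassifier
{
    // Twitter: host twitter.com and a path containing "/photo/".
    // Instagram: host instagr.am. Anything else is unsupported.
    public static ImageHost Classify(string location)
    {
        if (!Uri.TryCreate(location, UriKind.Absolute, out Uri uri))
            return ImageHost.Unsupported;

        string host = uri.Host.ToLowerInvariant();
        if ((host == "twitter.com" || host == "www.twitter.com")
            && uri.AbsolutePath.Contains("/photo/"))
            return ImageHost.Twitter;
        if (host == "instagr.am" || host == "www.instagr.am")
            return ImageHost.Instagram;
        return ImageHost.Unsupported;
    }

    // Class attribute value of the <img> tag to look for on each page type.
    public static string ImageClassFor(ImageHost host) =>
        host == ImageHost.Twitter ? "large media-slideshow-image"
        : host == ImageHost.Instagram ? "photo"
        : null;
}
```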


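Since I had written all of the code up to this point myself, I didn't want to bring in the HTML Agility Pack or another third-party component. So, using regular expressions again, I wrote the GetImageTags function:
        // Requires: using System; using System.Collections.Generic;
        //           using System.Text.RegularExpressions;

        // Holds the two attributes we care about for each <img> tag.
        private class ImgTag
        {
            public string src;
            public string classAtt;
        }

        // Returns every <img> tag in the page together with its src and class values.
        private List<ImgTag> GetImageTags(String html)
        {
            List<ImgTag> imgTags = new List<ImgTag>();
            // Match one complete <img ...> tag at a time, ending at its own '>'.
            MatchCollection m1 = Regex.Matches(html, @"<img\b[^>]*>", RegexOptions.IgnoreCase);
            foreach (Match m in m1)
            {
                string value = m.Value;
                ImgTag imgTag = new ImgTag();
                // The leading \s keeps attributes such as data-src from matching.
                Match m2 = Regex.Match(value, @"\ssrc=\""(.*?)\""", RegexOptions.Singleline);
                if (m2.Success)
                {
                    imgTag.src = m2.Groups[1].Value;
                }
                m2 = Regex.Match(value, @"\sclass=\""(.*?)\""", RegexOptions.Singleline);
                if (m2.Success)
                {
                    imgTag.classAtt = m2.Groups[1].Value;
                }
                imgTags.Add(imgTag);
            }
            return imgTags;
        }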
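With the tags in hand, picking the right image is a matter of comparing class values. The following standalone sketch (the ImagePicker.FindImageSrc name is hypothetical) repeats the tag scan so it compiles on its own, and returns the src of the first <img> whose class attribute exactly equals the expected value. The exact comparison is a simplification that would miss tags with extra or reordered classes.

```csharp
using System.Text.RegularExpressions;

static class ImagePicker
{
    // Returns the src of the first <img> whose class attribute equals
    // wantedClass (e.g. "photo" or "large media-slideshow-image"), or null.
    public static string FindImageSrc(string html, string wantedClass)
    {
        foreach (Match tag in Regex.Matches(html, @"<img\b[^>]*>", RegexOptions.IgnoreCase))
        {
            Match cls = Regex.Match(tag.Value, @"\sclass=""([^""]*)""", RegexOptions.IgnoreCase);
            Match src = Regex.Match(tag.Value, @"\ssrc=""([^""]*)""", RegexOptions.IgnoreCase);
            if (cls.Success && src.Success && cls.Groups[1].Value == wantedClass)
                return src.Groups[1].Value;
        }
        return null;
    }
}
```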



Once we have the src attribute value, all that's left is to download the image.
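A hedged sketch of that last step using HttpClient follows. ImageSaver, LocalFileName and DownloadAsync are hypothetical names. The file name is taken from the last path segment of the URL (any query string is ignored), and the target folder is a parameter; the demo app, as mentioned in the comments, saves into the folder it was launched from.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class ImageSaver
{
    private static readonly HttpClient Http = new HttpClient();

    // Uses the last path segment of the URL as the file name, replacing
    // characters the OS does not allow; falls back when the path is empty.
    public static string LocalFileName(string src, string fallback)
    {
        string[] segments = new Uri(src).Segments;
        string name = segments.Length > 0 ? segments[segments.Length - 1].Trim('/') : "";
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');
        return string.IsNullOrEmpty(name) ? fallback : name;
    }

    // Downloads the image bytes and writes them into targetFolder.
    public static async Task<string> DownloadAsync(string src, string targetFolder)
    {
        byte[] bytes = await Http.GetByteArrayAsync(src);
        string path = Path.Combine(targetFolder, LocalFileName(src, "image.jpg"));
        File.WriteAllBytes(path, bytes);
        return path;
    }
}
```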

You can get the Twitter image downloader application from my CodePlex project page: https://twitterimagedownload.codeplex.com/
Take a look at the source code; recommendations and remarks are welcome.

Update 29/6/2013: I've updated the project to support Twitter API v1.1.
That should fix the crash that occurred when fetching the images.


Twitter Image Downloader

20 comments:

  1. Great!
    However, it only checks the last 200 tweets. I just modified it to check up to 3200, which is Twitter's maximum. Gonna post it on the CodePlex site. (Maybe I'll add an option to choose how many tweets it should check, because 3200 takes pretty long.)

    1. Where did you post it? Link?

    2. I finished it some time ago but didn't release it. Now Twitter has updated its API (in a pretty shitty way), so I have to fix some stuff, and then I'll release it on the CodePlex site. I also just saw that CBTL already got that working, so hopefully it won't take long. It also has some other extra features besides up to 3200 pics.

  2. After showing the photos, how can I download the photo that I want?

  3. Hi Anonymous,
    The Demo App downloads the images automatically to the local folder where the application was launched from.

  4. Where is the modified version with the 3200 max? Thank you.

    1. Updates on the 3200 max version?

  5. Great software!
    I'd be keen to get the 3200 max version too. Any news on a link to a compiled version?

    1. I'll update it with the new max version next week, Scott.

  6. Doesn't work now :(
    It worked great before, but now it launches and then crashes after I click the Download button :(
    OS: Win 8 x64

    1. darkpaska, thanks for the heads up.
      It seems that twitter stopped supporting the twitter API ver. 1.0.
      I've updated the app to work with the newer 1.1 API so if you download the new version it should work for you.

  7. Nice app. Thanks and questions from this *NON*-developer:
    1. Could you overcome the 3200 limitation using the Streaming API?
    2. Could I use your App + MY Public Twitter API (if there is such a thing) to overcome the 3200 limit? I ask because some sites, like VirusTotal, make this a possibility to prevent apps, like X-Ray by Raymond.cc, from exceeding request limits.

    Regards.

  8. Awesome program, dudes (y). But there's a problem: the downloads are limited to about 20 images. Please find a solution. Cheers

  9. It doesn't work: the message "Done" appears and no image or folder shows up. It's garbage.

  10. Try Sone Image Downloader, it also downloads all Instagrams and works with the current Twitter API - http://www.michelstevelmans.com/sone-image-downloader/

  11. Any updates on this in 2014? What is the roadmap? Is the project dead?

  12. How do I use this?
    When I click Download, it doesn't work.
