blog

  • Home
  • blog
  • Building Microsoft’s What-Dog AI in under 100 Lines of Code

Building Microsoft’s What-Dog AI in under 100 Lines of Code

Rather recently, Microsoft released an app using AI to detect a dog’s breed. When I tested it on my beagle, though…

The app identifies a beagle as a Saluki

Hmm, not quite, app. Not quite.

In my non-SitePoint time, I also work for Diffbot – the startup you may have heard of over the past few weeks – who also dabble in AI. To test how they compare, in this tutorial we’ll recreate Microsoft’s application using Diffbot’s technology to see if it does a better job at recognizing the adorable beasts we throw at it!

We’ll build a very primitive single-file “app” for uploading images and outputting the information about the breed under the form.

Prerequisites

If you’d like to follow along, please register for a free 14-day token at Diffbot.com, if you don’t have an account there yet.

To install the client, we use the following composer.json file:

{
    "require": {
        "swader/diffbot-php-client": "^2",
        "php-http/guzzle6-adapter": "^1.0"
    },
    "minimum-stability": "dev",
    "prefer-stable": true,
    "require-dev": {
        "symfony/var-dumper": "^3.0"
    }
}

Then, we run composer install.

The minimum stability flag is there because a part of the Puli package is still in beta, and it’s a dependency of the PHP HTTP project now. The prefer stable directive is there to make sure the highest stable version of a package is used if available. We also need an HTTP client, and in this case I opted for Guzzle6, though the Diffbot PHP client supports any modern HTTP client via Httplug, so feel free to use your own favorite.

Once these items have been installed, we can create an index.php file, which will contain all of our application’s logic. But first, bootstrapping:

<?php

require 'vendor/autoload.php';

$token = 'my_token';

The Upload

Let’s build a primitive upload form above the PHP content of our index.php file.

<form action="/" method="post" enctype="multipart/form-data">
    <h2>Please either paste in a link to the image, or upload the image directly.</h2>
    <h3>URL</h3>
    <input type="text" name="url" id="url" placeholder="Image URL">
    <h3>Upload</h3>
    <input type="file" name="file" id="file">
    <input type="submit" value="Analyze">
</form>

<?php

...

We’re focusing on the PHP side only here, so we’ll leave out the CSS. I apologize to your eyes.

Ugly form

We’ll be using Imgur to host the images, so that we don’t have to host the application in order to make the calls to Diffbot (the images will be public even if our app isn’t, saving us hosting costs). Let’s first register an application on Imgur via this link:

Imgur registration

This will produce a client ID and a secret, though we’ll only be using the client ID (anonymous uploads), so we should add it to our file:

$token = 'my_token';
$imgur_client = 'client';

Analyzing the Images

So, how will the analysis happen, anyway?

As described in the docs, Diffbot’s Image API can accept a URL and then scans the page for images. All found images are additionally analyzed and some data is returned about them.

The data we need are the tags Diffbot attaches to the image entries. tags is an array of JSON objects, each of which contains a tag label, and a link to http://dbpedia.org for the related resource. We won’t be needing these links in this tutorial, but we will be looking into them in a later piece. The tags array takes a form similar to this:

"tags": [
        {
          "id": 4368,
          "label": "Beagle",
          "uri": "http://dbpedia.org/resource/Beagle"
        },
        {
          "id": 2370241,
          "label": "Treeing Walker Coonhound",
          "uri": "http://dbpedia.org/resource/Treeing_Walker_Coonhound"
        }
      ]

As you can see, each tag has the aforementioned values. If there’s only one tag, only one object will be present. By default, Diffbot returns up to 5 tags per entry – so each image can have up to 5 tags, and they don’t have to be directly related (e.g. submitting an image of a running shoe might return both the tag Nike and the tag shoe).

It is these tag labels we’ll be using as suggested guesses of dog breeds. Once the request goes through and returns the tags in the response, we’ll print the suggested labels below the image.

Continue reading %Building Microsoft’s What-Dog AI in under 100 Lines of Code%

LEAVE A REPLY