Build a Face-Recognizing and Talking Doorbell for $100

Source: OReillyData

Author: Lukas Biewald

This article is 3900 words long, recommended reading time is 7 minutes

Learn how to use an Amazon Echo and a Raspberry Pi to recognize visitors at your door.

Recently, when I was preparing to install a doorbell in my new house, I thought: why not let my doorbell tell me who is at the door?

Most of the DIY projects I undertake end up costing more than comparable commercial products, even if I value my time at $0 per hour. I think this comes down to supply chains and economies of scale. Still, I have more fun making things myself. In this project, I built a door camera that is not only cheaper than my Dropcam but also has some genuinely useful features that, for some reason, I haven’t seen on the market.

Figure 1 My front door has a doorbell, an August smart lock, and a Raspberry Pi for facial recognition.

Image provided by Lukas Biewald

We will build a Raspberry Pi-based security camera device costing $60 to take photos and upload them to the cloud for facial recognition. You can also upload the data stream to Amazon S3, making it a complete Dropcam alternative. Nest charges $100 per year to save the last 10 days of video, but you can keep a year’s worth of video files in S3 for about $20. If you use Amazon Glacier, this cost will drop to around $4.

Using Amazon Rekognition for Machine Learning

This tutorial will focus on the machine learning part—using Amazon’s new Rekognition service to recognize your visitors’ faces and send the recognition results to your Amazon Echo, so you always know who is at your door. To build a reliable service, we will also use one of Amazon’s coolest and most useful products: Lambda.

Components:

Amazon Echo Dot ($50)
Raspberry Pi 3 ($38) (This project can also use Raspberry Pi 2 plus a wireless USB adapter)
Raspberry Pi compatible camera ($16)
Raspberry Pi case ($6)
16GB SD card ($8)

Total: $118

We will use Amazon’s S3, Lambda, and Rekognition services for facial matching. These services start free, and later you will only spend a few cents per month to recognize thousands of visitors at your door.

Setting Up the Raspberry Pi System

If you have completed any of my Raspberry Pi tutorials, you will be familiar with most of the content in this tutorial.

First, download NOOBS from the Raspberry Pi Foundation and follow the installation instructions. This mainly involves copying NOOBS to the SD card, inserting the card into your board, plugging in a mouse, keyboard, and display, and following the on-screen prompts. This has become easier since the new PIXEL desktop environment was launched.

Figure 2 The Raspberry Pi connected to a mini display and keyboard on my desk. Image provided by Lukas Biewald

Next, give your Raspberry Pi a memorable hostname so you can SSH into it. There is a good guide on How-To Geek for this: you edit the /etc/hosts and /etc/hostname files to set the name. I like to name all my security camera Raspberry Pis after characters from my favorite TV show, “It’s Always Sunny in Philadelphia,” so I named the front door camera “Dennis.” This means I don’t need to remember an IP address; I can SSH into dennis.local anytime, even if I reset the router.

Next, you should connect the camera to the Raspberry Pi board. Remember that the ribbon cable should face the Ethernet port—this is an issue I may have Googled a hundred times. Note: If you want a wider field of view, you can buy a wide-angle camera; if you want to add night vision, you can buy an infrared camera.

Figure 3 The Raspberry Pi with the camera and case ready for installation. Image provided by Lukas Biewald

You might also want to put the whole device in a protective case to shield it from the elements. You will also need to connect the Raspberry Pi to the power supply via a micro USB cable. (I have already drilled a small hole in the wall to connect my Dropcam to an indoor power outlet, so I already have a USB cable in the right place.)

So far, I have installed several such devices around the house. The camera ribbon cable is thin, and you can install the Raspberry Pi inside the room and run the cable as I did from the door in my lab (garage).

Figure 4 The camera on the Raspberry Pi comes out from my garage door. Image provided by Lukas Biewald

Next, you need to install RPi-Cam-Web-Interface. This is a very useful piece of software that serves a continuous stream from the camera over HTTP. Follow the installation instructions and choose nginx as the web server. The configuration file at /etc/raspimjpeg lets you set a lot of options.

Configure Amazon S3 and Amazon Rekognition

If you have not yet created an AWS account, you need to do so now. You should first create an IAM user and allow that user to access S3, Rekognition, and Lambda (we will use Lambda later).

Install the AWS command line interface:

sudo apt install awscli

Set the AWS CLI’s default region to US East, for example by running aws configure on the Raspberry Pi (as of this writing, Rekognition is only available in this region).

Create a face collection:

aws rekognition create-collection --collection-id friends

You can use the Unix shell script I wrote to quickly add your friends’ face images:

aws s3 cp $1 s3://doorcamera > output

aws rekognition index-faces \
  --image "{\"S3Object\":{\"Bucket\":\"doorcamera\",\"Name\":\"$1\"}}" \
  --collection-id "friends" --detection-attributes "ALL" \
  --external-image-id "$2"

Save this to a file as a shell script, or type the commands directly at the command line, replacing $1 with the local filename of your friend’s image and $2 with your friend’s name.
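The same two steps can also be driven from Python. The sketch below assumes the boto3 library and working AWS credentials; the function name add_friend and its parameters are illustrative and are not taken from the author's repository. It mirrors the shell script: upload the photo to the doorcamera bucket, then index the face under the friend's name.

```python
import os


def add_friend(s3, rekognition, image_path, name,
               bucket="doorcamera", collection="friends"):
    """Upload one photo to S3 and index the face it contains under `name`.

    `s3` and `rekognition` are boto3 clients (or any objects exposing the
    same upload_file / index_faces methods). Returns the number of faces
    Rekognition reports as indexed.
    """
    key = os.path.basename(image_path)
    # Equivalent of: aws s3 cp $1 s3://doorcamera
    s3.upload_file(image_path, bucket, key)
    # Equivalent of: aws rekognition index-faces ... --external-image-id "$2"
    response = rekognition.index_faces(
        CollectionId=collection,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        ExternalImageId=name,
        DetectionAttributes=["ALL"],
    )
    return len(response.get("FaceRecords", []))
```

With boto3 installed, a call might look like add_friend(boto3.client("s3"), boto3.client("rekognition"), "charlie.jpg", "Charlie"). Rekognition restricts which characters an external image ID may contain, so a name without spaces is the safer choice.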

Amazon’s Rekognition service uses machine learning to get the distances between points on a face image and uses these points to match facial images indexed in its database. Therefore, you can train the system with just one image of your friend and still get good results.

Now you can use a similar script to test this facial recognition system:

aws s3 cp $1 s3://doorcamera > output

aws rekognition search-faces-by-image --collection-id "friends" \
  --image "{\"S3Object\":{\"Bucket\":\"doorcamera\",\"Name\":\"$1\"}}"

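You will get back a large JSON response. Besides the matching results, Rekognition can report other information about the image, including gender, emotion, facial hair, and a bunch of other interesting things. The excerpt below, with its FaceRecords key, has the shape that index-faces returns when called with --detection-attributes "ALL":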

{
  "FaceRecords": [
    {
      "FaceDetail": {
        "Confidence": 99.99991607666016,
        "Eyeglasses": {
          "Confidence": 99.99878692626953,
          "Value": false
        },
        "Sunglasses": {
          …
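To make that response easier to work with, the face attributes can be reduced to a handful of fields. The sketch below is one possible way to do it and is not the author's parsing code; the helper name summarize_face is made up for illustration, and the sample values are invented. It relies on the Gender, Emotions, Eyeglasses, and Sunglasses fields that Rekognition includes in a FaceDetail when all attributes are requested.

```python
def summarize_face(face_detail):
    """Reduce a Rekognition FaceDetail dict to a few readable fields."""
    gender = face_detail.get("Gender", {}).get("Value", "")
    # Emotions is a list of {"Type": ..., "Confidence": ...} entries;
    # keep only the one Rekognition is most confident about.
    emotions = face_detail.get("Emotions", [])
    top = max(emotions, key=lambda e: e["Confidence"]) if emotions else None
    return {
        "gender": gender,
        "emotion": top["Type"] if top else "",
        "eyeglasses": face_detail.get("Eyeglasses", {}).get("Value", False),
        "sunglasses": face_detail.get("Sunglasses", {}).get("Value", False),
    }


# Invented sample in the shape of an index-faces response
sample = {"FaceRecords": [{"FaceDetail": {
    "Confidence": 99.9,
    "Eyeglasses": {"Confidence": 99.9, "Value": False},
    "Sunglasses": {"Confidence": 99.9, "Value": False},
    "Gender": {"Confidence": 99.9, "Value": "Male"},
    "Emotions": [{"Type": "HAPPY", "Confidence": 85.0},
                 {"Type": "CALM", "Confidence": 10.0}],
}}]}

summary = summarize_face(sample["FaceRecords"][0]["FaceDetail"])
print(summary)
```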

Next, we can write a Python script that grabs an image from the Raspberry Pi camera and checks it for faces. I actually ran this on a second Raspberry Pi, but running it on the same machine is easier: RPi-Cam-Web-Interface keeps the camera’s latest frame in /dev/shm/mjpeg/cam.jpg, so you can read the image straight from that file.

In either case, we need to expose this functionality through a web server for later use. I used Flask as my web server.

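from flask import Flask, request

# cameras is the author's own module from the repository mentioned below
import cameras as c

app = Flask(__name__)


@app.route('/faces/<path:camera>')
def face_camera(camera):
    # Ask the cameras module what it found in this camera's current image
    data = c.face_camera(camera)
    return ",".join(data)


if __name__ == '__main__':
    # Listen on all interfaces so other machines can reach the server
    app.run(host='0.0.0.0', port=5000)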

I have put the parsing code (and all other code mentioned in this article) on github.com/lukas/facerec.
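For readers who want a feel for what such a module does before opening the repository, here is a rough, hypothetical stand-in written against boto3's Rekognition client. The function name identify_face, its parameters, and the "Not Found" convention are assumptions chosen to line up with the comma-separated string the Flask route returns; the author's actual implementation may differ.

```python
def identify_face(rekognition, image_bytes, collection="friends"):
    """Return [person, gender, emotion] for the first detected face.

    Returns ["Not Found"] when the image contains no face. `rekognition`
    is a boto3 Rekognition client and `image_bytes` is the raw JPEG data,
    for example the contents of /dev/shm/mjpeg/cam.jpg.
    """
    image = {"Bytes": image_bytes}

    # Check for a face first, so the collection search only runs on
    # images that actually contain one.
    detected = rekognition.detect_faces(Image=image, Attributes=["ALL"])
    faces = detected.get("FaceDetails", [])
    if not faces:
        return ["Not Found"]

    face = faces[0]
    gender = face.get("Gender", {}).get("Value", "")
    emotions = face.get("Emotions", [])
    emotion = max(emotions, key=lambda e: e["Confidence"])["Type"] if emotions else ""

    # Look the face up in the collection built earlier; an empty person
    # string means a face was seen but nobody was recognized.
    found = rekognition.search_faces_by_image(
        CollectionId=collection, Image=image, MaxFaces=1)
    matches = found.get("FaceMatches", [])
    person = matches[0]["Face"].get("ExternalImageId", "") if matches else ""
    return [person, gender, emotion]
```

Joining the result with commas gives strings such as "Charlie,Male,HAPPY", or "Not Found" when no face is in view.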

If you have made it this far, you will have some very interesting things to play with. I found Amazon’s services to be excellent at recognizing my friends. The only thing that seemed a bit troublesome was recognizing my emotions (though this may be more my issue than Amazon’s).

In fact, I sewed one of the Raspberry Pi cameras into a plush toy and placed a rather creepy face-recognizing teddy bear sentinel on my desk.

Figure 5 The face-recognizing teddy bear Freya.

Image provided by Lukas Biewald

Making the Face Recognition Camera Work with Amazon Echo

Amazon’s Echo makes high-quality voice commands very simple and provides a perfect interface for this type of project. Unfortunately, the best way to use Echo is to have it communicate directly with a stable web service, but we want to keep the Raspberry Pi camera behind the router firewall on a local network—which makes configuration a bit tricky.

We connect the Echo to an AWS Lambda service, which communicates with our Raspberry Pi system through an SSH tunnel. This might be a bit complicated, but it is the simplest way.

Figure 6 Architecture diagram. Image provided by Lukas Biewald

Exposing the HTTP Face Recognition API Through an SSH Tunnel

So far, we have built a small web application for facial recognition, and we also need to make it accessible to the outside world. As long as we have a web server somewhere, we can configure an SSH tunnel. However, there is a sweet little application called localtunnel that does all this for us, and you can easily install it:

npm install -g localtunnel

I like to wrap it in a small script that restarts it whenever it crashes. Change MYDOMAIN to something meaningful to you:

until lt --port 5000 -s MYDOMAIN; do
  echo 'lt crashed... respawning...'
  sleep 1
done
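If you would rather keep everything in Python, the same restart loop can be sketched with the standard subprocess module. This is only an illustration of the idea; the function name run_until_success and the max_attempts parameter are assumptions added here so the loop can be bounded when experimenting.

```python
import subprocess
import sys
import time


def run_until_success(command, delay=1.0, max_attempts=None):
    """Re-run `command` until it exits with status 0.

    Returns the number of attempts it took, or -1 if max_attempts was
    reached first. max_attempts=None retries forever, like the shell loop.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if subprocess.call(command) == 0:
            return attempts
        print("lt crashed... respawning...", file=sys.stderr)
        time.sleep(delay)
    return -1
```

Calling run_until_success(["lt", "--port", "5000", "-s", "MYDOMAIN"]) would then behave like the shell version.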

Now you can ping your server by visiting http://MYDOMAIN.localtunnel.me.

Creating an Alexa Skill

To use our Echo, we need to create a new Alexa Skill. Amazon has a great getting started guide, or you can visit the Alexa developer portal directly.

First, we need to set an intent:

{
  "intents": [
    {
      "intent": "PersonCameraIntent",
      "slots": [
        {
          "name": "camera",
          "type": "LIST_OF_CAMERAS"
        }
      ]
    }
  ]
}

Then we give Alexa some sample utterances:

PersonCameraIntent tell me who {camera} is seeing

PersonCameraIntent who is {camera} seeing

PersonCameraIntent who is {camera} looking at

PersonCameraIntent who does {camera} see

Next, we need to give Alexa an endpoint, for which we will use a Lambda function.

Configuring a Lambda Function

If you have never used a Lambda function before, you are in for a treat! Lambda functions are a simple way to define a unified API for a simple function on Amazon’s servers, and you only pay when it is called.

An Alexa Skill is a perfect use case for Lambda functions, so Amazon provides a template for Alexa Skills. When Alexa matches one of the PersonCameraIntent utterances we listed, it calls our Lambda function. Change MYDOMAIN to the subdomain you used in your localtunnel script, and everything should work fine.
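To make the hand-off concrete, here is a minimal sketch of how a Lambda entry point might route an incoming Alexa request to the right handler. It assumes the standard Alexa request JSON, in which the request type and intent name live under the "request" key; the names route_request and handlers, and the sample event, are illustrative and do not come from Amazon's template.

```python
def route_request(event, handlers):
    """Call the handler registered for the intent in an Alexa request.

    `handlers` maps intent names to functions taking (intent, session).
    Launch and session-ended requests are ignored in this sketch.
    """
    request = event["request"]
    if request["type"] != "IntentRequest":
        return None
    intent = request["intent"]
    handler = handlers.get(intent["name"])
    if handler is None:
        raise ValueError("Unknown intent: %s" % intent["name"])
    return handler(intent, event.get("session", {}))


# Invented event, trimmed to the fields the router reads
sample_event = {
    "session": {"new": False},
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "PersonCameraIntent",
            "slots": {"camera": {"name": "camera", "value": "dennis"}},
        },
    },
}

# A stand-in handler that just echoes the camera name from the slot
demo_handlers = {
    "PersonCameraIntent": lambda intent, session: intent["slots"]["camera"]["value"],
}
print(route_request(sample_event, demo_handlers))
```

In a real skill, the Lambda entry point lambda_handler(event, context) would call a router like this with the actual intent handler in place of the stand-in.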

You can also use other interesting things from the metadata sent by Amazon Rekognition. For example, it can guess facial expressions, so I use it to determine whether the visitor at my door is a happy guest or an angry one. You can also make Echo tell you whether the visitor has facial hair, is wearing sunglasses, and other features:

# Imports for Python 3; build_response and build_speechlet_response
# are helper functions from Amazon's Alexa Skill template
import socket
from urllib.error import URLError
from urllib.request import urlopen


def face_camera(intent, session):
    card_title = "Face Camera"
    if 'camera' in intent['slots']:
        robot = intent['slots']['camera']['value']
        try:
            response = urlopen('http://MYDOMAIN.localtunnel.me/faces/%s' % robot)
            # read() returns bytes in Python 3, so decode before comparing
            data = response.read().decode('utf-8')
            if data == "Not Found":
                speech_output = "%s didn't see a face" % robot
            else:
                person, gender, emotion = data.split(",")
                if person == "" or person is None:
                    speech_output = "%s didn't recognize the person, but " % robot
                else:
                    speech_output = "%s recognized %s and " % (robot, person)
                if gender == "Male":
                    speech_output += "he "
                else:
                    speech_output += "she "
                speech_output += "seems %s" % emotion.lower()
        except URLError as e:
            # The tunnel or the Raspberry Pi could not be reached
            speech_output = "Strange, I couldn't wake up %s" % robot
        except socket.timeout as e:
            speech_output = "The Optics Lab Timed out"
    else:
        speech_output = "I don't know what robot you're talking about"
    should_end_session = False
    return build_response({}, build_speechlet_response(
        card_title, speech_output, None, should_end_session))
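The function above ends by calling build_response and build_speechlet_response, which come with Amazon's Alexa Skill template and are not shown in this article. As a rough sketch of what such helpers might look like, the version below assembles the JSON structure Alexa expects in a response (a version, session attributes, and a response object holding the speech, a card, and the end-of-session flag). Treat the details as an approximation, not a copy of the template.

```python
def build_speechlet_response(title, output, reprompt_text, should_end_session):
    """Build the "response" part of an Alexa reply as a plain dict."""
    speechlet = {
        "outputSpeech": {"type": "PlainText", "text": output},
        "card": {"type": "Simple", "title": title, "content": output},
        "shouldEndSession": should_end_session,
    }
    # Only attach a reprompt when there is something to say
    if reprompt_text is not None:
        speechlet["reprompt"] = {
            "outputSpeech": {"type": "PlainText", "text": reprompt_text}
        }
    return speechlet


def build_response(session_attributes, speechlet_response):
    """Wrap the speechlet in the top-level envelope Alexa expects."""
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes,
        "response": speechlet_response,
    }


reply = build_response({}, build_speechlet_response(
    "Face Camera", "dennis didn't see a face", None, False))
```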

When you talk to Alexa, it parses your speech to find an intent and runs the Lambda function. That function calls an external server, which reaches your Raspberry Pi through the tunnel. The Raspberry Pi takes a picture from the camera, uploads it to S3, runs a deep learning inference to match it against your friends’ face images, and sends the results back to your Echo, which then speaks to you. Despite all these steps, the whole process happens very quickly. To see it for yourself, check out the video in the original post.

You can take the same technology and apply or extend it in many cool ways. For example, I put the code from my “Robot Raspberry Pi/TensorFlow project” onto my robots. Now they can talk to me and tell me what they are looking at. I’m also considering using this GitHub project to connect the Raspberry Pi to my August lock, so my door will automatically open for my friends or lock automatically for an angry visitor.

This article originally appeared in English: “Build a talking, face-recognizing doorbell for about $100”.

Author Bio

Lukas Biewald

Lukas Biewald is the founder and CEO of CrowdFlower. CrowdFlower started in 2009 as a data enrichment platform that helps businesses get on-demand human labor to collect, generate training data, and participate in human-machine learning loops. After earning a bachelor’s degree in mathematics and a master’s degree in computer science from Stanford University, Lukas led the search relevance team at Yahoo Japan. He then went to Powerset, working as a senior data scientist. In 2008, Powerset was acquired by Microsoft. Lukas was also named one of Forbes’ 30 Under 30. Lukas is also an expert Go player.
