Build a Talking, Face-Recognizing Doorbell for About $100

Reprinted with permission from OReillyData

Author | Lukas Biewald OReillyData

Build your own doorbell with an Amazon Echo and a Raspberry Pi, and identify thousands of visitors at your doorstep for just a few cents a month.

Recently, when I was preparing to install a doorbell in my new house, I thought: why not let my doorbell tell me who is at the door?

Most of the projects I build myself end up costing more than equivalent commercial products, even when I value my time at $0 per hour. I suspect this has to do with supply chains and economies of scale. Still, I have much more fun making these things myself. In this project, I built a door camera that is not only cheaper than my Dropcam but also has some genuinely useful features that, for some reason, I have not seen on the market.

Figure 1: My front door has a doorbell, an August smart lock, and a Raspberry Pi for facial recognition. Image courtesy of Lukas Biewald

We will build a Raspberry Pi-based security camera device costing $60 to take pictures and upload them to the cloud for facial recognition. You can also upload the data stream to Amazon S3, making it a complete Dropcam replacement. Nest charges $100 a year to save the last 10 days of video, but you can keep a year’s worth of video files on S3 for about $20. If you use Amazon Glacier, this cost will drop to around $4.

Using Amazon Rekognition for Machine Learning

This tutorial will focus on the machine learning part — using Amazon’s new Rekognition service to identify your visitors and then sending the recognition results to your Amazon Echo, so you can always know who is at your doorstep. To build a reliable service, we will also use one of Amazon’s coolest and most useful products: Lambda.


What you will need:

Amazon Echo Dot ($50)

Raspberry Pi 3 ($38) (This project can also use Raspberry Pi 2 with a wireless USB adapter)

Raspberry Pi compatible camera ($16)

Raspberry Pi case ($6)

16GB SD card ($8)

Total: $118

We will use Amazon’s S3, Lambda, and Rekognition services for facial matching. These services start free, and then you only spend a few cents a month to identify thousands of visitors at your doorstep.

Setting Up the Raspberry Pi System

If you have completed any of my Raspberry Pi tutorials, you will be familiar with most of the content in this tutorial.

First, download NOOBS from the Raspberry Pi Foundation and follow the installation instructions. This mainly involves copying NOOBS to the SD card, inserting the card into your board, connecting a mouse, keyboard, and display, and following the on-screen prompts. This has become even easier since the new PIXEL desktop environment was released.

Figure 2: The Raspberry Pi connected to a mini display and keyboard on my desk. Image courtesy of Lukas Biewald

Then name your Raspberry Pi system something memorable so you can SSH into it. There is a good guide on howtogeek for this — you need to modify the /etc/hosts and /etc/hostname files to name your Raspberry Pi system. I like to name all my security camera Raspberry Pis after characters from my favorite TV show, “It’s Always Sunny in Philadelphia,” so I named the front door camera “Dennis.” This means I can access dennis.local via SSH without remembering an IP address, even if the router is reset.

Next, you should connect the camera to the Raspberry Pi board. Remember that the ribbon cable should face the Ethernet port — I might have Googled this a hundred times. Note: If you want a wider field of view, you can buy a wide-angle camera; if you want to increase night vision capabilities, you can buy an infrared camera.

Figure 3: The Raspberry Pi with camera and case ready for installation. Image courtesy of Lukas Biewald

You may also want to put the entire device in a protective case to shield it from the elements. You will also need to connect the Raspberry Pi to power via a micro USB cable. (I have drilled a small hole in the wall to connect my Dropcam to an indoor power outlet, so I already have a USB cable in the right place.)

So far, I have installed several such devices around the house. The camera ribbon cable is thin, so you can mount the Raspberry Pi inside the room and run the cable through the door as I did in my lab (garage).

Figure 4: The camera on the Raspberry Pi comes through my garage door. Image courtesy of Lukas Biewald

Next, install the RPi-Cam-Web-Interface. This very useful piece of software serves a continuous stream from the camera over HTTP. Follow its installation instructions and choose NGINX as the web server. The configuration file at /etc/raspimjpeg lets you set many options.

Configuring Amazon S3 and Amazon Rekognition

If you have not yet created an AWS account, you need to create one now. You should create an IAM user first and allow that user access to S3, Rekognition, and Lambda (which we will use later).

Install the AWS command line interface:

sudo apt install awscli

Set the AWS region on your Raspberry Pi to US East (as of this writing, Rekognition is only available in that region).

Create a face collection:

aws rekognition create-collection --collection-id friends

You can use the Unix shell script I wrote to quickly add your friends’ face images:

aws s3 cp $1 s3://doorcamera > output
aws rekognition index-faces \
--image "{\"S3Object\":{\"Bucket\":\"doorcamera\",\"Name\":\"$1\"}}" \
--collection-id "friends" --detection-attributes "ALL" \
--external-image-id "$2"

Save this to a file as a shell script and pass it the local filename of your friend's picture and your friend's name as arguments. Alternatively, type the commands at the command line, replacing $1 with the filename and $2 with the name.
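The hand-escaped JSON passed to --image is easy to mistype. As a minimal sketch, the same argument could be generated with Python's json module and quoted for the shell with shlex. The helper names s3_image_arg and index_faces_command are hypothetical, and the file name and friend name in the demo are made up; only the bucket and collection names come from the script above.

```python
import json
import shlex


def s3_image_arg(bucket, key):
    """Return the JSON string that `aws rekognition` expects for --image."""
    return json.dumps({"S3Object": {"Bucket": bucket, "Name": key}},
                      separators=(",", ":"))


def index_faces_command(bucket, key, collection, name):
    """Build the index-faces command line as one shell-safe string."""
    args = [
        "aws", "rekognition", "index-faces",
        "--image", s3_image_arg(bucket, key),
        "--collection-id", collection,
        "--detection-attributes", "ALL",
        "--external-image-id", name,
    ]
    # shlex.quote wraps any argument containing special characters in quotes.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # "charlie.jpg" and "Charlie" are illustrative values only.
    print(index_faces_command("doorcamera", "charlie.jpg", "friends", "Charlie"))
```

Generating the argument this way avoids the nested backslash escapes, at the cost of needing Python on the machine that runs the command.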

Amazon’s Rekognition service uses machine learning to get the distance between points on a face image and uses those points to match it against the indexed face images. So you can train the system with just one image of your friend and still get good results.

Now you can test this facial recognition system with a similar script:

aws s3 cp $1 s3://doorcamera > output
aws rekognition search-faces-by-image --collection-id "friends" \
--image "{\"S3Object\":{\"Bucket\":\"doorcamera\",\"Name\":\"$1\"}}"

You will get back a large JSON response that contains not only the match results but also other attributes of the image, including gender, emotion, facial hair, and a bunch of other interesting things.


{
  "FaceRecords": [
    {
      "FaceDetail": {
        "Confidence": 99.99991607666016,
        "Eyeglasses": {
          "Confidence": 99.99878692626953,
          "Value": false
        },
        "Sunglasses": {
          ...
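To make use of a response like this, you only need a handful of fields. The following is a minimal sketch, assuming the FaceDetail layout shown above plus the Gender and Emotions attributes that Rekognition returns when all detection attributes are requested. The function name summarize_face is hypothetical, and the sample values are made up for illustration.

```python
def summarize_face(face_detail):
    """Pick a few human-readable attributes out of one FaceDetail dict."""
    emotions = face_detail.get("Emotions", [])
    # The emotion with the highest confidence is treated as the visible one.
    top = max(emotions, key=lambda e: e["Confidence"]) if emotions else None
    return {
        "gender": face_detail.get("Gender", {}).get("Value"),
        "emotion": top["Type"] if top else None,
        "eyeglasses": face_detail.get("Eyeglasses", {}).get("Value"),
        "sunglasses": face_detail.get("Sunglasses", {}).get("Value"),
    }


# Made-up sample in the shape of the response above (values are illustrative).
sample = {
    "FaceRecords": [{
        "FaceDetail": {
            "Confidence": 99.9,
            "Eyeglasses": {"Confidence": 99.9, "Value": False},
            "Sunglasses": {"Confidence": 99.5, "Value": False},
            "Gender": {"Confidence": 99.0, "Value": "Male"},
            "Emotions": [
                {"Type": "HAPPY", "Confidence": 85.0},
                {"Type": "CALM", "Confidence": 10.0},
            ],
        }
    }]
}

if __name__ == "__main__":
    for record in sample["FaceRecords"]:
        print(summarize_face(record["FaceDetail"]))
```

Using .get with defaults keeps the helper from failing when an attribute is absent from a particular response.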

Next, we can write a Python script that grabs an image from our Raspberry Pi camera and checks it for faces. I actually used a second Raspberry Pi for this, but running it on the same machine would be easier: just read the file /dev/shm/mjpeg/cam.jpg to get the camera's current image.

Either way, we need to expose this interface function in our web server for later use. I used Flask as my web server.

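from flask import Flask, request
import cameras as c

app = Flask(__name__)

# The route decorator was lost in this copy; the path below is an assumption.
@app.route('/faces/<camera>')
def face_camera(camera):
    data = c.face_camera(camera)
    return ",".join(data)

if __name__ == '__main__':
    # The host value was garbled in this copy; '0.0.0.0' is an assumption.
    app.run(host='0.0.0.0', port=5000)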
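The cameras module itself is not shown here, so the following is only a sketch of what such a lookup might do, not the author's code. It assumes a client object that behaves like boto3.client("rekognition") and calls its detect_faces and search_faces_by_image methods. Unlike the shell scripts above, it sends the image bytes directly instead of uploading to S3 first. The names read_frame, identify_face, and FakeRekognition are hypothetical, and the fake client returns canned data so the sketch runs without AWS credentials.

```python
def read_frame(path="/dev/shm/mjpeg/cam.jpg"):
    """Read the latest JPEG frame written by the camera software."""
    with open(path, "rb") as f:
        return f.read()


def identify_face(image_bytes, client, collection="friends"):
    """Return [person, gender, emotion], or ["Not Found"] if no face is seen."""
    image = {"Bytes": image_bytes}
    faces = client.detect_faces(Image=image, Attributes=["ALL"])["FaceDetails"]
    if not faces:
        return ["Not Found"]
    detail = faces[0]  # only the first detected face is described
    gender = detail["Gender"]["Value"]
    emotions = detail.get("Emotions", [])
    emotion = (max(emotions, key=lambda e: e["Confidence"])["Type"]
               if emotions else "UNKNOWN")
    matches = client.search_faces_by_image(
        CollectionId=collection, Image=image, MaxFaces=1)["FaceMatches"]
    # An empty name means a face was seen but nobody in the collection matched.
    person = matches[0]["Face"]["ExternalImageId"] if matches else ""
    return [person, gender, emotion]


class FakeRekognition:
    """Stand-in client with canned answers (illustrative values only)."""

    def detect_faces(self, Image, Attributes):
        return {"FaceDetails": [{
            "Gender": {"Value": "Male", "Confidence": 99.0},
            "Emotions": [{"Type": "HAPPY", "Confidence": 90.0},
                         {"Type": "SAD", "Confidence": 5.0}],
        }]}

    def search_faces_by_image(self, CollectionId, Image, MaxFaces):
        return {"FaceMatches": [
            {"Similarity": 97.0, "Face": {"ExternalImageId": "Charlie"}}]}


if __name__ == "__main__":
    # With real credentials you would pass boto3.client("rekognition") and
    # read_frame() instead of the fake client and dummy bytes used here.
    print(",".join(identify_face(b"not-a-real-jpeg", FakeRekognition())))
```

Returning a short list keeps the result compatible with the ",".join(data) call in the Flask handler, and a single "Not Found" entry joins to the plain string "Not Found".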

I have put the parsing code (and all the other code mentioned in this article) on

If you have made it this far, you now have something very interesting to play with. I found Amazon’s service to be excellent at recognizing my friends. The only part that seems a bit troublesome is recognizing my emotions (though this might be more my problem than Amazon’s).

In fact, I sewed one of the Raspberry Pi cameras into a plush toy and placed a very creepy facial recognition teddy bear sentinel on my desk.

Figure 5: The facial recognition teddy bear Freya. Image courtesy of Lukas Biewald

Facial Recognition Camera Used with Amazon Echo

Amazon’s Echo makes high-quality voice commands very simple and has a perfect interface for this kind of project. Unfortunately, the best way to use Echo is to have it communicate directly with a stable web service, but we want to put the Raspberry Pi camera behind the router firewall on a local network — which makes configuration a bit tricky.

We will connect the Echo to an AWS Lambda service, which will communicate with our Raspberry Pi system through an SSH tunnel. This might be a bit complex, but it is the simplest way.

Figure 6: Architecture diagram. Image courtesy of Lukas Biewald

Exposing the HTTP Facial Recognition API via SSH Tunnel

So far, we have built a small web application for facial recognition, and we still need to make it accessible to the outside world. As long as we have a web server somewhere, we can configure an SSH tunnel. However, there is a sweet little application called localtunnel that does all this for us, and you can easily install it:

npm install -g localtunnel

I like to wrap it in a little script to keep it active and prevent it from crashing. Please change MYDOMAIN to something meaningful to you:

until lt --port 5000 -s MYDOMAIN; do
echo 'lt crashed... respawning...'
sleep 1
done
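If you would rather supervise the tunnel from Python, the same restart-until-success idea could be sketched as follows. The function name run_until_success and the max_restarts safety limit are assumptions added for illustration. The demo runs a harmless command instead of lt so that it works on any machine.

```python
import subprocess
import sys
import time


def run_until_success(cmd, delay=1.0, max_restarts=None):
    """Re-run cmd until it exits with status 0; return the restart count."""
    restarts = 0
    while subprocess.call(cmd) != 0:
        if max_restarts is not None and restarts >= max_restarts:
            raise RuntimeError("gave up after %d restarts" % restarts)
        print("lt crashed... respawning...", file=sys.stderr)
        restarts += 1
        time.sleep(delay)
    return restarts


if __name__ == "__main__":
    # Demo with a command that succeeds immediately. On the Pi this would be
    # ["lt", "--port", "5000", "-s", "MYDOMAIN"].
    print(run_until_success([sys.executable, "-c", "pass"]))
```

The optional limit guards against restarting forever when the command can never succeed, for example when lt is not installed.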


Now you can ping your server by visiting

Creating an Alexa Skill

To use our Echo, we need to create a new Alexa Skill. Amazon has a great getting started guide, or you can go directly to the Alexa developer portal.

First, we need to set up an intent:


{
  "intents": [
    {
      "intent": "PersonCameraIntent",
      "slots": [
        {
          "name": "camera",
          ...
        }
      ]
    }
  ]
}





Then we give Alexa some sample utterances:

PersonCameraIntent tell me who {camera} is seeing

PersonCameraIntent who is {camera} seeing

PersonCameraIntent who is {camera} looking at

PersonCameraIntent who does {camera} see

Next, we need to give Alexa an endpoint, for which we will use a Lambda function.

Configuring a Lambda Function

If you have never used a Lambda function before, you are in for a treat! Lambda functions are a simple way to define an API for a single function on Amazon's servers and pay only when it is called.

An Alexa Skill is a perfect use case for a Lambda function, so Amazon already provides a template for Alexa Skills. When Alexa matches one of the PersonCameraIntent utterances we listed, it calls our Lambda function. Change MYDOMAIN to the subdomain you used in your localtunnel script, and everything should work.
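When the Lambda function is invoked, the matched intent and its slot values arrive inside the event. As a rough sketch, assuming the standard Alexa request layout, the camera name could be pulled out as shown here. The helper name get_camera_slot is hypothetical, and the sample event is trimmed and made up.

```python
def get_camera_slot(event):
    """Return the spoken camera name from an Alexa IntentRequest, or None."""
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        return None
    intent = request.get("intent", {})
    if intent.get("name") != "PersonCameraIntent":
        return None
    return intent.get("slots", {}).get("camera", {}).get("value")


# Trimmed, made-up example of the event for "who does dennis see".
sample_event = {
    "version": "1.0",
    "session": {"new": False, "sessionId": "example-session"},
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "PersonCameraIntent",
            "slots": {"camera": {"name": "camera", "value": "dennis"}},
        },
    },
}

if __name__ == "__main__":
    print(get_camera_slot(sample_event))
```

Checking the request type and intent name first means launch requests and other intents fall through cleanly instead of raising a KeyError.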

You can also use other interesting aspects from the metadata sent by Amazon Rekognition. For example, it can guess facial expressions, so I use it to determine whether the visitor at my door is a happy guest or an angry visitor. You can also have Echo tell you whether the visitor has a beard, is wearing sunglasses, and other features:

import socket
from urllib.request import urlopen
from urllib.error import URLError

# build_response and build_speechlet_response are assumed to come from
# Amazon's Alexa Skill template.
def face_camera(intent, session):
    card_title = "Face Camera"
    if 'camera' in intent['slots']:
        robot = intent['slots']['camera']['value']
        try:
            # The URL was lost in this copy. The localtunnel host and path
            # below are assumptions; replace MYDOMAIN with your subdomain.
            response = urlopen('https://MYDOMAIN.localtunnel.me/faces/%s' % robot)
            data = response.read().decode('utf-8')
            if data == "Not Found":
                speech_output = "%s didn't see a face" % robot
            else:
                person, gender, emotion = data.split(",")
                if person == "" or person is None:
                    speech_output = "%s didn't recognize the person, but " % robot
                else:
                    speech_output = "%s recognized %s and " % (robot, person)
                if gender == "Male":
                    speech_output += "he "
                else:
                    speech_output += "she "
                speech_output += "seems %s" % emotion.lower()
        except URLError as e:
            speech_output = "Strange, I couldn't wake up %s" % robot
        except socket.timeout:
            speech_output = "The Optics Lab Timed out"
    else:
        speech_output = "I don't know what robot you're talking about"
    should_end_session = False
    return build_response({}, build_speechlet_response(
        card_title, speech_output, None, should_end_session))
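The two helper functions called at the end are not defined in the listing. The sketch below shows roughly what they need to produce, modeled on the Alexa response format rather than copied from Amazon's template, so treat the exact bodies as assumptions.

```python
def build_speechlet_response(title, output, reprompt_text, should_end_session):
    """Assemble the speech, card, and reprompt parts of an Alexa response."""
    return {
        "outputSpeech": {"type": "PlainText", "text": output},
        "card": {"type": "Simple", "title": title, "content": output},
        "reprompt": {
            "outputSpeech": {"type": "PlainText", "text": reprompt_text}
        },
        "shouldEndSession": should_end_session,
    }


def build_response(session_attributes, speechlet_response):
    """Wrap the speechlet in the top-level envelope returned to Alexa."""
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes,
        "response": speechlet_response,
    }


if __name__ == "__main__":
    print(build_response({}, build_speechlet_response(
        "Face Camera", "dennis recognized Charlie and he seems happy",
        None, False)))
```

The dictionary returned by build_response is what the Lambda function hands back, and the text under outputSpeech is what the Echo reads aloud.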

When you talk to Alexa, it actually parses your speech to find an intent and runs a Lambda function. That function will call an external server and communicate with your Raspberry Pi through an SSH tunnel. The Raspberry Pi will take an image from the camera, upload it to S3, run a deep learning inference algorithm to match it against your friend’s face image, and send the parsing results to your Echo. Then Echo will speak to you. But this whole process happens very quickly! To see it for yourself, please watch my video:

You can take the same technology and extend it in many cool directions. For example, I put this code onto the robots from my Raspberry Pi/TensorFlow project, so now they can talk to me and tell me what they are looking at. I am also considering using this GitHub project to connect the Raspberry Pi to my August lock, so that my door unlocks automatically for my friends and locks automatically when an angry visitor shows up.

This article originally appeared in English: “Build a talking, face-recognizing doorbell for about $100”.

Lukas Biewald

Lukas Biewald is the founder and CEO of CrowdFlower. Founded in 2009, CrowdFlower is a data enrichment platform that gives businesses on-demand access to a human workforce for collecting and generating training data and for running human-in-the-loop machine learning. After earning a Bachelor's degree in Mathematics and a Master's degree in Computer Science from Stanford University, Lukas led Yahoo Japan's search relevance team. He then joined Powerset as a senior data scientist; Powerset was acquired by Microsoft in 2008. Lukas was also named one of the 30 under 30 by Forbes and is an expert Go player.

