Create Your Own Smart Baby Monitor with Raspberry Pi

Produced by Big Data Digest

Source: Medium

Translated by: Chen Zhiyan

As a new dad and a programmer, the question I ponder most in my new role is: “Can the task of caring for a baby really be automated?”

Of course, some of it might be possible one day: even if a diaper-changing robot existed (assuming enough parents agreed to test such a device on their toddlers), very few parents would really be willing to fully automate baby care.

As a father, the first thing I realized is: babies cry a lot, and even when I’m home, I can’t always hear my child crying.

Usually, commercial baby monitors can fill this gap, acting as an intercom that lets you hear the baby’s cries from another room.

But I quickly realized: commercial baby monitors are not as smart as I imagined:

  • They only serve as a transmitter: carrying sound from the source to the speaker without being able to detect the meaning of the baby’s cries;

  • When parents move to another room, they have to take the speaker with them, and cannot play the sound on any other existing audio devices;

  • They usually come with low-power speakers and cannot connect to external speakers, which means that if I’m playing music in another room I might not hear the baby’s cries, even if the monitor is in the same room as me;

  • Most of them operate over low-power radio waves, which means they usually stop working if the baby is in his/her room and you have to walk downstairs.

Therefore, I came up with the idea of creating a better “smart baby monitor” myself.

Without hesitation, I first defined some necessary functions for this “smart baby monitor”.

  • It can run on an affordable Raspberry Pi with a USB microphone.

  • It should detect when the child starts/stops crying and notify me (ideally on my phone), track data points on my dashboard, or perform corresponding tasks. It should not just be an intercom that simply transmits sound from one source to another compatible device.

  • It should be able to transmit audio to devices like speakers, smartphones, and computers.

  • It should not be affected by the distance between the source and the speaker, and should not require moving speakers around the house.

  • It should also have a camera that can monitor the child in real-time, so when he starts crying, I can capture a picture or a short video of the crib to check what’s wrong.

Let’s see how a new dad uses his engineer’s brain and open-source tools to accomplish this task.

Collecting Audio Samples

First, get a Raspberry Pi and burn a Linux operating system onto an SD card (a Raspberry Pi 3 or later is recommended so that it can run the TensorFlow model). You will also need a USB microphone compatible with the Raspberry Pi.

Then install the necessary items:

[sudo] apt-get install ffmpeg lame libatlas-base-dev alsa-utils
[sudo] pip3 install tensorflow

The first step is to record enough audio samples of when the baby cries and when he does not cry. These samples will later be used to train the audio detection model.

Note: in this example I show how to use sound detection to recognize a baby’s cry, but the same exact procedure can be used to detect any other sufficiently long-lasting sound (for example, an alarm going off, or the sound of a neighbor drilling).

First, check the audio input devices:

arecord -l

On the Raspberry Pi, you get the following output (note that in my case there are two USB microphones):

**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0
card 2: Device_1 [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0

I use the second microphone to record sound, that is, card 2, device 0. The ALSA way to identify it is either hw:2,0 (which accesses the hardware device directly) or plughw:2,0 (which adds sample rate and format conversion plugins if required). Make sure there is enough space on the SD card, then start recording some audio:

arecord -D plughw:2,0 -c 1 -f cd | lame - audio.mp3

With the child in the same room, record audio for a few minutes or a few hours, ideally including long periods of silence, baby cries, and other unrelated sounds; press Ctrl-C when you are done. Repeat this process as many times as possible, at different times of the day or on different days, to obtain a variety of audio samples.

Labeling Audio Samples

Once you have enough audio samples, you can copy them to your computer to train the model – you can use SCP to copy files or copy directly from the SD card.

Store them all in the same directory, for example ~/datasets/sound-detect/audio. Create a new folder for each sample, containing the audio file (named audio.mp3) and a label file (named labels.json) that you will use to label the positive/negative audio segments in the recording. The structure of the raw dataset is as follows:

~/datasets/sound-detect/audio
  -> sample_1
    -> audio.mp3
    -> labels.json
  -> sample_2
    -> audio.mp3
    -> labels.json
  ...

Next, label the recorded audio files. This can be a somewhat masochistic task if they contain several hours of your baby’s cries. Open each dataset audio file in your favorite audio player or in Audacity, and create a new labels.json file in each sample directory. Identify the exact start and end times of the crying and record them in labels.json as a key-value structure of the form time_string -> label. Example:

{  "00:00": "negative",  "02:13": "positive",  "04:57": "negative",  "15:41": "positive",  "18:24": "negative"}

In the example above, all audio segments from 00:00 to 02:12 will be labeled as negative, from 02:13 to 04:56 will be labeled as positive, and so on.
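To make the labeling semantics concrete, here is a minimal sketch (not part of micmon; the helper names and the sample path are my own illustration) that parses a labels.json file and returns the label in effect at a given playback position:

```python
import json
from datetime import timedelta

def parse_time(t: str) -> timedelta:
    """Convert a "MM:SS" or "HH:MM:SS" string into a timedelta."""
    parts = [int(p) for p in t.split(':')]
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def label_at(labels_file: str, position: str) -> str:
    """Return the label ("positive"/"negative") active at the given position."""
    with open(labels_file) as f:
        labels = json.load(f)

    # Sort the segments by start time and keep the last one that
    # starts at or before the requested position.
    segments = sorted(labels.items(), key=lambda kv: parse_time(kv[0]))
    current = segments[0][1]
    for start, label in segments:
        if parse_time(start) <= parse_time(position):
            current = label
        else:
            break
    return current

# Example: with the labels above, 03:00 falls in the 02:13-04:56 block
print(label_at('sample_1/labels.json', '03:00'))  # -> "positive"
```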

Generating the Dataset

Once all audio samples are labeled, the next step is to generate the dataset that will eventually be fed into the TensorFlow model. For this purpose I created micmon, a generic library and set of utilities for sound monitoring. To install it:

git clone git@github.com:/BlackLight/micmon.git
cd micmon
[sudo] pip3 install -r requirements.txt
[sudo] python3 setup.py build install

The model is designed around the frequency spectrum of the audio rather than the raw waveform, because the sound we want to detect has a specific spectral signature: a fundamental frequency (or a narrow band around it) plus a specific set of harmonics. The ratios of these harmonics to the fundamental are affected neither by amplitude (the frequency ratios are constant regardless of the input volume) nor by phase (a continuous sound has the same spectral signature regardless of when you start recording it).

This amplitude- and phase-invariant property makes it much more likely that we can train a robust sound detection model than if we simply fed raw audio samples to it. Additionally, the model can be simpler: several frequencies can be grouped into bands without affecting performance, which effectively achieves dimensionality reduction. Regardless of how long each sample is, the model only takes 50-100 frequency bands as input, whereas a model fed raw audio would need an input whose length grows with the sample duration, making it more prone to overfitting.
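To illustrate the idea (this is a simplified sketch of the spectral-feature approach, not micmon’s actual implementation), the following snippet uses numpy to turn a 2-second mono PCM segment into a fixed-length vector of frequency-band energies between the two cutoff frequencies:

```python
import numpy as np

def spectrum_features(samples: np.ndarray, sample_rate: int = 44100,
                      low_freq: int = 250, high_freq: int = 2500,
                      bins: int = 100) -> np.ndarray:
    """Map a mono PCM segment to `bins` frequency-band energies
    between low_freq and high_freq, normalized to [0, 1]."""
    # Real FFT of the segment and the frequency of each FFT coefficient
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

    # Keep only the band of interest and group it into `bins` buckets
    mask = (freqs >= low_freq) & (freqs < high_freq)
    band = spectrum[mask]
    buckets = np.array_split(band, bins)
    energies = np.array([b.mean() for b in buckets])

    # Normalize so the features are independent of the input amplitude
    return energies / energies.max() if energies.max() > 0 else energies

# Example: 2 seconds of synthetic audio at 44.1 kHz -> 100 features
segment = np.random.randn(2 * 44100)
print(spectrum_features(segment).shape)  # -> (100,)
```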

micmon can compute the FFT (Fast Fourier Transform) of segments of the audio samples, group the resulting spectrum into frequency bands between the low-pass and high-pass cutoff frequencies, and save the results to a set of compressed numpy (.npz) files. This can be done from the command line with the micmon-datagen command:

micmon-datagen \
    --low 250 --high 2500 --bins 100 \
    --sample-duration 2 --channels 1 \
    ~/datasets/sound-detect/audio  ~/datasets/sound-detect/data

In the example above, we generate a dataset from the raw audio samples stored in ~/datasets/sound-detect/audio and store the resulting spectral data in ~/datasets/sound-detect/data. --low and --high specify the lowest and highest frequencies to keep; the defaults are 20 Hz (the lowest frequency audible to the human ear) and 20 kHz (the highest frequency audible to a healthy young person).

By narrowing this range, we capture as much as possible of the sound we want to detect while cutting out background audio and unrelated harmonics. In this case, a range of 250-2500 Hz is sufficient to detect a baby’s cry.

Baby cries are usually high-pitched (the highest note an opera soprano can reach is around 1000 Hz), so I set the upper bound to at least twice that to make sure we capture enough of the higher harmonics, while not setting it so high that we pick up harmonics from other background sounds. I also cut off everything below 250 Hz: a baby’s cry is unlikely to carry much energy in the low-frequency range. A good approach is to open a few positive samples in an equalizer/spectrum analyzer, check which frequencies dominate, and center the dataset on those frequencies. --bins specifies the number of bands in frequency space (default: 100); a larger value gives higher frequency resolution/granularity, but if it is too high it can make the model prone to overfitting.
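If you do not have a spectrum analyzer at hand, a quick sketch like the one below can serve the same purpose: it decodes a positive sample with the ffmpeg binary installed earlier and reports the strongest frequency components with numpy (the file path is just an example; this is my own illustration, not part of micmon):

```python
import subprocess
import numpy as np

def dominant_frequencies(mp3_path: str, sample_rate: int = 44100, top_n: int = 10):
    """Decode an mp3 with ffmpeg and print its strongest frequency components."""
    # Decode to raw signed 16-bit mono PCM on stdout
    raw = subprocess.run(
        ['ffmpeg', '-i', mp3_path, '-f', 's16le', '-ac', '1',
         '-ar', str(sample_rate), '-'],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=True
    ).stdout

    samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

    # Report the top_n strongest frequencies (ignoring the DC component)
    top = np.argsort(spectrum[1:])[-top_n:] + 1
    for idx in sorted(top, key=lambda i: -spectrum[i]):
        print(f'{freqs[idx]:8.1f} Hz  (relative magnitude {spectrum[idx] / spectrum.max():.2f})')

dominant_frequencies('sample_1/audio.mp3')
```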

The script splits the raw audio into smaller segments and computes the spectral signature of each segment. --sample-duration specifies how long each segment lasts (default: 2 seconds). A higher value works better for sounds that last longer, but it increases the time to detection and will probably fail on short sounds. A lower value works for shorter sounds, but the captured segments may not carry enough information to identify the sound reliably.

In addition to the micmon-datagen script, you can also use the micmon API to write scripts to generate the dataset. Example:

import os
from micmon.audio import AudioDirectory, AudioPlayer, AudioFile
from micmon.dataset import DatasetWriter
basedir = os.path.expanduser('~/datasets/sound-detect')
audio_dir = os.path.join(basedir, 'audio')
datasets_dir = os.path.join(basedir, 'data')
cutoff_frequencies = [250, 2500]
# Scan the base audio_dir for labelled audio samples
audio_dirs = AudioDirectory.scan(audio_dir)
# Save the spectrum information and labels of the samples to a
# different compressed file for each audio file.
for audio_dir in audio_dirs:
    dataset_file = os.path.join(datasets_dir, os.path.basename(audio_dir.path) + '.npz')
    print(f'Processing audio sample {audio_dir.path}')

    with AudioFile(audio_dir) as reader, \
            DatasetWriter(dataset_file,
                          low_freq=cutoff_frequencies[0],
                          high_freq=cutoff_frequencies[1]) as writer:
        for sample in reader:
            writer += sample

Whether you use micmon-datagen or the micmon Python API to generate the dataset, at the end of the process you should find a set of .npz files in the ~/datasets/sound-detect/data directory, one for each labeled raw audio file. You can then use this dataset to train a neural network for sound detection.
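Before training, you can quickly sanity-check the generated files with numpy. The sketch below deliberately avoids assuming anything about micmon’s internal array names and simply lists whatever each archive contains:

```python
import os
import numpy as np

datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')

for filename in sorted(os.listdir(datasets_dir)):
    if not filename.endswith('.npz'):
        continue
    archive = np.load(os.path.join(datasets_dir, filename))
    # Print the stored array names and their shapes for a quick look
    shapes = {name: archive[name].shape for name in archive.files}
    print(f'{filename}: {shapes}')
```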

Training the Model

micmon uses TensorFlow+Keras to define and train the model; this is easily done with the Python API. For example:

import os
from tensorflow.keras import layers
from micmon.dataset import Dataset
from micmon.model import Model
# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')
# This is the output directory where the model will be saved
model_dir = os.path.expanduser('~/models/sound-detect')
# This is the number of training epochs for each dataset sample
epochs = 2
# Load the datasets from the compressed files.
# 70% of the data points will be included in the training set,
# 30% of the data points will be included in the evaluation set
# and used to evaluate the performance of the model.
datasets = Dataset.scan(datasets_dir, validation_split=0.3)
labels = ['negative', 'positive']
freq_bins = len(datasets[0].samples[0])
# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have 75% of the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2 * freq_bins), activation='relu'),
        layers.Dense(int(0.75 * freq_bins), activation='relu'),
        layers.Dense(len(labels), activation='softmax'),
    ],
    labels=labels,
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq
)
# Train the model
# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] ')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)

After running this script, once you are satisfied with the model’s accuracy, you will find the newly saved model in the ~/models/sound-detect directory. In my case, about 5 hours of recorded sound was enough, and with a well-chosen frequency range the model reached over 98% accuracy. If you trained the model on your computer, just copy it to the Raspberry Pi and you are ready for the next step.
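If you want to double-check the copied model on the Raspberry Pi before wiring anything up, a short verification sketch that reuses only the micmon calls already shown above (Model.load, Dataset.scan, model.evaluate) could look like this:

```python
import os
from micmon.dataset import Dataset
from micmon.model import Model

model = Model.load(os.path.expanduser('~/models/sound-detect'))
datasets = Dataset.scan(os.path.expanduser('~/datasets/sound-detect/data'),
                        validation_split=0.3)

# Evaluate the restored model against each dataset file
for dataset in datasets:
    print(model.evaluate(dataset))
```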

Using the Model for Prediction

Now let’s write a script that uses the previously trained model to notify us when the baby starts crying:

import os
from micmon.audio import AudioDevice
from micmon.model import Model
model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
audio_system = 'alsa'        # Supported: alsa and pulseaudio
audio_device = 'plughw:2,0'  # Get the list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()  # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording

Run the script on the Raspberry Pi and let it run for a while: it will print negative to standard output if no cry was detected over the past 2 seconds, and positive otherwise.

However, if the baby cries, simply printing the message to standard output isn’t very effective – we want a clear real-time notification!

This functionality can be implemented with Platypush. In this example we will use the pushbullet integration to send a message to our phone when a cry is detected. Install Redis (used by Platypush to receive messages) and Platypush with the HTTP and Pushbullet integrations:

[sudo] apt-get install redis-server
[sudo] systemctl start redis-server.service
[sudo] systemctl enable redis-server.service
[sudo] pip3 install 'platypush[http,pushbullet]'

Install the Pushbullet application on your smartphone and obtain an API token at pushbullet.com. Then create a ~/.config/platypush/config.yaml file that enables HTTP and Pushbullet integration:

backend.http:
  enabled: True

pushbullet:
  token: YOUR_TOKEN

Next, modify the previous script so that, instead of printing the prediction to standard output, it triggers a CustomEvent that can be captured by a Platypush hook:

#!/usr/bin/python3
import argparse
import logging
import os
import sys
from platypush import RedisBus
from platypush.message.event.custom import CustomEvent

from micmon.audio import AudioDevice
from micmon.model import Model
logger = logging.getLogger('micmon')

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('model_path', help='Path to the file/directory containing the saved Tensorflow model')
    parser.add_argument('-i', help='Input sound device (e.g. hw:0,1 or default)', required=True, dest='sound_device')
    parser.add_argument('-e', help='Name of the event that should be raised when a positive event occurs', required=True, dest='event_type')
    parser.add_argument('-s', '--sound-server', help='Sound server to be used (available: alsa, pulse)', required=False, default='alsa', dest='sound_server')
    parser.add_argument('-P', '--positive-label', help='Model output label name/index to indicate a positive sample (default: positive)', required=False, default='positive', dest='positive_label')
    parser.add_argument('-N', '--negative-label', help='Model output label name/index to indicate a negative sample (default: negative)', required=False, default='negative', dest='negative_label')
    parser.add_argument('-l', '--sample-duration', help='Length of the FFT audio samples (default: 2 seconds)', required=False, type=float, default=2., dest='sample_duration')
    parser.add_argument('-r', '--sample-rate', help='Sample rate (default: 44100 Hz)', required=False, type=int, default=44100, dest='sample_rate')
    parser.add_argument('-c', '--channels', help='Number of audio recording channels (default: 1)', required=False, type=int, default=1, dest='channels')
    parser.add_argument('-f', '--ffmpeg-bin', help='FFmpeg executable path (default: ffmpeg)', required=False, default='ffmpeg', dest='ffmpeg_bin')
    parser.add_argument('-v', '--verbose', help='Verbose/debug mode', required=False, action='store_true', dest='debug')
    parser.add_argument('-w', '--window-duration', help='Duration of the look-back window (default: 10 seconds)', required=False, type=float, default=10., dest='window_length')
    parser.add_argument('-n', '--positive-samples', help='Number of positive samples detected over the window duration to trigger the event (default: 1)', required=False, type=int, default=1, dest='positive_samples')

    opts, args = parser.parse_known_args(sys.argv[1:])
    return opts

def main():
    args = get_args()
    if args.debug:
        logger.setLevel(logging.DEBUG)

    model_dir = os.path.abspath(os.path.expanduser(args.model_path))
    model = Model.load(model_dir)
    window = []
    cur_prediction = args.negative_label
    bus = RedisBus()

    with AudioDevice(system=args.sound_server,
                     device=args.sound_device,
                     sample_duration=args.sample_duration,
                     sample_rate=args.sample_rate,
                     channels=args.channels,
                     ffmpeg_bin=args.ffmpeg_bin,
                     debug=args.debug) as source:
        for sample in source:
            source.pause()  # Pause recording while we process the frame
            prediction = model.predict(sample)
            logger.debug(f'Sample prediction: {prediction}')
            has_change = False

            if len(window) < args.window_length:
                window += [prediction]
            else:
                window = window[1:] + [prediction]

            positive_samples = len([pred for pred in window if pred == args.positive_label])
            if args.positive_samples <= positive_samples and \
                    prediction == args.positive_label and \
                    cur_prediction != args.positive_label:
                cur_prediction = args.positive_label
                has_change = True
                logging.info(f'Positive sample threshold detected ({positive_samples}/{len(window)})')
            elif args.positive_samples > positive_samples and \
                    prediction == args.negative_label and \
                    cur_prediction != args.negative_label:
                cur_prediction = args.negative_label
                has_change = True
                logging.info(f'Negative sample threshold detected ({len(window)-positive_samples}/{len(window)})')

            if has_change:
                evt = CustomEvent(subtype=args.event_type, state=prediction)
                bus.post(evt)

            source.resume()  # Resume recording

if __name__ == '__main__':
    main()

Save the above script as ~/bin/micmon_detect.py. The script only triggers an event if at least positive_samples positive predictions are detected over the sliding window (to reduce noise caused by prediction errors or temporary glitches), and only when the current state flips from negative to positive (or vice versa). The event is then dispatched to Platypush over the Redis bus. The script is also generic: with different positive/negative labels, frequency ranges, and output event types, it works for any sound model, not only baby cries.
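The debouncing logic is easy to reason about in isolation. Here is a stripped-down sketch (illustrative only, not the script above) of how the sliding window turns noisy per-sample predictions into clean state-change events:

```python
from collections import deque

def state_changes(predictions, window_length=5, positive_samples=2,
                  positive='positive', negative='negative'):
    """Yield (index, new_state) whenever the smoothed state flips."""
    window = deque(maxlen=window_length)
    state = negative

    for i, prediction in enumerate(predictions):
        window.append(prediction)
        positives = sum(1 for p in window if p == positive)

        if positives >= positive_samples and prediction == positive and state != positive:
            state = positive
            yield i, state
        elif positives < positive_samples and prediction == negative and state != negative:
            state = negative
            yield i, state

# A single spurious positive is ignored; once two positives fall
# within the window, the state flips, and it flips back only after
# the positives age out of the window.
preds = ['negative', 'positive', 'negative', 'positive', 'positive',
         'negative', 'negative', 'negative', 'negative', 'negative']
print(list(state_changes(preds)))  # -> [(3, 'positive'), (8, 'negative')]
```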

Create a Platypush hook to respond to the event and send notifications to the device. First, create the Platypush script directory:

mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts

# Define the directory as a module
touch __init__.py

# Create a script for the baby-cry events
vi babymonitor.py

The content of babymonitor.py is:

from platypush.context import get_plugin
from platypush.event.hook import hook
from platypush.message.event.custom import CustomEvent

@hook(CustomEvent, subtype='baby-cry', state='positive')
def on_baby_cry_start(event, **_):
    pb = get_plugin('pushbullet')
    pb.send_note(title='Baby cry status', body='The baby is crying!')

@hook(CustomEvent, subtype='baby-cry', state='negative')
def on_baby_cry_stop(event, **_):
    pb = get_plugin('pushbullet')
    pb.send_note(title='Baby cry status', body='The baby stopped crying - good job!')

Create a service file for Platypush and start/enable the service so that it starts automatically on boot:

mkdir -p ~/.config/systemd/user
wget -O ~/.config/systemd/user/platypush.service \
    https://raw.githubusercontent.com/BlackLight/platypush/master/examples/systemd/platypush.service
systemctl --user start platypush.service
systemctl --user enable platypush.service

Create a service file for the baby monitor – for example:

~/.config/systemd/user/babymonitor.service:

[Unit]
Description=Monitor to detect my baby's cries
After=network.target sound.target

[Service]
ExecStart=/home/pi/bin/micmon_detect.py -i plughw:2,0 -e baby-cry -w 10 -n 2 ~/models/sound-detect
Restart=always
RestartSec=10

[Install]
WantedBy=default.target

This service will start the microphone monitor on the ALSA device plughw:2,0, and if at least 2 positive 2-second samples are detected in the past 10 seconds, and the previous state was negative, it will trigger the state=positive event; if fewer than 2 positive samples are detected in the past 10 seconds, and the previous state was positive, it will trigger state=negative. Then you can start/enable the service:

systemctl --user start babymonitor.service
systemctl --user enable babymonitor.service

Confirm that you receive a notification on your phone when the baby starts crying. If you do not, review the audio sample labels, the neural network architecture and parameters, or the sample length/window/frequency-band settings.

Additionally, this is a relatively basic automation example, and more automation tasks can be added to it. For example, requests can be sent to another Platypush device (e.g. in the bedroom or living room) to announce out loud, via the TTS plugin, that the baby is crying. You can also extend the micmon_detect.py script so that the captured audio samples are streamed over HTTP, for example using a Flask wrapper and ffmpeg for the audio conversion.

Another interesting use case is to send data points to a local database whenever the baby starts/stops crying (you can refer to my previous article, “How to Build Your Home Infrastructure for Data Collection and Visualization and Be the Real Owner”, https://towardsdatascience.com/how-to-build-your-home-infrastructure-for-data-collection-and-visualization-and-be-the-real-owner-af9b33723b0c): this is a very useful set of data for tracking when the baby is sleeping, awake, or needs feeding. Although monitoring the baby was the reason I developed micmon, the same procedure can be used to train and detect models for other types of sounds. Finally, consider adding a good power supply or lithium battery pack so that the monitor is portable.
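As a concrete sketch of the data-collection idea above, the following extra hooks could be appended to babymonitor.py to log every state change to a local SQLite database. The hook decorator and CustomEvent usage mirror the hooks already shown; the database path and table name are arbitrary choices of mine, not part of the project:

```python
import sqlite3
from datetime import datetime

from platypush.event.hook import hook
from platypush.message.event.custom import CustomEvent

DB_PATH = '/home/pi/datasets/baby_monitor.db'  # arbitrary, illustrative location

def log_state(state: str):
    """Append a (timestamp, state) row to a local SQLite database."""
    db = sqlite3.connect(DB_PATH)
    try:
        with db:  # commits the transaction on success
            db.execute('CREATE TABLE IF NOT EXISTS baby_cry_events '
                       '(timestamp TEXT, state TEXT)')
            db.execute('INSERT INTO baby_cry_events VALUES (?, ?)',
                       (datetime.utcnow().isoformat(), state))
    finally:
        db.close()

@hook(CustomEvent, subtype='baby-cry', state='positive')
def on_baby_cry_start_log(event, **_):
    log_state('positive')

@hook(CustomEvent, subtype='baby-cry', state='negative')
def on_baby_cry_stop_log(event, **_):
    log_state('negative')
```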

Installing the Baby Camera

With a good audio feed and detection method in place, you can also add a video feed to keep an eye on the child. I initially mounted a PiCamera on the Raspberry Pi 3 used for audio detection, but that setup turned out to be quite impractical: a Raspberry Pi 3, an additional battery pack, and a camera make for a rather bulky package. A lighter camera that can easily be mounted on a stand or flexible arm and moved around lets you keep a close eye on the child wherever he/she is. I eventually chose the smaller Raspberry Pi Zero, which is compatible with the PiCamera, paired with a small battery.

Prototype of the baby monitor camera module

Again, start with an SD card flashed with a Raspberry Pi-compatible OS. Then plug a Raspberry Pi-compatible camera into its slot, make sure the camera module is enabled in raspi-config, and install Platypush with the PiCamera integration:

[sudo] pip3 install 'platypush[http,camera,picamera]'

Then add camera configuration in ~/.config/platypush/config.yaml:

camera.pi:
    listen_port: 5001

Restart Platypush to pick up this configuration; you can then fetch snapshots from the camera over HTTP:

wget http://raspberry-pi:8008/camera/pi/photo.jpg

Or open the video in a browser:

http://raspberry-pi:8008/camera/pi/video.mjpg

Likewise, you can create a hook that starts the camera feed over TCP/H264 when the application starts:

mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts
touch __init__.py
vi camera.py
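The content of camera.py is not reproduced here, but a minimal sketch could look like the following. The ApplicationStartedEvent class and the start_streaming action of the camera.pi plugin are assumptions based on the Platypush version used here, so check the plugin documentation of your installed version:

```python
from platypush.context import get_plugin
from platypush.event.hook import hook
# Assumption: this event is emitted by Platypush when the application starts
from platypush.message.event.application import ApplicationStartedEvent

@hook(ApplicationStartedEvent)
def on_application_started(event, **_):
    cam = get_plugin('camera.pi')
    # Assumption: start_streaming starts the TCP/H264 feed on listen_port (5001)
    cam.start_streaming()
```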

You can also watch the video via VLC:

vlc tcp/h264://raspberry-pi:5001

You can also watch the stream on your phone with the VLC app or the RPi Camera Viewer app.

From the initial idea to the final implementation, the result looks pretty good, and it can be considered a new dad’s small act of self-redemption from the mundane tasks of caregiving.

Original link:

https://towardsdatascience.com/create-your-own-smart-baby-monitor-with-a-raspberrypi-and-tensorflow-5b25713410ca
