ESP32-Camera Streaming and Image Capture

This topic shows many of the ways one can stream video and/or capture images from the esp32-cam by AI-Thinker. For this purpose the example “camera_web_server” found in the esp-who repository (https://github.com/espressif/esp-who.git) was built and flashed to a device (use esp-idf v3.2 or v3.3 in order to build successfully). The example’s main web page is only one of many ways to stream and capture; the sections below demonstrate various approaches. The final sections describe how to configure the camera’s web server to handle the WIFI interruptions that can kill streaming.

Using the Browser

Using VLC

ffmpeg

ffplay

Capture and Display Stream with opencv

Capture Stream and Save to File with opencv

Streaming with Gstreamer

Continuous Streaming Problem Solved

Camera Failure Update

Using the Browser

The relevant code for the browser is found in app_httpd.c. The default server port is 80 (set via HTTPD_DEFAULT_CONFIG in esp_http_server.h). In particular, look at app_httpd_main() to find the URI registered for capturing an image directly:


    httpd_uri_t capture_uri = {
        .uri       = "/capture",
        .method    = HTTP_GET,
        .handler   = capture_handler,
        .user_ctx  = NULL
    };
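
For completeness, a handler only takes effect once its URI is registered with the running server; app_httpd.c does this along the following lines (camera_httpd being the server handle used there):

    if (httpd_start(&camera_httpd, &config) == ESP_OK) {
        httpd_register_uri_handler(camera_httpd, &capture_uri);
    }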

Therefore, one can capture an image simply by entering http://<your IP>/capture in any browser. In a similar vein, one can stream video in any browser. Note the stream URI:


    httpd_uri_t stream_uri = {
        .uri       = "/stream",
        .method    = HTTP_GET,
        .handler   = stream_handler,
        .user_ctx  = NULL
    };
    ........
    config.server_port += 1;

However, the server port is incremented by one for streaming. Therefore, entering http://<your IP>:81/stream in a browser will start the video stream.
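
The same endpoints can be exercised from the command line; for example, a single frame can be saved with curl (with yourIP standing in for the device’s address):

curl -o capture.jpg http://yourIP/capture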

Using VLC

Now that one knows the URLs, it would seem straightforward to use VLC by entering the network stream URL. However, it was found that only capturing an image ( http://<your IP>/capture ) worked, displaying the image for 10 seconds.

When streaming video was attempted ( http://<your IP>:81/stream ), VLC failed to display it even though Current Media Information > Statistics showed input bytes arriving. It appears that VLC cannot determine which codec or demuxer to use. This is a strange result, since browsers have no trouble recognizing the multipart JPEG stream the server sends over HTTP.
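
One workaround worth trying, though not verified in this testing, is to force VLC’s MJPEG demuxer from the command line so that it does not have to guess the codec:

vlc --demux=mjpeg http://yourIP:81/stream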

ffmpeg

ffmpeg can be used either to capture an image and save it to a file or to record the video stream to a file. In the second command below, -c copy stores the incoming JPEG frames without re-encoding, -t 30 limits the recording to 30 seconds, and -an drops audio (the stream carries none).

ffmpeg -i http://yourIP/capture -c copy -y my.jpg

ffmpeg -i http://yourIP:81/stream -an -t 30 -c copy -y out.mp4

ffplay

ffplay, which is built on the ffmpeg libraries, plays the stream directly to the screen.

ffplay -i http://yourIP:81/stream

Capture and Display Stream with opencv

The following cpp code needs to be built on a machine with opencv installed.


#include "opencv2/opencv.hpp"

using namespace cv;

int main(int, char**)
{
    VideoCapture cap("http://yourIP:81/stream"); // open the default camera
    if(!cap.isOpened())  // check if we succeeded
        return -1;

    namedWindow("Stream", CV_WINDOW_AUTOSIZE);

    for(;;)
    {
        Mat frame;
        cap >> frame; // get a new frame from camera
        // do any processing
        imshow("Stream", frame);
        if(waitKey(30) >= 0) break;   // you can increase delay to 2 seconds here
    }
    // the camera will be deinitialized automatically in VideoCapture destructor
    return 0;
}
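
One way to build this (and the similar program in the next section), assuming the code is saved as, say, stream.cpp and that pkg-config can locate the opencv installation (the package is named opencv4 on OpenCV 4 and opencv on OpenCV 3):

g++ stream.cpp -o stream `pkg-config --cflags --libs opencv4`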

Capture Stream and Save to File with opencv

Example cpp code:

#include "opencv2/opencv.hpp"
#include "iostream"
#include "stdio.h"
#include "time.h"
#include "chrono"

using namespace std;
using namespace cv;
 
int main(){
 
  // Create a VideoCapture object and use camera to capture the video
  VideoCapture cap("http://yourIP:81/stream"); 
 
  // Check if camera opened successfully
  if(!cap.isOpened())
  {
    cout << "Error opening video stream" << endl; 
    return -1; 
  } 
 
  // Default resolution of the frame is obtained.The default resolution is system dependent. 
//  int frame_width = cap.get(CV_CAP_PROP_FRAME_WIDTH); 
//  int frame_height = cap.get(CV_CAP_PROP_FRAME_HEIGHT); 

  //this returns -1 which seems to indicate the propid is not supported
  cout << "fps = " << cap.get(CV_CAP_PROP_FPS) << endl;

  int frame_width = 320;
  int frame_height =240;

  cap.set(CV_CAP_PROP_FRAME_WIDTH,  frame_width);
  cap.set(CV_CAP_PROP_FRAME_HEIGHT, frame_height);

  printf("width = %d  height = %d\r\n", frame_width, frame_height);   

  VideoWriter video("outmjpg.avi",CV_FOURCC('M','J','P','G'),30, \
Size(frame_width,frame_height)); 

auto start = std::chrono::high_resolution_clock::now();
int framecount = 0;
  while(1)
  { 
    Mat frame; 
     
    framecount = framecount + 1;

    // Capture frame-by-frame 
    cap >> frame;
  
    // If the frame is empty, break immediately
    if (frame.empty())
      break;
     
    // Write the frame into the file 'outcpp.avi'
    video.write(frame);
    
    // Display the resulting frame    
    imshow( "Frame", frame );
  
    // Press  ESC on keyboard to  exit
    char c = (char)waitKey(1);
    if( c == 27 ) 
      break;
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration elapsed = finish - start;

    if ( elapsed.count() > 10 )
    {
      cout << "Elapsed time: " << elapsed.count() << " s\n";
      cout << "FPS = " << framecount/elapsed.count() << endl;
      break;
    }
  }
  cout << "frame count = " << framecount << endl;
  // When everything done, release the video capture and write object
  cap.release();
  video.release();
 
  // Closes all the windows
  destroyAllWindows();
  return 0;
}

Streaming with Gstreamer

gstreamer is a powerful, but complicated-to-use, tool. Here it is used to display the stream from the esp32-cam. Presented below is a command that functions correctly, but only after an important change is made to the code in app_httpd.c. First the command:

GST_DEBUG=*:4 gst-launch-1.0 -v souphttpsrc location=http://yourIP:81/stream  ! queue ! \
multipartdemux ! jpegdec ! autovideosink

The problem apparently arises from the way gstreamer finds the boundary that marks where one jpeg image ends and the next begins. The stream from the web server is essentially “mjpeg”: the TCP packets carry the image data of each jpeg frame. Below is an example of what a typical camera sends at the beginning of a stream of jpeg images:

HTTP/1.0 200 OK
Server: alphapd
Date: Mon Jan  2 02:20:20 2017
Pragma: no-cache
Cache-Control: no-cache
Content-Type: multipart/x-mixed-replace;boundary=video boundary--
--video boundary--
Content-length: 12132
Date: 01-02-2017 02:20:20 AM IO_00000000_PT_000_000
Content-type: image/jpeg

Note that the Content-Type line defines the boundary string at which one jpeg frame ends and the next begins. However, also note that this boundary value is repeated on the next line. For some reason gstreamer looks to this second line to find the boundary value, ignoring the definition on the Content-Type line. As written in app_httpd.c this second line is missing, and as a result gstreamer fails with an error message indicating that the boundary value is missing. Other software evidently reads the Content-Type line. To correct this, see below what needs to be changed in the stream_handler function of app_httpd.c:

 static esp_err_t stream_handler(httpd_req_t *req){
    camera_fb_t * fb = NULL;
    esp_err_t res = ESP_OK;
    size_t _jpg_buf_len = 0;
    uint8_t * _jpg_buf = NULL;
    char * part_buf[64];
#if CONFIG_ESP_FACE_DETECT_ENABLED
    dl_matrix3du_t *image_matrix = NULL;
    bool detected = false;
    int face_id = 0;
    int64_t fr_start = 0;
    int64_t fr_ready = 0;
    int64_t fr_face = 0;
    int64_t fr_recognize = 0;
    int64_t fr_encode = 0;
#endif

    static int64_t last_frame = 0;
    if(!last_frame) {
        last_frame = esp_timer_get_time();
    }

    res = httpd_resp_set_type(req, _STREAM_CONTENT_TYPE);
    if(res != ESP_OK){
        return res;
    }
	/*the following statement is needed for Gstreamer to work*/
    res = httpd_resp_send_chunk(req, _STREAM_BOUNDARY, strlen(_STREAM_BOUNDARY));
    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");

    while(true){
	........................
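
For reference, the _STREAM_BOUNDARY and _STREAM_CONTENT_TYPE strings used above are defined near the top of app_httpd.c, roughly as follows (the exact boundary value may differ between esp-who revisions):

#define PART_BOUNDARY "123456789000000000000987654321"
static const char* _STREAM_CONTENT_TYPE = "multipart/x-mixed-replace;boundary=" PART_BOUNDARY;
static const char* _STREAM_BOUNDARY = "\r\n--" PART_BOUNDARY "\r\n";
static const char* _STREAM_PART = "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n";

The added httpd_resp_send_chunk() call simply emits one extra _STREAM_BOUNDARY right after the Content-Type header, which is exactly the repeated boundary line gstreamer looks for.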

With this change gstreamer will stream and display video from the esp32-cam. But another problem quickly arises: because of WIFI interruptions the stream of jpeg images is not perfectly continuous, and soon this error appears: souphttpsrc error: “Server does not support seeking”. The only known solution is the following work-around:

curl --retry 3 http://yourIP:81/stream | gst-launch-1.0 -v fdsrc \
 do-timestamp=true ! queue ! multipartdemux ! jpegdec ! autovideosink

Here curl owns the HTTP connection (and retries it on failure), while gstreamer simply reads the multipart stream from stdin via fdsrc, so it never attempts to seek.

Continuous Streaming Problem Solved

In every case above, whatever software was receiving the stream from the esp32-cam server, testing revealed periods where streaming stopped because of WIFI interruptions. These interruptions caused the web server to stop sending, and the receiving client, depending on which of the above software was in use, misbehaved in various ways; continuous streaming was unreliable. The esp32-cam logged error code 11 (EAGAIN, i.e. the socket send timed out) when this happened:

I (4863461) camera_httpd: MJPG: 168569B 238ms (4.2fps), AVG: 275ms (3.6fps), 0+0+0+0=0 0
I (4863781) camera_httpd: MJPG: 171587B 321ms (3.1fps), AVG: 279ms (3.6fps), 0+0+0+0=0 0
I (4864021) camera_httpd: MJPG: 171038B 237ms (4.2fps), AVG: 276ms (3.6fps), 0+0+0+0=0 0
W (4874281) httpd_txrx: httpd_sock_err: error in send : 11
W (4874281) httpd_uri: httpd_uri: uri handler execution failed
dhcps: send_nak>>udp_sendto result 0

Source of the problem and its solution: the home network where this was tested consists simply of a main router with a few wired clients; the rest are WIFI clients. So what was going on? To debug the problem, Wireshark was run on a client receiving the stream. At the moment of streaming shutdown the captured packets showed that a device had broadcast an STP packet (STP = Spanning Tree Protocol), which shuts down traffic on the client for various periods, typically 10 seconds or more. Wireshark identified the device as a Netgear WIFI device attached to an Echostar root bridge system. Indeed, it was the Dish Hopper causing the interruptions! Since the home LAN is not part of a mesh network it would have been nice to turn off STP on the Echostar. However, it was learned that STP can interrupt network traffic for up to a maximum of 30 seconds. The fix, therefore, is in HTTPD_DEFAULT_CONFIG (found in esp_http_server.h): setting .recv_wait_timeout = 30 and .send_wait_timeout = 30 solves the problem. Subsequent testing showed the problem went away. Note that these timeout periods default to 5 seconds.
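
As a minimal sketch of the change (the field names come from httpd_config_t in esp-idf’s esp_http_server.h):

    httpd_config_t config = HTTPD_DEFAULT_CONFIG();
    // The defaults are 5 seconds; 30 seconds rides out the longest
    // STP-induced outage, so the send no longer fails with EAGAIN
    config.recv_wait_timeout = 30;   // seconds to wait on a socket recv
    config.send_wait_timeout = 30;   // seconds to wait on a socket send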

Camera Failure Update

After a few months of operation one of the devices failed with the following error:

camera probe failed with error 0x20004

The fix was to replace the OV2640 camera. It seems this camera has a finite failure rate.


5 Responses

  1. Philip Chisholm says:

    Hi Frank – Nice tutorial!

    I have been testing out one of these esp32cam – very interesting device.

    But I really want to be able to do live streaming over the internet to a website with the ESP32cam.
    I can’t seem to find tutorials on how someone can actually stream video remotely (without having to mess with router ports).

    Are you able to do that?
    How!?
    I was looking at webRTC – but I don’t think the esp32 can run webrtc.

    Cheers.

    Phil

    • Frank says:

      Hi Phillip, I just saw your comment this morning. I am not quite sure what you want to do. When you say stream video remotely or over the internet to a web site I think of simply accessing the ESP32cam’s webserver as described in my blog here. If the webserver (and its URL) is inside your LAN then you need to set up port forwarding on your router for any client on the internet to obtain the ESP32cam’s video stream. When you build the code change the streaming server port to something of your choice. Then on your router reserve the address that the router DHCP provides to your device and then port forward it along with the port. Address reservation and port forwarding can be found on all routers. That is how I access my ESP32cam from my phone when I am out somewhere.
      If I am not addressing what you are trying to do, let me know. I may be missing your point.

  2. Philip Chisholm says:

    Hi Frank

    Thanks!
    Sorry slow reply – I just saw your comment…
    Let me try to be more clear….

    I am working on a mechanical device with an esp32cam and wanted to be able to connect to any wifi system and control the device remotely – and not have to deal with router settings…

    I see that gstreamer has a webRTC module.
    And webRTC uses ICE to make remote internet connections without having to deal with router settings and firewalls.
    And webRTC makes it easy to do P2P video streaming (to cut down on costs of servers when streaming video).

    But not sure if the gstreamer webRTC module will work on the esp32…it would be nice if there was a ‘micro’ webRTC – similar to the micro RTSP module. https://github.com/geeksville

    I am just a mechanical engineer…so my coding skills are not great (but working on them). I know enough programming to be dangerous…but not enough to sort out really technical coding problems.

    Basically I just want to get connectivity to the internet easily with my device easily…and be able to control my device remotely and stream video P2P (well device to peer D2P).

    If that makes sense!

    Thx.

    Phil

    • Frank says:

      Hi Phil,
      Until you pointed out webRTC I knew nothing about it. Now, after a bit of reading about it here is my brief take-away.
      webRTC runs on machines with browsers, and browsers run on machines with operating systems that sit on top of kernels and the machines themselves usually have considerable resources like lots of RAM (>= 1 GB). The ESP32 API is essentially what I would call a kernel and the device itself lacks the kind of resources an operating system like Windows, Ubuntu, etc. would need. An ESP32, though, is essentially a powerful microcontroller. So, your desire to simply use an ESP32 as a peer in webRTC is I believe what we used to call “un-obtainium”. Yes, the feature where peer to peer is established more or less automatically is nifty.

      Regarding browsers see for example: https://www.html5rocks.com/en/tutorials/webrtc/basics/#toc-support

      I suggest you look at another platform. Is this to be commercialized? A Raspberry Pi3 Model B with Pi Cam could work in your application but these are not really commercializable. Maybe a Qualcomm Dragonboard 410C or something like that. They support debian linux operating systems (Ubuntu, Linaro, etc.) and most browsers are available to run on it. If you want something really powerful consider Nvidia’s Nano (~$100-150) plus a Pi Cam (~$25) since the Nano has drivers for it. The Nano software package will also do AI/Machine Learning, all through the GPU. I suppose, though, you want the cost of the ESP32cam……

  3. Philip Chisholm says:

    Hi Frank – Yes I was thinking of using a lower cost unit… But your comments most helpful. Cheers.
    Phil
