Skeltrack, the Open Source library for skeleton tracking, keeps being improved here at Igalia, and today we are releasing version 0.1.8.
Since July we have had the valuable extra help of Iago López, who is doing an internship in Igalia's Interactivity Team.
What’s new
Several bugs, both in the library (including in the introspection support) and in the supplied example, were fixed.
The threading model was simplified and the skeleton tracking implementation was split into several files to better organize the source code.
While the above is nice, the coolest thing about this release (and kudos to Iago for this) is that it makes Skeltrack work better with scenes where the user is not completely alone. The issue was that if another person or object (think chairs, tables, etc., for a real-life example) was in the scene, it would confuse the skeleton tracking. As of this version, while not being perfect (objects/people cannot be touching the user), the algorithm will try to discard objects that are not the user.
But if there are two people in the scene, which one will it choose? To control this, we have introduced a new function:
skeltrack_skeleton_set_focus_point (SkeltrackSkeleton *skeleton,
                                    gint               x,
                                    gint               y,
                                    gint               z)
This function tells Skeltrack to focus on the user closest to this point, thus allowing you to follow a particular user in real time by constantly updating this point to, for example, the user's head position.
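For illustration, here is a minimal sketch of how one could keep the focus point on the tracked user's head, using the asynchronous tracking API from Skeltrack's documentation (whether the focus point uses the same real-world coordinates as the joints is an assumption of this sketch):

/* Created elsewhere with skeltrack_skeleton_new (). */
static SkeltrackSkeleton *skeleton = NULL;

static void
on_track_joints (GObject *obj, GAsyncResult *res, gpointer user_data)
{
  GError *error = NULL;
  SkeltrackJointList joints;
  SkeltrackJoint *head;

  joints = skeltrack_skeleton_track_joints_finish (SKELTRACK_SKELETON (obj),
                                                   res,
                                                   &error);
  if (error != NULL)
    {
      g_warning ("Tracking failed: %s", error->message);
      g_error_free (error);
      return;
    }
  if (joints == NULL)
    return;

  head = skeltrack_joint_list_get_joint (joints, SKELTRACK_JOINT_ID_HEAD);
  if (head != NULL)
    {
      /* Follow the user we are already tracking: move the focus point
         to the head's position so other people entering the scene are
         ignored. */
      skeltrack_skeleton_set_focus_point (skeleton, head->x, head->y, head->z);
    }

  skeltrack_joint_list_free (joints);
}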
So, even if there is no multi-user support, the current API makes it easy to run several instances of Skeltrack and try to pick up users from different points in the scene.
This should also make it easier to use Skeltrack in a typical installation where a user is controlling something in a public space while other people are passing by or standing nearby.
Contribute
We will keep betting on this great library.
If you want to help us, read the docs, check out Skeltrack's GitHub repository, and send us patches or open issues.
I have read the docs (but did not read the code), and they raise a lot of questions. Please fix this.
As a minimum, the overview section should not only describe the purpose of the library, but also the expected interaction with the outside world. For example, what is a depth image? Where do I usually get one from? What is the expected performance on typical hardware? How does it scale with the image size? Does the library use multiple cores for its internal work?
Then, the API section should be more detailed, too. What is the expected format of the depth image (right now it is just an opaque array of guint16), pixel order and byte order within the pixel? What is the correct view angle? What is the lifetime of the buffer – i.e. does skeltrack_skeleton_track_joints() copy the buffer before returning, or is this buffer accessed from the other thread until skeltrack_skeleton_track_joints_finish()? What is smoothing of joints – is it related to some kind of filtering over frame numbers? How to deal with variable frame rate (e.g. missed frames)?
Hi Alexander,
I think the overview gives a general idea of what the library does with: “Its main purpose is to offer an easy way to track and identify a number of joints of the human skeleton from depth images.”
I agree that the depth images part could be better explained, but if one is reading the developer documentation, it is reasonable to assume that those developers know a bit of the context in which those docs were written. The depth image refers to the depth buffer given by a depth camera, the Kinect being the most popular example of a depth camera.
The usage of multiple cores is not something particularly important to discuss in an overview.
The format of the depth image, as the docs state, is an array of guint16 elements. Each element represents a depth value.
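For example, feeding one frame to the tracker could look roughly like this (the 640×480 dimensions assume an unscaled Kinect frame, get_depth_frame() is a hypothetical helper standing in for whatever your camera driver provides, and on_track_joints is the asynchronous callback in which skeltrack_skeleton_track_joints_finish() is later called):

/* One guint16 depth value per pixel. */
guint16 *depth_frame = get_depth_frame ();

skeltrack_skeleton_track_joints (skeleton,
                                 depth_frame,
                                 640,            /* width in pixels  */
                                 480,            /* height in pixels */
                                 NULL,           /* GCancellable     */
                                 on_track_joints,
                                 NULL);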
About the view angle, the depth camera should be facing the user; it has some tolerance to different angles, but we do not have well-defined limits for those.
The smoothing of the joints is better explained in my previous post about Skeltrack.
For the remaining questions, please understand that there is a very limited number of people working on this library and obviously we give priority to the code but we’ll try to improve the docs.
Also, if you want to contribute to Skeltrack and write documentation it would be wonderful.
Wake me when there is non-proprietary hardware.
Thanks for the answers, but you have misunderstood some of the questions.
Do I assume correctly that the width and height parameters of skeltrack_skeleton_track_joints() are in pixels?
About the view angle: it is not about where the camera is directed, but whether it is wide-angle or narrow-angle. This affects the conversion from pixels to real-world 3D coordinates. E.g., the Kinect camera is documented to have a 640×480 resolution and an angular field of view of 57° (i.e. 1 radian) horizontally and 43° vertically. Thus, if a (340, 240) pixel (where (0,0) is the corner of the field of view) reports a depth value of 1.8 m, then it corresponds to a real-world x-coordinate of 57.7 mm relative to the center axis. You seem to return such real-world coordinates, at least as a part of struct SkeltrackJoint.
Now suppose that some other firm makes another depth camera, also with 640×480 pixels, but the viewing angle of only 0.5 radians (and thus with more pixels per radian). Then, the calculation that is valid for Kinect becomes invalid for this camera. I really need to know the number of pixels per radian in order for the math to be correct, but you only provide the “dimension-reduction” property that looks relevant. So it looks like your library has an assumption somewhere that is valid for Kinect only. In other words, this property makes a Kinect camera special, in the sense that for all other cameras I would have to specify this property in order to explain the difference. Wouldn’t specifying the number of pixels per radian directly (as in “1280 pixels per radian” for the imaginary third-party camera), as “pixels-per-radian”, instead of this factor (as in “dimension-reduction is 8” if I guessed correctly), be more fair? And this covers the original use case of the scaled depth images, too.
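To make the argument concrete, a rough pinhole-model sketch of that conversion (not Skeltrack code; pixels_per_radian is the camera-specific constant being asked for) could look like this:

#include <math.h>

/* Convert a horizontal pixel index and its depth reading into a
   real-world X coordinate, given how many pixels the camera packs
   into one radian of viewing angle (roughly 640 px/rad for the
   Kinect, about 1280 px/rad for the imaginary narrow-angle camera). */
static double
pixel_to_world_x (double pixel_x,
                  double image_width,        /* in pixels */
                  double depth_mm,
                  double pixels_per_radian)
{
  double angle = (pixel_x - image_width / 2.0) / pixels_per_radian;
  return depth_mm * tan (angle);
}

With pixel 340 of 640 at 1.8 m and ~640 px/rad this gives roughly the figure above, while the narrow-angle camera would give about half of that for the same pixel, which is exactly why the constant has to be camera-specific.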
As for the pixel order – is it left-to-right for pixels within a line, and top-to-bottom for lines within the whole image, with no padding between the lines? This is necessary to specify, as not all computer images are in this format.
About the proprietary hardware: it'd be wonderful to have Open Source hardware, but this is what we have now, and it's better to have Free Software solutions for that proprietary hardware than to have it all proprietary.
Besides, your comment style would work better if I believed you didn't write that from a computer with proprietary hardware or drive cars full of proprietary parts, etc.
I understand your concerns. Still, we have used the Kinect for our tests, and we tried to stay away from device specifics as much as we could; I think the result is very good. Doing something widely compatible would be overkill at this point, when most users of the library will use the Kinect; those who don't can always contact us and we'll try to help.
We have also recently bought an Asus Xtion Pro to have an alternative to the Kinect, but it is very similar as far as the depth camera is concerned, so it might not need any changes.
At least one user of a different camera (one closer to scientific research) contacted me about Skeltrack and didn't go so deep into specifics; he was happy with just having to pass a buffer and its dimensions.
As for the pixel order, it uses the common convention: left-to-right within a line and top-to-bottom for the lines.