Sensing Position

Certain applications require that IoT devices be aware of their position in space. Many different technologies exist to this end, each with its own advantages and downsides. The most basic position sensors are limited to simple present/absent information, whereas more advanced sensors can return centimeter-level positioning information. As always, there is a trade-off between range (over what distance you can sense position), resolution (how finely you can measure position), and power consumption.

Satellite Positioning

One of the most well-known positioning systems is the Global Positioning System (GPS). The system uses a constellation of 32 satellites that work together to provide position information to any GPS receiver on the ground. Each GPS satellite carries an ultra-precise atomic clock, which is synchronized across the system and to stations on the ground. Because of this, the satellites know their own position to a very high degree of accuracy. Each satellite continually broadcasts its location and local time to the ground. GPS receivers use the information of four (or more) satellites to determine their own position and time. Data transfer is strictly unidirectional from satellite to receiver: the satellites have no idea where each receiver is. The more satellites that are “in view” of the receiver, the more accurate the position data will be. GPS positioning accuracy depends on the type of receiver and the environmental conditions, though sub-5 m accuracy at a 10 Hz update rate is representative. As a side benefit of the way the system works, GPS receivers also return very accurate local time information, which can be useful in many applications.
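To make the geometry concrete, here is a simplified 2-D trilateration sketch: given three anchors at known positions and a measured distance to each, subtracting one circle equation from the others yields a linear system for the receiver position. Real GPS solves the 3-D version with at least four pseudoranges, because the receiver clock bias is an extra unknown; all coordinates and distances below are illustrative.

```python
import math

def trilaterate(anchors, dists):
    """Estimate a 2-D position from three (x, y) anchors and measured distances.

    Subtracting the first circle equation from the other two linearizes the
    problem; the resulting 2x2 system is solved with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = x2**2 - x1**2 + y2**2 - y1**2 - (d2**2 - d1**2)
    b2 = x3**2 - x1**2 + y3**2 - y1**2 - (d3**2 - d1**2)
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
dists = [math.dist(a, true_pos) for a in anchors]
print(tuple(round(c, 3) for c in trilaterate(anchors, dists)))  # → (3.0, 4.0)
```

With noisy real-world distance measurements, more than three anchors and a least-squares solve are used instead of this exact solution.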

The GPS system is owned and operated by the United States, though anyone can freely make use of it. In combat situations, the US can choose to selectively disable GPS service. For strategic reasons, several nations are working on their own satellite navigation systems. The GLONASS system, developed by Russia, achieved 100% global coverage in 2010. Certain sat-nav receiver modules can decode both GPS and GLONASS signals, leading to better coverage, faster positioning, and higher accuracy. Another alternative is the Galileo system, which is currently being constructed by the European Union. Galileo is expected to reach full operational capability in 2019.

Dead reckoning

Dead reckoning is the process of using data from motion sensors to estimate position. A simple example is using speed to estimate the distance travelled, for instance by multiplying the speed by the elapsed time. In practice, this is done by continuously integrating velocity data. Dead reckoning techniques are subject to cumulative errors: the longer dead reckoning is used, the further the estimated position will drift from the actual position. For this reason, dead reckoning is rarely used on its own; instead, it is frequently used to enhance other positioning information. A common scenario is to combine GPS data with accelerometer/gyroscope data. The GPS receiver provides slow, low-accuracy absolute positioning information, whereas the accelerometer/gyroscope sensors provide fast, high-accuracy relative positioning. Sensor fusion techniques combine both to determine position to a high degree of accuracy; a Kalman filter is often used for this application.
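As a minimal illustration of the idea (a simple complementary filter rather than a full Kalman filter), the sketch below integrates fast 1-D accelerometer samples into a position estimate and periodically corrects it with a slow absolute fix. The sample rates, blend factor, and sensor values are illustrative assumptions.

```python
DT = 0.01     # accelerometer sample period: 100 Hz (assumed)
ALPHA = 0.98  # weight given to the dead-reckoned estimate (assumed)

def fuse(accel_samples, abs_fixes, fix_every):
    """Dead-reckon from acceleration, blending in an absolute fix every
    `fix_every` samples (complementary filter)."""
    pos, vel = 0.0, 0.0
    for i, a in enumerate(accel_samples):
        vel += a * DT                 # integrate acceleration -> velocity
        pos += vel * DT               # integrate velocity -> position
        if (i + 1) % fix_every == 0:  # slow absolute fix arrives
            fix = abs_fixes[(i + 1) // fix_every - 1]
            pos = ALPHA * pos + (1 - ALPHA) * fix
    return pos

# 1 m/s^2 for one second; the true final position is 0.5 m.
accel = [1.0] * 100
print(round(fuse(accel, [0.5], fix_every=100), 3))
```

The discrete integration drifts slightly from the true 0.5 m, and the absolute fix pulls the estimate back; a Kalman filter does the same blending but weights the two sources by their estimated uncertainties instead of a fixed ALPHA.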

Near-field communication and RFID

Radio-frequency identification technology relies on short-range radio waves to detect the presence of small objects called tags. Depending on the constraints of the application, NFC / RFID can be used for a variety of rough positioning schemes. It can be used to detect the presence or absence of tagged objects on/near the device. Using longer-range active tags, it can also be used for room-level present/absent positioning. Finally, the technology can also be used to link digital data to a physical object.

RFID tags contain a coil to pick up the radio waves emitted by the RFID reader, as well as a control chip that handles communication. The most common RFID tags do not contain a battery; instead, they draw a small amount of power directly from the radio waves. Because these passive RFID tags do not contain a battery, they can be made very small and very cheap: volume pricing of passive tags is around 0.15 EUR. The maximum read distance for passive tags is typically in the centimeter range, depending on various factors (e.g. frequency and reader power). For certain applications, active tags are used instead. Active tags contain a battery, allowing a read range of up to 100 m. The most basic RFID tags contain a single, non-reprogrammable number which can be used to uniquely identify an object. More advanced tags offer small amounts of reprogrammable memory, typically between 100 bytes and a few kilobytes. The most advanced tags function as wireless sensors by integrating temperature or pressure sensing circuitry. Most smartphones contain an NFC / RFID reader. A notable exception is the iPhone, which does have an NFC reader, but its NFC API cannot be used by third-party developers.
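A present/absent positioning scheme on top of tag UIDs can be as simple as comparing the set of tags currently in range against a known inventory. In the sketch below, `read_tag_ids` is a hypothetical stand-in for a real reader driver that returns one UID string per tag in range; the UIDs and object names are made up.

```python
# Known tags: UID -> object name (illustrative values).
inventory = {"04:A2:3B:91": "toolbox", "04:77:0C:DE": "badge"}

def presence_snapshot(read_tag_ids):
    """Return {object name: present?} for every object in the inventory.

    `read_tag_ids` is a hypothetical callable wrapping the reader hardware.
    """
    seen = set(read_tag_ids())
    return {name: (uid in seen) for uid, name in inventory.items()}

# Simulated read: only the toolbox tag is in range.
print(presence_snapshot(lambda: ["04:A2:3B:91"]))
# → {'toolbox': True, 'badge': False}
```

With a long-range active-tag reader per room, the same logic gives room-level positioning: whichever reader sees the tag determines the room.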

Ultra-wideband positioning

Ultra-wideband positioning is a technique that uses radio waves to accurately measure distances. In radio positioning systems, more bandwidth (a larger range of frequencies used) leads to better distance measurement accuracy. However, electromagnetic emission regulations place a strict upper limit on UWB transmitter power in order to avoid interference with other systems. Ultra-wideband positioning systems work by accumulating multiple low-power UWB transmissions into a single distance measurement. With a single transmitter and receiver pair, only distance information can be extracted. By using multiple stationary transmitter beacons, the system can achieve indoor positioning that is accurate to within 10 cm. The picture to the right shows Pozyx, an Arduino-compatible shield for UWB positioning. DecaWave offers UWB positioning ICs and modules that can be integrated into commercial products.
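The arithmetic behind a single UWB distance measurement (two-way ranging) is straightforward: the tag measures the round-trip time of a message to an anchor, subtracts the anchor's known reply delay, and multiplies half the remaining time of flight by the speed of light. The timestamps below are illustrative; real systems also have to correct for clock drift and antenna delays.

```python
C = 299_792_458  # speed of light in m/s

def twr_distance(t_round, t_reply):
    """Two-way ranging: t_round is the tag's poll->response round-trip time,
    t_reply the anchor's internal processing delay (both in seconds)."""
    tof = (t_round - t_reply) / 2  # one-way time of flight
    return C * tof

# 10 m of flight adds ~33.4 ns per leg on top of a 1 us anchor reply delay.
t_reply = 1e-6
t_round = t_reply + 2 * (10.0 / C)
print(round(twr_distance(t_round, t_reply), 3))  # → 10.0
```

Note the timescales involved: 1 cm of accuracy corresponds to roughly 33 ps of timing resolution, which is why UWB hardware with very sharp pulse edges is needed.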

Received signal strength indication

Most radio communication systems offer received signal strength indication (RSSI) information. Simply put, RSSI offers a numerical measure of radio reception, and there is a correlation between distance and radio reception. By combining the RSSI information of multiple base stations, it is possible to estimate the position of the receiver. This technique can be used in combination with many different communication protocols, including ZigBee, Bluetooth, and WiFi. However, RSSI positioning is less accurate than other techniques, and its accuracy decreases as distance increases. For this reason, RSSI is frequently combined with other positioning techniques. For example, smartphones often use WiFi and cellular network RSSI information to improve the GPS startup time. RSSI is also frequently combined with time-of-flight measurements, as these techniques have complementary properties: time-of-flight techniques measure the time it takes to send a packet back and forth, and use that information to estimate distance. RSSI is more accurate at close range, whereas time-of-flight offers better accuracy at long range.
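The usual way to turn an RSSI value into a distance estimate is the log-distance path-loss model: signal strength falls off logarithmically with distance. The reference RSSI at 1 m and the path-loss exponent below are illustrative and must be calibrated for each transmitter and environment, which is exactly why RSSI ranging degrades in practice.

```python
def rssi_to_distance(rssi_dbm, rssi_ref_dbm=-40.0, n=2.0):
    """Log-distance path-loss model: distance in meters from an RSSI reading.

    rssi_ref_dbm: RSSI at the 1 m reference distance (calibration value).
    n: path-loss exponent (~2 in free space, higher indoors) - assumed here.
    """
    return 10 ** ((rssi_ref_dbm - rssi_dbm) / (10 * n))

print(rssi_to_distance(-40.0))  # → 1.0 (at the reference distance)
print(rssi_to_distance(-60.0))  # → 10.0
```

Note how flat the curve becomes at long range: going from 10 m to 100 m changes the RSSI by only another 20 dB, so a 1 dB measurement error translates into a large distance error, matching the accuracy behavior described above.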

Computer vision

Various computer vision techniques can be used to estimate the position of objects in a scene. Computer vision techniques can offer excellent accuracy and resolution. However, the algorithms are computationally expensive and sensitive to lighting conditions. Positioning through vision techniques is a very broad topic. The subsections below give an overview of some of the options: background subtraction, fiducials, stereo vision, and Lighthouse.

Background subtraction

As the name suggests, background subtraction is the process of separating out foreground objects from the background in a sequence of video frames. This technique is widely used for detecting moving objects with (static) cameras. The fundamental logic behind this technique is the frame difference method, which consists of the following three steps:

1. Estimate the background for time t
2. Subtract the estimated background from the input frame
3. Apply a threshold to the absolute difference to get the foreground mask

Background subtraction is challenging because the implementation has to be robust against changes in illumination, and it should avoid detecting non-stationary background objects (rain, snow, cast shadows, etc.).

Applications of background subtraction include human motion detection, counting vehicles in traffic, and visual tracking of a tennis ball.
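The three-step frame difference method described above can be sketched in a few lines. Here grayscale frames are plain 2-D lists, and the "background estimate" is simply a previous frame; real systems use a running average or a statistical background model instead, and the threshold value is an illustrative assumption.

```python
THRESHOLD = 25  # minimum gray-level difference to count as foreground (assumed)

def foreground_mask(background, frame):
    """Step 2 + 3 of the frame difference method: subtract the background
    estimate and threshold the absolute difference."""
    return [[1 if abs(p - b) > THRESHOLD else 0
             for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]

background = [[10, 10, 10], [10, 10, 10]]   # step 1: background estimate
frame      = [[10, 200, 10], [10, 10, 12]]  # one bright moving pixel, some noise
print(foreground_mask(background, frame))   # → [[0, 1, 0], [0, 0, 0]]
```

The threshold is what provides the robustness mentioned above: the small sensor-noise change (12 vs 10) is suppressed, while the genuinely moving pixel survives.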


Fiducials

A fiducial marker, or fiducial, is an object placed in the field of view of an imaging system which appears in the image produced, for use as a point of reference or a measure. It may be either something placed into or on the imaging subject, or a mark or set of marks in the reticle of an optical instrument. In augmented reality applications, for example, fiducials help resolve several problems of integration between the real-world view and the synthetic images that augment it. Fiducials of known pattern and size can serve as real-world anchors of location, orientation, and scale. They can also establish the identity of the scene or of objects within the scene.

Application: AprilTag

AprilTag is a visual fiducial system, useful for a wide variety of tasks including augmented reality, robotics, and camera calibration. Targets can be created from an ordinary printer, and the AprilTag detection software computes the precise 3D position, orientation, and identity of the tags relative to the camera. Real-time performance can be achieved even on cell-phone grade processors.

Application: CGI – Dot pattern

The technique that uses white dots on a suit or face to detect movements is called motion capture when capturing body movements, or performance capture when capturing facial expressions. The white dots are often called “markers” or “tracking markers”. The computer tracks the position of the markers in space and attaches a skeleton of a digital character to those points. This results in an accurate transfer of the real body motion into the digital world. The markers have to contrast with the surrounding fabric so that they are easier to track.

Stereo vision

Computer stereo vision is the extraction of 3D information from digital images, such as those obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examining the relative positions of objects in the two images.

In traditional stereo vision, two cameras, displaced horizontally from one another, are used to obtain two differing views on a scene, in a manner similar to human binocular vision. By comparing these two images, relative depth information can be obtained in the form of a disparity map, which encodes the difference in horizontal coordinates of corresponding image points. The disparity map is constructed by matching corresponding points in the stereo pair, and its values are inversely proportional to the scene depth at the corresponding pixel location: the closer an object is to the cameras, the larger its disparity.
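The inverse relationship between disparity and depth can be made explicit: for rectified cameras with focal length f (in pixels) and baseline B (the distance between the cameras), the depth at a pixel is Z = f · B / disparity. The focal length and baseline below are illustrative values.

```python
def disparity_to_depth(disparity_px, focal_px=700.0, baseline_m=0.12):
    """Depth (m) from stereo disparity (px): Z = f * B / d.

    focal_px and baseline_m are illustrative camera parameters; real values
    come from camera calibration.
    """
    return focal_px * baseline_m / disparity_px

print(round(disparity_to_depth(84.0), 3))  # → 1.0 (meters)
print(round(disparity_to_depth(42.0), 3))  # → 2.0
```

Halving the disparity doubles the depth, which also explains why stereo depth accuracy degrades with distance: far objects produce only fractions of a pixel of disparity.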

Applications of stereo vision: Kinect, high resolution 3D images


Lighthouse

The main idea behind Valve's Lighthouse tracking technology is pretty simple. By flooding a room with non-visible light, Lighthouse functions as a reference point for any positional tracking device (like a VR headset or a game controller) to figure out where it is in real 3D space.

A base station is the little box that forms the foundation of the Lighthouse tracking system. It uses alternating sweeps of horizontal and vertical lasers to pass over the HTC Vive headset and SteamVR controllers, which are covered in small sensors that detect the lasers as they go by. The system cleverly integrates all of this data to determine the rotation of the devices and their position in 3D space. High-speed on-board IMUs in each device are used to aid in tracking.
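The core measurement each sensor makes is a timing one: the laser rotor spins at a known rate, so the delay between the base station's sync flash and the laser sweeping past the sensor maps linearly to an angle. The sketch below assumes a nominal 60 Hz sweep rate; exact timings differ per base-station revision.

```python
SWEEP_HZ = 60.0  # nominal rotor speed: 60 full rotations per second (assumed)

def sweep_angle_deg(t_sync, t_hit):
    """Angle of the sensor as seen from the base station, from the delay
    between the sync flash (t_sync) and the laser hit (t_hit), in seconds."""
    return 360.0 * SWEEP_HZ * (t_hit - t_sync)

# A hit 1/240 s after the sync pulse corresponds to a quarter rotation.
print(round(sweep_angle_deg(0.0, 1 / 240), 1))  # → 90.0
```

One horizontal and one vertical sweep give two angles, i.e. a ray from the base station to the sensor; combining the rays to many sensors at known positions on the device yields its full position and orientation.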