FIMA - FCAI workshop: AI for Mobile Work Machines in November 2021

The FCAI & FIMA seminar brought together a lineup of some of the most significant representatives of the Finnish AI industry. The event was held at the UKK Institute on November 24th, 2021. The themes touched upon AI in perception, safety, control, and decision support.

KalmarOne aims at safety and eco-efficiency
The event was opened by Pekka Yli-Paunu from Kalmar Cargotec. Yli-Paunu presented ‘AI for Smart Ports’, a field where AI affects global logistics and port automation. AI applies powerful algorithms to data in order to replicate human work, and it already automates machines, vessels, terminal operating systems, and many other functions. KalmarOne provides solutions to increase cargo and traffic throughput, optimise working hours, avoid errors, and develop the supply chain. Kalmar optimises safety, vessel routes, turnaround times, and container dwell times, and creates forecasts and digital twins.

https://www.cargotec.com/en/kalmar/  

Kalmar One

Kalmar Ports

To simulate, or not to simulate?

Ville Kyrki (FCAI) spoke about using simulators to learn controllers for physical systems, leveraging recent advances in learning-based control. He first discussed the nature of reinforcement learning, which is a form of trial and error: the system is based on exploration and usually needs a lot of trials. Simulators are a good tool here, because real-world data is costly to collect, physical systems take time to operate, and repeated trials wear out the equipment. A controller trained in simulation can then be taken into action in the physical world: the simulator does the heavy training so that only lighter training is needed on the real machine.

However, there is a reality gap between simulation and the physical world, which can cause potentially dangerous behaviours to emerge. How can it be solved? There are two main approaches: domain randomisation and domain adaptation, which can partially bridge the gap. Domain randomisation increases the robustness of the learned models, also against distraction and calibration errors. Through domain adaptation the controller can be adapted with a minimal number of real-world trials, implicitly identifying unknown model parameters; no analytical model is needed, only the simulator.
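
As a rough illustration of the domain randomisation idea, the simulator's physical parameters can be re-sampled for every training episode so that the learned controller cannot overfit to any single set of them. The sketch below is a minimal, hypothetical Python example; the simulator interface, parameter names, and ranges are assumptions, not the setup used in the talk.

```python
import random

# Hypothetical simulator interface: the parameter names and ranges below are
# illustrative assumptions, not the ones used in the presented work.
def make_randomised_sim(base_sim):
    """Return a copy of the simulator with its physics parameters re-sampled."""
    sim = base_sim.copy()
    sim.friction = random.uniform(0.5, 1.5)         # ground friction coefficient
    sim.payload_mass = random.uniform(50.0, 500.0)  # kg, unknown on the real machine
    sim.sensor_offset = random.gauss(0.0, 0.02)     # calibration error in sensor readings
    sim.actuator_delay = random.uniform(0.0, 0.1)   # seconds of control latency
    return sim

def train(policy, base_sim, episodes=10_000):
    """Train a controller so it works across the whole range of randomised worlds."""
    for _ in range(episodes):
        sim = make_randomised_sim(base_sim)   # a slightly different world every episode
        rollout = sim.run_episode(policy)     # collect one trial in that world
        policy.update(rollout)                # any reinforcement learning update step
    return policy
```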

Which came first, data or hardware?
Simo Särkkä’s (Aalto University) topic was ‘Data-driven machine learning’. Data is not everything. Classical machine learning learns from examples: neural networks fit non-linear regressor parameters to the data, support vector machines find optimal separating planes in the data, and Gaussian processes predict outputs by mimicking the training data. Deep learning models are more complicated than this.
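
To make the contrast between these classical model families concrete, the same toy regression data can be fed to all three. The sketch below is only illustrative, using scikit-learn and synthetic data; it is not from Särkkä's presentation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor

# Synthetic example data: learn y = sin(x) from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

models = {
    "neural network": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000),
    "support vector machine": SVR(kernel="rbf"),
    "Gaussian process": GaussianProcessRegressor(),
}

for name, model in models.items():
    model.fit(X, y)                       # each model learns from the same examples
    print(name, model.predict([[1.0]]))   # and predicts by generalising the training data
```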

By building a transistor, generating data from it, and teaching an algorithm to mimic it, we can in a sense reinvent the transistor using machine learning. A better way, however, is to model the known physics and let the data identify the unknown parts: machine learning can be used to discover how to connect the components, and only then is the transistor built. Data is not all we know; humans also use a lot of reasoning in the process, and machine learning can be allowed to contribute to that reasoning.
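
The "model the known physics, learn the unknowns" idea can be illustrated with grey-box system identification. The sketch below fits a single unknown parameter of a known physical model to measured data; the RC-circuit step response and all the numbers are assumptions for illustration, not the transistor case from the talk.

```python
import numpy as np
from scipy.optimize import curve_fit

# Known physics: the step response of an RC circuit, V(t) = V_in * (1 - exp(-t / RC)).
# The model structure comes from physics; only the unknown time constant RC is learned.
def step_response(t, rc):
    return 5.0 * (1.0 - np.exp(-t / rc))    # assume a known 5 V input step

t = np.linspace(0.0, 1.0, 100)
measured = step_response(t, rc=0.2) + 0.05 * np.random.randn(100)   # noisy measurements

(rc_estimate,), _ = curve_fit(step_response, t, measured, p0=[0.1])
print(f"identified time constant RC ≈ {rc_estimate:.3f} s")
```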

Endless miracles generated on the Socratean wisdom of ignorance principle

Arno Solin of Aalto University started his presentation by showing an array of fantastic examples of various AI generators, such as StyleGAN, a style-based generator architecture by Karras et al., a Japanese-style ukiyo-e art generator, and the DALL·E transformer, which turns text prompts into images.

However, Solin sees risks and dangers in this kind of technology. These models are data-hungry and require a lot of computing power. They are hard or nearly impossible to interpret, and they may report interpretations at 99.6% confidence while still missing important details. Questionable robustness and trustworthiness can lead to poor results. Learning should follow probabilistic principles, but neural network models have biases by design that are hard to pinpoint, and these problems remain poorly understood. Uncertain, limited data is a weak basis for generalisation.

How could we encode meaningful prior knowledge into neural networks? We should try to account for distribution shift, make use of existing theory, and generalise what has been learned. The technology should be able to find out what it doesn’t know. Solin’s own papers have been presented at the NeurIPS conference in 2020 and again in 2021.

Solin has worked on cases such as Spectacularai.com and the RealAnt quadruped. Here are some links to the AI cases Solin referred to:

2021: Meronen, L., Trapp, M., Solin, A.: Periodic Activation Functions Induce Stationarity

2021: HybVIO: Pushing the Limits of Real-time Visual-inertial Odometry

2021: DALL·E

2020: Electric Dreams of Ukiyo: A Series of Japanese Artworks Created by an Artificial Intelligence

2020: Boney et al.: RealAnt: An Open-Source Low-Cost Quadruped for Research in Real-World Reinforcement Learning

2018: PIVO: Probabilistic Inertial-visual Odometry for Occlusion-robust Navigation

Buzz is the word

Petri Aaltonen of Epec focused the audience’s attention on a fly. Every seemingly simple fly is worth a moment’s consideration: it has a brain with 100,000 neurons, weighs a milligram, consumes a milliwatt of power, and can see, fly, navigate, feed, and reproduce. That is a long list of admirably efficient attributes and qualities. If only that could be scaled up.

Meanwhile we humans get stuck in traffic jams. Waiting in bottlenecks costs commuters and society enormous amounts of extra money. To make use of this downtime, we should be able to work during our commutes. Epec attempts to solve modern problems by developing control units and engineering services that are applicable to mining, forestry, automotive, and many other industries.

Automated driving in real traffic, its societal impact and perceived dangers 
Sami Koskinen of VTT focused on autonomous driving. In automated driving, deep neural networks (DNNs) are typically used for environmental perception. In the method Koskinen presented first, the autonomous vehicle recognises lanes and obstacles as bounding boxes.

GPU-like parallel processing units are still new. Koskinen mentioned the startup Comma.ai as an example of advanced neural networks that enable automated driving. VTT's research vehicles use lidar and a YOLOv4 detection overlay on the front camera. For semantic segmentation they use BiSeNet v2, a lightweight algorithm that looks promising, if somewhat hazy. Traffic light detection and lane detection can additionally be added to the setup.
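
As an illustration of what a detection overlay on a front-camera frame involves, the sketch below runs a generic pretrained torchvision detector and draws its boxes on the image. It is a stand-in example only; VTT's actual pipeline uses YOLOv4 and BiSeNet v2, and the file names here are hypothetical.

```python
import cv2
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Generic pretrained detector used here as a stand-in for YOLOv4.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

frame = cv2.imread("front_camera_frame.jpg")          # hypothetical camera frame
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

with torch.no_grad():
    detections = model([to_tensor(rgb)])[0]           # boxes, labels, and scores

# Draw the confident detections on top of the camera image.
for box, score in zip(detections["boxes"], detections["scores"]):
    if score > 0.6:
        x1, y1, x2, y2 = map(int, box.tolist())
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imwrite("front_camera_overlay.jpg", frame)
```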

Data fusion deals with the question of how to combine these tools. Lidars suffer in temperatures below −10 degrees. Could a zoom lens bring more range? It is difficult to combine with lidar.

What is difficult for AI? As DNNs imitate humans, problems remain. It is enough to recognise the animal with 95% accuracy. The vehicle will stop at a bump until either a human takes the wheel or the bump moves. Robotic safety is slow and follows large safety margins. How large should the safety margin be, then? The law requires special attention near children and intoxicated persons. Humans are still better at understanding weather and friction, while the machine goes through redundant processes.
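
One back-of-the-envelope way to frame the safety-margin question is to add up the distance travelled while the system perceives and reacts, the braking distance, and a buffer. The numbers in the sketch below are purely illustrative assumptions, not a regulatory recommendation.

```python
def required_safety_margin(speed_mps, latency_s, decel_mps2, buffer_m=2.0):
    """Reaction distance + braking distance + a fixed buffer, all in metres."""
    reaction_distance = speed_mps * latency_s          # travelled before braking starts
    braking_distance = speed_mps ** 2 / (2.0 * decel_mps2)
    return reaction_distance + braking_distance + buffer_m

# Example: 30 km/h ≈ 8.3 m/s, 0.5 s perception-and-planning latency, 3 m/s² braking.
print(f"required margin ≈ {required_safety_margin(8.3, 0.5, 3.0):.1f} m")
```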

Safety validation is very much needed; AI doesn’t solve everything, and the ethical questions raise concern. If humans killed 222 people in traffic in 2020, will it be okay for robots to kill only 150? Certain fatalities, such as those caused by suicide, recklessness, or driving under the influence, cannot be avoided with any technology. So what, eventually, is the ethical requirement or guideline for automated vehicle systems? Let’s keep in mind that automated trains work relatively well and have proved to be extremely safe.

Cost-efficient cranes equipped with cameras looking down
Laura Ruotsalainen of the University of Helsinki discussed an AI project case for industrial vision. Ruotsalainen’s FCAI project runs from 2020 to 2022 and is carried out with a donation from Konecranes in an automated indoor hall. The hall logistics are supervised and monitored with cameras attached to the crane.

Because lidars are too expensive and sustainability requires low power consumption, the automation project settles for monocular cameras with only one lens. The main challenge with this type of camera is that it provides no perception of depth. Deep learning is being developed to learn depth from visual cues such as the horizon, vanishing points, and vertical location in the image.

In deep-learning-based SLAM and structure from motion (SfM), changes between images can be traced back to changes in camera location. For example, an inverse rotation generates the source view, and the target imagery is run through a DepthNet algorithm.
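
The geometric core of structure from motion can be sketched with a simple special case: if the camera translates sideways between two views, the depth of a point follows from how far its projection moves in the image. The focal length and numbers below are assumed for illustration; the project itself relies on a learned DepthNet rather than this closed-form case.

```python
def depth_from_translation(u_source, u_target, baseline_m, focal_px):
    """Depth of a point seen in two views related by a sideways camera translation.

    u_source, u_target: horizontal pixel coordinates of the same point in the two images.
    baseline_m: how far the camera moved between the views, in metres.
    focal_px: focal length in pixels, assumed known from calibration.
    """
    disparity = u_source - u_target            # image motion caused by the camera motion
    return focal_px * baseline_m / disparity   # larger image motion means a closer point

# Hypothetical numbers: the crane camera moves 0.5 m and the point shifts by 20 pixels.
print(f"depth ≈ {depth_from_translation(640.0, 620.0, 0.5, 1000.0):.1f} m")
```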

Alongside the project, Ruotsalainen has supervised two theses: M. Leinonen, Monocular 3D Object Detection and Tracking in Industrial Settings (2021), and N. Joswig, Evaluation of Deep Learning-based SLAM in Industry Vision.

Ruotsalainen has contributed to Joswig et al. (2021), ‘Improved deep-depth estimation for environments with sparse visual cues’, a paper on improving computer-vision-based monocular depth estimation. It will be published next year.

PIVO – Probabilistic Inertial Visual Odometry
Juho Kannala’s presentation was based on a paper about PIVO – Probabilistic Inertial-visual Odometry. What does this mean? Visual-inertial odometry combines visual and inertial data for odometry: the use of motion sensors to determine the robot’s change in position relative to a known position. This happens by tracking motion in real time and localising the device with a model. It is needed, for example, to enable augmented reality and autonomous vehicles.

How much data is needed and where can we get it? A smartphone provides gyroscope, camera, and accelerometer data; GPS, Wi-Fi, microphone, and other sensors may also be available and useful. Sensor fusion in smartphones allows us to infer the true metric scale, because acceleration and angular velocity can be observed directly on the phone.
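
A stripped-down illustration of why inertial data pins down the metric scale is simple dead reckoning: integrating accelerometer readings twice gives displacement in metres, which a camera alone cannot provide. The sketch below uses synthetic one-axis samples and ignores gravity and the gyroscope; real visual-inertial odometry such as PIVO fuses this probabilistically with camera tracking instead.

```python
import numpy as np

def dead_reckon(accel_samples, dt):
    """Integrate one-axis acceleration (m/s^2) twice to get displacement in metres.

    On its own this drifts quickly, which is why visual-inertial odometry
    fuses inertial integration with camera-based tracking.
    """
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += a * dt
        position += velocity * dt
    return position

# Synthetic 1-second burst at 100 Hz: accelerate at 1 m/s² for 0.5 s, then coast.
samples = np.concatenate([np.full(50, 1.0), np.zeros(50)])
print(f"displacement ≈ {dead_reckon(samples, dt=0.01):.2f} m")
```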

Mobile phones have a small field of view, but they enable multiple applications and use cases, such as the following examples:

SenseTime and Chengdu IFS co-launched AR navigation service powered by the SenseMARS AR platform

Visual Inertial Simultaneous Localization and Mapping (VISLAM) Introduction

CES 2021 – LiDAR Technology with Intel RealSense

Open-source SLAM with Intel RealSense depth cameras

Intel RealSense D435i Depth Camera – Mapping & Localization Demo

Source: live presentations at AI For Mobile Work Machines