## Publications

(for the most up to date list, see my CV; also, see my Google Scholar page)

It has Potential: Gradient-Driven Denoisers for Convergent Solutions to Inverse Problems

R. Cohen, Y. Blau, D. Freedman, and E. Rivlin

Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2021

In recent years there has been increasing interest in leveraging denoisers for solving general inverse problems. Two leading frameworks are regularization-by-denoising (RED) and plug-and-play priors (PnP) which incorporate explicit likelihood functions with priors induced by denoising algorithms. RED and PnP have shown state-of-the-art performance in diverse imaging tasks when powerful denoisers are used, such as convolutional neural networks (CNNs). However, the study of their convergence remains an active line of research. Recent works derive the convergence of RED and PnP methods by treating CNN denoisers as approximations for maximum a posteriori (MAP) or minimum mean square error (MMSE)estimators. Yet, state-of-the-art denoisers cannot be interpreted as either MAP or MMSE estimators, since they typically do not exhibit symmetric Jacobians. Furthermore, obtaining stable inverse algorithms often requires controlling the Lipschitz constant of CNN denoisers during training. Precisely enforcing this constraint is impractical, hence, convergence cannot be completely guaranteed.In this work, we introduce image denoisers derived as the gradients of smooth scalar-valued deep neural networks, acting as potentials. This ensures two things:(1) the proposed denoisers display symmetric Jacobians, allowing for MAP andMMSE estimators interpretation; (2) the denoisers may be integrated into RED and PnP schemes with backtracking step size, removing the need for enforcing their Lipschitz constant. To show the latter, we develop a simple inversion method that utilizes the proposed denoisers. We theoretically establish its convergence to stationary points of an underlying objective function consisting of the learned potentials. We numerically validate our method through various imaging experiments, showing improved results compared to standard RED and PnP methods, and with additional provable stability.

Unsupervised 3D Shape Coverage Estimation with Applications to Colonoscopy

Y. Blau, D. Freedman, V. Dashinsky, R. Goldenberg, and E. Rivlin

Proceedings of the Workshop on Computer Vision for Automated Medical Diagnosis in conjunction with ICCV, 2021

Reconstructing shapes from partial and noisy 3D data is a well-studied problem, which in recent years has been

dominated by data-driven techniques. Yet in a low data regime, these techniques struggle to provide fine and accurate reconstructions. Here we focus on the relaxed problem of estimating shape coverage, i.e. asking “how much of the shape was seen?” rather than “what was the original shape?” We propose a method for unsupervised shape coverage estimation, and validate that this task can be performed accurately in a low data regime. Shape coverage estimation can provide valuable insights which pave the way for innovative applications, as we demonstrate for the case of deficient coverage detection in colonoscopy screenings.

Detection of Elusive Polyps via a Large-Scale Artificial Intelligence System.

D.M. Livovsky, D. Veikherman, T. Golany, A. Aides, V. Dashinski, N. Rabani, D. Ben Shimol, Y. Blau, L. Katzir, I. Shimshoni, Y. Liu, O. Segol, E. Goldin, G. Corrado, J. Lachter, Y. Matias, E. Rivlin, and D. Freedman

Gastrointestinal Endoscopy, 2021

Background and Aims: Colorectal cancer is a leading cause of death. Colonoscopy is the criterion standard for detection and removal of precancerous lesions and has been shown to reduce mortality. The polyp miss rate during colonoscopies is 22% to 28%. DEEP DEtection of Elusive Polyps (DEEP2) is a new polyp detection system based on deep learning, which alerts the operator in real-time to the presence and location of polyps. The primary outcome was the performance of DEEP2 on the detection of elusive polyps.

Methods: The DEEP2 system was trained on 3,611 hours of colonoscopy videos derived from 2 sources, and was validated on a set comprising 1,393 hours, from a third unrelated source. The ground truth labeling was provided by offline gastroenterologist annotators, who were able to watch the video in slow-motion and pause/rewind as required. To assess the applicability, stability, user experience and in order to obtain some preliminary data on performance in a real-life scenario, a preliminary prospective clinical validation study was performed, comprising 100 procedures (clinicaltrial.gov ID: NCT04693078).

Results: DEEP2 achieved a sensitivity of 97.1% at 4.6 false alarms per video for all polyps, 88.5% and 84.9% for polyps that are in the field of view for less than 5 and 2 seconds, respectively. DEEP2 was able to detect polyps, not seen by live real-time endoscopists or offline annotators in an average of 0.22 polyps per sequence. In the clinical validation study the system detected an average of 0.89 additional polyps per procedure. No adverse events occurred.

Conclusions: DEEP2 has a high sensitivity for polyp detection and was effective in increasing the detection of polyps both in colonoscopy videos and in real procedures with a low number of false alarms.

12-Lead ECG Reconstruction via Koopman

Operators

T. Golany, D. Freedman, S. Minha, and K. Radinsky

Proceedings of the International Conference on Machine Learning (ICML), 2021

32% of all global deaths in the world are caused by cardiovascular diseases. Early detection, especially for patients with ischemia or cardiac arrhythmia, is crucial. To reduce the time between symptoms onset and treatment, wearable ECG sensors were developed to allow for the recording of the full 12-lead ECG signal at home. However, if even a single lead is not correctly positioned on the body that lead becomes corrupted, making automatic diagnosis on the basis of the full signal impossible. In this work, we present a methodology to reconstruct missing or noisy leads using the theory of Koopman Operators. Given a dataset consisting of full 12-lead ECGs, we learn a dynamical system describing the evolution of the 12 individual signals together in time. The Koopman theory indicates that there exists a high-dimensional embedding space in which the operator which propagates from one time instant to the next is linear. We therefore learn both the mapping to this embedding space, as well as the corresponding linear operator. Armed with this representation, we are able to impute missing leads by solving a least squares system in the embedding space, which can be achieved efficiently due to the sparse structure of the system. We perform an empirical evaluation using 12-lead ECG signals from thousands of patients, and show that we are able to reconstruct the signals in such way that enables accurate clinical diagnosis.

Learning Optimal Wavefront Shaping for Multi-Channel Imaging

E. Nehme, B. Ferdman, L. Weiss, T. Naor, D. Freedman, T. Michaeli, and Y. Shechtman

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2021

Fast acquisition of depth information is crucial for accurate 3D tracking of moving objects. Snapshot depth sensing can be achieved by wavefront coding, in which the point-spread function (PSF) is engineered to vary distinctively with scene depth by altering the detection optics. In low-light applications, such as 3D localization microscopy, the prevailing approach is to condense signal photons into a single imaging channel with phase-only wavefront modulation to achieve a high pixel-wise signal to noise ratio. Here we show that this paradigm is generally suboptimal and can be significantly improved upon by employing multi-channel wavefront coding, even in low-light applications. We demonstrate our multi-channel optimization scheme on 3D localization microscopy in densely labelled live cells where detectability is limited by overlap of modulated PSFs. At extreme densities, we show that a split-signal system, with end-to-end learned phase masks, doubles the detection rate and reaches improved precision compared to the current state-of-the-art, single-channel design. We implement our method using a bifurcated optical system, experimentally validating our approach by snapshot volumetric imaging and 3D tracking of fluorescently labelled subcellular elements in dense environments.

Inverse Design of Quantum Holograms in Three-Dimensional Nonlinear Photonic Crystals

E. Rozenberg, A. Karnieli, O. Yesharim, S. Trajtenberg-Mills, D. Freedman, A. Bronstein, and A. Arie

Proceedings of the Conference on Lasers and Electro-Optics (CLEO) - Fundamental Science, 2021

We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs.

Learning to Localize Objects Using Limited Annotation, with Applications to Thoracic Diseases

E. Rozenberg, D. Freedman, and A. Bronstein

IEEE Access, 2021

Motivation:The localization of objects in images is a longstanding objective within the field of image processing. Most current techniques are based on machine learning approaches, which typically require careful annotation of training samples in the form of expensive bounding box labels. The need for such large-scale annotation has only been exacerbated by the widespread adoption of deep learning techniques within the image processing community: deep learning is notoriously data-hungry. Method:In this work, we attack this problem directly by providing a new method for learning to localize objects with limited annotation: most training images can simply be annotated with their whole image labels (and no bounding box), with only a small fraction marked with bounding boxes. The training is driven by a novel loss function, which is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning. Care is taken to ensure that the loss is numerically well-posed. Additionally, we propose a neural network architecture which accounts for both patch dependence, through the use of Conditional RandomField layers, and shift-invariance, through the inclusion of anti-aliasing filters.Results:We demonstrate our method on the task of localizing thoracic diseases in chest X-ray images, achieving state-of-the-art performance on the ChestX-ray14 dataset. We further show that with a modicum of additional effort our technique can be extended from object localization to object detection, attaining high quality results on the Kaggle RSNA Pneumonia Detection Challenge.Conclusion:The technique presented in this paper has the potential to enable high accuracy localization in regimes in which annotated data is either scarce or expensive to acquire. Future work will focus on applying the ideas presented in this paper to the realm of semantic segmentation.

Deep Energy: Task Driven Training of Deep Neural Networks

A. Golts, D. Freedman, and M. Elad

IEEE Journal of Selected Topics in Signal Processing, 2021

The success of deep learning has been due in no small part to the availability of large annotated datasets. Thus, a major bottleneck in the current learning pipeline is the human annotation of data, which can be quite time consuming. For a given problem setting, we aim to circumvent this issue via the use of an externally specified energy function appropriate for that setting; we call this the Deep Energy approach. We show how to train a network on an entirely unlabelled dataset using such an energy function, and apply this general technique to learn CNNs for two specific tasks: seeded segmentation and image matting. Once the network parameters have been learned, we obtain a high-quality solution in a fast feed-forward style, without the need to repeatedly optimize the energy function for each image.

ECG ODE-GAN: Learning Ordinary Differential

Equations of ECG Dynamics via Generative Adversarial Learning

T. Golany, D. Freedman, and K. Radinsky

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021

Understanding the dynamics of complex biological and physiological systems has been explored for many years in theform of physically-based mathematical simulators. The behavior of a physical system is often described via ordinarydifferential equations (ODE), referred to as the dynamics. Inthe standard case, the dynamics are derived from purely physical considerations. By contrast, in this work we study howthe dynamics can be learned by a generative adversarial network which combines both physical and data considerations.As a use case, we focus on the dynamics of the heart signal electrocardiogram (ECG). We begin by introducing a newGAN framework, dubbed ODE-GAN, in which the generatorlearns the dynamics of a physical system in the form of anordinary differential equation. Specifically, the generator network receives as input a value at a specific time step, andproduces the derivative of the system at that time step. Thus,the ODE-GAN learns purely data-driven dynamics. We thenshow how to incorporate physical considerations into ODE-GAN. We achieve this through the introduction of an additional input to the ODE-GAN generator: physical parameters,which partially characterize the signal of interest. As we focus on ECG signals, we refer to this new framework as ECG-ODE-GAN. We perform an empirical evaluation and showthat generating ECG heartbeats from our learned dynamicsimproves ECG heartbeat classification.

3D Microscopy in Extreme Conditions: In-Flow or at High Density

Y. Shechtman, E. Nehme, B. Ferdman, O. Adir, R. Gordon-Soffer, R. Orange, T. Naor, Y. Shalev-Ezra, S. Goldberg, O. Alalouf, D. Freedman, A. Schroeder, T. Michaeli, and L. Weiss

SPIE Photonics West BiOS, 2021

Attaining three-dimensional data at high throughput is a grand challenge in microscopy. I will discuss two recent contributions to 3D microscopy utilizing point-spread-function (PSF) engineering: 1.a new method that extends the capabilities of flow-based imaging to 3D co-localization microscopy, and 2. efficient and fast localization of dense molecules in volumetric single-molecule-localization microscopy (SMLM) using deep learning, termed DeepSTORM3D. Together, these methods enable the study of sub-cellular biological processes at unprecedented timescales and throughputs.

High-Frequency Full-waveform Inversion with Deep Learning for Seismic and Medical Ultrasound Imaging

M. Feigin, Y. Makovsky, D. Freedman, and B.W. Anthony

Society of Exploration Geophysicists Annual Meeting (SEG), 2020

Sound speed inversion is a key problem in seismic imaging for the purpose of high-resolution imaging and ground structure analysis. In the medical ultrasound imaging domain, longitudinal sound speed recovery, as opposed to shear wave velocity methods, has seen little use, mostly due to the resources required. However both research and new hardware are showing interesting applications and implications (Nebojsa Duric and Littrup 2018) Classic full waveform inversion (FWI) and Travel time tomography methods require large time and computational resources, and often, human in the loop interaction. This makes them inapplicable for most medical imaging applications. FWI also generally requires a good initial condition and low-frequency content in the signal for stable convergence. In this work we analyze the applicability of a deep-learning based approach for high frequency FWI. We present results on single-shot recovery using simulated data employing characteristic parameters for both medical ultrasound and seismic datasets. Results show great potential for interactive frame rate approximate solution.

Simulator-Based Generative Adversarial Networks for ECG Synthesis to Improve Deep ECG Classification

T. Golany, D. Freedman, and K. Radinsky

Proceedings of the International Conference on Machine Learning (ICML), 2020

Generating training examples for supervised tasks is a long sought after goal in AI. We study the problem of heart signal electrocardiogram (ECG) synthesis for improved heartbeat classification. ECG synthesis is challenging: the generation of training examples for such biological-physiological systems is not straightforward, due to their dynamic nature in which the various parts of the system interact in complex ways. However, an understanding of these dynamics has been developed for years in the form of mathematical process simulators. We study how to incorporate this knowledge into the generative process by leveraging a biological simulator for the task of ECG classification. Specifically, we use a system of ordinary differential equations representing heart dynamics, and incorporate this ODE system into the optimization process of a generative adversarial network to create biologically plausible ECG training examples. We perform empirical evaluation and show that heart simulation knowledge during the generation process improves ECG classification.

DeepSTORM3D: Dense Three Dimensional Localization Microscopy and Point Spread Function Design by Deep Learning

E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. Weiss, O. Alalouf, R. Orange, T. Michaeli, and Y. Shechtman

Nature Methods, 2020

An outstanding challenge in single-molecule localization microscopy is the accurate and precise localization of individual point emitters in three dimensions in densely labeled samples. One established approach for three-dimensional single-molecule localization is point-spread-function (PSF) engineering, in which the PSF is engineered to vary distinctively with emitter depth using additional optical elements. However, images of dense emitters, which are desirable for improving temporal resolution, pose a challenge for algorithmic localization of engineered PSFs, due to lateral overlap of the emitter PSFs. Here we train a neural network to localize multiple emitters with densely overlapping Tetrapod PSFs over a large axial range. We then use the network to design the optimal PSF for the multi-emitter case. We demonstrate our approach experimentally with super-resolution reconstructions of mitochondria and volumetric imaging of fluorescently labeled telomeres in cells. Our approach, DeepSTORM3D, enables the study of biological processes in whole cells at timescales that are rarely explored in localization microscopy.

Detecting Deficient Coverage in Colonoscopies

D. Freedman, Y. Blau, L. Katzir, A. Aides, I. Shimshoni, D. Veikherman, T. Golany, A. Gordon, G. Corrado, Y. Matias, and E. Rivlin

IEEE Transactions on Medical Imaging, 2020

Colonoscopy is the tool of choice for preventing Colorectal Cancer, by detecting and removing polyps before they become cancerous. However, colonoscopy is hampered by the fact that endoscopists routinely miss 22-28% of polyps. While some of these missed polyps appear in the endoscopist's field of view, others are missed simply because of substandard coverage of the procedure, i.e. not all of the colon is seen. This paper attempts to rectify the problem of substandard coverage in colonoscopy through the introduction of the C2D2 (Colonoscopy Coverage Deficiency via Depth) algorithm which detects deficient coverage, and can thereby alert the endoscopist to revisit a given area. More specifically, C2D2 consists of two separate algorithms: the first performs depth estimation of the colon given an ordinary RGB video stream; while the second computes coverage given these depth estimates. Rather than compute coverage for the entire colon, our algorithm computes coverage locally, on a segment-by-segment basis; C2D2 can then indicate in real-time whether a particular area of the colon has suffered from deficient coverage, and if so the endoscopist can return to that area. Our coverage algorithm is the first such algorithm to be evaluated in a large-scale way; while our depth estimation technique is the first calibration-free unsupervised method applied to colonoscopies. The C2D2 algorithm achieves state of the art results in the detection of deficient coverage. On synthetic sequences with ground truth, it is 2.4 times more accurate than human experts; while on real sequences, C2D2 achieves a 93.0% agreement with experts.

Detecting Muscle Activation Using

Ultrasound Speed of Sound Inversion with Deep Learning

M. Feigin, M. Zwecker, D. Freedman, and B. Anthony

Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2020

Functional muscle imaging is essential for diagnostics of a multitude of musculoskeletal afflictions such as degenerative muscle diseases, muscle injuries, muscle atrophy, and neurological related issues such as spasticity. However, there is currently no solution, imaging or otherwise, capable of providing a map of active muscles over a large field of view in dynamic scenarios. In this work, we look at the feasibility of longitudinal sound speed measurements to the task of dynamic muscle imaging of contraction or activation. We perform the assessment using a deep learning network applied to pre-beamformed ultrasound channel data for sound speed inversion. Preliminary results show that dynamic muscle contraction can be detected in the calf and that this contraction can be positively assigned to the operating muscles. Potential frame rates in the hundreds to thousands of frames per second are necessary to accomplish this.

Unsupervised Single Image Dehazing Using Dark Channel Prior Loss

A. Golts, D. Freedman, and M. Elad

IEEE Transactions on Image Processing, 2020

Single image dehazing is a critical stage in many modern-day autonomous vision applications. Early prior-based methods often involved a time-consuming minimization of a hand-crafted energy function. Recent learning-based approaches utilize the representational power of deep neural networks (DNNs) to learn the underlying transformation between hazy and clear images. Due to inherent limitations in collecting matching clear and hazy images, these methods resort to training on synthetic data; constructed from indoor images and corresponding depth information. This may result in a possible domain shift when treating outdoor scenes. We propose a completely unsupervised method of training via minimization of the well-known, Dark Channel Prior (DCP) energy function. Instead of feeding the network with synthetic data, we solely use real-world outdoor images and tune the network’s parameters by directly minimizing the DCP. Although our “Deep DCP” technique can be regarded as a fast approximator of DCP, it actually improves its results significantly. This suggests an additional regularization obtained via the network and learning process. Experiments show that our method performs on par with other large-scale, supervised methods.

Localization With Limited Annotation for Chest X-rays

E. Rozenberg, D. Freedman, and A. Bronstein

Proceedings of Machine Learning Research, 2019

Localization of an object within an image is a common task in medical imaging. Learning to localize or detect objects typically requires the collection of data that has been labelled with bounding boxes or similar annotations, which can be very time consuming and expensive. A technique that could perform such learning with much less annotation would, therefore, be quite valuable. We present such a technique for localization with limited annotation, in which the number of images with bounding boxes can be a small fraction of the total dataset (e.g. less than 1\%); all other images only possess a whole image label and no bounding box. We propose a novel loss function for tackling this problem; the loss is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning and is numerically well-posed. Furthermore, we propose a new architecture which accounts for both patch dependence and shift-invariance, through the inclusion of CRF layers and anti-aliasing filters, respectively. We apply our technique to the localization of thoracic diseases in chest X-ray images and demonstrate state-of-the-art localization performance on the ChestX-ray14 dataset introduced in Wang et al. (2017).

Deep Learning for Dense Single-Molecule Localization Microscopy

E. Nehme, L. Weiss, E. Hershko, D. Freedman, R. Gordon, B. Ferdman, T. Michaeli, and Y. Shechtman

Learning for Computational Imaging (LCI) Workshop, in conjunction with ICCV, 2019

In conventional microscopy, the spatial resolution of an image is bounded by Abbe’s diffraction limit, corresponding to approximately half the optical wavelength. Over the last decade, super resolution methods have revolutionized biological imaging, enabling the observation of cellular structures at the nanoscale. These include the popular localization microscopy methods, like photo-activated localization microscopy ((F)PALM) [1, 2] and stochastic optical reconstruction microscopy (STORM) [3]. However, despite the great advancement, existing localization microscopy methods are still limited in their acquisition and post-processing speeds, and in their ability to extract 3D and multicolor properties of the imaged samples.

A Deep Learning Framework for Single-Sided Sound Speed Inversion in Medical Ultrasound

M. Feigin, D. Freedman, and B.W. Anthony

IEEE Transactions on Biomedical Engineering, 2019

Ultrasound elastography is gaining traction as an accessible and useful diagnostic tool for such things as cancer detection and differentiation as well as liver and thyroid disease diagnostics. Unfortunately, state of the art acoustic radiation force techniques are limited to high end ultrasound hardware due to high power requirements, are extremely sensitive to patient and sonographer motion and generally suffer from low frame rates. Researchers have shown that pressure wave velocity possesses similar diagnostic abilities to shear wave velocity. Using pressure waves removes the need for generating shear waves, in turn, enabling elasticity based diagnostic techniques on portable and low cost devices. However, current travel time tomography and full waveform inversion techniques for recovering pressure wave velocities, require a full circumferential field of view. Focus based techniques on the other hand provide only localized measurements, and are sensitive to the intermediate medium. In this paper, we present a single sided sound speed inversion solution using a fully convolutional deep neural network. We show that it is possible to invert for longitudinal sound speed in soft tissue at real time frame rates. For the computation, analysis is performed on channel data information from three diagonal plane waves. This is the first step towards a full waveform solver using a deep learning framework for the elastic and viscoelastic inverse problem.

SOSELETO: A Unified Approach to Transfer Learning and Training with Noisy Labels

O. Litany and D. Freedman

Learning from Limited Labeled Data (LLD) Workshop, in conjunction with ICLR, 2019

Best Paper Award

We present SOSELETO (SOurce SELEction for Target Optimization), a new method for exploiting a source dataset to solve a classification problem on a target dataset. SOSELETO is based on the following simple intuition: some source examples are more informative than others for the target problem. To capture this intuition, source samples are each given weights; these weights are solved for jointly with the source and target classification problems via a bilevel optimization scheme. The target therefore gets to choose the source samples which are most informative for its own classification task. Furthermore, the bilevel nature of the optimization acts as a kind of regularization on the target, mitigating overfitting. SOSELETO may be applied to both classic transfer learning, as well as the problem of training on datasets with noisy labels; we show state of the art results on both of these problems.

DeepSTORM 3D: Deep Learning for Dense 3D Localization Microscopy

E. Nehme, D. Freedman, T. Michaeli, Y. Shechtman

Quantitative BioImaging Conference (QBI), 2019

Best Poster Award

We present two fundamental contributions to tackle the problem of high-density overlapping PSFs for accurate 3D localization. First, we employ a convolutional neural network (CNN) for 3D localization from dense fields of overlapping emitters with the Tetrapod PSF. Second, we engineer the optimal PSF for high-density 3D localization microscopy. Namely, by incorporating a physical layer in the CNN with an adjustable phase modulation, we can jointly learn the network weights and the phase mask via backpropagation. Our approach is highly flexible and can be easily adapted to any single-molecule blinking/non-blinking dataset. We validated our approach on simulations showing superior performance to existing methods. Future work will demonstrate the method on experimentally acquired data.

Toward Realistic Hands Gesture Interface: Keeping It Simple for Developers and Machines

E. Krupka, K. Karmon, N. Bloom, D. Freedman, I. Gurvich, A. Hurvitz, I. Leichter, Y. Smolin, Y. Tzairi, A. Vinnikov, and A. Bar Hillel

Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI), 2017

Development of a rich hand-gesture-based interface is currently a tedious process, requiring expertise in computer vision and/or machine learning. We address this problem by introducing a simple language for pose and gesture description, a set of development tools for using it, and an algorithmic pipeline that recognizes it with high accuracy. The language is based on a small set of basic propositions, obtained by applying four predicate types to the fingers and to palm center: direction, relative location, finger touching and finger folding state. This enables easy development of a gesture-based interface, using coding constructs, gesture definition files or an editing GUI. The language is recognized from 3D camera input with an algorithmic pipeline composed of multiple classification/regression stages, trained on a large annotated dataset. Our experimental results indicate that the pipeline enables successful gesture recognition with a very low computational load, thus enabling a gesture-based interface on low-end processors.

ASIST: Automatic Semantically Invariant Scene Transformation

O. Litany, T. Remez, D. Freedman, L. Shapira, A. Bronstein, and R. Gal

Computer Vision and Image Understanding, 157:284-299, 2017

We present ASIST, a technique for transforming point clouds by replacing objects with their semantically equivalent counterparts. Transformations of this kind have applications in virtual reality, repair of fused scans, and robotics. ASIST is based on a unified formulation of semantic labeling and object replacement; both result from minimizing a single objective. We present numerical tools for the efficient solution of this optimization problem. The method is experimentally assessed on new datasets of both synthetic and real point clouds, and is additionally compared to two recent works on object replacement on data from the corresponding papers.

Reality Skins: Creating Immersive and Tactile Virtual Environments

L. Shapira and D. Freedman

Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2016

Reality Skins enables mobile and large-scale virtual reality experiences, dynamically generated based on the user’s environment. A head-mounted display (HMD) coupled with a depth camera is used to scan the user’s surroundings: reconstruct geometry, infer floor plans, and detect objects and obstacles. From these elements we generate a Reality Skin, a 3D environment which replaces office or apartment walls with the corridors of a spaceship or underground tunnels, replacing chairs and desks, sofas and beds with crates and computer consoles, fungi and crumbling ancient statues. The placement of walls, furniture and objects in the Reality Skin attempts to approximate reality, such that the user can move around, and touch virtual objects with tactile feedback from real objects. Each possible reality skins world consists of objects, materials and custom scripts. Taking cues from the user’s surroundings, we create a unique environment combining these building blocks, attempting to preserve the geometry and semantics of the real world. We tackle 3D environment generation as a constraint satisfaction problem, and break it into two parts: First, we use a Markov Chain Monte-Carlo optimization, over a simple 2D polygonal model, to infer the layout of the environment (the structure of the virtual world). Then, we populate the world with various objects and characters, attempting to satisfy geometric (virtual objects should align with objects in the environment), semantic (a virtual chair aligns with a real one), physical (avoid collisions, maintain stability) and other constraints. We find a discrete set of transformations for each object satisfying unary constraints, incorporate pairwise and higher-order constraints, and optimize globally using a very recent technique based on semidefinite relaxation.

Accurate, Robust, and Flexible Real-time Hand Tracking

T. Sharp, C. Keskin, D. Robertson, J. Taylor, J. Shotton, D. Kim, C. Rhemann, I. Leichter, A. Vinnikov, Y. Wei, D. Freedman, P. Kohli, E. Krupka, A. Fitzgibbon, and S. Izadi

Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI), 2015

Best Paper honorable mention

We present a new real-time hand tracking system based on a single depth camera. The system can accurately reconstruct complex hand poses across a variety of subjects. It also allows for robust tracking, rapidly recovering from any temporary failures. Most uniquely, our tracker is highly flexible, dramatically improving upon previous approaches which have focused on front-facing close-range scenarios. This flexibility opens up new possibilities for human-computer interaction with examples including tracking at distances from tens of centimeters through to several meters (for controlling the TV at a distance), supporting tracking using a moving depth camera (for mobile scenarios), and arbitrary camera placements (for VR headsets). These features are achieved through a new pipeline that combines a multi-layered discriminative reinitialization strategy for per-frame pose estimation, followed by a generative model-fitting stage. We provide extensive technical details and a detailed qualitative and quantitative analysis.

Learning Fast Hand Pose Recognition

E. Krupka, A. Vinnikov, B. Klein, A. Bar-Hillel, D. Freedman, S. Stachniak, and C. Keskin

Computer Vision and Machine Learning with RGB-D Sensors (editors L. Shao, J. Han, P. Kohli, and Z. Zhang), p 267–288. Springer, 2014

Practical real time hand pose recognition requires a classifier of high accuracy, running in a few milliseconds speed. We present a novel classifier architecture, the Discriminative Ferns Ensemble (DFE), for addressing this challenge. The classifier architecture optimizes both classification speed and accuracy when a large training set is available. Speed is obtained using simple binary features and direct indexing into a set of tables, and accuracy by using a large capacity model and careful discriminative optimization. The proposed framework is applied to the problem of hand pose recognition in depth and infra-red images, using a very large training set. Both the accuracy and the classification time obtained are considerably superior to relevant competing methods, allowing one to reach accuracy targets with run times orders of magnitude faster than the competition. We show empirically that using DFE, we can significantly reduce classification time by increasing training sample size for a fixed target accuracy. Finally, Scalability to a large number of classes is tested using a synthetically generated data set of 81 classes.

SRA: Fast Removal of General Multipath for Tof Sensors

D. Freedman, Y. Smolin, E. Krupka, I. Leichter, and M. Schmidt

Proceedings of the European Conference on Computer Vision (ECCV), 2014

A major issue with Time of Flight sensors is the presence of multipath interference. We present Sparse Reflections Analysis (SRA), an algorithm for removing this interference which has two main advantages. First, it allows for very general forms of multipath, including interference with three or more paths, diffuse multipath resulting from Lambertian surfaces, and combinations thereof. SRA removes this general multipath with robust techniques based on L1 optimization. Second, due to a novel dimension reduction, we are able to produce a very fast version of SRA, which is able to run at frame rate. Experimental results on both synthetic data with ground truth, as well as real images of challenging scenes, validate the approach.

Discriminative Ferns Ensemble for Hand Pose Recognition

E. Krupka, A. Vinnikov, B. Klein, A. Bar-Hillel, D. Freedman, and S. Stachniak

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014

We present the Discriminative Ferns Ensemble (DFE) classifier for efficient visual object recognition. The classifier architecture is designed to optimize both classification speed and accuracy when a large training set is available. Speed is obtained using simple binary features and direct indexing into a set of tables, and accuracy by using a large capacity model and careful discriminative optimization. The proposed framework is applied to the problem of hand pose recognition in depth and infra-red images, using a very large training set. Both the accuracy and the classification time obtained are considerably superior to relevant competing methods, allowing one to reach accuracy targets with run times orders of magnitude faster than the competition. We show empirically that using DFE, we can significantly reduce classification time by increasing training sample size for a fixed target accuracy. Finally a DFE result is shown for the MNIST dataset, showing the method’s merit extends beyond depth images.

DFlow and DField: New Features for Capturing Object and Image Relationships

P. Kisilev, D. Freedman, E. Walach, and A. Tzadok

Proceedings of the International Conference on Pattern Recognition (ICPR), 2012

In this paper we propose two new types of features useful for problems in which one wants to describe object or image relationships rather than objects or images themselves. The features are based on the notion of distribution flow, as derived from the classic Transportation Problem. Two variants of such features, the Distribution Flow (DFlow) and Displacement Field (DField), are defined and studied. The proposed features show promising results in two different applications, Inter- and Intra-Class Relationship Characterization, and improve on simple concatenation of corresponding pairs of histograms.

Efficient and Robust Image Descriptor for GUI Object Classification

A. Dubrovina, P. Kisilev, D. Freedman, S. Schein, and R. Bergman

Proceedings of the International Conference on Pattern Recognition (ICPR), 2012

Graphical User Interface (GUI) object classification is essential for image-based software automation tools. The challenges posed by GUI object classification are significantly different from those in natural image classification. In this paper we present a novel image descriptor developed specifically for GUI objects; it is robust to various changes in the appearance of GUI objects, such as various screen resolution, ”skin”, as well as various operating system related. We use this image descriptor with Support Vector Machine classifier, and experimentally show the descriptor robustness to the above transformations, and its superior performance compared to existing image descriptors.

An Improved Image Graph for Semi-Automatic Segmentation

D. Freedman

Signal, Image and Video Processing, 6(4):533-545, 2012

The problem of semi-automatic segmentation has attracted much interest over the last few years. The Random Walker algorithm [1] has proven to be quite a popular solution to this problem, as it is able to deal with several components, and models the image using a convenient graph structure. We propose two improvements to the image graph used by the Random Walker method. First, we propose a new way of computing the edge weights. Traditionally, such weights are based on the similarity between two neighbouring pixels, using their grayscale intensities or colours. We substitute a new definition of weights based on the probability distributions of colours. This definition is much more robust than traditional measures, as it allows for textured objects, and objects that are composed of multiple perceptual components. Second, the traditional graph has a vertex set which is the set of pixels, and edges between each pair of neighbouring pixels. We substitute a smaller, irregular graph based on Mean Shift oversegmentation. This new graph is typically several orders of magnitude smaller than the original image graph, which can lead to a major savings in computing time. We show results demonstrating the substantial improvement achieved when using the proposed image graph.

Enforcing Topological Constraints in Random Field Image Segmentation

C. Chen, D. Freedman, and C.H. Lampert

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011

The problem of semi-automatic segmentation has attracted much interest over the last few years. The Random Walker algorithm [1] has proven to be quite a popular solution to this problem, as it is able to deal with several components, and models the image using a convenient graph structure. We propose two improvements to the image graph used by the Random Walker method. First, we propose a new way of computing the edge weights. Traditionally, such weights are based on the similarity between two neighbouring pixels, using their grayscale intensities or colours. We substitute a new definition of weights based on the probability distributions of colours. This definition is much more robust than traditional measures, as it allows for textured objects, and objects that are composed of multiple perceptual components. Second, the traditional graph has a vertex set which is the set of pixels, and edges between each pair of neighbouring pixels. We substitute a smaller, irregular graph based on Mean Shift oversegmentation. This new graph is typically several orders of magnitude smaller than the original image graph, which can lead to a major savings in computing time. We show results demonstrating the substantial improvement achieved when using the proposed image graph.

Hardness Results for Optimal Homology Bases

C. Chen and D. Freedman

Discrete and Computational Geometry, 45(3):425–448, 2011

In this paper, we address the problem of localizing homology classes, namely, finding the cycle representing a given class with the most concise geometric measure. We study the problem with different measures: volume, diameter and radius. For volume, that is, the 1-norm of a cycle, we prove the problem is NP-hard to approximate within any constant factor. We also prove that for homology of 2-dimension or higher, the problem is NP-hard to approximate even when the Betti number is O(1). A side effect is the inapproximability proof of the problem of computing the nonbounding cycle with the smallest volume, and computing cycles representing a homology basis with the minimal total volume. As for the other two measures defined by pairwise geodesic distance, we show that the localization problem is NP-hard for diameter, but is polynomial for radius.

Algebraic Topology for Computer Vision

D. Freedman and C. Chen

Computer Vision (editor S.R. Yoshida), p 239–268. Nova Science, 2011

Algebraic topology is generally considered one of the purest subfields of mathematics. However, over the last decade two interesting new lines of research have emerged, one focusing on algorithms for algebraic topology, and the other on applications of algebraic topology in engineering and science. Amongst the new areas in which the techniques have been applied are computer vision and image processing. In this paper, we survey the results of these endeavours. Because algebraic topology is an area of mathematics with which most computer vision practitioners have no experience, we review the machinery behind the theories of homology and persistent homology; our review emphasizes intuitive explanations. In terms of applications to computer vision, we focus on four illustrative problems: shape signatures, natural image statistics, image denoising, and segmentation. Our hope is that this review will stimulate interest on the part of computer vision researchers to both use and extend the tools of this new field.

Topology Noise Removal for Curve and Surface Evolution

C. Chen and D. Freedman

International MICCAI Workshop on Medical Computer Vision, pages 31–42, 2010

In cortex surface segmentation, the extracted surface is required to have a particular topology, namely, a two-sphere. We present a new method for removing topology noise of a curve or surface within the level set framework, and thus produce a cortal surface with correct topology. We define a new energy term which quantifies topology noise. We then show how to minimize this term by computing its functional derivative with respect to the level set function. This method differs from existing methods in that it is inherently continuous and not digital; and in the way that our energy directly relates to the topology of the underlying curve or surface, versus existing knot-based measures which are related in a more indirect fashion. The proposed flow is validated empirically.

KDE Paring and a Faster Mean Shift Algorithm

D. Freedman and P. Kisilev

SIAM Journal on Imaging Sciences, 3(4):878–903, 2010

The kernel density estimate (KDE) is a nonparametric density estimate which has broad application in computer vision and pattern recognition. In particular, the mean shift procedure uses the KDE structure to cluster or segment data, including images and video. The usefulness of these twin techniques—KDE and mean shift—on large data sets is hampered by the large space or description complexity of the KDE, which in turn leads to a large time complexity of the mean shift procedure that is superlinear in the number of points. In this paper, we propose a sampling technique for KDE paring, i.e., the construction of a compactly represented KDE with much smaller description complexity. We prove that this technique has good properties in that the pared-down KDE so constructed is close to the original KDE in a precise mathematical sense. We then show how to use this pared-down KDE to devise a considerably faster mean shift algorithm, whose time complexity we analyze formally. Experiments show that image and video segmentation results of the proposed fast mean shift method are similar to those based on the standard mean shift procedure, with the typical speed-up several orders of magnitude for large data sets. Finally, we present an application of the fast mean shift method to the efficient construction of multiscale graph structures for images, which can be used as a preprocessing step for more sophisticated segmentation algorithms.

Computing Color Transforms With Applications to Image Editing

D. Freedman and P. Kisilev

Journal of Mathematical Imaging and Vision, 37(3):220–231, 2010

In this paper, we present a unified approach for the problem of computing color transforms, applications of which include shadow removal and object recoloring. We propose two algorithms for transforming colors. In the first algorithm, the detection of source and target regions is performed using a Bayesian classifier. Given these regions, the computed transform alters the color properties of the target region so as to closely resemble those of the source region. The proposed probabilistic formulation leads to a linear program (similar to the classic Transportation Problem), which computes the desired transformation between the target and source distributions. In the second algorithm, the detection and transformation steps are united into a single unified approach; furthermore, the continuity of the transformation arises more intrinsically within this algorithm. Both formulations allow the target region to acquire the properties of the source region, while at the same time retaining its own look and feel. Promising results are shown for a variety of applications.

Measuring and Computing Natural Generators for Homology Groups

C. Chen and D. Freedman

Computational Geometry: Theory and Applications, 43(2):169–181, 2010

We develop a method for measuring homology classes. This involves two problems. First, we define the size of a homology class, using ideas from relative homology. Second, we define an optimal basis of a homology group to be the basis whose elements’ size have the minimal sum. We provide a greedy algorithm to compute the optimal basis and measure classes in it. The algorithm runs in O(βn^3 log^2 n) time, where n is the size of the simplicial complex and β is the Betti number of the homology group. Finally, we prove the stability of our result. The algorithm can be adapted to measure any given class.

Graph Cuts With Many-pixel Interactions: Theory and Applications to Shape Modelling

D. Freedman and M.W. Turek

Image and Vision Computing, 28(3):467–473, 2010

Many problems in computer vision can be posed in terms of energy minimization, where the relevant energy function models the interactions of many pixels. Finding the global or near-global minimum of such functions tends to be difficult, precisely due to these interactions of large ð> 3Þ numbers of pixels. In this paper, we derive a set of sufficient conditions under which energies which are functions of discrete binary variables may be minimized using graph cut techniques. We apply these conditions to the problem of incorporating shape priors in segmentation. Experimental results demonstrate the validity of this approach.

Parameter Tuning by Pairwise Preferences

P. Kisilev and D. Freedman

Proceedings of the British Machine Vision Conference (BMVC), 2010

That most computer vision algorithms rely on parameters is a fact of life which cannot be avoided. For optimal algorithm performance, these parameters need to be tuned; generally speaking, this tuning is done manually or in some heuristic fashion. In this paper, we propose a new, general method for attacking the problem of parameter tuning, which is applicable to a wide variety of computer vision algorithms. Our method is semi-automatic: a user is given several pairs of outputs from a given vision algorithm, which have been generated by different parameter values; the user is then required to simply choose, for each pair, which output is preferred. Our method then finds the smoothest preference function which satisfies these user preferences. Using the theory of Reproducing Kernel Hilbert Spaces, we show how this problem can be reduced to a finite-dimensional convex optimization. We validate our parameter tuning scheme both on simulated data and on the problem of tuning the parameters of an image denoising algorithm.

Content-Aware Image Resizing by Quadratic Programming

R. Chen, D. Freedman, Z. Karni, C. Gotsman, and L. Liu

Proceedings of the Workshop on Non-Rigid Shape Analysis and Deformable Image Alignment (NORDIA) (in conjunction with CVPR), 2010

Best Paper

We present a new method for content-aware image resizing based on a framework of global optimization. We show that the basic resizing problem can be formulated as a convex quadratic program. Furthermore, we demonstrate how the basic framework may be extended to prevent foldovers of the underlying mesh; encourage the magnification of salient regions; and preserve straight line structures. We show results demonstrating the effectiveness of the proposed method by comparing with four leading competitor methods.

Object-to-Object Color Transfer: Optimal Flows and SMSP Transformations

D. Freedman and P. Kisilev

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010

Given a source object and a target object, we consider the problem of transferring the “color scheme” of the source to the target, while at the same time maintaining the target’s original look and feel. This is a challenging problem due to the fact that the source and target may each consist of multiple colors, each of which comes in multiple shades. We propose a two stage solution to this problem. (1) A discrete color flow is computed from the target histogram to the source histogram; this flow is computed as the solution to a convex optimization problem, whose global optimum may be found. (2) The discrete flow is turned into a continuous color transformation, which can be written as a convex sum of Stretch-Minimizing Structure Preserving (SMSP) transformations. These SMSP transformations, which are computed based on the color flow, are affine transformations with desirable theoretical properties. The effectiveness of this two stage algorithm is validated in a series of experiments.

Fast Inverse Halftoning

Z. Karni, D. Freedman, and D. Shaked

Proceedings of the 31st International Congress on Imaging Science (ICIS), 2010

Printers use halftoning to render printed pages. This process is useful for many printing technologies which are binary in nature, as it allows the printer to deposit the ink as series of dots of constant darkness. Indeed, many of printing pipelines are based on this 1-bit framework; this unfortunately raises a critical problem when image processing operations that require the original 8-bit image must be performed. In this situation, what is required is the reconstruction of the 8-bit image from its halftoned version, a process referred to as "inverse halftoning". In this paper, we present a technique for fast inverse halftoning which given a dithered image together with the dithering mask that created it, approximates the original 8-bit image. The technique is elegant, and allows for generalizations to other inverse problems in which the exact details of the forward process are known. The algorithm is light computationally, and has been tested in practice. Results are shown, demonstrating the algorithm’s promise.

Hardness Results for Homology Localization

C. Chen and D. Freedman

Proceedings of the Twenty-First ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010

We address the problem of localizing homology classes, namely, finding the cycle representing a given class with the most concise geometric measure. We focus on the volume measure, that is, the 1-norm of a cycle. Two main results are presented. First, we prove the problem is NP-hard to approximate within any constant factor. Second, we prove that for homology of dimension two or higher, the problem is NP-hard to approximate even when the Betti number is O(1). A side effect is the inapproximability of the problem of computing the nonbounding cycle with the smallest volume, and computing cycles representing a homology basis with the minimal total volume. We also discuss other geometric measures (diameter and radius) and show their disadvantages in homology localization. Our work is restricted to homology over the Z_2 field.

Energy-Based Image Deformation

Z. Karni, D. Freedman, and C. Gotsman

Computer Graphics Forum, 28(5):1257–1268, 2009

We present a general approach to shape deformation based on energy minimization, and applications of this approach to the problems of image resizing and 2D shape deformation. Our deformation energy generalizes that found in the prior art, while still admitting an efficient algorithm for its optimization. The key advantage of our energy function is the flexibility with which the set of “legal transformations” may be expressed; these transformations are the ones which are not considered to be distorting. This flexibility allows us to pose the problems of image resizing and 2D shape deformation in a natural way and generate minimally distorted results. It also allows us to strongly reduce undesirable foldovers or self-intersections. Results of both algorithms demonstrate the effectiveness of our approach.

Color Transforms for Creative Image Editing

P. Kisilev and D. Freedman

Proceedings of the Seventeenth IS&T Color Imaging Conference (CIC), 2009

In this paper, we present a unified approach for the problem of computing color transforms, applications of which include shadow removal, object recoloring, and scene relighting. The detection of source and target regions is performed using a Bayesian classifier. Given these regions, the computed transform alters the color properties of the target region so as to closely resemble those of the source region. The proposed probabilistic formulation leads to a linear program (similar to the classic Transportation Problem), which computes the desired transformation between the target and source distributions. This formulation allows the target region to acquire the properties of the source region, while at the same time retaining its own look and feel. Promising results are shown for a variety of applications.

Fast Mean Shift by Compact Density Representation

D. Freedman and P. Kisilev

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009

The Mean Shift procedure is a well established clustering technique that is widely used in imaging applications such as image and video segmentation, denoising, object tracking, texture classification, and others. However, the Mean Shift procedure has relatively high time complexity which is superlinear in the number of data points. In this paper we present a novel fast Mean Shift procedure which is based on the random sampling of the Kernel Density Estimate (KDE). We show theoretically that the resulting reduced KDE is close to the complete data KDE, to within a given accuracy. Moreover, we prove that the time complexity of the proposed fast Mean Shift procedure based on the reduced KDE is considerably lower than that of the original Mean Shift; the typical gain is of several orders for big data sets. Experiments show that image and video segmentation results of the proposed fast Mean Shift method are similar to those based on the standard Mean shift procedure. We also present a new application of the Fast Mean Shift method to the efficient construction of graph hierarchies for images; the resulting structure is potentially useful for solving computer vision problems which can be posed as graph problems, including stereo, semi-automatic segmentation, and optical flow.

Energy-Based Shape Deformation

Z. Karni, D. Freedman, and C. Gotsman

Proceedings of the ACM Symposium on Geometry Processing (SGP), 2009

We present a general approach to shape deformation based on energy minimization, and applications of this approach to the problems of image resizing and 2D shape deformation. Our deformation energy generalizes that found in the prior art, while still admitting an efficient algorithm for its optimization. The key advantage of our energy function is the flexibility with which the set of “legal transformations” may be expressed; these transformations are the ones which are not considered to be distorting. This flexibility allows us to pose the problems of image resizing and 2D shape deformation in a natural way and generate minimally distorted results. It also allows us to strongly reduce undesirable foldovers or self-intersections. Results of both algorithms demonstrate the effectiveness of our approach.

Fast Data Reduction via KDE Approximation

D. Freedman and P. Kisilev

Proceedings of the Data Compression Conference (DCC), 2009

Many of today’s real world applications need to handle and analyze continually growing amounts of data, while the cost of collecting data decreases. As a result, the main technological hurdle is that the data is acquired faster than it can be processed. Data reduction methods are thus increasingly important, as they allow one to extract the most relevant and important information from giant data sets. We present one such method, based on compressing the description length of an estimate of the probability distribution of a set points.

Quantifying Homology Classes

C. Chen and D. Freedman

Proceedings of the International Symposium on Theoretical Aspects of Computer Science (STACS), 2008

We develop a method for measuring homology classes. This involves three problems. First, we define the size of a homology class, using ideas from relative homology. Second, we define an optimal basis of a homology group to be the basis whose elements’ size have the minimal sum. We provide a greedy algorithm to compute the optimal basis and measure classes in it. The algorithm runs in O(β^4 n^3 log^2 n) time, where n is the size of the simplicial complex and β is the Betti number of the homology group. Third, we discuss different ways of localizing homology classes and prove some hardness results.

An Incremental Algorithm for Reconstruction of Surfaces of Arbitrary Codimension

D. Freedman

Computational Geometry: Theory and Applications, 36(2):106–116, 2007

A new algorithm is presented for surface reconstruction from unorganized points. Unlike many previous algorithms, this algorithm does not select a subcomplex of the Delaunay Triangulation of the points. Instead, it uses an incremental algorithm, adding one simplex of the surface at a time. As a result, the algorithm does not require the surface’s embedding space to be R^3; the dimension of the embedding space may vary arbitrarily without substantially affecting the complexity of the algorithm. One result of using this incremental algorithmic technique is that very little can be proven about the reconstruction; nonetheless, it is interesting from an experimental viewpoint, as it allows for a wider variety of surfaces to be reconstructed. In particular, the class of non-orientable surfaces, such as the Klein Bottle, may be reconstructed. Results are shown for surfaces of varying genus.

Joint Segmentation-Registration of Organs Using Geometric Models

A. Ayvaci and D. Freedman

Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2007

In this paper, we present a novel method for the segmentation of the organs found in CT and MR images. The proposed algorithm utilizes the shape model of the target organ to gain robustness in the case where the objective organ is surrounded by other organs or tissue with the similar intensity profile. The algorithm labels the image based on the graph-cuts technique and incorporates the shape prior using a technique based on level-sets. The method requires proper registration of the shape template for an accurate segmentation, and we propose a unified registration-segmentation framework to solve this problem. Furthermore, to reduce the computational cost, the algorithm is designed to run on watershed regions instead of voxels. The accuracy of the algorithm is shown on the medical examples.

Multiscale Modeling and Constraints for Max-Flow / Min-Cut Problems in Computer Vision

M.W. Turek and D. Freedman

Proceedings of the IEEE Workshop on Perceptual Organization in Computer Vision (in conjunction with CVPR), 2006

Multiscale techniques have been used for many years in computer vision. Recently multiscale edges have received attention in spectral graph methods as an important perceptual cue. In this paper multiscale cues are used in the context of max-flow/min-cut energy minimization. We formulate multiscale min-cut versions of three typical computer vision applications, namely interactive segmentation, image restoration, and optical flow. We then solve across all scales simultaneously. This use of multiscale models and constraints leads to quantitatively and qualitatively improved experimental results.

Model-Based Segmentation of Medical Imagery by Matching Distributions

D. Freedman, R.J. Radke, T. Zhang, Y. Jeong, D.M. Lovelock, and G.T.Y. Chen

IEEE Transactions on Medical Imaging, 24(3):281–292, 2005

The segmentation of deformable objects from three-dimensional (3-D) images is an important and challenging problem, especially in the context of medical imagery. We present a new segmentation algorithm based on matching probability distributions of photometric variables that incorporates learned shape and appearance models for the objects of interest. The main innovation over similar approaches is that there is no need to compute a pixelwise correspondence between the model and the image. This allows for a fast, principled algorithm. We present promising results on difficult imagery for 3-D computed tomography images of the male pelvis for the purpose of image-guided radiotherapy of the prostate.

Improving Performance of Distribution Tracking Through Background Mismatch

T. Zhang and D. Freedman

IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2):282–287, 2005

This paper proposes a new density matching method based on background mismatching for tracking of nonrigid moving objects. The new tracking method extends the idea behind the original density-matching tracker [7], which tracks an object by finding a contour in which the photometric density sampled from the enclosed region most closely matches a model density. This method can be quite sensitive to the initial curve placements and model density. The new method eliminates these sensitivities by adding a second term to the optimization: The mismatch between the model density and the density sampled from the background. By maximizing this term, the tracking algorithm becomes significantly more robust in practice. Furthermore, we show the enhanced ability of the algorithm to deal with target objects which possess smooth or diffuse boundaries. The tracker is in the form of a partial differential equation, and is implemented using the level-set framework. Experiments on synthesized images and real video sequences show our proposed methods are effective and robust; the results are compared with several existing methods.

Energy Minimization via Graph Cuts: Settling What Is Possible

D. Freedman and P. Drineas

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005

The recent explosion of interest in graph cut methods in computer vision naturally spawns the question: what energy functions can be minimized via graph cuts? This question was first attacked by two papers of Kolmogorov and Zabih [23, 24], in which they dealt with functions with pairwise and triplewise pixel interactions. In this work, we extend their results in two directions. First, we examine the case of k-wise pixel interactions; the results are derived from a purely algebraic approach. Second, we discuss the applicability of provably approximate algorithms. Both of these developments should help researchers best understand what can and cannot be achieved when designing graph cut based algorithms.

Illumination-Invariant Tracking via Graph Cuts

D. Freedman and M.W. Turek

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005

Illumination changes are a ubiquitous problem in computer vision. They present a challenge in many applications, including tracking: for example, an object may move in and out of a shadow. We present a new tracking algorithm which is insensitive to illumination changes, while at the same time using all of the available photometric information. The algorithm is based on computing an illumination-invariant optical flow field; the computation is made robust by using a graph cuts formulation. Experimentally, the new technique is shown to quite reliable in both synthetic and real sequences, dealing with a variety of illumination changes that cause problems for density based trackers.

Interactive Graph Cut Based Segmentation With Shape Priors

D. Freedman and T. Zhang

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005

Interactive or semi-automatic segmentation is a useful alternative to pure automatic segmentation in many applications. While automatic segmentation can be very challenging, a small amount of user input can often resolve ambiguous decisions on the part of the algorithm. In this work, we devise a graph cut algorithm for interactive segmentation which incorporates shape priors. While traditional graph cut approaches to interactive segmentation are often quite successful, they may fail in cases where there are diffuse edges, or multiple similar objects in close proximity to one another. Incorporation of shape priors within this framework mitigates these problems. Positive results on both medical and natural images are demonstrated

Active Contours for Tracking Distributions

D. Freedman and T. Zhang

IEEE Transactions on Image Processing, 13(4):518–526, 2004

A new approach to tracking using active contours is presented. The class of objects to be tracked is assumed to be characterized by a probability distribution over some variable, such as intensity, color, or texture. The goal of the algorithm is to find the region within the current image, such that the sample distribution of the interior of the region most closely matches the model distribution. Two separate criteria for matching distributions are examined, and the curve evolution equations are derived in each case. The flows are shown to perform well in experiments.

Surface Reconstruction, One Triangle at a Time

D. Freedman

Proceedings of the Canadian Conference of Computational Geometry (CCCG), 2004

A new algorithm is presented for surface reconstruction from unorganized points. Unlike many existing methods, this algorithm does not select a subcomplex of the Delaunay Triangulation of the points. Instead, it uses an incremental technique, adding one triangle of the surface at a time. As a result, the algorithm does not require the surface’s embedding space to be R^3; the dimension of the embedding space may vary arbitrarily without substantially affecting the complexity of the algorithm. A wider variety of surfaces may therefore be reconstructed, including the class of non-orientable surfaces, such as the Klein Bottle. The algorithm is of an experimental character; while it is motivated from geometric and topological intuition, no proofs of homeomorphism are given. Instead, its efficacy is demonstrated from the point of view of correct experimental reconstruction of surfaces of varying genus.

Deformable Model-Based Segmentation of 3D CT by Matching Distributions

R.J. Radke, D. Freedman, T. Zhang, Y. Jeong, and G.T.Y. Chen

Medical Physics, 31(6):1711, 2004

A rate-limiting step in tomographic image guided therapy or 4D CT-based treatment planning is the significant amount of time and human intervention required to delineate the tumor and nearby critical structures on each scan. A radiation oncologist can often take 30-45 minutes to outline all of the structures of interest in every axial slice. Performing this operation each of the 30-40 times a patient is treated is tedious. Using an uncompiled MATLAB implementation on a modest hardware platform (1.67 GHz AMD machine with 448 MB RAM), we present an algorithm that performs the same procedure in less than five minutes. Our early results show in-plane agreement with manual segmentation of multiple organs in the male pelvis on the order of 2- 3mm.

Model-Based Multiobject Segmentation via Distribution Matching

D. Freedman, R.J. Radke, T. Zhang, Y. Jeong, and G.T.Y. Chen

Proceedings of the IEEE Workshop on Articulated and Nonrigid Motion (in conjunction with CVPR), 2004

A new algorithm for the segmentation of objects from 3D images using deformable models is presented. This algorithm relies on learned shape and appearance models for the objects of interest. The main innovation over similar approaches is that there is no need to compute a pixelwise correspondence between the model and the image; instead, probability distributions are compared. This allows for a faster, more principled algorithm. Furthermore, the algorithm is not sensitive to the form of the shape model, making it quite flexible. Results of the algorithm are shown for the segmentation of the prostate and bladder from medical images.

Effective Tracking Through Tree-Search

D. Freedman

IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):604–615, 2003

A new contour tracking algorithm is presented. Tracking is posed as a matching problem between curves constructed out of edges in the image, and some shape space describing the class of objects of interest. The main contributions of the paper are to present an algorithm which solves this problem accurately and efficiently, in a provable manner. In particular, the algorithm’s efficiency derives from a novel tree-search algorithm through the shape space, which allows for much of the shape space to be explored with very little effort. This latter property makes the algorithm effective in highly cluttered scenes, as is demonstrated in an experimental comparison with a condensation tracker.

Tracking Objects Using Density Matching and Shape Priors

T. Zhang and D. Freedman

Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2003

We present a novel method for tracking objects by combining density matching with shape priors. Density matching is a tracking method which operates by maximizing the Bhattacharyya similarity measure between the photometric distribution from an estimated image region and a model photometric distribution. Such trackers can be expressed as PDE-based curve evolutions, which can be implemented using level sets. Shape priors can be combined with this level-set implementation of density matching by representing the shape priors as a series of level sets; a variational approach allows for a natural, parametrization-independent shape term to be derived. Experimental results on real image sequences are shown.

Combinatorial Curve Reconstruction in Hilbert Spaces: A New Sampling Theory and an Old Result Revisited

D. Freedman

Computational Geometry: Theory and Applications, 23(2):227–241, 2002

The goals of this paper are twofold. The first is to present a new sampling theory for curves, based on a new notion of local feature size. The properties of this new feature size are investigated, and are compared with the standard feature size definitions. The second goal is to revisit an existing algorithm for combinatorial curve reconstruction in spaces of arbitrary dimension, the Nearest Neighbour Crust of Dey and Kumar [Proc. ACM-SIAM Sympos. Discrete Algorithms, 1999, pp. 893–894], and to prove its validity under the new sampling conditions. Because the new sampling theory can imply less dense sampling, the new proof is, in some cases, stronger than that presented in [Proc. ACM-SIAM Sympos. Discrete Algorithms, 1999, pp. 893–894]. Also of interest are the techniques used to prove the theorem, as they are unlike those used used in the curve reconstruction literature to date.

Efficient Simplicial Reconstructions of Manifolds From Their Samples

D. Freedman

IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(10):1349–1357, 2002

A new algorithm for manifold learning is presented. Given only samples of a finite-dimensional differentiable manifold and no a priori knowledge of the manifold’s geometry or topology except for its dimension, the goal is to find a description of the manifold. The learned manifold must approximate the true manifold well, both geometrically and topologically, when the sampling density is sufficiently high. The proposed algorithm constructs a simplicial complex based on approximations to the tangent bundle of the manifold. An important property of the algorithm is that its complexity depends on the dimension of the manifold, rather than that of the embedding space. Successful examples are presented in the cases of learning curves in the plane, curves in space, and surfaces in space; in addition, a case when the algorithm fails is analyzed.

Compression of Protein Conformational Space

Y. Shao, M. Magdon-Ismail, D. Freedman, S. Akella, and C. Bystroff

International Conference on Research in Computational Molecular Biology (RECOMB), 2002

Compressing conformational space is the process of defining a subspace of minimal dimensionality where any point may represent a protein-like structure. This is similar to the problem of image compression, where it is desired to reconstruct an image from a small amount of information. In this case, the similarity (in atomic detail) between a true protein and the protein like structure obtained by projecting the compressed protein back into real space is the measure of the success of the compression algorithm. If proteins may be accurately compressed to a space that is efficiently searchable, and then decompressed back to real space, existing energy functions that use atomic detail may finally be rigorously tested in an exhaustive conformational search. (Note: Unlike in the threading approach, such an exhaustive simulation search may consider the folding pathway, and therefore the folding kinetics.) In this paper, we apply compression techniques to various representations of the proteins of known structure. We apply principle component analysis (PCA) to the coordinate, backbone angle, and distance matrix representations. Additionally, we applied Fourier transform techniques to distance matrix space. The success of the compression was measured by the structural difference between the original and the reconstructed coordinates for proteins that were not used in the development of the compression algorithm. It is found that some representations of the model are more easily compressed that others. We find, unexpectedly, that the models that retain the most atomic detail may be compressed to the smallest subspace.

Contour Tracking in Clutter: A Subset Approach

D. Freedman and M.S. Brandstein

International Journal of Computer Vision, 38(2):173–186, 2000

A new method for tracking contours of moving objects in clutter is presented. For a given object, a model of its contours is learned from training data in the form of a subset of contour space. Greater complexity is added to the contour model by analyzing rigid and non-rigid transformations of contours separately. In the course of tracking, multiple contours may be observed due to the presence of extraneous edges in the form of clutter; the learned model guides the algorithm in picking out the correct one. The algorithm, which is posed as a solution to a minimization problem, is made efficient by the use of several iterative schemes. Results applying the proposed algorithm to the tracking of a flexing finger and to a conversing individual’s lips are presented.

Manifold Reconstruction From Unorganized Points

D. Freedman

Proceedings of the Asilomar Conference on Signals, Systems, and Computers, 2000

A new algorithm for manifold reconstruction is presented. The goal is to take samples drawn from a finite-dimensional manifold, and to reconstruct a manifold, based only on the samples, which is a good approximation to the true manifold; nothing of the true manifold’s geometry or topology is known a priori. The algorithm constructs a simplicial complex based on approximating tangent hyperplanes to the manifold, and does so efficiently. Successful examples are presented for curve reconstruction in the plane, curve reconstruction in space, and surface reconstruction in space.

Provably Fast Algorithms for Contour Tracking

D. Freedman and M.S. Brandstein

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2000

A new tracker is presented. Two sets are identified: one which contains all possible curves as found in the image, and a second which contains all curves which characterize the object of interest. The former is constructed out of edge-points in the image, while the latter is learned prior to running. The tracked curve is taken to be the element of the first set which is nearest the second set. The formalism for the learned set of curves allows for mathematically well understood groups of transformations (e.g. affine, projective) to be treated on the same footing as less well understood deformations, which may be learned from training curves. An algorithm is proposed to solve the tracking problem, and its properties are theoretically demonstrated: it solves the global optimization problem, and does so with certain complexity bounds. Experimental results applying the proposed algorithm to the tracking of a moving finger are presented, and compared with the results of a condensation approach.

Methods of Global Optimization in the Tracking of Contours

D. Freedman and M.S. Brandstein

Proceedings of the Asilomar Conference on Signals, Systems, and Computers, 1999

A new method for tracking contours of moving objects in clutter is presented. For a given object, a model of its contours is learned from training contours in the form of a subset of curve space. Complexity is added to the contour model by analyzing rigid and non-rigid transformations of contours separately. In the course of tracking, a very large number of potential curves are typically observed due to the presence of extraneous edges in the form of clutter; the learned model guides the algorithm in picking out the correct one. The algorithm is posed as a solution to a minimization problem; theoretical results on how to achieve the global minimum to within a certain resolution, and the complexity of this operation, are presented. Experimental results applying the proposed algorithm to the tracking of a flexing finger and to a conversing individual’s lips are also presented.

A Subset Approach to Contour Tracking in Clutter

D. Freedman and M.S. Brandstein

Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1999

A new method for tracking contours of moving objects in clutter is presented. For a given object, a model of its contours is learned from training data in the form of a subset of contour space. Greater complexity is added to the contour model by analyzing rigid and non-rigid transformations of contoursseparately. In the course of tracking, multiple contours may be observed due to the presence of extraneous edges in the form of clutter; the learned model guides the algorithm in picking out the correct one. The algorithm, which is posed as a solution to a minimization problem, is made efficient by the use of several iterative schemes. Results applying the proposed algorithm to the tracking of a flexing finger and to a conversing individual’s lips are presented.

Density of Electrons in a Lateral Quantum Dot by Semi-Classical Trajectory Analysis

R. Taylor, A. Sachrajda, D. Freedman, and P. Kelly

Solid State Communications, 89(7):579–582, 1994

A semi-classical trajectory model is used to calculate the magneto-resistance of submicron-sized dots with entrance and exit quantum point contact leads. A comparison with experimental data allows the determination of the electron density and the mean free path for the confined regions and shows that these parameters can differ from their equivalent “bulk” values. It is found that for realistic geometries the calculated conductance is extremely sensitive to the angular spread of the collimated beam but depends only weakly on the effective width of the quantum point contacts.