Tomasz Trzcinski

CV
h-index48
33papers
1,061citations
Novelty43%
AI Score37

33 Papers

LGJan 31, 2023
Learning Data Representations with Joint Diffusion Models

Kamil Deja, Tomasz Trzcinski, Jakub M. Tomczak

Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train. In this work, we depart from a set of empirical observations that indicate the usefulness of internal representations built by contemporary deep diffusion-based generative models not only for generating but also predicting. We then propose to extend the vanilla diffusion model with a classifier that allows for stable joint end-to-end training with shared parameterization between those objectives. The resulting joint diffusion model outperforms recent state-of-the-art hybrid methods in terms of both classification and generation quality on all evaluated benchmarks. On top of our joint training approach, we present how we can directly benefit from shared generative and discriminative representations by introducing a method for visual counterfactual explanations.

CVAug 6, 2024Code
LumiGauss: Relightable Gaussian Splatting in the Wild

Joanna Kaleta, Kacper Kania, Tomasz Trzcinski et al.

Decoupling lighting from geometry using unconstrained photo collections is notoriously challenging. Solving it would benefit many users as creating complex 3D assets takes days of manual labor. Many previous works have attempted to address this issue, often at the expense of output fidelity, which questions the practicality of such methods. We introduce LumiGauss - a technique that tackles 3D reconstruction of scenes and environmental lighting through 2D Gaussian Splatting. Our approach yields high-quality scene reconstructions and enables realistic lighting synthesis under novel environment maps. We also propose a method for enhancing the quality of shadows, common in outdoor scenes, by exploiting spherical harmonics properties. Our approach facilitates seamless integration with game engines and enables the use of fast precomputed radiance transfer. We validate our method on the NeRF-OSR dataset, demonstrating superior performance over baseline methods. Moreover, LumiGauss can synthesize realistic images for unseen environment maps. Our code: https://github.com/joaxkal/lumigauss.

AIMay 30, 2022
A Deep Learning Approach for Automatic Detection of Qualitative Features of Lecturing

Anna Wroblewska, Jozef Jasek, Bogdan Jastrzebski et al.

Artificial Intelligence in higher education opens new possibilities for improving the lecturing process, such as enriching didactic materials, helping in assessing students' works or even providing directions to the teachers on how to enhance the lectures. We follow this research path, and in this work, we explore how an academic lecture can be assessed automatically by quantitative features. First, we prepare a set of qualitative features based on teaching practices and then annotate the dataset of academic lecture videos collected for this purpose. We then show how these features could be detected automatically using machine learning and computer vision techniques. Our results show the potential usefulness of our work.

CVApr 12, 2021Code
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

Jacek Komorowski, Monika Wysoczanska, Tomasz Trzcinski

We introduce a discriminative multimodal descriptor based on a pair of sensor readings: a point cloud from a LiDAR and an image from an RGB camera. Our descriptor, named MinkLoc++, can be used for place recognition, re-localization and loop closure purposes in robotics or autonomous vehicles applications. We use late fusion approach, where each modality is processed separately and fused in the final part of the processing pipeline. The proposed method achieves state-of-the-art performance on standard place recognition benchmarks. We also identify dominating modality problem when training a multimodal descriptor. The problem manifests itself when the network focuses on a modality with a larger overfit to the training data. This drives the loss down during the training but leads to suboptimal performance on the evaluation set. In this work we describe how to detect and mitigate such risk when using a deep metric learning approach to train a multimodal neural network. Our code is publicly available on the project website: https://github.com/jac99/MinkLocMultimodal.

IVDec 21, 2023
Noninvasive Estimation of Mean Pulmonary Artery Pressure Using MRI, Computer Models, and Machine Learning

Michal K. Grzeszczyk, Tadeusz Satlawa, Angela Lungu et al.

Pulmonary Hypertension (PH) is a severe disease characterized by an elevated pulmonary artery pressure. The gold standard for PH diagnosis is measurement of mean Pulmonary Artery Pressure (mPAP) during an invasive Right Heart Catheterization. In this paper, we investigate noninvasive approach to PH detection utilizing Magnetic Resonance Imaging, Computer Models and Machine Learning. We show using the ablation study, that physics-informed feature engineering based on models of blood circulation increases the performance of Gradient Boosting Decision Trees-based algorithms for classification of PH and regression of values of mPAP. We compare results of regression (with thresholding of estimated mPAP) and classification and demonstrate that metrics achieved in both experiments are comparable. The predicted mPAP values are more informative to the physicians than the probability of PH returned by classification models. They provide the intuitive explanation of the outcome of the machine learning model (clinicians are accustomed to the mPAP metric, contrary to the PH probability).

CVSep 7, 2025
Neural Bloom: A Deep Learning Approach to Real-Time Lighting

Rafal Karp, Dawid Gruszka, Tomasz Trzcinski

We propose a novel method to generate bloom lighting effect in real time using neural networks. Our solution generate brightness mask from given 3D scene view up to 30% faster than state-of-the-art methods. The existing traditional techniques rely on multiple blur appliances and texture sampling, also very often have existing conditional branching in its implementation. These operations occupy big portion of the execution time. We solve this problem by proposing two neural network-based bloom lighting methods, Neural Bloom Lighting (NBL) and Fast Neural Bloom Lighting (FastNBL), focusing on their quality and performance. Both methods were tested on a variety of 3D scenes, with evaluations conducted on brightness mask accuracy and inference speed. The main contribution of this work is that both methods produce high-quality bloom effects while outperforming the standard state-of-the-art bloom implementation, with FastNBL being faster by 28% and NBL faster by 12%. These findings highlight that we can achieve realistic bloom lighting phenomena faster, moving us towards more realism in real-time environments in the future. This improvement saves computational resources, which is a major bottleneck in real-time rendering. Furthermore, it is crucial for sustaining immersion and ensuring smooth experiences in high FPS environments, while maintaining high-quality realism.

CVOct 24, 2021
EgoNN: Egocentric Neural Network for Point Cloud Based 6DoF Relocalization at the City Scale

Jacek Komorowski, Monika Wysoczanska, Tomasz Trzcinski

The paper presents a deep neural network-based method for global and local descriptors extraction from a point cloud acquired by a rotating 3D LiDAR. The descriptors can be used for two-stage 6DoF relocalization. First, a course position is retrieved by finding candidates with the closest global descriptor in the database of geo-tagged point clouds. Then, the 6DoF pose between a query point cloud and a database point cloud is estimated by matching local descriptors and using a robust estimator such as RANSAC. Our method has a simple, fully convolutional architecture based on a sparse voxelized representation. It can efficiently extract a global descriptor and a set of keypoints with local descriptors from large point clouds with tens of thousand points. Our code and pretrained models are publicly available on the project website.

CVOct 6, 2021
Large-Scale Topological Radar Localization Using Learned Descriptors

Jacek Komorowski, Monika Wysoczanska, Tomasz Trzcinski

In this work, we propose a method for large-scale topological localization based on radar scan images using learned descriptors. We present a simple yet efficient deep network architecture to compute a rotationally invariant discriminative global descriptor from a radar scan image. The performance and generalization ability of the proposed method is experimentally evaluated on two large scale driving datasets: MulRan and Oxford Radar RobotCar. Additionally, we present a comparative evaluation of radar-based and LiDAR-based localization using learned global descriptors. Our code and trained models are publicly available on the project website.

CVFeb 17, 2021
I Want This Product but Different : Multimodal Retrieval with Synthetic Query Expansion

Ivona Tautkute, Tomasz Trzcinski

This paper addresses the problem of media retrieval using a multimodal query (a query which combines visual input with additional semantic information in natural language feedback). We propose a SynthTriplet GAN framework which resolves this task by expanding the multimodal query with a synthetically generated image that captures semantic information from both image and text input. We introduce a novel triplet mining method that uses a synthetic image as an anchor to directly optimize for embedding distances of generated and target images. We demonstrate that apart from the added value of retrieval illustration with synthetic image with the focus on customization and user feedback, the proposed method greatly surpasses other multimodal generation methods and achieves state of the art results in the multimodal retrieval task. We also show that in contrast to other retrieval methods, our method provides explainable embeddings.

LGNov 30, 2020
RegFlow: Probabilistic Flow-based Regression for Future Prediction

Maciej Zięba, Marcin Przewięźlikowski, Marek Śmieja et al.

Predicting future states or actions of a given system remains a fundamental, yet unsolved challenge of intelligence, especially in the scope of complex and non-deterministic scenarios, such as modeling behavior of humans. Existing approaches provide results under strong assumptions concerning unimodality of future states, or, at best, assuming specific probability distributions that often poorly fit to real-life conditions. In this work we introduce a robust and flexible probabilistic framework that allows to model future predictions with virtually no constrains regarding the modality or underlying probability distribution. To achieve this goal, we leverage a hypernetwork architecture and train a continuous normalizing flow model. The resulting method dubbed RegFlow achieves state-of-the-art results on several benchmark datasets, outperforming competing approaches by a significant margin.

CVDec 10, 2019
SuperNCN: Neighbourhood consensus network for robust outdoor scenes matching

Grzegorz Kurzejamski, Jacek Komorowski, Lukasz Dabala et al.

In this paper, we present a framework for computing dense keypoint correspondences between images under strong scene appearance changes. Traditional methods, based on nearest neighbour search in the feature descriptor space, perform poorly when environmental conditions vary, e.g. when images are taken at different times of the day or seasons. Our method improves finding keypoint correspondences in such difficult conditions. First, we use Neighbourhood Consensus Networks to build spatially consistent matching grid between two images at a coarse scale. Then, we apply Superpoint-like corner detector to achieve pixel-level accuracy. Both parts use features learned with domain adaptation to increase robustness against strong scene appearance variations. The framework has been tested on a RobotCar Seasons dataset, proving large improvement on pose estimation task under challenging environmental conditions.

IVSep 11, 2019
Late fusion of deep learning and hand-crafted features for Achilles tendon healing monitoring

Norbert Kapinski, Jedrzej M. Nowosielski, Maciej E. Marchwiany et al.

Healing process assessment of the Achilles tendon is usually a complex procedure that relies on a combination of biomechanical and medical imaging tests. As a result, diagnostics remains a tedious and long-lasting task. Recently, a novel method for the automatic assessment of tendon healing based on Magnetic Resonance Imaging and deep learning was introduced. The method assesses six parameters related to the treatment progress utilizing a modified pre-trained network, PCA-reduced space, and linear regression. In this paper, we propose to improve this approach by incorporating hand-crafted features. We first perform a feature selection in order to obtain optimal sets of mixed hand-crafted and deep learning predictors. With the use of approx. 20,000 MRI slices, we then train a meta-regression algorithm that performs the tendon healing assessment. Finally, we evaluate the method against scores given by an experienced radiologist. In comparison with the previous baseline method, our approach significantly improves correlation in all of the six parameters assessed. Furthermore, our method uses only one MRI protocol and saves up to 60\% of the time needed for data acquisition.

IVSep 11, 2019
Monitoring Achilles tendon healing progress in ultrasound imaging with convolutional neural networks

Piotr Woznicki, Przemyslaw Przybyszewski, Norbert Kapinski et al.

Achilles tendon rupture is a debilitating injury, which is typically treated with surgical repair and long-term rehabilitation. The recovery, however, is protracted and often incomplete. Diagnosis, as well as healing progress assessment, are largely based on ultrasound and magnetic resonance imaging. In this paper, we propose an automatic method based on deep learning for analysis of Achilles tendon condition and estimation of its healing progress on ultrasound images. We develop custom convolutional neural networks for classification and regression on healing score and feature extraction. Our models are trained and validated on an acquired dataset of over 250.000 sagittal and over 450.000 axial ultrasound slices. The obtained estimates show a high correlation with the assessment of expert radiologists, with respect to all key parameters describing healing progress. We also observe that parameters associated with i.a. intratendinous healing processes are better modeled with sagittal slices. We prove that ultrasound imaging is quantitatively useful for clinical assessment of Achilles tendon healing process and should be viewed as complementary to magnetic resonance imaging.

CVJan 2, 2019
Plugin Networks for Inference under Partial Evidence

Michal Koperski, Tomasz Konopczynski, Rafał Nowak et al.

In this paper, we propose a novel method to incorporate partial evidence in the inference of deep convolutional neural networks. Contrary to the existing, top performing methods, which either iteratively modify the input of the network or exploit external label taxonomy to take the partial evidence into account, we add separate network modules ("Plugin Networks") to the intermediate layers of a pre-trained convolutional network. The goal of these modules is to incorporate additional signal, ie information about known labels, into the inference procedure and adjust the predicted output accordingly. Since the attached plugins have a simple structure, consisting of only fully connected layers, we drastically reduced the computational cost of training and inference. At the same time, the proposed architecture allows to propagate information about known labels directly to the intermediate layers to improve the final representation. Extensive evaluation of the proposed method confirms that our Plugin Networks outperform the state-of-the-art in a variety of tasks, including scene categorization, multi-label image annotation, and semantic segmentation.

CVOct 23, 2018
Classifying and Visualizing Emotions with Emotional DAN

Ivona Tautkute, Tomasz Trzcinski

Classification of human emotions remains an important and challenging task for many computer vision algorithms, especially in the era of humanoid robots which coexist with humans in their everyday life. Currently proposed methods for emotion recognition solve this task using multi-layered convolutional networks that do not explicitly infer any facial features in the classification phase. In this work, we postulate a fundamentally different approach to solve emotion recognition task that relies on incorporating facial landmarks as a part of the classification loss function. To that end, we extend a recently proposed Deep Alignment Network (DAN) with a term related to facial features. Thanks to this simple modification, our model called EmotionalDAN is able to outperform state-of-the-art emotion classification methods on two challenging benchmark dataset by up to 5%. Furthermore, we visualize image regions analyzed by the network when making a decision and the results indicate that our EmotionalDAN model is able to correctly identify facial landmarks responsible for expressing the emotions.

CVSep 28, 2018
Aggregation of binary feature descriptors for compact scene model representation in large scale structure-from-motion applications

Jacek Komorowski, Tomasz Trzcinski

In this paper we present an efficient method for aggregating binary feature descriptors to allow compact representation of 3D scene model in incremental structure-from-motion and SLAM applications. All feature descriptors linked with one 3D scene point or landmark are represented by a single low-dimensional real-valued vector called a \emph{prototype}. The method allows significant reduction of memory required to store and process feature descriptors in large-scale structure-from-motion applications. An efficient approximate nearest neighbours search methods suited for real-valued descriptors, such as FLANN, can be used on the resulting prototypes to speed up matching processed frames.

CVSep 28, 2018
SConE: Siamese Constellation Embedding Descriptor for Image Matching

Tomasz Trzcinski, Jacek Komorowski, Lukasz Dabala et al.

Numerous computer vision applications rely on local feature descriptors, such as SIFT, SURF or FREAK, for image matching. Although their local character makes image matching processes more robust to occlusions, it often leads to geometrically inconsistent keypoint matches that need to be filtered out, e.g. using RANSAC. In this paper we propose a novel, more discriminative, descriptor that includes not only local feature representation, but also information about the geometric layout of neighbouring keypoints. To that end, we use a Siamese architecture that learns a low-dimensional feature embedding of keypoint constellation by maximizing the distances between non-corresponding pairs of matched image patches, while minimizing it for correct matches. The 48-dimensional oating point descriptor that we train is built on top of the state-of-the-art FREAK descriptor achieves significant performance improvement over the competitors on a challenging TUM dataset.

CVSep 28, 2018
Interest point detectors stability evaluation on ApolloScape dataset

Jacek Komorowski, Konrad Czarnota, Tomasz Trzcinski et al.

In the recent years, a number of novel, deep-learning based, interest point detectors, such as LIFT, DELF, Superpoint or LF-Net was proposed. However there's a lack of a standard benchmark to evaluate suitability of these novel keypoint detectors for real-live applications such as autonomous driving. Traditional benchmarks (e.g. Oxford VGG) are rather limited, as they consist of relatively few images of mostly planar scenes taken in favourable conditions. In this paper we verify if the recent, deep-learning based interest point detectors have the advantage over the traditional, hand-crafted keypoint detectors. To this end, we evaluate stability of a number of hand crafted and recent, learning-based interest point detectors on the street-level view ApolloScape dataset.

CVJun 18, 2018
BinGAN: Learning Compact Binary Descriptors with a Regularized GAN

Maciej Zieba, Piotr Semberecki, Tarek El-Gaaly et al.

In this paper, we propose a novel regularization method for Generative Adversarial Networks, which allows the model to learn discriminative yet compact binary representations of image patches (image descriptors). We employ the dimensionality reduction that takes place in the intermediate layers of the discriminator network and train binarized low-dimensional representation of the penultimate layer to mimic the distribution of the higher-dimensional preceding layers. To achieve this, we introduce two loss terms that aim at: (i) reducing the correlation between the dimensions of the binarized low-dimensional representation of the penultimate layer i. e. maximizing joint entropy) and (ii) propagating the relations between the dimensions in the high-dimensional space to the low-dimensional space. We evaluate the resulting binary image descriptors on two challenging applications, image matching and retrieval, and achieve state-of-the-art results.

CVJun 13, 2018
Estimating Achilles tendon healing progress with convolutional neural networks

Norbert Kapinski, Jakub Zielinski, Bartosz A. Borucki et al.

Quantitative assessment of a treatment progress in the Achilles tendon healing process - one of the most common musculoskeletal disorder in modern medical practice - is typically a long and complex process: multiple MRI protocols need to be acquired and analysed by radiology experts. In this paper, we propose to significantly reduce the complexity of this assessment using a novel method based on a pre-trained convolutional neural network. We first train our neural network on over 500,000 2D axial cross-sections from over 3000 3D MRI studies to classify MRI images as belonging to a healthy or injured class, depending on the patient's condition. We then take the outputs of modified pre-trained network and apply linear regression on the PCA-reduced space of the features to assess treatment progress. Our method allows to reduce up to 5-fold the amount of data needed to be registered during the MRI scan without any information loss. Furthermore, we are able to predict the healing process phase with equal accuracy to human experts in 3 out of 6 main criteria. Finally, contrary to the current approaches to regeneration assessment that rely on radiologist subjective opinion, our method allows to objectively compare different treatments methods which can lead to improved diagnostics and patient's recovery.

CVApr 27, 2018
Extracting textual overlays from social media videos using neural networks

Adam Słucki, Tomasz Trzcinski, Adam Bielski et al.

Textual overlays are often used in social media videos as people who watch them without the sound would otherwise miss essential information conveyed in the audio stream. This is why extraction of those overlays can serve as an important meta-data source, e.g. for content classification or retrieval tasks. In this work, we present a robust method for extracting textual overlays from videos that builds up on multiple neural network architectures. The proposed solution relies on several processing steps: keyframe extraction, text detection and text recognition. The main component of our system, i.e. the text recognition module, is inspired by a convolutional recurrent neural network architecture and we improve its performance using synthetically generated dataset of over 600,000 images with text prepared by authors specifically for this task. We also develop a filtering method that reduces the amount of overlapping text phrases using Levenshtein distance and further boosts system's performance. The final accuracy of our solution reaches over 80A% and is au pair with state-of-the-art methods.

CVApr 26, 2018
Pay Attention to Virality: understanding popularity of social media videos with the attention mechanism

Adam Bielski, Tomasz Trzcinski

Predicting popularity of social media videos before they are published is a challenging task, mainly due to the complexity of content distribution network as well as the number of factors that play part in this process. As solving this task provides tremendous help for media content creators, many successful methods were proposed to solve this problem with machine learning. In this work, we change the viewpoint and postulate that it is not only the predicted popularity that matters, but also, maybe even more importantly, understanding of how individual parts influence the final popularity score. To that end, we propose to combine the Grad-CAM visualization method with a soft attention mechanism. Our preliminary results show that this approach allows for more intuitive interpretation of the content impact on video popularity, while achieving competitive results in terms of prediction accuracy.

CVApr 23, 2018
Siamese Generative Adversarial Privatizer for Biometric Data

Witold Oleszkiewicz, Peter Kairouz, Karol Piczak et al.

State-of-the-art machine learning algorithms can be fooled by carefully crafted adversarial examples. As such, adversarial examples present a concrete problem in AI safety. In this work we turn the tables and ask the following question: can we harness the power of adversarial examples to prevent malicious adversaries from learning identifying information from data while allowing non-malicious entities to benefit from the utility of the same data? For instance, can we use adversarial examples to anonymize biometric dataset of faces while retaining usefulness of this data for other purposes, such as emotion recognition? To address this question, we propose a simple yet effective method, called Siamese Generative Adversarial Privatizer (SGAP), that exploits the properties of a Siamese neural network to find discriminative features that convey identifying information. When coupled with a generative model, our approach is able to correctly locate and disguise identifying information, while minimally reducing the utility of the privatized dataset. Extensive evaluation on a biometric dataset of fingerprints and cartoon faces confirms usefulness of our simple yet effective method.

CVApr 22, 2018
I Know How You Feel: Emotion Recognition with Facial Landmarks

Ivona Tautkute, Tomasz Trzcinski, Adam Bielski

Classification of human emotions remains an important and challenging task for many computer vision algorithms, especially in the era of humanoid robots which coexist with humans in their everyday life. Currently proposed methods for emotion recognition solve this task using multi-layered convolutional networks that do not explicitly infer any facial features in the classification phase. In this work, we postulate a fundamentally different approach to solve emotion recognition task that relies on incorporating facial landmarks as a part of the classification loss function. To that end, we extend a recently proposed Deep Alignment Network (DAN), that achieves state-of-the-art results in the recent facial landmark recognition challenge, with a term related to facial features. Thanks to this simple modification, our model called EmotionalDAN is able to outperform state-of-the-art emotion classification methods on two challenging benchmark dataset by up to 5%.

CVJan 25, 2018
SocialML: machine learning for social media video creators

Tomasz Trzcinski, Adam Bielski, Paweł Cyrta et al.

In the recent years, social media have become one of the main places where creative content is being published and consumed by billions of users. Contrary to traditional media, social media allow the publishers to receive almost instantaneous feedback regarding their creative work at an unprecedented scale. This is a perfect use case for machine learning methods that can use these massive amounts of data to provide content creators with inspirational ideas and constructive criticism of their work. In this work, we present a comprehensive overview of machine learning-empowered tools we developed for video creators at Group Nine Media - one of the major social media companies that creates short-form videos with over three billion views per month. Our main contribution is a set of tools that allow the creators to leverage massive amounts of data to improve their creation process, evaluate their videos before the publication and improve content quality. These applications include an interactive conversational bot that allows access to material archives, a Web-based application for automatic selection of optimal video thumbnail, as well as deep learning methods for optimizing headline and predicting video popularity. Our A/B tests show that deployment of our tools leads to significant increase of average video view count by 12.9%. Our additional contribution is a set of considerations collected during the deployment of those tools that can hel

CVJan 8, 2018
DeepStyle: Multimodal Search Engine for Fashion and Interior Design

Ivona Tautkute, Tomasz Trzcinski, Aleksander Skorupa et al.

In this paper, we propose a multimodal search engine that combines visual and textual cues to retrieve items from a multimedia database aesthetically similar to the query. The goal of our engine is to enable intuitive retrieval of fashion merchandise such as clothes or furniture. Existing search engines treat textual input only as an additional source of information about the query image and do not correspond to the real-life scenario where the user looks for 'the same shirt but of denim'. Our novel method, dubbed DeepStyle, mitigates those shortcomings by using a joint neural network architecture to model contextual dependencies between features of different modalities. We prove the robustness of this approach on two different challenging datasets of fashion items and furniture where our DeepStyle engine outperforms baseline methods by 18-21% on the tested datasets. Our search engine is commercially deployed and available through a Web-based application.

CVAug 9, 2017
Random Binary Trees for Approximate Nearest Neighbour Search in Binary Space

Michal Komorowski, Tomasz Trzcinski

Approximate nearest neighbour (ANN) search is one of the most important problems in computer science fields such as data mining or computer vision. In this paper, we focus on ANN for high-dimensional binary vectors and we propose a simple yet powerful search method that uses Random Binary Search Trees (RBST). We apply our method to a dataset of 1.25M binary local feature descriptors obtained from a real-life image-based localisation system provided by Google as a part of Project Tango. An extensive evaluation of our method against the state-of-the-art variations of Locality Sensitive Hashing (LSH), namely Uniform LSH and Multi-probe LSH, shows the superiority of our method in terms of retrieval precision with performance boost of over 20%

CVJul 27, 2017
Understanding Aesthetics in Photography using Deep Convolutional Neural Networks

Maciej Suchecki, Tomasz Trzcinski

Evaluating aesthetic value of digital photographs is a challenging task, mainly due to numerous factors that need to be taken into account and subjective manner of this process. In this paper, we propose to approach this problem using deep convolutional neural networks. Using a dataset of over 1.7 million photos collected from Flickr, we train and evaluate a deep learning model whose goal is to classify input images by analysing their aesthetic value. The result of this work is a publicly available Web-based application that can be used in several real-life applications, e.g. to improve the workflow of professional photographers by pre-selecting the best photos.

CVJul 21, 2017
Evaluation of Hashing Methods Performance on Binary Feature Descriptors

Jacek Komorowski, Tomasz Trzcinski

In this paper we evaluate performance of data-dependent hashing methods on binary data. The goal is to find a hashing method that can effectively produce lower dimensional binary representation of 512-bit FREAK descriptors. A representative sample of recent unsupervised, semi-supervised and supervised hashing methods was experimentally evaluated on large datasets of labelled binary FREAK feature descriptors.

CVJul 21, 2017
Recurrent Neural Networks for Online Video Popularity Prediction

Tomasz Trzcinski, Pawel Andruszkiewicz, Tomasz Bochenski et al.

In this paper, we address the problem of popularity prediction of online videos shared in social media. We prove that this challenging task can be approached using recently proposed deep neural network architectures. We cast the popularity prediction problem as a classification task and we aim to solve it using only visual cues extracted from videos. To that end, we propose a new method based on a Long-term Recurrent Convolutional Network (LRCN) that incorporates the sequentiality of the information in the model. Results obtained on a dataset of over 37'000 videos published on Facebook show that using our method leads to over 30% improvement in prediction performance over the traditional shallow approaches and can provide valuable insights for content creators.

CLJul 21, 2017
Shallow reading with Deep Learning: Predicting popularity of online content using only its title

Wociech Stokowiec, Tomasz Trzcinski, Krzysztof Wolk et al.

With the ever decreasing attention span of contemporary Internet users, the title of online content (such as a news article or video) can be a major factor in determining its popularity. To take advantage of this phenomenon, we propose a new method based on a bidirectional Long Short-Term Memory (LSTM) neural network designed to predict the popularity of online content using only its title. We evaluate the proposed architecture on two distinct datasets of news articles and news videos distributed in social media that contain over 40,000 samples in total. On those datasets, our approach improves the performance over traditional shallow approaches by a margin of 15%. Additionally, we show that using pre-trained word vectors in the embedding layer improves the results of LSTM models, especially when the training set is small. To our knowledge, this is the first attempt of applying popularity prediction using only textual information from the title.

CVJun 6, 2017
Deep Alignment Network: A convolutional neural network for robust face alignment

Marek Kowalski, Jacek Naruniec, Tomasz Trzcinski

In this paper, we propose Deep Alignment Network (DAN), a robust face alignment method based on a deep neural network architecture. DAN consists of multiple stages, where each stage improves the locations of the facial landmarks estimated by the previous stage. Our method uses entire face images at all stages, contrary to the recently proposed face alignment methods that rely on local patches. This is possible thanks to the use of landmark heatmaps which provide visual information about landmark locations estimated at the previous stages of the algorithm. The use of entire face images rather than patches allows DAN to handle face images with large variation in head pose and difficult initializations. An extensive evaluation on two publicly available datasets shows that DAN reduces the state-of-the-art failure rate by up to 70%. Our method has also been submitted for evaluation as part of the Menpo challenge.

SIOct 21, 2015
Predicting popularity of online videos using Support Vector Regression

Tomasz Trzcinski, Przemyslaw Rokita

In this work, we propose a regression method to predict the popularity of an online video based on temporal and visual cues. Our method uses Support Vector Regression with Gaussian Radial Basis Functions. We show that modelling popularity patterns with this approach provides higher and more stable prediction results, mainly thanks to the non-linearity character of the proposed method as well as its resistance against overfitting. We compare our method with the state of the art on datasets containing over 14,000 videos from YouTube and Facebook. Furthermore, we show that results obtained relying only on the early distribution patterns, can be improved by adding social and visual metadata.