Project

AUTOMATED TV METADATA EXTRACTION

Client

TV Control

General info

Client

The varied requirements of different media channels and broadcasters make it challenging to prepare program descriptions.
There are ways to automate the task, but even these require a lot of manual effort, because templates have to be prepared by hand and missing metadata has to be filled in.

Project

DLabs.AI built a solution to kill two birds with one stone. First, we looked to extract metadata and possible templates from existing descriptions.
Then, we tackled the creation of the descriptions themselves.

About the project

Problem

It’s tricky to find metadata such as director, actors, year, or genre in a program description. It’s even harder to then transfer that information accurately to a new database.

Solution

We built a hybrid extraction method based on regular expressions, linguistic rules, and statistical language models, and used dependency analysis to break descriptions down into reusable templates.
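To illustrate only the regular-expression and linguistic-rule layers, here is a minimal Python sketch. It assumes English descriptions that use cue phrases such as "directed by" and "starring". The patterns and the `extract_metadata` function are hypothetical, not the project's actual rules, and the statistical-model layer (which could be, for example, a spaCy named-entity model) is not shown.

```python
import re

# Hypothetical patterns for illustration only; real descriptions need many
# more variants (other languages, word orders, initials in names, etc.).

# Regular expression: a four-digit year between 1900 and 2099.
YEAR_RE = re.compile(r"\b((?:19|20)\d{2})\b")

# A name is approximated as a run of capitalised words ("Morgan Freeman").
NAME = r"[A-Z][\w'-]*(?:\s[A-Z][\w'-]*)*"

# Linguistic cue rules: trigger phrases that usually precede a field value.
DIRECTOR_RE = re.compile(r"\b(?:[Dd]irected by|[Dd]irector)\s+(" + NAME + ")")
CAST_RE = re.compile(
    r"\b[Ss]tarring\s+("
    + NAME
    + r"(?:,\s*" + NAME + r")*"
    + r"(?:,?\s+and\s+" + NAME + r")?)"
)


def extract_metadata(description: str) -> dict:
    """Return the fields that the hypothetical patterns find in a description."""
    found = {}
    if m := YEAR_RE.search(description):
        found["year"] = int(m.group(1))
    if m := DIRECTOR_RE.search(description):
        found["director"] = m.group(1)
    if m := CAST_RE.search(description):
        # Split the captured cast list on commas and on a final "and".
        names = re.split(r",\s*(?:and\s+)?|\s+and\s+", m.group(1))
        found["actors"] = [n.strip() for n in names if n.strip()]
    return found


print(extract_metadata(
    "A 1994 drama directed by Frank Darabont, starring Tim Robbins and Morgan Freeman."
))
# prints {'year': 1994, 'director': 'Frank Darabont', 'actors': ['Tim Robbins', 'Morgan Freeman']}
```

Rules like these are precise but brittle, which is why a hybrid approach backs them with a statistical model for phrasings the rules do not cover.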

Results

We achieved an error rate of 1 to 2% in the extracted metadata (depending on the field), with an average of 2.5 metadata items extracted per description.
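The case study does not define how the error rate was measured. One plausible reading, assumed here, is the share of extracted values per field that disagree with a manually checked reference, reported alongside the mean number of extracted items per description. The sketch below computes both; the `evaluate` function and the toy records are hypothetical and are not project data.

```python
from collections import Counter


def evaluate(predictions, gold):
    """Per-field error rate and mean number of extracted items per description.

    predictions, gold: parallel lists of dicts (one per description) that map
    a field name such as "year" to its value. This definition of the error
    rate is an assumption, not the project's documented metric.
    """
    extracted = Counter()  # how many values were extracted per field
    wrong = Counter()      # how many of those disagree with the reference
    for pred, truth in zip(predictions, gold):
        for field, value in pred.items():
            extracted[field] += 1
            if truth.get(field) != value:
                wrong[field] += 1
    error_rate = {field: wrong[field] / n for field, n in extracted.items()}
    avg_items = sum(extracted.values()) / len(predictions) if predictions else 0.0
    return error_rate, avg_items


# Toy records, invented for illustration.
preds = [{"year": 1994, "director": "Frank Darabont"}, {"year": 2001, "genre": "drama"}, {"year": 1999}]
gold = [{"year": 1994, "director": "Frank Darabont"}, {"year": 2000, "genre": "drama"}, {"year": 1999}]
rates, avg = evaluate(preds, gold)
print(rates, avg)
```

Computing the rate per field rather than pooling all fields matches the "depending on the field" qualifier in the results.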

Project duration

5 months

  

Technologies used

Python

NumPy

SpaCy

Jupyter

Pandas

SciPy

MetaLib

  

The path to success

Step 1: Clean the metadata

Improve the metadata used for training by finding external sources to fill gaps in the existing extracted metadata and to validate it
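One way to picture Step 1 is a merge between an extracted record and a record from an external source: missing fields are filled in, and fields that disagree are flagged for review. This is a minimal Python sketch under that assumption. The case study does not name the external sources, so `reference` is a hypothetical stand-in, as are `fill_and_validate` and `_norm`.

```python
from __future__ import annotations


def _norm(value):
    """Normalise strings so that 'michael mann ' and 'Michael Mann' compare equal."""
    return value.strip().casefold() if isinstance(value, str) else value


def fill_and_validate(extracted: dict, reference: dict) -> tuple[dict, list[str]]:
    """Fill gaps from a reference record and list fields that disagree with it.

    `reference` stands in for a record from an external metadata source;
    which sources the project used is not stated, so this is hypothetical.
    """
    merged = dict(extracted)
    conflicts = []
    for field, ref_value in reference.items():
        if merged.get(field) is None:
            merged[field] = ref_value  # fill a missing field
        elif _norm(merged[field]) != _norm(ref_value):
            conflicts.append(field)  # keep our value, flag it for review
    return merged, conflicts


merged, conflicts = fill_and_validate(
    {"title": "Heat", "year": 1996, "director": None},
    {"year": 1995, "director": "Michael Mann", "genre": "crime"},
)
print(merged, conflicts)
```

Flagging conflicts instead of overwriting them keeps a human in the loop for exactly the records where the training data is most likely to be wrong.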

Step 2: Create a pipeline for metadata extraction

Develop and train algorithms to extract specific information from descriptions
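Step 2 can be pictured as a chain of extractors run in priority order. The following is a minimal Python sketch under that assumption. `run_pipeline`, the two toy extractors, and the genre list are hypothetical; a trained statistical model would plug in as one more extractor at the end of the list.

```python
import re
from typing import Callable, Dict, List

# An extractor takes a description and returns the fields it found.
Extractor = Callable[[str], Dict[str, object]]

GENRES = ("comedy", "documentary", "drama", "thriller")  # hypothetical label set


def year_extractor(text: str) -> Dict[str, object]:
    """Regex stage: the first four-digit year between 1900 and 2099."""
    m = re.search(r"\b(?:19|20)\d{2}\b", text)
    return {"year": int(m.group(0))} if m else {}


def genre_keyword_extractor(text: str) -> Dict[str, object]:
    """Keyword stage: the first genre word from GENRES found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for genre in GENRES:
        if genre in words:
            return {"genre": genre}
    return {}


def run_pipeline(description: str, extractors: List[Extractor]) -> Dict[str, object]:
    """Run extractors in priority order; earlier stages win, later ones only fill gaps."""
    result: Dict[str, object] = {}
    for extract in extractors:
        for field, value in extract(description).items():
            result.setdefault(field, value)
    return result


print(run_pipeline("A 2003 British comedy.", [year_extractor, genre_keyword_extractor]))
# prints {'year': 2003, 'genre': 'comedy'}
```

Putting the most precise stages first means a noisier, model-based stage can only fill fields the rules left empty, which is one way to trade coverage against error rate.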

Step 3: Template analysis

Turn each description into an abstract sentence structure so that it can be reused as a template
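A simplified picture of Step 3 is to swap the metadata values found in a description for placeholders, then fill those placeholders with another program's metadata. The Python sketch below does this with plain string matching and the standard library's `string.Template`. It is a deliberate simplification of the dependency analysis the project used, and `to_template` and `fill_template` are hypothetical names.

```python
import re
from string import Template


def to_template(description: str, metadata: dict) -> str:
    """Swap known metadata values in a description for ${field} placeholders."""
    values = {str(v): field for field, v in metadata.items() if v is not None and str(v)}
    # Longest values first so that a full name wins over a shorter overlap;
    # the final alternative escapes literal "$" signs for string.Template.
    alternatives = [re.escape(v) for v in sorted(values, key=len, reverse=True)]
    pattern = re.compile("|".join(alternatives + [r"\$"]))

    def replace(match):
        text = match.group(0)
        return "${" + values[text] + "}" if text in values else "$$"

    return pattern.sub(replace, description)


def fill_template(template: str, metadata: dict) -> str:
    """Render a template with another program's metadata."""
    return Template(template).substitute(metadata)


template = to_template(
    "Heat (1995) is a crime thriller directed by Michael Mann.",
    {"title": "Heat", "year": 1995, "director": "Michael Mann"},
)
print(template)
# prints ${title} (${year}) is a crime thriller directed by ${director}.
print(fill_template(template, {"title": "Collateral", "year": 2004, "director": "Michael Mann"}))
# prints Collateral (2004) is a crime thriller directed by Michael Mann.
```

Plain substring matching can misfire (a title like "Heat" would also match inside "Heath"), and field names must be valid identifiers for `string.Template`. Working on the parsed sentence structure, as the dependency analysis above does, avoids the first problem.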


CLIENT’S OPINION


They have great attention to detail and background knowledge in the subject.

Alex Chelmis, Director

