Weighting meinGrün Zielgeometrien from geolocated Social Media posts for measuring greenspace popularity¶

This Notebook contains the complete process to apply weights to meinGrün Zielgeometrien extracted from LBSM data (geolocated Social Media Posts).

Intersection of points and coordinates is not complex. However, there're several potential pitfalls in the process.

Pitfalls:

for improving representativity, we're using different datasources (Twitter, Instagram Flickr)
- all of these have different composition, biases and sources of noise
for coordinate mapping, a primary source of noise are the different geoaccuracies (gps/'latlng')
- flickr has the highest geoaccuracy, while Twitter and Instagram are mostly geoaccuracy 'place'
- this must be taken into account when assigning coordinates to different shapes (we'll be using a SEARCH_RADIUS 25 and 50m)
the overall quantity of photos is spread quite differently
- Instagram makes up 80% of data, Flickr about 20% and Twitter 1%
- normalization of three datasources is necessary, so that each source has a balanced impact on final result
- Twitter may introduce biases as to the low amount of data, which can be taken into account by reducing Twitter overall impact
normalization on area is necessary, otherwise large shapes would dominate because of their higher likeliness of intersecting with many coordinates
- area must be calculated for different search distances used per origin:
  - 50m buffer area for Twitter/Instagram coordinates
  - 25m buffer area for Flickr coordinates
long tail: typically, Social Media posts peak at few places; but we're also interested in the 'long tail of data' (the other 80% of less frequented places)
- we can apply log or sqrt to flatten this peak and improve visibility of less frequented places
finally we need to classify and label data into arbritary classes
- after normalization, absolute numbers are meaningless, therefore we can interpolate results for each shape to 1 to 1000 range, for improving comparability across runs
- based on 1 to 1000 range, we can apply natural_breaks or other classifiers to label bins of common classes (e.g. 6 bins are used herein)

Lets first start with importing the necessary dependencies.

First, make sure you have these packages installed. Dependency management in python can be tricky. The following instructions are based on conda package manager (e.g. miniconda). By pinning one package (holoviews), we can make sure that 1) compatible package versions are installed for all other packages and 2) the code written herein is compatible.

conda create -n jupyter_env_meingruen python=3.7 -c pyviz -c conda-forge
conda activate jupyter_env_meingruen
conda config --env --set channel_priority strict
conda config --show channel_priority # verify
conda install -c pyviz -c conda-forge geoviews "bokeh<2.0.0" seaborn shapely "holoviews=1.12.7" geopandas matplotlib jupyterlab descartes mapclassify

Optional:

conda install -c pyviz -c conda-forge jupyter_contrib_nbextensions tagmaps

Alternatively, to create an env with the latest versions of bokeh, holoviews, geoviews, use the following approach:

conda create -n jupyter_env_meingruen python=3.7 -c "pyviz/label/dev" -c conda-forge
conda activate jupyter_env_meingruen
conda config --env --set channel_priority strict
conda install -c "pyviz/label/dev" -c conda-forge geoviews bokeh seaborn shapely holoviews geopandas matplotlib jupyterlab descartes mapclassify tagmaps jupyter_contrib_nbextensions

import math
import csv
import fiona
import geoviews as gv
import geoviews.feature as gf
import mapclassify as mc
import seaborn as sns
import datetime as dt
import geopandas as gp
import pandas as pd
import holoviews as hv
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from rtree import index
from pathlib import Path
from shapely.geometry import shape
from pyproj import Transformer, CRS, Proj
from shapely.ops import transform
from shapely.geometry import Point, Polygon, MultiPolygon
from shapely.geometry.collection import GeometryCollection
from holoviews import dim
from holoviews.operation.datashader import datashade, shade, dynspread, rasterize
from geopandas.tools import sjoin
from matplotlib.patches import Patch
# enable shapely.speedups which makes some of the spatial queries running faster.
import shapely.speedups
shapely.speedups.enable()
from cartopy import crs as ccrs
from collections import defaultdict
from collections import namedtuple
from tagmaps.classes.utils import Utils
from IPython.display import clear_output, Markdown, display
hv.notebook_extension('bokeh')

Plot used package versions:

import pkg_resources
root_packages = [
    'geoviews', 'geopandas', 'pandas', 'numpy', 'descartes',
    'matplotlib', 'shapely', 'cartopy', 'holoviews',
    'mapclassify', 'fiona', 'bokeh', 'pyproj', 'ipython',
    'jupyterlab', 'tagmaps']
root_packages.sort(reverse=True)
root_packages_list = []
for m in pkg_resources.working_set:
    if m.project_name.lower() in root_packages:
        root_packages_list.append([m.project_name, m.version])

display(pd.DataFrame(
            root_packages_list,
            columns=["package", "version"]
        ).set_index("package").transpose())
display(Markdown('---'))

Because we're going to display pretty large maps, graphics and interactive visualizations in this notebook, the following line may improve jupyter speed.

%config Completer.use_jedi = False

Define Input Parameters¶

This section contains the only manual code edits that are needed to run this notebook.

ZIELGEOMETRIEN_DATA: this is the geojson containing the Zielgeometrien (polygons), social media posts will be used to calculate weights for each polygon in this geojson
LBSM_INPUT_DATA: this is the social media data used to calculate weights, e.g. a collection of post-records with lat-lng coordinates this file was generated using tagmaps package, which applies filters and cleans up noisy social media raw data

The files should be places in a subfolder called "./01_Input/" (relative to this notebook).

# TARGETSHAPE_VERSION = 'v4.1'
# TARGETSHAPE_VERSION = 'v170919'
TARGETSHAPE_VERSION = 'v5'
INTERSECT_VERSION = 'v1.2'

## Heidelberg Offiziell
# LEVEL = "L0"
# TARGET_SHAPE_ID_COL = "id"
# GREEN_TARGET_COL = None
# ZIELGEOMETRIEN_DATA = "GeoJSON_Heidelberg_v05/greentargets_hd.json"

## Level 0:
LEVEL = "L0"
TARGET_SHAPE_ID_COL = "TARGET_ID"
GREEN_TARGET_COL = "TARGET_green_target"
ZIELGEOMETRIEN_DATA = "GeoJSON_Dresden_v05/targets.json"
# ZIELGEOMETRIEN_DATA = "GeoJSON_Heidelberg_v05/targets.json"


## Level 1 (Wege):
# LEVEL = "L1"
# TARGET_SHAPE_ID_COL = "WAY_ID"
# GREEN_TARGET_COL = "PARENT_TARGET_green_target"
# ZIELGEOMETRIEN_DATA = "GeoJSON_Dresden_4.1/targetParts_Wege.json"
# ZIELGEOMETRIEN_DATA = "GeoJSON_Heidelberg/targets_Wege.json"

## Level 2 (Wege LC):
# LEVEL = "L2"
# TARGET_SHAPE_ID_COL = "LC_ID"
# GREEN_TARGET_COL = "PARENT_TARGET_green_target"
# ZIELGEOMETRIEN_DATA = "GeoJSON_Dresden_4.1/targetParts_Wege_LC.json"
# ZIELGEOMETRIEN_DATA = "GeoJSON_Heidelberg/targets_Wege_LC.json"

## Level 3 (Wege LULC):
# LEVEL = "L3"
# TARGET_SHAPE_ID_COL = "LULC_ID"
# GREEN_TARGET_COL = "PARENT_TARGET_green_target"
# ZIELGEOMETRIEN_DATA = "GeoJSON_Dresden_4.1/targetParts_Wege_LULC.json"
# ZIELGEOMETRIEN_DATA = "GeoJSON_Heidelberg/targets_Wege_LULC.json"

# LBSM_INPUT_DATA = "Output_cleaned_Dresden_FlickrOnly.csv"
# LBSM_INPUT_DATA = "Output_cleaned_Heidelberg_FlickrOnly.csv"
# LBSM_INPUT_DATA = "2019-09-03_meinGruen_Heidelberg.csv"
LBSM_INPUT_DATA = "2019-09-02_meinGruen_Dresden.csv"

def get_str_formatted_today():
    """Returns date as string (YYYY-mm-dd)"""
    today = dt.date.today()
    today_str = today.strftime("%Y-%m-%d")
    return today_str
TODAY = get_str_formatted_today()
display(Markdown(f'**Last execution date: {TODAY}**'))

Define manual boundaries for map zooms

Define initial extent where the map should be zoomed to. If empty, the map will show the full boundary of the provided osm-graph (which may result in slow performance of the final map). To select boundary lat/lng coordinates, use (online tool link).

# Dresden Altstadt
LIM_LNG_MAX_1 = 13.750784
LIM_LNG_MIN_1 = 13.730367
LIM_LAT_MAX_1 = 51.055349
LIM_LAT_MIN_1 = 51.048746

# Dresden Center
LIM_LNG_MAX_2 = 13.788357
LIM_LNG_MIN_2 = 13.706689
LIM_LAT_MAX_2 = 51.060386
LIM_LAT_MIN_2 = 51.033973

# Dresden Neustadt:
LIM_LNG_MAX_3 = 13.782692
LIM_LNG_MIN_3 = 13.701024
LIM_LAT_MAX_3 = 51.087890
LIM_LAT_MIN_3 = 51.061492

# Heidelberg Center
LIM_LNG_MAX_4 = 8.73
LIM_LNG_MIN_4 = 8.68
LIM_LAT_MAX_4 = 49.420
LIM_LAT_MIN_4 = 49.404

# Total Bounds Dresden
LIM_LNG_MAX_5 = 14.0876945
LIM_LNG_MIN_5 = 13.46475261
LIM_LAT_MAX_5 = 51.24305035
LIM_LAT_MIN_5 = 50.90449772

Define the search max radius (in meters) to be used when intersecting polys with points and provide a name (used for writing data):

SEARCH_RADIUS_MAX =  50
CITY_NAME = "Dresden"
# CITY_NAME = "Heidelberg"

Load Shape Data¶

Load GeoJSON

geom_input_filepath = Path.cwd() / "01_Input" / ZIELGEOMETRIEN_DATA

%%time
meingruen_targetgeom_shapes = gp.read_file(geom_input_filepath)
total_bounds = meingruen_targetgeom_shapes.to_crs("epsg:4326").total_bounds
print(f'Total bounds: {total_bounds}')
print(f'Number of shapes: {len(meingruen_targetgeom_shapes)}')
print(f'{meingruen_targetgeom_shapes.crs}')

Total bounds: [13.4753296  50.91018206 14.0876945  51.24119145]
Number of shapes: 26707
epsg:4326
CPU times: user 7.11 s, sys: 312 ms, total: 7.42 s
Wall time: 7.42 s

Assign bounds to globa variables so we can use those later.

LIM_LNG_MIN, LIM_LAT_MIN, LIM_LNG_MAX, LIM_LAT_MAX = total_bounds

Project from WGS1984 (4326) to World Mercator (epsg:3395)

# gdf_targetgeom_shapes = meingruen_targetgeom_shapes.to_crs({'init': 'epsg:3395'})
gdf_targetgeom_shapes = meingruen_targetgeom_shapes.to_crs('epsg:3395')

Preview Shape Table

gdf_targetgeom_shapes.head()

Remove unnecessary columns

There're many columns in this dataset that are of no interest to the computation in this notebook. Have a look whats there:

list(gdf_targetgeom_shapes.columns.values)

['OBJECTID',
 'TARGET_ID',
 'TARGET_green_target',
 'TARGET_kontrolle_noetig',
 'TARGET_tats_zugang',
 'TARGET_stadt_area_perc',
 'TARGET_stadt_and_nbk_area_perc',
 'TARGET_tsp_enthalten',
 'TARGET_spielplatz_enthalten',
 'TARGET_park_enthalten',
 'TARGET_tsp_only',
 'TARGET_access',
 'TARGET_gemeinde_DD',
 'TARGET_name',
 'TARGET_names',
 'TARGET_type',
 'TARGET_types',
 'TARGET_origin',
 'TARGET_origins',
 'TARGET_AREA',
 'TARGET_LENGTH',
 'SHAPE_Length',
 'SHAPE_Area',
 'TARGET_access2',
 'geometry']

Drop columns that are not needed:

def try_drop_cols_df(gdf, drop_cols):
    """Drop columns from pandas dataframe with try..except"""
    for drop_col in drop_cols:
        # apply drop
        try:
            gdf.drop(
                columns=[drop_col], inplace=True)
        except KeyError:
            pass
        
drop_list = ['class0_ratio', 'class1_ratio', 'class2_ratio', 'class3_ratio', 
             'class4_ratio', 'class4_ratio', 'class4_ratio', 'class4_ratio',
             'classNoData_ratio', 'class0_area', 'class1_area', 'class2_area', 
             'class3_area', 'class4_area', 'classNoData_area', 'SHAPE_Length',
             'SHAPE_Area', 'WAY_AREA', 'WAY_LENGTH', 'PARENT_TARGET_ID',
             'PARENT_TARGET_gemeinde_DD','PARENT_WAY_ID', 'LULC_origin', 
             'LULC_origin', 'LULC_access', 'LULC_name', 'LULC_type', 
             'LULC_type_ger', 'LULC_rank', 'LULC_AREA', 'LULC_LENGTH', 
             'LC_LENGTH', 'LC_AREA', 'class_name', 'class_code',
             'TARGET_access', 'TARGET_name', 'TARGET_names', 'TARGET_type',
             'TARGET_types', 'TARGET_AREA', 'TARGET_LENGTH', 
             'text_1', 'text_3', 'text_2']

# apply drop
try_drop_cols_df(gdf_targetgeom_shapes, drop_list)

Check the new list of remaining columns

list(gdf_targetgeom_shapes.columns.values)

['OBJECTID',
 'TARGET_ID',
 'TARGET_green_target',
 'TARGET_kontrolle_noetig',
 'TARGET_tats_zugang',
 'TARGET_stadt_area_perc',
 'TARGET_stadt_and_nbk_area_perc',
 'TARGET_tsp_enthalten',
 'TARGET_spielplatz_enthalten',
 'TARGET_park_enthalten',
 'TARGET_tsp_only',
 'TARGET_gemeinde_DD',
 'TARGET_origin',
 'TARGET_origins',
 'TARGET_access2',
 'geometry']

Optional: Preview Plot with Geopandas directly¶

gdf_targetgeom_shapes.geometry.plot(figsize=(20,10))

<matplotlib.axes._subplots.AxesSubplot at 0x7fefe9446e50>

Print bounds of target shapes (World Mercator Porjection)

gdf_targetgeom_shapes.geometry.total_bounds

array([1500066.82912089, 6572236.50305387, 1568234.97819121,
       6630728.67034656])

Projections UTM¶

There are several projections used in this notebook:

Web Mercator WGS1984 (epsg:4326) for input data/output data in decimal degrees
World Mercator (epsg:3395) for map display in Geoviews/Bokeh
UTM projected coordinate system for local distance calculations

For accurate calculations in meters distance, we need to project data to UTM. Tagmaps package has a methods that will return the best UTM projection based on a lat-lng coordinate:

def get_poly_from_bounds(bounds):
    '''Add small buffer around path bounds''' 
    xmin, ymin, xmax, ymax = bounds
    bbox_polygon = Polygon([[xmin, ymax], [xmax, ymax], [xmax, ymin], [xmin, ymin]])
    return bbox_polygon

bound_poly = get_poly_from_bounds(total_bounds)
print(f'input_lon_center: {bound_poly.centroid.x}')
print(f'input_lat_center: {bound_poly.centroid.y}')
# Auto detection epsg_code:
epsg_code = Utils._convert_wgs_to_utm(
            bound_poly.centroid.x, bound_poly.centroid.y)
print(epsg_code)

input_lon_center: 13.781512049000042
input_lat_center: 51.07568675400005
32633

Or, specify manually:

crs_proj = f"epsg:{epsg_code}"
crs_wgs = "epsg:4326"

# define Transformer ahead of time
# with xy-order of coordinates
proj_transformer = Transformer.from_crs(
    crs_wgs, crs_proj, always_xy=True)

def project_geometry(geom):
    """Project geometries using shapely and proj.Transform"""
    geom_proj = transform(proj_transformer.transform, geom)
    return geom_proj

# report
print(f'Target EPSG Code: {epsg_code}')
print(f'Target projection: {crs_proj}')

Target EPSG Code: 32633
Target projection: epsg:32633

Optional: Plot Preview in Geoviews with Bokeh Backend¶

# convert to WGS1984: epsg:4326
gdf_targetgeom_shapes.to_crs(crs_proj, inplace=True)

# convert to Geoviews Shapes
gv_shapes = gv.Polygons(
    [gv.Shape(feat) for feat in gdf_targetgeom_shapes.geometry if isinstance(feat, (Polygon, MultiPolygon))],
    crs=ccrs.epsg(epsg_code))

Plot Rasterized static with datashade

use datashade, faster display but lower quality

datashade(gv_shapes).opts(width=800, height=480)

Plot interactive rasterized

%%opts Image [colorbar=True clipping_colors={'NaN': (0, 0, 0, 0)}]

original = rasterize(gv_shapes, precompute=True)
clipped = dynspread(datashade(gv_shapes))
# plot
gv.tile_sources.EsriImagery * \
clipped.opts(
    width=800,
    height=480,
    alpha=0.8)

Intersection¶

For appending points to polygons there're several options available:

use geopandas.sjoin (unfortunately, no distance available). Example
use spatial rtree and best fit custom method
use from shapely.ops import nearest_points

Since it is the fastest and most versatile, we'll explore the second option.

1. Load LBSM Data

Load CSV of LBSM points to Pandas dataframe, two formats available

Raw data (Instagram, Flickr, Twitter)
Cleaned Format merged based on UPL (only use Flickr because Instagram/Twitter are not available in highest geo-accuracy).

First, detect the format.

file_input_path = Path.cwd() / "01_Input" / LBSM_INPUT_DATA
with open(file_input_path, 'r') as infile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames
# there're two different formats of input data
# cleaned_posts (UPL)
# and raw data (UP)
# cleaned posts have a slightly different format
# which is detected here based on the CSV's header
if 'latitude' in fieldnames:
    raw_format = True
    usecols = ['origin_id', 'latitude', 'longitude', 'user_guid', 'post_create_date', 'post_publish_date']
    dtypes={'origin_id': str,'latitude': float, 'longitude': float, 'user_guid': str, 'post_create_date': str, 'post_publish_date': str}
elif 'lat' in fieldnames:
    raw_format = False
    usecols = ['origin_id', 'lat', 'lng', 'user_guid', 'post_create_date', 'post_publish_date']
    dtypes={'origin_id': str,'lat': float, 'lng': float, 'user_guid': str, 'post_create_date': str, 'post_publish_date': str}
else:
    print("Unknown Format")

Load data in dataframe df

%%time
df = pd.read_csv(file_input_path, usecols=usecols, dtype=dtypes, encoding='utf-8')
print(f'Loaded {len(df)} posts.')

Loaded 991117 posts.
CPU times: user 4 s, sys: 594 ms, total: 4.59 s
Wall time: 4.63 s

For Flickr, post_create_date and post_publish_date is available, for Instagram and Twitter, only post_publish_date is available.

Combine both with a preference on post_create_date
wrapped in try..except for multiple execution handling
rename final columns

try:
    df['post_create_date'].fillna(df['post_publish_date'], inplace=True)
    df.drop(columns=['post_publish_date'], inplace=True)
    df.rename(columns={'post_create_date': 'post_date'}, inplace=True)
except KeyError:
    pass

df.head()

Optional: Get some statistics for origin

1 = Instagram
2 = Flickr
3 = Twitter

plt.rcParams.update({'xtick.labelsize': 'small'})
plt.rcParams.update({'ytick.labelsize': 'small'})
plt.rc('legend', fontsize=10)

sns.countplot(x='origin_id', hue='origin_id', data=df)

<matplotlib.axes._subplots.AxesSubplot at 0x7fefac201cd0>

size_count = df.groupby('origin_id').size()
total_count = len(df)
for key, value in size_count.items():
    print(f'Origin ID {key} with {value} posts, total percentage is: {value/(total_count/100):.2f}%')

Origin ID 1 with 806181 posts, total percentage is: 81.34%
Origin ID 2 with 183249 posts, total percentage is: 18.49%
Origin ID 3 with 1687 posts, total percentage is: 0.17%

2. Convert pandas dataframe to geopandas geodataframe

def make_point(row):
    if raw_format:
        return Point(row.longitude, row.latitude)
    return Point(row.lng, row.lat)

# Go through every row, and make a point out of its lat and lon
points = df.apply(make_point, axis=1)

# Make a new GeoDataFrame
# using the data from our old df
# but also adding in the geometry we just made
lbsm_data_gdf = gp.GeoDataFrame(df, geometry=points, crs=crs_wgs)
# project to UTM
lbsm_data_gdf = lbsm_data_gdf.set_geometry(lbsm_data_gdf.geometry.apply(project_geometry))
# alternative for newer geopandas versions:
# lbsm_data_gdf.to_crs(crs_proj, inplace=True)

lbsm_data_gdf.head()

Optional: Preview Single Target Geometry Graph Polygon Intersection¶

Lets test the inteserction for a small sample

1 polygon from meiNGrün Target Geometries
1 lbsm post

a) Get one polygon from meinGrün Target Geometries

modify the number to get different samples

# Get row for specific ID
id = 9
# id = 20
# id = 26
# id = 29
# id = 42
gdf_targetgeom_shapes.iloc[id].geometry

It is possible to get bounds of the line segment by:

gdf_targetgeom_shapes.iloc[id].geometry.bounds

(415010.3708631706, 5655820.556908655, 415043.3199936065, 5655852.084404823)

b) Prepare poly geoviews layer

gv_sample_poly = gv.Shape(
    gdf_targetgeom_shapes.iloc[id].geometry,
    label=f'meinGrün Target Shape', crs=ccrs.epsg(epsg_code))
gv_sample_poly_buf_50m = gv.Shape(
    gdf_targetgeom_shapes.iloc[id].geometry.buffer(50, resolution=3, cap_style=1, join_style=1, mitre_limit=5.0),
    label=f'Search Radius 50m', crs=ccrs.epsg(epsg_code))
gv_sample_poly_buf_25m = gv.Shape(
    gdf_targetgeom_shapes.iloc[id].geometry.buffer(25, resolution=3, cap_style=1, join_style=1, mitre_limit=5.0),
    label=f'Search Radius 25m', crs=ccrs.epsg(epsg_code))

c) Optional: Test Prepare point geoviews layer (slow)

%%time
def add_buffer(geom, buffer_num=50, return_poly=False):
    '''Add small buffer around poly bounds
    Buffer defaults to: 50m
    ''' 
    xmin, ymin, xmax, ymax = geom.bounds
    xmin = xmin - buffer_num
    ymin = ymin - buffer_num
    xmax = xmax + buffer_num
    ymax = ymax + buffer_num
    if return_poly:
        bbox_polygon = Polygon(
            [[xmin, ymax],
             [xmax, ymax],
             [xmax, ymin],
             [xmin, ymin]])
        return bbox_polygon
    return xmin, ymin, xmax, ymax

# adjust number for sample polygon
sample_polygon = gdf_targetgeom_shapes.iloc[id]
xmin, ymin, xmax, ymax = add_buffer(
    sample_polygon.geometry)

# intersect/clip points based on bbox input
# this is very slow! we'll use an improved version for mass intersection
sample_point_gdf = lbsm_data_gdf.cx[xmin:xmax, ymin:ymax]

print(f'Intersections found for id {id}: {len(sample_point_gdf)} posts.')

# for automatic selection of polygon with many intersections
# use code block below

# max_intersect = 0
# max_id_ref = None
# for id in range(1,30):
#     # adjust number for sample polygon
#     sample_polygon = gdf_targetgeom_shapes.iloc[id]
#     xmin, ymin, xmax, ymax = add_buffer(
#         sample_polygon.geometry)
#     
#     # intersect/clip points based on bbox input
#     sample_point_temp = lbsm_data_gdf.cx[xmin:xmax, ymin:ymax]
#     number_of_intersections = len(sample_point_temp)
#     if number_of_intersections > max_intersect:
#         max_id_ref = id
#         max_intersect = number_of_intersections
#         sample_point_gdf = sample_point_temp
#         
# # report
# print(f'Max Intersections found for id {max_id_ref}: {len(sample_point_gdf)} posts.')

Intersections found for id 9: 1 posts.
CPU times: user 13 s, sys: 359 ms, total: 13.4 s
Wall time: 13.4 s

d) Prepare Geoviews Layer for selected Sample Points

sample_point_selection = gv.Points(sample_point_gdf, crs=ccrs.epsg(epsg_code))
# sample_point_selection_25m = gv.Points(sample_point_gdf.loc[sample_point_gdf['origin_id'] == "2"], crs=ccrs.epsg(epsg_code))
# sample_point_selection_50m = gv.Points(sample_point_gdf.loc[sample_point_gdf['origin_id'].isin(["1", "3"])], crs=ccrs.epsg(epsg_code))

e) Output sample selection map

def set_active_tool(plot, element):
    # enable wheel_zoom by default
    plot.state.toolbar.active_scroll = plot.state.tools[0]

# the Overlay will automatically extent to data dimensions,
# the polygons of HDBSCAN cover the city area:
# since we want the initial zoom to be zoomed in to the selected sample geometry
# we need to supply xlim and ylim
# The Bokeh renderer only support Web Mercator Projection
# we therefore need to project Großer Garten Bounds first
# from WGS1984 to Web Mercator
sample_bottomleft = proj_transformer.transform(
    xmin, ymin)
sample_topright = proj_transformer.transform(
    xmax, ymax)

# to show 200m radius
# centre_line = pyproj.transform(
#     pyproj.Proj(init=f'EPSG:{epsg_code}'),
#     pyproj.Proj(init='EPSG:3857'),
#     gdf_targetgeom_shapes.iloc[max_id_ref].geometry.centroid.x,
#     gdf_targetgeom_shapes.iloc[max_id_ref].geometry.centroid.y)

hv.Overlay(
    gv.tile_sources.EsriImagery * \
    gv_sample_poly.opts(line_width=2, fill_alpha=0.5, line_color='red', show_legend=True) * \
    gv_sample_poly_buf_25m.opts(line_width=2, fill_alpha=0, line_color='red', line_dash='dotted', show_legend=True) * \
    gv_sample_poly_buf_50m.opts(line_width=2, fill_alpha=0, line_color='orange', line_dash='dotted', show_legend=True) * \
    # sample_point_selection_50m.opts(line_color='orange', line_width=3, fill_alpha=0.5, tools=['hover'], show_legend=True) * \
    # sample_point_selection_25m.opts(color='red', line_width=3, fill_alpha=0.5, tools=['hover'], show_legend=True) * \
    # hv.Ellipse(centre_line[0], centre_line[1], 200, label='200m Buffer').opts(line_color='red', line_width=3, show_legend=True) * \
    sample_point_selection.opts(tools=['hover'], size=5, color='yellow') # * \
    # gv_testpoint
    ).opts(
        width=800,
        height=480,
        finalize_hooks=[set_active_tool],
        xlim=(sample_bottomleft[0], sample_topright[0]),
        ylim=(sample_bottomleft[1], sample_topright[1])
    )

Intersect test using rtree¶

We'll create an rtree with boxes for each line in the OSM Graph. This will be the most expensive operation, afterwards we can use Shapely's nearest function to snap points to a small number of possible lines. First, construct RTree for the geometry from osm paths. We'll extend the boundary by a small buffer surrounding the path so that we can catch points at both ends.

adjust buffer_num to your needs, minimum buffer_num should be the maximum SEARCH_RADIUS beeing used

%%time
# Populate R-tree index with bounds of polygons
idx = index.Index()
for pos, poly in enumerate(gdf_targetgeom_shapes.geometry):
    idx.insert(pos, add_buffer(poly, buffer_num=50))

CPU times: user 3.19 s, sys: 31.2 ms, total: 3.22 s
Wall time: 3.23 s

Intersect our testpoint with the rtree

Next, we intersect our testpoint with the rtree and select those paths that are near the testpoint. We can define a search_radius. This is very fast because the heavy lifting is done by the spatial rtree intersection (idx.intersection((testpoint.coords[0]))) while the expensive distance calculation from shapely (gdf.geometry[i].project(testpoint)) is only done for the few paths objects that are preselected based on their boundary.

Important: Modify SEARCH_RADIUS (m) to test output for various distances

Since Instagram and Twitter have lower Geoaccuracies on average, we use a large Search Radius of 50m, wheras for Flickr we'll use 25m.

SEARCH_RADIUS_ORIGIN = {
    "1":50,"2":25,"3":50
}

lbsm_data_gdf.head()

%%time
# Query testpoint to see which polygon it is in
# using first Rtree index, then Shapely geometry's within
# SEARCH_RADIUS =  25
# to look or other points here, override id:
id = 101
lbsm_post = lbsm_data_gdf.iloc[id]
SEARCH_RADIUS = SEARCH_RADIUS_ORIGIN.get(lbsm_post.origin_id)

testpoint = lbsm_post.geometry
gv_testpoint = gv.Points(
    [testpoint], crs=ccrs.epsg(epsg_code), label='Testpoint'
    ).opts(tools=['hover'], size=5, color='yellow')
# calculate boundary to show
xmin, ymin, xmax, ymax = add_buffer(lbsm_post.geometry)
sample_bottomleft = proj_transformer.transform(
    xmin, ymin)
sample_topright = proj_transformer.transform(
    xmax, ymax)

# with rtree only
poly_idx_rtree = [i for i in idx.intersection((testpoint.coords[0]))]

# with search radius
poly_idx = [i for i in idx.intersection((testpoint.coords[0]))
            if gdf_targetgeom_shapes.geometry[i].distance(testpoint) < SEARCH_RADIUS]
            #if gdf.geometry[i].interpolate(gdf.geometry[i].project(testpoint)).distance(testpoint) > search_radius]

# without search radius, extact intersection
poly_idx_exact = [i for i in idx.intersection((testpoint.coords[0]))
                   if gdf_targetgeom_shapes.geometry[i].intersects(testpoint)]

# report
print(f'{len(poly_idx_rtree)} preselected polygons through rtree')
print(f'{len(poly_idx)} potentially relevant polygons found (within search radius {SEARCH_RADIUS}).')
print(f'{len(poly_idx_exact)} exact intersection polygons found.')

7 preselected polygons through rtree
4 potentially relevant polygons found (within search radius 50).
1 exact intersection polygons found.
CPU times: user 15.6 ms, sys: 0 ns, total: 15.6 ms
Wall time: 200 ms

print(poly_idx)

[14948, 8372, 8521, 4357]

Select subset of relevant polygons based on id's

gdf_targetgeom_shapes.iloc[poly_idx].head()

prepare geoviews layer for preview

testpoly_sel = gv.Polygons(
    [gv.Shape(feat) for feat in gdf_targetgeom_shapes.iloc[poly_idx].geometry],
    crs=ccrs.epsg(epsg_code),
    label=f'Found {len(poly_idx)} Target Polygons (search radius {SEARCH_RADIUS}m)')
testpoly_sel_exact = gv.Polygons(
    [gv.Shape(feat) for feat in gdf_targetgeom_shapes.iloc[poly_idx_exact].geometry],
    crs=ccrs.epsg(epsg_code),
    label=f'Found {len(poly_idx_exact)} Target Polygons (exact intersection)')

output map (careful: might be slow)

hv.Overlay(
    gv.tile_sources.EsriImagery * \
    # gv_shapes.opts(line_color='white', line_width=1) * \
    testpoly_sel.opts(line_color='yellow', line_width=3, fill_alpha=0, show_legend=True) * \
    testpoly_sel_exact.opts(line_color='red', line_width=3, fill_alpha=0, show_legend=True, line_dash='dotted') * \
    gv_testpoint.opts(tools=['hover'], size=10, color='red', show_legend=True)
    #hv.Ellipse(gv_testpoint.x, gv_testpoint.y, 35, label='200m Buffer').opts(line_color='yellow', line_width=1, show_legend=True)
    ).opts(
        width=800,
        height=480,
        xlim=(sample_bottomleft[0], sample_topright[0]),
        ylim=(sample_bottomleft[1], sample_topright[1]),
        title_format=f"Selected Target Geometries for Testpoint and {SEARCH_RADIUS}m Search Radius",
    )

Intersect all points with grid and rtree¶

We execute this intersection for each point in our LBSM dataset. We update all counts in close polygons from the meiNGrün Target Geometries

1. Add statistics field to input geometries

gdf_targetgeom_shapes.head()

Initialize count structures

They're several things we want to measure per meinGrün target shape:

User_Posts: Number of social media posts in total
User_Days: Counting each user once per day (post timestamp)
User_Post_Locations: Counting each user-location coordinate only once (post latlng)
InstagramCount: Number of Instagram posts in total
FlickrCount: Number of Flickr posts in total
TwitterCount: Number of Twitter posts in total

# init count structures
gdf_targetgeom_shapes['User_Posts'] = 0
gdf_targetgeom_shapes['User_Days'] = [set() for _ in range(len(gdf_targetgeom_shapes))]
gdf_targetgeom_shapes['User_Post_Locations'] = [set() for _ in range(len(gdf_targetgeom_shapes))]
gdf_targetgeom_shapes['InstagramCount'] = 0
gdf_targetgeom_shapes['FlickrCount'] = 0
gdf_targetgeom_shapes['TwitterCount'] = 0
gdf = gdf_targetgeom_shapes

gdf_targetgeom_shapes.head()

%%time

def get_userday(post):
    """Remove time from date-time string, e.g. 2019-01-01"""
    if isinstance(post.post_date, str) and len(post.post_date) >= 10:
        return f'{post.user_guid}:{post.post_date[:10]}'
    return f'{post.user_guid}:1900-01-01'
def get_location(post_geom):
    """Concat lat:lng to string"""
    return f'{post_geom.y}:{post_geom.x}'
def concat_userlocation(user_guid, location_str):
    """Concat user:lat:lng to string"""
    return f'{user_guid}:{location_str}'

    
x = 0
x_i = 0
# get ids from column names
column_id_up = gdf.columns.get_loc("User_Posts")
column_id_ud = gdf.columns.get_loc("User_Days")
column_id_upl = gdf.columns.get_loc("User_Post_Locations")
col_id_instagram = gdf.columns.get_loc('InstagramCount')
col_id_flickr = gdf.columns.get_loc('FlickrCount')
col_id_twitter = gdf.columns.get_loc('TwitterCount')
origin_col_dict = {
    "1":col_id_instagram, "2":col_id_flickr, "3":col_id_twitter}
# initialize dict lookup for location-poly-idx
latlng_dict_polyidx = dict()

total_records = len(lbsm_data_gdf)
for lbsm_record in lbsm_data_gdf.itertuples():
    # print(lbsm_record.geometry)
    # break
    # update status
    x += 1
    msg_text = (
        f'Processed records: {x} ({x/(total_records/100):.2f}%). '
        f'Intersection found: {x_i}')
    if x % 100 == 0:
        clear_output(wait=True)
        print(msg_text)
        
    # get search radius based on currrent posts's origin
    SEARCH_RADIUS = SEARCH_RADIUS_ORIGIN.get(
        lbsm_record.origin_id)
    # we save some processing speed by remembering 
    # repeated locations and sacrificing some ram
    # e.g. spatial intersection is more expensive than dict-lookup
    location_str = get_location(lbsm_record.geometry)
    if location_str in latlng_dict_polyidx:
        poly_idx = latlng_dict_polyidx[location_str]
    else:
        poly_idx = [i for i in idx.intersection((lbsm_record.geometry.coords[0]))
                if gdf.geometry[i].distance(lbsm_record.geometry) < SEARCH_RADIUS]
        # add to dict for improving speed for repetitive locations
        latlng_dict_polyidx[location_str] = poly_idx
    # count intersection
    if len(poly_idx) > 0:
        x_i += 1
        userday = get_userday(lbsm_record)
        userlocation = concat_userlocation(lbsm_record.user_guid, location_str)
        # update poly count column
        # the iloc indexer for Pandas Dataframe is used for integer-
        # location based indexing / selection by position.
        # this may update/increment multiple rows/records
        origin_col = origin_col_dict.get(lbsm_record.origin_id)
        gdf.iloc[poly_idx, origin_col] += 1
        gdf.iloc[poly_idx, column_id_up] += 1
        # the following 2 operations are quite expensive on memory/cpu
        # disable if not needed
        # gdf.iloc[poly_idx, column_id_ud].apply(lambda x: x.add(userday))
        # gdf.iloc[poly_idx, column_id_upl].apply(lambda x: x.add(userlocation))
    
# final status
clear_output(wait=True)
print(msg_text)

Processed records: 991117 (100.00%). Intersection found: 891430
CPU times: user 32min 50s, sys: 25.7 s, total: 33min 15s
Wall time: 33min 4s

Optional: store or load intermediate aggregated dataframe results

Store:

gdf.to_pickle(f"gdf_{CITY_NAME}_{LEVEL}_{TARGETSHAPE_VERSION}_{INTERSECT_VERSION}.pickle")

Load:

gdf = pd.read_pickle(f"gdf_{CITY_NAME}_{LEVEL}_{TARGETSHAPE_VERSION}_{INTERSECT_VERSION}.pickle")

gdf.head()

Preview results table

gdf[[TARGET_SHAPE_ID_COL, 'User_Posts', 'User_Days', 'User_Post_Locations', 'InstagramCount', 'FlickrCount', 'TwitterCount']].head()

Convert Sets to unique counts

gdf['UD'] = gdf.apply(lambda row: len(row.User_Days), axis=1)
gdf['UPL'] = gdf.apply(lambda row: len(row.User_Post_Locations), axis=1)

gdf[[TARGET_SHAPE_ID_COL, 'User_Posts', 'User_Days', 'User_Post_Locations', 'UD', 'UPL', 'InstagramCount', 'FlickrCount', 'TwitterCount']].head()

Remove set column, since they're not needed anymore:

gdf.drop(columns=['User_Post_Locations', 'User_Days'], inplace=True)
gdf.rename(columns={'User_Posts': 'UP'}, inplace=True)
gdf.rename(columns={'InstagramCount': 'UP_Instagram'}, inplace=True)
gdf.rename(columns={'FlickrCount': 'UP_Flickr'}, inplace=True)
gdf.rename(columns={'TwitterCount': 'UP_Twitter'}, inplace=True)

Out final Geodataframe with aggregated measurements per green target shape:

gdf[[TARGET_SHAPE_ID_COL, 'UP', 'UD', 'UPL', 'UP_Instagram', 'UP_Flickr', 'UP_Twitter']].head()

gdf[gdf.UP > 0].head()

Get Histogram

The histogram shows the typical long tail pattern: most shapes have few posts, but a few shapes have a lot of posts.

np.sqrt(np.sqrt(gdf['UP'])).hist(bins=20, alpha=0.5)

<matplotlib.axes._subplots.AxesSubplot at 0x7fef6b4bc810>

Normalize for visualization¶

Absolute counts per shape are not very meaningful, since large shapes have a higher likeliness of intersecting with many posts.

We can normalize using the area (square kilometres) of each target shape)
Furthermore, we can "flatten" the long tail a bit by applying a square root function
these normalized numbers are only relevant based on their relative value, therefore we can stretch results to 1 to 1000 range for comparability across different runs

Calculate buffered Area for search distances

# Get row for specific ID
id = 0
gdf.iloc[id].geometry

gdf.iloc[id].geometry.buffer(50, resolution=3, cap_style=1, join_style=1, mitre_limit=5.0)

print(f'{gdf.iloc[id].geometry.area:.2f} m²')
print(f'50m Search radius: {gdf.iloc[id].geometry.buffer(50, resolution=16, cap_style=1, join_style=1, mitre_limit=5.0).area:.2f} m²')
print(f'25m Search radius: {gdf.iloc[id].geometry.buffer(25, resolution=16, cap_style=1, join_style=1, mitre_limit=5.0).area:.2f} m²')

578.44 m²
50m Search radius: 13701.92 m²
25m Search radius: 5216.68 m²

def m2_to_km2(m_num):
    """Convert square meters to square kilometres"""
    return m_num/1000000

print(f'{m2_to_km2(gdf.iloc[id].geometry.area):.4f} km²')
print(f'50m Search radius: {m2_to_km2(gdf.iloc[id].geometry.buffer(50, resolution=16, cap_style=1, join_style=1, mitre_limit=5.0).area):.4f} km²')
print(f'25m Search radius: {m2_to_km2(gdf.iloc[id].geometry.buffer(25, resolution=16, cap_style=1, join_style=1, mitre_limit=5.0).area):.4f} km²')

0.0006 km²
50m Search radius: 0.0137 km²
25m Search radius: 0.0052 km²

Calculate km²

for all geometries in the dataset
for all 25m buffered geometries in dataset
for all 50m buffered geometries in dataset

def shape_buf(geom, buf_dist_m):
    """Buffer geometry based on m distance"""
    buf_geom = geom.buffer(buf_dist_m, resolution=16, cap_style=1, join_style=1, mitre_limit=5.0)
    return buf_geom

gdf['area_skm'] = gdf['geometry'].apply(lambda x: m2_to_km2(x.area))
# same result:
# gdf['poly_area'] = gdf.geometry.area / 10**6
gdf['area_skm_SR25'] = gdf['geometry'].apply(lambda x: m2_to_km2(shape_buf(x, 25).area))
gdf['area_skm_SR50'] = gdf['geometry'].apply(lambda x: m2_to_km2(shape_buf(x, 50).area))

# gdf[[TARGET_SHAPE_ID_COL, 'UP', 'UD', 'UPL', 'UP_Instagram', 'UP_Flickr', 'UP_Twitter', 'area_skm', 'area_skm_SR25', 'area_skm_SR50']].head()
gdf[[TARGET_SHAPE_ID_COL, 'UP', 'UP_Instagram', 'UP_Flickr', 'UP_Twitter', 'area_skm', 'area_skm_SR25', 'area_skm_SR50']].head()

Normalize

def stretch_norm_dfcolumn(df, column, min_stretch, max_stretch):
    """Interpolates a Geodataframe column's values to a new range"""
    min_val = df[column].min()
    max_val = df[column].max()
    # replace values in-place
    # the lambda function is applied to all values from _column_
    # df.loc[:, column] means: replace values of _column_ in-place (modifying the original dataframe)
    # .loc[row_indexer,col_indexer]
    df.loc[:, column] = df[column].apply(
        lambda x: np.interp(x, (min_val, max_val), (min_stretch, max_stretch)))
    # similar but not recommended:
    # df[column] = np.interp(df[column], (min_val, max_val), (min_stretch, max_stretch))

# it is possible to 'flatten' the unequal distribution of values
# use one time sqrt for a bit of flatten, use two times sqrt for more flatten
gdf['UP_Norm'] = np.sqrt(gdf['UP']/gdf['area_skm'])
# gdf['UP_Norm'] = np.sqrt(np.sqrt(gdf['UP']/gdf['poly_area']))

# interpolate to 1 to 1000 range
stretch_norm_dfcolumn(gdf, 'UP_Norm', 1, 1000)

# print
gdf[[TARGET_SHAPE_ID_COL, 'UP', 'UP_Norm', 'UP_Instagram', 'UP_Flickr', 'UP_Twitter']].sort_values('UP_Norm', ascending=False).head()
# gdf[[TARGET_SHAPE_ID_COL, 'UP', 'UP_Norm', 'UD', 'UPL', 'UP_Instagram', 'UP_Flickr', 'UP_Twitter']].sort_values('UP_Norm', ascending=False).head()

Normalize Origin Influence

It becomes obvious that Instagram dominates results. We want to generate an even result between the 3 data sources.

Our next goal therefore is to allow only

40% Influence by Instagram (reduced because Instagram has low geoaccuracy)
50% Influence by Flickr (increased because Flickr featurss highest geoaccuracy)
10% Influence by Twitter (reduced because twitter features sampling noise)

We can do this by using the information from UP_Instagram, UP_Flickr, and UP_Twitter columns.

Since we've used a Search_Radius of 50m for Instagram and Twitter and a Search_Radius of 25m for Flickr,
we need to normalize count values separately based on origin and buffered area.

Remember the original percentage of the three datasources:

origin_perc_total = dict()
for key, value in size_count.items():
    total_perc = value/(total_count/100)
    print(f'Origin ID {key} with {value} posts, total percentage is: {total_perc:.2f}%')
    origin_perc_total[key] = total_perc

Origin ID 1 with 806181 posts, total percentage is: 81.34%
Origin ID 2 with 183249 posts, total percentage is: 18.49%
Origin ID 3 with 1687 posts, total percentage is: 0.17%

Thus, we need to perform a two step normalization:

first, normalize based on total percentage of each datasource
second, apply manuall influence flags (40%, 50%, 10%)

def norm_ip_origin(row):
    total_up = row['UP']
    # get one third of the total count of row
    if total_up == 0:
        return 0
    # get total counts per origin
    # normalize per search radius
    # flatten curve using square
    instagram_up = np.sqrt(row['UP_Instagram']/row['area_skm_SR50'])
    flickr_up = np.sqrt(row['UP_Flickr']/row['area_skm_SR25'])
    twitter_up = np.sqrt(row['UP_Twitter']/row['area_skm_SR50'])
    total_up = instagram_up + flickr_up + twitter_up
    # calc perc per origin percentage
    # and apply manual weights
    instagram_up_norm = instagram_up / (origin_perc_total.get("1") / 100) * 0.4
    flickr_up_norm = flickr_up  / (origin_perc_total.get("2") / 100) * 0.5
    twitter_up_norm = twitter_up / (origin_perc_total.get("3") / 100) * 0.1
    sum_norm = instagram_up_norm + flickr_up_norm + twitter_up_norm
    return sum_norm
    
gdf['UP_Norm_Origin'] = gdf.apply(lambda row: norm_ip_origin(row), axis=1)
# gdf['UP_Norm_Origin'] = np.sqrt(gdf['UP_Norm_Origin']/gdf['poly_area'])
stretch_norm_dfcolumn(gdf, 'UP_Norm_Origin', 1, 1000)

Compare difference of UP_Norm and UP_Norm_Origin

# gdf[[TARGET_SHAPE_ID_COL, 'UP', 'UP_Norm', 'UP_Norm_Origin', 'UD', 'UPL', 'UP_Instagram', 'UP_Flickr', 'UP_Twitter']].sort_values('UP', ascending=False).head(20)
gdf[[TARGET_SHAPE_ID_COL, 'UP', 'UP_Norm', 'UP_Norm_Origin', 'UP_Instagram', 
     'UP_Flickr', 'UP_Twitter', 'area_skm', 'area_skm_SR25', 'area_skm_SR50']].sort_values('UP', ascending=False).head(20)

rows with more equal distribution of posts across Flickr, Twitter and Instagram are increased in ranking/weights
rows with unequal distribution of posts towards only one data source (e.g. Instagram) are reduced in their importance

gdf['UP_Norm_Origin'].hist(bins=20, alpha=0.5)

<matplotlib.axes._subplots.AxesSubplot at 0x7fef6cdfb5d0>

# get top poly
top_norm = gdf.sort_values('UP_Norm_Origin', ascending=False).iloc[0]
top_norm_count = gdf.sort_values('UP', ascending=False).iloc[0]

# normalized
top_norm_poly = top_norm.geometry
top_norm_value = top_norm.UP_Norm_Origin

# raw
top_raw_poly_count = top_norm_count.geometry
top_raw_value_count = top_norm_count.UP

# construct geoviews layer
gv_sample_poly_top_norm = gv.Shape(
    top_norm_poly,
    label=f'meinGrün Top Target Polygon ({top_norm_value} Normalized Post Count based on Area and Service Origin)',
    crs=ccrs.epsg(epsg_code))

# construct geoviews layer
gv_sample_poly_top_count = gv.Shape(
    top_raw_poly_count,
    label=f'meinGrün Top Target Polygon ({top_raw_value_count} Total Post Count)',
    crs=ccrs.epsg(epsg_code))

# calculate boundary to show
xmin, ymin, xmax, ymax = add_buffer(top_norm_poly)
sample_bottomleft = proj_transformer.transform(
    xmin, ymin)
sample_topright = proj_transformer.transform(
    xmax, ymax)

hv.Overlay(
    gv.tile_sources.EsriImagery * \
    gv_sample_poly_top_count.opts(line_color='yellow', line_width=3, fill_alpha=0.2, show_legend=True) * \
    gv_sample_poly_top_norm.opts(line_color='red', line_width=3, fill_alpha=0.2, show_legend=True)
    ).opts(
        width=800,
        height=480,
        xlim=(sample_bottomleft[0], sample_topright[0]),
        ylim=(sample_bottomleft[1], sample_topright[1]),
        title_format=f"Selected Target Geometries for Testpoint and {SEARCH_RADIUS}m/50m Search Radius",
    )

Visualize Results¶

Now we can easily pass the GeoDataFrame to a Polygon object and declare the Measure as the first value dimension which it will color by:

Plot Sample Areas¶

if GREEN_TARGET_COL is not None:
    gdf_meingruen_official_targets = gdf.query(
        f'{GREEN_TARGET_COL} == 1')

# make a legend using a dictionary to assign colors
cmap = matplotlib.cm.get_cmap('Greens')
popularity_legend = {
    'very high': cmap(1000/1000), 
    'high': cmap(450/1000),
    'average': cmap(165/1000),
    'low': cmap(56/1000), 
    'very low': cmap(14/1000), 
    'no data': cmap(1)}

def add_patches(legend):
    ax = legend.axes
    handles, labels = ax.get_legend_handles_labels()
    labels.append("LBSM Popularity")
    handles.append(Patch(facecolor="none", fill=False, edgecolor='none', linewidth=0))
    for key in popularity_legend:
            data_key = Patch(edgecolor='none', facecolor=popularity_legend[key])
            handles.append(data_key)
            labels.append(key)
    labels.append("Official green target")
    handles.append(Patch(facecolor="none", fill=False, edgecolor='red', linewidth=0.7))

    legend._legend_box = None
    legend._init_legend_box(handles, labels)
    legend._set_loc(legend._loc)
    legend.set_title(legend.get_title().get_text())
    
def get_map(lim_lng_min, lim_lat_min, lim_lng_max, lim_lat_max, title, gdf, column, show_green_targets = None):
    """Plot area in matplotlib (pyplot).
    Note: Natural Breaks algorithm is used here to classify colors, remove to see raw values"""
    bb_bottomleft = proj_transformer.transform(
        lim_lng_min, lim_lat_min)
    bb_topright = proj_transformer.transform(
        lim_lng_max, lim_lat_max)
    # bb_bottomleft = pyproj.transform(pyproj.Proj(init='EPSG:4326'), pyproj.Proj(init=f'EPSG:{epsg_code}'), lim_lng_min, lim_lat_min)
    # bb_topright = pyproj.transform(pyproj.Proj(init='EPSG:4326'), pyproj.Proj(init=f'EPSG:{epsg_code}'), lim_lng_max, lim_lat_max)
    fig, ax = plt.subplots()
    fig.set_size_inches(18.5, 10.5, forward=True)
    ax.set_title(f'LBSM Target Geometry Weights for {title}', fontsize=20)
    ax.set_xlim([bb_bottomleft[0],bb_topright[0]])
    ax.set_ylim([bb_bottomleft[1],bb_topright[1]])
    
    # plt.legend(handles=patch_list)
    plt.savefig('legend.png', bbox_inches='tight')
    base = gdf.plot(column=column, figsize=(20,10), cmap='Greens', ax=ax, scheme="natural_breaks", label='Normalized User Post Locations (UPL) based on Area', legend = True)
    if show_green_targets:
        lgd = ax.legend(loc='upper right')
        add_patches(lgd)
        gdf_meingruen_official_targets.plot(ax=base, facecolor="none", edgecolor='red', lw=0.7, label='Official green target', legend = True)
        # plot the data

def get_map_b(lim_lng_min, lim_lat_min, lim_lng_max, lim_lat_max, title, gdf, column, show_green_targets = None):
    """Plot area in matplotlib (pyplot).
    Note: Natural Breaks algorithm is used here to classify colors, remove to see raw values"""
    bb_bottomleft = proj_transformer.transform(
        lim_lng_min, lim_lat_min)
    bb_topright = proj_transformer.transform(
        lim_lng_max, lim_lat_max)
    # bb_bottomleft = pyproj.transform(pyproj.Proj(init='EPSG:4326'), pyproj.Proj(init=f'EPSG:{epsg_code}'), lim_lng_min, lim_lat_min)
    # bb_topright = pyproj.transform(pyproj.Proj(init='EPSG:4326'), pyproj.Proj(init=f'EPSG:{epsg_code}'), lim_lng_max, lim_lat_max)
    axes = plt.gca()
    fig = plt.gcf()
    fig.set_size_inches(18.5, 10.5, forward=True)
    # fig.suptitle('LBSM Target Geometry Weights for Dresden Altstadt', fontsize=20)
    axes.set_title(f'LBSM Target Geometry Weights for {title}', fontsize=20)
    axes.set_xlim([bb_bottomleft[0],bb_topright[0]])
    axes.set_ylim([bb_bottomleft[1],bb_topright[1]])
    base = gdf.plot(column=column, figsize=(20,10), cmap='Greens', ax=axes, scheme="natural_breaks", label='Normalized User Post Locations (UPL) based on Area', legend = True)
    if show_green_targets:
        gdf_meingruen_official_targets.plot(ax=base, facecolor="none", edgecolor='red', lw=0.7, label='Official green target', legend = True)
    
    #gdf.to_crs({'init': 'epsg:3395'}).plot(column='UP_Norm', figsize=(20,10), cmap='Reds', ax=axes)

get_map(LIM_LNG_MIN, LIM_LAT_MIN, LIM_LNG_MAX, LIM_LAT_MAX, f'{CITY_NAME} {LEVEL} {TARGETSHAPE_VERSION}', gdf, 'UP_Norm_Origin', show_green_targets=False)

get_map(LIM_LNG_MIN_4, LIM_LAT_MIN_4, LIM_LNG_MAX_4, LIM_LAT_MAX_4, f"Heidelberg Center {LEVEL} {TARGETSHAPE_VERSION}", gdf, 'UP_Norm_Origin', show_green_targets=False)

get_map(LIM_LNG_MIN_1, LIM_LAT_MIN_1, LIM_LNG_MAX_1, LIM_LAT_MAX_1, f"Dresden Altstadt {LEVEL} {TARGETSHAPE_VERSION}", gdf, 'UP_Norm_Origin', show_green_targets=False)

get_map(LIM_LNG_MIN_2, LIM_LAT_MIN_2, LIM_LNG_MAX_2, LIM_LAT_MAX_2, f"Dresden Center {LEVEL} {TARGETSHAPE_VERSION}", gdf, 'UP_Norm_Origin', show_green_targets=False)

get_map(LIM_LNG_MIN_3, LIM_LAT_MIN_3, LIM_LNG_MAX_3, LIM_LAT_MAX_3, f"Dresden Neustadt {LEVEL} {TARGETSHAPE_VERSION}", gdf, 'UP_Norm_Origin', show_green_targets=False)

To save Figures above as SVG, run the following cell after each figure generation

plt.savefig(f"{CITY_NAME.lower()}_{LEVEL}_{TARGETSHAPE_VERSION}_A1.svg", format="svg")

<Figure size 432x288 with 0 Axes>

Optional: Plot Green Areas¶

if GREEN_TARGET_COL is not None:
    gdf_filtered = gdf.query(f'{GREEN_TARGET_COL} == 1').copy()
    # stretch_norm_dfcolumn(gdf_filtered, 'UP_Norm', 1, 1000)
    # get_map(LIM_LNG_MIN_5, LIM_LAT_MIN_5, LIM_LNG_MAX_5, LIM_LAT_MAX_5, "Dresden", gdf_filtered, 'UP_Norm')
    get_map_b(LIM_LNG_MIN_2, LIM_LAT_MIN_2, LIM_LNG_MAX_2, LIM_LAT_MAX_2, "Dresden Center", gdf_filtered, 'UP_Norm')

/home/alex/miniconda3/envs/jupyter_env_meingruen2/lib/python3.7/site-packages/geopandas/plotting.py:532: UserWarning: The GeoDataFrame you are attempting to plot is empty. Nothing has been displayed.
  UserWarning,

Natural Breaks Classification¶

Using pysal, it is possible to update our dataframe using Natural Breaks Classification

see for other available classifiers

# Define the number of classes
n_classes = 6
# Create a Natural Breaks classifier
classifier = mc.NaturalBreaks.make(k=n_classes)
# classifier = mc.HeadTailBreaks.make()
# Classify the data
classifications = gdf[['UP_Norm']].apply(classifier)
gdf['UP_Norm_nb'] = classifications
classifications = gdf[['UP_Norm_Origin']].apply(classifier)
gdf['UP_Norm_Origin_nb'] = classifications

We'll also add bin labels for describing classes:

classes_ref_5 = {
    5 : 'very high', 
    4 : 'high',
    3 : 'average',
    2 : 'low', 
    1 : 'very low',
    0 : 'no data'}

gdf['popularity'] = gdf['UP_Norm_Origin_nb'].map(classes_ref_5)
# make sure that the bin "no data" is only assigned for records without UPL
gdf.loc[(gdf.UP_Norm_Origin > 1) & (gdf.UP_Norm_Origin_nb == 0), 'popularity'] = 'very low'
gdf.loc[(gdf.UP_Norm > 1) & (gdf.UP_Norm_nb == 0), 'UP_Norm_nb'] = 1
gdf.loc[(gdf.UP_Norm_Origin > 1) & (gdf.UP_Norm_Origin_nb == 0), 'UP_Norm_Origin_nb'] = 1

Have a look at the values and where we came from (left to right):

print(gdf[['UP', 'UP_Norm', 'UP_Norm_nb', 'UP_Norm_Origin', 'UP_Norm_Origin_nb', 'popularity']].head(20))
print(gdf[['UP', 'UP_Norm', 'UP_Norm_nb', 'UP_Norm_Origin', 'UP_Norm_Origin_nb', 'popularity']].sort_values('UP_Norm', ascending=False).head(20))

     UP   UP_Norm  UP_Norm_nb  UP_Norm_Origin  UP_Norm_Origin_nb popularity
0     0  1.000000           0        1.000000                  0    no data
1     0  1.000000           0        1.000000                  0    no data
2    20  1.022784           1       85.622447                  2        low
3     6  1.003360           1        4.546010                  1   very low
4    34  1.018698           1        4.314789                  1   very low
5    20  1.005965           1        2.561644                  1   very low
6     4  1.004678           1        2.029526                  1   very low
7     0  1.000000           0        1.000000                  0    no data
8     0  1.000000           0        1.000000                  0    no data
9     1  1.006926           1        1.719634                  1   very low
10   11  1.012518           1       11.643885                  1   very low
11    8  1.008596           1       11.697083                  1   very low
12    4  1.003936           1        7.064460                  1   very low
13   15  1.033078           1        3.932302                  1   very low
14  262  1.020139           1       19.609694                  1   very low
15    9  1.004667           1        2.224343                  1   very low
16    8  1.004624           1        2.123974                  1   very low
17   61  1.026392           1        9.843112                  1   very low
18   40  1.010256           1       58.960824                  1   very low
19    0  1.000000           0        1.000000                  0    no data
          UP      UP_Norm  UP_Norm_nb  UP_Norm_Origin  UP_Norm_Origin_nb  \
15644      2  1000.000000           5        2.305195                  1   
1262       9     3.065271           4        3.746256                  1   
14520  51808     2.902200           4      449.423308                  4   
894        2     2.824401           4       15.117290                  1   
5466   18591     2.083974           3      485.951976                  4   
23986  41149     1.671105           3      947.918761                  5   
4510    4954     1.648291           3      347.736999                  4   
26622   3326     1.633970           3      335.877165                  4   
19256   5400     1.586675           3      243.571850                  3   
15054   3260     1.570367           3      292.392146                  4   
18971  18842     1.532882           3      478.066684                  4   
7608   38238     1.528392           3      546.547712                  4   
21987  45287     1.521795           3     1000.000000                  5   
14390  11379     1.497437           3      330.113942                  4   
9817    1348     1.493129           3      476.273434                  4   
18150   1236     1.449970           3      337.192990                  4   
17382   1796     1.446839           3      265.237267                  3   
23925  44835     1.442822           3      846.766688                  5   
15779   9961     1.418252           3      333.353672                  4   
19513   5213     1.373211           3      297.973015                  4   

      popularity  
15644   very low  
1262    very low  
14520       high  
894     very low  
5466        high  
23986  very high  
4510        high  
26622       high  
19256    average  
15054       high  
18971       high  
7608        high  
21987  very high  
14390       high  
9817        high  
18150       high  
17382    average  
23925  very high  
15779       high  
19513       high

sns.countplot(x='UP_Norm_Origin_nb', hue='UP_Norm_Origin_nb', data=gdf)

<matplotlib.axes._subplots.AxesSubplot at 0x7fef65ffc490>

Filter relevant target geometries

Create geoviews layer for final display. We'll filter segments by a threshold, as a means to only show streets with a minimum of popularity.

We can also filter based on the column TARGET_green_target

# on mixed type input (e.g. 'Polygon', 'GeometryCollection')
# apply filter below to avoid Geopandas Exception
# gdf = gdf.loc[gdf['geometry'].apply(lambda x: isinstance(x, (Polygon, MultiPolygon)))]
if GREEN_TARGET_COL is None:
    gdf_filtered = gdf
else:
    gdf_filtered = gdf.query(f'{GREEN_TARGET_COL} == "true"').copy()
    print(f'Number of total segments: {len(gdf)}. Number of filtered segments: {len(gdf_filtered)}')

Number of total segments: 26707. Number of filtered segments: 5838

gv_shapes_weighted = gv.Polygons(
    gdf_filtered, vdims=['UP_Norm_Origin_nb'], crs=ccrs.epsg(epsg_code))

Final Output:

if you want to output to a html file with full-width-map, remove comment from:
- %%output filename="dresden_streets" and
- responsive=True
if you want to preview inside this jupyter notebook, leave comments for the above two lines and remove comments from:
- width=800
- height=480

%%output filename=f"{TODAY}_{CITY_NAME}_green_targets_{TARGETSHAPE_VERSION}_{LEVEL}_weighted_{INTERSECT_VERSION}"
def set_active_tool(plot, element):
    # enable wheel_zoom by default
    plot.state.toolbar.active_scroll = plot.state.tools[0]

# the Overlay will automatically extent to data dimensions,
# since we want the initial zoom to be zoomed in to the Cite Center
# we need to supply xlim and ylim
# The Bokeh renderer only support Web Mercator Projection
# we therefore need to project Großer Garten Bounds first
# from WGS1984 Decimal Degrees coordinates to Web Mercator

# define Transformer ahead of time
# with xy-order of coordinates
proj_transformer_mercator = Transformer.from_crs(
    "epsg:4326", "epsg:3857", always_xy=True)

bb_bottomleft = proj_transformer_mercator.transform(
    LIM_LNG_MIN, LIM_LAT_MIN)
bb_topright = proj_transformer_mercator.transform(
    LIM_LNG_MAX, LIM_LAT_MAX)
    
hv.Overlay(
    gv.tile_sources.EsriImagery.opts(alpha=0.65) * \
    gv_shapes_weighted.opts(line_width=0.1, cmap='Greens', color_index='UP_Norm_Origin_nb', colorbar=True, fill_alpha=0.9, color_levels=5)
    # gv_shapes_weighted.opts(line_width=1+np.sqrt(dim('UP_Sqr')), cmap='Reds', color_index='UP_Norm', colorbar=True)
    ).opts(
        title_format=f"Weighted meinGrün Target Shapes {TARGETSHAPE_VERSION} for {CITY_NAME} and {LEVEL} based on LBSM user-post-location proximity (based on {len(df)} posts from Flickr, Instagram, and Twitter)",
        responsive=True,
        finalize_hooks=[set_active_tool],
        # width=800,
        # height=480,
        xlim=(bb_bottomleft[0],bb_topright[0]),
        ylim=(bb_bottomleft[1],bb_topright[1])
    )

Export Results (shapefile, CSV)¶

Have a look at the header first:

gdf.head()

Select columns to export

sel_cols = [TARGET_SHAPE_ID_COL, 'geometry', 'UP', 'UP_Flickr', 'UP_Twitter', 'UP_Instagram', 'UD', 'UPL', 'UP_Norm', 'UP_Norm_Origin', 'popularity']

Write to Shapefile

# if not calculated, drop:
if 'UP' not in gdf_filtered.index and 'UD' in sel_cols:
    sel_cols.remove('UD')
if 'UP' not in gdf_filtered.index and 'UPL' in sel_cols:
    sel_cols.remove('UPL')
print(sel_cols)
gdf_filtered_columns = gdf_filtered[sel_cols].copy()

# define column types manually
# gdf_filtered_columns = gdf[sel_cols].copy()
# note that for clarity we rename UP to UPL (UserPostLocation) here
# gdf_filtered_columns.rename(columns = {'UP':'UPL'}, inplace = True)
filename = f"{TODAY}_meingruenshapes_{TARGETSHAPE_VERSION}_{CITY_NAME.lower()}_{LEVEL}_weighted_{INTERSECT_VERSION}"
schema = {'geometry': 'Polygon',
          'properties': {
              TARGET_SHAPE_ID_COL: 'str',
              'UP': 'int',
              'UP_Flickr': 'int',
              'UP_Twitter': 'int',
              'UP_Instagram': 'int',
              # 'UD': 'int',
              # 'UPL': 'int',
              'UP_Norm': 'float',
              'UP_Norm_Origin': 'float',
              'popularity': 'str'}
         }
gdf_filtered_columns.to_crs(crs_wgs).to_file(driver = 'ESRI Shapefile', schema=schema, filename= f"{filename}.shp")

['TARGET_ID', 'geometry', 'UP', 'UP_Flickr', 'UP_Twitter', 'UP_Instagram', 'UP_Norm', 'UP_Norm_Origin', 'popularity']

Write to CSV

if 'geometry' in sel_cols:
    sel_cols.remove('geometry')
sel_headers = [TARGET_SHAPE_ID_COL,'UP', 'UP_Flickr', 'UP_Twitter', 'UP_Instagram', 'UD', 'UPL', 'UP_Norm', 'UP_Norm_Origin', 'popularity']

if 'UP' not in gdf_filtered.index and 'UD' in sel_headers:
    sel_headers.remove('UD')
if 'UP' not in gdf_filtered.index and 'UPL' in sel_headers:
    sel_headers.remove('UPL')
gdf_filtered_columns.to_csv(f'{filename}.csv',
           columns=sel_cols, header=sel_headers,
           index=False)

	OBJECTID	TARGET_ID	TARGET_green_target	TARGET_kontrolle_noetig	TARGET_tats_zugang	TARGET_stadt_area_perc	TARGET_stadt_and_nbk_area_perc	TARGET_tsp_enthalten	TARGET_spielplatz_enthalten	TARGET_park_enthalten	...	TARGET_type	TARGET_types	TARGET_origin	TARGET_origins	TARGET_AREA	TARGET_LENGTH	SHAPE_Length	SHAPE_Area	TARGET_access2	geometry
0	1	dd_EBK-TSP-OSM_67587	true	no	ja	NaN	NaN	yes	yes	None	...	playground	playground	manually_drawn	manually_drawn	578.780178	112.240699	0.001331	7.417259e-08	ja	MULTIPOLYGON (((1523501.280 6594671.717, 15235...
1	2	dd_EBK-TSP-OSM_67588	true	no	ja	NaN	NaN	yes	yes	None	...	playground	playground	manually_drawn	manually_drawn	4195.322701	303.529934	0.003402	5.377974e-07	ja	MULTIPOLYGON (((1521147.347 6596893.407, 15211...
2	3	dd_EBK-TSP-OSM_67589	true	no	ja	NaN	NaN	yes	yes	None	...	playground	playground	manually_drawn	manually_drawn	825.414228	153.918883	0.001952	1.058222e-07	ja	MULTIPOLYGON (((1531817.714 6597981.115, 15318...
3	4	dd_EBK-TSP-OSM_67590	true	no	ja	NaN	NaN	yes	yes	None	...	playground	playground	manually_drawn	manually_drawn	11386.770142	802.102931	0.009132	1.459170e-06	ja	MULTIPOLYGON (((1531103.332 6594220.009, 15311...
4	5	dd_EBK-TSP-OSM_67591	true	no	ja	NaN	NaN	yes	yes	None	...	playground	playground	manually_drawn	manually_drawn	2083.472845	215.032353	0.002520	2.670594e-07	ja	MULTIPOLYGON (((1534570.917 6596327.520, 15345...

	origin_id	latitude	longitude	user_guid	post_date
0	3	51.040557	13.731987	2de2c7b4c046fb9c2a7fcfeaa479356d	2016-06-10 10:46:41
1	1	51.091362	13.697633	145301fdbcf5d5261ff46d63e0962586	2015-10-09 16:11:39
2	1	51.042050	13.797690	df8244ec7295c142668fbcb530b6c2cc	2018-02-06 00:41:07
3	1	51.102810	13.686440	98d6841fdbfef53a01b6247d70bae2bc	2018-03-11 18:48:22
4	1	51.044881	13.735635	5832633cf2de965477d171bdca251eb5	2016-05-16 09:56:44

	origin_id	latitude	longitude	user_guid	post_date	geometry
0	3	51.040557	13.731987	2de2c7b4c046fb9c2a7fcfeaa479356d	2016-06-10 10:46:41	POINT (411103.140 5655099.959)
1	1	51.091362	13.697633	145301fdbcf5d5261ff46d63e0962586	2015-10-09 16:11:39	POINT (408794.664 5660791.445)
2	1	51.042050	13.797690	df8244ec7295c142668fbcb530b6c2cc	2018-02-06 00:41:07	POINT (415711.959 5655188.756)
3	1	51.102810	13.686440	98d6841fdbfef53a01b6247d70bae2bc	2018-03-11 18:48:22	POINT (408033.564 5662078.378)
4	1	51.044881	13.735635	5832633cf2de965477d171bdca251eb5	2016-05-16 09:56:44	POINT (411367.158 5655576.364)

	origin_id	latitude	longitude	user_guid	post_date	geometry
0	3	51.040557	13.731987	2de2c7b4c046fb9c2a7fcfeaa479356d	2016-06-10 10:46:41	POINT (411103.140 5655099.959)
1	1	51.091362	13.697633	145301fdbcf5d5261ff46d63e0962586	2015-10-09 16:11:39	POINT (408794.664 5660791.445)
2	1	51.042050	13.797690	df8244ec7295c142668fbcb530b6c2cc	2018-02-06 00:41:07	POINT (415711.959 5655188.756)
3	1	51.102810	13.686440	98d6841fdbfef53a01b6247d70bae2bc	2018-03-11 18:48:22	POINT (408033.564 5662078.378)
4	1	51.044881	13.735635	5832633cf2de965477d171bdca251eb5	2016-05-16 09:56:44	POINT (411367.158 5655576.364)

	OBJECTID	TARGET_ID	TARGET_green_target	TARGET_kontrolle_noetig	TARGET_tats_zugang	TARGET_stadt_area_perc	TARGET_stadt_and_nbk_area_perc	TARGET_tsp_enthalten	TARGET_spielplatz_enthalten	TARGET_park_enthalten	...	TARGET_origin	TARGET_origins	TARGET_access2	geometry	User_Days	User_Post_Locations
0	1	dd_EBK-TSP-OSM_67587	true	no	ja	NaN	NaN	yes	yes	None	...	manually_drawn	manually_drawn	ja	MULTIPOLYGON (((407862.147 5654808.401, 407881...	{}	{}
1	2	dd_EBK-TSP-OSM_67588	true	no	ja	NaN	NaN	yes	yes	None	...	manually_drawn	manually_drawn	ja	MULTIPOLYGON (((406404.995 5656234.092, 406421...	{}	{}
2	3	dd_EBK-TSP-OSM_67589	true	no	ja	NaN	NaN	yes	yes	None	...	manually_drawn	manually_drawn	ja	MULTIPOLYGON (((413134.935 5656801.489, 413128...	{}	{}
3	4	dd_EBK-TSP-OSM_67590	true	no	ja	NaN	NaN	yes	yes	None	...	manually_drawn	manually_drawn	ja	MULTIPOLYGON (((412645.141 5654440.722, 412654...	{}	{}
4	5	dd_EBK-TSP-OSM_67591	true	no	ja	NaN	NaN	yes	yes	None	...	manually_drawn	manually_drawn	ja	MULTIPOLYGON (((414851.053 5655731.482, 414864...	{}	{}

	OBJECTID	TARGET_ID	TARGET_green_target	TARGET_kontrolle_noetig	TARGET_stadt_and_nbk_area_perc	TARGET_tsp_enthalten	TARGET_spielplatz_enthalten	TARGET_park_enthalten	TARGET_tsp_only	TARGET_gemeinde_DD	TARGET_origin	TARGET_origins	TARGET_access2	geometry
14948	14949	dd_EBK-TSP-OSM_34924	true	no	97.751491	no	no	yes	no	yes	Erweiterte_Blockkarte	Erweiterte_Blockkarte#osm	ja	MULTIPOLYGON (((411333.693 5656602.828, 411333...
8372	8373	dd_EBK-TSP-OSM_30820	false	None	0.030125	no	no	no	no	yes	Erweiterte_Blockkarte	Erweiterte_Blockkarte	None	MULTIPOLYGON (((411426.080 5656657.292, 411422...
8521	8522	dd_EBK-TSP-OSM_36432	true	no	99.819084	no	no	no	no	yes	Erweiterte_Blockkarte	Erweiterte_Blockkarte	ja	MULTIPOLYGON (((411344.446 5656598.169, 411372...
4357	4358	dd_EBK-TSP-OSM_29230	true	no	99.829582	no	no	no	no	yes	Erweiterte_Blockkarte	Erweiterte_Blockkarte	ja	MULTIPOLYGON (((411426.080 5656657.292, 411426...

	TARGET_ID	UP	UP_Instagram	UP_Flickr	UP_Twitter	area_skm	area_skm_SR25	area_skm_SR50
0	dd_EBK-TSP-OSM_67587	0	0	0	0	0.000578	0.005217	0.013702
1	dd_EBK-TSP-OSM_67588	0	0	0	0	0.004193	0.013717	0.027154
2	dd_EBK-TSP-OSM_67589	20	18	1	1	0.000825	0.006623	0.016338
3	dd_EBK-TSP-OSM_67590	6	5	1	0	0.011380	0.027863	0.045837
4	dd_EBK-TSP-OSM_67591	34	34	0	0	0.002082	0.009415	0.020667

	TARGET_ID	UP	UP_Norm	UP_Instagram	UP_Flickr	UP_Twitter
15644	dd_EBK-TSP-OSM_48946	2	1000.000000	2	0	0
1262	dd_EBK-TSP-OSM_20340	9	3.065271	9	0	0
14520	dd_EBK-TSP-OSM_29348	51808	2.902200	51095	712	1
894	dd_OSM_65381	2	2.824401	0	2	0
5466	dd_EBK-TSP-OSM_29584	18591	2.083974	17449	1139	3

	TARGET_ID	UP	UP_Norm	UP_Norm_Origin	UP_Instagram	UP_Flickr	UP_Twitter	area_skm	area_skm_SR25	area_skm_SR50
23209	dd_EBK-TSP-OSM_33912	65434	1.211277	401.981502	57421	7994	19	0.031387	0.052489	0.075882
14948	dd_EBK-TSP-OSM_34924	58594	1.177179	442.081517	53827	4640	127	0.039964	0.094414	0.142717
14520	dd_EBK-TSP-OSM_29348	51808	2.902200	449.423308	51095	712	1	0.000307	0.004071	0.011752
21987	dd_EBK-TSP-OSM_37884	45287	1.521795	1000.000000	39994	5210	83	0.003561	0.011325	0.023009
23925	dd_EBK-TSP-OSM_39622	44835	1.442822	846.766688	41352	3405	78	0.004896	0.014049	0.027122
23986	dd_EBK-TSP-OSM_23946	41149	1.671105	947.918761	38049	3022	78	0.001956	0.009766	0.021493
7608	dd_EBK-TSP-OSM_19479	38238	1.528392	546.547712	36116	2105	17	0.002932	0.011802	0.024593
7389	dd_EBK-TSP-OSM_39936	34332	1.239266	501.121620	31420	2876	36	0.012841	0.026127	0.043333
8521	dd_EBK-TSP-OSM_36432	26179	1.305245	768.128587	22663	3440	76	0.006016	0.016186	0.030271
8372	dd_EBK-TSP-OSM_30820	24644	1.151947	486.576485	22702	1866	76	0.022854	0.041014	0.062504
4357	dd_EBK-TSP-OSM_29230	24567	1.352331	742.251105	22867	1625	75	0.004237	0.013793	0.027259
23333	dd_EBK-TSP-OSM_25070	23204	1.210591	616.500919	20414	2712	78	0.011203	0.025575	0.043552
19069	dd_EBK-TSP-OSM_27146	22701	1.214940	279.628154	19025	3666	10	0.010521	0.043658	0.080672
16992	dd_EBK-TSP-OSM_25176	20199	1.104868	241.137584	18842	1339	18	0.039326	0.063117	0.090822
8523	dd_EBK-TSP-OSM_38226	19158	1.074724	187.190765	14771	4379	8	0.073465	0.110315	0.150938
15742	dd_EBK-TSP-OSM_22286	18880	1.089103	202.123483	16668	2201	11	0.050916	0.080787	0.114390
18971	dd_EBK-TSP-OSM_44566	18842	1.532882	478.066684	16945	1890	7	0.001421	0.008348	0.019197
5466	dd_EBK-TSP-OSM_29584	18591	2.083974	485.951976	17449	1139	3	0.000339	0.004249	0.012062
14010	dd_EBK-TSP-OSM_38694	17983	1.225126	702.429012	15796	2103	84	0.007597	0.018334	0.032992
18032	dd_EBK-TSP-OSM_19447	17079	1.168563	451.624940	12136	4919	24	0.012870	0.028021	0.046943