What is the most "normal" city in Canada?

What does it mean to be “normal”?

Consider for a moment the quintessential Canadian city. What does the image in your head look like? Are there outdoor rinks, wheat fields, and abundant Timmies? Does your mental image include a yoga studio, a dispensary, and a third-wave coffee shop? Canadians like to celebrate a self-constructed identity of diversity (the cultural mosaic) punctuated by conspicuous Canadiana that binds everything together. Canada is a highly urbanized country; it is a nation of cities. While every Canadian city has elements that scream “Canada!”, is there one city that stands above the rest and can definitively claim to be the most quintessentially Canadian city of them all?

One way of approaching a question like this would be to define a set of criteria that best represents this Canadian essence, but that would be hard and, truthfully, rather icky. The idea of the “normal” place is a common rhetorical device used to conjure up an image of a majority-dominant heartland that truly represents the values of a country. You can see this in politics all the time: cities are more diverse and heterogeneous than rural places and, in many industrialized countries, account for the majority of the population, but they are seldom viewed as representative of a country’s character.

A more data-driven approach

So instead of selecting some arbitrary characteristics that define what the Canadian “normal” is, let’s take a data-driven approach. Fortunately, a FiveThirtyEight article by the economist Jed Kolko on identifying the most “normal” place in America provides a good template to start from. The motivation behind that article is straightforward: there is a disconnect between what pundits and some politicians view as representative of a “normal America” and what the actual demographics say. By Kolko’s measure, the most “normal” place in America is not Oshkosh, Wisconsin, but New Haven, Connecticut. His approach is relatively simple: it relies on a quantitative analysis of demographics across US metros, using age, education, and race as key indicators. The demographic mix of each metro area is compared against the demographic mix of the US as a whole using a dissimilarity index across every combination of the selected demographic indicators from the American Community Survey.
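To make the idea of a dissimilarity index more concrete, here is a minimal R sketch of the standard (Duncan) index of dissimilarity: half the sum of the absolute differences between two sets of shares. Whether FiveThirtyEight used exactly this form is an assumption, and the shares below are made-up numbers for a hypothetical metro, not American Community Survey data. In Kolko's setup, each element would be one age-education-race combination.

```r
# Index of dissimilarity (Duncan form, assumed here for illustration).
# 0 means identical mixes; 1 means the two mixes do not overlap at all.
# Inputs are shares across the same categories, each summing to 1.
dissimilarity_index <- function(place, benchmark) {
  stopifnot(length(place) == length(benchmark))
  0.5 * sum(abs(place - benchmark))
}

# Hypothetical shares for four demographic cells (illustrative only)
national_mix <- c(0.40, 0.30, 0.20, 0.10)
metro_mix    <- c(0.50, 0.25, 0.15, 0.10)

d <- dissimilarity_index(metro_mix, national_mix)
print(d)
```

Here the index works out to about 0.10, which is usually read as roughly 10% of the hypothetical metro's population having to change categories for its mix to match the national one.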

We can do something similar using Canadian data. While we lack the Census crosstabs needed to apply exactly the same methodology as the FiveThirtyEight article, a roughly similar approach can be taken by calculating the proportion, or incidence, of selected Census demographic or behavioral values for a given place and comparing it to a national benchmark. The Euclidean (geometric) distance between the national proportions and those of the selected place, taken across all of the selected variables at once, provides an indication of dissimilarity: the smaller the distance, the more similar the place. That distance is then converted into a similarity score, 100 / (1 + distance), so that a place with exactly the national mix would score 100. The most “normal” city in Canada is the city whose demographic and characteristic makeup is closest to the national benchmark.
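The distance-to-score calculation can be sketched in a few lines of R. The proportions are made up for illustration and `similarity_score()` is a hypothetical helper name; the formula, 100 / (1 + Euclidean distance), mirrors the one in the code at the bottom of the page.

```r
# Euclidean distance between a place's proportions and the national proportions,
# converted into a score where 100 means identical to the national mix.
similarity_score <- function(place, national) {
  stopifnot(length(place) == length(national))
  distance <- sqrt(sum((place - national)^2))
  100 / (1 + distance)
}

# Hypothetical proportions for three variables (illustrative only)
national_props <- c(var_1 = 0.30, var_2 = 0.22, var_3 = 0.80)
place_props    <- c(var_1 = 0.35, var_2 = 0.17, var_3 = 0.80)

similarity_score(national_props, national_props)  # identical mix scores 100
similarity_score(place_props, national_props)
```

The second call gives a distance of about 0.071 and a score of about 93.4. Because the score shrinks as the distance grows, higher scores in the tables below mean more "normal".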

The code chunk for this post is at the bottom of the page and the similarity calculations are contained in the last few lines of code.

The demographic mix

While the FiveThirtyEight article looks only at age, education, and race, we have the entire range of Canadian Census values to work with. This distinction is important because the result of any similarity calculation like the one in this post depends on the variables that are selected. Using every available Census variable is not feasible, but we can go beyond age, education, and visible minority groups to also include immigration status, household tenure type, and commute mode, capturing some additional indicators that define how a city looks and feels. Income variables are not used here for two reasons: first, they are likely to be highly correlated with variables that are already included, like education and household tenure; and second, income can vary across cities, regions, and provinces for many reasons that are specific to those places. Age and education variables are collapsed into broader groups to streamline things a bit.
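Collapsing the Census five-year age groups into the five broader bands used here amounts to summing the counts within each band and dividing by the total population. A minimal sketch with made-up counts (not Census figures): because each five-year group is counted exactly once, the resulting proportions sum to 1.

```r
# Hypothetical population counts by age group (illustrative, not Census data)
age_counts <- c(`0-14` = 160, `15-19` = 55, `20-24` = 60, `25-29` = 62,
                `30-34` = 64, `35-39` = 63, `40-44` = 61, `45-49` = 66,
                `50-54` = 72, `55-59` = 70, `60-64` = 62, `65+` = 165)

# Broader band that each of the groups above belongs to, in the same order
age_band <- c("age_014", rep("age_1529", 3), rep("age_3044", 3),
              rep("age_4564", 4), "age_65p")

# Sum counts within each band, then convert to proportions of the total
band_counts <- tapply(age_counts, age_band, sum)
age_props <- band_counts / sum(age_counts)
print(round(age_props, 3))
```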

Census subdivisions

Census subdivisions (CSDs) are a standard Statistics Canada geographic area representing municipalities or areas deemed equivalent to municipalities, such as Indigenous reserves. Using Census subdivisions instead of Census Metropolitan Areas provides more granularity and more interesting results: the different cities that make up the Greater Toronto Area, for example, have very different demographic mixes that would average out at the metro level. The 2016 Census has data for 5,148 Census subdivisions, though most of these have very small populations. Similarity is calculated here for the 413 subdivisions with a population of 10,000 or more.

The most “normal” places in Canada

Using geometric distances between demographic and characteristic compositions, the most “normal”, or most representative, municipality in Canada is Hamilton, Ontario. Guelph, London, Winnipeg, and Kitchener round out the top five.

Municipality	Population	Similarity Score
Hamilton	536,917	91.6
Guelph	131,794	89.1
London	383,822	88.0
Winnipeg	705,244	87.4
Kitchener	233,222	86.5
Source: Statistics Canada, 2016 Census. Similarity score calculated from a combination of demographic and characteristic Census variables compared with national rates.

Model of a modern Canadian city

Photograph of downtown Hamilton, Ontario taken from Sam Lawrence Park, Wikipedia CC BY-SA 2.5

Hamilton has almost identical proportions to Canada as a whole for nearly every variable used in calculating the score. As in Canada overall, the majority of the population falls into either the 45-64 or the 15-29 age group, representing the Boomer and Millennial mega-generations. Educational attainment is also similar, with the majority of people possessing a post-secondary diploma below a university degree. The proportions of individual visible minority groups are remarkably similar as well: almost every group's share of the population falls within 1 percentage point of its national share. Compare this with the City of Vancouver, which differs drastically from the national averages across all categories: Vancouver has fewer children under 15, more university-educated people, a far higher incidence of renters and immigrants, and a far lower incidence of commuting by car. Canada’s major population centres (the GTA, Montreal, and the Lower Mainland) all tend to have higher educational attainment, different commuting patterns, more renters, and much more visible diversity. Because these few metropolitan regions account for so much of the national population, they skew the national rates toward these characteristics.

Of the most similar cities on the list, the top nine municipalities all have populations above 100,000. The first smaller places with high similarity scores, View Royal, BC and L’Île-Perrot, QC, are suburbs of Victoria and Montreal, respectively. If the objective is to find the definitive small Canadian town whose demographic mix best represents the country and that is not within the immediate geographic influence of a larger metropolitan area, you have to go down to the 24th-ranked city on the list: Brandon, MB. This is not unexpected, as any Canadian average will be heavily weighted by larger metropolitan areas, which tend to be more educated and more diverse than smaller cities outside core metro areas.

Municipality	Population	Similarity Score
Châteauguay	47,906	82.8
Brandon	48,859	81.6
Vaudreuil-Dorion	38,117	81.4
Whitehorse	25,085	81.1
Langford	35,342	80.7
Source: Statistics Canada, 2016 Census.

Find your city

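The full table below lists the similarity scores for all Canadian municipalities and municipality-like regions with populations of 10,000 or more. Where two places would otherwise share a name, the Statistics Canada CSD type is kept in parentheses: (C) and (CY) for cities, (TP) for a township, and (DM) for a district municipality; (Part) marks each portion of a city, like Lloydminster, that straddles a provincial border.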

Municipality	Population	Similarity Score
Hamilton (C)	536,917	91.6
Guelph	131,794	89.1
London	383,822	88.0
Winnipeg	705,244	87.4
Kitchener	233,222	86.5
Saskatoon	246,376	86.3
Laval	422,993	86.0
Saanich	114,148	85.9
Regina	215,106	85.7
View Royal	10,408	85.5
L’Île-Perrot	10,756	84.1
Pitt Meadows	18,573	83.9
St. Catharines	133,113	83.4
Windsor	217,188	83.0
Oshawa	159,458	82.8
Châteauguay	47,906	82.8
Gatineau	276,245	82.4
Dorval	18,980	82.2
Kelowna	127,380	82.2
Newmarket	84,224	82.0
Nanaimo	90,504	82.0
Burlington	183,314	81.9
Squamish	19,512	81.9
Brandon	48,859	81.6
Lethbridge	92,729	81.5
Vaudreuil-Dorion	38,117	81.4
Edmonton	932,546	81.2
Cambridge	129,920	81.2
Whitehorse	25,085	81.1
Red Deer	100,418	80.9
Kingston	123,798	80.7
Langford	35,342	80.7
Niagara Falls	88,071	80.5
Port Coquitlam	58,612	80.4
Barrie	141,434	80.1
Whitby	128,377	80.0
Maple Ridge	82,256	79.8
Langley (CY)	25,888	79.7
La Prairie	24,110	79.5
White Rock	19,952	79.4
Ottawa	934,243	79.4
Halifax	403,131	79.4
Calgary	1,239,220	79.1
Longueuil	239,700	78.9
Terrace	11,643	78.8
Colwood	16,859	78.3
Pointe-Claire	31,380	78.2
Kamloops	90,280	78.1
Langley (DM)	117,285	78.1
Waterloo	104,986	77.7
Sarnia	71,594	77.6
Peterborough	81,032	77.4
Brantford	97,496	77.1
Delta	102,238	77.0
Courtenay	25,599	77.0
Canmore	13,992	77.0
Deux-Montagnes	17,496	76.9
Thorold	18,801	76.9
Thunder Bay	107,909	76.9
Port Moody	33,551	76.9
North Battleford	14,315	76.8
Central Saanich	16,814	76.7
Stratford	31,465	76.7
Prince George	74,003	76.7
Steinbach	15,829	76.4
Boisbriand	26,884	76.4
Bradford West Gwillimbury	35,325	76.3
Moncton	71,889	76.1
Lloydminster (Part) (CY)	19,645	76.1
Fredericton	58,220	76.1
Collingwood	21,793	75.9
North Cowichan	29,676	75.9
Repentigny	84,285	75.9
Saltspring Island	10,557	75.8
Penticton	33,761	75.8
Camrose	18,742	75.7
St. John’s	108,860	75.7
Sechelt	10,216	75.7
Vernon	40,116	75.6
Saint-Eustache	44,008	75.4
Lacombe	13,057	75.4
High River	13,584	75.3
Yellowknife	19,569	75.1
Sooke	13,001	75.0
Chilliwack	83,788	75.0
Comox	14,028	74.8
Sault Ste. Marie	73,368	74.8
Terrebonne	111,575	74.7
Orangeville	28,900	74.7
Cold Lake	14,961	74.7
Aurora	55,445	74.7
Prince Rupert	12,220	74.6
King	24,512	74.6
Sainte-Catherine	17,047	74.6
Saint-Constant	27,359	74.5
Greater Sudbury	161,531	74.5
Wood Buffalo	71,589	74.5
Cobourg	19,440	74.5
Yorkton	16,343	74.5
Belleville	50,716	74.4
Swift Current	16,604	74.4
Weyburn	10,870	74.3
Pincourt	14,558	74.3
Campbell River	32,588	74.3
Cranbrook	20,047	74.2
Mission	38,833	74.2
Abbotsford	141,397	74.1
Leduc	29,993	74.1
North Vancouver (DM)	85,935	74.0
Medicine Hat	63,260	74.0
Prince Albert	35,926	73.9
Carleton Place	10,644	73.9
Lake Country	12,922	73.9
Grande Prairie	63,166	73.9
St. Albert	65,589	73.9
Halton Hills	61,161	73.9
Wetaskiwin	12,655	73.9
Moose Jaw	33,890	73.8
Welland	52,293	73.8
Orillia	31,166	73.8
North Bay	51,553	73.7
Estevan	11,483	73.7
Kincardine	11,389	73.6
Saugeen Shores	13,715	73.5
Charlottetown	36,094	73.5
Fort St. John	20,155	73.4
Dawson Creek	12,178	73.4
Lloydminster (Part) (CY)	11,765	73.4
Chambly	29,120	73.3
Salmon Arm	17,706	73.3
Fort Saskatchewan	24,149	73.2
Okotoks	28,881	73.1
Dollard-Des Ormeaux	48,899	73.1
Woodstock	40,902	73.0
Airdrie	61,581	73.0
Powell River	13,157	72.9
Nelson	10,572	72.9
West Kelowna	32,655	72.8
Saint John	67,575	72.8
St. Thomas	38,909	72.8
East Gwillimbury	23,991	72.7
Grimsby	27,314	72.7
Port Hope	16,753	72.7
Blainville	56,863	72.7
Sherbrooke	161,323	72.6
Québec	531,902	72.6
Sidney	11,672	72.5
Dieppe	25,384	72.4
Caledon	66,502	72.2
Strathmore	13,756	72.2
Fort Erie	30,710	72.2
Uxbridge	21,176	72.2
Stony Plain	17,189	72.1
Port Alberni	17,678	72.1
Williams Lake	10,753	72.1
Midland	16,864	72.1
Strathroy-Caradoc	20,867	72.0
Brockville	21,346	72.0
Cochrane	25,853	71.9
Chatham-Kent	101,647	71.9
Centre Wellington	28,191	71.9
Spruce Grove	34,066	71.8
Beloeil	22,458	71.8
Woolwich	25,006	71.8
Saint-Jean-sur-Richelieu	95,114	71.7
Kenora	15,096	71.7
Esquimalt	17,655	71.7
Sainte-Adèle	12,919	71.7
L’Ancienne-Lorette	16,543	71.6
Kings, Subd. B	11,858	71.6
Summerland	11,615	71.6
Tecumseh	23,229	71.6
New Tecumseth	34,242	71.5
Whistler	11,854	71.5
Clarington	92,013	71.5
Timmins	41,788	71.4
Portage la Prairie	13,304	71.3
Owen Sound	21,341	71.2
L’Assomption	22,429	71.2
Mercier	13,115	71.2
Mascouche	46,692	71.2
Lévis	143,414	71.1
Riverview	19,667	71.1
Lincoln	23,787	71.1
Whitchurch-Stouffville	45,837	71.0
Georgina	45,418	71.0
Meaford	10,991	71.0
Saint-Sauveur	10,231	70.8
Pickering	91,771	70.8
Lethbridge County	10,353	70.7
Varennes	21,257	70.7
Rouyn-Noranda	42,334	70.7
Gander	11,688	70.6
Port Colborne	18,306	70.6
Corner Brook	19,806	70.6
Parksville	12,514	70.6
Clarence-Rockland	24,512	70.6
Edmundston	16,580	70.6
Sainte-Marie	13,565	70.6
Candiac	21,047	70.6
Strathcona County	98,044	70.6
Tillsonburg	15,872	70.5
Niagara-on-the-Lake	17,511	70.4
Sainte-Thérèse	25,989	70.4
Marieville	10,725	70.3
Cape Breton	94,285	70.3
Kingsville	21,552	70.3
Whitecourt	10,204	70.2
Mount Pearl	22,957	70.2
Mississippi Mills	13,163	70.2
Trois-Rivières	134,413	70.2
Leamington	27,595	70.2
Rimouski	48,664	70.2
Bracebridge	16,010	70.1
Rocky View County	39,407	70.1
Mirabel	50,513	70.1
Brossard	85,721	70.1
Victoriaville	46,130	70.0
Oakville	193,832	70.0
Huntsville	19,816	70.0
Norfolk County	64,044	70.0
Loyalist	16,971	70.0
Beaumont	17,396	70.0
Wilmot	20,545	70.0
Brooks	14,451	69.9
Miramichi	17,537	69.9
Sept-Îles	25,400	69.8
Quinte West	43,577	69.8
Petawawa	17,187	69.8
Val-d’Or	32,491	69.8
Grand Falls-Windsor	14,171	69.8
Russell	16,520	69.8
South Huron	10,096	69.8
Sylvan Lake	14,816	69.8
Scugog	21,617	69.8
North Saanich	11,249	69.8
Sainte-Julie	29,881	69.7
Essa	21,083	69.7
Greater Napanee	15,892	69.7
New Westminster	70,996	69.7
Kings, Subd. A	22,234	69.7
Ingersoll	12,757	69.7
Magog	26,669	69.6
Rawdon	11,057	69.5
Colchester, Subd. B	19,534	69.5
Foothills No. 31	22,766	69.5
Coldstream	10,648	69.4
Rosemère	13,958	69.4
Rothesay	11,659	69.4
Baie-Comeau	21,536	69.4
Gravenhurst	12,311	69.4
Prince Edward County	24,735	69.4
North Dundas	11,278	69.4
Granby	66,222	69.4
Amherstburg	21,936	69.3
Selkirk	10,278	69.3
Erin	11,439	69.3
Brant	36,707	69.3
Innisfil	36,566	69.3
Pembroke	13,882	69.3
Saint-Hyacinthe	55,648	69.3
Mont-Saint-Hilaire	18,585	69.3
North Perth	13,130	69.2
Saguenay	145,949	69.2
Guelph/Eramosa	12,854	69.2
Beauharnois	12,884	69.2
Summerside	14,829	69.2
North Dumfries	10,215	69.1
Amos	12,823	69.1
Chestermere	19,887	69.1
Saint-Jérôme	74,346	69.0
Saint-Georges	32,513	69.0
Drummondville	75,423	68.9
Pelham	17,110	68.8
Bathurst	11,897	68.8
North Glengarry	10,109	68.8
Saint-Félicien	10,238	68.8
West Nipissing	14,364	68.7
Brighton	11,844	68.7
The Nation	12,808	68.7
Matane	14,311	68.7
Alma	30,776	68.7
Sainte-Agathe-des-Monts	10,223	68.7
LaSalle	30,180	68.6
Brock	11,642	68.6
Sorel-Tracy	34,755	68.6
Bécancour	13,031	68.5
North Vancouver (CY)	52,898	68.4
Lambton Shores	10,631	68.4
Sainte-Anne-des-Plaines	14,421	68.4
Cornwall	46,589	68.4
North Grenville	16,451	68.4
Trent Hills	12,900	68.3
Haldimand County	45,608	68.3
Saint-Basile-le-Grand	17,059	68.3
Winkler	12,591	68.2
Rivière-du-Loup	19,507	68.2
Sturgeon County	20,495	68.2
Roberval	10,046	68.2
Norwich	11,001	68.1
Lakeshore	36,611	68.1
Essex	20,427	68.0
Wellington North	11,914	68.0
Gaspé	14,568	68.0
Sainte-Marthe-sur-le-Lac	18,074	67.9
Springwater	19,059	67.9
South Glengarry	13,150	67.8
Notre-Dame-de-l’Île-Perrot	10,654	67.8
La Tuque	11,001	67.8
Kawartha Lakes	75,423	67.8
Dolbeau-Mistassini	14,250	67.7
Elliot Lake	10,741	67.7
Montmagny	11,255	67.7
Selwyn	17,060	67.6
Clearview	14,151	67.6
Central Elgin	12,607	67.6
Thetford Mines	25,403	67.6
Middlesex Centre	17,262	67.6
Colchester, Subd. C	13,098	67.6
South Dundas	10,833	67.6
East Hants	22,453	67.6
Lacombe County	10,343	67.5
St. Clair	14,086	67.5
Oak Bay	18,094	67.5
Thames Centre	13,191	67.4
Saint-Lazare	19,889	67.4
Mont-Laurier	14,116	67.4
West Grey	12,518	67.4
Milton	110,128	67.3
Boucherville	41,671	67.3
Truro	12,261	67.3
Chester	10,310	67.3
Wasaga Beach	20,675	67.3
Conception Bay South	26,199	67.2
Shawinigan	49,349	67.2
Salaberry-de-Valleyfield	40,745	67.2
Leduc County	13,780	67.2
Coquitlam	139,284	67.1
Saint-Bruno-de-Montarville	26,394	67.1
Saint-Charles-Borromée	13,791	67.0
West Lincoln	14,500	67.0
Mountain View County	13,074	67.0
Prévost	13,002	66.7
Cowansville	13,656	66.6
West Hants	15,368	66.6
Saint-Amable	12,167	66.5
Paradise	21,389	66.5
Saint-Raymond	10,221	66.5
Lavaltrie	13,657	66.5
Adjala-Tosorontio	10,975	66.3
Tiny	11,787	66.3
Vaughan	306,233	66.3
Val-des-Monts	11,582	66.3
Kirkland	20,151	66.2
Severn	13,477	66.2
Queens	10,307	66.1
Warman	11,020	66.1
Sainte-Sophie	15,690	66.0
Les Îles-de-la-Madeleine	12,010	66.0
Taché	11,568	65.9
Wetaskiwin County No. 10	11,181	65.8
Red Deer County	19,541	65.8
Cantley	10,699	65.8
South Stormont	13,110	65.8
Oro-Medonte	21,036	65.6
Saint-Lin–Laurentides	20,786	65.6
Saint-Lambert	21,861	65.5
Perth East	12,261	65.5
Yellowhead County	10,995	65.5
Parkland County	32,097	65.5
Quispamsis	18,245	65.4
Rideau Lakes	10,326	65.3
Georgian Bluffs	10,479	65.2
South Frontenac	18,646	65.2
Tay	10,033	65.1
Hamilton (TP)	10,942	65.1
Saint-Colomban	16,019	65.0
Joliette	20,484	64.9
Ajax	119,677	64.9
Lunenburg	24,863	64.7
Hanover	15,733	64.4
Saint-Augustin-de-Desmaures	18,820	64.4
Beaconsfield	19,324	64.3
Springfield	15,342	64.3
St. Andrews	11,913	64.2
Clearwater County	11,947	64.2
Grande Prairie County No. 1	22,303	64.2
Surrey	517,887	64.1
West Vancouver	42,473	64.1
Lachute	12,862	63.9
St. Clements	10,876	63.9
Wellesley	11,260	63.9
Côte-Saint-Luc	32,448	63.9
Bonnyville No. 87	13,575	63.8
Hawkesbury	10,263	63.5
Tracadie	16,114	63.5
Lac Ste. Anne County	10,899	63.5
Victoria	85,792	63.1
Mapleton	10,527	63.0
Mont-Royal	20,276	63.0
Mississauga	721,599	61.6
Montréal	1,704,694	60.5
Toronto	2,731,571	59.6
Westmount	20,312	59.2
Vancouver	631,486	58.3
Burnaby	232,755	57.9
Richmond Hill	195,022	56.6
Mackenzie County	11,171	55.5
Brampton	593,638	55.3
Markham	328,966	51.9
Richmond	198,309	51.7
Greater Vancouver A	16,133	46.6
Source: Statistics Canada, 2016 Census.

Mirror images - a detour

There are many dimensions we can use to measure similarity across cities and, by extension, many assumptions to consider when doing so. The similarity numbers used here reflect assumptions about both which dimensions to include or exclude and the method used to evaluate similarity. A different selection of variables would likely lead to a different result, and even the way the variables are aggregated here could affect the results. This is all to say that this post is not trying to be a definitive study of city similarity, and that it is shaped by the author’s choices about what to measure, and how. The same caveat applies to the source FiveThirtyEight article: New Haven is the most “normal” city in America based on that author’s choice of relevant variables and methodology. It might still be the most “normal” given a different set of input variables, but it might not be.

Just as a different set of variables could lead to different results, so could a different distance measure or a different measurement methodology altogether. There are many ways to measure something like this.
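As a small illustration of how much the choice of distance measure matters, the sketch below compares two hypothetical places with a national mix using base R's `dist()`. With these made-up proportions, Euclidean distance ranks place B as closer to the national mix, while Manhattan (city-block) distance ranks place A as closer: Euclidean distance penalizes a few large gaps more heavily than many small ones, while Manhattan distance weighs every percentage point of difference equally.

```r
# Hypothetical proportions across four categories (illustrative only)
props <- rbind(
  national = c(0.40, 0.30, 0.20, 0.10),
  place_a  = c(0.46, 0.24, 0.20, 0.10),  # two large gaps of 6 points
  place_b  = c(0.44, 0.26, 0.24, 0.06)   # four smaller gaps of 4 points
)

# Distance from the national row to every row under two different metrics
d_euc <- as.matrix(dist(props, method = "euclidean"))["national", ]
d_man <- as.matrix(dist(props, method = "manhattan"))["national", ]

# Closest place to the national mix under each metric (drop the national row)
closest_euc <- names(which.min(d_euc[-1]))
closest_man <- names(which.min(d_man[-1]))
print(c(euclidean = closest_euc, manhattan = closest_man))
```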

One approach is to use algorithms that map high-dimensional data into a lower-dimensional space, so that the similarity between many high-dimensional objects can be displayed visually at the same time. In an older post, I wrote about identifying clusters of Canadian cities based on similarity in visible minority groups, education, or occupation. That post used the t-SNE algorithm to visualize closeness in high-dimensional data. The upside of that approach compared with the geometric similarity index used in this post is that it shows the relationships between all selected cities at once, instead of one city at a time.

Something similar can be done with the multi-dimensional data used to calculate the similarity indices above. The uniform manifold approximation and projection (UMAP) algorithm is another method that is increasingly used in lieu of t-SNE for dimensionality reduction and visualization in lower-dimensional space. UMAP has some advantages over t-SNE: it tends to be computationally faster and, more importantly for us, tends to do a better job of preserving global structure in lower-dimensional embeddings. In t-SNE embeddings, objects close together are typically similar across their many dimensions, but objects far from one another are not necessarily dissimilar, which can be counter-intuitive for someone looking at the resulting visualization.

The visualization below shows a UMAP embedding using the same variables as the similarity index for all Census subdivisions with populations above 100,000. In this visualization, the national demographic and characteristic mix is embedded in the middle, and the CSDs with the most similar mixes should be located closest to that point. Notably, this approach produces somewhat different results. Hamilton, while still very similar, is no longer the most “normal” CSD; that distinction falls to Windsor, Ontario, with Saanich, BC also in close proximity. There is good overlap between the cities identified by this approach and by the similarity indices above, which is a good sign that we can consider that group of cities representative of a “normal” Canada, at least for this combination of demographic and characteristic variables.

UMAP embeddings have an element of randomness, so the exact layout will differ every time one is generated, but the approximate placement of individual points should be consistent if there is real structure in the data. The advantage of this approach is that it shows not only how similar or dissimilar each CSD is to the national mix, but also how similar or dissimilar the CSDs are to one another. Richmond and Richmond Hill are highly similar to one another, as are Sherbrooke and St. John’s, but the same layout shows that Sherbrooke and Richmond Hill have very different compositions.
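Reading "closest to the national point" off an embedding can also be done numerically. This sketch assumes a two-column matrix of embedding coordinates with one row per region, such as the `layout` component of the object returned by `umap::umap()`; the coordinates and place names here are invented for illustration and do not come from the actual embedding. `rank_by_closeness()` is a hypothetical helper. Since layouts change from run to run unless the random seed is fixed, a ranking like this is best treated as approximate.

```r
# Hypothetical 2-D embedding coordinates (invented, not from the real UMAP run)
coords <- rbind(
  Canada  = c(0.0, 0.0),
  Place_1 = c(0.3, -0.2),
  Place_2 = c(1.5, 2.0),
  Place_3 = c(-0.6, 0.4)
)

# Euclidean distance in the embedding from every region to a reference row,
# sorted from closest to farthest (the reference row itself is dropped)
rank_by_closeness <- function(coords, reference = "Canada") {
  ref <- coords[reference, ]
  d <- sqrt(rowSums(sweep(coords, 2, ref)^2))
  sort(d[names(d) != reference])
}

print(round(rank_by_closeness(coords), 3))
```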

Show us the guts

The code for getting Census data, transforming it, and calculating the similarity scores is below. The code for this entire page is, as always, viewable on GitHub. The UMAP visualization uses the excellent umap R package.

# Load required packages
library(cancensus)
library(dplyr)
library(tidyr)

# Generate list of Census subdivisions in Canada with population >= 10,000
csdlist <- list_census_regions("CA16") %>% 
  filter(pop >= 10000, level == "CSD") %>% 
  as_census_region_list()

# Get region code for Canada-wide region
national_region <- list_census_regions("CA16") %>% filter(level == "C") %>% as_census_region_list()

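# Convenience function to tidy up CSD names: strip the parenthetical CSD type
# (e.g. " (CY)") unless doing so would make two regions share the same name
clean_names2 <- function (dfr) {
  dfr <- dfr %>% mutate(`Region Name` = as.character(`Region Name`))
  # Names with any parenthetical suffix removed
  replacement <- dfr %>% mutate(`Region Name` = gsub(" \\(.*\\)", 
                                                     "", `Region Name`)) %>% pull(`Region Name`)
  # Positions of every name that would be duplicated after stripping
  duplicated_rows <- c(which(duplicated(replacement, fromLast = TRUE)), 
                       which(duplicated(replacement, fromLast = FALSE)))
  # Keep the original, fully qualified name at those positions
  replacement[duplicated_rows] <- dfr$`Region Name`[duplicated_rows]
  dfr$`Region Name` <- factor(replacement)
  dfr
}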

# Identify appropriate vector codes 
# 0-14, 15-29, 30-44, 45-64, 65+
age_vectors <- c("v_CA16_1", "v_CA16_4", "v_CA16_64", "v_CA16_82","v_CA16_100",
          "v_CA16_118","v_CA16_136", "v_CA16_154","v_CA16_172","v_CA16_190",
          "v_CA16_208","v_CA16_226","v_CA16_244")

# Visible minorities vectors
parent_vector <- "v_CA16_3954"
minorities <- list_census_vectors("CA16") %>% 
  filter(vector == "v_CA16_3954") %>% 
  child_census_vectors(leaves_only = TRUE) %>% 
  pull(vector)
minority_vectors <- c(parent_vector, minorities)

# Education vectors - uses only data for population aged 25 and over. 
# no diploma, high-school, post-secondary certificate, university diploma or higher
educ_vectors <- c("v_CA16_5096", "v_CA16_5099", "v_CA16_5102", "v_CA16_5105", "v_CA16_5123")

# Immigration status vectors
imm_vectors <- c("v_CA16_3405","v_CA16_3408","v_CA16_3411","v_CA16_3435")

# Tenure vectors
ten_vectors <- c("v_CA16_4836","v_CA16_4837","v_CA16_4838","v_CA16_4839")

# Commute vectors
com_vectors <- c("v_CA16_5792","v_CA16_5795","v_CA16_5798","v_CA16_5801",
                 "v_CA16_5804","v_CA16_5807")

# Mobility vectors
#mob_vectors <- c("v_CA16_6692", "v_CA16_6707","v_CA16_6716")

# Coerce all vectors requested together
demo_vectors <- c(age_vectors, minority_vectors, educ_vectors, imm_vectors, ten_vectors, com_vectors)

# Download census data for national level
national <- get_census("CA16", level = "C", regions = national_region, vectors = demo_vectors, labels = "short")

# Group vectors where appropriate and calculate proportions.
# Defined once and applied to both the national and CSD data so they stay comparable.
calc_proportions <- function(dfr) {
  dfr %>% 
    mutate(age_014 = v_CA16_4/v_CA16_1,
           # 15-19, 20-24 and 25-29, each counted once
           age_1529 = (v_CA16_64 + v_CA16_82 + v_CA16_100)/v_CA16_1,
           age_3044 = (v_CA16_118 + v_CA16_136 + v_CA16_154)/v_CA16_1,
           age_4564 = (v_CA16_172 + v_CA16_190 + v_CA16_208 + v_CA16_226)/v_CA16_1,
           age_65p = v_CA16_244/v_CA16_1,
           min_white = v_CA16_3996/v_CA16_3954,
           min_sasian = v_CA16_3960/v_CA16_3954,
           min_chinese = v_CA16_3963/v_CA16_3954,
           min_black = v_CA16_3966/v_CA16_3954,
           min_filipino = v_CA16_3969/v_CA16_3954,
           min_latinam = v_CA16_3972/v_CA16_3954,
           min_arab = v_CA16_3975/v_CA16_3954,
           min_seasian = v_CA16_3978/v_CA16_3954,
           min_wasian = v_CA16_3981/v_CA16_3954,
           min_korean = v_CA16_3984/v_CA16_3954,
           min_japanese = v_CA16_3987/v_CA16_3954,
           min_oth = v_CA16_3990/v_CA16_3954,
           educ_nohs = v_CA16_5099/v_CA16_5096,
           educ_hs = v_CA16_5102/v_CA16_5096,
           educ_dip = v_CA16_5105/v_CA16_5096,
           educ_uni = v_CA16_5123/v_CA16_5096,
           imm_nat = v_CA16_3408/v_CA16_3405,
           imm_imm = v_CA16_3411/v_CA16_3405,
           imm_non = v_CA16_3435/v_CA16_3405,
           ten_own = v_CA16_4837/v_CA16_4836,
           ten_rent = v_CA16_4838/v_CA16_4836,
           ten_band = v_CA16_4839/v_CA16_4836,
           com_car = (v_CA16_5795 + v_CA16_5798)/v_CA16_5792,
           com_trans = v_CA16_5801/v_CA16_5792,
           com_walk = v_CA16_5804/v_CA16_5792,
           com_bike = v_CA16_5807/v_CA16_5792) %>% 
    select(GeoUID, `Region Name`, pop = Population, starts_with("age"), starts_with("min"), starts_with("educ"),
           starts_with("imm"), starts_with("ten"), starts_with("com"))
}

national_demo <- calc_proportions(national)

# Get demographic vectors for all selected CSDs
csds <- get_census("CA16", level = "CSD", regions = csdlist, vectors = demo_vectors, labels = "short")

# Adjust into groups and calculate proportions, same as for the national data
csds <- calc_proportions(csds)

# Finally, calculate the Euclidean distance between each CSD and the national rates
# across all proportion variables, then convert it into a similarity index where
# 100 would mean identical to the national mix.
# The national row must come first, since every value is compared with first(.x).
diss <- bind_rows(national_demo, csds) %>% 
  mutate(across(age_014:com_bike, ~ (.x - first(.x))^2)) %>% 
  tidyr::pivot_longer(age_014:com_bike, names_to = "vars", values_to = "values") %>% 
  group_by(`Region Name`, pop) %>% 
  summarise(index = sqrt(sum(values))) %>% 
  mutate(sim = (1/(1+index)*100)) %>% 
  ungroup() %>% 
  filter(`Region Name` != "Canada") %>% 
  clean_names2()

December 21, 2018