Researchers' Blog Emerging patterns in Gephi Networks

Thanks to Elijah's help, I've been able to move forward with Gephi visualizations using two tools: Geospatial Layout, which plots locations in Saskatchewan by longitude and latitude, and Ego Network, which allows me to select any node and filter the network down to only that node's connections.

Below, you will see three visualizations with selective labels. The first one, Geo exploration, is the entire Saskatchewan network and shows destinations in green and families in red. The size of each node represents its eigenvector centrality, which is a measure of a node's influence in the network. It assigns a relative score to every node based on the idea that connections to high-scoring nodes contribute more to a node's score than the same number of connections to low-scoring nodes. Eigenvector centrality is related to, but distinct from, betweenness centrality, which measures how often a node lies on the shortest path between two other nodes in the network. A high betweenness centrality score means that a node is crucial in connecting elements within the network.

As you can see in the Saskatchewan network, three major destinations emerge: Saskatoon, Regina, and Moose Jaw. Swift Current, while a major destination, does not have as many links to highly connected places as Saskatoon, Regina, and Moose Jaw (which, predictably, are very connected to each other). Some of the major families in this network also start to emerge. We have been talking about family organization in China and, later, in Canada. Once again, the Ma pattern of migration seems highly organized, as we saw in my initial mockup. Could this imply the strength of family associations in North America?
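To make the eigenvector centrality idea above a little more concrete, here is a minimal sketch in plain Python. It is not Gephi's implementation, just a simple power iteration over a toy graph. The node labels and edges are invented for illustration and are not taken from the Saskatchewan data. Node "D" has the most links, but most of them go to poorly connected nodes, so it ends up scoring below the three mutually connected hubs "A", "B", and "C".

```python
from collections import defaultdict
from math import sqrt


def build_adjacency(edges):
    """Turn a list of undirected (u, v) pairs into a neighbor lookup."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


def eigenvector_centrality(neighbors, max_iter=1000, tol=1e-10):
    """Approximate eigenvector centrality by power iteration.

    Each round, a node's new score is its own score plus the sum of its
    neighbors' scores; scores are then rescaled to unit length. Keeping
    the node's own score is a common trick that helps the iteration settle.
    """
    scores = {node: 1.0 for node in neighbors}
    for _ in range(max_iter):
        new = {
            node: scores[node] + sum(scores[nb] for nb in neighbors[node])
            for node in neighbors
        }
        norm = sqrt(sum(s * s for s in new.values()))
        new = {node: s / norm for node, s in new.items()}
        change = sum(abs(new[node] - scores[node]) for node in new)
        scores = new
        if change < tol:
            break
    return scores


# Hypothetical toy network: A, B, C form a tightly linked core;
# D has four links, but three of them go to otherwise isolated nodes.
toy_edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("A", "p"), ("B", "q"),
    ("C", "D"), ("D", "x"), ("D", "y"), ("D", "z"),
]

adjacency = build_adjacency(toy_edges)
scores = eigenvector_centrality(adjacency)

# Print nodes from most to least influential, with their link counts.
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node}: score={scores[node]:.3f}, links={len(adjacency[node])}")
```

In this toy case, counting links alone would rank "D" first, while the centrality score ranks it below the core, which is the same kind of pattern described for Swift Current above.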

The other two visualizations show the 2-degree networks of Moose Jaw and Regina, respectively. That means each one shows all the nodes that are connected to Moose Jaw or Regina, as well as all the nodes that are connected to those nodes, so 2 degrees out from the ego. The Ma family appears to be much more important in the Regina network than in the Moose Jaw network.
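For readers curious about what a 2-degree ego filter does under the hood, here is a rough sketch in plain Python using a breadth-first search. This is an illustration of the idea, not Gephi's code, and the node labels ("ego", "a", "b", "c", "d") are placeholders rather than real places or families.

```python
from collections import deque


def ego_network(neighbors, ego, depth=2):
    """Return {node: distance} for every node within `depth` steps of `ego`."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not look past the requested number of degrees
        for nb in neighbors.get(node, ()):
            if nb not in distance:
                distance[nb] = distance[node] + 1
                queue.append(nb)
    return distance


# Hypothetical toy network: "d" sits three steps away from "ego".
toy_neighbors = {
    "ego": {"a", "b"},
    "a": {"ego", "c"},
    "b": {"ego"},
    "c": {"a", "d"},
    "d": {"c"},
}

reachable = ego_network(toy_neighbors, "ego", depth=2)
print(sorted(reachable))  # ['a', 'b', 'c', 'ego']
```

Node "d" is dropped because it is 3 degrees away from the ego, which is why a 2-degree view of Moose Jaw or Regina hides the more distant parts of the network.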

Gephi has also been a great way to identify errors or mislabeling in the initial categorization. In order to produce these networks, I had to go through all the immigrants who listed Saskatchewan as their destination during our project years and deduce which Chinese surname each Romanized last name came from. Quite a few Romanizations have multiple possible Chinese surnames associated with them. Looking at the Regina network, it becomes clear that the Luo and Liu who are from Yuemingcun in Sen Ning are probably the same family: either Luo or Liu, not both. Also, looking at the Moose Jaw network, the Guan from Gee How San in Hoy Ping and the Guan from Gee On Lee in Hoy Ping are potentially the same family from the same village, just referred to differently. When I talked to my grandma, who is Toisanese, about the way people refer to villages, she mentioned that people sometimes say "How" or "Lee" after their village name, which literally mean "head" and "tail." These people come from the same village, just from different ends of it. That's a good thing to keep in mind as I use Gephi further to identify any previous errors I might have made.

Moving on, I'd like to look at ego networks for specific families, particularly the ones that I have labeled on these mockups as the major families in the network. For our final visualization, I'm hoping to project our network data onto a map so that we have the geographic data available for reference.

Geo exploration PDF

Moose Jaw's 2-degree network PDF

Regina's 2-degree network PDF