Sources
Our primary source is the CMOA's collection dataset, published as an open-source repository on GitHub, which describes the museum's photography collection. The dataset contains metadata for more than 3,000 photographs, including titles, sources, creation dates, and background information on the photographers. These variables let us investigate the composition of the CMOA photography collection and explore potential trends in the data. To understand the CMOA's role within the broader function of museums in America, we also consulted relevant journals, articles, and research papers.
Processing
Because of inconsistencies in the data, we quickly recognized that the dataset in its original state was not conducive to data visualization or trend exploration. We first determined which variables would be most informative for answering our research question, and then cleaned the values of those variables. Because we wanted to investigate when the photographs were taken, we cleaned the “creation_date” variable using OpenRefine, reconciling every variation of the year format into the form ‘xxxx,’ where each x is a digit between 0 and 9. To make the later presentation more concise, we also created a new decade column in Excel by flooring each year to the start of its decade, so that years could be grouped together in a bar chart. The sources of the photographs (the funds or people from whom the CMOA acquired them) were made consistent by merging similar entities in OpenRefine. Titles were clustered when they appeared similar enough and belonged to the same photographer, and we used Python to remove any brackets or parentheses surrounding the titles. Finally, the photographers’ background information, which was stored in JSON format, was expanded into separate columns with Python for easier investigation.
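The cleaning steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scripts we ran: the year reconciliation and decade flooring were actually done in OpenRefine and Excel, and everything other than the “creation_date” column name (the “creator” field, its “nationality” key, and all function names) is a hypothetical stand-in. It also assumes the JSON field holds a single object per photograph.

```python
import json
import re


def normalize_year(raw):
    """Return the first four-digit year in a free-form date string, or None."""
    match = re.search(r"\b(\d{4})\b", raw or "")
    return int(match.group(1)) if match else None


def decade_of(year):
    """Floor a year to the start of its decade, e.g. 1987 becomes 1980."""
    return (year // 10) * 10


def strip_enclosing(title):
    """Remove one pair of brackets or parentheses that wraps the title."""
    title = title.strip()
    pairs = {"[": "]", "(": ")"}
    if len(title) >= 2 and pairs.get(title[0]) == title[-1]:
        return title[1:-1].strip()
    return title


def flatten_json_field(row, field="creator"):
    """Copy a row, expanding a JSON-encoded field into prefixed columns.

    The field name is hypothetical; the JSON is assumed to be one object.
    """
    flat = {key: value for key, value in row.items() if key != field}
    info = json.loads(row.get(field) or "{}")
    for key, value in info.items():
        flat[f"{field}_{key}"] = value
    return flat


# A made-up record, used only to show how the helpers fit together.
example = {
    "title": "[Untitled]",
    "creation_date": "c. 1987",
    "creator": '{"nationality": "American"}',
}
cleaned = flatten_json_field(example)
cleaned["title"] = strip_enclosing(cleaned["title"])
cleaned["year"] = normalize_year(cleaned["creation_date"])
cleaned["decade"] = decade_of(cleaned["year"])
```

One limitation of this sketch is that `strip_enclosing` only checks the first and last characters, so a title such as “(a) and (b)” would be stripped incorrectly and would need manual review.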
Cleaning the data also made it easier to skim, which helped us recognize trends worth investigating as possible research questions. Forming hypotheses as a group, we laid out twelve potential research questions that our dataset could give insight into, and then narrowed them down to one: what is the significance of the photographs that the CMOA chooses to show in its collection, and what does that say about the CMOA’s motivations? In investigating this question, we also explored auxiliary questions that help illuminate the answer, such as the sources and subject matter of these photographs across different eras.
Presentation
The goal of our project was to provide a humanistic background for the photographs archived at the CMOA. As explained by Trouillot in Silencing the Past, “human beings participate in history both as actors and narrators” (Trouillot 2). In the case of our data, the CMOA serves as a narrator of history, because it chooses which photographs of history to store in its archives. We wanted to know why the museum chose to display some pieces of art and not others, and what consistent themes run through the pieces the CMOA selected.
To present our findings, our team built a website with WordPress, hosted through UCLA’s Humspace portal. We wanted to put ourselves in the reader’s shoes in order to produce the most enjoyable reading environment. To that end, we used a classic theme of black text on a white background, which keeps both the text and the graphs legible. We also wanted to accommodate readers with color vision impairments, so we tried to give our text and graphics distinct, contrasting colors. Finally, we wanted our graphics to be as informative and easy to understand as possible. As Nathan Yau writes in “Visualizing with Clarity,” “when you use graphics to present results to other people, you must make your graphics readable to those who don’t know your data as well as you do” (Yau 2013).
To create these visuals, we used tools such as Tableau, TimelineJS, Python, and Palladio to turn the raw data into easy-to-understand graphs, charts, maps, and timelines. Most of our visualizations were created in Tableau, because it lets users upload a dataset and explore the relationships among its variables. The platform allowed us to combine quantitative and qualitative data into an easy-to-understand bar chart, histogram, or map. Tableau also allows for plenty of customization. We took advantage of this by color coding our visualizations and sizing their marks according to the underlying numerical values, so that they are easier to understand at a quick glance. We also made sure to use contrasting colors together with size indicators, so that readers with color blindness can still tell different results apart.