All posts by media-man

New: You Can Now Search the Full Text of 3 Million Nonprofit Tax Records for Free

On Thursday, we launched a new feature for our Nonprofit Explorer database: The ability to search the full text of nearly 3 million electronically filed nonprofit tax filings sent to the IRS since 2011.

Nonprofit Explorer already lets researchers, reporters and the general public search for tax information from more than 1.8 million nonprofit organizations in the United States, as well as allowing users to search for the names of key employees and directors of organizations.

Now, users of our free database can dig deep and search for text that appears anywhere in a nonprofit’s tax records, as long as those records were filed digitally — which according to the IRS covers about two-thirds of nonprofit tax filings in recent years.

How can this be useful to you? For one, this feature lets you find organizations that gave grants to other nonprofits. Any nonprofit that gives grants to another must list those grants on its tax forms — meaning that you can research a nonprofit’s funding by using our search. A search for “ProPublica,” for example, will bring up dozens of foundations that have given us grants to fund our reporting (as well as a few filings that reference Nonprofit Explorer itself).

Another example: When private foundations have investments or ownership interest in for-profit companies, they have to list those on their tax filings as well. If you want to research which foundations have investments in a company like ExxonMobil, you can simply search for the company name and check which organizations list it as an investment.

The possibilities are nearly limitless. You can search for the names or addresses of independent contractors that made more than $100,000 from a nonprofit, or for keywords in mission statements or descriptions of accomplishments. You can even use advanced search operators to find, for instance, any filing that mentions “The New York Times,” “nytimes” or “nytimes.com” in one search.

The new feature contains every electronically filed Form 990, 990-PF and 990-EZ released by the IRS from 2011 to date. That’s nearly 3 million filings. The search does not include forms filed on paper.

So please, give this search a spin. If you write a story using information from this search, or you come across bugs or problems, drop us a line! We’re excited to see what you all do with this new superpower.

New Plugin Launch: Republication Tracker Tool

INN Labs is happy to announce our newest plugin, the Republication Tracker Tool.

The Republication Tracker Tool allows publishers to share their stories with other websites and then track engagement with those shared stories in Google Analytics. The technology behind this tracking is similar to ProPublica’s PixelPing.

Why Might You Want to Use This Plugin?

  • Grow your audience and pageviews: Other publishers and readers can republish your content under a Creative Commons license.
  • Better republishing reporting: View which publishers are republishing your content and analyze engagement.
  • Foster collaborations: Gather supporting data to build relationships with other publishers who may be republishing your content.

How Publishers Republish Your Content

A simple “Republish This Story” button is added to your posts through a WordPress widget. This lets other sites republish your stories and lets you track engagement with those republished stories via Google Analytics.

Sample republication button (style can be customized).

Track Republished Posts in WordPress

Once one of your stories has been republished, you will easily be able to see how many times it has been republished, how many republished views it has, who has republished it, and the URL of where it was republished, all from the WordPress edit screen for that story.

Example of republication data in the edit screen of a WordPress post.

Track Republished Posts in Google Analytics

Another valuable feature of the Republication Tracker Tool is that all of your republished post data is also tracked in your Google Analytics account. Once you have your Google Analytics ID configured in the Republication Tracker Tool settings, you will be able to log into Google Analytics and view who has republished your stories, who is republishing most of your stories, and more.

Example of republication data within Google Analytics.

More Information and Feedback

For more information about how the plugin works:

You can download the Republication Tracker Tool from the WordPress.org plugin repository or through your website’s WordPress plugin page.

The initial release of this plugin was made possible by valuable INN member testing and feedback. If your organization uses the plugin, please let us know and continue sending us suggestions for improvement. Thank you!

The Republication Tracker Tool is one of the many WordPress plugins maintained by INN Labs, the tech and product team at the Institute for Nonprofit News.

Announcing Version 1.0 of the Link Roundups Plugin

INN Labs is pleased to announce an important update to the Link Roundups plugin!

If you run a daily or weekly newsletter collecting headlines from around the state, region, or within a particular industry, the Link Roundups plugin will make it easier to build and feed your aggregation posts into MailChimp.

The Link Roundups plugin helps editors aggregate links from around the web and save them in WordPress as “Saved Links”. You can publish these curated links in a Link Roundup (more below), display Saved Links and Link Roundups in widgets and posts in your WordPress site, or push Link Roundups directly to subscribers via MailChimp. It's designed to replace scattered link-gathering workflows that may span email, Slack, Google Docs and spreadsheets, and to streamline collaboration among staffers.

Why might you want to use this plugin? Here are a few reasons:

  • It creates a single destination for collecting links and metadata
  • On sites that publish infrequently, it provides recently published (curated) content for your readers
  • Weekly roundup newsletters or posts are a great way to recap your own site's coverage and build and diversify your audience, which can increase donations

Saved Links

The central function of the Link Roundups plugin is the Saved Link. It's a way of storing links in your WordPress database, alongside metadata such as the link's title, source site, and your description of the link's contents.

A screenshot of the Saved Links interface, showing many saved links and their respective metadata: authors, links, descriptions, and tags.

Save to Site Bookmarklet

When WordPress 4.9 removed the "Press This" functionality, this plugin's bookmarklet broke. This release's updates to the Saved Links functionality include a renewal of the "Save to Site" bookmarklet, based off of the canonical Press This plugin's functions. If your site has the WordPress-maintained Press This plugin active, your site users will be able to generate new bookmarklets. We include instructions on how to use the bookmarklet in the latest release.

A screenshot of the "Save to Site" button and its copy button

Once you've accumulated a few Saved Links, you can display them on your site using the Saved Links Widget or start to create Link Roundups (see next).

Saved Links Widget

Common uses of this widget include "coverage from around the state" or "recommended reads" or "from our partners" links.

It's a good way to point your readers to expert coverage from newsrooms you partner with. With the ability to sort Saved Links by tag, you can easily filter a widget to only show a selection of all the links saved on your site. Here's how Energy News Network uses the widget:

A screenshot of the widget as it appears at Energy News Network, showing a selection of links from the last day.

Link Roundups

Link Roundups are one of the best ways to present Saved Links to your readers. Collect links with Saved Links, then create a Link Roundup post with the week's curated links. The person who assembles the Link Roundup doesn't have to deal with messy cut-and-paste formatting or composing blurbs — when your users create Saved Links, they're already adding headlines, blurbs, and sources.

Add some opening and closing text, and you're most of the way to having composed a morning or weekly news roundup.

Link Roundups are a custom post type with all the Classic Editor tools and an easy interface for creating lists of Saved Links. As a separate post type, they can be integrated into your site's standard lists of posts or kept separate in their own taxonomies. You don't have to integrate the roundups with your standard posts flow; it's why we provide a Link Roundups widget to fulfill your widget area needs.

MailChimp Integration

Link Roundups don't have to stay on your site. If you configure your site to connect to the MailChimp API and create a newsletter template with editable content areas, you can send a Link Roundup directly to MailChimp from WordPress.

From the Link Roundup editor, you can choose a mailing list, create MailChimp campaign drafts, send test emails and send drafted campaigns directly. If you'd rather open a draft campaign in MailChimp to finalize the copy, there's a handy link to your draft campaign.

A screenshot of a settings metabox: choose a campaign type of regular or text. Choose a list to send to: the Link Roundups Mailchimp Tools Test list, with the group "are they Ben" option chosen: "Ben". The campaign title will be "Test Title Three Title", the test subject will be "Test Title Three Subject", and the template will be "Link Roundups Test 2"
Here are the MailChimp settings for the Link Roundups campaign editor, with many of the controls that you'd want to use to create and send a draft campaign.

More information

For more information about how the plugin works, see the Largo guide for administrators, the plugin's documentation on GitHub, or drop by one of our weekly open office hours sessions with your questions. You can also reach us by email at support@inn.org.

If you already have the Link Roundups plugin installed, keep an eye out for an update notice in your WordPress dashboard. If you'd like to install it, download it from the WordPress.org plugin repository or through your site dashboard's plugin page.

This update was funded in part by Energy News Network and Fresh Energy, with additional funding thanks to the generous support of the Democracy Fund, Ethics and Excellence in Journalism Fund, Open Society Foundation, and the John S. and James L. Knight Foundation.

Link Roundups is one of the many WordPress plugins maintained by INN Labs, the tech and product team at the Institute for Nonprofit News.

The Ticket Trap: Front to Back

Millions of motorists in Chicago have gotten a parking ticket. So when we built The Ticket Trap — an interactive news application that lets people explore ticketing patterns across the city — we knew that we’d be building something that shines a spotlight on an issue that affects people from all walks of life.

But we had a more specific story we needed to tell.

At ProPublica Illinois, we’d been reporting on Chicago’s aggressive parking and vehicle compliance ticket system for months. Our stories revealed a system that disproportionately punishes black and low-income residents and generates millions of dollars every year for the city by pushing massive debt onto Chicago’s poorest residents — even sending thousands into bankruptcy.

So when we thought about building an interactive database that allows the public, for the first time, to see all 54 million tickets issued over the last two decades, we wanted to make sure users understood the findings of the overall project. That’s why we centered the user experience around the disparities in the system, such as which wards have the most ticket debt and which have been hit hardest because residents can’t pay.

The Ticket Trap is a way for users to see lots of different patterns in tickets and to see how their wards fit into the bigger picture. It also gives civically active folks tools for talking about the issue of fines imposed by the city and helps them hold their elected officials accountable for how the city imposes debt.

The project also gave us an opportunity to try a bunch of technical approaches that could help a small organization like ours develop sustainable news apps. Although we’re part of the larger ProPublica, I’m the only developer in the Illinois office, so I want to make careful choices that will help keep our “maintenance debt” — the amount of time future-me will need to spend keeping old projects up and running — low.

Managing and minimizing maintenance debt is particularly important to small organizations that hope to do ambitious digital work with limited resources. If you’re at a small organization, or are just looking to solve similar problems, read on: These tools might help you, too.

In addition to lowering maintenance debt, I also wanted the pages to load quickly for our readers and to cost us as little as possible to serve. So I decided to eliminate, as much as possible, having executable code running on a server just to load pages that rarely change. That decision required us to solve some problems.

The development stack was JAMstack, which is a static front-end client with microservices to handle the dynamic features.

The learning curve for these technologies is steep (don’t worry if you don’t know what it all means yet). And while there are lots of good resources to learn the components, it can still be challenging to put them all together.

So let’s start with how we designed the news app before descending into the nerdy lower decks of technical choices.

Design Choices

The Ticket Trap focuses on wards, Chicago’s primary political divisions and the most relevant administrative geography. Aldermen don’t legislate much, but they have more power over ticketing, fines, punishments and debt collection policies than anyone except the mayor.

We designed the homepage as an animated, sortable list that highlights the wards, instead of a table or citywide map. Our hope was to encourage users to make more nuanced comparisons among wards and to integrate our analysis and reporting more easily into the experience.

The top of the interface provides a way to select different topics and then learn about what they mean and their implications before seeing how the wards compare. If you click on “What Happens if You Don’t Pay,” you’ll learn that unpaid tickets can trigger late penalties, but they can also lead to license suspensions and vehicle impoundments. Even though many people from vulnerable communities are affected by tickets in Chicago, they’re not always familiar with the jargon, which puts them at a disadvantage when trying to defend themselves. Empowering them by explaining some basic concepts and terms was an important goal for us.

Below the explanation of terms, we display some small cards that show you the location of each ward, the alderman who represents it, its demographic makeup and information about the selected topic. The cards are designed to be easy to “skim and dive” and to make visual comparisons. You can also sort the cards based on what you’d like to know.

We included some code in our pages to help us track how many people used different features. About 50 percent of visitors selected a new category at least once and 27 percent sorted once they were in a category. We’d like to increase those numbers, but it’s in line with engagement patterns we saw for our Stuck Kids interactive graphic and better than we did on the interactive map in The Bad Bet, so I consider it a good start.

For more ward-specific information, readers can also click through to a page dedicated to their ward. We show much of the same information as the cards but allow you to home in on exactly how your ward ranks in every category. We also added some more detail, such as a map showing where every ticket in your ward has been issued.

We decided against showing trends over time on ward pages because the overall trend in the number of tickets issued is too big and complex a subject to capture in simple forms like line charts. As interesting as that may have been, it would have been outside the journalistic goals of highlighting systemic injustices.

For example, here’s the trend over time for tickets in the 42nd Ward (downtown Chicago). It’s not very revealing. Is there an upward trend? Maybe a little. But the chart says little about the overall effect of tickets on people’s lives, which is what we were really after.

On the other hand, the distributions of seizures/suspensions and bankruptcy are very revealing and show clear groupings and large variance, so each detail page includes visualizations of these variables.

Looking forward, there’s more we can do with these by layering on more demographic information and adding visual emphasis.

One last point about the design of these pages: I’m not a “natural” designer and look to colleagues and folks around the industry for inspiration and help. I made a map of some of those influences to show how many people I learned from as I worked on the design elements:

These include ProPublica news applications developer Lena Groeger’s work on Miseducation, as well as NPR’s Book Concierge, first designed by Danny DeBelius and most recently by Alice Goldfarb. I worked on both and picked up some design ideas along the way. Helga Salinas, then an engagement reporting fellow at ProPublica Illinois, helped frame the design problems and provided feedback that was crucial to the entire concept of the site.

Technical Architecture

The Ticket Trap is the first news app at ProPublica to take this approach to mixing “baked out” pages with dynamic features like search. It’s powered by a static site generator (GatsbyJS), a query layer (Hasura), a database (Postgres with PostGIS) and microservices (Serverless and Lambda).

Let’s break that down:

  • Front-end and site generator: GatsbyJS builds a site by querying for data and providing it to templates built in React that handle all display-layer logic, both the user interface and content.
  • Deployment and development tools: A simple Grunt-based command line interface for deploying and administrative tasks.
  • Data management: All data analysis and processing is done in Postgres. Using GNU Make, the database can be rebuilt at any time. The Makefile also builds map tiles and uploads them to Mapbox. Hasura provides a GraphQL wrapper around Postgres so that GatsbyJS can query it. (GraphQL is simply a query language for APIs.)
  • Search and dynamic services: Search is handled by a simple AWS Lambda function managed with Serverless that ferries simple queries to an RDS database.

It’s all a bit buzzword-heavy and trendy-sounding when you say it fast. The learning curve can be steep, and there’s been a persistent and sometimes persuasive argument that the complexity of modern JavaScript toolchains and frameworks like React is overkill for small teams.

We should be skeptical of the tech du jour. But this mix of technologies is the real deal, with serious implications for how we do our work. Once I could put all the pieces together, I found significantly less complexity than when using MVC-style frameworks for news apps.

Front End and Site Generator

GatsbyJS provides data to templates (built as React components) that contain both UI logic and content.

The key difference here from frameworks like Rails is that instead of splitting up templates and the UI (the classic “change template.html then update app.js” pattern), GatsbyJS bundles them together using React components. In this model, you factor your code into small components that bundle data and interactivity together. For example, all the logic and interface for the address search is in a component called AddressSearch. This component can be dropped into the code anywhere we want to show an address search using an HTML-like syntax (<AddressSearch />) or even used in other projects.

We’ll skip over what I did here, which is best summed up by this Tweet:

lol pic.twitter.com/UCpQK131J6— Thomas Wilburn (@thomaswilburn) January 16, 2019

There are better ways to learn React than my subpar code.

GatsbyJS also gives us a uniform system for querying our data, no matter where it comes from. In the spirit of working backward, look at this simplified query snippet from the site’s homepage, which provides access to data about each ward’s demographics, ticketing summary data, responsive images with locator maps for each ward, site configuration and editable snippets of text from a Google spreadsheet.

export const query = graphql`
  query PageQuery {
    configYaml {
      slug
      title
      description
    }
    allImageSharp {
      edges {
        node {
          fluid(maxWidth: 400) {
            ...GatsbyImageSharpFluid
          }
        }
      }
    }
    allGoogleSheetSortbuttonsRow {
      edges {
        node {
          slug
          label
          description
        }
      }
    }
    iltickets {
      citywideyearly_aggregate {
        aggregate {
          sum {
            current_amount_due
            ticket_count
            total_payments
          }
        }
      }
      wards {
        ward
        wardDemographics {
          white_pct
          black_pct
          asian_pct
          latino_pct
        }
        wardMeta {
          alderman
          address
          city
          state
          zipcode
          ward_phone
          email
        }
        wardTopFiveViolations {
          violation_description
          ticket_count
          avg_per_ticket
        }
        wardTotals {
          current_amount_due
          current_amount_due_rank
          ticket_count
          ticket_count_rank
          dismissed_ticket_count
          dismissed_ticket_count_rank
          dismissed_ticket_count_pct
          dismissed_ticket_count_pct_rank
          …
        }
      }
    }
  }
`

Seems like a lot, and maybe it is. But it’s also powerful, because it’s the precise shape of the JSON that will be available to our template, and it draws on a variety of data sources: A YAML config file kept under version control (configYaml), images from the filesystem processed for responsiveness (allImageSharp), edited copy from Google Sheets (allGoogleSheetSortbuttonsRow) and ticket data from PostgreSQL (iltickets).

And data access in your template becomes very easy. Look at this snippet:

iltickets {
  wards {
    ward
    wardDemographics {
      white_pct
      black_pct
      asian_pct
      latino_pct
    }
  }
}

In our React component, accessing this data looks like:

{data.iltickets.wards.map((ward, i) => (
  <p>Ward {ward.ward} is {ward.wardDemographics.latino_pct}% Latino.</p>
))}

Every other data source works exactly the same way. The simplicity and consistency help keep templates clean and clear to read.

Behind the scenes, Hasura, a GraphQL wrapper for Postgres, is stitching together relational database tables and serializing them as JSON to pull in the ticket data.

Data Management

Hasura

Hasura occupies a small role in this project, but without it, the project would be substantially more difficult. It’s the glue that lets us build a static site out of a large database, and it allows us to query our Postgres database with simple JSON-esque queries using GraphQL. Here’s how it works.

Let’s say I have a table called “wards” with a one-to-many relationship to a table called “ward_yearly_totals”. Assuming I’ve set up the correct foreign key relationships in Postgres, a query from Hasura would look something like:

wards {
  ward
  alderman
  wardYearlyTotals {
    year
    ticket_count
  }
}

On the back end, Hasura knows how to generate the appropriate join and turn it into JSON.
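To make the shape of that result concrete, here is a minimal Python sketch of the same one-to-many nesting. It is not how Hasura works internally; the rows, alderman names and the nest_wards helper are made up for illustration, and only the field names come from the query above.

```python
import json
from collections import defaultdict

# Made-up rows standing in for the "wards" and "ward_yearly_totals" tables.
wards = [
    {"ward": "1", "alderman": "Alderman A"},
    {"ward": "2", "alderman": "Alderman B"},
]
ward_yearly_totals = [
    {"ward": "1", "year": 2016, "ticket_count": 10},
    {"ward": "1", "year": 2017, "ticket_count": 12},
    {"ward": "2", "year": 2017, "ticket_count": 7},
]


def nest_wards(wards, totals):
    """Attach each ward's yearly totals as a nested list (a one-to-many join)."""
    by_ward = defaultdict(list)
    for row in totals:
        by_ward[row["ward"]].append(
            {"year": row["year"], "ticket_count": row["ticket_count"]}
        )
    return [
        {
            "ward": w["ward"],
            "alderman": w["alderman"],
            "wardYearlyTotals": by_ward[w["ward"]],
        }
        for w in wards
    ]


nested = nest_wards(wards, ward_yearly_totals)
# The printed JSON mirrors the shape of the GraphQL query.
print(json.dumps({"wards": nested}, indent=2))
```

Each ward object carries its own list of yearly totals, which is the structure a template can loop over directly.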

This process was also critical in working out the data structure. I was struggling with this but I realized that I just needed to work backward. Because GraphQL queries are declarative, I simply wrote queries that described the way I wanted the data to be structured for the front end and worked backward to create the relational database structures to fulfill those queries.

Hasura can do all sorts of neat things, but even the most simple use case — serializing JSON out of a Postgres database — is quite compelling for daily data journalism work.

Data Loading

GNU Make powers the data loading and processing workflow. I’ve written about this before if you want to learn how to do this yourself.

There’s a Python script (with tests) that handles cleaning up unescaped quotes and a few other quirks of the source data. We also use the highly efficient Postgres COPY command to load the data.

The only other notable wrinkle is that our source data is split up by year. That gives us a nice way to parallelize the process and to load partial data during development to speed things up.

At the top of the Makefile, we have these years:

PARKINGYEARS = 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018

To load four years’ worth of data, processing in parallel across four processor cores looks like this:

PARKINGYEARS="2015 2016 2017 2018" make -j 4 parking
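For readers who don't use Make, the same fan-out idea can be sketched in Python. This is only an illustration of running one job per year with a fixed number of workers; load_year is a hypothetical stand-in, not the project's actual loading step.

```python
from concurrent.futures import ThreadPoolExecutor

# Mirrors the PARKINGYEARS list at the top of the Makefile.
ALL_YEARS = list(range(1996, 2019))


def load_year(year):
    """Hypothetical stand-in for cleaning and loading one year of tickets."""
    return f"loaded parking data for {year}"


def load_years(years, jobs=4):
    """Run one load per year, at most `jobs` at a time (like make -j)."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns results in the same order as the input years.
        return list(pool.map(load_year, years))


results = load_years([2015, 2016, 2017, 2018], jobs=4)
print(results)
```

Passing a shorter list of years is the equivalent of overriding PARKINGYEARS to load partial data during development.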

Make, powerful as it is for filesystem-based workflows and light database work, has been more than a bit fussy when working so extensively with a database. Dependencies are hard to track without hacks, which means not all steps can be run without remembering and running prior steps. Future iterations of this project would benefit from either more clever Makefile tricks or a different tool.

However, being able to recreate the database quickly and reliably was a central tenet of this project, and the Makefile did just that.

Analysis and Processing for Display

To analyze the data and deliver it to the front end, we wrote a ticket loader (open sourced here) to use SQL queries to generate a series of interlinked views of the data. These techniques, which I learned from Joe Germuska when we worked together at the Chicago Tribune, are a very powerful way of managing a giant data set like the 54 million rows of parking ticket data used in The Ticket Trap.

The fundamental trick to the database structure is to take the enormous database of tickets and crunch it down into smaller tables that aggregate combinations of variables, then run all analysis against those tables.

Let’s take a look at an example. The query below groups by year and ward, along with several other key variables such as violation code. By grouping this way, we can easily ask questions like, “How many parking meter tickets were issued in the 3rd Ward in 2005?” Here’s what the summary query looks like:

create materialized view wardsyearly as
  select
    w.ward,
    p.violation_code,
    p.ticket_queue,
    p.hearing_disposition,
    p.year,
    p.unit_description,
    p.notice_level,
    count(ticket_number) as ticket_count,
    sum(p.total_payments) as total_payments,
    sum(p.current_amount_due) as current_amount_due,
    sum(p.fine_level1_amount) as fine_level1_amount
  from wards2015 w
    join blocks b on b.ward = w.ward
    join geocodes g on b.address = g.geocoded_address
    join parking p on p.address = g.address
  where
    g.geocode_accuracy > 0.7
    and g.geocoded_city = 'Chicago'
    and (
      g.geocode_accuracy_type = 'range_interpolation'
      or g.geocode_accuracy_type = 'rooftop'
      or g.geocode_accuracy_type = 'intersection'
      or g.geocode_accuracy_type = 'point'
      or g.geocode_accuracy_type = 'ohare'
    )
  group by
    w.ward,
    p.year,
    p.notice_level,
    p.unit_description,
    p.hearing_disposition,
    p.ticket_queue,
    p.violation_code;

The virtual table created by this view looks like this:

This is very easy to query and reason about, and significantly faster than querying the full parking data set.

Let’s say we want to know how many tickets were issued by the Chicago Police Department in the 1st Ward between 2013 and 2017:

select sum(ticket_count) as cpd_tickets
from wardsyearly
where
  ward = '1'
  and year >= 2013
  and year <= 2017
  and unit_description = 'CPD'

The answer is 64,124 tickets. This query took 119 milliseconds on my system when I ran it, while a query to obtain the equivalent data from the raw parking records takes minutes rather than fractions of a second.
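The aggregate-then-query pattern can be tried end to end with Python's built-in sqlite3 module. This is a toy sketch under stated assumptions: the six rows are invented, the table has far fewer columns than the real one, and a plain table stands in for the materialized view because SQLite has no materialized views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table parking "
    "(ticket_number integer, ward text, year integer, unit_description text)"
)

# Invented rows for illustration only.
rows = [
    (1, "1", 2013, "CPD"),
    (2, "1", 2013, "CPD"),
    (3, "1", 2014, "DOF"),
    (4, "1", 2017, "CPD"),
    (5, "2", 2015, "CPD"),
    (6, "1", 2012, "CPD"),
]
conn.executemany("insert into parking values (?, ?, ?, ?)", rows)

# Crunch the raw tickets down to one row per (ward, year, unit) combination.
conn.execute(
    """
    create table wardsyearly as
    select ward, year, unit_description, count(ticket_number) as ticket_count
    from parking
    group by ward, year, unit_description
    """
)

# Analysis then runs against the much smaller summary table.
(cpd_tickets,) = conn.execute(
    """
    select sum(ticket_count) from wardsyearly
    where ward = '1' and year >= 2013 and year <= 2017
      and unit_description = 'CPD'
    """
).fetchone()
(summary_rows,) = conn.execute("select count(*) from wardsyearly").fetchone()
print(cpd_tickets, summary_rows)
```

With only six tickets the summary table is barely smaller than the raw one; the payoff comes when millions of tickets collapse into a few thousand grouped rows.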

The Database as the “Single Source of Truth”

I promised myself when I started this project that all calculations and analysis would be done with SQL and only SQL. That way, if there's a problem with the data in the front end, there's only one place to look, and if there's a number displayed in the front end, the only transformation it undergoes is formatting. There were moments when I wondered if this was crazy, but it has turned out to be perhaps my best choice in this project.

With common table expressions (CTE), part of most SQL environments, I was able to do powerful things with a clear, if verbose, syntax. For example, we rank and bucket every ward by every key metric in the data. Without CTEs, this would be a task best accomplished with some kind of script with gnarly for-loops or impenetrable map/reduce functions. With CTEs, we can use impenetrable SQL instead! But at least our workflow is declarative and ensures any display of the data can and should contain no additional data processing.

Here’s an example of a CTE that ranks wards on a couple of variables using the intermediate summary view from above. Our real queries are significantly more complex, but the fundamental concepts are the same:

with
  year_bounds as (
    select
      2013 as min_year,
      2017 as max_year
  ),
  wards_toplevel as (
    select
      ward,
      sum(ticket_count) as ticket_count,
      sum(total_payments) as total_payments
    from wardsyearly, year_bounds
    where (year >= min_year and year <= max_year)
    group by ward
  )
select
  ward,
  ticket_count,
  dense_rank() over (order by ticket_count desc) as ticket_count_rank,
  total_payments,
  dense_rank() over (order by total_payments desc) as total_payments_rank
from wards_toplevel;
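If dense_rank() is unfamiliar, this small Python sketch shows what it computes: ties share a rank, and the next distinct value gets the next consecutive rank. The ticket counts are invented, and the function is only an illustration of the ranking rule, not a replacement for doing the work in SQL.

```python
def dense_rank(values_by_key):
    """Rank keys by value, descending; ties share a rank and no ranks are skipped."""
    distinct = sorted(set(values_by_key.values()), reverse=True)
    rank_of = {value: i + 1 for i, value in enumerate(distinct)}
    return {key: rank_of[value] for key, value in values_by_key.items()}


# Invented ticket counts keyed by ward.
ticket_counts = {"1": 500, "2": 750, "3": 500, "4": 120}
ranks = dense_rank(ticket_counts)
print(ranks)
```

Wards "1" and "3" tie for second place here, and ward "4" is ranked third rather than fourth.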

Geocoding

Geocoding the data — turning handwritten or typed addresses into latitude and longitude coordinates — was a critical step in our process. The ticket data is fundamentally geographic and spatial. Where a ticket is issued is of utmost importance for analysis. Because the input addresses can be unreliable, the address data associated with tickets was exceptionally messy. Geocoding this data was a six-month, iterative process.

An important technique we use to clean up the data is very simple. We “normalize” the addresses to the block level by turning street numbers like “1432 N. Damen” into “1400 N. Damen.” This gives us fewer addresses to geocode, which made it easier to repeatedly geocode some or all of the addresses. The technique doesn’t improve the data quality itself, but it makes the data significantly easier to work with.
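A minimal sketch of that block-level normalization in Python might look like the following. The function name and regular expression are assumptions for illustration, and how the real pipeline treats edge cases such as street numbers below 100 is not described here.

```python
import re


def normalize_to_block(address):
    """Round a leading street number down to the nearest hundred."""
    match = re.match(r"\s*(\d+)(\s+.*)$", address)
    if not match:
        # No leading street number, so leave the address unchanged.
        return address
    number, rest = match.groups()
    block = (int(number) // 100) * 100
    return f"{block}{rest}"


print(normalize_to_block("1432 N. Damen"))
```

Because every address on a block collapses to the same string, the set of distinct addresses to geocode shrinks considerably.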

Ultimately, we used Geocodio and were quite happy with it. Google's geocoder is still the best we've used, but Geocodio is close and has a more flexible license that allowed us to store, display and distribute the data, including in our Data Store.

We found that the underlying data was hard to manually correct because many of the errors stemmed from addresses that were truly ambiguous. Instead, we simply accepted that many addresses were going to cause problems. We omitted addresses that Geocodio wasn't confident about or couldn't pinpoint with enough accuracy. We then sampled and tested the data to find the true error rate.

About 12 percent of addresses couldn’t be used. Of the remaining addresses, sampling showed them to be about 94 percent accurate. The best we could do was make the most conservative estimates and try to communicate and disclose this clearly in our methodology.
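One way to read those two figures together is to multiply them. This is a back-of-the-envelope interpretation using only the rounded numbers above, not a figure from the published methodology.

```python
unusable_share = 0.12    # addresses that couldn't be used
accuracy_of_rest = 0.94  # sampled accuracy of the remaining addresses

usable_share = 1 - unusable_share
# Share of all addresses that were both kept and geocoded correctly.
usable_and_accurate = usable_share * accuracy_of_rest
print(round(usable_and_accurate * 100, 1))
```

Under that reading, roughly 83 percent of all addresses end up both usable and correctly located.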

To improve accuracy, we worked with Matt Chapman, a local civic hacker, who had geocoded the addresses without normalization using another service called SmartyStreets. We shared data sets and cross-validated our results. SmartyStreets’ accuracy was very close to Geocodio's. I attempted to see if there was a way to use results from both services. Each service did well and struggled with different types of address problems, so I wanted to know if combining them would increase the overall accuracy. In the end, my preliminary experiments revealed this would be technically challenging with negligible improvement.

Deployment and Development Tools

The rig uses some simple shell commands to handle deployment and building the database. For example:

make all
make db
grunt publish
grunt unpublish
grunt publish --target=production

Dynamic Search With Microservices

Because we were building a site with static pages and no server runtime, we had to solve the problem of offering a truly dynamic search feature. We needed to provide a way for people to type in an address and find out which ward that address is in. Lots of people don’t know their own wards or aldermen. But even when they do, there’s a decent chance they wouldn’t know the ward for a ticket they received elsewhere in the city.

To allow searching without needing to spin up any new servers, we used Mapbox's autocomplete geocoder, AWS Lambda to provide a tiny API, our Amazon Aurora database, and Serverless to manage the connection.

Mapbox provides suggested addresses, and when the user clicks on one, we dispatch a request to the back-end service with the latitude and longitude, which are then run through a simple point-in-polygon query to determine the ward.
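For readers who haven't met a point-in-polygon test before, here is a minimal pure-Python sketch of the idea using ray casting. It is a swapped-in illustration, not what the app runs: the app delegates the test to the database. The function names and the two square "wards" are invented for the example.

```python
def point_in_polygon(lng, lat, polygon):
    """Ray casting: cast a ray to the right of the point and count edge crossings.

    An odd number of crossings means the point is inside the polygon.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def find_ward(lng, lat, wards):
    """Return the name of the first ward whose polygon contains the point."""
    for name, polygon in wards.items():
        if point_in_polygon(lng, lat, polygon):
            return name
    return None


# Two made-up square wards sitting side by side.
wards = {
    "1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "2": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
```

A spatial database does the same job with indexes and proper handling of projections, which is why the real service hands the coordinates to a query instead.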

It’s simple. We have a serverless.yml config file that looks like this:

service: il-tickets-query

plugins:
  - serverless-python-requirements
  - serverless-dotenv-plugin

custom:
  pythonRequirements:
    dockerizePip: non-linux
    zip: true

provider:
  name: aws
  runtime: python3.6
  stage: ${opt:stage,'dev'}
  environment:
    ILTICKETS_DB_URL: ${env:ILTICKETS_DB_URL}
  vpc:
    securityGroupIds:
      - sg-XXXXX
    subnetIds:
      - subnet-YYYYY

package:
  exclude:
    - node_modules/**

functions:
  ward:
    handler: handler.ward
    events:
      - http:
          method: get
          cors: true
          path: ward
          request:
            parameters:
              querystrings:
                lat: true
                lng: true

Then we have a handler.py file to execute the query:

try:
    import unzip_requirements
except ImportError:
    pass

import json
import logging
import numbers
import os

import records

log = logging.getLogger()
log.setLevel(logging.DEBUG)

DB_URL = os.getenv('ILTICKETS_DB_URL')


def ward(event, context):
    qs = event["queryStringParameters"]
    db = records.Database(DB_URL)
    rows = db.query("""
        select ward from wards2015
        where st_within(st_setsrid(ST_GeomFromText('POINT(:lng :lat)'), 3857), wkb_geometry)
    """, lat=float(qs['lat']), lng=float(qs['lng']))

    wards = [row['ward'] for row in rows]

    if len(wards):
        response = {
            "statusCode": 200,
            "body": json.dumps({"ward": wards[0]}),
            "headers": {
                "Access-Control-Allow-Origin": "projects.propublica.org",
            }
        }
    else:
        response = {
            "statusCode": 404,
            "body": "No ward found",
        }

    return response

That’s all there is to it. There are plenty of ways it could be improved, such as making the cross-origin resource sharing policies configurable based on the deployment stage. We’ll also be adding API versioning soon to make it easier to maintain different site versions.

Minimizing Costs, Maximizing Productivity

The cost savings of this approach can be significant.

Using AWS Lambda costs pennies per month (or less), while running even the smallest servers on Amazon’s Elastic Compute Cloud service usually costs quite a bit more. The thousands of requests and tens of thousands of milliseconds of computing time used by the app in this example are, by themselves, well within Amazon’s free tier. Serving static assets from Amazon’s S3 service also costs only pennies per month.

Hosting costs are a small part of the puzzle, of course — developer time is far more costly, and although this system may take longer up front, I think the trade-off is worth it because of the decreased maintenance burden. The time a developer will not have to spend maintaining a Rails server is time that he or she can spend reporting or writing new code.

For The Ticket Trap app, I only need to worry about a single, highly trusted and reliable service (our database) rather than a virtual server that needs monitoring and could experience trouble.

But where this system really shines is in its increased resiliency. When using traditional frameworks like Rails or Django, functionality like search and delivering client code are tightly coupled. So if the dynamic functionality breaks, the whole site will likely go down with it. In this model, even if AWS Lambda were to experience problems (which would likely be part of a major, internet-wide event), the user experience would be degraded because search wouldn’t work, but we wouldn’t have a completely broken app. Decoupling the most popular and engaging site features from an important but less-used feature minimizes the risks in case of technical difficulties.

If you’re interested in trying this approach, but don’t know where to begin, identify what problem you’d like to spend less time on, especially after your project is launched. If running databases and dynamic services is hard or costly for you or your team, try playing with Serverless and AWS Lambda or a similar provider supported by Serverless. If loading and checking your data in multiple places always slows you down, try writing a fast SQL-based loader. If your front-end code is always chaotic by the end of a development cycle, look into implementing the reactive pattern provided by tools like React, Svelte, Angular, Vue or Ractive. I learned each part of this stack one at a time, always driven by need.


Want to Start a Collaborative Journalism Project? We’re Building Tools to Help.

Today we’re announcing new tools, documentation and training to help news organizations collaborate on data journalism projects.

Newsrooms, long known for being cutthroat competitors, have been increasingly open to the idea of working with one another, especially on complex investigative stories. But even as interest in collaboration grows, many journalists don’t know where to begin or how to run a sane, productive partnership. And there aren’t many good tools available to help them work together. That’s where our project comes in.


We’ll be sharing some of the software we built, and the lessons we learned, while creating our Documenting Hate project, which tracks hate crimes and bias-motivated harassment in the U.S.

The idea to launch Documenting Hate came shortly after Election Day 2016, in response to a widely reported uptick in hate incidents. Because data collection on hate crimes and incidents is so inadequate, we decided to ask people across the country to tell us their stories about experiencing or witnessing them. Thousands of people responded. To cover as many of their stories as we could, we organized a collaborative effort with local and national newsrooms, which eventually included more than 160 of them.

We’ll be building out and open-sourcing the tools we created to do Documenting Hate, as well as our Electionland project, and writing a detailed how-to guide that will let any newsroom do crowd-powered data investigations on any topic.

Even newsrooms without dedicated developers will be able to launch a basic shared investigation, including gathering tips from the public through a web-based form and funneling those tips into a central database that journalists can use to find stories and sources. Newsrooms with developers will be able to extend the tools to enable collaboration around any data sets.

We’ll also provide virtual trainings about how to use the tools and how to plan and launch crowd-powered projects around shared data sets.

This work will be a partnership with the Google News Initiative, which is providing financial support.

Launched in January 2017, ProPublica’s Documenting Hate project is a collaborative investigation of hate crimes and bias incidents in the United States. The Documenting Hate coalition is made up of more than 160 newsrooms and several journalism schools that collect tips from the public and records from police to report on hate. Together we’ve produced close to 200 stories. That work will continue in 2019.

We’re already hard at work writing a how-to guide on collaborative, crowd-powered data projects. We’ll be talking about it at the 2019 NICAR conference in Newport Beach, California, in March. We are also hiring a contract developer to work on this; read the job description and apply here.

The first release of the complete tools and playbook will be available this summer, and online trainings will take place in the second half of the year.

There are a thousand different ways to collaborate around shared data sets. We want to hear from you about what would be useful in our tool, and we’re interested in hearing from newsrooms that might be interested in testing our tools. Sign up for updates here.


Chasing Leads and Herding Cats: Shaping a New Role in the Newsroom

In this ever-changing industry, new roles are emerging that redefine how we do journalism: audience engagement director, social newsgathering reporter, Snapchat video producer. At ProPublica, I’ve been part of developing a new role for our newsroom. My title is partner manager, and I lead a large-scale collaboration: Documenting Hate, an investigative project to track and report on hate crimes and bias incidents.

ProPublica regularly collects large amounts of information that we can’t process by ourselves, including documents gathered in our reporting, tips solicited by our engagement journalists, and data published in our news applications.


Since the beginning, we’ve seen collaboration as a key way to make sure that all of this reporting material can be used to fulfill our mission: to make an impact in the real world. Collaboration has been a fundamental part of ProPublica’s journalism model. We make our stories available to republish for free through Creative Commons and usually co-publish or co-report stories with other news outlets. When it comes to large data sets, we often offer up our findings to journalists or the public to enable new reporting. It’s a way of spreading the wealth, so to speak. Collaborations are typically a core responsibility of each editor in the newsroom, but some of our projects have large-scale collaborations at their center, and they require dedicated and sustained attention.

My role emerged after Electionland 2016, one of the largest-ever journalism collaborations, which many ProPublica staff members pitched in to organize. While the project was a journalistic success, its editors learned a key lesson about the need for somebody to own the relationship with partner newsrooms. In short, we came to think that the collaboration itself was something that needed editing, including recruiting partners, making sure they saw the reporting tips they needed to see, and tracking what partners were publishing. It also reinforced the need for a more strategic tip-sharing approach after the success of large engagement projects, like Lost Mothers and Agent Orange, which garnered thousands of leads — and more stories than we had time to tell.

That’s how my role was born. Soon after the 2016 election, ProPublica launched Documenting Hate. Hiring a partner manager was the first priority. We also hired a partner manager to work on Electionland 2018, which will cover this year’s midterm elections.

Our newsroom isn’t alone in dedicating resources to this type of role. Other investigative organizations, such as Reveal from the Center for Investigative Reporting and the International Consortium of Investigative Journalists, staffed up to support their collaborations. Heather Bryant, the founder of Project Facet, which helps newsrooms work together, told me there are at least 10 others who manage long-term collaborations at newsrooms across the country, from Alaska, to Texas, to Pennsylvania.

What I Do

My job is a hybrid of roles: reporter, editor, researcher, social media producer, recruiter, trainer and project manager.

I recruited our coalition of newsrooms, and I vet and onboard partners. To date, we have more than 150 national and local newsrooms signed on to the project, plus nearly 20 college newspapers. I speak to a contact at each newsroom before they join, and then I provide them with the materials they need to work on the project. I’ve written training materials and conduct online training sessions so new partners can get started more quickly.

The core of this project is a shared database of tips about hate incidents that we source from the public. For large collaborations like Documenting Hate and Electionland, our developer Ken Schwencke builds these private central repositories, which are connected directly to our tip submission form. We use Screendoor, a form-building service, to host the tip form.

In large-scale collaborations, we invite media organizations to be part of the newsgathering process. For Documenting Hate, we ask partners to embed this tip submission form to help us gather story leads. That way, we can harness the power of different audiences around the country, from Los Angeles Times readers, to Minnesota Public Radio listeners, to Univision viewers. At ProPublica, we try to talk about the project as much as we can in the media and at conferences to spread the word to both potential tipsters and partners.

The tips we gather are available to participating journalists — helping them to do their job and produce stories they might otherwise not have found. ProPublica and our partners have reported more than 160 stories, including pieces about hate in schools, on public transportation and on the road, in the workplace, and at places of worship, and incidents involving the president’s name and policies, to name just a few. Plus, each authenticated tip acts as a stepping stone for other partners to build on their reporting.

At ProPublica, we’ve been gathering lots of public records from police on hate crimes to do our own reporting and sharing those records with partners, too. Any time we produce an investigation in-house, I share the information we have available so reporters can republish or localize the story.

As partner manager, I’m a human resource to share knowledge. I’ve built expertise in the hate beat and serve as a kind of research desk for our network, pointing reporters to sources and experts. I host a webinar or training once a month to help reporters understand the project or to build this beat, and I send out a weekly internal newsletter.

Another part of my job is being an air-traffic controller, sending out incoming tips to reporters who might be interested and making sure that multiple people aren’t working on the same tip at the same time. This is especially important in a project like ours; given the sensitivity of the subject, we don’t want to scare off tipsters by having multiple reporters reach out at once. I pitch story ideas based on patterns I’ve identified to journalists who might want to dig further. I’m constantly researching leads to share with our network and with specific journalists working on relevant stories.

And I’m also a signal booster: When partners publish reporting on hate, we share their work on our social channels to make sure these stories get as big an audience as possible. We keep track of all of the stories that were reported with sourcing from the project to make them available in one place.

The Challenges

While the Documenting Hate project has produced some incredible work, this is not an easy job.

Many journalists are eager to work with ProPublica, but not always with each other; it can be a process to get buy-in from editors to collaborate with a network of newsrooms, especially at large ones where there are layers of hierarchy. Some reporters agree to join but don’t make it all the way through onboarding, which involves several steps that may require help from others in their newsrooms. Some explore the database and don’t see anything they want to follow up on right away, and then lose interest. And occasionally journalists are so overwhelmed with their day-to-day work that I rarely hear back from them after they’ve joined.

Turnover and layoffs, which are depressingly common in our industry, mean having to find and onboard new contacts in partner newsrooms, or relying on bounce-back emails to figure out who’s left. It also means that sometimes engaged reporters move into positions at new companies where they don’t cover hate, leaving a gap in their old newsrooms. A relentless news cycle doesn’t help, either. For example, after the 2017 violence in Charlottesville, Virginia, caused a renewed surge in interest in the hate beat, a series of deadly hurricanes hit, drawing a number of reporters onto the natural disaster beat for a time.

And because of the sensitivity of the incidents, tipsters sometimes refuse to talk after they’ve written in, which can be discouraging for reporters. Getting a story may mean following up on a dozen tips rather than just one or two. Luckily, since we’ve received thousands of tips and hundreds of records, active participants in our coalition have found plenty of material to work on.

The Future of Partnerships

While collaborations aren’t always easy, I believe projects like Documenting Hate are likely to be an important part of the future of journalism. Pooling resources and dividing and conquering on reporting can help save time and money, which are in increasingly short supply.

Some partnerships are the fruit of necessity, linking small newsrooms in the same region or state, like Coast Alaska, or creating stronger ties between affiliates within a large network, like NPR. I think there’s huge potential for more local collaborations, especially with shrinking budgets and personnel. Other partnerships emerge out of opportunity, like the Panama Papers investigation, which was made possible by a massive document leak. If more newsrooms resisted the urge for exclusivity — a concept that matters far more to journalists than to the public — more partnerships could be built around data troves and leaks.

Another area of potential is to band together to request and share public records or to pool funding for more expensive requests; these costs can prevent smaller newsrooms from doing larger investigations. I also think there’s a ton of opportunity to collaborate on specific topics and beats to share knowledge, best practices and reporting.

With new partnerships comes the need for someone at the helm, navigating the ship. While many newsrooms’ finances are shrinking, any collaborative project can have a coordinator role baked into the budget. An ideal collaborations manager is a journalist who understands the day-to-day challenges of newsrooms, is fanatical about project management, is capable of sourcing and shaping stories, and can track the reach and impact of work that’s produced.

We all benefit when we work together — helping us reach wider audiences, do deeper reporting and better serve the public with our journalism.


New Partnership Will Help Us Hold Facebook and Campaigns Accountable

We launched a new collaboration on Monday that will make it even easier to be part of our Facebook Political Ad Collector project.

In case you don’t know, the Political Ad Collector is a project to gather targeted political advertising on Facebook through a browser extension installed by thousands of users across the country. Those users, whose data is gathered completely anonymously, help us build a database of micro-targeted political ads that help us hold Facebook and campaigns accountable.

On Monday, Mozilla, maker of the Firefox web browser, is launching the Firefox Election Bundle, a special election-oriented version of the browser. It comes pre-installed with ProPublica’s Facebook Political Ad Collector and with an extension Mozilla created called Facebook Container.

The Facebook Container, according to Mozilla, helps users control what data Facebook collects about their browsing habits when they visit sites other than Facebook.

People who choose to download the Firefox Election Bundle will automatically begin participating in the Facebook Political Ad Collector project and will also benefit from the extra privacy controls that come with the Facebook Container project. The regular version of Firefox is, of course, still available.

Think of it as turning the tables. Instead of Facebook watching you, you can maintain control over what Facebook can see while helping keep an eye on Facebook’s ads.

You can download the Firefox Election Bundle here.

If you use Firefox and already have the Facebook Political Ad Collector installed, you can install Mozilla’s Facebook Container add-on here.

If you want to find out more about the Facebook Political Ad Collector project, you can read this story or browse the ads we’ve already collected.


The Election DataBot: Now Even Easier

We launched the Election DataBot in 2016 with the idea that it would help reporters, researchers and concerned citizens more easily find and tell some of the thousand stories in every political campaign. Now we’re making it even easier.

Just as before, the DataBot is a continuously updating feed of campaign data, including campaign finance filings, changes in race ratings and deleted tweets. You can watch the data come in in real time or sign up to be notified by email when there’s new data about races you care about.

DataBot’s new homepage dashboard of campaign activity now includes easy-to-understand summaries so that users can quickly see where races are heating up. We’ve added a nationwide map that shows you where a variety of campaign activity is occurring every week.

For example, the map shows that both leading candidates in Iowa’s 1st District saw spikes in Google searches in the week ending on Sept. 16 (we track data from Monday to Sunday). The Cook Political Report, which rates House and Senate races, changed its rating of that race from “Tossup” to “Lean Democratic” on Sept. 6.

When super PACs spend a lot of money in a House or Senate race, you’ll see it on the map. When Google search traffic spikes for a candidate, that’ll show up, too. We’re also tracking statements by incumbent members of Congress and news stories indexed by Google News. So when you get an email alerting you to new activity (you did sign up for alerts, right?), you can see at a glance the level of activity in the race.

The new homepage also allows you to look back in time to see how campaign activity has changed during the past 15 weeks, and whether what you’re seeing this week is really different than it was before. We’ve also added a way to focus on the races rated the most competitive by the Cook Political Report.

In order to highlight the most important activity, we weighted activity by type. Independent expenditures — where party committees and outside interest groups are choosing to spend their money — count twice as much as other types of activity.
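The weighting can be pictured with a small Python sketch. Only the two-to-one ratio comes from the description above; the activity type names, the counts and the function name are hypothetical, and this is not the DataBot's actual code.

```python
# Independent expenditures count double; every other activity type counts once.
WEIGHTS = {"independent_expenditure": 2}
DEFAULT_WEIGHT = 1


def activity_score(counts: dict) -> int:
    """Sum weekly activity counts for one race, applying the per-type weights."""
    return sum(WEIGHTS.get(kind, DEFAULT_WEIGHT) * n for kind, n in counts.items())


# Made-up week of activity for a single race.
week = {"independent_expenditure": 3, "search_spike": 1, "news_story": 4}
score = activity_score(week)  # 3 * 2 + 1 + 4 = 11
```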

Instead of state-level presidential election forecasts, we now are tracking changes to FiveThirtyEight’s “classic” forecast for each House and Senate contest. We’ve also added candidate statements for more than 500 campaigns whose websites produce a feed of their content.

The homepage map is just the first step in a more useful experience for DataBot users. We’ll be adding other layers of summary data, including details on social media activity, to the homepage, and additional ways to see how races have changed based on the activity feeds.

We’ll also be working to make the individual firehose item descriptions more useful; for example, saying whether a campaign finance filing has the most money raised or spent for that candidate compared with other reports.

We’d love to hear from you about ways to make Election DataBot more useful as Nov. 6 approaches.


Shedding Some Light on Dark Money Political Donors

On Wednesday we added details to our FEC Itemizer database on nearly $763 million in contributions to the political nonprofit organizations — also known as 501(c)(4) groups — that have spent the most money on federal elections during the past eight years. The data is courtesy of Issue One, a nonpartisan, nonprofit advocacy organization that is dedicated to political reform and government ethics.

These contributions often are called “dark money” because political nonprofits are not required to disclose their donors and can spend money supporting or opposing political candidates. By using government records and other publicly available sources, Issue One has compiled the most comprehensive accounting of such contributions to date.

To compile the data, Issue One identified the 15 political nonprofits that reported spending the most money in federal elections since the Supreme Court decision in Citizens United v. FEC in early 2010. It then found contributions using corporate filings, nonprofit reports and documents from the Internal Revenue Service, Department of Labor and Federal Election Commission. One of the top-spending political nonprofits, the National Association of Realtors, is almost entirely funded by its membership and has no records in this data.


For each contribution, you can see the source document detailing the transaction in FEC Itemizer.

The recipients are a who’s who of national political groups: Americans for Prosperity, the National Rifle Association Institute for Legislative Action, the U.S. Chamber of Commerce and Planned Parenthood Action Fund Inc. account for more than half of the $763 million in contributions in the data. There’s also American Encore, formerly the Center to Protect Patient Rights, one of the main conduits for the conservative financial network created by Charles and David Koch.

The largest donor is the Freedom Partners Chamber of Commerce, a Koch-organized business association that has contributed at least $181 million to the leading political nonprofits. Other donors include the Susan Thompson Buffett Foundation, which has given at least $25 million to the Planned Parenthood Action Fund, and major labor unions like the American Federation of State, County and Municipal Employees, or AFSCME, which has given at least $2.8 million to Democratic political nonprofit organizations.

Also among the donors are major corporations like Dow Chemical (mostly giving to the U.S. Chamber of Commerce), gun manufacturers (to the NRA), 501(c)(3) charities and individuals.

You can read Issue One’s report on its work as well as its methodology for discovering the contribution records. Because many of the sources are documents that are filed annually, this data won’t be updated the same way that FEC Itemizer is for campaign finance filings, but it represents the most comprehensive collection of dark money contributions to date.


Download Chicago’s Parking Ticket Data Yourself

ProPublica Illinois has been reporting all year on how ticketing in Chicago is pushing tens of thousands of drivers into debt and hitting black and low-income motorists the hardest. Last month, as part of a collaboration with WBEZ, we reported on how a city decision to raise the cost of citations for not having a required vehicle sticker has led to more debt — and not much more revenue.

We were able to tell these stories, in part, because we obtained the city of Chicago’s internal database for tracking parking and vehicle compliance tickets through a Freedom of Information request jointly filed by both news organizations. The records start in 2007, and they show you details on when and where police officers, parking enforcement aides, private contractors and others have issued millions of tickets for everything from overstaying parking meters to broken headlights. The database contains nearly 28.3 million tickets. Altogether, Chicago drivers still owe a collective $1 billion for these tickets, including late penalties and collections fees.

Now you can download the data yourself; we’ve even made it easier to import. We’ve anonymized the license plates to protect the privacy of drivers. As we get more records, we’ll update the data.

We’ve found a number of stories hidden in this data, including the one about city sticker tickets, but we’re confident there are more. If you see something interesting, email us. Or if you use the data for a project of your own — journalistic or otherwise — tell us. We’d love to know.


How ProPublica Illinois Uses GNU Make to Load 1.4GB of Data Every Day

I avoided using GNU Make in my data journalism work for a long time, partly because the documentation was so opaque that I couldn’t see how Make, one of many tools for running extract-transform-load (ETL) processes, could help my day-to-day data reporting. But this year, to build The Money Game, I needed to load 1.4GB of Illinois political contribution and spending data every day, and the ETL process was taking hours, so I gave Make another chance.

Now the same process takes less than 30 minutes.

Here’s how it all works, but if you want to skip directly to the code, we’ve open-sourced it here.

Fundamentally, Make lets you say:

  • File X depends on a transformation applied to file Y
  • If file X doesn’t exist, apply that transformation to file Y and make file X
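To make those two statements concrete, here is a minimal Python sketch of the same rule. It only mimics the idea: the `make` function, the `uppercase` transformation and the file names are invented for the example, and real Make also rebuilds a target when its source file is newer, which this sketch leaves out.

```python
import tempfile
from pathlib import Path
from typing import Callable


def make(target: Path, source: Path, transform: Callable[[Path, Path], None]) -> bool:
    """Build target from source if target is missing. Return True if it was built."""
    if target.exists():
        return False  # nothing to do: the output file is already there
    target.parent.mkdir(parents=True, exist_ok=True)
    transform(source, target)
    return True


def uppercase(source: Path, target: Path) -> None:
    """A stand-in transformation: copy the file, upper-casing its text."""
    target.write_text(source.read_text().upper())


with tempfile.TemporaryDirectory() as tmp:
    y = Path(tmp) / "y.txt"
    x = Path(tmp) / "processed" / "x.txt"
    y.write_text("raw data\n")
    first = make(x, y, uppercase)   # x is missing, so it gets built
    second = make(x, y, uppercase)  # x exists now, so the step is skipped
    result = x.read_text()
```

The second call doing nothing is the whole point: a long pipeline can be re-run after a failure and only the missing outputs are rebuilt.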

This “start with file Y to get file X” pattern is a daily reality of data journalism, and using Make to load political contribution and spending data was a great use case. The data is fairly large, accessed via a slow FTP server, has a quirky format, has just enough integrity issues to keep things interesting, and needs to be compatible with a legacy codebase. To tackle it, I needed to start from the beginning.

Overview

The financial disclosure data we’re using is from the Illinois State Board of Elections, but the Illinois Sunshine project had released open source code (no longer available) to handle the ETL process and fundraising calculations. Using their code, the ETL process took about two hours to run on robust hardware and over five hours on our servers, where it would sometimes fail for reasons I never quite understood. I needed it to work better and work faster.

The process looks like this:

  • Download data files via FTP from Illinois State Board Of Elections.
  • Clean the data using Python to resolve integrity issues and create clean versions of the data files.
  • Load the clean data into PostgreSQL using its highly efficient but finicky “\copy” command.
  • Transform the data in the database to clean up column names and provide more immediately useful forms of the data using “raw” and “public” PostgreSQL schemas and materialized views (essentially persistently cached versions of standard SQL views).

The cleaning step must happen before any data is loaded into the database, so we can take advantage of PostgreSQL’s efficient import tools. If a single row has a string in a column where it’s expecting an integer, the whole operation fails.
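A pre-import check of this kind might look like the following Python sketch. It is a simplified illustration, not the project's validation code: the column names, the sample rows and the policy of blanking out bad values are all assumptions made for the example.

```python
import csv
import io


def clean_tsv(source, dest, int_columns):
    """Copy tab-separated rows to CSV, blanking values that are not integers.

    Rows are handled one at a time, so memory use stays flat.
    Returns the number of values that had to be blanked.
    """
    reader = csv.DictReader(source, delimiter="\t")
    writer = csv.DictWriter(
        dest, fieldnames=reader.fieldnames, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    blanked = 0
    for row in reader:
        for column in int_columns:
            value = (row[column] or "").strip()
            try:
                row[column] = str(int(value))
            except ValueError:
                # Empty or non-numeric: emit an empty field instead of a bad value.
                row[column] = ""
                blanked += 1
        writer.writerow(row)
    return blanked


# Made-up input with one bad integer value.
raw = io.StringIO("ID\tAmount\tName\n1\t250\tSmith\n2\tn/a\tJones\n")
cleaned = io.StringIO()
blanked = clean_tsv(raw, cleaned, ["ID", "Amount"])
```

Catching the bad value here, before the bulk load, is what keeps one malformed row from failing the entire import.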

GNU Make is well-suited to this task. Make’s model is built around describing the output files your ETL process should produce and the operations required to go from a set of original source files to a set of output files.

As with any ETL process, the goal is to preserve your original data, keep operations atomic and provide a simple and repeatable process that can be run over and over.

Let’s examine a few of the steps:

Download and Pre-import Cleaning

Take a look at this snippet, which could be a standalone Makefile:

data/download/%.txt :
	aria2c -x5 -q -d data/download --ftp-user="$(ILCAMPAIGNCASH_FTP_USER)" --ftp-passwd="$(ILCAMPAIGNCASH_FTP_PASSWD)" ftp://ftp.elections.il.gov/CampDisclDataFiles/$*.txt

data/processed/%.csv : data/download/%.txt
	python processors/clean_isboe_tsv.py $< $* > $@

This snippet first downloads a file via FTP and then uses Python to process it. For example, if “Expenditures.txt” is one of my source data files, I can run make data/processed/Expenditures.csv to download and process the expenditure data.

There are two things to note here.

The first is that we use Aria2 to handle FTP duties. Earlier versions of the script used other FTP clients that were either slow as molasses or painful to use. After some trial and error, I found Aria2 did the job better than lftp (which is fast but fussy) or good old ftp (which is both slow and fussy). I also found some incantations that took download times from roughly an hour to less than 20 minutes.

Second, the cleaning step is crucial for this dataset. It uses a simple class-based Python validation scheme you can see here. The important thing to note is that while Python is pretty slow generally, Python 3 is fast enough for this. And as long as you are only processing row-by-row without any objects accumulating in memory or doing any extra disk writes, performance is fine, even on low-resource machines like the servers in ProPublica’s cluster, and there aren’t any unexpected quirks.

Loading

Make is built around file inputs and outputs. But what happens if our data is both in files and database tables? Here are a few valuable tricks I learned for integrating database tables into Makefiles:

One SQL file per table / transform: Make loves both files and simple mappings, so I created individual files with the schema definitions for each table or any other atomic table-level operation. The table names match the SQL filenames, the SQL filenames match the source data filenames. You can see them here.

Use exit code magic to make tables look like files to Make: Hannah Cushman and Forrest Gregg from DataMade introduced me to this trick on Twitter. Make can be fooled into treating tables like files if you prefix table level commands with commands that emit appropriate exit codes. If a table exists, emit a successful code. If it doesn’t, emit an error.
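A rough sketch of the exit-code idea, written in Python: a check returns 0 when a table exists and 1 when it does not, which is the kind of status a Makefile recipe could test before deciding whether to create or load the table. To keep the example self-contained it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the `table_exit_code` helper is hypothetical rather than the project's actual command.

```python
import sqlite3


def table_exit_code(connection, table_name):
    """Return 0 if the table exists and 1 if not, like a shell exit status."""
    row = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return 0 if row else 1


# SQLite stands in for PostgreSQL here (assumption made for the example).
connection = sqlite3.connect(":memory:")
before = table_exit_code(connection, "contributions")

connection.execute("CREATE TABLE contributions (id INTEGER)")
after = table_exit_code(connection, "contributions")

# A command-line wrapper would finish with sys.exit(table_exit_code(...))
# so that the calling recipe sees the status.
```

Against PostgreSQL the same check would query the system catalog instead of `sqlite_master`, but the contract with Make stays the same: success means the table is already there.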

Beyond that, loading consists solely of the highly efficient PostgreSQL \copy command. While the COPY command is even more efficient, it doesn’t play nicely with Amazon RDS. Even if ProPublica moved to a different database provider, I’d continue to use \copy for portability unless eking out a little more performance was mission-critical.
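As an illustration of what such a load step might look like when driven from a script, the Python sketch below assembles a `psql` invocation that runs `\copy` for one cleaned file. The table name, file path, connection string and the assumption that the CSV has a header row are all placeholders for the example, and the command is only built, not executed.

```python
import shlex


def build_copy_command(table, csv_path, database_url):
    """Build a psql argument list that loads a CSV file with \\copy."""
    meta_command = (
        f"\\copy {table} FROM '{csv_path}' WITH (FORMAT csv, HEADER true)"
    )
    return ["psql", database_url, "-c", meta_command]


argv = build_copy_command(
    "raw.expenditures",                   # hypothetical target table
    "data/processed/Expenditures.csv",    # cleaned file from the earlier step
    "postgresql://localhost/example",     # placeholder connection string
)
# Shell-quoted form, handy for logging or pasting into a recipe.
printable = " ".join(shlex.quote(part) for part in argv)
```

Because `\copy` is executed by the `psql` client, the file is read on the machine running the command, which is why it works with hosted databases where the server cannot see local files.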

There’s one last curveball: The loading step imports data to a PostgreSQL schema called raw so that we can cleanly transform the data further. Postgres schemas provide a useful way of segmenting data within a single database: instead of a single namespace with tables like raw_contributions and clean_contributions, you can keep things simple and clear with an almost folder-like structure of raw.contributions and public.contributions.

Post-import Transformations

The Illinois Sunshine code also renames columns and slightly reshapes the data for usability and performance reasons. Column aliasing is useful for end users and the intermediate tables are required for compatibility with the legacy code.

In this case, the loader imports into a schema called raw that is as close to the source data as humanly possible.

The data is then transformed by creating materialized views of the raw tables that rename columns and handle some light post-processing. This is enough for our purposes, but more elaborate transformations could be applied without sacrificing clarity or obscuring the source data. Here’s a snippet of one of these view definitions:

CREATE MATERIALIZED VIEW d2_reports AS
  SELECT
    id as id,
    committeeid as committee_id,
    fileddocid as filed_doc_id,
    begfundsavail as beginning_funds_avail,
    indivcontribi as individual_itemized_contrib,
    indivcontribni as individual_non_itemized_contrib,
    xferini as transfer_in_itemized,
    xferinni as transfer_in_non_itemized,
    # ….
  FROM raw.d2totals
WITH DATA;

These transformations are very simple, but simply using more readable column names is a big improvement for end-users.
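Since each view is mostly a list of column renames, a definition like this could in principle be generated from a simple mapping. The Python sketch below shows one way to do that. The `materialized_view_sql` helper is hypothetical, the project itself keeps hand-written SQL files, and only a few of the columns from the snippet are included.

```python
def materialized_view_sql(view_name, source_table, column_aliases):
    """Render a CREATE MATERIALIZED VIEW statement from a rename mapping.

    Identifiers are inserted as-is, so this is only suitable for trusted,
    hard-coded names like the ones below.
    """
    select_list = ",\n    ".join(
        f"{raw} AS {alias}" for raw, alias in column_aliases.items()
    )
    return (
        f"CREATE MATERIALIZED VIEW {view_name} AS\n"
        f"  SELECT\n"
        f"    {select_list}\n"
        f"  FROM {source_table}\n"
        f"WITH DATA;"
    )


# Raw column name -> readable alias, taken from the snippet's first columns.
aliases = {
    "id": "id",
    "committeeid": "committee_id",
    "fileddocid": "filed_doc_id",
}
sql = materialized_view_sql("d2_reports", "raw.d2totals", aliases)
```

Keeping the mapping next to the raw column names makes it easy to review a rename without reading through the full SQL.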

As with table schema definitions, there is a file for each table that describes the transformed view. We use materialized views, which, again, are essentially persistently cached versions of standard SQL views, because storage is cheap and they are faster than traditional SQL views.

A Note About Security

You’ll notice we use environment variables that are expanded inline when the commands are run. That’s useful for debugging and helps with portability. But it’s not a good idea if log files or terminal output could be compromised, or if people who shouldn’t know these secrets have access to logs or shared systems. For more security, you could use a mechanism like the PostgreSQL password file (.pgpass) and remove the environment variable references.

Makefiles for the Win

My only prior experience with Make was in a computational math course 15 years ago, where it was a frustrating and poorly explained footnote. The combination of obtuse documentation, my bad experience in school and an already reliable framework kept me away. Plus, my shell scripts and Python Fabric/Invoke code were doing a fine job building reliable data processing pipelines based on the same principles for the smaller, quick turnaround projects I was doing.

But after trying Make for this project, I was more than impressed with the results. It’s concise and expressive. It enforces atomic operations, but rewards them with dead simple ways to handle partial builds, which is a big deal during development when you really don’t want to be repeating expensive operations to test individual components. Combined with PostgreSQL’s speedy import tools, schemas, and materialized views, I was able to load the data in a fraction of the time. And just as important, the performance of the new process is less sensitive to varying system resources.

If you’re itching to get started with Make, here are a few additional resources:

In the end, the best build/processing system is any system that never alters source data, clearly shows transformations, uses version control and can be easily run over and over. Grunt, Gulp, Rake, Make, Invoke … you have options. As long as you like what you use and use it religiously, your work will benefit.

Weekly readings for #pubmedia, 5 Mar 2018, and a change in direction

Hello friends – I’ve had this blog since 2003, but for the past three years it has been largely used to share links of interest to the public media community that I’d posted to my @haarsager Twitter feed.  This link sharing actually dates from 1997, when I started to share them to an email list.  So, for nearly 21 years, the value to me in all this has been as a discipline to force me to keep up on industry developments.  I’m going to keep doing that, but putting this compilation together takes an additional hour per week.  I’d rather use that time to do some related writing, which I intend to have appear here in place of these links, though most of them will be inspired by the reading I will continue to do.  Thanks for the nice notes I’ve received from many of you about this weekly effort. 

For now, here is the last compilation.  You can continue to get these in “real time” from <http://www.twitter.com/haarsager>.  --Dennis

ATSC 3.0/HbbTV

  • Public TV urges FCC to exempt stations from ATSC 3.0 simulcasting rules.  Current

Broadband/Wireless

  • FCC’s new broadband map paints an irresponsibly inaccurate picture of American broadband.  Vice Motherboard

Cable/Satellite TV/MVPD/Pay-TV/Cord-Cutting

  • U.S. cable, satellite, telcoTV lost 3.5M subs in 2017.  nScreenMedia

Digital Video/OTT/VOD

  • Pay streaming households to reach 450M mark by 2022.  Rapid TV News
  • vMVPD customer base reaches 4.6M, but has only captured a third of cord cutters.  FierceCable

Journalism

  • must read  ‘It’s going to end in tears’: Reality check is coming for subscription-thirsty publishers.  Digiday
  • must read  There is no easy fix for Facebook’s reliability problem.  Frédéric Filloux in Monday Note
  • must read  Washington Post Executive Editor Martin Baron delivers Reuters Memorial Lecture at the University of Oxford.  Washington Post [thankfully, at the moment, this does not seem to be behind the Post’s paywall]

Radio/Podcasting/Digital Audio

Weekend readings for #pubmedia, Feb. 24, 2018

Here’s another collection (a bit slimmer than usual for some reason) of selected links from my @haarsager Twitter feed.  --Dennis

ATSC 3.0/HbbTV

  • HPA panel examines road to ATSC 3.0 and repack.  TVTechnology

Cable/Satellite/MVPD/Cord-Cutting

  • How did satellite TV go from a $50B business to ‘less than zero’ in three short years?  FierceCable  …and…  Dish loses 200K more linear TV subscribers in Q4; value of satellite business ‘less than zero,’ analyst postulates.  FierceCable

Digital Video/OTT/VOD

  • Buoyant SVOD boosts U.S. TV market to be worth $140B by end of 2018.  Rapid TV News
  • TV Everywhere use encouraging SVOD adoption?  nScreenMedia

Journalism

Radio/Podcasting/Digital Audio

Weekly readings for #pubmedia, Feb. 19, 2018

Got 5 inches of snow overnight Sat./Sun., but the temperature could reach 70° on Wednesday.  We live in hope.  Here is a compilation of recent selected links from my @haarsager Twitter feed.  --Dennis

Broadband/Wireless

Why broadband competition at faster speeds is virtually nonexistent.  Vice Motherboard

Digital Video/VOD/OTT

must read  Smartphone video, connected TV increase penetration and usage.  nScreenMedia

Management/Strategy

What makes public radio ‘very personal’ magnifies its #metoo cases.  New York Times

Trolling as a business model is making trollery the dominant form of American discussion.  Umair Haque in Medium

Trump’s budget again proposes elimination of Public TV [and radio] … funding.  Variety

A profit model for 21st century journalism.  Michael Rosenblum in Medium

Radio/Podcasting/Digital Audio

Public radio’s public reckoning.  Village Voice

Repack

Bill to address repack shortcomings advances.  TVTechnology

FCC will open April window for auction-displaced LPTVs.  Broadcasting & Cable

Social Media

Confessions of a publishing consultant on Facebook’s news feed changes.  Digiday

Twitter begins broadcasting local TV news during breaking news events.  FierceCable

must read  TV nets missing opportunity with Instagram.  nScreenMedia

Television

Why have TV viewers stopped channel surfing?  MediaPost

Netflix has taken $3B-$6B of TV ad revenue off the table.  nScreenMedia

Weekend readings for #pubmedia, Feb. 9, 2018

Here is the latest compilation of selected links from my @haarsager Twitter feed.  --Dennis

ATSC 3.0/Next-Gen TV/HbbTV

Digital Video/OTT/VOD

  • 5% of U.S. broadband users subscribe to a vMVPD.  FierceCable
  • must read  Shift in ad spending from TV to OTT expected over next two years.  Video Nuze
  • must read  How Die Welt gets people to watch video on its own site.  Digiday
  • must read  Facebook deëmphasizes news feed video; users’ time spent drops.  Video Nuze  …and…  The local-national news divide on Google and Facebook.  Axios
  • SVODs to boost original content annual spend to $10B by 2022.  TVTechnology
  • U.S. viewers have a love/hate relationship with live streaming.  Rapid TV News
  • Most use smart TVs to stream, won’t displace Roku anytime soon.  nScreenMedia

Journalism

  • YouTube announces it will start flagging videos published by organizations that receive government funding.   The Hill
  • Martha Raddatz: Media ‘watching each other a little more’ after missteps reporting on Trump.  Politico
  • must read  Why news publishers should consider the “smart curation” market.  Frederic Filloux in Monday Note

Radio/Podcasting/Digital Audio

  • must read (TV too)  Local radio’s digital future.  Jacobs Media Strategies
  • Paid listens on...  RadioPublic
  • must read  Podcast listeners really are the holy grail advertisers hoped they’d be.  Wired

Repack

Strategy/Business/Management

Television

  • TV, video ad growth pinned to advanced TV efforts.  MediaPost
  • 4K TVs build market presence as global LCD TV shipments fall to three-year low.  Rapid TV News
  • Inside Jeffrey Katzenberg’s billion-dollar bet to crack the code on mobile video.  Digiday

Weekend readings for #pubmedia, Jan. 26, 2018

Just became unburied from a lengthy writing project, so took a pass on doing this last weekend.  So this collection of links from http://www.twitter.com/haarsager covers the past couple of weeks.  --Dennis

ATSC 3.0/Next-Gen TV/HbbTV

  • ATSC 3.0: Broadcasters tout standard’s power ahead of full implementation.  Cablefax 
  • Sinclair, Nexstar team up with American Tower for ATSC 3.0 SFN sites in Dallas.  FierceCable
  • FCC ponders giving broadcasters ATSC 3.0 carriage flexibility.  Broadcasting & Cable

Digital Video/VoD/OTT

  • must read  Video’s peril – and promise.  Steven Rosenbaum in MediaPost
  • Growing pains for OTT.  TVTechnology
  • Customer experience, usage data can get lost in SVOD distribution deals.  nScreenMedia
  • Why you need a comprehensive OTT strategy.  Nielsen  …and…  Cutting-edge content from digital publishers keeps millennials coming back for more.  Nielsen

Journalism

  • must read  It’s time for journalism to build its own platforms.  Heather Bryant in Monday Note
  • The Guardian heads back into the black.  The Economist

Radio/Podcasting/Digital Audio

Strategy/Management/Business

  • A broadcaster’s guide to Washington issues.  David Oxenford & David O’Connor in TVNewsCheck

Social Media

  • must read  How Facebook’s media divorce could backfire.  Vanity Fair  …and…  Facebook’s move to deemphasize video in news feed has consequences.  Video Nuze  …and…  Facebook’s news feed changes sees brand videos taking hit.  Rapid TV News  …and…  We were all feeling hostage to Facebook.  Digiday  …and… Facebook is done with quality journalism. Deal with it.  Frederic Filloux in Monday Note

Television

  • Groups likely to expand program production.  TVNewsCheck
  • The 360-degree news video siren song.  TVTechnology
  • ‘8K? I don’t even have 4K yet?’ The future of television is still far off.  Digital Trends

Weekend readings for #pubmedia, Jan. 14, 2018

Here is this week’s compilation of selected links from my @haarsager Twitter feed.  CES is responsible for a few entries.  --Dennis

ATSC 3.0/Next-Gen TV/HbbTV

  • Hands-on with 3.0 OTA TV at CES.  Cord Cutters News
  • ATSC 3.0 standard approved, Technicolor HDR proposal expected as part of standard.  CEPro
  • LG plans ATSC 3.0 4k broadcast testing in U.S. this year.  hdreport

Broadband/Wireless/Spectrum

  • AT&T reined in 600-MHz bidding as it closed in on FirstNet.  FierceWireless  …and…  AT&T looks to sell remaining 600-MHz spectrum.  FierceWireless
  • must read  600-MHz incentive auction ‘extravaganza’ ends with a whimper.  FierceWireless
  • Was the spectrum auction necessary?  Radio Magazine

Consumer Electronics

Digital Video/VOD/OTT

  • TiVo: 20% of daily life spent with video.  Video Nuze
  • TV sets remain as prime device for VOD viewing.  Rapid TV News

Radio/Podcasting/Digital Audio

  • must read (TV, too)  How public radio’s risk-adverse culture impedes its chances for success.  Eric Nuzum in Current

Repack

Social Media

  • The problem with Facebook.  MediaPost
  • must read  Facebook tells publishers big change is coming to News Feed.  Digiday

Television

  • must read  TV still claims most-favored screen status.  Rapid TV News  …and…  TV sets remain as prime device for VOD viewing.  Rapid TV News  …and… Three-quarters of U.S. consumers have a connected TV.  Rapid TV News
  • TiVo: 20% of daily life spent with video.  Video Nuze

This week’s readings for #pubmedia, 8 Jan. 2018

First compilation of the new year: This week’s selected links from my @haarsager Twitter feed.  --Dennis

ATSC 3.0/Next-Gen TV/HbbTV

  • Sony, Pearl TV partner for ATSC 3.0 next-gen TV program guide.  FierceCable
  • On tap for TV at CES: 3.0, voice, gadgets.  TVNewsCheck
  • Free TV keeps getting better: Welcome ATSC 3.0.  ElectronicDesign
  • ATSC 3.0: Now it’s up to us.  TVTechnology

Broadband/Wireless

  • AT&T plans to launch mobile 5G in a dozen cities by late 2018.  FierceWireless
  • ‘Alarming’ unlimited [smartphone] data usage: 31.4 GB per month and rising.  FierceWireless

Digital Video/OTT/VOD

  • 2018 could be the year Facebook banishes news from its feed.  Digiday
  • Streaming becomes mainstream as cord-cutting accelerates.  Chicago Tribune

Journalism

  • must read  Fewer Americans rely on TV news; what type they watch varies by who they are.  Pew Research
  • 2018 could be the year Facebook banishes news from its feed.  Digiday
  • Reporters, once set against paywalls, have warmed to them.  Digiday

Radio/Podcasting/Digital Audio

Social Media

  • 2018 could be the year Facebook banishes news from its feed.  Digiday

Strategy/Management/Legal

  • The great consolidation: How 2017 paved the way for the next gilded age of television.  Paste Magazine

Television

  • 487 original programs aired in 2017. Bet you didn’t watch them all.  New York Times
  • Record number of OLED TVs shipped at end of 2017.  Rapid TV News
  • must read  Live TV and movie habits to continue sharp decline in 2018.  nScreenMedia

Research Director Jonathan Albright on Russian Ad Networks

This week, Research Director Jonathan Albright has published a number of articles on research about Russian ad networks and their influence during the 2016 election. Look at Jonathan’s dataset, and follow him on Medium, for more.

Research

The Washington Post, 10/5, “Russian propaganda may have been shared hundreds of millions of times, new research says.” Read here.

“The primary push to influence wasn’t necessarily through paid advertising,” said Albright, research director of the Tow Center for Digital Journalism at Columbia University. “The best way to understand this from a strategic perspective is organic reach.”

In other words, to understand Russia’s meddling in the U.S. election, the frame should not be the reach of the 3,000 ads that Facebook handed over to Congress and that were bought by a single Russian troll farm called the Internet Research Agency. Instead, the frame should be the reach of all the activity of the Russian-controlled accounts — each post, each “like,” each comment and also all of the ads. Looked at this way, the picture shifts dramatically. It is bigger — much bigger — but also somewhat different and more subtle than generally portrayed.

The New York Times, 10/9, “How Russia Harvested American Rage to Reshape U.S. Politics.” Read here.

“This is cultural hacking,” said Jonathan Albright, research director at Columbia University’s Tow Center for Digital Journalism. “They are using systems that were already set up by these platforms to increase engagement. They’re feeding outrage — and it’s easy to do, because outrage and emotion is how people share.”

All of the pages were shut down by Facebook in recent weeks, as the company conducts an internal review of Russian penetration of its social network. But content and engagement metrics for hundreds of posts were captured by CrowdTangle, a common social analytics tool, and gathered by Mr. Albright.

The Washington Post, 10/9, “Add Google to the list of tech companies used by Russians to spread disinformation.” Read here.

Facebook said last week that modeling showed that 10 million people saw the Russian-bought ads bought by the 470 pages and accounts controlled by the Internet Research Agency. But Albright, the Columbia social media researcher, reported soon after that free Facebook content affiliated with just six of those 470 pages and accounts likely reached the news feeds of users hundreds of millions of times.

Albright also has found links to Russian disinformation on Pinterest, YouTube and Instagram, as well as Twitter, Facebook and Google. Clicking on links on any of these sites allowed Russian operatives to identify and track Web users wherever they went on the Internet.

Coverage

Rachel Maddow, audio clip from 10/9. Watch here.

(Again, you can look at Jonathan’s dataset, or follow him on Medium.)

Journalism Educator’s Symposium 2017

The Tow Center is pleased to announce that on Tuesday, September 19th, we will be hosting our Journalism Educator’s Symposium – an event designed to help build a community of interest and exchange around new approaches and best practices to journalism education.

The design of this event is almost entirely participant-driven: we want attendees to share their ideas, concerns and best practices with each other – and have plenty of time to connect one-on-one.

To that end, we are excited to share our call for Lightning Talks on a range of topics – from building credibility to the essentials of AI. We hope that you will submit a talk (or two!) and encourage your colleagues to do the same.

We know that September is a busy time of year, but as we discovered last year, taking a few hours to connect with your colleagues and talk about teaching journalism is a great way to get inspired and energized for the coming semester.

Logistics

The official symposium program is scheduled to run from 12pm – 5pm on Tuesday, September 19th, with an optional reception to follow. We are eager to support the attendance of colleagues teaching outside of the New York City area, for whom we can provide parking vouchers and limited travel support.

If you have questions or suggestions, please don’t hesitate to reach out to us! Just send an email to towcenter@columbia.edu with the subject line Educator’s Symposium.

What makes a great photo editing intern (Apply now for Fall 2017!)

NPR interns at work. Photo by Rachael Ketterer

This is not your standard photo internship!

This internship is an opportunity to learn more about the world of photo editing. Our goal isn’t to make you into a photo editor; we view this internship as a chance for you to understand what it is like to be an editor and improve your visual literacy, which can help you become a better photographer.

The paid internship runs from September 11, 2017 to December 8, 2017. Applications are due Sunday, July 16, 2017 at 11:59pm eastern.

What you will be doing

  • Editing: You’ll be working closely with the Visuals Team’s photo editors (Ariel and Emily) on fast-paced deadlines – we’re talking anywhere from 15 minutes to publication, to short-term projects that are a week out. You’ll dig into news coverage and photo research, learning how to communicate about what makes a good image across a range of news topics, including international, national, technology, arts and more.

  • Photography: Depending on the news cycle, there may be opportunities to photograph DC-area assignments. This can mean you’d have one or two shoots in a week, or maybe just a couple shoots in a month. You’ll work closely with a radio or web reporter while out in the field, and a photo editor will go through your work and provide feedback for each assignment. There will also be a chance to work on portraiture and still lifes in our studio.

  • We also encourage each intern to create a self-directed project to work on throughout the semester. It can be an Instagram series, video, photo essay, text story or anything in-between. You can work independently or with another intern or reporter.

You will be part of NPR’s intern program, which includes 40-50 interns each semester, across different departments. There will be coordinated training and intern-focused programming throughout the semester, which includes meeting NPR radio hosts, career development and other opportunities. As an intern, you will be treated as a member of the team. Many NPR employees are former interns and they’re always willing to help current interns.

Eligibility

Any student (undergraduate or graduate), or person who has graduated no more than 12 months prior to the start of the internship period to which he/she is applying is eligible. Interns must be authorized to work in the United States.

Who should apply

We’re looking for candidates that have a strong photojournalism background. An interest in editing, or experience with video/photo editing is a nice plus. It’s also helpful if you’ve completed at least one photojournalism-focused internship prior to applying (let us know if you have!), though it’s not necessary. A portfolio, however, is required.

We also want folks who can tell us what they would like to accomplish during their time at NPR. What do you want to learn? What do you want to try? We try to shape each internship around our intern, so we rely on you to tell us what goals you have for your time with us!

So how do I apply?

Does this sound like you? Read about our expectations and selection process and then apply now!

Into code, design, and data? Check out our design/development internship.

Be our design/code/??? intern for fall 2017!

Are you data-curious, internet savvy, and interested in journalism? Do you draw, design, or write code? We are looking for you.

We’ve had journalists who are learning to code, programmers who are learning about journalism, designers who love data graphics, designers who love UX, reporters who love data, and illustrators who make beautiful things.

Does this sound like you? Please join our team! It isn’t always easy, but it is very rewarding. You’ll learn a ton and you’ll have a lot of fun.

Here are a few projects our recent interns have worked on:

  • NPR's Book Concierge 2016, by Clinton King (Developer, Fall 2016)
  • Semi-Automatic Weapons Without A Background Check Can Be Just A Click Away, by Brittany Mayes (Developer, Summer 2016)
  • You Say You're An American, But What If You Had To Prove It Or Be Deported?, by Zyma Islam (Data reporter/developer, Spring 2016)
  • Using Technology To Keep Carbon Emissions In Check, by Annette Elizabeth Allen (Illustrator, Fall 2015)

The paid internship runs from September 11, 2017 to December 8, 2017. Applications are due Sunday, July 16, 2017 at 11:59pm eastern.

Here’s how to apply

Read about our expectations and selection process and then apply now!

Into pictures? Check out our photo editing internship.

Artificial Intelligence: Practice and Implications for Journalism

We have witnessed the first wave of artificial intelligence (AI) in journalism in the form of chatbots, automated story generation, and machine learning techniques applied to news. The big tech companies have pushed AI to the center of their product strategies. How far along is the news business in incorporating these tools into the newsroom, and understanding the broad implications for journalism?

Columbia University’s Tow Center for Digital Journalism and Brown Institute for Media Innovation are excited to present a conference on “Artificial Intelligence: Practice and Implications for Journalism” on June 13. The conference will feature leading journalists, technologists, legal scholars and academics in conversation around the current and near future applications, challenges, and legal implications of AI implementation in newsrooms.

 

Check back on June 13, 2017,  1PM ET for the livestream of the event. 

#TowAI

 

Program

1:00pm Welcome Address from Steve Coll (Columbia Journalism School)

1:15pm Exploring the Ethics of the AI Powered Products
With Angela Bassa (iRobot), Jerry Talton (Slack), Amanda Levendowski (NYU), Madeleine Clare Elish (Columbia University), Gilad Lotan (Buzzfeed), John Keefe (Quartz), moderated by Nick Diakopoulos (University of Maryland)

2:15pm Future Ethical Dilemmas: Joshua Benton (Nieman Journalism Lab) in conversation with Rachita Chandra (IBM Watson Health)

2:45pm AI in the Newsroom: Technology and Practical Applications
With Sam Bowman (NYU), Marc Lavallee (The New York Times), Sasha Koren (The Guardian Mobile Innovation Lab), Judith Donath (Berkman Center), Meredith Whittaker (Google Open Research / AINow), moderated by Christopher Mims (The Wall Street Journal)

3:30pm Closing Remarks from Emily Bell (Tow Center for Digital Journalism)

Tow Center Announces Research Director and 2017 Fellows

Jonathan Albright Joins Tow Center for Digital Journalism as Research Director

The Tow Center is pleased to announce Dr. Jonathan Albright as its new Research Director. Jonathan’s research around networks of propaganda and misinformation has recently captured attention across the world. His research into the use of platforms such as YouTube to proliferate high volumes of automated misinformation has been featured across a broad range of publications including The Guardian, The Washington Post, and Fortune.

His work lies at the intersection of communication, culture, and technology, focusing on the analysis of online and socially mediated news events and activism, data-driven journalistic methods, and visual storytelling. Jonathan joins the Tow Center from Elon University, where he is an assistant professor of media analytics in the school of communication. In his role as Research Director, Jonathan will lead the Center’s fellows and research projects, working closely with Tow Center Director Emily Bell.

We are extremely excited that Jonathan is coming to work at the Tow Center, bringing with him his cutting-edge research into the new ecologies of journalism and misinformation. There is no more pressing issue in the field right now and Dr. Albright’s work will add to the Tow Center’s reputation for examining emerging trends in technology and how they apply to the field of journalism. Jonathan’s understanding of how technologies are being deployed and networked through social platforms to create an ecosystem of targeted misinformation is central to understanding current issues affecting both politics and journalism.

 

Tow Knight Projects and Senior Fellows Focus Tow Center Agenda On Investigating the News Environment of the Social Web.

This new cohort of Knight News Innovation Fellows at the Tow Center brings a wealth of expertise in examining some of the most timely and important issues facing journalism today. They will pursue a range of research topics, including automated journalism, collaborative journalism, information integrity, local journalism, political polarization, and the General Data Protection Regulation (GDPR).

These new fellows join over 60 current and former fellows at the Tow Center. The Fellowship projects are funded by the John S. and James L. Knight Foundation. Read more about all Knight News Innovation research projects at the Tow Center here.

 

2017 Knight News Innovation Fellows Projects:

Fact Trust

Mike Ananny, Assistant Professor of Communication and Journalism, Annenberg School for Communication and Journalism, University of Southern California

How does a Facebook-led partnership of news organizations and fact-checkers mix algorithmic and editorial judgment to fight “fake news”? Through interviews with key personnel and analyses of documents and infrastructures, this project tells the story of how techno-journalistic platforms make facts. Better understanding such hybrids helps scholars, technologists, journalists, and audiences appreciate how to trust and critique news networks—and how to think about and reconfigure power between publishers and platforms.

Engagement with Robot News: How Automated Journalism Affects Credibility and Engagement

Jan Boehmer, Assistant Professor of Journalism in the College of Communications at Pennsylvania State University

Technological advances and societal transformations have shaken up the journalism industry. One of the most disruptive examples of this change is the emergence of automated journalism. While a growing number of news organizations rely on algorithmic processes converting data into narrative, the effects on the audience are not fully explored. This research investigates how attributing authorship of news items to an algorithm affects readers’ perceptions of credibility and intentions to engage with the content.

Collaborative journalism and the creation of a new commons

Carlos Martinez de la Serna, Director of Digital Innovation at Univision News

The SF Homeless Project, the News Integrity Initiative, and ElectionLand are three major examples of an emerging pattern in journalism: the cooperation of multiple organizations and individuals to address big challenges at a scale that no single organization could by itself. This project will research how the combination of decentralized, networked, and traditional models for news production and distribution are creating new opportunities to support journalism.

Partnering with the Public: How ‘Audience Engagement’ is Reinventing Local Journalism

Jacob L. Nelson, PhD Candidate, Northwestern University

This project explores the way that three news organizations (City Bureau, Hearken, and The Chicago Tribune) conceptualize, implement, and measure audience engagement. At a moment when the news media’s credibility and economic sustainability are in doubt, this project examines what journalists in both traditional and innovative newsrooms believe “success” should look like. In doing so, it attempts to answer the question: Are journalism’s goals changing, or just its methods?

From Polarization to Public Sphere

Andrea Wenzel, incoming Assistant Professor, Temple University with Sam Ford, media executive and consultant

This research study examines what political polarization and urban-rural divisions look like in the daily lives of residents at the local level. The project focuses on a case study of a region of Kentucky, including the “purple” college town of Bowling Green and the more “red” and rural area of Ohio County. Drawing from interviews and media diaries, the study examines the communication ecologies of residents and the potential for community engagement across demographic and ideological lines. The study will also explore challenges and opportunities in the rural media landscape through a workshop with local and regional media and community stakeholders.

Bridging Stories: Countering Misinformation in Chinese Language News Ecosystem

Chi Zhang, Doctoral Candidate, Annenberg School for Communication and Journalism, University of Southern California

This project investigates and intervenes in the immigrant Chinese news ecosystem, which has seen significant misinformation, to bridge the information silos between Chinese-speaking immigrants and their surrounding community. In collaboration with Alhambra Source, a trilingual civic news site serving the immigrant majority city of Alhambra, and Asian Americans Advancing Justice-Los Angeles, we monitor ethnic Chinese media and social media outlets, and engage community members to produce and distribute bridging stories.  

The General Data Protection Regulation in a media context: threat or opportunity for media companies?

Hugo Zylberberg, Cyber Fellow, Columbia University’s School of International and Public Affairs with Susan E. McGregor, Assistant Director of the Tow Center for Digital Journalism

The General Data Protection Regulation (GDPR) will go into effect in May 2018 in the EU, yet most companies, including media companies, still know very little about its implications for their business models. Business models in the media ecosystem have evolved rapidly in the last couple of decades, and traditional players have been threatened by new entrants. This project will not only explore the many ways that this imminent legislation will affect media companies, as well as the technology platforms upon which they increasingly depend, but also look at how, in return, the media could seize this regulation as an opportunity to leapfrog over the digital transformation.

Senior Research Fellow:

Award-winning data journalist Jon Keegan joins the Tow Center in 2017 as a Senior Research Fellow from The Wall Street Journal where he led projects in data journalism and visualization. In the past year Keegan built WSJ’s award-winning “Blue Feed, Red Feed” which visualizes political polarization on Facebook. At the Tow Center, Jon will be leading an initiative to explore partisan sources on social media.

This project—the first in a suite of tools for consumers of news on social media—will build an open database of popular news sources on Facebook, illustrating their reach across platforms, surfacing data about the owners, advertising networks, authors, and affiliations. This will take the form of a user-friendly public website, as well as an API so other developers can build tools that use this database to illuminate the murky world of partisan news on social media. This project aims to empower the public to be more responsible about the news they share with their networks, as well as increase media literacy around online news sources.

Jon joins Senior Research Fellows Pete Brown, Elizabeth Hansen, and Andrea Wenzel.

 

_______________________________________________________

The Fellowships are part of a $3 million research program funded by the Knight Foundation. Since the program began, the Center has published a number of reports as well as shorter guides on key trends including automated journalism, chat apps, and podcasting. The Tow Center also hosts large-scale conferences and smaller, skills-based workshops to further conversation around the published research.

The Tow Center offers fellowships to academics, journalists and technologists, disseminating research for application in newsrooms as well as classrooms. For more information, please email towcenter@columbia.edu.

About the Tow Center for Digital Journalism

The Tow Center for Digital Journalism, established in 2010 through gifts from the Tow Foundation and others, provides journalism students with the skills and knowledge to lead the future of digital journalism and serves as a research and development center for the profession as a whole.

About Knight Foundation

Knight Foundation supports transformational ideas that promote quality journalism, advance media innovation, engage communities and foster the arts. We believe that democracy thrives when people and communities are informed and engaged. For more, visit knightfoundation.org.

 

Symposium: Next Gen Podcast Distribution Protocols

MAY 11, 2017, 9am–5pm
Harvard Law School
Wasserstein Hall
1585 Massachusetts Avenue
Cambridge, MA

Next Gen Podcast Distribution Protocols: Innovation and governance in open development initiatives

Presented by the Berkman Klein Center for Internet & Society at Harvard University and the Tow Center for Digital Journalism at the Columbia Journalism School, in Collaboration with the syndicated.media Open Working Group

 

OVERVIEW

On May 11, 2017, the Berkman Klein Center for Internet & Society and Tow Center for Digital Journalism will host and facilitate a symposium, in collaboration with the syndicated.media open working group, to address the process of developing standards that support the distribution of syndicated audio content.  The event will look back at the evolution of the RSS protocol and look forward at the need for new technical infrastructure to support an expanding podcast distribution landscape.  Participants will have the opportunity to engage in both higher-level policy discussions and technical deep-dives throughout the course of this one-day event.

The goals of the symposium include furthering cooperation among various players in the world of podcast creation and distribution and consideration of recommendations on standards, enhancements, extensions, and other methods to support the growth of podcasting as an open and inclusive medium.  It will bring together academic, non-profit, and commercial constituencies to address, among other things:

 

  • the history of media protocols;
  • promises and pitfalls associated with open development initiatives;
  • rights issues relevant to openly syndicated content;
  • questions of governance and stakeholder engagement; and
  • technical planning and implementation for next generation podcast distribution

The symposium will mix talks and panels that generally address these issues (curated by the Berkman Klein and Tow Center teams) with opportunities for breakouts that allow deeper dives into technical questions around distribution protocols for podcasts and other forms of serialized media (facilitated by members of the syndicated.media community).

Registration is limited; sign up here.  The symposium will be followed by a separate, two-day “Audio for Good” event, co-hosted by PRX, RadioPublic, and the HBS Digital Initiative.  Applications to participate can be submitted here.

 

BACKGROUND

In the last two years, podcasting has hit a tipping point in mainstream adoption. Over fifty-seven million people listen to podcasts each month in the US, growing at over 25% per year. A rapidly expanding industry and ecosystem is taking shape across content creation, publishing, distribution, discovery, and monetization. Apple remains the largest platform for podcasting, with major players like Google, Spotify, Audible, and Pandora beginning to integrate podcasts into their services. Independent creators, content networks, podcast apps, and a variety of service providers are starting to arrive. Public radio remains a foundational force, dominating the charts with shows from NPR, PRX, WNYC, This American Life, and others.

There is also a growing number of industry conferences, events, and associations starting to address myriad needs in the podcasting space, including Podcast Movement, the Podcast Summit, and Third Coast International Audio Festival, in addition to an uptick in live events for podcast fans in venues across the country.

Growth in content, audience, and revenue is intensifying the competitive landscape, with resulting pressure to address problems related to metrics, metadata, advertising, audience insight, and more.

There are unique and exciting challenges in developing technologies that power an open standard like podcasting. From the basic and ubiquitous formats that have come to be relied on, to recent advances like dynamic audio serving and advanced metrics and analytics, there is a wide array of topics that must be addressed.

This convening seeks to provide a forum in which to discuss specific technical details that relate to podcast distribution and to learn from and compare notes with people who have been deeply involved in questions of governance, standard-setting, and open innovation across a wide variety of fields.  A primary goal for the syndicated.media community involves establishing processes and developing timelines for future development initiatives.

 

ABOUT THE HOSTS

The Berkman Klein Center for Internet & Society is a research center based at Harvard University.  The Center’s mission is to explore and understand cyberspace; to study its development, dynamics, norms, and standards; and to assess the need or lack thereof for laws and sanctions.  Berkman Klein is a research center, premised on the observation that what it seeks to learn is not already recorded. The Center’s method is to build out into cyberspace, record data, self-study, and share. Its mode is entrepreneurial nonprofit.

The Tow Center for Digital Journalism, established in 2010, provides journalists with the skills and knowledge to lead the future of digital journalism and serves as a research and development center for the profession as a whole. Operating as an institute within Columbia University’s Graduate School of Journalism, the Tow Center is poised to take advantage of a unique combination of factors to foster the development of digital journalism. Its New York location affords access to cutting-edge technologists, a strong culture of journalism and multiple journalism and communication schools, with outstanding universities attached to them. The Tow Center is where technology and journalism meet, and where education and practice meet.

Syndicated.media is a community-driven working group with a mission to ensure that podcasting grows to meet the needs of listeners, creators, producers, publishers, advertisers, and developers, without sacrificing the groundwork that has been established to make it an open and inclusive medium. The goal of the working group is to develop clear and comprehensive standards and best practices. The group now includes more than 100 representatives from a growing number of podcast industry stakeholders, including international participants, and intends to incrementally release updates to existing standards and recommendations for new proposals.

Five Local Podcasts to Try for #TryPod

We’re big fans of the simple idea behind the #TryPod campaign: share a podcast you love with someone you love.

At PRX, we work with talented indie producers all over the world, but this month we want to share five podcasts made in our own Boston backyard. Each show tells stories in a unique way and belongs to our growing PRX Podcast Garage community.

In this blog, Podcast Garage Community Manager Alex Braunstein gives you her take on each show and asks their hosts about an episode you should try.

Show: Hiding in the Bathroom, a show for those of us in business who want to embrace our introverted selves.

Episode to Try: How to Do Powerful Work

Alex says: I’m insanely jealous of how at home Morra looks in front of a microphone. As a host, she oozes warmth and a desire to take on the world. It’s no surprise that by day, she runs digital campaigns for mission-driven clients like Planned Parenthood. Her Forbes podcast engages women in frank conversations about introversion, self-care, and feminism in the workplace. Count me in.

Host Morra Aarons-Mele says: “Meighan is humble in the face of a really big life, and she has incredible advice to give those of us who want our work to have meaning. She took Malala Fund from an organization with no logo to a globally-recognized leader in helping educate the world’s girls. And I’ve felt her sacrifices, and admired her fortitude even as she made some really hard decisions and missed her son greatly. Meighan believes she doesn’t choose her work; it chooses her. She wants to serve, she has great skills, and the job finds her. I think this episode is essential listening to anyone who feels like the work they want to do eludes them.”

Show: Soonish, a show about our technological future, and how our choices today will shape that future, though often in ways we can’t predict.

Episode to Try: Meat Without the Moo

Alex says: Wade’s storytelling is so precise and thoughtful that you can just tell the guy has a PhD from MIT. I love his ambitious approach to the show, which is remarkably produced by a team of one. It truly feels like he’s on an epic quest to discover the future and I’m along for the ride. You will literally be smarter just by listening!

Host Wade Roush says: “One of the places this episode ends up is an old automobile factory in San Leandro, CA where a startup called Tiny Farms has built a huge cricket farm. So as the CEO is walking me around the place, I’m trying not to step on any loose crickets, and then I’m trying to stick my mic into their nest to get some cricket-song on tape without scaring them. I’m being so careful! And then the CEO explains that pretty soon they’ll knock out these crickets with carbon dioxide and freeze them and grind them up for cricket flour. And I realize I’m totally okay with that. It’s funny, because I’m vegetarian, so I’m largely against eating animals. But I’d eat crickets all day if it would save a few cows and chickens. I guess we all have our own moral thresholds – and our own choices to make about the future.”

Show: One in a Billion, a show about China, through the voices of Chinese millennials in America.

Episode to Try: Finding Love in America: Reality Bites

Alex says: Being in Mable’s presence is electrifying. She talks fast and dreams big. It’s no wonder she’s put the word “billion” into her show’s title and is personally chasing down the untold stories of Chinese millennials living in America. A former producer for Good Morning America and Dateline, Mable is a seasoned pro exploring a new medium. She’s currently searching for other producers to join her and I can’t wait to hear what they do next.

Host Mable Chan says: “I love Qinghua’s character – adventurous, dutiful and defiant. I find it intriguing that a young woman from the middle of China came alone to America to get her PhD in Engineering. She quickly earned her degree by age 25 and landed her dream job as a data scientist at Silicon Valley! But just as everything seemed to be going well, she was getting bored at work while her 7-year relationship with her boyfriend was suddenly over. How did she turn things around – not only for herself but also for thousands other Chinese looking for love in America? You gotta listen.”

Show: Caught Up, a show with the latest and greatest scoop about South Boston and beyond.

Episode to Try: Losing My Religion

Alex says: The makers of the magazine Caught in Southie have captured my heart with a show about all-things-South-Boston. Even though I’ve never been to Southie (gimme a break, I just moved here), I love eavesdropping on Heather and Maureen’s local take on their neighborhood. They claim to know nothing about podcasting, but they’re clearly naturals when it comes to something pretty unteachable: chemistry. I laugh out loud when they’re recording in our studio and somehow feel nostalgia for a place I’ve never lived.

Hosts Maureen Dahill and Heather Foley say: “In this episode, you get a sense of how we grew up in South Boston. The majority of the kids growing up in Southie went to Catholic School which was taught by nuns. Needless to say, those nuns shaped who we are today – good, bad or otherwise i.e. our love of wine lightening the load of Catholic guilt.”

Show: The Courage to Listen, a show that explores issues of police community relationships, gang violence and race in America.

Episode to Try: Commissioner Ed Davis

Alex says: I crave compassionate leaders like Reverend Brown who know how to listen. It’s a privilege just to be a fly on the wall for his conversations about violence prevention, community mobilization, and policing. He’s credited as “an architect of The Boston Miracle,” in which a group of local preachers cut youth violence in the city by 79%… by listening. I find this show’s straightforward interview style totally gripping.

Host Reverend Jeffrey Brown says: “Ed led the police department for the city of Boston, and was featured in Mark Wahlberg’s film ‘Patriot’s Day.’ We had a fascinating discussion about the Marathon bombing, his personal transformation from traditional to community-oriented policing, and his thoughts on the future of police reform today. Oh, and we asked him how he felt about John Goodman playing him in the movie!”

Learn more about our membership at the Podcast Garage, schedule a session in the studio, or swing by during our open hours for a tour of the space.

The post Five Local Podcasts to Try for #TryPod appeared first on PRX.

YoPro San Diego: Let’s Go!

Believe it or not it’s time to start getting ready for the PBS Annual Meeting. At this year’s meeting PBS Digital will be presenting a FREE young professionals workshop called YoPro: San Diego. The mission of the larger YoPro initiative is to pull together young professionals, 35 and under, within Public Media to network, collaborate and discuss topics relevant to their professional development. This year the YoPro workshop at the PBS Annual Meeting will be held Monday, May 15, 2017 from 8AM-3PM at the Marriott Marquis San Diego Marina in San Diego, CA.

Registration is open, so sign up today! (psst Early-Bird registration ends Friday, April 14, 2017, so register before prices go up) 

Scholarship
YoPro will be offering two scholarships to young professionals at stations to attend this pre-conference workshop event only. Learn more about eligibility requirements and apply now. The deadline for submission is April 12th, and selected applicants will be notified no later than April 14th.

Agenda
Attendees will hear from a range of speakers and participate in discussions on topics like building strong digital and content strategies, innovative storytelling methods, building leadership skills, and learning how to fail and succeed professionally as gracefully as possible, all while networking with some of the system’s most engaged, forward-thinking and innovative future leaders.

The day starts with a welcome from PBS' CEO Paula Kerger; then hear from system leaders Andrea Downing, Co-President, PBS Distribution and Renard Jenkins, Vice President, PBS Operations about how they got to where they currently are and what career choices they made along the way.

Other speakers participating in the day are VP of Digital at KCPT, Carla McCabe, and Chief Digital Officer & CMO Ira Rubenstein. As the agenda continues to expand, look for additional information on the agenda to be sent out. Stay tuned!

The workshop will also feature an in-depth collaborative exercise with station General Managers and young professionals. These groups will come together and brainstorm ways to create, sustain and lead the digital culture at public media stations.

Have we given you enough reason to get excited and register for this event? We hope so!

If you have any questions about the workshop or scholarship please reach out to yopro@pbs.org.

Jen Hinders | Director | PBS Digital
Amy Lust | Assistant Director | PBS Digital

Announcing the Digital Immersion Project Partners


by Max Duke | Senior Director | PBS Digital & Marketing 

The 2017 Digital Immersion Project application and judging process has completed, and the 25 station individuals participating in the project have been chosen. The response to the project was extraordinary, with over 90 individuals from PBS stations applying to participate. 

All applications were reviewed by a panel composed of PBS and station colleagues, bringing multiple perspectives and levels of experience to the judging process.

We are excited to kick off this project with such a strong, well-rounded group of public media professionals. The 25 station professionals participating in the Digital Immersion Project are:

Jennifer Amend, WyomingPBS
Ellie Banks, Mississippi Public Broadcasting
Jordan Basham, WKU PBS
Kristin Benjamin, WUCF TV
Alexa Corcoran, Rocky Mountain PBS
Charlotte Cushing, South Florida PBS – WPBT
Makenzie Demmert, KUAC
Dale Fisher, Nine Network of Public Media
Mary Gribulis, WMHT
Kimberly Harbrecht, WPBA
Carl Heidle, KSPS Public Television
Nick Houser, WOSU
Matt Kawamura, NWPTV (KWSU/KTNW)
Barbara Linstrom, WGCU Public Media
Abby Malik, KET
Angela Massino, Community Idea Stations – WCVE
Karen Mell, KCPT
Liberty Peralta, PBS Hawai'i
Joan Rebecchi, New Mexico PBS
Bill Richards, WKAR
Tabitha Safdi, South Carolina Educational Television (SCETV)
Nicholas Scalera, WETA
Jeff Tucker, Idaho Public Television
Matt Wilson, WITF
Chris Zellers, WVIA Public Media

The Digital Immersion Project was created by PBS Digital & Marketing, with support from the Corporation for Public Broadcasting, to help improve digital efforts at stations, while offering a way for station staff to connect to a greater public media community of digital professionals.

The professional development program also focuses on strategic and organizational tactics, with the selected participants being able to draw on the project’s teachings and a national network of public media contacts to further digital success at the local level.

Thank you to all station professionals that applied for this program. PBS and CPB are committed to supporting the development of public media digital professionals, and we hope to continue to offer the Digital Immersion Project and similar opportunities in the years to come.

Veterans Initiative Page Launches on Station Management Center


by Megan E. Paparella | Senior Manager | Station Services

We are pleased to announce the launch of our Veterans Initiatives page on the Station Management Center. This will be a space to share engagement case studies, resources, collaborative events and local film content.

We hope you will join us for our monthly Stories of Service webinars, a multi-platform initiative that unites powerful stories and conversations around our military veterans. Visit the Veterans Initiatives page to watch a recording of the initial webinar and join us for future events.

Don’t have an account on the SMC? Register today

ValuePBS Bentomatic


VALUE PBS, A NEW BENTOMATIC FOR STATIONS


by Katie Wilson | Senior Manager | PBS Digital & Marketing 

A new ValuePBS website launched this week, and it’s available for stations to integrate into their websites using Bentomatic. It makes the case for why PBS and its member stations are a trusted, valued, and essential resource for communities, educators, and families. The new mobile-responsive site can help you communicate the value of our collective work.

Site features include:
  • Education station map (the same map as on the previous ValuePBS site; stations can update and edit its information)
  • Brand touchstone videos
  • Testimonial videos
  • Stats and infographics
  • Social sharing features
How to Add ValuePBS to your site: 
Follow these steps to gain access to the ValuePBS Bentomatic

1. Request the ValuePBS experience from Support
2. Insert the ValuePBS Bentomatic into the Page Content placeholder/section of your page

The Bentomatic works best on the Studio One template, but it will also work on the two-column Studio template and the one- and two-column Station Explorer templates.

Don't forget about Curate! 
Stations that use the Bentomatic can also leverage the localization features on PBS.org to promote the ValuePBS campaign on their websites.

A station’s ValuePBS Bentomatic page can be linked to the following locations:

1. Local menu dropdown on the top navigation of PBS.org
2. Local homepage carousel feature (3rd slide in carousel)
3. Station hub/landmark image area

Stations that choose not to use the Bentomatic can also take advantage of the localization feature on PBS.org, either by linking to a page created on their own website or by linking to the pbs.org/value website.


As a reminder, you can also find a number of case-building assets on the Source, including infographics highlighting the findings of the recent American Viewpoint-Hart Research Voter Survey and the Top 9 Reasons why the public sees PBS and stations as Trusted, Valued and Essential. PBS’ social media channels have been promoting these materials so also feel free to share or re-post.

Happy Birthday Mister Rogers!


Fred Rogers’ birthday is coming up on Monday, March 20. To mark the occasion, honor Fred's legacy and galvanize fans to share their personal stories of how Fred, PBS and stations have positively impacted them, we have kicked off a social media activation leading up to March 20 and invite you to join the celebration.

This initiative will include engagement activities across PBS KIDS, PBS Parents, PBS Teachers and other social media platforms incorporating the hashtags #BeMyNeighbor and #ILovePBS. We encourage you to join in by spreading the word on your social media platforms using the hashtags, and encouraging fans of your station to share their stories. A toolkit is available on the Source to help you do so, including:

-- Content calendar with template language
-- Facebook and Twitter skins
-- Fred Rogers quotes, images and other graphics
-- Videos
-- Trolley cardboard cutout

In addition, beginning this week we will roll out Fred Rogers-related content on pbsparents.org, PBS Teachers’ Lounge blog, pbskids.org and on the PBS KIDS YouTube channel.

We also encourage you to reach out to us if you’d like to experiment with Fred Rogers’ birthday-themed Snapchat geofilters localized to your station, which we are piloting as part of this effort. In addition, we encourage stations to contact us if you would like to share images of local Fred Rogers-related landmarks (murals, etc.) in your town, which we will incorporate into our social efforts on March 20.  Interested in these opportunities? Email the PBS KIDS Team

We hope you join us as we celebrate Fred Rogers, one of public media's most beloved pioneers whose legacy continues to impact the lives of many today. 

Questions? Reach out to Maria Vera Whelan.

PBS Throwback campaign – coming very soon!



News from Kevin Dando, Sr. Director, PBS Social Media

(This campaign is now live -- find updates from Kevin below)

Update #1

Update #2

---

The goal of the PBS Throwback campaign, a scrappy, quick-turnaround pilot, is to tap into the affection and nostalgia folks have for PBS and its various icons and content (Mister Rogers, Bob Ross, Julia Child, Reading Rainbow, etc.) to help increase relevance and support for PBS and its member stations.

Starting on Thursday, 3/9 with the re-posting of the Mister Rogers "Garden of Your Mind" remix, and a new PBS Throwback FB cover image, PBS will activate this campaign to increase PBS and member stations' profiles and brand presence and push to this Throwback page (pbs.org/specials/throwback-pbs).

More details of the campaign will follow, utilizing Julia Child, Bob Ross and Reading Rainbow, along with trivia, polls and social assets modeled off the always popular "I <3 PBS" buttons and using the hashtag "#ILovePBS". A small amount of paid social will boost assets to younger audiences.

Using existing assets and some conversational posts, we look forward to driving some brand love and heightened affection and welcome your help and ideas to extend this campaign locally. Updates will be posted here as we move forward. Thanks for your support!

----

Stay up to date with the PBS Social Media Facebook Group, and Social Media Resources.

The 13th Annual Zeitfunk Awards


We wear a lot of hats here at PRX, from distributing podcasts of all stripes, to running our Podcast Garage training facility in Boston, to managing our open audio marketplace at PRX.org.

Since our founding in 2003, our PRX.org marketplace has grown to house over 100,000 audio pieces—uploaded from around the world—from short, artsy works to hour-long music specials. Creators post their work on our site, and public radio stations and digital networks shop there for new work for their local audiences. The goal is to give great audio a second home online, and ideally a third home on broadcast and digital, where it can reach even more ears.

To celebrate our marketplace, we host our annual Zeitfunk awards. Below you’ll find the list of producers, programs and stations who sold the most in the PRX marketplace in 2016. These numbers are calculated from individual licenses of audio pieces on PRX. (Subscription-only shows like This American Life and The Moth are not included in these results.)

Most Licensed Pieces
1. Best of the Best: The 2016 Third Coast Festival Broadcast
2. The Rose Ensemble: Christmas in Baroque Malta from The WFMT Radio Network
3. Ten From David: A David Bowie Appreciation from Paul Ingles
4. A Bow To Prince: An Appreciation of The Artist from Paul Ingles
5. The Pioneers of Punk – Please Kill Me: Voices from the Archives from Creative PR

Most Licensed Series
(This list does not include subscription-only series, like This American Life and The Moth)
1. Global Village with Chris Heim
2. Classical Guitar Alive!
3. The International Americana Music Show
4. Travelers In The Night
5. Blue Dimensions
6. The Bluegrass Review
7. Stuck in the Psychedelic Era
8. Strange Currency
9. The Stone Age
10. The Latin Alternative

Most Licensed Producers
These are individual creators on PRX who sold the most.
1. Tony Morris
2. Daniel Wargo
3. Michael Park
4. Al Grauer
5. Philip Nusbaum
6. Stephen R Webb
7. Vic Muenzer
8. Mat Kaplan
9. Chris Kuborn
10. Jamie Hoover

Most Licensed Groups
Teams of producers who sold the most.
1. With Good Reason
2. Deutsche Welle
3. Bluesnet Radio
4. BackStory with the American History Guys
5. Science Update
6. NPR Music
7. L.A. Theatre Works
8. The Steve Pomeranz Show
9. Great Lakes Today
10. Footlight Parade

Most Licensed Stations
Stations are huge creators of work, too. These are the ones that sold the most in 2016.
1. KMUW
2. The WFMT Radio Network
3. WMHT
4. WSIU
5. WBEZ
6. WOUB
7. South Carolina ETV Radio
8. WKSU
9. Kansas Public Radio
10. Louisville Public Media

Most Licensed Debut Producers
Producers who were new to PRX in 2016 who sold the most.
1. Brooke Halpin
2. Matt Davenport
3. Ryan Sweikert
4. Reade Levinson

Most Licensed Debut Groups
Teams of producers who were new to PRX in 2016 who sold the most.
1. Great Lakes Today
2. The World According to Sound
3. On Being with Krista Tippett
4. Safe Space Radio
5. Christopher Kimball’s Milk Street Radio
6. Outside Magazine

Most Licensed Producers by PRX Remix
PRX Remix is our Sirius XM station, app, and broadcast show that purchases work directly from PRX.org. These are individual producers from which Remix purchased the most.
1. Nate DiMeo
2. Eric Molinsky
3. (tie) Erica Heilman and David Green

Most Licensed Groups by PRX Remix
PRX Remix is our Sirius XM station, app, and broadcast show that purchases work directly from PRX.org. These are teams of producers from which Remix purchased the most.
1. The World According to Sound
2. RadioArt
3. Criminal
4. (three-way tie) KCRW’s Independent Producer Project, HowSound, and Out of the Blocks

Stations That Licensed the Most
We like to honor the stations that license (download for air) the most from PRX as well.
1. WOUB
2. WLPR
3. KMXT
4. WMUU-LP
5. KMRE-LP
6. KCMJ Community Radio
7. KSRQ
8. RadioFreePalmer
9. KKRN
10. KKWE Niijii Radio

Most Licensed Piece Lengths
Producers, these are the lengths of pieces on PRX that sell the most. You can see that hour-long and 5-min. or shorter pieces are the most popular.
1. 55-60 min. (20,275 individual licenses on PRX in 2016)
2. 5 min. or less (6,883 licenses)
3. 50-55 min. (4,735 licenses)
4. 25-30 min. (2,746 licenses)
5. 5-10 min. (2,416 licenses)
6. 30-35 min. (982 licenses)
7. 10-15 min. (466 licenses)
8. 40-45 min. (356 licenses)
9. 60-65 min. (350 licenses)
10. 15-20 min. (213 licenses)
11. 35-40 min. (182 licenses)
12. 20-25 min. (176 licenses)
13. 45-50 min. (84 licenses)
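
For producers who want to see how lopsided these numbers are, here is a minimal sketch that tallies the license counts listed above and works out what share the top two lengths account for. The variable and function names are illustrative only and are not part of any PRX tool.

```python
# License counts per length bucket, in the rank order listed above (rank 1 first).
license_counts = [20275, 6883, 4735, 2746, 2416, 982, 466, 356, 350, 213, 182, 176, 84]

# Total of all licenses that appear in the list.
total = sum(license_counts)


def share(count: int) -> float:
    """Return count as a percentage of all listed licenses, rounded to one decimal."""
    return round(100 * count / total, 1)


hour_long = license_counts[0]     # rank 1: 55-60 min.
five_or_less = license_counts[1]  # rank 2: 5 min. or less

print(f"Total licenses listed: {total}")
print(f"55-60 min. share: {share(hour_long)}%")
print(f"5 min. or less share: {share(five_or_less)}%")
print(f"Top two combined: {share(hour_long + five_or_less)}%")
```

By this tally, the 55-60 min. bucket alone makes up roughly half of the listed licenses, and the top two lengths together make up about two-thirds.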

The post The 13th Annual Zeitfunk Awards appeared first on PRX.

Farewell NICAR, hello SXSW

Hope you’re all resting after MisinfoCon and NICAR, hacks and hackers. Well, stop resting on your laurels, because The Awesomest Journalism Party Ever (VII) is coming! Come party like a...

Visit Hacks/Hackers to read the full post and join our community.

TechCon Sessions: Birds Eye View


by Katie Wilson | Senior Manager | PBS Digital

Spring is on the way, and with it, thoughts of #TechCon17 bloom in a digital mind. This year promises even more sessions dedicated to digital strategy, audience engagement, product discovery and other fascinating topics.

We have 30 Digital sessions for you to select from, and we are adding 4 Development sessions for a total of 34 digitally focused sessions! You will not want to miss a minute. Leave early and you pass up “10 Low Down Dirty Tricks to Speed Up Your Digital Video Production” from PBS NewsHour. Arrive late and you’ll miss "Investing in Innovation" from KQED's Colleen Wilson. Peppered throughout the conference are sessions dealing with interconnection, public safety, organizational structure, and too many other topics to list.

See below for a few highlights and check out the full agenda here.

COMING UP ACES: MARKETING

Found Audience Using Curate To Attract National Attention to your Local Station: A Workshop
Curate is your tool to customize the user experience for your viewers and members across PBS digital properties. Customize links, images, PBS newsletters, show pages and more on this easy-to-use interface! With new features for 2017, spend some time at a workshop (bring your laptop and PBS account login!) with Eric Freeland, Joe Hilton, and Leif Brostrom.

DOUBLE DOWN ON: STRATEGY

YoPro Talks: Leadership Panel
For the first time at TechCon, join YoPro (PBS’ young professional initiative) for a leadership panel. We're bringing the heavy hitters: hear from PBS’ Chief Digital Officer & CMO, Ira Rubenstein, PBS’ CTO Mario Vecchi and WUCF-TV’s Director of Communications, Jennifer Cook. The panelists will discuss and share their career paths, lessons learned, career advice, and much more. This session is meant to be informal: the audience guides the direction of the panel by asking questions.

Digital Immersion: Building a Digital Department
So your station wants to build a “Digital Department” but doesn't know where to start. How do you create an organizational structure that prepares you for growth and evolution? How do you get buy-in as priorities shift and workloads change? How do you prepare for the constant change that digital brings? You'll hear from KPBS's Tammy Carpowich and ThinkTV's Justine Moore, who will discuss the obstacles and victories in building their digital departments. This session is part of the Digital Immersion Project from PBS Digital and CPB.

KNOW HOW TO HOLD 'EM: CONTENT AND ENGAGEMENT

Experimenting with Digital Engagement Tools to Have Fun and Make an Impact
This session will present several mini case studies involving innovative experiments in social and digital engagement at WCVE/Community Idea Stations. Supercharge your social media engagement with available digital tools to increase your reach and engagement and to strengthen your community. By shifting the way you view and use social media, your station can move beyond simple promotion of your content to creating strong connections with - and among - your fans and other community non-profits. Learn how the Community Idea Stations/WCVE used scavenger hunts, infographics, video storytelling, Facebook Live, Google Maps, Snapchat filters and other digital tools to reach new audiences, increase the fun quotient in our programming, strengthen our connections with other non-profits, and create a sense of community around important issues. We will also reflect on creating a realistic budget for social media, managing (and creating) realistic expectations for management, and evaluating the success of your digital experiments.

Engaging Niche Audiences Through Digital Marketing
This session will discuss the value in identifying and reaching niche audiences with hyperfocused social campaigns. Rewire's Bryce Kirchoff, KCTS's Stacey Jenkins and PBS's Matt Schoch will share strategies for catering to your unique audiences: figuring out what genres they love and showing that your brand understands why they love them. Using The Great British Baking Show and content verticals as examples, learn the value of live-tweeting, GIFs, original digital content, and leveraging a “human” voice.

ANTE UP: DEVELOPMENT & FUNDRAISING

Measure the Money
Google Analytics is the standard tool that most stations use to measure their digital experiences. But few have taken advantage of a powerful feature in that tool that allows them to measure online donations. Google Analytics has the power to tell you exactly what marketing activities are driving donations, how donors enter and leave your site and what content they consume when they are there. Dan Haggerty will walk you step-by-step through an implementation of this feature (known as Enhanced eCommerce Tagging), show working examples, share insights and give you tips and tricks for avoiding some of the most common challenges. You will leave this session with a clear understanding of how you can specifically track the online behavior of your membership and the exact dollar amounts that each type of behavior generates.

HIT THE JACKPOT WITH: PRODUCTS & TECHNOLOGY

Accessibility is an Everyone Issue
What if you came to a show's website, but couldn't watch any videos? What if you couldn't make a donation to a station? That wouldn't be acceptable. Yet this is the reality for many users on the internet today. Lars Klores will give an introduction to the reasons why accessibility on the web matters. He will discuss the efforts of PBS to make PBS.org and other PBS digital products accessible to users with a range of disabilities, including color blindness, inability to use a mouse, deafness, and complete blindness. He will also discuss the legal dangers of remaining inaccessible. Chip Cullen will then give a live demonstration of navigating a website as a disabled user, using a keyboard and a screen reader. He will also discuss common accessibility issues and basic fixes.

Facebook Live: Best Practices And Basic Production Setups
Although Facebook didn't invent live streaming, its streaming product "Facebook Live" has emerged as a unique way to reach and directly engage your fans, viewers, listeners and members. In this session, WGBH's Tory Starr and Shane Miner will walk through some great use cases of stations using Facebook Live and introduce you to what WGBH has classified as the "four tiers of Facebook Live production," ranging from basic iPhone broadcasting to a full studio setup using the Facebook Live API. We'll also leave time at the end to answer your questions, and demo the MEVO camera, a low-cost pocket-sized live video camera that lets you edit while you film.

Reimagining Websites Through a Better Bento
At last year’s PBS TechCon, Bento 3.0 was introduced as a key priority for improving the station and producer communities. Bento is PBS Digital’s website building tool, which is leveraged by a majority of stations and producers. Bento 3.0 aims to address concerns. In this session, PBS's Jen Hinders will take attendees through a live demo of the environment, the outlined feature roadmap, and product insights from peers who have launched using the new version of the tool.

Have you registered yet?

Additional Information to plan your trip:

Registration - http://www.pbstechconference.org/registration/
Hotel - http://www.pbstechconference.org/hotel/
Session Agenda - http://bit.ly/2mcu2AN (subject to change)
Schedule - http://bit.ly/2lLO87D (subject to change)

See you in Las Vegas!


From Whiteboard to Web, a Bento 3.0 Success Story

In preparation for the launch of Bento 3.0, PBS Digital began a beta group consisting of stations not previously using Bento to build a site from scratch with the goal of launching that site. The group consisted of thirteen stations. Ray Walters, Television Operations Support Engineer at KMOS, participated, and KMOS was the first station to launch on Bento 3.0. In the coming weeks we will have more information about Bento migration and upcoming training opportunities.

Ray Walters | Television Operations Support Engineer | KMOS-TV

Stop me if any of this sounds familiar to you when it comes to your station’s website:
  1. You need an easy way to rapidly create content pages that actually look good;
  2. You need a way to decentralize content creation to the rest of your staff so that the responsibility doesn’t rest with one person, creating a single point of failure;
  3. You need an easy way to tap into PBS Digital's product and service offerings like COVE, Merlin and Passport;
  4. You need a way to allow users to easily donate and become members; 
  5. You need a fresh start to your site, giving you the ability to implement lessons you’ve learned when it comes to bringing content to the web; and 
  6. You need it all now.
It certainly sounded familiar to the staff here at KMOS, because these were all issues that we had identified during our journey of learning about creating an excellent web presence for our station.

The good news for us here at KMOS, and for you and the staff at your station, is that the hard work of the people at PBS Digital has given us a solution that provides for all of the items above, in the form of Bento 3.0. While still a work in progress, the new 3.0 release of the Bento platform provides a powerful web publishing tool for PBS stations.

Our journey with the KMOS website has been a bit of a long one. From a small standalone server running a static set of information pages to a Bento 2.0 site that we created in an over-complicated fashion, we had learned a tremendous amount of information from our experiences, and really needed to do a whole-site refresh to re-align our web presence towards our station’s digital goals. So when the opportunity arose to be a part of the beta pilot group for Bento 3.0, KMOS jumped at the chance as it would provide a platform to start from scratch and do things right.

Our development process for the Bento 3.0 site centered around the singular idea that our GM put forth to us, to make KMOS.org an online “marketplace” for all things KMOS. Unpacking that a bit, we came up with the following things that we needed to be successful in meeting this goal:


  1. All content should be no more than one click away. 
  2. PBS KIDS content should be front and center rather than buried behind a menu system. 
  3. Users should be able to easily donate and become members. 
  4. Navigation should be clear and not convoluted. 
  5. Users should be provided a “one-stop” shop for our original content offerings. 
  6. Users should be able to leverage content from Passport, COVE and Merlin seamlessly. 
  7. Site content updates need to be decentralized so there is not one point of failure on updates. 
  8. The site needed to be able to leverage Google Analytics. 
  9. The site needed to be able to leverage underwriting opportunities with Google DFP. 
  10. The site needed to be dynamic with front page content being kept relevant and fresh. 
  11. The site needed to look clean and professional. 
  12. The site needed to be responsive for mobile users, since our analytics have shown that most visitors are using mobile devices to access KMOS.org. 
With these key indicators of a successful site in place, we began our work not on a computer, but on a whiteboard with various staff members. We sat down and sketched out our entire site using a couple of dry erase markers, a camera and about 4 hours of meeting time. You might be asking yourself “How in the world were they able to plan out an entire site in just 4 hours?” It’s a valid question with a very simple answer. We had access to the Bento 3.0 framework and its powerful page layout editor.

The new editor in Bento 3.0 gives you a specific set of features, layout guides and components, making it incredibly easy to plan pages while at the same time providing balance and flexibility in design choice. Because of this we were able to go to our staff and provide them a list of design options that gave them choice while still maintaining a linear workflow for those of us whose job it was to transfer the whiteboard plan to the actual Web. This allowed us to rapidly design the new KMOS web presence in a way that satisfied all of our milestone goals listed above and gave us design choices to avoid a cookie-cutter look and feel. With the planning and sketching done, we were able to switch quickly to getting the pages and full site built.

The actual development work on the site and its content proceeded just as rapidly as the whiteboarding process, relatively speaking. Where in the past it would take us days and even weeks to develop pages and content, Bento 3.0 allowed us to create fully populated show pages complete with inline video playlists in literally minutes. The interface to work with layouts and content is intuitive and easy to understand. Page previews were a huge help in seeing progress and making changes on the fly.

From planning to implementation, we estimated that KMOS.org running on Bento 3.0 took us between 35 and 40 hours to create and launch. This is an incredibly fast turnaround time that is obviously very exciting for us. In Bento 3.0, PBS Digital has created a platform that is both powerful and easy to learn and use, a technological feat that isn’t easy at the scale it’s working at. We’re looking forward to seeing how the platform continues to evolve with new features and formatting. Bento 3.0 FTW!


Radiotopia Live: Announcing Our West Coast Tour!

Radiotopia is headed out on our first West Coast tour! Radiotopia Live brings extraordinary, cutting-edge podcasts out of your headphones and onto the stage.

Join us in Seattle, Portland, San Francisco and LA for live radio, conversations, stories and music from your favorite Radiotopia podcasts including 99% Invisible with Jon Mooallem and the Brink Players, Criminal, The Allusionist, The Memory Palace, Mortified and more. Plus a performance of The West Wing Weekly in LA only.

Full tour schedule:

Monday, May 8 – Aladdin Theater in Portland, OR

Tuesday, May 9 – Moore Theatre in Seattle

Thursday, May 11 – Nourse Theater in San Francisco

Friday, May 12 – Theatre at the Ace Hotel in LA

Get all the info, including tickets, at Radiotopia.fm/live. Enter code “RTLIVE” to get the best seats before the public; our exclusive pre-sale runs through 11:59 p.m. on March 2rd. Hope to see you there!

The post Radiotopia Live: Announcing Our West Coast Tour! appeared first on PRX.

YoPro Talks: Leadership Panel At TechCon


Face time with those in leadership is a rarity in a lot of organizations. When you do get the time to hear from leadership, more often than not it’s in a big room at a conference or staff meeting. Topics like how those in leadership got to where they are, career advice and lessons learned are rarely discussed.

To help break down the walls that can form between leaders and staff in an organization, YoPro looks to better connect system leaders with those in non-leadership roles, specifically young professionals within public media. Kicking off the conference season, at this year’s PBS TechCon, YoPro will be presenting a leadership panel.

During this panel, audience members will hear from PBS’ Chief Digital Officer & CMO, Ira Rubenstein, PBS’ CTO Mario Vecchi and WUCF-TV’s Director of Communications, Jennifer Cook. The panelists will discuss and share their career paths, lessons learned, career advice, and much more. This session is meant to be informal: the audience guides the direction of the panel by asking questions.

The session will take place on Wednesday, April 19th at 5:30 PM. Click here to download a calendar reminder! As this is the last session of the day, we want to encourage all attendees to head over to the PBS kiosk reception to continue conversations with the panelists and their young professional peers.

Scholarship Opportunity
In case you missed the callout, YoPro will be offering two scholarships to young professionals at stations to attend TechCon this year. Learn more about eligibility requirements and how to apply by viewing the previous SPI blog post about YoPro at TechCon. The deadline for submission is March 10th and selected applicants will be notified no later than March 15th.


Amy Lust | Assistant Director | PBS Digital
Jen Hinders | Director | PBS Digital