Illustration by Fernando Vicente
Books that have stayed with us
by Lada Adamic and Pinkesh Patel
Favorite books are something friends like to share and discuss. A Facebook meme facilitates this very interaction. You may have seen one of your friends post something like “List 10 books that have stayed with you in some way. Don't take more than a few minutes, and don't think too hard. They do not have to be the 'right' books or great works of literature, just ones that have affected you in some way.” If not great works of literature, what are the books that have stayed with us?
The following analysis was conducted on anonymized, aggregate data.
To answer this question, we gathered a de-identified sample of over 130,000 status updates matching “10 books” or “ten books” that were posted in the last two weeks of August 2014 (although the meme has been active for at least a year). Of those posting, 63.7% were in the US, followed by 9.3% in India and 6.3% in the UK. Women outnumbered men 3.1:1, and the average age was 37. We therefore expect the books chosen to reflect this subset of the population.
We programmatically segmented the posts into lists and found the most frequently occurring substrings, which corresponded to individual books, e.g. “Anna Karenina by Leo Tolstoy”. However, the same book could appear as different substrings, e.g. just “Anna Karenina” or “Anna Karenina - Leo Tolstoy”. We clustered similar variants programmatically, hand-tuning the clusters where the algorithm had failed to merge two popular variants. We then used the clusters to automatically match each book list against the common variants of the 500 most popular books.
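The post does not publish its segmentation or clustering code. As a rough illustration only, the sketch below assumes one entry per line, strips leading enumeration such as “1.” or “2)”, drops an author suffix introduced by “ by ” or “ - ”, and groups variants that share the resulting lowercase key, keeping the most frequent variant as the canonical label. The function names (`segment_post`, `normalize_variant`, `cluster_variants`, `match_list`) are hypothetical, and a crude key like this would still need the kind of hand tuning described above (it would, for instance, wrongly truncate a title that itself contains “ by ”).

```python
import re
from collections import Counter, defaultdict

# Hypothetical rules: leading list numbers ("1.", "2)") and an author suffix
# after " by " or " - " are dropped before comparing variants.
ENUMERATION = re.compile(r"^\d+[.)]\s*")
AUTHOR_SUFFIX = re.compile(r"\s+(?:by|-)\s+.*$", re.IGNORECASE)


def segment_post(post: str) -> list[str]:
    """Split a status update into candidate entries, one per non-empty line."""
    return [line.strip() for line in post.splitlines() if line.strip()]


def normalize_variant(text: str) -> str:
    """Reduce a title variant to a crude key: title only, lowercase, no punctuation."""
    title = ENUMERATION.sub("", text.strip())
    title = AUTHOR_SUFFIX.sub("", title)
    title = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(title.split())


def cluster_variants(variant_counts: Counter) -> dict[str, str]:
    """Map every normalized key to the most frequent raw variant sharing that key."""
    groups = defaultdict(list)
    for variant in variant_counts:
        groups[normalize_variant(variant)].append(variant)
    return {key: max(members, key=variant_counts.__getitem__)
            for key, members in groups.items()}


def match_list(post: str, key_to_book: dict[str, str]) -> set[str]:
    """Return the canonical books matched by the entries of one post."""
    found = set()
    for entry in segment_post(post):
        book = key_to_book.get(normalize_variant(entry))
        if book is not None:
            found.add(book)
    return found
```

For example, if “Anna Karenina by Leo Tolstoy” is the most frequent spelling in the variant counts, an entry such as “1. Anna Karenina - Leo Tolstoy” maps to that canonical label, while a header line like “My 10 books:” matches nothing.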
Here are the top 20 books, each with the percentage of lists containing it (out of all lists that matched at least one of the top 500 books).
- 21.08 Harry Potter series - J.K. Rowling
- 14.48 To Kill a Mockingbird - Harper Lee
- 13.86 The Lord of the Rings - JRR Tolkien
- 7.48 The Hobbit - JRR Tolkien
- 7.28 Pride and Prejudice - Jane Austen
- 7.21 The Holy Bible
- 5.97 The Hitchhiker's Guide to the Galaxy - Douglas Adams
- 5.82 The Hunger Games Trilogy - Suzanne Collins
- 5.70 The Catcher in the Rye - J.D. Salinger
- 5.63 The Chronicles of Narnia - C.S. Lewis
- 5.61 The Great Gatsby - F. Scott Fitzgerald
- 5.37 1984 - George Orwell
- 5.26 Little Women - Louisa May Alcott
- 5.23 Jane Eyre - Charlotte Bronte
- 5.11 The Stand - Stephen King
- 4.95 Gone with the Wind - Margaret Mitchell
- 4.38 A Wrinkle in Time - Madeleine L'Engle
- 4.27 The Handmaid's Tale - Margaret Atwood
- 4.05 The Lion, the Witch, and the Wardrobe - C.S. Lewis
- 4.01 The Alchemist - Paulo Coelho
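The percentages above use only lists that matched at least one of the top 500 books as the denominator, and since one list can contain several books, the column is not expected to sum to 100. A minimal sketch of that calculation, assuming each list has already been reduced to a set of canonical titles (the function name `book_percentages` is hypothetical):

```python
from collections import Counter


def book_percentages(matched_lists: list[set[str]]) -> dict[str, float]:
    """Percentage of lists containing each book, among lists with >= 1 matched book."""
    nonempty = [books for books in matched_lists if books]
    counts = Counter(book for books in nonempty for book in books)
    total = len(nonempty)
    # most_common() yields books in descending order of frequency.
    return {book: 100.0 * c / total for book, c in counts.most_common()}
```

With the lists `{"A", "B"}`, `{"A"}`, an empty set, and `{"C"}`, book A gets 2 of 3 non-empty lists (about 66.7%); the empty list is left out of the denominator, mirroring the restriction stated above.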
One can also look at connections between the books, e.g. 'people who listed X also listed Y', using pointwise mutual information (PMI). In the network visualization, each node represents a book, sized by how often it was mentioned, and an edge represents an unusually high number of co-occurrences of the two books in the lists.
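Pointwise mutual information compares how often two books appear on the same list with how often they would co-occur if lists were assembled independently: PMI(x, y) = log[p(x, y) / (p(x) p(y))], where each probability is estimated as a fraction of lists. Positive values mean the pair co-occurs more often than chance. The post does not say what thresholds defined an edge, so the `min_pair_count` and `min_pmi` cut-offs below are assumptions, as is the function name `pmi_edges`.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(matched_lists: list[set[str]], min_pair_count: int = 2,
              min_pmi: float = 0.0) -> dict[tuple[str, str], float]:
    """Return {(x, y): PMI} for book pairs that co-occur more often than chance."""
    lists = [books for books in matched_lists if books]
    n = len(lists)
    single = Counter(book for books in lists for book in books)
    # Sorting makes each unordered pair a single key, e.g. ("Hobbit", "LOTR").
    pairs = Counter(pair for books in lists for pair in combinations(sorted(books), 2))
    edges = {}
    for (x, y), c_xy in pairs.items():
        if c_xy < min_pair_count:  # assumed cut-off against noisy rare pairs
            continue
        pmi = math.log((c_xy / n) / ((single[x] / n) * (single[y] / n)))
        if pmi > min_pmi:
            edges[(x, y)] = pmi
    return edges
```

The minimum pair count matters because PMI tends to inflate scores for rare items: two books that each appear once, on the same list, would otherwise get a very high score from a single co-occurrence.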
There is actually another kind of network that forms. While some people shared the meme without tagging anyone, calling on all their friends to make their own posts, others tagged specific friends whose favorite books they wanted to know. Even a small fragment of the cascade shows the long (and tangled) tagging chains through which the meme diffused.
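One way to quantify how far such a cascade travels is to treat each tag as a directed edge from the tagger to the tagged friend and run a breadth-first search from the posters nobody tagged. The sketch below is a hypothetical illustration, not the analysis used in the post: it reports the largest hop distance from the nearest untagged seed, which is simpler than the longest chain, because tangled chains can contain cycles and finding a longest simple path is hard in general. Components made up purely of cycles have no seed and are skipped.

```python
from collections import defaultdict, deque


def max_hops_from_seeds(tag_edges: list[tuple[str, str]]) -> int:
    """Largest BFS hop distance from any poster who was not tagged by anyone."""
    children = defaultdict(list)
    tagged = set()
    nodes = set()
    for tagger, friend in tag_edges:
        children[tagger].append(friend)
        tagged.add(friend)
        nodes.update((tagger, friend))
    seeds = [n for n in nodes if n not in tagged]
    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        for friend in children[user]:
            if friend not in hops:  # first visit = shortest distance from a seed
                hops[friend] = hops[user] + 1
                queue.append(friend)
    return max(hops.values(), default=0)
```

For the tags a→b, b→c, c→d, a→e and x→y, the seeds are a and x, and the farthest poster (d) is three hops away.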
Do friends tend to like the same books? We computed the number of books shared between lists linked via tags, which was a mere 0.4 books on average! That is four times the overlap of 0.1 books between two randomly chosen lists. It is also an underestimate, since our automated matching identifies only 5.3 books per list on average (rather than the full 10), because we matched only against the 500 most commonly mentioned titles. Nevertheless, the low overlap underlines that even in a world of relatively few highly successful bestsellers, lists of favorites tend to be rather different, even between friends.
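The comparison above amounts to averaging set intersections over two kinds of pairs: lists linked by a tag and randomly drawn pairs of lists. A minimal sketch, assuming each list is already a set of matched titles keyed by a hypothetical list id; the names `mean_overlap`, `random_pairs` and `overlap_lift` are illustrative, and with the reported averages the lift is 0.4 / 0.1 = 4.

```python
import random
from statistics import mean


def mean_overlap(pairs: list[tuple[str, str]], lists: dict[str, set[str]]) -> float:
    """Average number of matched books shared by the two lists in each pair."""
    return mean(len(lists[a] & lists[b]) for a, b in pairs)


def random_pairs(list_ids: list[str], k: int, seed: int = 0) -> list[tuple[str, str]]:
    """Draw k random pairs of distinct list ids as a baseline."""
    rng = random.Random(seed)
    return [tuple(rng.sample(list_ids, 2)) for _ in range(k)]


def overlap_lift(tagged_pairs: list[tuple[str, str]], lists: dict[str, set[str]],
                 k: int = 10_000, seed: int = 0) -> float:
    """Ratio of the tag-linked overlap to the random-pair baseline."""
    baseline = mean_overlap(random_pairs(list(lists), k, seed), lists)
    return mean_overlap(tagged_pairs, lists) / baseline
```

Because only part of each list is matched, both averages computed this way are lower bounds, as noted above.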
Finally, here are the remaining books in the top 100, each with its rank and the percentage of lists that contained it:
21 3.95 Anne of Green Gables - L.M. Montgomery
22 3.88 The Giver - Lois Lowry
23 3.67 The Kite Runner - Khaled Hosseini
24 3.53 Ender's Game - Orson Scott Card
25 3.39 The Poisonwood Bible - Barbara Kingsolver
26 3.38 Lord of the Flies - William Golding
27 3.38 The Eye of the World - Robert Jordan
28 3.32 The Book Thief - Markus Zusak
29 3.26 Wuthering Heights - Emily Bronte
30 3.22 Hamlet - William Shakespeare
31 3.21 The Little Prince - Antoine de Saint-Exupery
32 3.15 Sherlock Holmes - Sir Arthur Conan Doyle
33 3.15 Fahrenheit 451 - Ray Bradbury
34 3.12 Animal Farm - George Orwell
35 3.08 The Book of Mormon
36 3.05 The Diary of Anne Frank - Anne Frank
37 3.02 Dune - Frank Herbert
38 2.98 One Hundred Years of Solitude - Gabriel Garcia Marquez
39 2.83 The Autobiography of Malcolm X
40 2.78 Of Mice and Men - John Steinbeck
41 2.72 The Giving Tree - Shel Silverstein
42 2.68 The Fault in Our Stars - John Green
43 2.68 On the Road - Jack Kerouac
44 2.58 Lamb - Christopher Moore
45 2.54 Slaughterhouse-Five - Kurt Vonnegut
46 2.53 A Prayer for Owen Meany - John Irving
47 2.52 Good Omens - Neil Gaiman and Terry Pratchett
48 2.45 The Help - Kathryn Stockett
49 2.44 The Outsiders - S.E. Hinton
50 2.42 American Gods - Neil Gaiman
51 2.41 Where the Red Fern Grows - Wilson Rawls
52 2.39 Stranger in a Strange Land - Robert Heinlein
53 2.38 The Secret Garden - Frances Hodgson Burnett
54 2.35 Little House on the Prairie - Laura Ingalls Wilder
55 2.31 The Count of Monte Cristo - Alexandre Dumas
56 2.31 Pillars of the Earth - Ken Follett
57 2.29 The Da Vinci Code - Dan Brown
58 2.24 Brave New World - Aldous Huxley
59 2.21 A Tale of Two Cities - Charles Dickens
60 2.21 Les Miserables - Victor Hugo
61 2.16 Great Expectations - Charles Dickens
62 2.12 Night - Elie Wiesel
63 2.12 The Dark Tower Series - Stephen King
64 2.07 Outlander - Diana Gabaldon
65 1.92 The Color Purple - Alice Walker
66 1.89 A Thousand Splendid Suns - Khaled Hosseini
67 1.88 The Art of War - Sun Tzu
68 1.85 Catch-22 - Joseph Heller
69 1.85 The Bell Jar - Sylvia Plath
70 1.83 The Perks of Being a Wallflower - Stephen Chbosky
71 1.78 The Old Man and the Sea - Ernest Hemingway
72 1.76 Memoirs of a Geisha - Arthur Golden
73 1.75 Tuesdays with Morrie - Mitch Albom
74 1.73 The Road - Cormac McCarthy
75 1.72 Watership Down - Richard Adams
76 1.72 A Tree Grows in Brooklyn - Betty Smith
77 1.68 Where the Sidewalk Ends - Shel Silverstein
78 1.65 The Girl with the Dragon Tattoo - Stieg Larsson
79 1.65 A Song of Ice and Fire - George R. R. Martin
80 1.65 Are You There God? It's Me, Margaret - Judy Blume
81 1.64 Charlotte's Web - E.B. White
82 1.63 The Time Traveler's Wife - Audrey Niffenegger
83 1.62 Anna Karenina - Leo Tolstoy
84 1.62 Crime and Punishment - Fyodor Dostoyevsky
85 1.61 The Adventures of Huckleberry Finn - Mark Twain
86 1.58 The Shack - William P. Young
87 1.56 Watchmen - Alan Moore
88 1.55 Interview with the Vampire - Anne Rice
89 1.54 The Odyssey - Homer
90 1.54 The House of the Spirits - Isabel Allende
91 1.53 The Stranger - Albert Camus
92 1.52 Call of the Wild - Jack London
93 1.51 The Five People You Meet in Heaven - Mitch Albom
94 1.51 Siddhartha - Hermann Hesse
95 1.50 East of Eden - John Steinbeck
96 1.50 Matilda - Roald Dahl
97 1.49 The Picture of Dorian Gray - Oscar Wilde
98 1.47 Zen and the Art of Motorcycle Maintenance - Robert Pirsig
99 1.45 Love in the Time of Cholera - Gabriel Garcia Marquez
100 1.45 Where the Wild Things Are - Maurice Sendak
[An earlier version of this post had two separate clusters for the Chronicles of Narnia series. When these were merged, the series rose to #10.]