[Rhodes22-list] Web Interface: Searching the Archives

Thu Apr 4 18:42:57 EDT 2024

Wow. Thanks, Peter (I think). I will be giving the new process a try soon. After almost two years of rebuilding, we managed to get our first sail in. SHE FLOATS. Some questions remain, and I look forward to searching the new system.
Barry Ruehlen
SV Perseverance '87

________________________________
From: Rhodes22-list <rhodes22-list-bounces at rhodes22.org> on behalf of Peter Nyberg <peter at sunnybeeches.com>
Sent: Thursday, April 4, 2024 4:47:11 PM
To: rhodes22-list at rhodes22.org <rhodes22-list at rhodes22.org>
Subject: [Rhodes22-list] Web Interface: Searching the Archives

As I mentioned in my last post, in order to support the web interface for the Rhodes 22 email list, the entire history of the emails sent to the list has been extracted from the Mailman archives and placed into a relational database.  This was primarily done to make sure that the user interface performed well, but it also makes searches easier.

Previously, the only way to search the email list archives was to search the archive web pages.  On rhodes22.net, a search of the archives will instead search the database.  This allows searches to be more narrowly tailored.  For instance, you can choose to search just subject lines.

But there’s more…

As I also mentioned in my last post, most emails sent to the list have two sections of content: the new text written by the sender; and older content that was in the message being replied to.  The search page refers to the new content as ‘Original Text’, and the older content as ‘Quoted Text’.
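To illustrate the distinction, here is a minimal sketch of how a message body might be divided into Original Text and Quoted Text. It assumes quoted lines are marked with a leading ">", which is only one common convention (replies like the one at the top of this thread quote with a "From:" header block instead). The function name split_message is hypothetical and is not taken from the actual extraction process.

```python
import re

# Matches one or more levels of ">" quoting at the start of a line.
QUOTE_PREFIX = re.compile(r"^(\s*>)+\s?")


def split_message(body: str) -> tuple[str, str]:
    """Return (original_text, quoted_text) for one email body.

    Assumption: a line is quoted if it starts with ">" (after optional
    whitespace). Real messages use other quoting styles as well.
    """
    original, quoted = [], []
    for line in body.splitlines():
        if QUOTE_PREFIX.match(line):
            # Strip the ">" markers so only the quoted words remain.
            quoted.append(QUOTE_PREFIX.sub("", line))
        else:
            original.append(line)
    return "\n".join(original).strip(), "\n".join(quoted).strip()


if __name__ == "__main__":
    body = "Thanks, that worked.\n> Try reefing earlier.\n> > What about the jib?\nBarry"
    new_text, old_text = split_message(body)
    print("Original Text:", repr(new_text))
    print("Quoted Text:", repr(old_text))
```

A real splitter would need more rules than this, since quoting styles vary from one mail client to the next.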

A search of the archive web pages will look through and potentially find hits in both Original Text and Quoted Text.  Often, the search word or phrase will be found over and over in the same segment of Quoted Text which reappears in many messages.  This can result in some pretty muddy water.

The process that extracts messages from the archives splits the original text from the quoted text and stores them in separate database tables.  This allows the search to optionally ignore the quoted text and just search the original text, which will probably produce a better result set.
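As a rough sketch of why separate tables help, the example below stores original and quoted text in two tables and lets a search skip the quoted table. Everything in it is an assumption made for illustration: the table and column names, the sample rows, the use of SQLite, and the use of a simple LIKE match as a stand-in for whatever search engine the real DBMS provides.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE message (id INTEGER PRIMARY KEY, subject TEXT);
        CREATE TABLE original_text (message_id INTEGER, body TEXT);
        CREATE TABLE quoted_text (message_id INTEGER, body TEXT);
        """
    )
    rows = [
        (1, "Mast raising", "How do I raise the mast alone?", ""),
        (2, "Re: Mast raising", "Use a gin pole.", "How do I raise the mast alone?"),
        (3, "Re: Mast raising", "Agreed, works well.", "Use a gin pole."),
    ]
    for mid, subject, orig, quoted in rows:
        conn.execute("INSERT INTO message VALUES (?, ?)", (mid, subject))
        conn.execute("INSERT INTO original_text VALUES (?, ?)", (mid, orig))
        conn.execute("INSERT INTO quoted_text VALUES (?, ?)", (mid, quoted))
    return conn


def search(conn: sqlite3.Connection, term: str, include_quoted: bool = False) -> list[int]:
    """Return ids of messages containing term, optionally searching quoted text too."""
    pattern = f"%{term}%"
    sql = "SELECT message_id FROM original_text WHERE body LIKE ?"
    params = [pattern]
    if include_quoted:
        # UNION removes duplicates when a term appears in both tables.
        sql += " UNION SELECT message_id FROM quoted_text WHERE body LIKE ?"
        params.append(pattern)
    sql += " ORDER BY message_id"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_demo_db()
    print(search(conn, "gin pole"))                       # original text only
    print(search(conn, "gin pole", include_quoted=True))  # original and quoted text
```

With these sample rows, the original-only search finds just the message where the phrase was first written, while including quoted text also pulls in the reply that merely repeats it.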

The actual search engine is a black box provided by the Database Management System (DBMS).  If it doesn’t produce the expected results, there’s not much we can do about it.  But the limited testing that I’ve done indicates that it works pretty well.

If you’d like to check it out, you can find it at https://www.rhodes22.net/email-search.html

—Peter

[ Sent From rhodes22.net ]