Saturday, April 13, 2024

On data and types

TL;DR -- Recent events in computing have many people worried or concerned. Some are chasing the opportunity for new ways of being mean-spirited and domineering. A little word of magic might be "transform," which has been touted in many ways. Of course, this goes back to the mathematics. We can shed new light on the issues, but we have to go back further than the 20 years of mess-making. We go back to 1837. That is how old many of these ideas are. Over time, we'll come forward with a proper history of computing and of its enabler, mathematics.

--

In the 21st century, we have over 200 years of experience with data on individuals in the U.S. We use the U.S. because of the historical aspect of this discussion. Now, people have been tracked all over the place. Russia (from the Czar on down) has been good at this. Their novelists told the tale. Tolstoy may have acted like a peasant (serf), but he was not one. And, we have information on his life and activity, some of it given to us by the man himself. No doubt, there were lots of the peasant class who never got any attention.

So it went across the whole world. The U.S. started its systems from the get-go, 400 years ago. In some towns, at least birth, marriage, and death were recorded for a person. Many have less information, or none. Some have more, including books about their lives. Other places did similar recording. But, the U.S. is unique for several reasons that we will discuss.

For now, let's remind ourselves that data handling involves technology. The colonial U.S. used pen and paper. Some of those records have been digitized, which is not of interest until there is an effort at transcription and labeling. Sometime in the 19th century, there was a change. We found ways to print records; too, the typewriter came to be. Both of these increased the quality of the record, somewhat. Though, such an evaluation would require us to discuss content versus configuration. There have been some posts on that, but a fuller exploration of the topic is on our to-do list.

Sometime in the 20th century, the computer came to be. GIGO (garbage in; garbage out) was coined to account for how willy-nilly use of the technology did not contribute to quality. The data industry got better at handling errors (of many types, to be discussed); however, the abilities of the artificial elements outran our capacity for pursuing quality and pushed us toward a lackadaisical approach. We are now paying the price.

What the 21st century just brought must be linked back to the mobile device that came into being around 2007, which represents what can be called "edge" computing, something we'll hear more about as time goes on. Not only did data accumulation increase, there was no way to do the proper deed of curating, nor was there any inkling of desire to recognize the coming problem. In 2022, things changed when a focus on machine learning (ML) reared its head. ML is mathematics in action. Below, we will briefly touch upon one huge issue.

First, though, when that little system (let's use ChatGPT, and call it CG) hit the airwaves and the cloud (which we'll go into in depth), millions signed up. Myself, I was not aware of the event and the reactions for two months, for many reasons. One of these is that this stuff is old hat (see below), and I made my peace with the technology decades ago. But, when I did become aware, I looked back and forward and sideways. The first was refreshing my memory of the long trek of technology; the second was considering all of the possible ramifications, most of which would turn out to be unintended consequences of a less than positive nature.

However, insightful people saw immediately that this was toy stuff. Too, it was more for entertainment and gaming. Folks in the real world of metrics and accomplishments of notable scope saw its surface nature. Myself, in the terms stated above, I saw more configuration than content, even though the purveyors of the mess touted their assorted hundreds of millions of parameters in their effort at having the machine learn.

That statement goes along with what people think of in terms of power (omnipotent) and knowledge (omniscient), and a few more. Why was that a choice? It turns out that studies show that removing the "crap" (so now, we have CICO: crap in; crap out) is not possible given the current state of computing. And, in terms of complexity, most likely never will be.

So, we get to the gist, immediately. People have been solving problems from the get-go. Where was the user respected and given proper control, beyond the playing-around aspect of "prompt engineering," which would not converge to any "truth" of value?

We will address all of this again. But, let's go back a few years. A huge player touted that they went with "transforms" and accomplished remarkable results. Sure, along with "fakery" of several types. That is not a new concept. I have written about that several times. This is old math.
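
An aside, to ground the word: if those "transforms" point at the transformer architecture, its core step is ordinary linear algebra plus a softmax. Below is a minimal sketch of my own (assuming numpy; the names and shapes are illustrative, not anyone's official API):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, the core step of the 'transformer'.

    Q, K, V: arrays of shape (tokens, dim). Matrix products and a
    softmax -- that is, old math made practical by modern hardware.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted mix of values

# Toy usage: three tokens, four dimensions, self-attention.
X = np.random.default_rng(0).normal(size=(3, 4))
print(attention(X, X, X).shape)  # (3, 4)
```

Nothing in that box is a creature; it is arithmetic, repeated at scale.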

------

So, let's step back a couple hundred years. Oh yes, think of what was going on in the U.S. at the time. We have lots of posts about events on these shores and the people involved. In the west, Jedediah Strong Smith had been out there for a while, having crossed the continent by foot and horse (and, at times, water). He was about to meet his demise in KS on his way back.

There's a lot more. We will look at a book (below) which is a summary of the collection by the Bourbaki writers, a group who published under a common pseudonym. These were books that covered mathematics from an axiomatic basis while developing what might be called a standard view. We are skipping over further detail while we focus on the mathematics largely behind CG and its peers.

The author quoted by Bourbaki, writing in 1837, notes the growing interest in transforms as they showed promise. There are many names to mention; the techniques of Fourier, which we will look at later, are an example. Now, he says that anyone using the techniques can generalize and obtain new truths. Too, he uses the metaphor of adding "a stone to the structure," which very much applies to what we are seeing.
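
To make that concrete with a worked equation (hedged as one standard convention among several): a function and its Fourier transform carry the same information, so a truth established on one side of the pair transfers to the other. In modern notation:

```latex
\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx
\qquad\text{and}\qquad
f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi .
```

That is the sense in which a transform lets one generalize: solve where the problem is easy, then carry the answer back.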

We will go into the AIn't part regularly. But, as a reminder, there is no creativity involved except on the part of the human. There ain't no creature in the box. There is something, oh yes: superb mathematics, which was not practical before computing.

Overview of Bourbaki's work
In this sense, we need to review the claims. All of the methods are old. They were very difficult to grasp and never easy to handle by manual computation, even with a sophisticated slide rule. Early computing spent a lot of time learning how to approach them with methods that are called numerical processing. Algorithms are the name of the game. And, these were never simple.
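
To illustrate what "numerical processing" means here, consider the discrete Fourier transform done the direct way, next to the modern fast algorithm. This is a sketch of my own for illustration; only the numpy calls are real:

```python
import numpy as np

def dft(x):
    """Naive discrete Fourier transform -- the 'by hand' method.

    O(n^2) complex multiply-adds: exactly the arithmetic that was
    hopeless with a slide rule and that early computing had to tame.
    """
    n = len(x)
    k = np.arange(n)
    W = np.exp(-2j * np.pi * np.outer(k, k) / n)  # matrix of roots of unity
    return W @ x

x = np.random.default_rng(1).normal(size=64)
# The fast Fourier transform (O(n log n)) reaches the same numbers.
assert np.allclose(dft(x), np.fft.fft(x))
```

The mathematics did not change between the two routes; the algorithm did. That gap is the whole story of numerical processing.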

Note, too, that the author (Chasles) addresses the issue of being confounded by the results. Remember last year's discussions about the black box and the lack of understanding. Well, that state of confusion was a contrivance, though mostly an unexpected one. In some cases, we could look for actual intent to have this effect.

This year, 2024, has been encouraging in terms of people recognizing the problems and trying to figure out how to go forward. We'll be throwing our hat in the ring there.

Now, who is the author? Michel Floréal Chasles was a French mathematician. He might seem to have been obscure, but in the U.S., he was recognized for his work. He lived from 1793 to 1880. His name is inscribed on the Eiffel Tower.

Remarks: Modified: 11/13/2024

04/14/2024 -- So, in the post, I mentioned some good news. But now, bad news? The IEEE Spectrum gloated about perplexity.ai. So, I had to go look. And, my first two tests failed. That is, the thing pulled from the appropriate sources, but it was too creative, or something else was off, in the output. ... My suggestion to you guys doing this stuff: some of us would prefer that you give us a summary without the embellishments (as your approach is not creative), with footnotes. In fact, the footnotes ought to be the phrases that fed into the summary. Oh, cannot be done? Then, let's go back to the drawing board. This old guy has time and knows how to do this. It's related to my work on truth engineering.
I'm referring to a context-sensitive encyclopedic approach. That is, without the "omni" aspect that seems to excite the younger crowd. Want to know the real "Omni"? We can talk about that. ... Some of my grief is that the thing is following what might be misguided text, anyway. The hope would be that enough sources would be good enough to pull toward something useful. Or, do curating beforehand, as many are saying (the older xNN experts). ...
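
To sketch what I mean by footnotes being the feeding phrases (everything here is hypothetical -- the function name and the keyword scoring are my own illustration, not anyone's product): pick the source sentences that drive the summary and emit them verbatim as the footnotes.

```python
def summarize_with_footnotes(sentences, keywords, top_n=2):
    """Extractive summary whose footnotes are the exact source phrases."""
    # Rank sentences by how many of the query keywords they contain.
    scored = sorted(
        sentences,
        key=lambda s: sum(k.lower() in s.lower() for k in keywords),
        reverse=True,
    )
    picks = scored[:top_n]
    summary = " ".join(f"{s} [{i + 1}]" for i, s in enumerate(picks))
    notes = "\n".join(f'[{i + 1}] "{s}"' for i, s in enumerate(picks))
    return summary + "\n" + notes

doc = [
    "Chasles published his historical survey in 1837.",
    "The weather in Paris was mild that year.",
    "His survey treated transforms as a route to new truths.",
]
print(summarize_with_footnotes(doc, ["Chasles", "transforms", "1837"]))
```

No embellishment, no "creativity": every output phrase traces to its source. That is the drawing board I'd go back to.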

11/13/2024 -- Moved image. 
