Published on in Notes

I’ve been experimenting with AI music generation software lately and I have found it to be quite interesting. I’ve tried two programs, Mubert and Soundful, and I was pleasantly surprised with the results I got. Although the music generated wasn’t very creative, it was good at replicating common background music styles. There was no noticeable distortion or excessively syllabic sounds in the audio, which I have seen in other services in the past.

The biggest issue with these AI music services is licensing. Most of the services I’ve tried claim copyright over the music generated using their tool and put many restrictions on how the music can be licensed back to the user. You can’t use it in certain projects, or distribute the music on its own. Mubert has particularly restrictive licensing terms. To obtain full rights to the audio, the cost can be exorbitant, with Mubert charging upwards of $400, while Soundful, which has more reasonable terms, charges around $50. It almost seems like these services are trying to price their product just below what human artists would charge. This business model doesn’t make much sense.

When it comes to the ethics of AI art, I don’t think it will harm artists as much as some people fear. For example, these AI music services I talked about can only replicate the most basic and generic types of music. They are still far inferior to human artists, even for creating background music for YouTube videos. I plan to get into streaming and making YouTube videos, and for that, I will still probably go with conventional music. I believe AI art will simply become another tool for creating art, reshuffling the deck a bit and potentially putting some people out of business or into business, but it won’t be a sea change in the industry.


Some thoughts on the ethics of AI art/generative AI

Published on in Notes

AI art is attracting a lot of controversy over its implications for current artists. What will it do to employment prospects in the arts? What about the copyright implications? What about all of the art that is used to train these models?

All of those questions are important things to think about.

I, for one, think that fears of human artists being fully displaced by automation are a bit overstated.

I think it won’t displace artists as much as people are worried about. I think it will be just another tool that’s used to create art.

It may reshuffle the decks a bit and maybe put some people out of business, put some people into business, but it won’t be as much of a sea change in that regard as many think.

However, what worries me the most about the increasing role of AI tools is their closed nature. As these increasingly sophisticated AI models do more and more, not just in the field of art but in every aspect of our lives, it’s crucial that these tools are open and accessible to everyone.

Unfortunately, that is not the case with most of these tools. Currently, the AI models and their outputs and inputs are owned by just a few companies, leaving most users locked out.

I have a strong concern that this will concentrate the art market, displacing the decentralized infrastructure and ecosystem of small business artists with a much more centralized art world, dominated by a few companies that provide tools that play an increasingly critical role in creating art in the modern world.

The majority of significant recent generative AI models are proprietary, from AI music generators to tools like GPT and MidJourney. These tools are not even available for use on your own computer; instead, you have to send your inputs to be processed on a cloud server owned and maintained by the authors of the AI model. Even the few models that are source-available (and even marketed as open source), like Stable Diffusion, are not fully free and open source.

One reason for these models not being free and open source is what some sources call “toxic candy models.”

As per this memo by a contributor to the Debian Linux distribution, written in regards to determining which AI software should be included as FOSS, these are models where the algorithm’s weights and other components are a complete black box: you receive only the final output of the model-generation process, with no information on how it was generated.

This includes models based on data/input scraped from the internet. This results in situations where the art used to create the final model is usually proprietary, and the legality of even doing this scraping is in dispute. And of course the companies can’t distribute that art to anyone who wants to modify or fully understand the model. They can’t provide a full list of every bit of art they used.

So if you, as a user, want to fundamentally modify what’s fed into those models, or see what’s fed into them and figure out where the model gets what it gets from, you fundamentally can’t under the current ecosystem.

Access to the data is necessary for users to fully understand, modify, and use the model, and to build their own versions based on it.

I think that this issue is adjacent to one of the more plausible arguments that the models should be considered a derivative work of the input art – but I am not sure if I endorse such an argument.

A concerning trend is for companies producing source-available AI models to release them under non-free and open-source licenses that do not meet standard guidelines for open-source licenses, such as the FSF definition, the OSI open source definition, or the Debian Free Software Definition.

The most notable of these licenses is the Responsible Artificial Intelligence Source Code (RAIL) License, which imposes restrictions on how users can use the output generated by the tool.

This is similar to proprietary companies that claim a copyright interest in the output of their programs.

This is a departure from the open-source community’s consensus that the software developer does not have ownership over what people use the software for – despite the fact that some of the companies involved still attempt to claim to be open source friendly.

There is a movement in the software industry, particularly in the AI world, for developers to dictate what users can do with their software.

This mindset and movement asserts that the developer of the software has both the moral right and obligation, and the legal ability, to dictate what users do with that software.

https://facctconference.org/static/pdfs_2022/facct22-63.pdf

(From the Abstract) A number of organizations have expressed concerns about the inappropriate or irresponsible use of AI and have proposed ethical guidelines around the application of such systems. While such guidelines can help set norms and shape policy, they are not easily enforceable. In this paper, we advocate the use of licensing to enable legally enforceable behavioral use conditions on software and code and provide several case studies that demonstrate the feasibility of behavioral use licensing.

(From Pg 4) In this paper, we seek to encourage entities and individuals who create AI tools and applications, to leverage the existing IP license approach to restrict the downstream use of their tools and applications (i.e., their “IP”). Specifically, IP licensors should allow others to use their IP only if such licensees agree to use the IP in ways that are appropriate for the IP being licensed. While contractual arrangements are not the only means to encourage appropriate behaviour, it is a mechanism that exists today, is malleable to different circumstances and technologies, and acts as a strong signaling mechanism that the IP owner takes their ethical responsibilities seriously.

This has the potential to spread beyond the AI world and impact the norms of the software industry as a whole. This mindset is in blatant contradiction of not just the norms of the open source community, but also the old norms of the software industry as a whole.

The expansion of copyright for AI technology is a big concern. The RAIL license, used by Stable Diffusion among others, is an interesting and notable case.

The developers behind this license believe it is necessary to prevent harmful and irresponsible uses of their products, and they believe that AI technology has a lot of potential for misuse. They argue for the need to come up with a legally enforceable mechanism to limit potentially irresponsible uses.

https://www.licenses.ai/

Responsible AI Licenses (RAIL) empower developers to restrict the use of their AI technology in order to prevent irresponsible and harmful applications. These licenses include behavioral-use clauses which grant permissions for specific use-cases and/or restrict certain use-cases. In case a license permits derivative works, RAIL Licenses also require that the use of any downstream derivatives (including use, modification, redistribution, repackaging) of the licensed artifact must abide by the behavioral-use restrictions.

But I don’t particularly agree with the necessity of using copyright as a means for this.

AI art generation may do different things than traditional art methods, but it’s not as much of a game-changer as some people claim. AI is just a buzzword for things that seem computationally practical based on everyday experiences, but where practical algorithms are new or nonexistent. Today’s AI techniques will become tomorrow’s conventional art techniques, and software tools for modifying and creating art have existed for a long time, such as Photoshop and GIMP. These AI tools are just an extension of digital art.

Artistic controversies, such as whether or not something is real art, have arisen before with new forms of art, such as photography. AI art is just another method of art that uses technology to probe and sample an extrinsic space outside of the artist’s mind, similar to how photography creates art by sampling from the physical environment.

In both cases, the artist’s creativity comes from knowing what to sample and how to sample it, which is what makes the creation of novel art possible.

Both conventional art methods and the new AI art have a lot of the same ethical issues.

For example, one that gets mentioned a lot is the ability of AI art to potentially create fake media. Images that look like they’re of a real person or of a real event, but aren’t actually representative of the world.

However, traditional means for visual art also have lots of ways to be misleading, manipulated, edited, and staged in a way that doesn’t reflect the real world. People often overestimate how accurate visual arts, especially photographic arts, are at truly representing the world. The new technology driving new ways to manipulate and generate imagery may reset the social environment around visual arts to something that’s actually more healthy and representative of not just what AI art is, but what visual art has always been.

The extreme (but unlikely) case might be when fakes become so common that the only way to trust an image is to know where it came from, its history. This would reduce visual imagery to how it was perceived before modern photography became widely available, in which you had to trust the testimony of the artist or the author for wherever you were getting the image from.

Every medium of art or expression has the ability to mislead and be misused, and the mechanisms that society has to limit that misuse don’t need to change with this new technology. The needed legal mechanisms, such as defamation law, already exist to limit the use of faked images to lie about someone. Attempting to bring copyright into what has traditionally been handled by defamation law is an attempt to rewrite that balance. Copyright carries different, often more extreme, penalties than society has seen fit to impose conventionally.

And how society has deemed it proper to handle things like lying about people or deliberately misleading them has been constructed through the democratic process and centuries of societal experience, to optimize various trade-offs: balancing the negative social effects of disseminating potentially dangerous content, or of damaging people’s reputations, against the importance of freedom of expression.

This type of rulemaking is fundamentally anti-democratic and technocratic, as it appoints those who write the license and push the rules as arbiters of how society should handle these risks. It also doesn’t take into account the ways in which humans can fail, sometimes more than machines do. For example, traditional human forensic methodologies can be very inaccurate, yet are still entered as evidence.

The use of AI technology raises many important questions about its potential misuse and accountability.

But it is not necessarily true that AI technology is worse than humans in many of the cases commonly discussed.

For instance, consider the process of creating a sketch of a suspect. A witness description could be interpreted by a human sketch artist or an AI model, both of which are interpretations and not the ground truth. The AI system may even come up with an equal or better interpretation than the human.

It is crucial to have a wide social debate about the trade-offs of AI and where its limits lie. When is AI better than humans, and when has society already gone too far in trusting human methods? AI has many of the same limitations as humans, but it may demonstrate those limits in a way that prompts society to reconsider its past decisions and to be more responsible with both human and automated decision making.

There is also the issue of accountability, especially when it comes to the normal legal system. A top-down institutional approach to limiting technology has much less accountability to the public and lacks a wide range of perspectives, leading to less legitimate and often worse results.

I believe this mindset could spread throughout the software industry, including to places where it would be very dangerous.

If this idea of social responsibility of companies and developers to restrict their users becomes more widespread, it would rewrite the balance of power between software companies and consumers in favor of the companies.

Imagine if this mindset were applied to conventional tools. Imagine a world in which Microsoft is treated, both in terms of legal power and in terms of generally perceived ethical responsibility, as responsible for what a writer does with Microsoft Word. Or in which Adobe is considered responsible in the same way for what an artist does with Photoshop or Illustrator.

It would no longer be a world where you can do what you want with a piece of software that runs on your computer. Someone else, someone with limited accountability to you, would have a lot more power over what you can do on your own computer.

The companies who make the software you use would have more power over what you can do with their software, and this change could make the world a much worse place.

A point raised in the previously linked discussion of responsible AI licenses is the idea of authorial integrity over software: the developer or company that produced it holds a mindset and vision that should influence what users do with the software. It is contended that this artistic or authorial vision should also bind everyone downstream, and that using the software in a way that is not part of that vision essentially violates the rights of the author, developer, or company.

https://facctconference.org/static/pdfs_2022/facct22-63.pdf

(Pg 2) The context in which a model is applied can be far removed from that which the developers had intended, a major point of concern from the perspective of human-centered machine learning [31] … applications that may be of concern, such as large-scale surveillance or the creation of “fake” media. In some cases, the developers or technology creators may legitimately want to control the use of their work due to concerns arising out of the data that it was trained on, the technology’s underlying assumptions about deploy-time characteristics, or the lack of sufficient adversarial testing and testing for bias. This is especially true of AI models that are difficult or expensive to recreate. For example, given that models such as GPT-3 [17] reportedly cost over $10 million (U.S.) to train, very few organizations are positioned to train (and potentially, need to retrain) a model of similar size

The mindset that the developer or the company has control over the software is incorrect. There is a big difference between functional works and creative works, and software falls into the category of functional works. Software is essentially a description of a process and a set of instructions, a tool that is used to guide a method.

It’s like a recipe or a textbook telling how you need to mix the paints to get a color. It’s not the painting that uses that color.

Control over the software used to make art is fundamentally control over a method, over a technique that’s represented by that software.

A work of art is a final product that can stand on its own, a work that’s enjoyed by itself. In that case, an artist can have an actual creative vision that’s carried through into their art. And I think that doesn’t work when you get to a tool like software.

The paper raises the cost of creating the software as a reason for preserving the vision, but I believe that considering the cost of software development moves things in the opposite direction.

In the art world, there is potential for substitutes, for other artists to come in and make a work of art that reflects their vision without necessarily needing to modify or use what another artist has done. The resources available to make art are often common enough or inexpensive enough that many visions of what art should be can coexist with each other. You can have many artists creating many works, and each of those works with their own vision.

But when a software program costs tens of millions, even hundreds of millions of dollars to produce, a normal person can’t step into that competition, into the creative process around developing software. The cost of production is so high that the developer or copyright holder ends up with a great deal of power over society.

Once you include interlinked supply chains (programs that depend on other programs), the entire tech stack would have to be rebuilt from the ground up to realize a different vision, which is infeasible even for the wealthiest person on the planet.

This is why the freedom to use and modify software and expand upon it is important and critical. Asserting that copyright holders or companies or software developers have the right or obligation to restrict its use is very dangerous.


Published on in Notes

Experimenting with Owncast, an open source Twitch-like streaming application.

First stream will probably be sometime this evening.

Because of the potentially high bandwidth usage, I’m setting it up on a VPS that has higher internet bandwidth than anything I use. This VPS will probably also get used for other livestreaming/messaging tasks, and for various other things that I don’t want to be 100% dependent on my home internet connection.


Published on in Notes

I believe that the best camera for beginners today is typically a fixed-lens bridge camera or a point-and-shoot. In the past, I have suggested to people that the first camera they purchase should be an entry-level DSLR or other interchangeable lens camera. However, it appears that the majority of camera makers are abandoning the sub-thousand-dollar interchangeable lens camera market in favor of the high end market.

I recently used Canon’s entry-level DSLR, the EOS T7, and it was not a positive experience. I experimented with it to get a feel for how it captured stills, how it took video, and everything else, but primarily I was curious as to whether or not it would make a suitable streaming camera to keep in a fixed spot and hook up as essentially a webcam utilizing Canon’s webcam tool. And even compared to the experience of using some simple point-and-shoot cameras, it was a step down.

In many ways, the experience was inferior to that of a cell phone camera, and it felt antiquated. It also appeared as if the manufacturer developed the camera as a low-effort, entry-level product.

In addition, Canon is discontinuing its EOS M line of interchangeable-lens mirrorless cameras, which was formerly a very capable system. It is the first major camera platform I got into, but it appears that Canon won’t develop many new cameras for that platform, which I’m kind of bummed about. They are moving on to the more expensive EOS R full-frame system, and they have no plans to continue making lenses for the EOS M format.

In contrast, the market for budget-friendly point-and-shoot cameras has greatly improved with the introduction of optical image stabilization and computational photography features. A point-and-shoot camera with a one-inch sensor gives an excellent experience in a variety of everyday situations, and it can perform well enough in low light to cover most of your daily tasks. Moreover, bridge cameras are becoming increasingly competent. Bridge cameras are, of course, fundamentally a more expensive market if you want something with good low-light performance. However, I believe that even the cheapest bridge cameras and superzoom cameras can produce decent results in the typical situations where one would use them.

In addition, there is a world of premium point-and-shoot or fixed-lens cameras that have also become quite good. So I sold my interchangeable lens system save for film cameras and switched to fixed lens cameras for most of my hobby work, because I believe it is a better deal these days.


Published on in Notes

Created another script that uses the edit mode of GPT-3 with a high temperature (i.e. giving GPT-3 a high degree of creativity), but runs it three distinct times.

https://gogs.theopjones.blog/theo/LittleScripts/src/master/transcribefoldermultiple.py

The results are interesting.
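The multi-run idea can be sketched roughly as follows. This is a minimal sketch, not the linked script: the actual GPT-3 edit call is stubbed out (with the legacy `openai` library it would look roughly like the commented-out call), and the function names are illustrative.

```python
# Apply GPT-3's edit mode to the same raw transcript several times,
# collecting one candidate rewrite per run.

def multi_run_edit(text, edit_call, runs=3):
    """Return one candidate rewrite per run of the edit model."""
    return [edit_call(text) for _ in range(runs)]

def fake_edit(text):
    # Stand-in for something like:
    #   openai.Edit.create(model="text-davinci-edit-001", input=text,
    #                      instruction="Fix grammar and style.",
    #                      temperature=0.9)["choices"][0]["text"]
    return text.strip().capitalize()

candidates = multi_run_edit("  raw transcript text  ", fake_edit)
```

Because the temperature is high, each of the three runs can produce a noticeably different rewrite, which is exactly the variation shown in the runs below.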

Unedited Transcript

I’ve been experimenting a bit with using GPT-3 to process speech-to-text transcripts, which in their raw form contain no line breaks, no paragraph breaks, kind of off-text because it’s a direct transcription of my speech, like not how I would normally write it. I’m feeding, and I have a little Python script written to feed these raw, unprocessed speech-to-text transcripts into GPT-3. Of course, GPT-3 can’t be ran locally, so it has to make an external API call. But how the script I wrote works is it makes one API call to have the text split up into individual paragraphs, and it makes another set of API calls for each paragraph to correct the grammar, style, spelling, and all of that. I did the two-part thing because based on my experimentation, GPT-3 doesn’t really like being given a huge wall of text, so splitting it up into paragraphs is one of the best techniques I found to get GPT-3 not to remove a lot of text without creating replacement text or add totally new text. From what I can tell, the little script I wrote is able to keep things pretty faithful to how I originally dictated while still punching up the grammar and resolving a lot of the editing I would have to do to make a speech-to-text transcript usable on my blog or something. So I think it’s helpful because it reduces a lot of really error-prone stuff that comes with using speech-to-text to write. I’ve uploaded a little Python script. I’ve used use slash created, and you can find it below.

Run 1 of GPT-3

I’ve been experimenting a bit with using GPT-3 to process speech-to-text transcripts, which in their raw form contain no line breaks, no paragraph breaks, kind of off-text because it’s a direct transcription of my speech, like not how I would normally write it. I’m feeding, and I have a little Python script written to feed these raw, unprocessed speech-to-text transcripts into GPT-3.

Of course, GPT-3 can’t be run locally, so it has to make an external API call.

But how the script I wrote works is it makes one API call to have the text split up into individual paragraphs, and it makes another set of API calls for each paragraph to correct the grammar, style, spelling, and all of that.

I did the two-part thing because based on my experimentation GPT-3 doesn’t really like being given a huge wall of text, so splitting it up into paragraphs is one of the best techniques I found to get GPT-3 not to remove a lot of text without creating replacement text or add totally new text.

From what I can tell, the little script I wrote is able to keep things pretty faithful to how I originally dictated while still punching up the grammar and resolving a lot of the editing I would have to do to make a speech-to-text transcript usable on my blog or something.

So I think it’s helpful because it reduces a lot of really error-prone stuff that comes with using speech-to-text to write.

I’ve uploaded a little Python script. You can find it below.

Run 2 of GPT-3

Currently, all of the transcripts on my YouTube channel are just raw, unedited speech-to-text outputs. They’re posted in their raw form so that they’ll be more accessible to people with all types of cognitive impairments who may not have the mental capacity to parse a transcript that’s been edited to improve grammar, spelling, sentence structure, style, and probably many more things I’m forgetting. Unedited speech-to-text transcripts are also more accurate since there’s no bias about word usage. Since speech-to-text is closer to merely transcribing all the words that I actually said, it’s more accurate than an edited transcript will be because I’m speaking with some bias and editing to clean up the transcript would increase that bias. However, the unedited speech-to-text transcripts increase the difficulty of parsing the text. For someone with severe speech-to-text-related cognitive impairments, they may continue to be beneficial, but for a lot of people with some type of speech-to-text-related cognitive impairment, an edited transcript would be helpful. That’s where GPT-3 comes in. GPT-3, because of its ability to edit, process, and rework text, can be used to make speech-to-text transcripts more accessible for people with a wider range of learning styles and ways of processing text. In short, GPT-3 can help turn unedited speech-to-text transcript from something that’s not very accessible to something that’s more accessible for a lot of people with different kinds of cognitive impairments. This seems like a common sense thing. Since most online videos have speech-to-text transcripts, most online speech-to-text transcript must be pretty difficult to parse, especially if someone has a mild learning disability of some sort.

GPT-3 can be used to punch up unedited speech to text transcripts, improving their readability and overall quality. I’ve been experimenting with using GPT-3 to process speech-to-text transcripts, which in their raw form contain no line breaks, no paragraph breaks, and are generally kind of off-text because it’s a direct transcription of my speech, not how I would normally write it. I’ve written a little Python script to feed these raw, unprocessed speech-to-text transcripts into GPT-3. Of course, GPT-3 can’t be ran locally, so it has to make an external API call. But the script I wrote works by making one API call to have the text split up into individual paragraphs and making another set of API calls for each paragraph to correct the grammar, style, spelling, and all of that. I did the two-part thing because based on my experimentation, GPT-3 doesn’t really like being given a huge wall of text, so splitting it up into paragraphs is one of the best techniques I found to get GPT-3 not to remove a lot of text without creating replacement text or add totally new text. From what I can tell, the little script I wrote is able to keep things pretty faithful to how I originally dictated while still punching up the grammar and resolving a lot of the editing I would have to do to make a speech-to-text transcript usable on my blog or something. So I think it’s helpful because it reduces a lot of really error-prone stuff that comes with using speech-to-text to write. I’ve uploaded a little Python script. You can find it below.

Run 3 of GPT-3

I’ve been using GPT-3 to process speech-to-text transcripts, which are quite rough in their original form.

They have no line breaks or paragraph breaks, and their formatting is off because GPT-3 has to read directly from audio.

But I wrote a little Python script which takes these unprocessed transcripts, inputs them into GPT-3, and the script outputs the corrected versions.

Since GPT-3 can’t be accessed locally, the script has to make external API calls.

The script works by making one API call to have the text split into individual paragraphs, and then another call for each paragraph to correct the grammar, style, spelling and such.

I did the two-part thing because, from my experimentation, GPT-3 doesn’t really like being given large walls of text, so splitting it into paragraphs is one of the best ways to get GPT-3 to not remove a lot of text without creating replacement text or adding totally new text.

The script I made seems to be able to keep things pretty faithful to the original while still enhancing the grammar and resolving a lot of the editing I would have to do to make the transcript usable on my blog or elsewhere.

So I think it’s helpful because it reduces a lot of the really error-prone stuff that comes with using speech-to-text to write.

I’ve uploaded the Python script to use slash created - you can find it below.


Published on in Notes

I’ve been experimenting with using GPT-3 to process speech-to-text transcripts. These transcripts, in their raw form, contain no line breaks or paragraph breaks, and are not how I would normally write because they are direct transcriptions of my speech. I have written a small Python script to feed these unprocessed transcripts into GPT-3. Of course, GPT-3 cannot be run locally and requires an external API call.

But how the script I wrote works is that it first makes one API call to split the text into individual paragraphs, and then it makes another set of API calls for each paragraph to correct the grammar, style, and spelling. I opted for the two-part approach because, based on my experimentation, GPT-3 doesn’t really handle large blocks of text very well. So, splitting it up into paragraphs is one of the best techniques I’ve found to prevent GPT-3 from removing too much text without creating replacement text or adding totally new text.
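The two-stage workflow just described can be sketched like this. This is a minimal illustration of the structure, not the actual script: the GPT-3 calls are passed in as functions and stubbed out, since the real script talks to the OpenAI API.

```python
# Stage 1: one call to insert paragraph breaks into the raw transcript.
# Stage 2: one call per paragraph to clean it up, so the model never
# sees a huge wall of text at once.

def split_into_paragraphs(raw_text, split_call):
    """Ask the model to add paragraph breaks, then split on them."""
    return [p for p in split_call(raw_text).split("\n\n") if p.strip()]

def clean_transcript(raw_text, split_call, edit_call):
    """Correct each paragraph independently and rejoin the results."""
    paragraphs = split_into_paragraphs(raw_text, split_call)
    return "\n\n".join(edit_call(p) for p in paragraphs)

# Stub calls, just to show the plumbing:
fake_split = lambda t: t.replace(". ", ".\n\n")
fake_edit = lambda p: p.strip()
cleaned = clean_transcript("first sentence. second sentence.", fake_split, fake_edit)
```

Splitting first also limits the damage any single bad edit can do: a hallucinated or truncated response affects one paragraph rather than the whole transcript.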

From what I can tell, the small script I wrote is able to keep things faithful to how I originally dictated, whilst still improving the grammar and resolving much of the editing I would have to do to make a speech-to-text transcript usable on my blog or something. Thus, I think it’s helpful as it reduces a lot of the error-prone aspects associated with using speech-to-text to write.

The script can be found here https://gogs.theopjones.blog/theo/LittleScripts/src/master/transcribefolder.py (this post is just the output of this workflow, with minimal additional editing)


Published on in Notes

In response to the post quoted below

“So what’s the deal with Mastodon anyway. Is it the prospective post-Twitter Musk-hater meeting place? Why would anyone choose to name their company after a prehistoric animal that humans hunted to extinction?”

The short and quick answer is that Mastodon is an open-source program that provides Twitter-like functionality. It’s something you can use to set up a social media website of your own.

It is possible for different instances of Mastodon to talk to each other, but this depends heavily on how the particular administrators have their instances configured. It is fairly common for instances to refuse to communicate with each other, often for trivial reasons or just the administrator’s personal preference.

So I would call Mastodon at best a semi-decentralized system, because the general assumption of Mastodon is that most users will join an instance run by someone else, and that most users won’t run their own instance. There is very limited portability of accounts between instances; identity on Mastodon is completely tied to the individual instance.

It is possible to run your own instance just for yourself and get some other instances to talk to it, but most people use other instances. The software isn’t really built for single-user instances; it generally assumes an instance has a lot of users. Managing a Mastodon instance is also relatively complicated compared to a lot of other server software.

The protocol that allows Mastodon instances to talk to each other sort of resembles RSS, but it’s push-based: an instance notifies other instances of new posts instead of other instances pulling a list of posts from it. This results in a pretty different ecosystem, because content tends to propagate from one instance to another. The usual configuration of a Mastodon instance is that it mirrors the content of the instances it is connected to and, in some cases, gives users a feed of that content.
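That push model (the underlying protocol is ActivityPub) can be sketched roughly like this. This is only an illustration: the URLs are hypothetical, and a real Mastodon delivery is also signed with HTTP Signatures, which is omitted here.

```python
# Rough sketch of ActivityPub-style push delivery, the mechanism behind
# Mastodon federation. Not a complete or signed implementation.
import json
import urllib.request

def build_create_activity(actor: str, post_url: str, text: str) -> dict:
    """Wrap a new post in a Create activity, as the origin server would."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,
        "object": {"type": "Note", "id": post_url, "content": text},
    }

def push_to_inbox(activity: dict, inbox_url: str) -> None:
    # The origin server POSTs the activity to each remote inbox; nothing
    # is pulled, which is what makes the system push-based rather than
    # RSS-like.
    req = urllib.request.Request(
        inbox_url,
        data=json.dumps(activity).encode(),
        headers={"Content-Type": "application/activity+json"},
        method="POST",
    )
    urllib.request.urlopen(req).close()
```
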

Mastodon instances usually have much heavier-handed moderation than other social media.

Mastodon.social is probably the most popular Mastodon instance, and when you hear people talk about Mastodon, it (or instances under similar management) is usually what they mean.

Truth Social and Gab are also Mastodon instances, but probably aren’t what people who talk about “Mastodon” mean. Most other Mastodon instances are very left-wing in their management.

My take on Mastodon is fairly negative. I think it’s a system that somehow manages to reproduce the worst of Twitter while having none of the benefits of true decentralization.

Most of what Mastodon is good at can fundamentally be done in other ways. My opinion is that the protocols of the old open blogosphere fundamentally worked; the reasons the old blogosphere died out are unrelated to the things Mastodon is optimizing for.


Published on in Notes

I’ve been doing a bit of research into what blogging engines exist in between WordPress (which is kind of a bloated mess for running a small blog) and Hugo and other static site generators, which don’t have web-based UIs and a few other features.

An interesting one that’s minimalistic but still not quite static-site-generator-level minimalistic is Bludit. It looks like a very minimalistic blogging engine that just doesn’t have a lot of extra features or bloat to it. It supports Markdown.

It’s not a static site generator, but it has a flat-file data structure, so it’s easy to back up: there’s no MySQL database, and none of the extra bloat of running MySQL on your server, where you either have to tolerate more RAM usage or break containerization by sharing one MySQL instance across all the services on the server. So it looks like a pretty good option. I haven’t replaced Hugo with it on my blog yet, but from what I can tell, and from experimenting with it a bit so far, it’s a very interesting minimalistic blogging engine.
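Since everything lives in flat files, backing up a site like that can be as simple as archiving one directory tree. A minimal sketch; the install path you pass in is site-specific, and nothing here is Bludit-specific code:

```python
# Because a flat-file engine has no database, a full backup is just an
# archive of the site's directory. No mysqldump, no extra service.
import shutil

def backup_site(site_dir: str, dest_basename: str) -> str:
    """Archive the whole site directory into <dest_basename>.tar.gz
    and return the path of the created archive."""
    return shutil.make_archive(dest_basename, "gztar", root_dir=site_dir)
```
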


Published on in Notes

I’ve done some investigation recently to figure out the cheapest GPUs around that would work for machine learning tasks like running Whisper. I have a fairly beefy GPU in my computer, the A4000, which is an unusual configuration: it’s a workstation GPU, not a consumer GPU, and it’s fairly high-end. I got it because I mostly do productivity stuff on my computer, like photo and video editing and some GPU-intensive compute processes. But I looked a bit into whether there are lesser GPUs around that would work, just for recommendations to other people. I think the obvious choice, and the one I tested with old equipment I have around, would be the RTX 2060. It’s a consumer GPU; it goes used for about $200 from what I can tell on eBay, and new for about $300 in a 12GB model. It’s the cheapest consumer GPU that has high VRAM.

And for most machine learning tasks that I’m interested in, VRAM is the limiting factor to an extent that isn’t true of gaming. On eBay I was able to find old workstation graphics cards that have a lot of VRAM. One good example is the Nvidia M40: it has 12GB, and I’m seeing it used for around $100. The absolute cheapest one I’m seeing that has enough VRAM is the Nvidia K40, which also has 12GB. I would say the M40 would get pretty reasonable performance. It has a PassMark GPU compute score of 3775 operations per second. Comparing that to the GPUs I’ve run Whisper on, I’d guess it would do the large model at approximately one-to-one timing: one minute of audio input would take about a minute to process.

The GPU that I have, the A4000, gets about four to one: four minutes of audio input take a minute to process. The cheapest GPU I’ve found that has enough VRAM, the $45 K40, has a PassMark score of around 2000 ops per second, and I think that would get about two minutes of processing time for each minute of audio, or maybe slightly worse. But I think there are a lot of cheap GPU options if you’re using the type of workflow I use, where you just feed the speech-to-text software a pre-recorded recording and let it transcribe.
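A back-of-envelope way to extend those guesses to other cards is to scale by relative PassMark GPU compute score. The numbers here are the rough figures from this post, and the assumption of linear scaling is mine, not a benchmark:

```python
# Crude estimate: scale the ~1:1 Whisper large-model speed observed on
# the M40 (PassMark ~3775) linearly by another card's PassMark score.
# Linear scaling with PassMark is an assumption, not a measured result.
def estimated_realtime_factor(passmark_score: float,
                              ref_score: float = 3775.0,  # M40
                              ref_factor: float = 1.0) -> float:
    """Estimated minutes of audio processed per minute of wall time."""
    return ref_factor * passmark_score / ref_score
```

On the K40’s ~2000 score this gives about 0.53, i.e. a bit under two minutes of processing per minute of audio, which matches the rough guess in the post.
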


Published on in Notes

I am talking while going through my feed on Tumblr. I am going to talk about interesting posts as I see them. And then I am going to feed this recording into transcription software.

https://northshorewave.tumblr.com/post/697583950171865088/whats-the-issue-with-your-macbook-as-far-as-i

The first interesting post that I see is North Shore Wave talking about switching away from macOS. I recently switched away from macOS myself. A big part of the reason I’ve been moving away from Apple products is that Apple’s business practices have become a lot worse. So last year I decided to start switching over to alternatives. I bought a proper workstation desktop and put Linux on it, sold my Macbook, and switched to using Linux and Windows as my two primary OSs.

I think the biggest issues I ran into with the transition from macOS were device incompatibilities. I have a few specialized devices, like a sound recording DAC, that are paired either to the Macbook hardware (like the Thunderbolt port, which isn’t really common on PCs) or to some of the macOS software, and that don’t work well on Linux and/or Windows. I’ve also run into the issue that my workflow depends a bit on proprietary software. I eventually hit something where there’s no good open source alternative, or the open source alternative is really different and I have to experiment around to find the proper replacement.

For me that came up a lot with photography, because my workflow got built around Photoshop and Lightroom, and the thing with those is that there isn’t really a single program that does everything Lightroom does. For Photoshop there are open source alternatives like GIMP, but they’re just not the same in terms of how good the UI, the user experience, and the functionality are.

For Lightroom the issue is basically that Lightroom does a lot of stuff. It does digital asset management and backup: you put your photos into it and it backs them up to remote storage, which sort of has privacy implications since it’s Adobe’s remote storage, though a lot of my hobby photography isn’t that important from a privacy perspective. It automatically makes backups, and it automatically syncs between all devices, so if you’re working on a tablet you can sync to that, or if I take a bunch of photos while on a computer where I don’t have (or don’t want) Photoshop installed, I can go into the web app and just upload things from there.

Fortunately, due to the pervasiveness of Chrome OS, Adobe is starting to have an actually really good web app, so it’s now possible to just use Lightroom as I normally did. A little while ago that wasn’t the case, and with Photoshop the web application version is still just garbage. It’s totally terrible.

So switching away from macOS is a process that took a while, and I think I’m finally getting rid of the last Apple device that I use on a regular basis. I recently bought an Android phone and moved my phone plan over to it. It’s a Google Pixel, the small one, not the full-size model. I still have my iPhone because I’m moving data between the two, but pretty soon I’ll get rid of the iPhone and only have Android. That will be the last big Apple device.

I’ve already switched away from the iPad, I’ve switched away from the Macbook, and I don’t rely on Apple services as much. It does feel a bit weird switching to Google, since Google is also a big company that does a lot of things wrong. The thing with Google is that where they get really bad is privacy, and that feels like kind of a lost cause. What Apple is doing that’s new in its badness, compared to whatever other tech companies do, is the extent to which Apple doesn’t let you treat your device as your device and tries to block what you can do with it.

Like when Apple pressured Tumblr into blocking certain content on their site, and Tumblr just did it, because Apple’s App Store refused to accept the Tumblr app otherwise. That’s novel. The fact that your device is locked to Apple and you can’t sideload apps has been an Apple thing for a while, but what’s new and pernicious is that Apple is using that to really control what users can do with their devices. It’s a threat to software freedom that’s new. With a classic proprietary OS like Windows, you can put whatever software you want on it.

Apple not only prevents you from putting your own software on the device, but is now using that power to dictate what you can do with it and what activities Apple finds acceptable. That’s really bad, and it’s not something I want to see spread throughout the software world. I’m scrolling over to the next post now to see if I can find any other posts that look interesting.