Towards Trustworthy AI with Decentralization Technologies – Part 1: Concepts and Approaches
12/2/2024
By Pete Harris, Co-Founder and Executive Director, DecentraTech Collective

This post originally appeared on the Lighthouse Beacon blog. Lighthouse Partners, Inc. works with innovators to create go-to-market programs that leverage advocacy, thought leadership and community building. In particular, Lighthouse is focused on advancing transformational technologies, including those that support business decentralization.

Artificial intelligence (AI) is now pretty much ubiquitous. With ChatGPT arguably having passed the Turing Test, AI is increasingly being leveraged for both business and consumer applications across finance, healthcare, education, entertainment, e-commerce, logistics and government. It’s already used in self-driving cars and could one day find itself assisting pilots on aircraft flight decks, or in autonomous aircraft. Moreover, AI built into combat drones is already a reality in Ukraine, and its adoption for many other military uses is surely not far off. How long, one wonders, before it’s deployed for nuclear missile defense systems, as depicted in the 1983 movie WarGames?

Given its widespread use, and the potential for catastrophic misuse, much attention is being directed to governing and managing AI to ensure that it operates in an ethical, responsible and trustworthy manner.

Ethical and Responsible AI … Built on Trust

Ethical AI and Responsible AI are strategic imperatives that refer to the principles and procedures that set a vision and guide the development and use of AI to ensure it is beneficial to society and that it respects human values, such as fairness, accountability and being mindful of privacy and security. Trustworthy AI has a more focused and tactical mission, and generally has a more immediate and direct impact on companies implementing AI and on their customers, who increasingly want to know more about their service providers, and especially how they are making use of their personal information. Trustworthy AI is generally considered to be a prerequisite for, and a building block of, ethical and responsible AI.

Drilling down, for AI models to be trusted they need to be accurate, reliable and transparent in their decision making. They not only need to make the right decisions based on data inputs, but they also need to demonstrate that those decisions are indeed correct. Many facets of AI model design and training need to be considered to create trustworthy algorithms, including legal, organizational, procedural and technology aspects.

By their very nature, Decentralized AI (DeAI) architectures are generally positioned as open source and transparent – and so more trusted. Centralized models from IT heavyweights are often presented as “black box” offerings, marketed on their accuracy, flexibility and ease of use. Such services generally draw on investments and exclusive licensing arrangements – such as Microsoft’s relationship with OpenAI, and Amazon’s deal with Anthropic – and benefit from cloud services that feature significant GPU server power. But the downside of these AI black boxes is that users cannot inspect models or track data flows, and so cannot tell whether bias has crept in or results have been tampered with. For some users, the perception that such models are vulnerable might cause them to look at alternatives.
It's All About Data

Given that the effectiveness of AI models depends on the data used to train them, and on the data presented to them at inference time, the data management and integrity aspects of AI are a hot topic. Another hot topic is privacy, which might seem impossible to achieve given AI’s need for accurate data from sources that don’t want to provide it.

Decentralized AI architectures, and decentralization technologies (DecentraTech) – which are built on blockchain platforms, cryptographic primitives and token incentive models – can be leveraged to address several trust, data and privacy related aspects of AI. Considerations for creating trustworthy AI include:

Data Availability and Privacy – adopting a decentralized and federated approach to AI datasets allows model code to work only on subsets of data that might be stored behind a corporate firewall. Just the outputs of the AI model are exposed to the outside world, not the raw input data. These outputs can then be aggregated into a single result that reflects all of the datasets, including those that remain private (a minimal federated-averaging sketch appears at the end of this post). Beyond federation, a set of cryptographic techniques known as Privacy Enhancing Technologies (PETs) can allow data elements to be included in AI model processing without exposing the values of the data. PETs include zero-knowledge proofs, multi-party computation and homomorphic encryption. While PETs are considered cutting edge and are not yet widely adopted in production environments, one can expect to see increasing real-life rollout in the next year.

Data Monetization – incentive models, often based on tokenization, are commonly used to attach a monetary value to private datasets, making it more likely that their owners will want to share them in a privacy preserving way. Data marketplaces are emerging that make it easier for model providers to discover and integrate diverse datasets to power their models.

Traceability and Provenance – cryptographically signing individual data elements and models, and including them in an audit trail, allows a robust record of provenance to be created. The audit trail – using (virtually) tamper-proof blockchain technology – would include not only data inputs but also outputs, and which models processed them. Subsequent analysis of this provenance record allows both data and models to be traced from their source until the AI outputs are presented to users (a simplified hash-chained audit trail sketch also appears at the end of this post).

Improving Trust for Model Providers and Users

By leveraging a combination of the decentralization technologies and techniques outlined above, providers and users of AI models can benefit and take comfort from increased trust profiles. To summarize: Because AI models can learn from datasets that are otherwise private, the accuracy of models tends to improve, which in turn feeds into better trust outcomes. Provenance of data and models also underpins transparency of AI models, including determining how data inputs have been processed by models; knowledge of provenance processes underpins increased trust. Determining whether data inputs might be subject to bias, and how any bias has been addressed or neutralized, is also a key input to establishing trust in models.

Building trustworthy AI models requires providers to understand all the data sourcing and processing concepts and issues outlined in this blog.
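To make the federated approach above more concrete, here is a minimal sketch in Python (using NumPy) of federated averaging, the simplest form of the pattern: each data owner computes a model update locally, behind its own firewall, and only the resulting weights are shared and averaged into a global model, so raw records never leave the owner's environment. The toy linear model, function names and random data are illustrative assumptions, not taken from any particular framework.

```python
import numpy as np

def local_update(weights, private_data, lr=0.1):
    """Compute a model update entirely behind the data owner's firewall.
    Only the updated weights are shared - never private_data itself.
    (Toy linear model, one gradient step on mean squared error.)"""
    X, y = private_data
    residual = X @ weights - y
    grad = X.T @ residual / len(y)      # gradient of mean squared error
    return weights - lr * grad

def federated_average(local_weights):
    """Aggregate the locally computed weights into one global model."""
    return np.mean(np.stack(local_weights), axis=0)

# Example: three organizations, each holding a private dataset.
rng = np.random.default_rng(0)
private_datasets = [(rng.normal(size=(50, 3)), rng.normal(size=50))
                    for _ in range(3)]
global_weights = np.zeros(3)

for _ in range(5):  # five federation rounds
    # Each owner updates the model locally; only weights cross the boundary.
    updates = [local_update(global_weights, data) for data in private_datasets]
    global_weights = federated_average(updates)

print("aggregated model weights:", global_weights)
```

Production federated learning frameworks typically layer secure aggregation, differential privacy and the PETs mentioned above on top of this basic loop.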
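Similarly, the traceability idea can be sketched in a few lines: each dataset, model and output is hashed (and, in a real system, signed with the contributor's key), and the records are chained so that altering any earlier entry breaks every later one, much as a blockchain-anchored audit trail would. This is a simplified, self-contained stand-in for such a ledger; the class and field names are illustrative.

```python
import hashlib, json, time

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ProvenanceLog:
    """A toy hash-chained audit trail for AI data inputs, models and outputs.
    A real deployment would anchor these records on a blockchain and sign
    each entry with the contributor's private key."""

    def __init__(self):
        self.entries = []

    def record(self, kind: str, artifact: bytes, meta: dict) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {
            "kind": kind,                        # e.g. "dataset", "model", "output"
            "artifact_hash": sha256_hex(artifact),
            "meta": meta,
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        body["entry_hash"] = sha256_hex(json.dumps(body, sort_keys=True).encode())
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        """Recompute the chain; any tampering breaks the links."""
        prev = "0" * 64
        for e in self.entries:
            expected = {k: v for k, v in e.items() if k != "entry_hash"}
            if e["prev_hash"] != prev:
                return False
            if sha256_hex(json.dumps(expected, sort_keys=True).encode()) != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True

log = ProvenanceLog()
log.record("dataset", b"...training data bytes...", {"source": "hospital-A"})
log.record("model", b"...serialized model bytes...", {"version": "1.0"})
log.record("output", b"...inference result...", {"model_version": "1.0"})
print("audit trail intact:", log.verify())
```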
Leveraging available decentralization and DeAI platforms and tools can also accelerate their creation – and some of those will be covered in Part 2.
How DecentraTech Can Reduce Costs for Artificial Intelligence – Part 2: Decentralized Storage
11/25/2024
By Pete Harris, Co-Founder and Executive Director, DecentraTech Collective

In the last blog, I discussed the considerable compute needs of artificial intelligence (AI) applications and how Decentralized Physical Infrastructure Networks (DePIN) can offer a solution. As it happens, DePIN can also help address another infrastructure bottleneck that often limits AI – access to, and storage of, vast amounts of trustworthy data.

For generative AI models to produce accurate results – an essential capability for implementing responsible and trustworthy AI requirements – they need to be trained. This activity teaches models how to perform a particular task so that they can identify patterns in data presented to them at run time. Such training typically requires high quality data, and lots of it. And storage of that data can be challenging. Even though physical storage costs are continually falling, building storage infrastructure that is continuously available, high performance and secure is a significant investment. Moreover, given that AI models will likely want access to increasing quantities of data as they are developed, ongoing costs will almost certainly increase over time.

Adopting a DePIN approach to AI data storage has the potential to accelerate scale up while reducing costs. As with AI compute, DePIN taps into infrastructure provided by many entities, including startups, small/medium enterprises, communities and individuals, which participate in the storage pool in return for cryptocurrency-based rewards. In addition to reduced build out time and costs, the decentralized architecture of DePIN can offer other benefits compared to traditional centralized data centers, including tamper and censorship resistance (by storing hashes of data to determine whether it has been modified – a simplified content-hashing sketch appears below), improved resilience (by replicating data blocks across network nodes), and increased performance. Reduced power consumption is another likely benefit.

Compared to other #DecentraTech projects, including DePIN compute farms, DePIN data storage is already well developed and established, with several open source and commercial offerings available and significant production use cases to learn from. Examples include Arweave, Codex, DeNet, Filecoin, and Storj.

A number of decentralized storage offerings are based on a set of open-source Distributed Hash Table protocols known as the InterPlanetary File System (IPFS). The project was started in 2014 by Juan Benet and his company Protocol Labs, and the first generally usable implementation of the IPFS protocols, now known as Kubo, was released in April 2016. Both IPFS and Kubo have since been updated, and Kubo is cited by the project as the most popular implementation of IPFS in use today.

Another IPFS-based network that has been widely adopted for real world applications by businesses, the scientific community, activist groups, and socially oriented nonprofits is Filecoin. Launched in 2017 and now governed by the Filecoin Foundation, Filecoin adds a “Proof of Storage” incentive function to the core IPFS protocols to reward those providing physical storage to the network.
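To illustrate the content addressing that underpins IPFS-style storage referenced above, the short Python sketch below derives an identifier from the data itself, so anyone holding that identifier can detect whether a storage provider has altered the bytes. A plain SHA-256 digest stands in for a real IPFS CID, which additionally uses multihash and multibase encodings, so treat this purely as a conceptual sketch.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Derive an identifier from the data itself (simplified; real IPFS CIDs
    wrap the digest in multihash/multibase encodings)."""
    return hashlib.sha256(data).hexdigest()

def verify_retrieval(expected_cid: str, retrieved: bytes) -> bool:
    """Anyone holding the identifier can check that a storage provider
    returned exactly the bytes that were originally stored."""
    return content_id(retrieved) == expected_cid

original = b"training dataset, version 2024-08"
cid = content_id(original)

print(verify_retrieval(cid, original))                       # True: intact
print(verify_retrieval(cid, b"training dataset, tampered"))  # False: modified
```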
As of August 2024, the total capacity of the Filecoin network was 23 exbibytes across some 40 countries, with 2 exbibytes of user data being stored by around 2,000 entities (one exbibyte = 1,152,921,504,606,846,976 bytes, roughly equivalent to 1 quadrillion pages of plain text). Some of the higher profile users of Filecoin include NASA, the US Geological Survey and the National Institutes of Health. More than 500 entities have datasets of more than 1,000 tebibytes, while the Internet Archive’s Democracy Library stores more than a pebibyte of open government data.

Proponents of Filecoin cite its ability to store vast quantities of data, provide proof that it has been stored securely and not altered, and its decentralized architecture that makes it resistant to tampering, as ideal attributes for responsible AI workloads, where an audit trail of data inputs and models provides provenance and promotes trust in the outputs of models. Recognizing that it offers such auditability and security benefits, Filecoin has recently announced several partnerships related to expanding its use for AI applications. For example, SingularityNET is focusing on securing metadata for verifiable model training, while Eternal AI and EQTY Lab are tapping Filecoin to validate model lineage.

How DecentraTech Can Reduce Costs for Artificial Intelligence – Part 1: Decentralized Compute
11/4/2024
By Pete Harris, Co-Founder and Executive Director, DecentraTech Collective

It’s no secret that the popularity of artificial intelligence (AI) has exploded for both business and consumer applications. Along with that trend, public cloud platforms from Amazon Web Services, Microsoft, and Google have upgraded with the latest GPU chips to power AI apps.
Compute power will not be the only enabler of success, but the winners in the world of AI will likely be the companies that can amass substantial amounts of it to underpin their services. For now, though, there are a couple of issues with that endeavor. Firstly, the demand for the most powerful GPU chips is outstripping supply. And those chips are pricey. For example, the most powerful H100 chips from GPU leader Nvidia sell for around $40,000, while renting time on a GPU at a cloud provider costs about 100 bucks an hour (a back-of-the-envelope break-even sketch appears at the end of this post).

No wonder then that some AI startups are raising billions of dollars in funding simply to pay for the compute infrastructure they will need to operate. One example is CoreWeave, a specialist AI cloud provider, which recently closed on $7.5 billion in debt financing to double its datacenter capacity. Meanwhile, leading crypto VC firm Andreessen Horowitz is building GPU server clusters for companies to use. It expects to host some 20,000 GPUs as part of this initiative, known as Oxygen, which it hopes to use as a competitive tool to lure startups to its portfolio. Other companies, including Elon Musk's xAI and Meta, are rolling out AI clusters with 100,000 GPUs.

Clearly, not many startups (or even enterprises, for that matter) have the funding to be able to throw masses of GPUs at their AI endeavors. Which is where DecentraTech approaches – specifically, leveraging Decentralized Physical Infrastructure Networks (or DePIN) – might present a path forward.

Unlike traditional, centralized datacenters, which are typically built by single companies, such as Microsoft (which, by the way, expects to spend $50 billion on new datacenters for AI), DePINs leverage blockchain technology to decentralize control, ownership and the cost of building and maintaining physical infrastructure. In a DePIN model, this infrastructure is provided by large numbers of entities, including startups, small/medium enterprises, communities and individuals, which make it available (often part of the time, or as a background task when it is underutilized for its primary use) to the DePIN operator. For compute services, this generally requires providers to install or run a software process on their workstation or server to register and manage their participation. In return, the providers are rewarded, generally in an operator-specific cryptocurrency. Note: for more on DePINs in general, see this blog from Multicoin Capital.

Assuming enough providers can be appropriately incentivized to offer their infrastructure to the operator, and the DePIN design offers easy integration for providers and standards-based access for end-user applications, the DePIN approach can provide massive quantities of compute power at a fraction of the cost of traditional cloud services.

While DePIN compute offerings began with generic CPU power, the rise of AI applications has led a number of DePIN operators to offer GPUs as well, while others specialize in GPU compute specifically for AI. See below for some examples of AI-oriented DePIN operators.

Akash – established way back in 2015 and with plenty of experience in decentralized compute, last year it began to roll out various flavors of GPU. Its AKT token is built on Cosmos, a blockchain ecosystem with a mission to establish the 'Internet of Blockchains' by enabling secure communication and interoperability amongst various blockchains.
Influx Technologies (Flux) – a decentralized infrastructure provider that began life in 2020. Its recent FluxEdge offering provides access to a range of GPUs and is targeted at AI applications.

GAIMIN – positioned as a gaming network that rewards its users for playing games in return for tapping into the GPUs in their PCs to power applications, including for AI.

IO Research – a decentralized GPU network aligned with the Solana blockchain, originally focused on financial trading but since refocused on the AI space.

The Render Network – focused on rendering 3D graphics for the entertainment industry, Render allows participants in its GPU network to make unused compute available for AI applications.

Decentralized compute is not the only way that DePINs can reduce costs for AI. As well as compute power, AI models generally need access to large volumes of data for training. So DePIN for decentralized storage of that data is potentially of interest too. That technology will be covered in Part 2 of this blog.
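As a back-of-the-envelope illustration of the GPU economics cited in this post, the sketch below compares buying an H100-class GPU outright with renting equivalent time in the cloud, using the roughly $40,000 purchase price and $100-per-hour rental rate mentioned above. The figures are rounded assumptions for illustration only and ignore power, hosting and depreciation costs.

```python
# Rough break-even estimate using the figures cited in this post (assumptions,
# not quotes): buying one high-end H100 versus renting GPU time in the cloud.
GPU_PURCHASE_PRICE_USD = 40_000   # approximate price of an Nvidia H100
CLOUD_RATE_USD_PER_HOUR = 100     # approximate cloud GPU rental rate

break_even_hours = GPU_PURCHASE_PRICE_USD / CLOUD_RATE_USD_PER_HOUR
print(f"Rental spend matches the purchase price after ~{break_even_hours:.0f} hours")
print(f"That is about {break_even_hours / 24:.0f} days of continuous use")
# => roughly 400 hours, or under three weeks of round-the-clock training,
#    which is why sustained AI workloads push teams toward dedicated
#    (or DePIN-sourced) GPU capacity rather than on-demand cloud rental.
```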
By Pete Harris, Co-Founder and Executive Director, DecentraTech Collective
Many in the blockchain and MedTech world have fond memories of this day a year ago, when the Blockchain and Digital Transformation in Health 2020 symposium was held in Austin, TX. One topic of conversation among the 100+ participants who gathered that day was the breaking news that the first cases of coronavirus had been detected in the U.S. A question that many had was what the impact here would be. Tragically, we now have at least a partial answer to that one.
The symposium – a unique collaboration between the Austin Blockchain Collective and Dell Medical School – featured keynotes from UCSD Health and IBM and combined presentations and panels from commercial innovators and academia. As such, it acted as a catalyst for a number of important conversations and partnerships that have continued since.
In particular, the symposium addressed several key themes that have risen in visibility and importance over the past year.
While hosting a follow-up in-person symposium in 2021 is unlikely due to the ongoing pandemic, we at the DecentraTech Collective continue to maintain a focus on the decentralization of healthcare, look forward to running online webinars and video conversations, and will continue to engage with the medical and healthcare community. See you in person in 2022!

By Pete Harris, Co-Founder and Executive Director, DecentraTech Collective

DecentraTech is the short notation for decentralization technology. The business world already commonly uses terms like FinTech to mean financial technology, MedTech for medical technology, or MarTech for marketing technology. Thus, adopting DecentraTech to refer to a collection of technologies that allows decentralization to be adopted by businesses is both consistent and long overdue.

To step back for a moment and consult the Merriam-Webster dictionary, decentralization is “the process by which the activities of an organization, particularly those regarding planning and decision making, are distributed or delegated away from a central, authoritative location or group.” The concept of decentralization as an organizational approach is not at all new, having originated in France 200 years ago. Technological decentralization – which in current times is a requirement of most manifestations of decentralization – became a possibility with the creation of the internet, conceived as an open network that anyone can access, based on common standards and with no central governance.

Ironically, it is now increasingly accepted that the most popular commercial services delivered via the internet, especially the world wide web, are products of centralized corporations. And many consider that some of those players – including “Big Tech” companies such as Google, Facebook and Twitter – now wield too much control over service provision and personal data ownership for the public good.

One of the many with concerns about the power that Big Tech has amassed is Tim Berners-Lee, who created the web in 1990. He is now working on Solid, an open-source effort to reinvent the web so as to allow individuals to control their personal data. More broadly, Web 3.0, a term coined by Ethereum co-founder Gavin Wood, who went on to form the Web3 Foundation, is focused on creating “a decentralized and fair internet where users control their own data, identity and destiny.”

As a result of the recent actions of a few high profile companies – including Twitter cancelling the account of the then-sitting POTUS, and online financial app Robinhood suspending retail investor trading in certain wildly popular stocks – calls for the establishment of decentralized services that are censorship-resistant have grown loud.

As it happens, Twitter – or at least its CEO Jack Dorsey – has been pondering the responsibilities that come with operating a forum for open discourse in an era where some use it to spread misinformation and hate speech that can have public safety consequences. Following Twitter’s recent decision to suspend and remove a number of accounts, Dorsey wrote (on Twitter) that “I feel a ban is a failure of ours ultimately to promote healthy conversation.” In the future, Twitter might be able to sidestep similar actions by becoming a decentralized service where (presumably) moderation processes would be under the jurisdiction of the crowd.
At the end of 2019, it kicked off the bluesky project, which Dorsey characterizes as “an initiative around an open decentralized standard for social media. Our goal is to be a client of that standard for the public conversation layer of the internet.” A few weeks ago, the project released a research paper covering a number of existing decentralized social media ecosystems and noted that it is eager to hear about other projects. “Our DMs are open!”

By Pete Harris, Executive Director, Austin Blockchain Collective

Hyperledger, the Linux Foundation’s blockchain initiative, just celebrated its fifth birthday. Initially known as the Hyperledger Project, it originally focused on a single enterprise blockchain development – a private/permissioned platform dubbed Fabric. Its early members were a couple of dozen major corporates, with IBM as a key champion, influencer and code contributor.

Five years on, Hyperledger has grown substantially. It now has more than 200 members, big and small, and comprises 16 core projects, including competing ledgers, development tools, identity frameworks and more. Until recently, however, its focus remained on the private blockchain space. That changed in 2019 with the announcement of Hyperledger Besu, an Ethereum client built by ConsenSys, which supports both private and public chains. According to Hyperledger’s annual report for that year, “Besu represents the growing interest of enterprises to build both permissioned and public network use cases for their applications.” Another report in 2019 – commissioned by consultants EY and conducted by analyst firm Forrester Research – found that some 75% of enterprises were interested in exploring public blockchains.

As those enterprises embark on their public blockchain research, they will find more than a few variants to choose from. Ethereum is by far the most established, and with its ETH 2.0 scalability upgrade now begun, it will surely remain the leader for some time to come. But many other public chains have now launched and are being developed, including the Web3 Foundation’s Polkadot, Cosmos, Cardano, DigiByte, Decred, EOS, NEAR, Solana, NEM’s Symbol and VeChain. With so much innovation focused on (and investment in) the public blockchain space, and relatively little on private variants, bets against enterprise adoption of public chains would attract very long odds indeed.

Scalability aside, perhaps the most common criticism of public chains is that they are not relevant for – indeed not usable by – enterprises that need to keep their vast data assets private for competitive or regulatory reasons. It’s a valid POV, but this year the Baseline Protocol has emerged to counter it. Baseline is going to be very useful for businesses. And it’s going to be huge.

By Pete Harris, Executive Director, Austin Blockchain Collective

I’ve known for a long time that healthcare relies on data in order to operate (no pun intended) effectively, safely and efficiently. It’s also been apparent that current processes to capture and share data are pretty much broken — one example that most patients can relate to is badly designed paper forms, usually partially completed in bad handwriting, and then faxed between providers. Some of that data might end up being manually entered into an EHR system, mistakes and all, which of course can’t communicate digitally with similar systems at other providers.
All that paper, manual processing and siloed databases will need to change, and not just to make healthcare delivery more efficient and less costly for payers and providers, or to offer better outcomes for patients. Here’s why …

By Pete Harris, Executive Director, Austin Blockchain Collective
It’s too late for 2020 predictions, so here is one for the decade ahead … read this blog to the very end …

Web 3.0 describes the next phase of the internet, where control and privacy, and hence power, will be restored to individuals and taken away from the big tech companies that millions of people and businesses have come to rely on (to their benefit for sure, but at potentially great cost). Blockchain technology — specifically public blockchain — is a key enabler of Web 3.0. So too are decentralized storage and query mechanisms, decentralized telecommunications, tools to build decentralized applications, and integration gateways to other public blockchains, private blockchains and non-blockchain applications and databases.

The Web 3.0 term was coined in 2014 by Gavin Wood, a British computer scientist who was the first CTO of Ethereum. He went on to found Parity Technologies and the Web3 Foundation, which is now rolling out its first public blockchain projects, known as Substrate and Polkadot.
About the Curator

Pete Harris is the Co-Founder and Executive Director of the DecentraTech Collective. He is also Principal of Lighthouse Partners, which provides business strategy services to developers of transformational technologies. He has 40+ years of business and technology experience, focusing in recent years on business applications of blockchain and Web3 technologies.

Curated and Focused Content

The Collective recognizes that there is a wealth of information available from publications, newsletters, blogs, and more. So we are curating 'must read' content in specific areas to help our community cut through the noise and focus only on what’s important.