#LSTM

xLSTM combines ideas from Transformer technology with long short-term memory. The result, the researchers write, is an architecture that outperforms the Transformers currently in use in both performance and scalability.

arxiv.org/abs/2405.04517

arXiv.org
xLSTM: Extended Long Short-Term Memory
In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
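For intuition, here is a minimal NumPy sketch of the mLSTM idea named in the abstract: a matrix memory updated with a covariance rule and an exponential input gate. The function name, the sigmoid forget/output gates, and the omission of the paper's log-space stabilization are simplifications of mine, not the authors' reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlstm_step(C, n, q, k, v, i_pre, f_pre, o_pre):
    """One mLSTM-style step (sketch): matrix memory C (d x d), normalizer n (d,)."""
    d = k.shape[0]
    i = np.exp(i_pre)                        # exponential input gate (stabilization omitted)
    f = sigmoid(f_pre)                       # forget gate
    o = sigmoid(o_pre)                       # output gate

    k = k / np.sqrt(d)                       # scale keys, as in attention
    C = f * C + i * np.outer(v, k)           # covariance update rule: add v k^T
    n = f * n + i * k                        # normalizer state bounds the readout

    h_tilde = C @ q / max(abs(n @ q), 1.0)   # normalized memory readout
    return C, n, o * h_tilde                 # new states and gated hidden output
```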

That was a very entertaining piece of reporting, during which I met Sepp Hochreiter - a pioneer of machine learning who now wants to "sweep OpenAI off the market" with his old idea (#lstm).

Whether this old algorithm really has what it takes to revolutionize large language models is hard for me to judge. But one thing has become increasingly clear to me lately: Transformer models are at their limit. So something will have to change.

zeit.de/digital/2024-05/sepp-h

ZEIT ONLINE
Sepp Hochreiter: Sepp strikes back
Sepp Hochreiter revolutionized artificial intelligence in the 1990s. Then others came along. Now he wants to go on the attack again and build a better ChatGPT.

1997, with the advent of the Long Short-Term Memory recurrent neural network, marks the next step in our brief history of (large) language models from last week's lecture. Introduced by Sepp Hochreiter and Jürgen Schmidhuber, LSTMs enabled efficient processing of sequences of data.
Slides: drive.google.com/file/d/1atNvM
@fizise
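For readers who want the original mechanism the lecture refers to, here is a minimal sketch of one step of a vanilla LSTM cell: the three gates plus the additive cell-state update behind the "constant error carousel". The parameter names and the stacked-weight layout are my own convention, not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One vanilla LSTM step. Shapes: W (4d, input_dim), U (4d, d), b (4d,)."""
    z = W @ x + U @ h_prev + b
    i_pre, f_pre, o_pre, g_pre = np.split(z, 4)
    i, f, o = sigmoid(i_pre), sigmoid(f_pre), sigmoid(o_pre)  # input/forget/output gates
    g = np.tanh(g_pre)                                        # candidate cell update
    c = f * c_prev + i * g                                    # additive memory update
    h = o * np.tanh(c)                                        # gated hidden output
    return h, c
```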

"Predicting Stock Price Changes Based on the Limit Order Book: A Survey"

A comprehensive survey of the latest papers in this exciting new field.

mdpi.com/2227-7390/10/8/1234/h

MDPI
Predicting Stock Price Changes Based on the Limit Order Book: A Survey
This survey starts with a general overview of the strategies for stock price change predictions based on market data and in particular Limit Order Book (LOB) data. The main discussion is devoted to the systematic analysis, comparison, and critical evaluation of the state-of-the-art studies in the research area of stock price movement predictions based on LOB data. LOB and Order Flow data are two of the most valuable information sources available to traders on the stock markets. Academic researchers are actively exploring the application of different quantitative methods and algorithms for this type of data to predict stock price movements. With the advancements in machine learning and subsequently in deep learning, the complexity and computational intensity of these models was growing, as well as the claimed predictive power. Some researchers claim accuracy of stock price movement prediction well in excess of 80%. These models are now commonly employed by automated market-making programs to set bids and ask quotes. If these results were also applicable to arbitrage trading strategies, then those algorithms could make a fortune for their developers. Thus, the open question is whether these results could be used to generate buy and sell signals that could be exploited with active trading. Therefore, this survey paper is intended to answer this question by reviewing these results and scrutinising their reliability. The ultimate conclusion from this analysis is that although considerable progress was achieved in this direction, even the state-of-art models can not guarantee a consistent profit in active trading. Taking this into account several suggestions for future research in this area were formulated along the three dimensions: input data, model's architecture, and experimental setup. In particular, from the input data perspective, it is critical that the dataset is properly processed, up-to-date, and its size is sufficient for the particular model training. From the model architecture perspective, even though deep learning models are demonstrating a stronger performance than classical models, they are also more prone to over-fitting. To avoid over-fitting it is suggested to optimize the feature space, as well as a number of layers and neurons, and apply dropout functionality. The over-fitting problem can be also addressed by optimising the experimental setup in several ways: introducing the early stopping mechanism; saving the best weights of the model achieved during the training; testing the model on the out-of-sample data, which should be separated from the validation and training samples. Finally, it is suggested to always conduct the trading simulation under realistic market conditions considering transaction costs, bid–ask spreads, and market impact.
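The experimental-setup suggestions at the end of the abstract (early stopping, keeping the best weights seen during training, and testing on data held out from both training and validation) amount to a fairly standard training loop. Here is a schematic sketch; `model`, `train_epoch`, `evaluate`, and the `get_weights`/`set_weights` methods are hypothetical placeholders for whatever model and data pipeline is actually used, not an API defined by the survey.

```python
import copy

def train_with_early_stopping(model, train_epoch, evaluate, max_epochs=100, patience=5):
    """Schematic loop: early stopping plus best-weight checkpointing.

    train_epoch(model) runs one pass over the training data; evaluate(model)
    returns the loss on a validation set kept separate from the test data.
    Both are placeholder callables supplied by the caller.
    """
    best_loss = float("inf")
    best_weights = None
    epochs_without_improvement = 0

    for _ in range(max_epochs):
        train_epoch(model)
        val_loss = evaluate(model)

        if val_loss < best_loss:
            best_loss = val_loss
            best_weights = copy.deepcopy(model.get_weights())  # save best weights so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stopping: no improvement for `patience` epochs

    if best_weights is not None:
        model.set_weights(best_weights)  # restore the best checkpoint before testing
    return model
```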