Extending Context Window of Large Language Models via Positional Interpolation

Extending Context Window of Large Language Models via Positional Interpolation_ arxiv2306.15595.pdf (PDF, 1 year ago, 733.61 kB)

Efficient streaming language models with attention sinks_ arxiv2309.17453.pdf (PDF, 1 year ago, 11.81 MB)

GLU Variants Improve Transformer_arxiv2002.05202.pdf (PDF, 1 year ago, 106.58 kB)

GQA_ Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_ arxiv2305.13245.pdf (PDF, 1 year ago, 248.19 kB)

Climbing towards Natural Language Understanding_ On Meaning Form and Understanding in the Age of Data_ Emily M Bender- Alexander Koller_ 2020.pdf (PDF, 1 year ago, 472.23 kB)

Are Emergent Abilities of Large Language Models a Mirage_ arxiv2304.15004.pdf (PDF, 1 year ago, 1.82 MB)

An Ultra-Low Energy Internally Analog, Externally Digital Vector-Matrix Multiplier Based on NOR Flash Memory Technology_ M Reza Mahmoodi_ Dmitri Strukov_ 2018.pdf (PDF, 1 year ago, 1.72 MB)

LLaMA_ Open and Efficient Foundation Language Models_ arxiv2302.13971.pdf (PDF, 1 year ago, 709.54 kB)

Landmark Attention_ Random-Access Infinite Context Length for Transformers_ arxiv2305.16300.pdf (PDF, 1 year ago, 500.21 kB)

Llama 2_ Open Foundation and Fine-Tuned Chat Models_ arxiv2307.09288.pdf (PDF, 1 year ago, 13.03 MB)

Photonic Matrix Computing_ From Fundamentals to Applications_ Junwei Cheng_ Hailong Zhou_ Jianji Dong_ Nanomaterials 2021.pdf.pdf (PDF, 1 year ago, 3.45 MB)

RoFormer_ Enhanced Transformer with Rotary Position Embedding_ arxiv2104.09864v4.pdf (PDF, 1 year ago, 572.58 kB)

SentencePiece_ A simple and language independent subword tokenizer and detokenizer for Neural Text Processing_ arxiv1808.06226.pdf (PDF, 1 year ago, 206.70 kB)

Stay on topic with Classifier-Free Guidance_ arxiv2306.17806.pdf (PDF, 1 year ago, 1.91 MB)

The Curse of Recursion_ Training on Generated Data Makes Models Forget_ arxiv2305.17493.pdf (PDF, 1 year ago, 2.24 MB)

The Poison of Alignment_ arxiv2308.13449.pdf (PDF, 1 year ago, 185.34 kB)

The Transformer Model in Equations_ John Thickstun_ 2023.pdf (PDF, 1 year ago, size not listed)

The case for 4-bit precision_ k-bit Inference Scaling Laws_ arxiv2212.09720.pdf (PDF, 1 year ago, 884.70 kB)

Train Short, Test Long_ Attention with Linear Biases Enables Input Length Extrapolation_arxiv2108.12409.pdf (PDF, 1 year ago, 741.24 kB)

Unigram Algorithm_ Subword Regularization_ Improving Neural Network Translation Models with Multiple Subword Candidates_ arxiv1804.10959.pdf (PDF, 1 year ago, 321.81 kB)
