developer, March 18, 2026

Regular Expressions for Beginners: 10 Practical Examples You'll Actually Use

Introduction

The first time most developers encounter a regular expression, the reaction is the same: what is this line noise? A pattern like ^[\w.-]+@[\w-]+\.[a-zA-Z]{2,}$ looks more like a cat walked across the keyboard than a coherent instruction.

The reality is that regex has a small, learnable core. Once you internalize about 20 syntax elements, you can read and write patterns confidently. And the payoff is enormous: regex is supported in every major programming language, in VS Code, Vim, grep, sed, many SQL dialects (through operators such as REGEXP, which are distinct from the simpler LIKE wildcards), spreadsheet formulas, and even browser DevTools. It is the closest thing to a universal text-processing superpower.

This guide skips the theory and focuses on 10 practical patterns with full explanations, test strings, and notes on language differences.


Quick Reference: Core Syntax

Character Classes

Syntax Matches
. Any character except newline
\d Any digit (0–9)
\D Any non-digit
\w Word character (a–z, A–Z, 0–9, underscore)
\W Non-word character
\s Whitespace (space, tab, newline)
\S Non-whitespace
[abc] Any of: a, b, or c
[^abc] Anything except a, b, or c
[a-z] Any lowercase letter
[a-zA-Z0-9] Alphanumeric

Quantifiers

Syntax Meaning
* Zero or more
+ One or more
? Zero or one (optional)
{n} Exactly n times
{n,m} Between n and m times
{n,} n or more times

Anchors and Boundaries

Syntax Meaning
^ Start of string (or line with multiline flag)
$ End of string (or line with multiline flag)
\b Word boundary
\B Non-word boundary
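
As a small illustration of the anchors above, the following Python sketch shows how the multiline flag changes what ^ matches and how \b restricts a match to whole words. The sample strings are made up for the example.

```python
import re

text = "error: disk full\nwarning: low memory\nerror: timeout"

# Without re.MULTILINE, ^ only matches at the very start of the string.
first_only = re.findall(r"^error", text)

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^error", text, flags=re.MULTILINE)

# \b keeps "cat" from matching inside "concatenate".
whole_word = re.findall(r"\bcat\b", "cat concatenate cat.")

print(first_only)   # ['error']
print(every_line)   # ['error', 'error']
print(whole_word)   # ['cat', 'cat']
```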

Groups and Alternation

Syntax Meaning
(abc) Capturing group
(?:abc) Non-capturing group
(?<name>abc) Named capturing group
a|b Alternation: a or b

10 Practical Regex Examples

1. Email Validation

Pattern: ^[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,}$

Explanation:

  • ^[\w.+-]+ — one or more word chars, dots, plus signs, or hyphens (local part)
  • @ — literal @ symbol
  • (?:[\w-]+\.)+ — one or more domain labels, each followed by a literal dot (so subdomains are allowed)
  • [a-zA-Z]{2,}$ — TLD of 2 or more letters

Matches: alice@example.com, user.name+filter@sub.domain.org
Does not match: @example.com, user@, user@.com

Note: True RFC 5322 email validation is extraordinarily complex. This pattern covers the common address shapes, but it is a shape check only: it still accepts some invalid addresses (for example a..b@example.com) and rejects some unusual valid ones.
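
A minimal Python sketch of this check follows. It assumes the domain is written as one or more dot-terminated labels so that subdomains are accepted, and the helper name looks_like_email is hypothetical.

```python
import re

# Local part, "@", one or more dot-terminated domain labels, then a TLD.
# Writing the domain as repeated labels is an assumption of this sketch.
EMAIL_RE = re.compile(r"^[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,}$")

def looks_like_email(value: str) -> bool:
    """Return True if value has a plausible email shape (not full RFC 5322)."""
    # fullmatch also rejects a trailing newline, which $ alone would tolerate.
    return EMAIL_RE.fullmatch(value) is not None

for candidate in ["alice@example.com", "user.name+filter@sub.domain.org",
                  "@example.com", "user@", "user@.com"]:
    print(candidate, looks_like_email(candidate))
```

The first two candidates print True and the last three print False. A shape check like this is usually paired with a confirmation email when the address really matters.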


2. URL Matching (HTTP/HTTPS)

Pattern: https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:/[\w./?=%&-]*)?

Explanation:

  • https?:// — http or https (the s? makes the s optional)
  • [\w.-]+ — domain name characters
  • (?:\.[a-zA-Z]{2,}) — TLD (non-capturing group)
  • (?:/[\w./?=%&-]*)? — optional path, query string

Matches: https://example.com, http://sub.domain.org/path?q=1&p=2
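
To see the pattern pull URLs out of running text, here is a short Python sketch. The sample sentence is invented for the example. Because every group in the pattern is non-capturing, re.findall returns the full matched URLs rather than group contents.

```python
import re

URL_RE = re.compile(r"https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:/[\w./?=%&-]*)?")

text = "Docs at https://example.com, mirror at http://sub.domain.org/path?q=1&p=2 today."

# All groups are non-capturing, so findall yields whole matches.
urls = URL_RE.findall(text)
print(urls)  # ['https://example.com', 'http://sub.domain.org/path?q=1&p=2']
```

One caveat worth knowing: the path character class includes the dot, so a URL with a path that ends a sentence (for example "/page.") will be captured with the trailing period attached.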


3. US Phone Numbers

Pattern: (?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

Matches: (555) 867-5309, 555.867.5309, +1-555-867-5309, 5558675309

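This pattern accommodates the formats Americans actually use: parentheses around the area code, hyphens, dots, spaces, and an optional +1 country code. It is unanchored and does not require the parentheses to be balanced, so (555 867-5309 also matches.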
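
A common follow-up is to normalize whatever format the user typed into plain digits. The sketch below is one way to do that in Python; the helper name normalize_us_phone and the choice to return the last ten digits are assumptions of this example.

```python
import re
from typing import Optional

PHONE_RE = re.compile(r"(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def normalize_us_phone(raw: str) -> Optional[str]:
    """Return the 10-digit number, or None if raw is not a US phone shape."""
    match = PHONE_RE.fullmatch(raw.strip())
    if match is None:
        return None
    digits = re.sub(r"\D", "", match.group())  # drop everything but digits
    return digits[-10:]                        # discard a leading country code 1

for raw in ["(555) 867-5309", "555.867.5309", "+1-555-867-5309", "5558675309", "867-5309"]:
    print(raw, "->", normalize_us_phone(raw))
```

The first four inputs normalize to 5558675309, and the seven-digit input returns None.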


4. Replace Multiple Spaces with Single Space

Pattern: \s+
Replace with: (single space)

Example:

const cleaned = "Hello    world   from  regex".replace(/\s+/g, ' ');
// Result: "Hello world from regex"

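\s+ matches one or more whitespace characters (spaces, tabs, newlines). The g flag makes the replacement global across the entire string. Because \s also covers tabs and newlines, line breaks are collapsed into spaces too; to target only runs of literal spaces, use a space followed by {2,} instead.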


5. Extract All Numbers from Text

Pattern: -?\d+(?:\.\d+)?

Explanation:

  • -? — optional negative sign
  • \d+ — one or more digits
  • (?:\.\d+)? — optional decimal portion

Example:

import re
text = "The temperature is -12.5°C and altitude 3500m"
numbers = re.findall(r'-?\d+(?:\.\d+)?', text)
# Result: ['-12.5', '3500']

6. Validate Date Format (YYYY-MM-DD)

Pattern: ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Explanation:

  • \d{4} — four-digit year
  • (0[1-9]|1[0-2]) — month 01–12 (not 00 or 13)
  • (0[1-9]|[12]\d|3[01]) — day 01–31

Matches: 2024-01-15, 2000-12-31
Does not match: 2024-13-01, 2024-1-5, 24-01-15

Note: This pattern checks only the shape of the date, so it still accepts impossible dates such as 2024-02-30. Use a date library for semantic validation.
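
One way to combine the regex shape check with a real calendar check is sketched below in Python. The helper name is_valid_date is hypothetical; datetime.strptime from the standard library handles the semantic part.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_valid_date(value: str) -> bool:
    """Shape check with the regex, then a calendar check with strptime."""
    if DATE_RE.fullmatch(value) is None:
        return False
    try:
        # Raises ValueError for impossible dates such as February 30.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(is_valid_date("2024-01-15"))  # True
print(is_valid_date("2024-02-30"))  # False: passes the regex, fails the calendar
print(is_valid_date("2024-13-01"))  # False: rejected by the regex
```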


7. Password Strength Check

Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$

Explanation:

  • (?=.*[a-z]) — lookahead: must contain at least one lowercase letter
  • (?=.*[A-Z]) — lookahead: must contain at least one uppercase letter
  • (?=.*\d) — lookahead: must contain at least one digit
  • .{8,} — minimum 8 characters total

Example:

const isStrong = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/.test(password);

Lookaheads ((?=...)) are zero-width assertions — they check without consuming characters, allowing multiple independent conditions on the same string.
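
The same lookahead pattern works unchanged in Python. The sketch below also shows an alternative that checks each rule separately, which can be handy when the user should be told what is missing. The names RULES and missing_requirements are hypothetical.

```python
import re

# Single-pattern version, identical to the one above.
STRONG_RE = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$")

# Each rule checked on its own, so the caller can report what is missing.
RULES = {
    "a lowercase letter": re.compile(r"[a-z]"),
    "an uppercase letter": re.compile(r"[A-Z]"),
    "a digit": re.compile(r"\d"),
    "at least 8 characters": re.compile(r".{8,}"),
}

def missing_requirements(password: str) -> list:
    """Return the labels of every rule the password fails."""
    return [label for label, rule in RULES.items() if rule.search(password) is None]

print(bool(STRONG_RE.match("Passw0rdX")))  # True
print(missing_requirements("password"))    # ['an uppercase letter', 'a digit']
```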


8. Match IPv4 Addresses

Pattern: \b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

Explanation: Each octet is constrained to 0–255:

  • 25[0-5] — 250–255
  • 2[0-4]\d — 200–249
  • [01]?\d\d? — 0–199

Matches: 192.168.1.1, 0.0.0.0, 255.255.255.255
Does not match: 256.0.0.1, 192.168.1 (incomplete)
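
The octet sub-pattern repeats four times, so in code it can help to build the full pattern from a single named piece. This Python sketch does that and runs it over an invented log line; OCTET and is_ipv4 are names chosen for the example.

```python
import re

# One octet, 0 to 255, reused four times below.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"
# Doubled braces keep {3} literal inside the f-string.
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")

def is_ipv4(value: str) -> bool:
    """True only if the whole string is a dotted-quad IPv4 address."""
    return IPV4_RE.fullmatch(value) is not None

log = "Accepted 192.168.1.1, rejected 256.0.0.1, gateway 10.0.0.254"
found = IPV4_RE.findall(log)
print(found)  # ['192.168.1.1', '10.0.0.254']
```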


9. Extract HTML Tag Contents

Pattern: <(\w+)[^>]*>(.*?)<\/\1>

Explanation:

  • <(\w+) — captures the tag name (group 1)
  • [^>]*> — any tag attributes
  • (.*?) — captures content lazily (group 2)
  • <\/\1> — closing tag matching the captured tag name via backreference

Example: Applied to <h1 class="title">Hello World</h1> — captures h1 and Hello World.
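
In Python, the two captures can be read back with match.groups(). The sketch below assumes a single, non-nested element; the forward slash does not need escaping outside a JavaScript regex literal, so the closing tag is written as </\1>.

```python
import re

# Group 1 is the tag name, group 2 the content; \1 refers back to group 1.
TAG_RE = re.compile(r"<(\w+)[^>]*>(.*?)</\1>")

match = TAG_RE.search('<h1 class="title">Hello World</h1>')
if match:
    tag_name, content = match.groups()
    print(tag_name, content)  # h1 Hello World
```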


10. Strip HTML Tags

Pattern: <[^>]+>
Replace with: (empty string)

Example:

import re
html = "<p>Hello <strong>World</strong></p>"
text = re.sub(r'<[^>]+>', '', html)
# Result: "Hello World"

This works for simple cases. It fails when an attribute value contains a literal > (for example title="a > b") and on malformed HTML. See the FAQ below.


Greedy vs Lazy Matching

By default, quantifiers are greedy — they consume as much as possible.

Input:  <div>first</div><div>second</div>
Greedy  (<div>.*</div>):  matches the ENTIRE string (both divs)
Lazy    (<div>.*?</div>): matches <div>first</div> then <div>second</div>

Add ? after any quantifier to make it lazy: *?, +?, {n,m}?.
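
The difference is easy to confirm in Python with re.findall on the same input:

```python
import re

html = "<div>first</div><div>second</div>"

# Greedy: .* runs to the end, then backs up to the LAST </div>.
greedy = re.findall(r"<div>.*</div>", html)

# Lazy: .*? stops at the FIRST </div> it can reach.
lazy = re.findall(r"<div>.*?</div>", html)

print(greedy)  # ['<div>first</div><div>second</div>']
print(lazy)    # ['<div>first</div>', '<div>second</div>']
```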


Regex Across Languages: Key Differences

Feature | JavaScript | Python | Java | Go
Literal syntax | /pattern/flags | r"pattern" | "pattern" | `pattern`
Find all matches | g flag | re.findall() | Matcher loop | FindAllString()
Named groups | (?<name>...) | (?P<name>...) | (?<name>...) | (?P<name>...)
Lookahead | Yes | Yes | Yes | No
Lookbehind | Yes | Yes | Yes | No
Backreferences | Yes | Yes | Yes | No

Go's regexp package uses RE2 syntax, which deliberately excludes lookaheads, lookbehinds, and backreferences to guarantee matching time that is linear in the input length. This prevents catastrophic backtracking but limits expressiveness.
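
As an example of the named-group syntax listed for Python in the table, the sketch below uses (?P<name>...) and reads the captures back as a dictionary. The pattern name ISO_DATE_RE and the sample sentence are made up for the illustration.

```python
import re

# Python spells named groups (?P<name>...).
ISO_DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = ISO_DATE_RE.search("Released on 2024-01-15.")
parts = m.groupdict() if m else {}
print(parts)  # {'year': '2024', 'month': '01', 'day': '15'}
```

Porting this to JavaScript or Java would mean rewriting each group as (?<year>...), which is the kind of small flavor difference the table summarizes.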


Common Pitfalls

Catastrophic Backtracking: Patterns like (a+)+$ applied to a long run of a characters followed by a non-matching character can take exponential time in backtracking engines, freezing your application. Avoid nested quantifiers over overlapping character sets.
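
The effect can be observed safely with short inputs. The Python sketch below times a nested-quantifier pattern against a flat rewrite that accepts the same strings. The input lengths are kept small on purpose, the helper name time_match is hypothetical, and actual timings depend on the machine, so none are shown.

```python
import re
import time

NESTED = re.compile(r"(a+)+$")  # nested quantifiers over the same character
FLAT = re.compile(r"a+$")       # accepts exactly the same strings

def time_match(pattern, text: str) -> float:
    """Return the seconds spent on one match attempt."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start

for n in (10, 14, 18):
    text = "a" * n + "b"  # the trailing "b" forces the match to fail
    print(n, f"nested={time_match(NESTED, text):.4f}s",
          f"flat={time_match(FLAT, text):.6f}s")
```

On a backtracking engine such as Python's re, the nested timing should grow much faster than the flat one as n increases. Raising n much further is not recommended.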

Regex for HTML Parsing: Do not use regex to parse HTML for structural purposes (navigating the DOM, extracting nested elements). Use a proper HTML parser: BeautifulSoup in Python, cheerio in Node.js, or the browser's built-in DOMParser. Regex is suitable for simple pattern extraction in known, controlled HTML fragments.
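
For comparison, here is a parser-based way to strip tags in Python. It uses the standard library's html.parser rather than the third-party tools named above, purely to keep the sketch dependency-free; the class name TextExtractor and the function strip_tags are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text nodes of a document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)

def strip_tags(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)

# The attribute value contains ">", which trips up the <[^>]+> regex.
print(strip_tags('<p title="a > b">Hello <strong>World</strong></p>'))  # Hello World
```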


Testing Tools

  • regex101.com — the gold standard. Paste a pattern, see matches highlighted in real time, with a detailed explanation of every token. Supports PCRE, Python, JavaScript, Java, Go flavors.
  • RegExr.com — clean UI, great for visual learning, built-in reference sidebar.
  • VS Code built-in search — enables regex in Find & Replace with the .* icon. Great for one-off text transformations in your codebase.

FAQ

Should I use regex to parse HTML? No, not for structural parsing. HTML is not a regular language — it has recursive nesting that regex cannot handle correctly. Use a dedicated HTML parser. Regex is fine for finding specific patterns within known, simple HTML strings (like stripping tags from a blog excerpt), but not for navigating arbitrary HTML documents.

How do I make regex case-insensitive? Most languages support an i flag: /pattern/i in JavaScript, re.IGNORECASE in Python, Pattern.CASE_INSENSITIVE in Java. Go requires (?i) inline: (?i)pattern.
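
In Python the flag form and the inline form give the same result, as this short sketch shows. The sample text is invented for the example.

```python
import re

text = "Regex, REGEX, regex"

# Flag passed as an argument.
flag_form = re.findall(r"regex", text, flags=re.IGNORECASE)

# Inline form; Python requires a global inline flag at the start of the pattern.
inline_form = re.findall(r"(?i)regex", text)

print(flag_form)    # ['Regex', 'REGEX', 'regex']
print(inline_form)  # ['Regex', 'REGEX', 'regex']
```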

Why does my regex work in one language but not another? Different languages implement different regex "flavors" with varying feature support. The main division is between backtracking engines with Perl-style syntax (PCRE and its relatives) and RE2-style engines. The former support lookaheads, lookbehinds, and backreferences. The latter (Go's regexp package, and RE2 bindings available for other languages) omit them in exchange for guaranteed linear-time matching. JavaScript, Python, Java, and .NET each ship their own Perl-style engine rather than PCRE itself, so they agree on the core syntax but differ in details such as named group syntax and lookbehind restrictions.


Conclusion

Regular expressions reward investment. The first hour feels like deciphering an alien language. By the third hour, patterns start reading almost naturally. By the time you have written and debugged a dozen of your own, you start seeing text problems differently — noticing when a quick regex would eliminate twenty lines of string manipulation code.

Start with the 10 examples in this guide. Modify them, break them, test edge cases on regex101.com, and read the explanations the tool generates. That tight feedback loop builds intuition faster than any tutorial.

The core syntax is stable across decades and programming languages. It is one of the few skills where learning it once pays dividends everywhere.
